Heykuki News
Top
New
Best
Ask
Show
Jobs
Request
1.
▲
Our LLM-controlled office robot can't pass butter
andonlabs.com
117 comments
8 months ago
lukaspetersson
229 points
2.
▲
We let AIs run radio stations
andonlabs.com
270 comments
a month ago
lukaspetersson
375 points
3.
▲
We gave an AI a 3 year retail lease and asked it to make a profit
andonlabs.com
286 comments
2 months ago
lukaspetersson
199 points
4.
▲
The Evolution of Bengt Betjänt
andonlabs.com
7 comments
4 months ago
lukaspetersson
54 points
5.
▲
Our AI started a cafe in Stockholm
andonlabs.com
48 comments
2 months ago
lukaspetersson
48 points
6.
▲
We gave an AI a 3 year retail lease and asked it to make a profit
andonlabs.com
discuss
2 months ago
lukaspetersson
34 points
7.
▲
We let four AIs run radio stations
andonlabs.com
2 comments
a month ago
nickoates
7 points
8.
▲
Vending-Bench: Testing long-term coherence in agents
andonlabs.com
2 comments
a year ago
tosh
5 points
9.
▲
Opus 4.6 on Vending-Bench – Not Just a Helpful Assistant
andonlabs.com
1 comment
5 months ago
lukaspetersson
5 points
10.
▲
Fable 5 on Vending-Bench: Misbehaving, with Plausible Deniability
andonlabs.com
1 comment
14 days ago
lukaspetersson
3 points
11.
▲
We let four AIs run radio stations. Here's what happened.
andonlabs.com
1 comment
a month ago
thm
3 points
12.
▲
Vending-Bench: Testing long-term coherence in agents
andonlabs.com
1 comment
a year ago
andromaton
3 points
13.
▲
Opus 4.8 on Vending-Bench: Better Alignment, Worse Performance
andonlabs.com
discuss
a month ago
tomjakubowski
3 points
14.
▲
Blueprint Bench: First signs of 3D spatial intelligence in LLMs
andonlabs.com
1 comment
2 months ago
lukaspetersson
2 points
15.
▲
Bengt Hires a Human–Towards a Happy Future with AI Employers
andonlabs.com
1 comment
4 months ago
lukaspetersson
2 points
16.
▲
Releasing Vending-Bench 2 for measuring model performance on running a business
andonlabs.com
discuss
4 months ago
lr0
2 points
17.
▲
Vending-Bench 2
andonlabs.com
discuss
4 months ago
samdung
2 points
18.
▲
Claude isn't the best Computer-use agent
andonlabs.com
discuss
a year ago
lukaspetersson
2 points
19.
▲
Gemini 3 is #1 on Vending-Bench 2
andonlabs.com
discuss
7 months ago
lukaspetersson
1 point
20.
▲
Misaligned Vending Machines [pdf]
andonlabs.com
discuss
10 months ago
bulla
1 point
21.
▲
Vending-Bench: Testing long-term coherence in agents
andonlabs.com
discuss
a year ago
vector_spaces
1 point
22.
▲
Vending-Bench: Testing long-term coherence in agents
andonlabs.com
discuss
a year ago
gdeglin
1 point
23.
▲
Claude Fable 5: mid-tier results on coding tasks
endorlabs.com
249 comments
12 days ago
bugvader
410 points