HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
1.
Show HN: Testing hypotheses through prediction is the next step towards AGI
github.com/Judahmeek
4 comments
7 months ago
judahmeek
3 points
2.
Show HN: Significance-Hypothesis-Based-ARC-AGI-2-puzzle-solver
github.com/Judahmeek
1 comment
7 months ago
judahmeek
2 points
3.
Measuring the impact of AI on experienced open-source developer productivity
metr.org
485 comments
a year ago
dheerajvs
775 points
4.
Many SWE-bench-Passing PRs would not be merged
metr.org
153 comments
3 months ago
mustaphah
278 points
5.
Measuring AI Ability to Complete Long Tasks
metr.org
193 comments
6 months ago
spicypete
247 points
6.
We are changing our developer productivity experiment design
metr.org
61 comments
4 months ago
ej88
88 points
7.
Measuring the Impact of AI on Experienced Open-Source Developer Productivity [pdf]
metr.org
2 comments
a year ago
ColinEberhardt
18 points
8.
Measuring AI Ability to Complete Long Tasks – METR
metr.org
1 comment
a year ago
gk1
7 points
9.
Measuring AI Ability to Complete Long Tasks
metr.org
discuss
a year ago
stared
4 points
10.
When LLM agents can do a task, they can often do so at a fraction of human cost
metr.org
discuss
2 years ago
cpainter
4 points
11.
Measuring the Self-Reported Impact of Early-2026 AI on Tech Worker Productivity
metr.org
1 comment
a month ago
willmarch
3 points
12.
The Impact of Early-2025 AI on Open-Source Developer Productivity
metr.org
1 comment
9 months ago
jvdvegt
3 points
13.
Measuring the Impact of AI on Experienced OSS Developer Productivity [pdf]
metr.org
1 comment
a year ago
nreece
3 points
14.
We spent 2 hours working in the future
metr.org
discuss
3 months ago
gmays
3 points
15.
Measuring AI Ability to Complete Long Tasks (2x every 7 months)
metr.org
discuss
9 months ago
tmoertel
3 points
16.
Bounty: Diverse hard tasks for LLM agents
metr.org
discuss
2 years ago
RoboTeddy
3 points
17.
AI's Version of Moore's Law
metr.org
1 comment
a year ago
aazo11
2 points
18.
Frontier Risk Report (February to March 2026) – METR
metr.org
discuss
a month ago
paraschopra
2 points
19.
Task-Completion Time Horizons of Frontier AI Models (Includes Opus 4.6)
metr.org
discuss
4 months ago
admp
2 points
20.
Task-Completion Time Horizons of Frontier AI Models – METR
metr.org
discuss
4 months ago
rootforce
2 points
21.
METR AI Benchmark: Clarifying Limitations of Time Horizon
metr.org
discuss
5 months ago
mustaphah
2 points
22.
Measuring AI Ability to Complete Long Tasks
metr.org
discuss
9 months ago
Gedxx
2 points
23.
Measuring AI Ability to Complete Long Tasks – METR
metr.org
discuss
a year ago
diginova
2 points
24.
Measuring the Impact of Early-2025 AI on Experienced OpenSource Dev Productivity [pdf]
metr.org
discuss
a year ago
davikr
2 points
25.
Recent Frontier Models Are Reward Hacking
metr.org
discuss
a year ago
surprisetalk
2 points
26.
Measuring AI Ability to Complete Long Tasks
metr.org
discuss
a year ago
pabo
2 points
27.
METR: Model Evaluation and Threat Research
metr.org
discuss
2 years ago
Olshansky
2 points
28.
AI Cheats [pdf]
metr.org
discuss
a month ago
brian_herman
1 point
29.
Task-Completion Time Horizons of Frontier AI Models
metr.org
discuss
a month ago
nsoonhui
1 point
30.
Research note: Fine-tuning experiments on CoT controllability
metr.org
discuss
2 months ago
mooreds
1 point