Search: metr.org | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

1.

Show HN: Testing hypotheses through prediction is the next step towards AGI

github.com/Judahmeek

7 months ago

3 points

2.

Show HN: Significance-Hypothesis-Based-ARC-AGI-2-puzzle-solver

github.com/Judahmeek

7 months ago

2 points

3.

Measuring the impact of AI on experienced open-source developer productivity

a year ago

775 points

4.

Many SWE-bench-Passing PRs would not be merged

3 months ago

278 points

5.

Measuring AI Ability to Complete Long Tasks

6 months ago

247 points

6.

We are changing our developer productivity experiment design

4 months ago

88 points

7.

Measuring the Impact of AI on Experienced Open-Source Developer Productivity [pdf]

a year ago

18 points

8.

Measuring AI Ability to Complete Long Tasks – METR

a year ago

7 points

9.

Measuring AI Ability to Complete Long Tasks

a year ago

4 points

10.

When LLM agents can do a task, they can often do so at a fraction of human cost

2 years ago

4 points

11.

Measuring the Self-Reported Impact of Early-2026 AI on Tech Worker Productivity

a month ago

3 points

12.

The Impact of Early-2025 AI on Open-Source Developer Productivity

9 months ago

3 points

13.

Measuring the Impact of AI on Experienced OSS Developer Productivity [pdf]

a year ago

3 points

14.

We spent 2 hours working in the future

3 months ago

3 points

15.

Measuring AI Ability to Complete Long Tasks (2x every 7 months)

9 months ago

3 points

16.

Bounty: Diverse hard tasks for LLM agents

2 years ago

3 points

17.

AI's Version of Moore's Law

a year ago

2 points

18.

Frontier Risk Report (February to March 2026) – METR

a month ago

2 points

19.

Task-Completion Time Horizons of Frontier AI Models (Includes Opus 4.6)

4 months ago

2 points

20.

Task-Completion Time Horizons of Frontier AI Models – METR

4 months ago

2 points

21.

METR AI Benchmark: Clarifying Limitations of Time Horizon

5 months ago

2 points

22.

Measuring AI Ability to Complete Long Tasks

9 months ago

2 points

23.

Measuring AI Ability to Complete Long Tasks – METR

a year ago

2 points

24.

Measuring the Impact of Early-2025 AI on Experienced OpenSource Dev Productivity [pdf]

a year ago

2 points

25.

Recent Frontier Models Are Reward Hacking

a year ago

2 points

26.

Measuring AI Ability to Complete Long Tasks

a year ago

2 points

27.

METR: Model Evaluation and Threat Research

2 years ago

2 points

28.

AI Cheats [pdf]

a month ago

1 point

29.

Task-Completion Time Horizons of Frontier AI Models

a month ago

1 point

30.

Research note: Fine-tuning experiments on CoT controllability

2 months ago

1 point