Search: stet.sh | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

1.

I used autoresearch to improve my AGENTS.md, measured against real tasks

a month ago

8 points

2.

GPT-5.5 vs. GPT-5.4 vs. Opus 4.7 on 56 real coding tasks from 2 open source repos

2 months ago

4 points

3.

I benchmarked Opus 4.8 vs. GPT 5.5 on 2 open source repos

19 days ago

3 points

4.

Your AI coding benchmark is hiding a 2x quality gap

3 months ago

3 points

5.

I evaluated GLM 5.2 against the frontier on tasks from real repos

2 days ago

2 points

6.

GPT-5.5 low vs. medium vs. high vs. xhigh: the reasoning curve on 26 real tasks

a month ago

2 points

7.

I ran Opus 4.7 vs. Old Opus 4.6 vs. New Opus 4.6 on 28 Zod tasks

2 months ago

2 points

8.

A brief investigation into the GPT-5.5 regression claims

a month ago

1 point

9.

The Opus 4.7 reasoning curve - Medium is the best default?

a month ago

1 point

10.

Coding evals are broken. CI is green while AI code quality goes unmeasured

2 months ago

1 point

11.

Agents.md is the highest-leverage code you're not testing

2 months ago

1 point