Heykuki News
Top
New
Best
Ask
Show
Jobs
Request
1.
I used autoresearch to improve my AGENTS.md, measured against real tasks
stet.sh
7 comments
a month ago
bisonbear
8 points
2.
GPT-5.5 vs. GPT-5.4 vs. Opus 4.7 on 56 real coding tasks from 2 open source repo
stet.sh
discuss
2 months ago
bisonbear
4 points
3.
I benchmarked Opus 4.8 vs. GPT 5.5 on 2 open source repos
stet.sh
discuss
19 days ago
bisonbear
3 points
4.
Your AI coding benchmark is hiding a 2x quality gap
stet.sh
discuss
3 months ago
bisonbear
3 points
5.
I evaluated GLM 5.2 against the frontier on tasks from real repos
stet.sh
2 comments
2 days ago
bisonbear
2 points
6.
GPT-5.5 low vs. medium vs. high vs. xhigh: the reasoning curve on 26 real tasks
stet.sh
discuss
a month ago
bisonbear
2 points
7.
I ran Opus 4.7 vs. Old Opus 4.6 vs. New Opus 4.6 on 28 Zod tasks
stet.sh
discuss
2 months ago
bisonbear
2 points
8.
A brief investigation into the GPT-5.5 regression claims
stet.sh
discuss
a month ago
bisonbear
1 point
9.
The Opus 4.7 reasoning curve - Medium is the best default?
stet.sh
discuss
a month ago
bisonbear
1 point
10.
Coding evals are broken. CI is green while AI code quality goes unmeasured
stet.sh
discuss
2 months ago
bisonbear
1 point
11.
Agents.md is the highest-leverage code you're not testing
stet.sh
discuss
2 months ago
bisonbear
1 point