HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
1.
Top model scores may be skewed by Git history leaks in SWE-bench
github.com/SWE-bench
153 comments
9 months ago
mustaphah
466 points
2.
SWE-Bench Pro
github.com/scaleapi
28 comments
9 months ago
tosh
101 points
3.
SWE-bench verified agents may look at future repository state
github.com/SWE-bench
discuss
10 months ago
brrrrrm
4 points
4.
Show HN: 97% on SWE-bench Verified with subscription-token agents
github.com/kimjune01
discuss
a month ago
kimjune01
2 points
5.
I made a viewer for the SWE-Bench dataset
github.com/mwufi
discuss
2 years ago
randomcatuser
1 point
6.
Show HN: Codex context bloat? 87% avg reduction on SWE-bench Verified traces
npmjs.com
2 comments
2 months ago
george_ciobanu
10 points
7.
Show HN: Mini-swe-agent achieves 65% on SWE-bench in 100 lines of Python
github.com/SWE-agent
4 comments
a year ago
lieret
7 points
8.
Show HN: New eval from SWE-bench team evaluates LMs based on goals not tickets
codeclash.ai
1 comment
8 months ago
lieret
5 points
9.
Show HN: Mcpbr – does your MCP help? Test it on SWE-bench and 25 evals
github.com/greynewell
discuss
5 months ago
greynewell
4 points
10.
SWE-gen: Scaling SWE-bench task generation
github.com/abundant-ai
discuss
5 months ago
coffeecoder123
4 points
11.
SWE-Bench for Taxes
github.com/column-tax
discuss
a year ago
michaelrbock
3 points
12.
Show HN: Loki Mode hit 99.67% SWE-Bench – MAF built a SaaS overnight
github.com/asklokesh
5 comments
5 months ago
slogansand
2 points
13.
talkie-coder: From 1930 to SWE-bench
github.com/RicardoDominguez
discuss
2 months ago
Philpax
2 points
14.
Some critical issues with the SWE-bench-Pro environments
github.com/SWE-agent
discuss
3 months ago
snoopyswe
2 points
15.
Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds
swebench.com
discuss
a year ago
lieret
2 points
16.
Show HN: Sales Agent Benchmark – SWE-Bench for sales AI agents (open source)
sales-agent-benchmarks.fly.dev
discuss
4 months ago
a1j9o94
1 point
17.
Show HN: Statewright – Visual state machines that make AI agents reliable
github.com/statewright
59 comments
a month ago
azurewraith
126 points
18.
Show HN: AgentKit – JavaScript Alternative to OpenAI Agents SDK with Native MCP
github.com/inngest
15 comments
a year ago
tonyhb
64 points
19.
Show HN: Anterion – Open-source AI software engineer (SWE-agent and OpenDevin)
github.com/MiscellaneousStuff
2 comments
2 years ago
miscstuffz
4 points
20.
Show HN: Qwen3-Coder API – 480B open-source code LLM
netmind.ai
1 comment
a year ago
elricwan
3 points
21.
Show HN: Gemini 2.5 is the best model for Kotlin and Android dev
firebender.com
discuss
a year ago
aman-firebender
3 points
22.
Show HN: Tarmac – Know what Claude Code will cost before you run it
github.com/CodeSarthak
1 comment
4 months ago
sarthakaggarwal
2 points
23.
Show HN: Measuring how AI agent teams improve issue resolution on SWE-Verified
arxiv.org
discuss
4 months ago
NBenkovich
2 points
24.
Show HN: Salacia – The First Runtime OS for Agentic Coding
1 comment
4 months ago
alfredhua
1 point
25.
Show HN: Repowise – Codebase intelligence for AI coding agents (open source)
github.com/repowise-dev
discuss
3 months ago
raghavchamadiya
1 point
26.
Show HN: Tracecore: Benchmark AI Agents on Deterministic Coding Tasks
github.com/justindobbs
discuss
4 months ago
extra_cookin
1 point
27.
Show HN: Frouter – Live-ping and auto-configure free AI models for coding agents
github.com/jyoung105
discuss
4 months ago
jyoung105
1 point