Search: github.com/SWE-bench | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

1.

Top model scores may be skewed by Git history leaks in SWE-bench

github.com/SWE-bench

9 months ago

466 points

2.

github.com/scaleapi

9 months ago

101 points

3.

SWE-bench verified agents may look at future repository state

github.com/SWE-bench

10 months ago

4 points

4.

Show HN: 97% on SWE-bench Verified with subscription-token agents

github.com/kimjune01

a month ago

2 points

5.

I made a viewer for the SWE-Bench dataset

github.com/mwufi

2 years ago

1 point

6.

Show HN: Codex context bloat? 87% avg reduction on SWE-bench Verified traces

2 months ago

10 points

7.

Show HN: Mini-swe-agent achieves 65% on SWE-bench in 100 lines of python

github.com/SWE-agent

a year ago

7 points

8.

Show HN: New eval from SWE-bench team evaluates LMs based on goals not tickets

8 months ago

5 points

9.

Show HN: Mcpbr – does your MCP help? Test it on SWE-bench and 25 evals

github.com/greynewell

5 months ago

4 points

10.

SWE-gen: Scaling SWE-bench task generation

github.com/abundant-ai

5 months ago

4 points

11.

SWE-Bench for Taxes

github.com/column-tax

a year ago

3 points

12.

Show HN: Loki Mode hit 99.67% SWE-Bench – MAF built a SaaS overnight

github.com/asklokesh

5 months ago

2 points

13.

talkie-coder: From 1930 to SWE-bench

github.com/RicardoDominguez

2 months ago

2 points

14.

Some critical issues with the SWE-bench-Pro environments

github.com/SWE-agent

3 months ago

2 points

15.

Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds

a year ago

2 points

16.

Show HN: Sales Agent Benchmark – SWE-Bench for sales AI agents (open source)

sales-agent-benchmarks.fly.dev

4 months ago

1 point

17.

Show HN: Statewright – Visual state machines that make AI agents reliable

github.com/statewright

a month ago

126 points

18.

Show HN: AgentKit – JavaScript Alternative to OpenAI Agents SDK with Native MCP

github.com/inngest

a year ago

64 points

19.

Show HN: Anterion – Open-source AI software engineer (SWE-agent and OpenDevin)

github.com/MiscellaneousStuff

2 years ago

4 points

20.

Show HN: Qwen3-Coder API – 480B open-source code LLM

a year ago

3 points

21.

Show HN: Gemini 2.5 is the best model for Kotlin and Android dev

a year ago

aman-firebender

3 points

22.

Show HN: Tarmac – Know what Claude Code will cost before you run it

github.com/CodeSarthak

4 months ago

sarthakaggarwal

2 points

23.

Show HN: Measuring how AI agent teams improve issue resolution on SWE-Verified

4 months ago

2 points

24.

Show HN: Salacia – The First Runtime OS for Agentic Coding

4 months ago

1 point

25.

Show HN: Repowise – Codebase intelligence for AI coding agents (open source)

github.com/repowise-dev

3 months ago

raghavchamadiya

1 point

26.

Show HN: Tracecore: Benchmark AI Agents on Deterministic Coding Tasks

github.com/justindobbs

4 months ago

1 point

27.

Show HN: Frouter – Live-ping and auto-configure free AI models for coding agents

github.com/jyoung105

4 months ago

1 point