HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
1.
Show HN: LLM Function Calling Library to Interact with File, Shell, Git and Code
swekit.dev
discuss
2 years ago
soham123
5 points
2.
Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds
swebench.com
discuss
a year ago
lieret
2 points
3.
Show HN: Agent Benchmark Repository and Viewer
explorer.invariantlabs.ai
discuss
2 years ago
marcfisc
2 points
4.
MiniMax M2.5 is beating Claude Opus 4.6 and MiniMax is 17x-20x cheaper
swebench.com
9 comments
4 months ago
thelinuxkid
6 points
5.
Show HN: Randomly switching between LMs at every step boosts SWE-bench score
swebench.com
1 comment
10 months ago
lieret
5 points
6.
SWE-bench just published an updated list of top AI Agents
swebench.com
discuss
a year ago
laxyz
4 points
7.
Amazon Q Developer Agent is now SOTA on SWE-bench
swebench.com
discuss
2 years ago
brendanfalk
4 points
8.
New leader on swe-bench multimodal
swebench.com
discuss
a year ago
katrin777
3 points
9.
Refact.ai is the new open-source SOTA on SWE-bench Verified and Lite
swebench.com
discuss
a year ago
bystrakowa
3 points
10.
New #1 SOTA on Swe-bench is using Claude 3.7 and O1
swebench.com
discuss
a year ago
knes
3 points
11.
SWE-Bench: Can Language Models Resolve Real-World GitHub Issues?
swebench.com
discuss
3 years ago
EvgeniyZh
3 points
12.
Gru.ai Got 35.67% on SWEbench
swebench.com
discuss
2 years ago
BabelCLoud
2 points
13.
SWE-Bench: Can Language Models Resolve Real-World GitHub Issues?
swebench.com
discuss
3 years ago
cjsaltlake
2 points
14.
SWE-bench
swebench.com
discuss
a year ago
katrin777
1 point
15.
SWE-Bench: Can Language Models Resolve Real-World GitHub Issues?
swebench.com
discuss
2 years ago
goranmoomin
1 point
16.
Can Language Models Resolve Real-World GitHub Issues?
swebench.com
discuss
3 years ago
throw2321
1 point