HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
991.
Show HN: Sigma Runtime – 550-cycle identity stability benchmark on GPT-5.2
github.com/sigmastratum
discuss
6 months ago
teugent
2 points
992.
Benchmarking LLMs on whether they can play FizzBuzz
github.com/venkatasg
discuss
6 months ago
_venkatasg
2 points
993.
Running a 270M LLM on Android (architecture and benchmarks)
discuss
7 months ago
ayushranjan99
2 points
994.
TypeNet Benchmark for development of authentication keystroke technologies
github.com/BiDAlab
discuss
9 months ago
mooreds
2 points
995.
AutoCodeBench: Large Language Models Are Automatic Code Benchmark Generators
github.com/Tencent-Hunyuan
discuss
9 months ago
ngrilly
2 points
996.
Show HN: Little Fluffy Clouds: Combine a bunch of small adjacent networks
github.com/kstrauser
discuss
9 months ago
kstrauser
2 points
997.
Behavior: Robot manipulation benchmark based on 1000 household tasks
github.com/StanfordVL
discuss
9 months ago
transpute
2 points
998.
Show HN: LLM‑Simple‑Eval – Easily Benchmark LLMs for Your Use Case
github.com/grigio
discuss
10 months ago
grigio
2 points
999.
PostgreSQL vs. ClickHouse: Learnings from building my first database benchmark
github.com/514-labs
discuss
a year ago
oatsandsugar
2 points
1000.
Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds
swebench.com
discuss
a year ago
lieret
2 points