HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
Request
961.
▲
Show HN: mlx-chronos - benchmark MLX inference engines on Apple Silicon
github.com/igurss
discuss
5 hours ago
igurss
2 points
962.
▲
Benchmark unlimited Claude.md files against each other
github.com/emiliolugo
discuss
10 hours ago
emiliolugo
2 points
963.
▲
Show HN: InferBench – Benchmark local LLM engines with one click
github.com/JoniMartin27
discuss
20 days ago
JoniMartin
2 points
964.
▲
BrowseComp-Plus: A More Fair and Transparent Benchmark of Deep-Research Agent
github.com/texttron
discuss
21 days ago
colonCapitalDee
2 points
965.
▲
Show HN: AgentThreatBench – Benchmark for AI Agent Memory Security
github.com/OWASP
discuss
25 days ago
vgudur297
2 points
966.
▲
Prompter – Compare and benchmark Ollama models side-by-side in your terminal
github.com/whonixnetworks
discuss
a month ago
whonixnetworks
2 points
967.
▲
Show HN: 97% on SWE-bench Verified with subscription-token agents
github.com/kimjune01
discuss
a month ago
kimjune01
2 points
968.
▲
Show HN: Verdict – model evals on your own data, not someone else's benchmark
github.com/aevyraai
discuss
2 months ago
agunapal
2 points
969.
▲
talkie-coder: From 1930 to SWE-bench
github.com/RicardoDominguez
discuss
2 months ago
Philpax
2 points
970.
▲
Open macro placement benchmark and $20k challenge (HRT-sponsored)
github.com/partcleda
discuss
3 months ago
anonymousmoos
2 points
971.
▲
Show HN: WMB-100K – Open benchmark for AI memory systems at 100K turns
github.com/Irina1920
discuss
3 months ago
wontopos
2 points
972.
▲
Show HN: OpenClaw Arena – Benchmark models on real tasks, rank by perf and cost
app.uniclaw.ai
discuss
3 months ago
skysniper
2 points
973.
▲
An open source benchmarking framework for IT automation
github.com/itbench-hub
discuss
3 months ago
pranay01
2 points
974.
▲
Mitata: Benchmark tooling that loves you
github.com/evanwashere
discuss
3 months ago
jcbhmr
2 points
975.
▲
Help me improve this benchmark for vector engines
github.com/M4iKZ
discuss
3 months ago
M4iKZ
2 points
976.
▲
Some critical issues with the SWE-bench-Pro environments
github.com/SWE-agent
discuss
3 months ago
snoopyswe
2 points
977.
▲
BetterKV – A multithreaded Rust Redis alternative, 10-30x faster in benchmarks
discuss
3 months ago
1jmdev
2 points
978.
▲
Show HN: ModelSweep - Open-Source Benchmarking for Local LLMs
github.com/leonickson1
discuss
3 months ago
leonickson
2 points
979.
▲
FratBench – Social Calibration Benchmark (OAI Scores Dead Last) [pdf]
github.com/richar-wang
discuss
3 months ago
richardwang5
2 points
980.
▲
TLAi+ Benchmarks for Evaluating LLMs
github.com/tlaplus
discuss
4 months ago
alhazrod
2 points
981.
▲
An Nginx Engineer Took over AI's Benchmark Tool
github.com/hongzhidao
discuss
5 months ago
zhidao9
2 points
982.
▲
KiteSQL: Rust-native embedded SQL with TPC-C benchmarks and WASM support
github.com/KipData
discuss
5 months ago
Jacques2Marais
2 points
983.
▲
WorkBench-Pro – PC benchmark designed for developer workflows
github.com/johanmcad
discuss
5 months ago
johanmcad
2 points
984.
▲
Benchmark Comparison: JSONL vs. TOON output for JSON-render efficiency
github.com/vercel-labs
discuss
5 months ago
lafalce
2 points
985.
▲
Show HN: Rerankers – Models, benchmarks, and papers for RAG
github.com/agentset-ai
discuss
5 months ago
midamurat
2 points
986.
▲
Show HN: sc-membench for modern memory bandwidth and latency benchmarks
github.com/spareCores
discuss
5 months ago
daroczig
2 points
987.
▲
Show HN: Long-horizon LLM coherence benchmark (500 cycles)
zenodo.org
discuss
5 months ago
teugent
2 points
988.
▲
Epiplexity to Beat DeepMind's Alchemy Meta RL Benchmark
github.com/RandMan444
discuss
6 months ago
Phillip98798
2 points
989.
▲
Show HN: JSONBench, a Benchmark for Data Analytics on JSON
github.com/ClickHouse
discuss
6 months ago
saisrirampur
2 points
990.
▲
Stop benchmarking LLMs. Make them fight
github.com/AGI-Eval-Official
discuss
6 months ago
jinqueeny
2 points