Heykuki News
Top
New
Best
Ask
Show
Jobs
961.
TLAi+ Benchmarks for Evaluating LLMs
github.com/tlaplus
4 months ago
alhazrod
2 points
962.
An Nginx Engineer Took over AI's Benchmark Tool
github.com/hongzhidao
5 months ago
zhidao9
2 points
963.
KiteSQL: Rust-native embedded SQL with TPC-C benchmarks and WASM support
github.com/KipData
5 months ago
Jacques2Marais
2 points
964.
WorkBench-Pro – PC benchmark designed for developer workflows
github.com/johanmcad
5 months ago
johanmcad
2 points
965.
Benchmark Comparison: JSONL vs. TOON output for JSON-render efficiency
github.com/vercel-labs
5 months ago
lafalce
2 points
966.
Show HN: Rerankers – Models, benchmarks, and papers for RAG
github.com/agentset-ai
5 months ago
midamurat
2 points
967.
Show HN: sc-membench for modern memory bandwidth and latency benchmarks
github.com/spareCores
5 months ago
daroczig
2 points
968.
Show HN: Long-horizon LLM coherence benchmark (500 cycles)
zenodo.org
5 months ago
teugent
2 points
969.
Epiplexity to Beat DeepMind's Alchemy Meta RL Benchmark
github.com/RandMan444
6 months ago
Phillip98798
2 points
970.
Show HN: JSONBench, a Benchmark for Data Analytics on JSON
github.com/ClickHouse
6 months ago
saisrirampur
2 points
971.
Stop benchmarking LLMs. Make them fight
github.com/AGI-Eval-Official
6 months ago
jinqueeny
2 points
972.
Show HN: Sigma Runtime – 550-cycle identity stability benchmark on GPT-5.2
github.com/sigmastratum
6 months ago
teugent
2 points
973.
Benchmarking LLMs on whether they can play FizzBuzz
github.com/venkatasg
6 months ago
_venkatasg
2 points
974.
Running a 270M LLM on Android (architecture and benchmarks)
7 months ago
ayushranjan99
2 points
975.
TypeNet Benchmark for development of authentication keystroke technologies
github.com/BiDAlab
9 months ago
mooreds
2 points
976.
AutoCodeBench: Large Language Models Are Automatic Code Benchmark Generators
github.com/Tencent-Hunyuan
9 months ago
ngrilly
2 points
977.
Behavior: Robot manipulation benchmark based on 1000 household tasks
github.com/StanfordVL
9 months ago
transpute
2 points
978.
Show HN: LLM‑Simple‑Eval – Easily Benchmark LLMs for Your Use Case
github.com/grigio
10 months ago
grigio
2 points
979.
PostgreSQL vs. ClickHouse: Learnings from building my first database benchmark
github.com/514-labs
a year ago
oatsandsugar
2 points
980.
Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds
swebench.com
a year ago
lieret
2 points
981.
Show HN: VDBbench 1.0: open-source benchmarking for VectorDBs
github.com/zilliztech
a year ago
Fendy
2 points
982.
MAIR: A Benchmark for Evaluating Instructed Retrieval
github.com/sunnweiwei
a year ago
fzliu
2 points
983.
Show HN: Comprehensive Benchmark Suite for Story Visualization
github.com/ViStoryBench
a year ago
hzwer
2 points
984.
Show HN: Benchmarks agree with the complexity analysis of the TopoSort algorithm
github.com/williamw520
a year ago
ww520
2 points
985.
Show HN: I built an open-source benchmark that evaluates LLMs through gameplay
llmshowdown.io
a year ago
jmogi
2 points
986.
QuickBench: A Zero-Dependency Linux Benchmark for CPU, Memory, and Storage
github.com/bearstech
a year ago
kadrek
2 points
987.
Elimination Game Benchmark: Social Reasoning, Strategy, and Deception in LLMs
github.com/lechmazur
a year ago
amichail
2 points
988.
Latest Benchmarks Show 10x Faster Prefix Queries vs. Etcd
2 years ago
absolute7
2 points
989.
C++ Showing std::swap faster than XOR trick to swap numbers via naive benchmark
github.com/vladov3000
2 years ago
signa11
2 points
990.
Benchmarks Comparing PyTorch and MLX on Apple Silicon GPUs
github.com/LucasSte
2 years ago
tosh
2 points