HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
421.
Lambda calculus - compiler, type inference, and evaluator in less than 100 LOC
gist.github.com
a year ago
tearflake
2 points
422.
Show HN: I built an open-source benchmark that evaluates LLMs through gameplay
llmshowdown.io
a year ago
jmogi
2 points
423.
Show HN: GenderBench – Evaluation suite for gender biases in LLMs
genderbench.readthedocs.io
a year ago
matus-pikuliak
2 points
424.
SIMD library for evaluating elementary functions, vectorized libm and DFT
github.com/shibatch
2 years ago
ashvardanian
2 points
425.
Show HN: Mandoline – Custom LLM Evaluations for Real-World Use Cases
mandoline.ai
2 years ago
kmckiern
2 points
426.
UpTrain is an open-source unified platform to evaluate and improve Gen AI apps
github.com/uptrain-ai
2 years ago
mafro
2 points
427.
Optimal Evaluation in 1 Minute (or 10 Minutes) (or 10 Years)
gist.github.com
2 years ago
LightMachine
2 points
428.
Evaluating LLMs locally, on a laptop, with Llama 3 and Ollama
github.com/rasbt
2 years ago
rasbt
2 points
429.
Show HN: Paramount – OSS package for *Human* Evals of AI support
github.com/ask-fini
2 years ago
hakimk
2 points
430.
SDMetrics: Library for evaluating synthetic data quality
github.com/sdv-dev
2 years ago
skadamat
2 points
431.
Promptfoo – Testing and Evaluation for LLMs
github.com/promptfoo
3 years ago
tin7in
2 points
432.
Google DeepMind's research on uncertain ground truth in AI eval
github.com/google-deepmind
3 years ago
minraws
2 points
433.
Show HN: Reference-free evaluation of LLM-powered chatbots
github.com/parea-ai
3 years ago
Joschkabraun
2 points
434.
Ragas – Framework for RAG Evaluation
github.com/explodinggradients
3 years ago
izik
2 points
435.
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
github.com/ruixiangcui
3 years ago
accrual
2 points
436.
RAGElo: Toolkit for evaluating RAG agents using tournament-style Elo ranking
github.com/zetaalphavector
3 years ago
barefeg
2 points
437.
Starwhale: A new MLOps platform for Model Evaluation
github.com/star-whale
3 years ago
liutianweidlut
2 points
438.
ChainForge now supports chat evaluation
github.com/ianarawjo
3 years ago
fatso784
2 points
439.
Show HN: CLI for testing and evaluating LLM prompts and outputs
github.com/promptfoo
3 years ago
typpo
2 points
440.
OSS for training, serving, and evaluating LLM based ChatBots
github.com/lm-sys
3 years ago
yujian
2 points
441.
Show HN: XV - Expression Evaluator for C
github.com/tidwall
3 years ago
tidwall
2 points
442.
Croner: Trigger functions or evaluate cron expressions in JavaScript or TS
github.com/Hexagon
3 years ago
kiyanwang
2 points
443.
Haskell library for evaluating whether chess moves are allowed
github.com/ArnoVanLumig
3 years ago
tosh
2 points
444.
Show HN: Brace Lang – parse brace groups and evaluate them however you want
github.com/xaedes
4 years ago
xaedes
2 points
445.
Show HN: Convert VHDL to Verilog using GHDL (+ first evaluation)
github.com/stnolting
4 years ago
youre_the_voice
2 points
446.
SIMD Library for Evaluating Elementary Functions, Vectorized Libm and DFT
github.com/shibatch
4 years ago
brrrrrm
2 points
447.
PicoMath: Fast math evaluation library (C++ header-only)
github.com/Nitrillo
4 years ago
nitrillo
2 points
448.
Parse and evaluate MS Excel formula in JavaScript
github.com/LesterLyu
4 years ago
eatonphil
2 points
449.
Show HN: ANECompat, evaluate CoreML model compatibility with Apple Neural Engine
github.com/fredyshox
4 years ago
fredyshox
2 points
450.
Paper Walkthrough: Is Automated Topic Model Evaluation Broken
github.com/acatovic
4 years ago
armcat
2 points