HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
541.
▲
RawBench: A minimal prompt evaluation framework
github.com/0xsomesh
discuss
a year ago
handfuloflight
1 points
542.
▲
Assayer: Python-RQ watchdog for ML model checkpoint monitoring and evaluation
github.com/amoudgl
discuss
a year ago
amoudgl
1 points
543.
▲
Show HN: Digit-Class Prime Product Framework (Prime Factorization Evals for LMs)
github.com/arthurcolle
discuss
a year ago
arthurcolle
1 points
544.
▲
E2E LLM evals, with less focus on metrics and more focus on binary assertions
github.com/openchatai
discuss
a year ago
gharbat
1 points
545.
▲
Ask HN: What RAG evaluations do you care about?
discuss
a year ago
ArnavAgrawal03
1 points
546.
▲
NoLiMa: Long-Context Evaluation Beyond Literal Matching
github.com/adobe-research
discuss
a year ago
llm_nerd
1 points
547.
▲
Evaluating and Training Multi-Modal Large Language Models for Action Recognition
github.com/AdaptiveMotorControlLab
discuss
a year ago
moatmoat
1 points
548.
▲
An Implementation of Eval() for Rust
github.com/evcxr
discuss
a year ago
jcbhmr
1 points
549.
▲
I built a Python pipeline to evaluate the Exosome Complex in AlphaFold & CombFold
github.com/christopheragnus
discuss
2 years ago
christopher8827
1 points
550.
▲
Litmus: LLM Testing and Evaluation Tool for AI App Development on Google Cloud
github.com/google
discuss
2 years ago
joburgalex
1 points
551.
▲
Llama Stack by Meta – Inference, Safety, Memory, Agentic System, Evaluation
github.com/meta-llama
discuss
2 years ago
vikrantrathore
1 points
552.
▲
Unibench: Vision-Language Model Evaluation
github.com/facebookresearch
discuss
2 years ago
zerojames
1 points
553.
▲
LLM Evaluation Methods
github.com/alopatenko
discuss
2 years ago
pltig
1 points
554.
▲
Show HN: Serializable infix expressions and a Python evaluator
github.com/shrir
discuss
2 years ago
sb13
1 points
555.
▲
FreeEval: A Framework for Trustworthy and Efficient Evaluation of LLMs
github.com/WisdomShell
discuss
2 years ago
PaulHoule
1 points
556.
▲
Llama.cpp: Improve CPU prompt eval speed
github.com/ggerganov
discuss
2 years ago
tosh
1 points
557.
▲
Evaluate LLMs in Real Time with Street Fighter III
github.com/OpenGenerativeAI
discuss
2 years ago
magoghm
1 points
558.
▲
Evaluating Claude 3 for Converting Screenshots to Code
github.com/abi
discuss
2 years ago
abi
1 points
559.
▲
Show HN: Hiring when you don't know exactly how to evaluate candidates
github.com/joelparkerhenderson
discuss
2 years ago
jph
1 points
560.
▲
Multi-bitrate JPEG compression perceptual evaluation dataset 2023
github.com/google-research
discuss
2 years ago
ksec
1 points
561.
▲
Show HN: Lone Arena – Self-hosted LLM human evaluation, you be the judge
github.com/Contextualist
discuss
2 years ago
Contextualist
1 points
562.
▲
IFEval: Evaluator for LLMs
github.com/Rohan2002
discuss
2 years ago
simonpure
1 points
563.
▲
Genealogos takes outputs from Nix evaluation tools and produces SBoM files
github.com/tweag
discuss
2 years ago
ghuntley
1 points
564.
▲
Show HN: Open-source evaluations for web agents
github.com/reworkd
discuss
3 years ago
asim-shrestha
1 points
565.
▲
PhaseLLM Eval: run batch LLM jobs and evals via visual front-end (MIT licensed)
github.com/wgryc
discuss
3 years ago
cl42
1 points
566.
▲
Thudm/AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents
github.com/THUDM
discuss
3 years ago
freediver
1 points
567.
▲
AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents
github.com/THUDM
discuss
3 years ago
swyx
1 points
568.
▲
Evaluate Multiple LLMs Easily
github.com/ray-project
discuss
3 years ago
fzliu
1 points
569.
▲
Show HN: ChainForge, a visual tool for evaluating LLM responses
github.com/ianarawjo
discuss
3 years ago
fatso784
1 points
570.
▲
Lazy evaluation and infinite streams in C++
github.com/apresta
discuss
14 years ago
jimmy2times
1 points