HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
61.
▲
Clojure protected REPL
github.com/stacksideflow
discuss
7 years ago
stacksideflow2
3 points
62.
▲
Evaluate Selections in Sublime Text
github.com/jbrooksuk
2 comments
13 years ago
jbrooksuk
2 points
63.
▲
Estonia publishes its e-voting source code
github.com/vvk-ehk
1 comment
13 years ago
duggieawesome
2 points
64.
▲
Evaluation of Various MLX Quantizations
github.com/deepsweet
1 comment
a month ago
d-_-b
2 points
65.
▲
Should we chaos test our agents?
github.com/Corbell-AI
1 comment
a month ago
himmi-01
2 points
66.
▲
Open-source LLM-as-judge eval suite with root cause analysis and failure mining
github.com/colingfly
1 comment
3 months ago
colinfly
2 points
67.
▲
Evaluating LLMs with CommonGen-Lite
github.com/allenai
1 comment
2 years ago
georgehill
2 points
68.
▲
Evals Skills for AI Agents
github.com/latitude-dev
discuss
2 months ago
paulaq
2 points
69.
▲
Show HN: Claude Code skills for building LLM evals
github.com/latitude-dev
discuss
2 months ago
paulaq
2 points
70.
▲
Show HN: LLM‑Simple‑Eval – Easily Benchmark LLMs for Your Use Case
github.com/grigio
discuss
10 months ago
grigio
2 points
71.
▲
Evaluating Large Language Models Using LLM-as-a-Judge
github.com/aws-samples
discuss
2 years ago
mooreds
2 points
72.
▲
GPT-4-turbo-2024-04-09 "wins" simple evals benchmark
github.com/openai
discuss
2 years ago
zurfer
2 points
73.
▲
A survey on evaluation of large language models
github.com/MLGroupJLU
discuss
3 years ago
hhs
2 points
74.
▲
OpenFF – Automated estimation of physical properties
github.com/openforcefield
discuss
5 years ago
alex_hirner
2 points
75.
▲
Show HN: IR_evaluation – Information retrieval evaluation metrics in pure Python
github.com/plurch
2 comments
a year ago
plurch
1 point
76.
▲
Pulze AI Evals
github.com/pulzeai-oss
1 comment
a year ago
fbnbr
1 point
77.
▲
BSD_Evals: Open-source LLM evaluation tool
github.com/brettdidonato
1 comment
2 years ago
bsdpython
1 point
78.
▲
AgentSafeLabs – Launched Open-source Security framework for AI agents
github.com/AgentSafeLabs
discuss
a month ago
waqarjaved
1 point
79.
▲
Show HN: EleutherAI / Lm-Evaluation-Harness
github.com/EleutherAI
discuss
a month ago
marvinified
1 point
80.
▲
Webgrid Eval: LLM vision + tool-use on Neuralink's cursor control task
github.com/ofou
discuss
4 months ago
ofou
1 point
81.
▲
Network Evaluation Service
github.com/hendemic
discuss
a year ago
gregsadetsky
1 point
82.
▲
OpenAI: Simple-Evals
github.com/openai
discuss
2 years ago
tosh
1 point
83.
▲
ReactEval: Evaluating LLMs on front-end code generation
github.com/gitwitorg
discuss
2 years ago
jamesmurdza
1 point
84.
▲
Language Model Evaluation Harness
github.com/EleutherAI
discuss
3 years ago
tosh
1 point
85.
▲
Nextdoor's Cloud Security Posture Management (CSPM) Evaluation Matrix
github.com/Nextdoor
discuss
3 years ago
scapecast
1 point
86.
▲
Show HN: EvalGPT – Code interpreter and agent framework inspired by Google Borg
github.com/index-labs
discuss
3 years ago
jiayuanzhang
1 point
87.
▲
Trait-Eval – Rust
github.com/doctorn
discuss
6 years ago
blopeur
1 point
88.
▲
Show HN: Little tool to evaluate your cryptocurrency trades on Poloniex
github.com/enricobacis
discuss
9 years ago
enricobacis
1 point
89.
▲
Show HN: Freeact – A Lightweight Library for Code-Action Based Agents
github.com/gradion-ai
5 comments
a year ago
cstub
122 points
90.
▲
Show HN: Ellipsis – Automated PR reviews and bug fixes
ellipsis.dev
64 comments
2 years ago
hunterbrooks
121 points