HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
Request
31.
Show HN: EvalView pytest style tests for AI agents (budgets, hallucinations)
github.com/hidai25
1 comment
6 months ago
hidai25
1 points
32.
Evaluatly is now open source and free
github.com/evaluatly
1 comment
6 years ago
gpnt
1 points
33.
Show HN: EvalLens – Open-source tool to evaluate structured LLM outputs
github.com/simonrendona
discuss
3 months ago
simonrendon
1 points
34.
Evalien – Node.js event loop agent harness
github.com/agentbellnorm
discuss
4 months ago
agentbellnorm
1 points
35.
Show HN: Evals skill for agents – no tooling, just Markdown and subagents
github.com/adriancooney
discuss
5 months ago
adriancooney
1 points
36.
Evalite: Evaluate your LLM-powered apps with TypeScript
github.com/mattpocock
discuss
6 months ago
handfuloflight
1 points
37.
Triilman25/evaluation-machine-for-classification-models
github.com/triilman25
discuss
a year ago
triilman
1 points
38.
Eval Villain update released: find those dangerous JavaScript functions
github.com/swoops
discuss
2 years ago
tony-ds
1 points
39.
GitHub Action for Cluster API
github.com/evalsocket
discuss
6 years ago
evalsocket
1 points
40.
Eval – a bot that executes arbitrary JavaScript and posts the result on Pleroma
github.com/CosineP
discuss
6 years ago
aeroplain
1 points
41.
Evaldb: Use your favorite language as a database
github.com/turbio
discuss
7 years ago
amasad
1 points
42.
Show HN: Evalfilter: A simple Golang evaluation engine for filtering via scripts
github.com/skx
discuss
7 years ago
stevekemp
1 points
43.
Evaluate JavaScript code blocks from within markdown
github.com/reggi
discuss
11 years ago
thomasreggi
1 points
44.
Show HN: Open-source alternative to ChatGPT Agents for browsing
github.com/trymeka
23 comments
a year ago
ElasticBottle
104 points
45.
Show HN: ColiVara – State of the Art RAG API with Vision Models
github.com/tjmlabs
discuss
2 years ago
jonathan-adly
10 points
46.
Show HN: Pixeebot – a GitHub App that fixes your Sonar findings (Java/Python)
pixee.ai
discuss
2 years ago
nahsra
10 points
47.
Show HN: Neuron – Cognitive Multi-Agent Architecture for Reasoning
discuss
10 months ago
machinemusic
8 points
48.
Show HN: Auto LLM Ranker – Describe a task in English and get ranked models
github.com/gauravvij
discuss
3 months ago
gauravvij137
3 points
49.
Q Evaluation Harness: open-source evals for LLMs on q/kdb+
github.com/KxSystems
discuss
10 months ago
erfan_mhi
2 points
50.
Show HN: FizzBuzz purely in Rust's trait system
github.com/doctorn
30 comments
6 years ago
doctor_n_
120 points
51.
Show HN: Duktape-eval – an eval library built on Duktape and WebAssembly
github.com/maple3142
6 comments
6 years ago
maple3142
41 points
52.
Show HN: Pytest-evals – Simple LLM apps evaluation using pytest
github.com/AlmogBaku
3 comments
a year ago
almogbaku
13 points
53.
Show HN: Agent-evals – Claude skill to build your own evals
github.com/fsilavong
1 comment
2 months ago
sauercrowd
9 points
54.
EvalAI: An open-source alternative of Kaggle
github.com/Cloud-CV
discuss
9 years ago
deshraj
6 points
55.
Estonia's voting system: a python program on GitHub
github.com/vvk-ehk
1 comment
10 years ago
leephillips
5 points
56.
Gbrain-Evals
github.com/garrytan
1 comment
2 months ago
mjtk
4 points
57.
I tested Haiku vs. Sonnet across 3 agent tasks – the cheap model won every time
github.com/aimvik07
discuss
a month ago
aimvik07
3 points
58.
GPT-4o Benchmark Results
github.com/openai
discuss
2 years ago
joak
3 points
59.
OpenAI/Simple-Evals
github.com/openai
discuss
2 years ago
davidbarker
3 points
60.
Show HN: Retrieval Evaluations Framework
github.com/DeployQL
discuss
2 years ago
mtbarta
3 points