HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
1.
▲
Ask HN: Are you willing to contribute to OpenAI Evals?
5 comments
3 years ago
nullptr_deref
8 points
2.
▲
Show HN: MCP Bridge – Access Local MCP Servers Remotely
github.com/EvalsOne
1 comment
a year ago
everfly
3 points
3.
▲
Show HN: Iris – first MCP-native eval and observability tool for AI agents
github.com/iris-eval
discuss
3 months ago
iparent
1 point
4.
▲
Show HN: Evalcraft – cassette-based testing for AI agents (pytest, $0/run)
github.com/beyhangl
discuss
4 months ago
beyhang
1 point
5.
▲
Show HN: Visualize OpenAI Evals of GPT-4
github.com/zeno-ml
discuss
3 years ago
confutio
1 point
6.
▲
Evals: a framework for evaluating OpenAI models and a registry of benchmarks
github.com/openai
16 comments
3 years ago
tosh
123 points
7.
▲
Evals in 2025: going beyond simple benchmarks to build models people can use
github.com/huggingface
8 comments
9 months ago
jxmorris12
80 points
8.
▲
Try out Clojure libraries via rebel-readline
github.com/eval
8 comments
3 years ago
todsacerdoti
70 points
9.
▲
Show HN: Fast-agent – Compose MCP enabled Agents and Workflows in minutes
github.com/evalstate
3 comments
a year ago
evalstate
29 points
10.
▲
eval_macro: A New Way to Write Rust Macros
github.com/wdanilo
discuss
a year ago
W4G1
9 points
11.
▲
Show HN: Python lib to run evals across providers: OpenAI, Anthropic, etc.
github.com/crizCraig
1 comment
2 years ago
cr4zy
8 points
12.
▲
Neo Emacs – A GPU-powered Emacs written in Rust with a modern display engine
github.com/eval-exec
1 comment
4 months ago
agarttha
7 points
13.
▲
Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu
github.com/eval-exec
discuss
4 months ago
evalexec
7 points
14.
▲
Show HN: Open-source dashboard for your domain experts to improve your AI Agents
github.com/getevalkit
discuss
a year ago
mellowcookie
5 points
15.
▲
Exfiltrate Data with NTP
github.com/evallen
discuss
4 years ago
amony
5 points
16.
▲
GPT-4 doesn't pay close attention to detail in some cases
github.com/openai
2 comments
3 years ago
mcaledonensis
3 points
17.
▲
Source code for evaluating decoder-based models: GANs, GMMNs, and VAEs
github.com/tonywu95
1 comment
10 years ago
Dim25
3 points
18.
▲
Show HN: CATArena – Evaluating LLM agents via dynamic environment interactions
github.com/AGI-Eval-Official
discuss
6 months ago
jinqueeny
3 points
19.
▲
Example PR to OpenAI evals to get GPT-4 early API access
github.com/openai
discuss
3 years ago
harrisonjackson
3 points
20.
▲
EvalML: An AutoML library written in Python
github.com/alteryx
discuss
5 years ago
merqurio
3 points
21.
▲
I hope to help you evaluate your GenAI App
github.com/shihongDev
2 comments
5 months ago
shloveai
2 points
22.
▲
Show HN: EvalView – Catch agent regressions before you ship (pytest for agents)
github.com/hidai25
1 comment
5 months ago
hidai25
2 points
23.
▲
Stop benchmarking LLMs. Make them fight
github.com/AGI-Eval-Official
discuss
6 months ago
jinqueeny
2 points
24.
▲
Eval Protocol: RL for agents in any language, container, or framework
github.com/eval-protocol
discuss
7 months ago
dphuang2
2 points
25.
▲
Automatic Evals for LLMs
github.com/mlfoundations
discuss
a year ago
saikatsg
2 points
26.
▲
LLM Evaluation Guidebook
github.com/huggingface
discuss
2 years ago
erinys
2 points
27.
▲
HuggingFace/evaluate: A library for easily evaluating ML models and datasets
github.com/huggingface
discuss
4 years ago
occamschainsaw
2 points
28.
▲
Why Neutralinojs Is Better? Comparing with Electron and Node Webkit
github.com/neutralinojs
discuss
8 years ago
delvincasper
2 points
29.
▲
Tell HN: OpenAI charges you even if you're helping them on their Eval platform
github.com/openai
2 comments
3 years ago
behnamoh
1 point
30.
▲
OpenAI crowd sources LLM benchmarking datasets by offering advanced GPT-4 access
github.com/openai
2 comments
3 years ago
teaearlgraycold
1 point