Search: github.com/eval | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

1.

Ask HN: Are you willing to contribute to OpenAI Evals?

3 years ago

8 points

2.

Show HN: MCP Bridge – Access Local MCP Servers Remotely

github.com/EvalsOne

a year ago

3 points

3.

Show HN: Iris – first MCP-native eval and observability tool for AI agents

github.com/iris-eval

3 months ago

1 point

4.

Show HN: Evalcraft – cassette-based testing for AI agents (pytest, $0/run)

github.com/beyhangl

4 months ago

1 point

5.

Show HN: Visualize OpenAI Evals of GPT-4

github.com/zeno-ml

3 years ago

1 point

6.

Evals: a framework for evaluating OpenAI models and a registry of benchmarks

github.com/openai

3 years ago

123 points

7.

Evals in 2025: going beyond simple benchmarks to build models people can use

github.com/huggingface

9 months ago

80 points

8.

Try out Clojure libraries via rebel-readline

github.com/eval

3 years ago

70 points

9.

Show HN: Fast-agent – Compose MCP enabled Agents and Workflows in minutes

github.com/evalstate

a year ago

29 points

10.

eval_macro: A New Way to Write Rust Macros

github.com/wdanilo

a year ago

9 points

11.

Show HN: Python lib to run evals across providers: OpenAI, Anthropic, etc.

github.com/crizCraig

2 years ago

8 points

12.

Neo Emacs – A GPU-powered Emacs written in Rust with a modern display engine

github.com/eval-exec

4 months ago

7 points

13.

Neomacs: GPU-accelerated Emacs with inline video, WebKit, and terminal via wgpu

github.com/eval-exec

4 months ago

7 points

14.

Show HN: Open-source dashboard for your domain experts to improve your AI Agents

github.com/getevalkit

a year ago

5 points

15.

Exfiltrate Data with NTP

github.com/evallen

4 years ago

5 points

16.

GPT-4 doesn't pay close attention to detail in some cases

github.com/openai

3 years ago

3 points

17.

Source code for evaluating decoder-based models: GANs, GMMNs, and VAEs

github.com/tonywu95

10 years ago

3 points

18.

Show HN: CATArena – Evaluating LLM agents via dynamic environment interactions

github.com/AGI-Eval-Official

6 months ago

3 points

19.

Example PR to OpenAI evals to get GPT-4 early API access

github.com/openai

3 years ago

harrisonjackson

3 points

20.

EvalML: An AutoML library written in Python

github.com/alteryx

5 years ago

3 points

21.

I hope to help you evaluate your GenAI App

github.com/shihongDev

5 months ago

2 points

22.

Show HN: EvalView – Catch agent regressions before you ship (pytest for agents)

github.com/hidai25

5 months ago

2 points

23.

Stop benchmarking LLMs. Make them fight

github.com/AGI-Eval-Official

6 months ago

2 points

24.

Eval Protocol: RL for agents in any language, container, or framework

github.com/eval-protocol

7 months ago

2 points

25.

Automatic Evals for LLMs

github.com/mlfoundations

a year ago

2 points

26.

LLM Evaluation Guidebook

github.com/huggingface

2 years ago

2 points

27.

HuggingFace/evaluate: A library for easily evaluating ML models and datasets

github.com/huggingface

4 years ago

2 points

28.

Why Neutralinojs Is Better? Comparing with Electron and Node Webkit

github.com/neutralinojs

8 years ago

2 points

29.

Tell HN: OpenAI charges you even if you're helping them on their Eval platform

github.com/openai

3 years ago

1 point

30.

OpenAI crowd sources LLM benchmarking datasets by offering advanced GPT-4 access

github.com/openai

3 years ago

teaearlgraycold

1 point