Search: github.com/eval | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

391.

Fast, portable, non-Turing complete expression evaluation with gradual typing

github.com/google

a month ago

2 points

392.

Show HN: Nexa-Gauge – LLM eval framework, now with self-hosted model support

github.com/harnexa

a month ago

2 points

393.

How many of us are evaling our skills?

github.com/BintzGavin

2 months ago

2 points

394.

Show HN: Verdict – model evals on your own data, not someone else's benchmark

github.com/aevyraai

2 months ago

2 points

395.

Show HN: SkillCompass – open-source quality evaluator for your AI skills

github.com/Evol-ai

2 months ago

2 points

396.

Stockfish removes classical evaluation functions in favor of NNUE only (2023)

github.com/official-stockfish

2 months ago

2 points

397.

Show HN: We Evaluate Medical Research Agent Skills

github.com/aipoch

2 months ago

2 points

398.

Tax Logic Evaluation with Prolog

github.com/mthom

3 months ago

2 points

399.

Show HN: Aludel – LLM eval workbench for Phoenix apps

github.com/ccarvalho-eng

3 months ago

2 points

400.

Show HN: A tool to create and evaluate document processing pipelines for RAG

3 months ago

2 points

401.

I built a local-only eval runner for AI agents (quickbench)

github.com/iamGodofall

3 months ago

2 points

402.

LLM evals test outputs. Rarely whether the model understood first

github.com/NoxionAI

3 months ago

2 points

403.

Dynamic E2E Agentic Simulation and Evaluation with Cypress

github.com/gojiplus

3 months ago

2 points

404.

TLAi+ Benchmarks for Evaluating LLMs

github.com/tlaplus

3 months ago

2 points

405.

Edge – Generate structured evaluation criteria for any domain using a local LLM

github.com/EviAmarates

4 months ago

2 points

406.

Engine-Bench: Evaluating Coding Agents on Writing Game Engine Code

github.com/JoshuaPurtell

5 months ago

2 points

407.

Show HN: Simboba – Evals in under 5 mins

github.com/ntkris

6 months ago

2 points

408.

Show HN: Dokimos – LLM Evaluation Framework for Java

github.com/dokimos-dev

6 months ago

2 points

409.

Chess LLM Benchmark: Evaluating LLMs' ability to play chess

github.com/lightnesscaster

7 months ago

2 points

410.

Show HN: AI PM Evaluation Framework (Open Source)

aipmframework.com

8 months ago

2 points

411.

Codegen Scorer – evaluate the quality of code generated by LLMs

github.com/angular

9 months ago

2 points

412.

Physical_Atari: Platform for evaluating RL algorithms on a physical Atari

github.com/Keen-Technologies

9 months ago

2 points

413.

OpenBench: Provider-agnostic, open-source evaluation infrastructure for LLMs

github.com/groq

10 months ago

2 points

414.

Show HN: KARMA – An evaluation framework for Medical AI systems

10 months ago

2 points

415.

LLM Speedrunner: Eval for frontier models to reproduce scientific findings

github.com/facebookresearch

a year ago

2 points

416.

MAIR: A Benchmark for Evaluating Instructed Retrieval

github.com/sunnweiwei

a year ago

2 points

417.

Doyensec – Security Policy Evaluation Framework

github.com/gravitational

a year ago

2 points

418.

Evaluate Any Model from the HuggingFace Hub on ImageNet on Free Colab GPUs

github.com/SauravMaheshkar

a year ago

sauravmaheshkar

2 points

419.

Lambda calculus - compiler, type inference, and evaluator in less than 100 LOC

gist.github.com

a year ago

2 points

420.

Show HN: I built an open-source benchmark that evaluates LLMs through gameplay

a year ago

2 points