Search: github.com/eval | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

511.

GEDD – Grounded Eval-Driven Development for AI Agents

github.com/aws-samples

13 days ago

1 points

512.

Show HN: VQAScore – open eval metric/reward model, now for text-to-video

github.com/linzhiqiu

15 days ago

1 points

513.

LLM INQUISITOR: Evaluating how AI models handle long, realistic tasks

github.com/AssimilatedHuman

a month ago

1 points

514.

Show HN: TweakIdea – 14-dimension startup idea evaluation in Claude Code

github.com/eph5xx

2 months ago

1 points

515.

Show HN: Evaluate Python functions at their singularities

github.com/FWDhr

2 months ago

calculusmachine

1 points

516.

Show HN: 2500 vision benchmarks / evals for Vision Language Models

github.com/Overshoot-ai

2 months ago

zakariaelhjouji

1 points

517.

Show HN: An agent skill for eval-driven development of LLM-powered app

github.com/yiouli

3 months ago

1 points

518.

ReqIf OPA SARIF – CI/CD semantically evaluated policy gates

github.com/PromptExecution

3 months ago

elasticventures

1 points

519.

Show HN: Vibe Coding Review Checklist – Evaluate AI-Generated Code Quality

github.com/aiqualitylab

4 months ago

1 points

520.

Show HN: Orangensaft – A mini Python-like language with LLM eval in lang runtime

github.com/jargnar

4 months ago

1 points

521.

Show HN: Praetorian Guard – Free AI tool to self-evaluate your CV (educational)

github.com/simonesan-afk

4 months ago

1 points

522.

MiRAGE: Open-source framework for multimodal RAG evaluation

4 months ago

1 points

523.

The Vocabulary Priming Confound in LLM Evaluation [pdf]

github.com/Palmerschallon

4 months ago

1 points

524.

Open source agents to evaluate, debug, and optimize your prompts

github.com/comet-ml

5 months ago

1 points

525.

Simboba: Evals for your AI product in under 5 mins

github.com/ntkris

6 months ago

1 points

526.

Live-trade-bench: Live evaluation of trading agents

github.com/ulab-uiuc

6 months ago

1 points

527.

Show HN: Dokimos – LLM evaluation framework for Java

github.com/dokimos-dev

6 months ago

1 points

528.

Benchmark that evaluates LLMs using 759 NYT Connections puzzles

github.com/lechmazur

6 months ago

1 points

529.

Show HN: smallevals – Local LLM Evaluation Framework with Tiny 0.6B Models

github.com/mburaksayici

7 months ago

1 points

530.

Open source LLM prompt eval and optimization CLI

github.com/davismartens

7 months ago

1 points

531.

Show HN: StructEval - a structured output evaluation and comparison tool

github.com/jhiker

7 months ago

1 points

532.

Rogue – The AI Agent Evaluator

github.com/qualifire-dev

8 months ago

1 points

533.

Show HN: Local RAG Eval Harness – reproducible benchmarks for retrieval pipelines

8 months ago

myroslavmokhamm

1 points

534.

TinyExpr: Parser, compiler, and evaluation engine for math expressions

github.com/codeplea

8 months ago

1 points

535.

Benchmark code for evaluating different ASR packages and APIs

github.com/huggingface

9 months ago

1 points

536.

Show HN: PromptDev – Prompt eval and testing for AI agents across providers

github.com/artefactop

10 months ago

1 points

537.

numexpr: fast numerical array expression evaluator for Python

github.com/pydata

10 months ago

1 points

538.

Quality and Safety Evaluations for AI Agents on Azure

github.com/aymenfurter

10 months ago

1 points

539.

Show HN: Hypersigil – Prompt management UI – test, evaluate, deploy

github.com/hypersigilhq

a year ago

1 points

540.

Safe-MCP: Security Analysis Framework for Evaluation of Model Context Protocol

github.com/fkautz

a year ago

1 points