Search: github.com/eval | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

421.

Lambda calculus - compiler, type inference, and evaluator in less than 100 LOC

gist.github.com

a year ago

2 points

422.

Show HN: I built an open-source benchmark that evaluates LLMs through gameplay

a year ago

2 points

423.

Show HN: GenderBench – Evaluation suite for gender biases in LLMs

genderbench.readthedocs.io

a year ago

2 points

424.

SIMD library for evaluating elementary functions, vectorized libm and DFT

github.com/shibatch

2 years ago

2 points

425.

Show HN: Mandoline – Custom LLM Evaluations for Real-World Use Cases

2 years ago

2 points

426.

UpTrain is an open-source unified platform to evaluate and improve Gen AI apps

github.com/uptrain-ai

2 years ago

2 points

427.

Optimal Evaluation in 1 Minute (or 10 Minutes) (or 10 Years)

gist.github.com

2 years ago

2 points

428.

Evaluating LLMs locally, on a laptop, with Llama 3 and Ollama

github.com/rasbt

2 years ago

2 points

429.

Show HN: Paramount – OSS package for *Human* Evals of AI support

github.com/ask-fini

2 years ago

2 points

430.

SDMetrics: Library for evaluating synthetic data quality

github.com/sdv-dev

2 years ago

2 points

431.

Promptfoo – Testing and Evaluation for LLMs

github.com/promptfoo

3 years ago

2 points

432.

Google DeepMind's research on uncertain ground truth in AI eval

github.com/google-deepmind

3 years ago

2 points

433.

Show HN: Reference-free evaluation of LLM-powered chatbots

github.com/parea-ai

3 years ago

2 points

434.

Ragas – Framework for RAG Evaluation

github.com/explodinggradients

3 years ago

2 points

435.

AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models

github.com/ruixiangcui

3 years ago

2 points

436.

RAGElo: Toolkit for evaluating RAG agents using tournament-style Elo ranking

github.com/zetaalphavector

3 years ago

2 points

437.

Starwhale: A new MLOps platform for Model Evaluation

github.com/star-whale

3 years ago

2 points

438.

ChainForge now supports chat evaluation

github.com/ianarawjo

3 years ago

2 points

439.

Show HN: CLI for testing and evaluating LLM prompts and outputs

github.com/promptfoo

3 years ago

2 points

440.

OSS for training, serving, and evaluating LLM based ChatBots

github.com/lm-sys

3 years ago

2 points

441.

Show HN: XV - Expression Evaluator for C

github.com/tidwall

3 years ago

2 points

442.

Croner: Trigger functions or evaluate cron expressions in JavaScript or TS

github.com/Hexagon

3 years ago

2 points

443.

Haskell library for evaluating whether chess moves are allowed

github.com/ArnoVanLumig

3 years ago

2 points

444.

Show HN: Brace Lang – parse brace groups and evaluate them however you want

github.com/xaedes

4 years ago

2 points

445.

Show HN: Convert VHDL to Verilog using GHDL (+ first evaluation)

github.com/stnolting

4 years ago

youre_the_voice

2 points

446.

SIMD Library for Evaluating Elementary Functions, Vectorized Libm and DFT

github.com/shibatch

4 years ago

2 points

447.

PicoMath: Fast math evaluation library (C++ header-only)

github.com/Nitrillo

4 years ago

2 points

448.

Parse and evaluate MS Excel formula in JavaScript

github.com/LesterLyu

4 years ago

2 points

449.

Show HN: ANECompat, evaluate CoreML model compatibility with Apple Neural Engine

github.com/fredyshox

4 years ago

2 points

450.

Paper Walkthrough: Is Automated Topic Model Evaluation Broken?

github.com/acatovic

4 years ago

2 points