Search: github.com/eval | Heykuki News


Heykuki News

Top New Best Ask Show Jobs


541.

RawBench: A minimal prompt evaluation framework

github.com/0xsomesh

a year ago

1 points

542.

Assayer: Python-RQ watchdog for ML model checkpoint monitoring and evaluation

github.com/amoudgl

a year ago

1 points

543.

Show HN: Digit-Class Prime Product Framework (Prime Factorization Evals for LMs)

github.com/arthurcolle

a year ago

1 points

544.

E2E LLM evals, with less focus on metrics and more focus on binary assertions

github.com/openchatai

a year ago

1 points

545.

Ask HN: What RAG evaluations do you care about?

a year ago

1 points

546.

NoLiMa: Long-Context Evaluation Beyond Literal Matching

github.com/adobe-research

a year ago

1 points

547.

Evaluating and Training Multi-Modal Large Language Models for Action Recognition

github.com/AdaptiveMotorControlLab

a year ago

1 points

548.

An Implementation of Eval() for Rust

github.com/evcxr

a year ago

1 points

549.

I built a Python pipeline to evaluate the Exosome Complex in AlphaFold & CombFold

github.com/christopheragnus

2 years ago

christopher8827

1 points

550.

Litmus: LLM Testing and Evaluation Tool for AI App Development on Google Cloud

github.com/google

2 years ago

1 points

551.

Llama Stack by Meta – Inference, Safety, Memory, Agentic System, Evaluation

github.com/meta-llama

2 years ago

1 points

552.

Unibench: Vision-Language Model Evaluation

github.com/facebookresearch

2 years ago

1 points

553.

LLM Evaluation Methods

github.com/alopatenko

2 years ago

1 points

554.

Show HN: Serializable infix expressions and a Python evaluator

github.com/shrir

2 years ago

1 points

555.

FreeEval: A Framework for Trustworthy and Efficient Evaluation of LLMs

github.com/WisdomShell

2 years ago

1 points

556.

Llama.cpp: Improve CPU prompt eval speed

github.com/ggerganov

2 years ago

1 points

557.

Evaluate LLMs in Real Time with Street Fighter III

github.com/OpenGenerativeAI

2 years ago

1 points

558.

Evaluating Claude 3 for Converting Screenshots to Code

2 years ago

1 points

559.

Show HN: Hiring when you don't know exactly how to evaluate candidates

github.com/joelparkerhenderson

2 years ago

1 points

560.

Multi-bitrate JPEG compression perceptual evaluation dataset 2023

github.com/google-research

2 years ago

1 points

561.

Show HN: Lone Arena – Self-hosted LLM human evaluation, you be the judge

github.com/Contextualist

2 years ago

1 points

562.

IFEval: Evaluator for LLMs

github.com/Rohan2002

2 years ago

1 points

563.

Genealogos takes outputs from Nix evaluation tools and produces SBoM files

github.com/tweag

2 years ago

1 points

564.

Show HN: Open-source evaluations for web agents

github.com/reworkd

3 years ago

1 points

565.

PhaseLLM Eval: run batch LLM jobs and evals via visual front-end (MIT licensed)

github.com/wgryc

3 years ago

1 points

566.

Thudm/AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents

github.com/THUDM

3 years ago

1 points

567.

AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents

github.com/THUDM

3 years ago

1 points

568.

Evaluate Multiple LLMs Easily

github.com/ray-project

3 years ago

1 points

569.

Show HN: ChainForge, a visual tool for evaluating LLM responses

github.com/ianarawjo

3 years ago

1 points

570.

Lazy evaluation and infinite streams in C++

github.com/apresta

14 years ago

1 points