Search: github.com/eval | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

61.

Clojure protected REPL

github.com/stacksideflow

7 years ago

3 points

62.

Evaluate Selections in Sublime Text

github.com/jbrooksuk

13 years ago

2 points

63.

Estonia publishes its e-voting source code

github.com/vvk-ehk

13 years ago

2 points

64.

Evaluation of Various MLX Quantizations

github.com/deepsweet

a month ago

2 points

65.

Should we chaos test our agents?

github.com/Corbell-AI

a month ago

2 points

66.

Open-source LLM-as-judge eval suite with root cause analysis and failure mining

github.com/colingfly

3 months ago

2 points

67.

Evaluating LLMs with CommonGen-Lite

github.com/allenai

2 years ago

2 points

68.

Evals Skills for AI Agents

github.com/latitude-dev

2 months ago

2 points

69.

Show HN: Claude Code skills for building LLM evals

github.com/latitude-dev

2 months ago

2 points

70.

Show HN: LLM‑Simple‑Eval – Easily Benchmark LLMs for Your Use Case

github.com/grigio

10 months ago

2 points

71.

Evaluating Large Language Models Using LLM-as-a-Judge

github.com/aws-samples

2 years ago

2 points

72.

GPT-4-turbo-2024-04-09 "wins" simple evals benchmark

github.com/openai

2 years ago

2 points

73.

A survey on evaluation of large language models

github.com/MLGroupJLU

3 years ago

2 points

74.

OpenFF – Automated estimation of physical properties

github.com/openforcefield

5 years ago

2 points

75.

Show HN: IR_evaluation – Information retrieval evaluation metrics in pure Python

github.com/plurch

a year ago

1 point

76.

github.com/pulzeai-oss

a year ago

1 point

77.

BSD_Evals: Open-source LLM evaluation tool

github.com/brettdidonato

2 years ago

1 point

78.

AgentSafeLabs – Launched Open-source Security framework for AI agents

github.com/AgentSafeLabs

a month ago

1 point

79.

Show HN: EleutherAI / Lm-Evaluation-Harness

github.com/EleutherAI

a month ago

1 point

80.

Webgrid Eval: LLM vision + tool-use on Neuralink's cursor control task

github.com/ofou

4 months ago

1 point

81.

Network Evaluation Service

github.com/hendemic

a year ago

1 point

82.

OpenAI: Simple-Evals

github.com/openai

2 years ago

1 point

83.

ReactEval: Evaluating LLMs on front-end code generation

github.com/gitwitorg

2 years ago

1 point

84.

Language Model Evaluation Harness

github.com/EleutherAI

3 years ago

1 point

85.

Nextdoor's Cloud Security Posture Management (CSPM) Evaluation Matrix

github.com/Nextdoor

3 years ago

1 point

86.

Show HN: EvalGPT – Code interpreter and agent framework inspired by Google Borg

github.com/index-labs

3 years ago

1 point

87.

Trait-Eval – Rust

github.com/doctorn

6 years ago

1 point

88.

Show HN: Little tool to evaluate your cryptocurrency trades on Poloniex

github.com/enricobacis

9 years ago

1 point

89.

Show HN: Freeact – A Lightweight Library for Code-Action Based Agents

github.com/gradion-ai

a year ago

122 points

90.

Show HN: Ellipsis – Automated PR reviews and bug fixes

2 years ago

121 points