Search: github.com/jhud | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

91.

LoCoMo AI Benchmark: 6.4% of answer key wrong, judge accepts 63% of fake answers

github.com/dial481

3 months ago

3 points

92.

Show HN: Using AI to judge a drinking game – SplitTheG.dev

a year ago

3 points

93.

Apply for the Judicial Innovation Fellowship

github.com/JIFGeorgetown

3 years ago

3 points

94.

Gavel is a project expo judging system

github.com/anishathalye

10 years ago

3 points

95.

Show HN: NyaayWatch – Observability layer for the Indian judiciary

a month ago

3 points

96.

Show HN: Signals – finding the most informative agent traces without LLM judges

3 months ago

3 points

97.

Show HN: Cognition-wheel – parallel LLM fusion with bias masking and judging

github.com/Hormold

a year ago

3 points

98.

Justice: Yet Another Online Judge

7 years ago

3 points

99.

Show HN: Grading Notes for LLM-as-Judge

github.com/shabie

2 years ago

2 points

100.

Show HN: pg_roast – A Postgres extension that harshly judges your database

github.com/samirketema

2 months ago

2 points

101.

Open-source LLM-as-judge eval suite with root cause analysis and failure mining

github.com/colingfly

3 months ago

2 points

102.

Show HN: Yet Another Online Judge Implementation

7 years ago

2 points

103.

Ask HN: Criteria for judging JavaScript project?

11 years ago

2 points

104.

Hey Jude as a vbScript

github.com/mockmyberet

13 years ago

2 points

105.

Codejudge: A lightweight online judge

github.com/sankha93

13 years ago

2 points

106.

Show HN: GEDD – A Systematic Evidence Driven LLM as a Judge Framework

github.com/aws-samples

9 days ago

2 points

107.

Show HN: CoJudge – open-source, offline judge for studying LC-style problems

github.com/cojudge

8 months ago

2 points

108.

Evaluating Large Language Models Using LLM-as-a-Judge

github.com/aws-samples

2 years ago

2 points

109.

Scruples: Corpus of ethical judgments extracted from Reddit

github.com/allenai

6 years ago

2 points

110.

JHU CSSE Covid-19 Data Repo Removes Information on Palestine

github.com/CSSEGISandData

6 years ago

2 points

111.

Novel Coronavirus (Covid-19) Cases, Provided by JHU CSSE

github.com/CSSEGISandData

6 years ago

2 points

112.

Covid-19: Novel Coronavirus (Covid-19) Cases, Provided by JHU CSSE

github.com/CSSEGISandData

6 years ago

DyslexicAtheist

2 points

113.

Coderunner – A judge for your programs, run and test your programs through Python

github.com/codeclassroom

7 years ago

2 points

114.

Show HN: A command line interface to UVA online judge (competitive programming)

github.com/scvalencia

10 years ago

2 points

115.

Show HN: Meaning-Based Judgment Simulation for LLM Interfaces

a year ago

1 point

116.

Show HN: Judgment Boundary – Stop as a First-Class Outcome for AI Systems

github.com/Nick-heo-eg

5 months ago

1 point

117.

Show HN: An AI coaching team in Claude Code that's forbidden from judging you

github.com/ibm777p2

10 days ago

1 point

118.

LLM Position Bias Benchmark: Swapped-Order Pairwise Judging

github.com/lechmazur

2 months ago

1 point

119.

Show HN: Claude-relais – A plan/build/judge loop mixing Claude with Cursor

github.com/clementrog

4 months ago

1 point

120.

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

github.com/MikeVeerman

4 months ago

1 point