HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
91.
LoCoMo AI Benchmark: 6.4% of answer key wrong, judge accepts 63% of fake answers
github.com/dial481
3 comments
3 months ago
dial481
3 points
92.
Show HN: Using AI to judge a drinking game – SplitTheG.dev
splittheg.dev
2 comments
a year ago
BitNibbleByte
3 points
93.
Apply for the Judicial Innovation Fellowship
github.com/JIFGeorgetown
1 comment
3 years ago
epicfaace
3 points
94.
Gavel is a project expo judging system
github.com/anishathalye
1 comment
10 years ago
mnem
3 points
95.
Show HN: NyaayWatch – Observability layer for the Indian judiciary
nyaaywatch.in
discuss
a month ago
Rudraksh06
3 points
96.
Show HN: Signals – finding the most informative agent traces without LLM judges
arxiv.org
discuss
3 months ago
sparacha
3 points
97.
Show HN: Cognition-wheel – parallel LLM fusion with bias masking and judging
github.com/Hormold
discuss
a year ago
Hormold
3 points
98.
Justice: Yet Another Online Judge
github.com
discuss
7 years ago
liumangchao
3 points
99.
Show HN: Grading Notes for LLM-as-Judge
github.com/shabie
3 comments
2 years ago
shabie
2 points
100.
Show HN: pg_roast – A Postgres extension that harshly judges your database
github.com/samirketema
1 comment
2 months ago
samirketema
2 points
101.
Open-source LLM-as-judge eval suite with root cause analysis and failure mining
github.com/colingfly
1 comment
3 months ago
colinfly
2 points
102.
Show HN: Yet Another Online Judge Implementation
github.com
1 comment
7 years ago
zsgsdesign
2 points
103.
Ask HN: Criteria for judging JavaScript project?
1 comment
11 years ago
octref
2 points
104.
Hey Jude as a vbScript
github.com/mockmyberet
discuss
13 years ago
tommybecker
2 points
105.
Codejudge: A lightweight online judge
github.com/sankha93
discuss
13 years ago
sankha93
2 points
106.
Show HN: GEDD – A Systematic Evidence Driven LLM as a Judge Framework
github.com/aws-samples
discuss
9 days ago
balasvce2026
2 points
107.
Show HN: CoJudge – open-source, offline judge for studying LC-style problems
github.com/cojudge
discuss
8 months ago
ansliy
2 points
108.
Evaluating Large Language Models Using LLM-as-a-Judge
github.com/aws-samples
discuss
2 years ago
mooreds
2 points
109.
Scruples: Corpus of ethical judgments extracted from Reddit
github.com/allenai
discuss
6 years ago
nikochiko
2 points
110.
JHU CSSE Covid-19 Data Repo Removes Information on Palestine
github.com/CSSEGISandData
discuss
6 years ago
jnmandal
2 points
111.
Novel Coronavirus (Covid-19) Cases, Provided by JHU CSSE
github.com/CSSEGISandData
discuss
6 years ago
itbeho
2 points
112.
Covid-19: Novel Coronavirus (Covid-19) Cases, Provided by JHU CSSE
github.com/CSSEGISandData
discuss
6 years ago
DyslexicAtheist
2 points
113.
Coderunner – A judge for your programs, run and test your programs through Python
github.com/codeclassroom
discuss
7 years ago
bhupesh
2 points
114.
Show HN: A command line interface to UVA online judge (competitive programming)
github.com/scvalencia
discuss
10 years ago
scvalencia
2 points
115.
Show HN: Meaning-Based Judgment Simulation for LLM Interfaces
2 comments
a year ago
GENIXUS
1 point
116.
Show HN: Judgment Boundary – Stop as a First-Class Outcome for AI Systems
github.com/Nick-heo-eg
1 comment
5 months ago
echoos
1 point
117.
Show HN: An AI coaching team in Claude Code that's forbidden from judging you
github.com/ibm777p2
discuss
10 days ago
ibm777p2
1 point
118.
LLM Position Bias Benchmark: Swapped-Order Pairwise Judging
github.com/lechmazur
discuss
2 months ago
zone411
1 point
119.
Show HN: Claude-relais – A plan/build/judge loop mixing Claude with Cursor
github.com/clementrog
discuss
4 months ago
crog
1 point
120.
Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU
github.com/MikeVeerman
discuss
4 months ago
MikeVeerman
1 point