HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
91.
Composo open-sources its LLM-as-Judge technique (83.6% on RewardBench 2)
github.com/composo-ai
discuss
3 months ago
mlukewizard
5 points
92.
Show HN: Open-source expense and budget tracker with SQL API for AI agents
github.com/kirill-markin
discuss
4 months ago
MarkinK
3 points
93.
Italy's Budget
github.com/g0v-it
discuss
5 years ago
simonebrunozzi
3 points
94.
Awesome-LLM-Judges
github.com/haizelabs
discuss
a year ago
leonardtang
2 points
95.
LLM Judges
github.com/haizelabs
discuss
a year ago
leonardtang
2 points
96.
UVa Online Judge Solutions Repo (Work in Progress)
github.com/jcbages
discuss
9 years ago
jcbages
2 points
97.
Stack on a Budget – A collection of services with great free tiers
github.com/255kb
26 comments
10 years ago
rosstex
156 points
98.
Stack on a Budget (Free Tier Driven Development FTDD)
github.com/255kb
discuss
8 days ago
gslin
3 points
99.
Show HN: Lightweight LLM-as-a-Judge Tool
github.com/frequena
discuss
10 months ago
frequena
2 points
100.
Collection of services with great free tiers
github.com/255kb
discuss
10 years ago
andruby
1 point
101.
Keeping multimodal parsing free for all
discuss
a year ago
stonebraker
3 points
102.
Show HN: OpenClaw Arena – Benchmark models on real tasks, rank by perf and cost
app.uniclaw.ai
discuss
3 months ago
skysniper
2 points
103.
SpaceXplore.It
discuss
12 years ago
tuned
2 points
104.
The Divine Judgement: Enforce TypeScript Types at Runtime
github.com/Divine-Software
discuss
3 years ago
LeviticusMB
1 point
105.
Performance Budgets (Budget.json)
github.com/GoogleChrome
discuss
4 years ago
LAC-Tech
1 point
106.
Show HN: Guitos, a free open-source budgeting app
guitos.app
12 comments
3 years ago
rare-magma
53 points
107.
Ranking 1k ShowHN posts by estimated merit using an LLM judge and TrueSkill
github.com/kouhxp
2 comments
a month ago
mrkn1
7 points
108.
Show HN: Tokencap – Token budget enforcement across your AI agents
github.com/pykul
discuss
3 months ago
pykul
7 points
109.
Ratchets: a Rust tool that polices style violations with a flexible budget
github.com/imbue-ai
discuss
7 days ago
nvader
5 points
110.
Show HN: TUI personal monthly budget planner
github.com/eliasdorneles
discuss
a year ago
eliasdorneles
4 points
111.
Type-challenges: Collection of TypeScript type challenges with online judge
github.com/type-challenges
discuss
3 years ago
olalonde
4 points
112.
Raspberry Pi-Based Personal Productivity Nudger
github.com/edmarkovich
discuss
6 years ago
xyzelement
4 points
113.
LoCoMo AI Benchmark: 6.4% of answer key wrong, judge accepts 63% of fake answers
github.com/dial481
3 comments
3 months ago
dial481
3 points
114.
Show HN: Using AI to judge a drinking game – SplitTheG.dev
splittheg.dev
2 comments
a year ago
BitNibbleByte
3 points
115.
Show HN: Signals – finding the most informative agent traces without LLM judges
arxiv.org
discuss
3 months ago
sparacha
3 points
116.
Show HN: Claude Gym – a tiny CLI that nudges you to move while Claude Code runs
discuss
4 months ago
mosesxu
3 points
117.
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
github.com/SonyResearch
discuss
a year ago
tylergem
3 points
118.
Justice: Yet Another Online Judge
github.com
discuss
7 years ago
liumangchao
3 points
119.
Show HN: Grading Notes for LLM-as-Judge
github.com/shabie
3 comments
2 years ago
shabie
2 points
120.
MartinLoop – budget caps and audit trails for AI coding agents
github.com/Keesan12
1 comment
a month ago
martinloop
2 points