Heykuki News
Top
New
Best
Ask
Show
Jobs
1.
Benchmarking LLM social skills with an elimination game
github.com/lechmazur
60 comments
a year ago
colonCapitalDee
194 points
2.
LLM Hallucination Benchmark: R1, o1, o3-mini, Gemini 2.0 Flash Think Exp 01-21
github.com/lechmazur
3 comments
a year ago
zone411
17 points
3.
Show HN: LLM Debate Benchmark
github.com/lechmazur
3 comments
3 months ago
zone411
9 points
4.
LLM Persuasion Benchmark: Multi-Turn Persuasion Between Models
github.com/lechmazur
discuss
3 months ago
zone411
9 points
5.
Show HN: Bazaar – a new LLM benchmark for economic reasoning under uncertainty
github.com/lechmazur
1 comment
a year ago
zone411
8 points
6.
Show HN: LLM Creative Story‑Writing Benchmark V3
github.com/lechmazur
discuss
9 months ago
zone411
8 points
7.
Show HN: LLM Divergent Thinking Creativity Benchmark
github.com/lechmazur
discuss
a year ago
zone411
8 points
8.
Multi-Agent Step Race Benchmark: LLM Collaboration and Deception Under Pressure
github.com/lechmazur
1 comment
a year ago
zone411
7 points
9.
Show HN: LLM Deceptiveness and Gullibility Benchmark
github.com/lechmazur
1 comment
2 years ago
zone411
7 points
10.
Show HN: Mapping LLM Style and Range in Flash Fiction
github.com/lechmazur
discuss
10 months ago
zone411
7 points
11.
Emergent Price-Fixing by LLM Auction Agents
github.com/lechmazur
discuss
a year ago
zone411
7 points
12.
Public Goods Game Benchmark: Contribute and Punish, a Multi-Agent Benchmark
github.com/lechmazur
discuss
a year ago
zone411
7 points
13.
Show HN: Buyout Game Benchmark: Multi-Agent Bargaining, Transfers, and Takeovers
github.com/lechmazur
discuss
3 months ago
zone411
6 points
14.
Show HN: LLM Round‑Trip Translation Benchmark
github.com/lechmazur
discuss
9 months ago
zone411
6 points
15.
Pact: Head-to-head negotiation benchmark for LLMs
github.com/lechmazur
discuss
10 months ago
zone411
6 points
16.
Show HN: LLM Thematic Generalization Benchmark
github.com/lechmazur
discuss
a year ago
zone411
6 points
17.
LLM Confabulation (Hallucination) Leaderboard
github.com/lechmazur
discuss
2 years ago
zone411
6 points
18.
Elimination Game: Multi-Agent LLM Social Reasoning, Strategy, and Deception
github.com/lechmazur
discuss
a year ago
zone411
5 points
19.
Show HN: LLM Creative Story-Writing Benchmark
github.com/lechmazur
discuss
a year ago
zone411
5 points
20.
Show HN: LLM Sycophancy Benchmark: Opposite-Narrator Contradictions
github.com/lechmazur
discuss
3 months ago
zone411
3 points
21.
Elimination Game Benchmark: Social Reasoning, Strategy, and Deception in LLMs
github.com/lechmazur
discuss
a year ago
amichail
2 points
22.
Step-Game: Assessing LLM Collaboration and Deception Under Pressure
github.com/lechmazur
discuss
a year ago
amichail
2 points
23.
Accurately calculating the number of legal chess positions
github.com/lechmazur
discuss
5 years ago
slyall
2 points
24.
LLM Position Bias Benchmark: Swapped-Order Pairwise Judging
github.com/lechmazur
discuss
2 months ago
zone411
1 point
25.
Benchmark that evaluates LLMs using 759 NYT Connections puzzles
github.com/lechmazur
discuss
6 months ago
ShrugLife
1 point
26.
NYT Connections LLM Benchmark
github.com/lechmazur
discuss
6 months ago
cainxinth
1 point