Search: github.com/lechmazur | Heykuki News

Heykuki News

Top New Best Ask Show Jobs

1.

Benchmarking LLM social skills with an elimination game

github.com/lechmazur

a year ago

colonCapitalDee

194 points

2.

LLM Hallucination Benchmark: R1, o1, o3-mini, Gemini 2.0 Flash Think Exp 01-21

github.com/lechmazur

a year ago

17 points

3.

Show HN: LLM Debate Benchmark

github.com/lechmazur

3 months ago

9 points

4.

LLM Persuasion Benchmark: Multi-Turn Persuasion Between Models

github.com/lechmazur

3 months ago

9 points

5.

Show HN: Bazaar – a new LLM benchmark for economic reasoning under uncertainty

github.com/lechmazur

a year ago

8 points

6.

Show HN: LLM Creative Story-Writing Benchmark V3

github.com/lechmazur

9 months ago

8 points

7.

Show HN: LLM Divergent Thinking Creativity Benchmark

github.com/lechmazur

a year ago

8 points

8.

Multi-Agent Step Race Benchmark: LLM Collaboration and Deception Under Pressure

github.com/lechmazur

a year ago

7 points

9.

Show HN: LLM Deceptiveness and Gullibility Benchmark

github.com/lechmazur

2 years ago

7 points

10.

Show HN: Mapping LLM Style and Range in Flash Fiction

github.com/lechmazur

10 months ago

7 points

11.

Emergent Price-Fixing by LLM Auction Agents

github.com/lechmazur

a year ago

7 points

12.

Public Goods Game Benchmark: Contribute and Punish, a Multi-Agent Benchmark

github.com/lechmazur

a year ago

7 points

13.

Show HN: Buyout Game Benchmark: Multi-Agent Bargaining, Transfers, and Takeovers

github.com/lechmazur

3 months ago

6 points

14.

Show HN: LLM Round-Trip Translation Benchmark

github.com/lechmazur

9 months ago

6 points

15.

Pact: Head-to-head negotiation benchmark for LLMs

github.com/lechmazur

10 months ago

6 points

16.

Show HN: LLM Thematic Generalization Benchmark

github.com/lechmazur

a year ago

6 points

17.

LLM Confabulation (Hallucination) Leaderboard

github.com/lechmazur

2 years ago

6 points

18.

Elimination Game: Multi-Agent LLM Social Reasoning, Strategy, and Deception

github.com/lechmazur

a year ago

5 points

19.

Show HN: LLM Creative Story-Writing Benchmark

github.com/lechmazur

a year ago

5 points

20.

Show HN: LLM Sycophancy Benchmark: Opposite-Narrator Contradictions

github.com/lechmazur

3 months ago

3 points

21.

Elimination Game Benchmark: Social Reasoning, Strategy, and Deception in LLMs

github.com/lechmazur

a year ago

2 points

22.

Step-Game: Assessing LLM Collaboration and Deception Under Pressure

github.com/lechmazur

a year ago

2 points

23.

Accurately calculating the number of legal chess positions

github.com/lechmazur

5 years ago

2 points

24.

LLM Position Bias Benchmark: Swapped-Order Pairwise Judging

github.com/lechmazur

2 months ago

1 point

25.

Benchmark that evaluates LLMs using 759 NYT Connections puzzles

github.com/lechmazur

6 months ago

1 point

26.

NYT Connections LLM Benchmark

github.com/lechmazur

6 months ago

1 point