HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
1.
Show HN: CATArena – Evaluating LLM agents via dynamic environment interactions
github.com/AGI-Eval-Official
6 months ago
jinqueeny
3 points
2.
Stop benchmarking LLMs. Make them fight
github.com/AGI-Eval-Official
6 months ago
jinqueeny
2 points