Heykuki News
Top
New
Best
Ask
Show
Jobs
1.
Show HN: O3 beats Sonnet 4 at coding (in our codebase, wrt our preferences)
a year ago
kmckiern
2 points
2.
Show HN: Mandoline – Custom LLM Evaluations for Real-World Use Cases
mandoline.ai
2 years ago
kmckiern
2 points
3.
Refusals (LLM Leaderboard)
mandoline.ai
2 years ago
kmckiern
2 points
4.
Comparing Refusal Behavior Across Top Language Models
mandoline.ai
2 years ago
kmckiern
2 points