LoCoMo AI Benchmark: 6.4% of answer key wrong, judge accepts 63% of fake answers

Heykuki News

3 points

3 months ago

3 comments

