Study identifies weaknesses in how AI systems are evaluated (oii.ox.ac.uk)
416 points by pseudolus 7 months ago

Paper: https://openreview.net/pdf?id=mdA5lVvNcU
Related: https://www.theregister.com/2025/11/07/measuring_ai_models_h...