Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult (simonwillison.net)