Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult (simonw.substack.com)
5 points · hackthegibson · 27 months ago