LLMs predict my coffee: Why not benchmark with physical experiments? (dynomight.substack.com)
1 point | crescit_eundo | 3 months ago