LLM Speedrunner: Eval for frontier models to reproduce scientific findings (github.com/facebookresearch)
2 points | zerojames | a year ago