Evaluating frontier AI R&D capabilities of LLM agents against human experts (metr.org)
1 point | tedsanders | 2 years ago