A brief history of LLM Benchmarking and the devil in their details (latent.space)
3 points | mooreds | 3 years ago