Show HN: Benchmarking LLM Agents on Consequential Real World Tasks (the-agent-company.com)
3 points by liboxuanhk a year ago

A benchmark you can run locally to test how well LLM-based AI agents perform real-world tasks.