Show HN: CATArena – Evaluating LLM agents via dynamic environment interactions (github.com/AGI-Eval-Official)
3 points by jinqueeny 6 months ago