Shade-Arena: Evaluating Sabotage and Monitoring in LLM Agents [pdf] (assets.anthropic.com) | 4 points | JnBrymn | a year ago