LLM INQUISITOR: Evaluating how AI models handle long, realistic tasks
github.com/AssimilatedHuman | 1 point | ballista2026 | a month ago