Towards a Realistic Long-Term Benchmark for Open-Web Research Agents (futuresearch.ai)
1 point by theptip 2 years ago