Benchmark that evaluates LLMs using 759 NYT Connections puzzles (github.com/lechmazur)
1 point by ShrugLife 6 months ago