30% drop in O1-preview accuracy when Putnam problems are slightly varied (openreview.net)
560 points | optimalsolver | a year ago