E2E LLM evals, with less focus on metrics and more focus on binary assertions (github.com/openchatai)
1 point | gharbat | a year ago