AutoMegaKernel: Compile an LLM into one provably-correct CUDA megakernel (github.com/RightNow-AI)
4 points by OsamaJaber 14 days ago