Efficient implementations of Dion and Muon optimizers for distributed ML (github.com/microsoft)
2 points | simonpure | a year ago