MegaScale: Scaling Large Language Model Training to More Than 10k GPUs [pdf] (usenix.org)
1 point by yankcrime 2 years ago