YaFSDP: a sharded data parallelism framework, faster for pre-training LLMs (github.com/yandex)
135 points by wiradikusuma, 2 years ago