Fully Sharded Data Parallel: Faster AI Training with Fewer GPUs | engineering.fb.com | 3 points | TheGuyWhoCodes | 5 years ago