Cramming the training of a (BERT-type) language model into limited compute (github.com/JonasGeiping)
3 points | homarp | 3 years ago