TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (arxiv.org)
174 points by famouswaffles 2 years ago