Constructing Transformers for Longer Sequences with Sparse Attention Methods (ai.googleblog.com)
1 point by theafh 5 years ago