Shallow Feed-Forward Neural Networks as Alternative to Attention in Transformers (huggingface.co) | 11 points | panabee | 3 years ago