The Pile: An 800GB Dataset of Diverse Text for Language Modeling (pile.eleuther.ai)
223 points by leogao 5 years ago