The Pile: An 800GB Dataset of Diverse Text for Language Modeling [pdf] (pile.eleuther.ai)
1 point by nixtaken 5 years ago