Heykuki News
1. The Pile is a 825 GiB diverse, open-source language modelling data set (2020) (pile.eleuther.ai)
   332 points by bilsbie 2 years ago | 234 comments
2. The Pile: An 800GB Dataset of Diverse Text for Language Modeling (pile.eleuther.ai)
   223 points by leogao 5 years ago | 60 comments
3. The Pile (pile.eleuther.ai)
   1 point by tosh 3 years ago | discuss
4. The Pile (pile.eleuther.ai)
   1 point by tosh 3 years ago | discuss
5. The Pile: An 800GB Dataset of Diverse Text for Language Modeling [pdf] (pile.eleuther.ai)
   1 point by nixtaken 5 years ago | discuss