HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
151.
The Museum of Modern Art Research Dataset
github.com/MuseumofModernArt
15 comments
11 years ago
danso
61 points
152.
Chicago Crime Trends. Analyzing 3GB Dataset from Data.gov with SQL and Graphs
github.com/axibase
3 comments
9 years ago
rodionos
44 points
153.
Dataset of Linus Torvalds' rants ranked by hate
github.com/corollari
17 comments
5 years ago
fctorial
42 points
154.
ClickHouse Obfuscator – A tool for dataset anonymization
github.com/ClickHouse
3 comments
3 years ago
rrampage
39 points
155.
DeepMind's machine-reading question/answer dataset
github.com/deepmind
3 comments
11 years ago
andrewtbham
37 points
156.
Madlad-400: A Multilingual and Document-Level Large Audited Dataset
github.com/google-research
1 comment
3 years ago
the_bookmaker
37 points
157.
A dataset of crimes committed in Buenos Aires
github.com/ramadis
4 comments
8 years ago
ramadis
34 points
158.
Show HN: I used streaming to skip downloading my 45GB dataset
github.com/DagsHub
discuss
4 years ago
npRandom
31 points
159.
Toxicity Dataset
github.com/surge-ai
32 comments
5 years ago
CarrieLab
25 points
160.
Structured Etymology Dataset
github.com/droher
3 comments
a year ago
downboots
24 points
161.
Washington Post publishes dataset of 52,000 criminal homicides
github.com/washingtonpost
2 comments
8 years ago
danso
24 points
162.
I have trained StyleGAN2 from scratch with a dataset of female portraits
github.com/l4rz
20 comments
5 years ago
EvgeniyZh
20 points
163.
VoxelCNN: Order-Aware Generative Modeling Using the 3D-Craft Dataset
github.com/facebookresearch
discuss
6 years ago
ingve
20 points
164.
Show HN: I made this tool for navigating pandas datasets
github.com/man-group
discuss
6 years ago
leehcksource
20 points
165.
Show HN: SemHash – Fast Semantic Text Deduplication for Cleaner Datasets
github.com/MinishLab
6 comments
a year ago
Pringled
19 points
166.
Show HN: Version code, models, & datasets together in GitHub
6 comments
3 years ago
skadamat
19 points
167.
NLP: A new datasets and metrics library from Hugging Face
github.com/huggingface
discuss
6 years ago
julien_c
19 points
168.
Show HN: Dataset of Linus Torvalds' rants sorted by hate
github.com/corollari
4 comments
7 years ago
corollari
17 points
169.
GitHub: Awesome-reasoning, a curated list of datasets for reasoning AIs
github.com/neurallambda
discuss
2 years ago
neurallambda
17 points
170.
ICLR 2026 – Institutional Affiliations Dataset and Analysis
github.com/DmytroLopushanskyy
2 comments
a month ago
stared
15 points
171.
Datasetq: jq for Datasets; Polars-powered Parquet/JSON/CSV query lang/cli
github.com/datasetq
2 comments
6 months ago
djb-at-durable
15 points
172.
Easy way to load, create, version, query and visualize computer vision datasets
discuss
4 years ago
morpheusme
13 points
173.
Show HN: Dataset of 125k Medium Blog Post Titles and Subtitles (With Categories)
github.com/turbo
discuss
7 years ago
minxomat
13 points
174.
Show HN: Create datasets more simply and improve AI model with unstructured data
github.com/adansons
3 comments
4 years ago
KenichiHiguchi
12 points
175.
Fast and scalable dataset preparation and curation tool from Nvidia
github.com/NVIDIA
discuss
2 years ago
shcheklein
12 points
176.
Show HN: Dataset of Sarcastic HN Comments
github.com/traghav
6 comments
5 years ago
raghavtoshniwal
11 points
177.
Dimensionality reduction in large data sets using Siamese Networks
github.com/beringresearch
discuss
7 years ago
pickleMeTimbers
11 points
178.
Show HN: Download HuggingFace Models/Datasets easily and super fast
github.com/bodaay
2 comments
3 years ago
qqqbodaayqqq
10 points
179.
Show HN: Training synthetic models on highly complex datasets
github.com/gretelai
2 comments
4 years ago
repeat_or
10 points
180.
Show HN: React-like Declarative DSL for building synthetic LLM datasets
github.com/qforge-dev
discuss
8 months ago
arturwala
10 points