Heykuki News
181.
GitHub: Awesome-reasoning, a curated list of datasets for reasoning AIs
github.com/neurallambda
discuss
2 years ago
neurallambda
17 points
182.
ICLR 2026 – Institutional Affiliations Dataset and Analysis
github.com/DmytroLopushanskyy
2 comments
a month ago
stared
15 points
183.
Easy way to load, create, version, query and visualize computer vision datasets
discuss
4 years ago
morpheusme
13 points
184.
Show HN: Dataset of 125k Medium Blog Post Titles and Subtitles (With Categories)
github.com/turbo
discuss
7 years ago
minxomat
13 points
185.
Show HN: Create datasets more simply and improve AI model with unstructured data
github.com/adansons
3 comments
4 years ago
KenichiHiguchi
12 points
186.
Fast and scalable dataset preparation and curation tool from Nvidia
github.com/NVIDIA
discuss
2 years ago
shcheklein
12 points
187.
Show HN: Dataset of Sarcastic HN Comments
github.com/traghav
6 comments
5 years ago
raghavtoshniwal
11 points
188.
Show HN: Download HuggingFace Models/Datasets easily and super fast
github.com/bodaay
2 comments
3 years ago
qqqbodaayqqq
10 points
189.
Show HN: Training synthetic models on highly complex datasets
github.com/gretelai
2 comments
4 years ago
repeat_or
10 points
190.
Show HN: React-like Declarative DSL for building synthetic LLM datasets
github.com/qforge-dev
discuss
8 months ago
arturwala
10 points
191.
Texthero – Python module to analyze any text dataset in seconds
github.com/jbesomi
6 comments
6 years ago
BertAndErnie
9 points
192.
Kangas: Explore Multimedia Datasets at Scale
github.com/comet-ml
2 comments
4 years ago
dmoura
9 points
193.
Nvidia open sources the synthetic data framework used to build Nemotron datasets
1 comment
7 months ago
alexwatson405
8 points
194.
Show HN: Using DSPy to enrich a dataset of the Nobel laureate network
blog.kuzudb.com
discuss
a year ago
laminarflow027
8 points
195.
Open Thoughts: Curating the best reasoning datasets
github.com/open-thoughts
discuss
a year ago
madiator
8 points
196.
Show HN: Automate Variable Selection for Research on Big Datasets (Open-Source)
github.com/MalikHarrisAhm
discuss
2 years ago
mha23
8 points
197.
Show HN: GitHub Typo Corpus: Largest Dataset of Misspellings and Grammar Errors
github.com/mhagiwara
discuss
7 years ago
mhagiwara
8 points
198.
Show HN: Open-source LLM and dataset for sports forecasting (Pro Golf)
huggingface.co
discuss
4 months ago
bturtel
7 points
199.
Show HN: Bridge-ds – Dataset handling for any modality a la Pandas
github.com/guybuk
discuss
2 years ago
guyuz
7 points
200.
Show HN: Interactively explore your Hugging Face dataset with one line of code
huggingface.co
discuss
3 years ago
sps44
7 points
201.
Our classifier outperforms CatBoost, XGBoost, LightGBM on 5 benchmark datasets
github.com/LinearBoost
5 comments
2 years ago
hamid9
6 points
202.
Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments
github.com/few-sh
2 comments
2 months ago
neversupervised
6 points
203.
Show HN: FiftyOne – Explore, Analyze and Curate Visual Datasets
github.com/voxel51
1 comment
6 years ago
benjaminpkane
6 points
204.
Show HN: Open Covid-19 Dataset
github.com/open-covid-19
1 comment
6 years ago
omtinez
6 points
205.
Show HN: Xray: N-D labeled arrays and datasets in Python
github.com/xray
discuss
12 years ago
shoyer
6 points
206.
Show HN: Generate Fine-tuning dataset using deep research in terminal
github.com/Datalore-ai
discuss
a year ago
FineTuner42
6 points
207.
Show HN: SemHash – Fast Semantic Text Deduplication for Cleaner Datasets
github.com/MinishLab
discuss
a year ago
stephantul
6 points
208.
Show HN: Interactively explore unstructured datasets from your dataframe
github.com/Renumics
discuss
3 years ago
sps44
6 points
209.
Kangas: Pandas for Multimedia Datasets
github.com/comet-ml
discuss
3 years ago
synergy20
6 points
210.
The fastest command-line tools for querying large JSON datasets
github.com/dcmoura
discuss
4 years ago
zX41ZdbW
6 points