HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
181.
Texthero – Python module to analyze any text dataset in seconds
github.com/jbesomi
6 comments
6 years ago
BertAndErnie
9 points
182.
Kangas: Explore Multimedia Datasets at Scale
github.com/comet-ml
2 comments
4 years ago
dmoura
9 points
183.
Nvidia open sources the synthetic data framework used to build Nemotron datasets
1 comment
7 months ago
alexwatson405
8 points
184.
Show HN: Using DSPy to enrich a dataset of the Nobel laureate network
blog.kuzudb.com
discuss
a year ago
laminarflow027
8 points
185.
Open Thoughts: Curating the best reasoning datasets
github.com/open-thoughts
discuss
a year ago
madiator
8 points
186.
Show HN: Automate Variable Selection for Research on Big Datasets (Open-Source)
github.com/MalikHarrisAhm
discuss
2 years ago
mha23
8 points
187.
Show HN: GitHub Typo Corpus: Largest Dataset of Misspellings and Grammar Errors
github.com/mhagiwara
discuss
7 years ago
mhagiwara
8 points
188.
Show HN: Open-source LLM and dataset for sports forecasting (Pro Golf)
huggingface.co
discuss
4 months ago
bturtel
7 points
189.
Show HN: Bridge-ds – Dataset handling for any modality a la Pandas
github.com/guybuk
discuss
2 years ago
guyuz
7 points
190.
Show HN: Interactively explore your Hugging Face dataset with one line of code
huggingface.co
discuss
3 years ago
sps44
7 points
191.
Our classifier outperforms CatBoost, XGBoost, LightGBM on 5 benchmark datasets
github.com/LinearBoost
5 comments
2 years ago
hamid9
6 points
192.
Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments
github.com/few-sh
2 comments
2 months ago
neversupervised
6 points
193.
DatasetGPT – an open-source command line tool for generating datasets with LLMs
github.com/radi-cho
1 comment
3 years ago
radicho123
6 points
194.
Show HN: FiftyOne – Explore, Analyze and Curate Visual Datasets
github.com/voxel51
1 comment
6 years ago
benjaminpkane
6 points
195.
Show HN: Open Covid-19 Dataset
github.com/open-covid-19
1 comment
6 years ago
omtinez
6 points
196.
Show HN: Xray: N-D labeled arrays and datasets in Python
github.com/xray
discuss
12 years ago
shoyer
6 points
197.
Show HN: Generate Fine-tuning dataset using deep research in terminal
github.com/Datalore-ai
discuss
a year ago
FineTuner42
6 points
198.
Show HN: SemHash – Fast Semantic Text Deduplication for Cleaner Datasets
github.com/MinishLab
discuss
a year ago
stephantul
6 points
199.
Show HN: Interactively explore unstructured datasets from your dataframe
github.com/Renumics
discuss
3 years ago
sps44
6 points
200.
Kangas: Pandas for Multimedia Datasets
github.com/comet-ml
discuss
3 years ago
synergy20
6 points
201.
The fastest command-line tools for querying large JSON datasets
github.com/dcmoura
discuss
4 years ago
zX41ZdbW
6 points
202.
Video Classification Starter Code for Working with the YouTube-8M Dataset
github.com/google
discuss
10 years ago
tylerwhipple
6 points
203.
Select2: jQuery select boxes with search, remote data sets, infinite scrolling
ivaynberg.github.com
1 comment
14 years ago
soulclap
5 points
204.
Resampling Unbalanced Datasets
github.com/fmfn
discuss
12 years ago
hrb1979
5 points
205.
Curated list of language modeling researches for code, plus related datasets
github.com/codefuse-ai
discuss
a year ago
Bluestein
5 points
206.
Show HN: Byte-Pair Encoding tokenizer for training LLMs on large datasets
github.com/jmaczan
discuss
2 years ago
yu3zhou4
5 points
207.
DataDM – Search and analyze datasets with LLMs
github.com/approximatelabs
discuss
3 years ago
cle
5 points
208.
DataDM: Open-source local-LLM code-interpreter with dataset search
github.com/approximatelabs
discuss
3 years ago
bluecoconut
5 points
209.
Show HN: Multiobjective Large-Scale Fashion Dataset with Distributional Shifts
github.com/st-tech
discuss
5 years ago
nanikano
5 points
210.
Show HN: H5records – simple large dataset for pytorch training
github.com/theblackcat102
discuss
5 years ago
polymorph1sm
5 points