HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
721.
▲
Swedish Construction FAQ: 503 bilingual Q&A dataset, CC BY 4.0
github.com/zaragoza-ab
discuss
2 months ago
DecDEPO
1 points
722.
▲
Show HN: Fastdedup – Rust dataset deduplication (2:55 vs. 7:55 688MB vs. 22GB)
wapplewhite4.github.io
discuss
4 months ago
wapplewhite4
1 points
723.
▲
Show HN: Talpa – Datasette-powered reading stats dashboards for Kobo and Kindle
github.com/gildo
discuss
4 months ago
fyskij
1 points
724.
▲
Show HN: Hyperstar – LiveView/Datastar for TypeScript and JSX
github.com/StreamUI
discuss
4 months ago
Jonovono
1 points
725.
▲
GABRIEL – turn messy qualitative corpora into analysis-ready datasets
github.com/openai
discuss
5 months ago
michaelsbradley
1 points
726.
▲
Show HN: Vietnam Elections (open, source-linked datasets and site)
bamboo-filing-cabinet.github.io
discuss
5 months ago
vietthan
1 points
727.
▲
The Guardian Headline Entailment Training Dataset
github.com/daoudclarke
discuss
14 years ago
daoudc
1 points
728.
▲
Fasttfidf: High-performance TF-IDF vectorization for large-scale text datasets
github.com/purijs
discuss
6 months ago
jspuri
1 points
729.
▲
Show HN: AI tool that walks citation graph and extracts data to create datasets
github.com/eamag
discuss
6 months ago
eamag
1 points
730.
▲
Training YOLO vision models on Kaggle datasets
github.com/mfranzon
discuss
8 months ago
walterbell
1 points
731.
▲
Show HN: Gaggle – A DuckDB extension for working with Kaggle datasets
discuss
8 months ago
habedi0
1 points
732.
▲
Show HN: I built a tool to sort a Northern Lights dataset for a CV model
picsort.coolapso.sh
discuss
8 months ago
coolapso
1 points
733.
▲
Show HN: Django PostgreSQL Anonymizer – prod → safe dev datasets (beta)
github.com/CuriousLearner
discuss
8 months ago
sanyam-khurana
1 points
734.
▲
A toolkit for improving the quality of your LeRobot datasets
github.com/RoboticsData
discuss
8 months ago
machinelearning
1 points
735.
▲
A new RAG algorithm to self-heal damaged datasets and query them on a graph
github.com/iblameandrew
discuss
9 months ago
scraper02
1 points
736.
▲
Show HN: Tensorpack, a CLI tool for semantic discovery across datasets
discuss
9 months ago
AyodeleFikayomi
1 points
737.
▲
Procedural Reasoning Datasets
github.com/open-thought
discuss
a year ago
t55
1 points
738.
▲
Reasoning Gym – Procedural RL reasoning datasets
github.com/open-thought
discuss
a year ago
t55
1 points
739.
▲
multi_db: repo that uses Datastar and has a multi db setup, one for each user
github.com/asmorris
discuss
a year ago
thunderbong
1 points
740.
▲
Mochi Programming Language v0.6.0 – LINQ syntax for querying datasets
github.com/mochilang
discuss
a year ago
scapbi
1 points
741.
▲
Reasoning Gym: Procedural Dataset Generation for Reinforcement Learning
github.com/open-thought
discuss
a year ago
starzmustdie
1 points
742.
▲
Datasets Are All You Need (LLM Learns to Prompt from Data)
github.com/intellectronica
discuss
a year ago
intellectronica
1 points
743.
▲
A multi-view video behavior monitoring dataset of wild mammals in the Swiss Alps
github.com/eceo-epfl
discuss
a year ago
moatmoat
1 points
744.
▲
RKaggle: Bring Kaggle Datasets Straight into the R console
github.com/benyamindsmith
discuss
a year ago
SuperMint
1 points
745.
▲
Show HN: I built a Graph Datastore that is faster, simpler and cheaper
github.com/jakobap
discuss
a year ago
jpoersc
1 points
746.
▲
Logic R1: Reproduce DeepSeek R1 Zero on 2K Logic Puzzle Dataset
github.com/Unakar
discuss
a year ago
limoce
1 points
747.
▲
Drawdata: Draw Datasets from Within Jupyter
github.com/koaning
discuss
a year ago
yamrzou
1 points
748.
▲
Facebook Uncommon Objects in 3D Dataset
github.com/facebookresearch
discuss
a year ago
taikon
1 points
749.
▲
LENS: A Leo Satellite Network Measurement Dataset
github.com/clarkzjw
discuss
2 years ago
teleforce
1 points
750.
▲
Transform and optimize datasets for fast AI model training
github.com/Lightning-AI
discuss
2 years ago
shcheklein
1 points