I previously posted a terminal tool that could generate fine-tuning datasets from real-world data using deep research. One of the most common requests was: “Can it work with local resources instead of only going online?”
Over the weekend, I built a separate version that does exactly that:
Point it at a local file (PDF, DOCX, JPG, TXT)
Describe the dataset you want
It extracts text → finds relevant parts via semantic search → applies your instructions through a generated schema → outputs a clean dataset.
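For anyone curious what that flow looks like in code, here is a rough sketch of the retrieve-then-structure steps. It is not the tool's actual implementation, and it assumes the text has already been extracted from the file. The bag-of-words cosine similarity is a toy stand-in for real embedding-based semantic search, the hand-written `SCHEMA` stands in for the schema the tool generates from your description, and all function names are hypothetical.

```python
import json
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Split already-extracted text into sentence-sized chunks."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def bow_vector(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the dataset description."""
    q = bow_vector(query)
    ranked = sorted(chunks, key=lambda c: cosine(bow_vector(c), q), reverse=True)
    return ranked[:k]


# Hypothetical hand-written schema: field name -> function that fills it
# from a chunk. In the real tool this step is generated, not hard-coded.
SCHEMA = {
    "instruction": lambda chunk: "Answer using only the passage.",
    "context": lambda chunk: chunk,
}


def build_rows(chunks: list[str], schema: dict) -> list[dict]:
    """Apply the schema to each selected chunk to produce dataset rows."""
    return [{field: fill(chunk) for field, fill in schema.items()} for chunk in chunks]


def to_jsonl(rows: list[dict]) -> str:
    """Serialize rows as JSON Lines, a common fine-tuning dataset format."""
    return "\n".join(json.dumps(row) for row in rows)


if __name__ == "__main__":
    document = (
        "Refunds are issued within 14 days of a return request. "
        "Our office is closed on public holidays. "
        "Shipping takes three to five business days."
    )
    selected = top_chunks(split_sentences(document), "refunds and return request", k=1)
    print(to_jsonl(build_rows(selected, SCHEMA)))
```

Swapping `bow_vector` for a real embedding model and the lambdas in `SCHEMA` for model calls would bring this closer to the pipeline described above, while the overall shape (chunk, rank, fill, serialize) stays the same.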