Show HN: Local OCR and Image Analysis for LLMs via Apple Vision (macOS)


3 days ago

I kept paying for tokens and uploading documents to cloud APIs just to pull text out of screenshots and scanned PDFs, so I wrapped macOS's built-in Vision framework as an MCP server. Any MCP client (Claude Desktop/Code, Cursor) can call it to OCR images and PDFs, read QR codes and barcodes, detect faces, find document corners, and classify images, and all of it runs on-device.
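As a rough illustration of what such a server advertises: MCP servers describe each tool with a name, a description, and a JSON Schema for its input, returned from a `tools/list` request. The tool names, descriptions, and the single `path` argument below are hypothetical and are not taken from macos-vision-mcp.

```typescript
// Hypothetical tool definitions: the real server's names and schemas may differ.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
}

// Every tool takes a local file path, so the file itself never leaves the Mac.
function fileTool(name: string, description: string): ToolDefinition {
  return {
    name,
    description,
    inputSchema: {
      type: "object",
      properties: {
        path: { type: "string", description: "Path to a local image or PDF" },
      },
      required: ["path"],
    },
  };
}

const tools: ToolDefinition[] = [
  fileTool("ocr", "Recognize text in an image or PDF"),
  fileTool("read_barcodes", "Decode QR codes and barcodes"),
  fileTool("detect_faces", "Return bounding boxes of detected faces"),
  fileTool("detect_document", "Find the corners of a document in a photo"),
  fileTool("classify_image", "Return image classification labels"),
];

// Shape of an MCP tools/list result.
const listToolsResult = { tools };
console.log(listToolsResult.tools.map((t) => t.name).join(", "));
```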

Two things made it worth packaging: nothing leaves the Mac (no API keys, no network calls), and sending extracted text instead of raw page images cut token usage by about 97% on the documents I tested. On dense text, Apple Vision is also often more accurate than a vision model. OCR returns paragraphs in reading order with bounding boxes and confidence scores, so the model can rebuild the layout as Markdown or HTML.
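A minimal sketch of how a client could turn reading-order paragraphs into Markdown. The `OcrParagraph` shape, including the `lineCount` field, is a hypothetical schema and not the tool's documented output. The heading rule is also an assumed heuristic: a paragraph whose per-line box height is well above the page's median is treated as a heading, and low-confidence text is kept but flagged.

```typescript
// Hypothetical output schema; the real field names may differ.
interface OcrParagraph {
  text: string;
  box: { x: number; y: number; width: number; height: number }; // normalized 0..1
  lineCount: number; // hypothetical: number of text lines in the paragraph
  confidence: number; // 0..1
}

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Paragraphs are assumed to arrive in reading order, so their order is preserved.
function toMarkdown(paragraphs: OcrParagraph[], minConfidence = 0.5, headingRatio = 1.3): string {
  if (paragraphs.length === 0) return "";
  // Approximate font size as box height divided by the number of lines.
  const lineHeights = paragraphs.map((p) => p.box.height / Math.max(1, p.lineCount));
  const typical = median(lineHeights);
  const blocks: string[] = [];
  paragraphs.forEach((p, i) => {
    if (p.confidence < minConfidence) {
      // Flag uncertain text for the model instead of silently dropping it.
      blocks.push(`<!-- low confidence (${p.confidence.toFixed(2)}) -->\n${p.text}`);
    } else if (lineHeights[i] > typical * headingRatio) {
      blocks.push(`## ${p.text}`);
    } else {
      blocks.push(p.text);
    }
  });
  return blocks.join("\n\n");
}

const page: OcrParagraph[] = [
  { text: "Quarterly Report", box: { x: 0.1, y: 0.05, width: 0.5, height: 0.06 }, lineCount: 1, confidence: 0.98 },
  { text: "Revenue grew in all regions.", box: { x: 0.1, y: 0.15, width: 0.8, height: 0.03 }, lineCount: 1, confidence: 0.95 },
  { text: "Costs were flat year over year.", box: { x: 0.1, y: 0.2, width: 0.8, height: 0.06 }, lineCount: 2, confidence: 0.93 },
];
console.log(toMarkdown(page));
```

The example page renders the large single line as a `##` heading and the two body paragraphs as plain text.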

It's a small native Swift helper behind a Node MCP server. Limitations: macOS-only, and quality is whatever Apple Vision gives you.
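One way the Node side might hand the Swift helper's output back to the client, assuming the helper is spawned (for example with Node's `child_process.execFile`) and prints JSON on stdout. The `HelperOutput` shape is hypothetical; the `content: [{ type: "text", text }]` and `isError` fields follow the MCP tool-call result format. The point of the sketch is that the model receives text, not pixels.

```typescript
// Hypothetical JSON printed by the Swift helper on stdout.
interface HelperOutput {
  paragraphs: { text: string; confidence: number }[];
}

interface TextContent { type: "text"; text: string }
interface CallToolResult { content: TextContent[]; isError?: boolean }

function toToolResult(helperStdout: string): CallToolResult {
  try {
    const parsed = JSON.parse(helperStdout) as HelperOutput;
    if (!Array.isArray(parsed.paragraphs)) throw new Error("missing paragraphs");
    // Pass the helper's JSON through as text, including any extra fields it emitted.
    return { content: [{ type: "text", text: JSON.stringify(parsed) }] };
  } catch (err) {
    // Report malformed helper output as a tool error rather than crashing the server.
    return {
      content: [{ type: "text", text: `Vision helper returned invalid output: ${(err as Error).message}` }],
      isError: true,
    };
  }
}

console.log(toToolResult('{"paragraphs":[{"text":"Hello","confidence":0.9}]}'));
```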

Install: npx -y macos-vision-mcp
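For clients that launch servers from a config file, Claude Desktop's `claude_desktop_config.json` lists servers under an `mcpServers` key with a `command` and `args`. The snippet below prints such an entry for the install command above. The server label `macos-vision` is an arbitrary choice, and other clients such as Cursor use a similar but not necessarily identical file, so check their docs.

```typescript
// "macos-vision" is an arbitrary label; the command and args mirror the install line.
const claudeDesktopConfig = {
  mcpServers: {
    "macos-vision": {
      command: "npx",
      args: ["-y", "macos-vision-mcp"],
    },
  },
};

// Print JSON that can be merged into claude_desktop_config.json.
console.log(JSON.stringify(claudeDesktopConfig, null, 2));
```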

Happy to answer questions about the structured output or how the token measurement was done.
