I built Moments to finally get this game idea out of my head. My original goal was to run on-device models in mobile browsers, but it is still too early to run local vision models directly in phone browsers, so I focused on desktop.
How it works:
- You upload a photo.
- A local vision model running entirely in your browser captions it and picks a prominent object from the image.
- You guess the word, just like in Wordle.
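For anyone curious about the guessing step, here is a minimal sketch of Wordle-style scoring in TypeScript. It illustrates the usual Wordle rules and is not the code Moments actually runs; the names `Mark` and `scoreGuess` are hypothetical, and I am assuming the guess and the target word have the same length.

```typescript
// Hypothetical helper, not taken from Moments: Wordle-style feedback for one guess.
type Mark = "correct" | "present" | "absent";

function scoreGuess(guess: string, target: string): Mark[] {
  const g = guess.toLowerCase();
  const t = target.toLowerCase();
  if (g.length !== t.length) {
    throw new Error("guess and target must have the same length");
  }

  const marks = new Array<Mark>(g.length).fill("absent");
  const remaining = new Map<string, number>();

  // First pass: mark exact matches and count the target letters left unmatched.
  for (let i = 0; i < t.length; i++) {
    if (g[i] === t[i]) {
      marks[i] = "correct";
    } else {
      remaining.set(t[i], (remaining.get(t[i]) ?? 0) + 1);
    }
  }

  // Second pass: mark misplaced letters, using each unmatched target letter at most once.
  for (let i = 0; i < g.length; i++) {
    if (marks[i] === "correct") continue;
    const left = remaining.get(g[i]) ?? 0;
    if (left > 0) {
      marks[i] = "present";
      remaining.set(g[i], left - 1);
    }
  }

  return marks;
}
```

The two passes matter for repeated letters: exact matches are claimed first, so a duplicate letter in the guess is only marked "present" if the target still has an unclaimed copy of it.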
Moments uses a very small model, so it is not very smart: https://huggingface.co/onnx-community/Florence-2-base-ft