HK
Heykuki News
Top
New
Best
Ask
Show
Jobs
451.
RVAA: Recursive Vision-Action Agent for Long Video Understanding
github.com/mohammed840
5 months ago
tmzt
1 points
452.
LLM Vision: Visual intelligence for your smart home
github.com/valentinfrlch
6 months ago
solarist
1 points
453.
Show HN: Agenteract – Drive mobile apps with LLMs using UI trees (no vision)
6 months ago
mribbons
1 points
454.
Training YOLO vision models on Kaggle datasets
github.com/mfranzon
8 months ago
walterbell
1 points
455.
VisionOS Godot Engine support merged
a year ago
iFire
1 points
456.
Show HN: Vision AI Label Studio – Open-Source Image Labeling Tool
vailabel.com
a year ago
vicheanath
1 points
457.
Show HN: OSS AI Agent for Computer Vision
github.com/picselliahq
a year ago
thibautlucas
1 points
458.
[Google Research] Handwriting Conversion with Vision Language Model
github.com/google-research
a year ago
moatmoat
1 points
459.
Show HN: Vision, PDF reading and Python
github.com/ilevd
a year ago
ilevd
1 points
460.
Computer vision models inference directly on mobile
github.com/software-mansion
a year ago
mrys
1 points
461.
DeepSeek-VL2: MoE Vision-Language Models for Advanced Multimodal Understanding [pdf]
github.com/deepseek-ai
2 years ago
limoce
1 points
462.
OpenDAL Going to Set Vision as "One Layer, All Storage"
github.com/apache
2 years ago
xuanwo
1 points
463.
Show HN: Capd – idea to visually analyze active PowerShell with OpenAI Vision
github.com/Lywald
2 years ago
anon012012
1 points
464.
Roboflow Notebooks: 60+ computer vision modeling notebooks
github.com/roboflow
2 years ago
zerojames
1 points
465.
Eagle: Vision-Centric High-Resolution Multimodal LLMs with Mixture of Encoders
github.com/NVlabs
2 years ago
taikon
1 points
466.
Unibench: Vision-Language Model Evaluation
github.com/facebookresearch
2 years ago
zerojames
1 points
467.
Try to dump traditional mouse. Click by [Vim] + [screen vision-recognition] way
github.com/garywill
2 years ago
gry_gh
1 points
468.
Show HN: Gesture Composer for VisionOS [video]
youtube.com
2 years ago
nthState
1 points
469.
Moondream: Tiny Vision Language Model
github.com/vikhyat
2 years ago
zerojames
1 points
470.
Show HN: Geniusrise – open-source inference endpoints for text, vision, audio
github.com
2 years ago
ixaxaar
1 points
471.
Show HN: Building WebApp with Vision Pro Like UI with CSS
github.com/kelvinkoko
2 years ago
kelvinko
1 points
472.
3D Printing Failure Detection with GPT4 Vision
github.com/myrakrusemark
2 years ago
miduil
1 points
473.
SeeAct GPT-4V(ision) Is a Generalist Web Agent, If Grounded
github.com/OSU-NLP-Group
2 years ago
r_singh
1 points
474.
AI Employe: Actions Augmented Browser Automation Using GPT-4 Vision
github.com/vignshwarar
2 years ago
vignesh_warar
1 points
475.
Show HN: Labelformat now supports all major vision labeling formats
github.com/lightly-ai
3 years ago
isusmelj
1 points
476.
Sound and Vision – Video Streaming to the ESP32
github.com/atomic14
3 years ago
iamflimflam1
1 points
477.
Large Language-and-Vision Assistant for BioMedicine
github.com/microsoft
3 years ago
yagizdegirmenci
1 points
478.
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
github.com/facebookresearch
3 years ago
teleforce
1 points
479.
VoxelGPT: Open-source AI assistant for curating computer vision datasets
github.com/voxel51
3 years ago
sickeythecat
1 points
480.
A general representation model across vision, audio, language modalities
github.com/OFA-Sys
3 years ago
logikblok
1 points