Feed a video to a vision LLM as a sequence of JPEG frames on the CLI (simonw.substack.com)
4 points | waprin | a year ago