Vid2Seq: A pretrained visual language model for describing multi-event videos (ai.googleblog.com)
87 points by famouswaffles 3 years ago