MSR-VTT: A Large Video Description Dataset for Bridging Video and Language (research.microsoft.com) | 1 point | putdat | 10 years ago