MMC4: An open, billion-scale corpus of images interleaved with text (github.com/allenai)
132 points by tim_sw 3 years ago