ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Modelshuggingface.co9 pointsUUjiasuqi2 years ago