OpenGVLab/InternVL3_5-241B-A28B-Pretrained
Image-Text-to-Text • 241B • Updated
Computer Vision
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision
VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs