Cognitive-Lab/ColNetraEmbed

@@ -1,6 +1,27 @@
 ---
 language:
 - en
 license: gemma
 library_name: transformers
 tags:
@@ -8,9 +29,79 @@ tags:
 - retrieval
 - colbert
 - late-interaction
 pipeline_tag: visual-document-retrieval
 base_model:
 - google/gemma-3-4b-it
 ---
 # ColNetraEmbed
@@ -31,7 +122,7 @@ base_model:
 ColNetraEmbed is a multilingual multimodal embedding model that encodes documents as multi-vector representations using the ColPali architecture. Each image patch is mapped to a contextualized embedding, enabling fine-grained matching between visual content and text queries through late interaction (MaxSim).
 - **Model Type:** Multilingual Multimodal Embedding Model with ColPali-style Multi-vector representations
-- **Architecture:** ColPali with Gemma3-2B backbone
 - **Embedding Dimension:** 128 per token
 - **Capabilities:** Multilingual, Multimodal (Vision + Text), Multi-vector late interaction
 - **Use Case:** Visual document retrieval, multilingual document understanding, fine-grained visual search

 ---
 language:
 - en
+- es
+- fr
+- de
+- it
+- hi
+- mr
+- sa
+- kn
+- te
+- ta
+- ml
+- zh
+- ja
+- ko
+- ar
+- bn
+- gu
+- or
+- pa
+- ru
+- th
 license: gemma
 library_name: transformers
 tags:
 - retrieval
 - colbert
 - late-interaction
+- multimodal
+- multilingual
+- document-retrieval
+- 22-languages
 pipeline_tag: visual-document-retrieval
 base_model:
 - google/gemma-3-4b-it
+datasets:
+- Cognitive-Lab/nayanair-bench
+model-index:
+- name: ColNetraEmbed
+  results:
+  - task:
+      type: image-text-retrieval
+      name: Cross-Lingual Document Retrieval
+    dataset:
+      type: Cognitive-Lab/nayanair-bench
+      name: Nayana-IR Cross-Lingual
+      split: test
+    metrics:
+    - type: ndcg_at_5
+      value: 0.637
+      name: NDCG@5
+    - type: recall_at_10
+      value: 0.700
+      name: Recall@10
+    - type: map_at_10
+      value: 0.610
+      name: MAP@10
+    - type: mrr_at_10
+      value: 0.610
+      name: MRR@10
+  - task:
+      type: image-text-retrieval
+      name: Monolingual Document Retrieval
+    dataset:
+      type: Cognitive-Lab/nayanair-bench
+      name: Nayana-IR Monolingual
+      split: test
+    metrics:
+    - type: ndcg_at_5
+      value: 0.670
+      name: NDCG@5
+    - type: recall_at_10
+      value: 0.764
+      name: Recall@10
+    - type: map_at_10
+      value: 0.645
+      name: MAP@10
+    - type: mrr_at_10
+      value: 0.686
+      name: MRR@10
+  - task:
+      type: image-text-retrieval
+      name: English Document Retrieval
+    dataset:
+      type: vidore/vidore-benchmark
+      name: ViDoRe v2
+      split: test
+    metrics:
+    - type: ndcg_at_5
+      value: 0.551
+      name: NDCG@5
+    - type: recall_at_10
+      value: 0.664
+      name: Recall@10
+    - type: map_at_10
+      value: 0.445
+      name: MAP@10
+    - type: mrr_at_10
+      value: 0.445
+      name: MRR@10
 ---
 # ColNetraEmbed
 ColNetraEmbed is a multilingual multimodal embedding model that encodes documents as multi-vector representations using the ColPali architecture. Each image patch is mapped to a contextualized embedding, enabling fine-grained matching between visual content and text queries through late interaction (MaxSim).
 - **Model Type:** Multilingual Multimodal Embedding Model with ColPali-style Multi-vector representations
+- **Architecture:** ColPali with Gemma3-4B backbone
 - **Embedding Dimension:** 128 per token
 - **Capabilities:** Multilingual, Multimodal (Vision + Text), Multi-vector late interaction
 - **Use Case:** Visual document retrieval, multilingual document understanding, fine-grained visual search
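
The late-interaction (MaxSim) scoring mentioned above can be illustrated with a minimal NumPy sketch. It is not the model's own inference code: the function names `maxsim_score` and `l2_normalize`, the random stand-in embeddings, and the token and patch counts are all assumptions made for illustration. Only the 128-dimensional per-token embedding size comes from the card.

```python
import numpy as np

EMBED_DIM = 128  # per-token embedding dimension stated in the model card


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """MaxSim: for each query token, take its best-matching document
    vector, then sum those maxima over all query tokens."""
    sim = query_emb @ doc_emb.T  # shape: (num_query_tokens, num_doc_vectors)
    return float(sim.max(axis=1).sum())


# Hypothetical stand-ins for real model outputs (random, not actual embeddings).
rng = np.random.default_rng(0)
query = l2_normalize(rng.standard_normal((6, EMBED_DIM)))     # 6 query tokens
page_a = l2_normalize(rng.standard_normal((256, EMBED_DIM)))  # 256 patch vectors
# page_b contains exact copies of the query vectors, so every query token
# finds a perfect match (cosine similarity 1.0).
page_b = np.vstack([page_a[:250], query])

score_a = maxsim_score(query, page_a)
score_b = maxsim_score(query, page_b)
ranking = sorted([("page_a", score_a), ("page_b", score_b)],
                 key=lambda item: item[1], reverse=True)
```

In this toy setup `page_b` ranks first because each query token has an identical vector among its patches. With real embeddings, the same scoring would be applied to the multi-vector outputs for queries and document pages.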