AdithyaSK committed on
Commit c8eb9de · verified · 1 Parent(s): 64a5609

Update README.md

Files changed (1)
  1. README.md +92 -1
README.md CHANGED
@@ -1,6 +1,27 @@
  ---
  language:
  - en
+ - es
+ - fr
+ - de
+ - it
+ - hi
+ - mr
+ - sa
+ - kn
+ - te
+ - ta
+ - ml
+ - zh
+ - ja
+ - ko
+ - ar
+ - bn
+ - gu
+ - or
+ - pa
+ - ru
+ - th
  license: gemma
  library_name: transformers
  tags:
@@ -8,9 +29,79 @@ tags:
  - retrieval
  - colbert
  - late-interaction
+ - multimodal
+ - multilingual
+ - document-retrieval
+ - 22-languages
  pipeline_tag: visual-document-retrieval
  base_model:
  - google/gemma-3-4b-it
+
+ datasets:
+ - Cognitive-Lab/nayanair-bench
+ model-index:
+ - name: ColNetraEmbed
+   results:
+   - task:
+       type: image-text-retrieval
+       name: Cross-Lingual Document Retrieval
+     dataset:
+       type: Cognitive-Lab/nayanair-bench
+       name: Nayana-IR Cross-Lingual
+       split: test
+     metrics:
+     - type: ndcg_at_5
+       value: 0.637
+       name: NDCG@5
+     - type: recall_at_10
+       value: 0.700
+       name: Recall@10
+     - type: map_at_10
+       value: 0.610
+       name: MAP@10
+     - type: mrr_at_10
+       value: 0.610
+       name: MRR@10
+   - task:
+       type: image-text-retrieval
+       name: Monolingual Document Retrieval
+     dataset:
+       type: Cognitive-Lab/nayanair-bench
+       name: Nayana-IR Monolingual
+       split: test
+     metrics:
+     - type: ndcg_at_5
+       value: 0.670
+       name: NDCG@5
+     - type: recall_at_10
+       value: 0.764
+       name: Recall@10
+     - type: map_at_10
+       value: 0.645
+       name: MAP@10
+     - type: mrr_at_10
+       value: 0.686
+       name: MRR@10
+   - task:
+       type: image-text-retrieval
+       name: English Document Retrieval
+     dataset:
+       type: vidore/vidore-benchmark
+       name: ViDoRe v2
+       split: test
+     metrics:
+     - type: ndcg_at_5
+       value: 0.551
+       name: NDCG@5
+     - type: recall_at_10
+       value: 0.664
+       name: Recall@10
+     - type: map_at_10
+       value: 0.445
+       name: MAP@10
+     - type: mrr_at_10
+       value: 0.445
+       name: MRR@10
  ---
  # ColNetraEmbed
 
@@ -31,7 +122,7 @@ base_model:
  ColNetraEmbed is a multilingual multimodal embedding model that encodes documents as multi-vector representations using the ColPali architecture. Each image patch is mapped to a contextualized embedding, enabling fine-grained matching between visual content and text queries through late interaction (MaxSim).
 
  - **Model Type:** Multilingual Multimodal Embedding Model with ColPali-style Multi-vector representations
- - **Architecture:** ColPali with Gemma3-2B backbone
+ - **Architecture:** ColPali with Gemma3-4B backbone
  - **Embedding Dimension:** 128 per token
  - **Capabilities:** Multilingual, Multimodal (Vision + Text), Multi-vector late interaction
  - **Use Case:** Visual document retrieval, multilingual document understanding, fine-grained visual search
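
The late-interaction (MaxSim) scoring that the README text describes can be illustrated with a minimal sketch. This is not ColNetraEmbed's own code: the function names (`maxsim_score`, `l2_normalize`, `random_vectors`) are hypothetical, and the random unit vectors only stand in for real 128-dimensional query-token and image-patch embeddings. It assumes the usual ColBERT/ColPali formulation, where each query token is matched to its most similar document vector and the per-token maxima are summed.

```python
import math
import random

DIM = 128  # per-token embedding dimension stated in the README


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best dot product against any document vector."""
    total = 0.0
    for q in query_vecs:
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc_vecs)
    return total


def random_vectors(n):
    """Hypothetical stand-in for model output: n random unit vectors of size DIM."""
    return [l2_normalize([random.gauss(0.0, 1.0) for _ in range(DIM)]) for _ in range(n)]


random.seed(0)
query = random_vectors(6)            # 6 query-token embeddings
page_a = random_vectors(40)          # unrelated page: 40 patch embeddings
page_b = random_vectors(34) + query  # page containing exact matches for every query token

scores = {
    "page_a": maxsim_score(query, page_a),
    "page_b": maxsim_score(query, page_b),
}
best = max(scores, key=scores.get)   # the page ranked first for this query
```

Because `page_b` contains a vector identical to each query token, every per-token maximum is a cosine similarity of 1, so it outranks `page_a`. A real pipeline would obtain the vectors from the model and typically batch this computation as a matrix product.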