Commit c8f232d (verified) · committed by ZYMScott · 1 parent: 15d9d45

Upload tokenizer

README.md ADDED
@@ -0,0 +1,201 @@
+ ---
+ library_name: transformers
+ tags: []
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+
+
added_tokens.json ADDED
@@ -0,0 +1,605 @@
+ {
+ "<0813-124 phase II>": 32190,
+ "<090008>": 32446,
+ "<1.2.20>": 32497,
+ "<15H5D-4a>": 32428,
+ "<1692>": 32200,
+ "<174/2>": 32305,
+ "<17Nkhm-UP2>": 32525,
+ "<18EpOKYJ>": 32468,
+ "<200023>": 32153,
+ "<21A>": 32575,
+ "<24.1>": 32217,
+ "<301>": 32266,
+ "<3347689II>": 32620,
+ "<3937>": 32490,
+ "<477>": 32459,
+ "<49125>": 32498,
+ "<5D>": 32375,
+ "<640>": 32504,
+ "<670-83>": 32312,
+ "<675>": 32419,
+ "<6D370>": 32381,
+ "<757>": 32517,
+ "<78-1320>": 32391,
+ "<7A>": 32199,
+ "<80813>": 32120,
+ "<>": 32350,
+ "<A1122>": 32144,
+ "<A1>": 32401,
+ "<A2-F21>": 32388,
+ "<A212-S19-A16>": 32335,
+ "<A23BA>": 32556,
+ "<A398-S21-F17>": 32286,
+ "<ACYC.E9L>": 32546,
+ "<ANU1>": 32399,
+ "<AR>": 32170,
+ "<ARAD>": 32220,
+ "<AR_0082>": 32101,
+ "<AS9>": 32364,
+ "<ATCC 13028>": 32426,
+ "<ATCC 39140>": 32424,
+ "<ATCC 43969>": 32511,
+ "<ATCC 51329>": 32626,
+ "<ATCC BAA-895>": 32277,
+ "<AVS0177>": 32630,
+ "<Annandia>": 32126,
+ "<ArsBeeUS>": 32544,
+ "<Arsenophonus apicola>": 32500,
+ "<Arsenophonus endosymbiont of Aleurodicus dispersus>": 32475,
+ "<Arsenophonus endosymbiont of Aphis craccivora>": 32386,
+ "<Arsenophonus nasoniae>": 32509,
+ "<Arsenophonus>": 32605,
+ "<Atlantibacter hermannii>": 32242,
+ "<Atlantibacter subterranea>": 32593,
+ "<Atlantibacter>": 32239,
+ "<BDA62-3>": 32159,
+ "<BHKY>": 32321,
+ "<BO-1>": 32215,
+ "<BPEN>": 32403,
+ "<BVAF>": 32308,
+ "<BY21311>": 32290,
+ "<Bacteria>": 32613,
+ "<Blochmannia endosymbiont of Camponotus (Colobopsis) obliquus>": 32535,
+ "<Blochmannia endosymbiont of Camponotus modoc>": 32244,
+ "<Blochmannia endosymbiont of Camponotus nipponensis>": 32135,
+ "<Blochmannia endosymbiont of Colobopsis nipponica>": 32313,
+ "<Blochmannia endosymbiont of Polyrhachis (Hedomyrma) turneri>": 32582,
+ "<Blochmannia>": 32336,
+ "<Brenneria goodwinii>": 32276,
+ "<Brenneria izadpanahii>": 32284,
+ "<Brenneria nigrifluens>": 32531,
+ "<Brenneria rubrifaciens>": 32137,
+ "<Brenneria ulupoensis>": 32385,
+ "<Brenneria>": 32275,
+ "<Bruguierivoracaceae>": 32229,
+ "<Buchnera aphidicola>": 32484,
+ "<Buchnera>": 32528,
+ "<Budviciaceae>": 32304,
+ "<Buttiauxella agrestis>": 32193,
+ "<Buttiauxella ferragutiae>": 32564,
+ "<Buttiauxella>": 32518,
+ "<C-002>": 32506,
+ "<C-005>": 32256,
+ "<C-006>": 32596,
+ "<C-050>": 32634,
+ "<C-7-2>": 32577,
+ "<CAVP490>": 32488,
+ "<CB>": 32631,
+ "<CCA6>": 32410,
+ "<CCUG 66741>": 32378,
+ "<CF-458>": 32293,
+ "<CFBP 3304>": 32332,
+ "<CFCC10813>": 32296,
+ "<CFPB1430>": 32465,
+ "<CFS1934>": 32192,
+ "<CQ10>": 32281,
+ "<CS-931>": 32251,
+ "<Candidatus Arsenophonus lipoptenae>": 32325,
+ "<Candidatus Blochmannia pennsylvanicus>": 32353,
+ "<Candidatus Blochmannia vafer>": 32320,
+ "<Candidatus Doolittlea endobia>": 32303,
+ "<Candidatus Fukatsuia symbiotica>": 32435,
+ "<Candidatus Gullanella endobia>": 32125,
+ "<Candidatus Hoaglandella endobia>": 32594,
+ "<Candidatus Mikella endobia>": 32372,
+ "<Candidatus Purcelliella pentastirinorum>": 32578,
+ "<Candidatus Riesia pediculicola>": 32489,
+ "<Candidatus Tachikawaea gelatinosa>": 32362,
+ "<Candidatus Westeberhardia cardiocondylae>": 32571,
+ "<Candidatus blochmannia chromaiodes>": 32421,
+ "<Candidatus ishikawaella capsulata>": 32141,
+ "<Candidatus moranella endobia>": 32139,
+ "<Candidatus sodalis pierantonius>": 32148,
+ "<Candidatus>": 32536,
+ "<Candidatus_antoea carbekii>": 32282,
+ "<Candidatus_ukatsuia>": 32576,
+ "<Cedecea lapagei>": 32261,
+ "<Cedecea neteri>": 32379,
+ "<Cedecea>": 32208,
+ "<Cf7303>": 32597,
+ "<Chania multitudinisentens>": 32356,
+ "<Chania>": 32219,
+ "<Citrobacter amalonaticus>": 32516,
+ "<Citrobacter arsenatis>": 32456,
+ "<Citrobacter braakii>": 32233,
+ "<Citrobacter freundii>": 32310,
+ "<Citrobacter koseri>": 32591,
+ "<Citrobacter portucalensis>": 32295,
+ "<Citrobacter rodentium>": 32572,
+ "<Citrobacter sedlakii>": 32469,
+ "<Citrobacter tructae>": 32342,
+ "<Citrobacter werkmanii>": 32523,
+ "<Citrobacter>": 32584,
+ "<Cp2>": 32514,
+ "<Cronobacter condimenti>": 32603,
+ "<Cronobacter dublinensis>": 32166,
+ "<Cronobacter malonaticus>": 32463,
+ "<Cronobacter muytjensii>": 32327,
+ "<Cronobacter sakazakii>": 32173,
+ "<Cronobacter universalis>": 32464,
+ "<Cronobacter>": 32568,
+ "<DH-S01>": 32147,
+ "<DSM 101947>": 32156,
+ "<DSM 102253>": 32610,
+ "<DSM 107547>": 32280,
+ "<DSM 15199>": 32246,
+ "<DSM 16636>": 32105,
+ "<DSM 16690>": 32533,
+ "<DSM 22758>": 32413,
+ "<DSM 32899>": 32441,
+ "<DSM 4481>": 32411,
+ "<DSM 4576>": 32343,
+ "<DSM 9389>": 32112,
+ "<Dickeya aquatica>": 32330,
+ "<Dickeya chrysanthemi>": 32513,
+ "<Dickeya dadantii>": 32376,
+ "<Dickeya dianthicola>": 32422,
+ "<Dickeya fangzhongdai>": 32340,
+ "<Dickeya parazeae>": 32574,
+ "<Dickeya poaceiphila>": 32481,
+ "<Dickeya solani>": 32367,
+ "<Dickeya zeae>": 32225,
+ "<Dickeya>": 32152,
+ "<Doolittlea>": 32299,
+ "<Duffyella gerundensis>": 32527,
+ "<Duffyella>": 32237,
+ "<EBP3064>": 32167,
+ "<EN-119>": 32627,
+ "<ERMR1:05>": 32323,
+ "<Eb661>": 32473,
+ "<Ech1591>": 32117,
+ "<Ech586>": 32346,
+ "<Ech703>": 32131,
+ "<Edwardsiella anguillarum>": 32203,
+ "<Edwardsiella hoshinae>": 32108,
+ "<Edwardsiella ictaluri>": 32522,
+ "<Edwardsiella piscicida>": 32451,
+ "<Edwardsiella tarda>": 32587,
+ "<Edwardsiella>": 32185,
+ "<Enterobacter asburiae>": 32442,
+ "<Enterobacter bugandensis>": 32231,
+ "<Enterobacter chengduensis>": 32333,
+ "<Enterobacter cloacae>": 32589,
+ "<Enterobacter hormaechei>": 32339,
+ "<Enterobacter huaxiensis>": 32145,
+ "<Enterobacter ludwigii>": 32329,
+ "<Enterobacter mori>": 32492,
+ "<Enterobacter oligotrophicus>": 32583,
+ "<Enterobacter pseudoroggenkampii>": 32224,
+ "<Enterobacter roggenkampii>": 32287,
+ "<Enterobacter sichuanensis>": 32161,
+ "<Enterobacter soli>": 32440,
+ "<Enterobacter>": 32132,
+ "<Enterobacterales>": 32174,
+ "<Enterobacteriaceae endosymbiont of Macroplea mutica>": 32592,
+ "<Enterobacteriaceae endosymbiont of Plateumaris pusilla>": 32366,
+ "<Enterobacteriaceae endosymbiont of_acroplea mutica>": 32377,
+ "<Enterobacteriaceae>": 32396,
+ "<EpK1/15>": 32478,
+ "<ErCicurvipes>": 32466,
+ "<Erwinia amylovora>": 32113,
+ "<Erwinia billingiae>": 32618,
+ "<Erwinia persicina>": 32195,
+ "<Erwinia pyrifoliae>": 32434,
+ "<Erwinia rhapontici>": 32300,
+ "<Erwinia sorbitola>": 32294,
+ "<Erwinia tasmaniensis>": 32588,
+ "<Erwinia tracheiphila>": 32357,
+ "<Erwinia>": 32397,
+ "<Erwiniaceae>": 32252,
+ "<Escherichia albertii>": 32554,
+ "<Escherichia coli >": 32566,
+ "<Escherichia fergusonii>": 32600,
+ "<Escherichia marmotae>": 32461,
+ "<Escherichia>": 32415,
+ "<Et1/99>": 32326,
+ "<FDAARGOS 1447>": 32158,
+ "<FDAARGOS_1499>": 32205,
+ "<FDAARGOS_165>": 32448,
+ "<FDAARGOS_186>": 32479,
+ "<FDAARGOS_392>": 32292,
+ "<FDAARGOS_408>": 32175,
+ "<FDAARGOS_500>": 32437,
+ "<FDAARGOS_616>": 32213,
+ "<FDAARGOS_730>": 32177,
+ "<FDAARGOS_926>": 32418,
+ "<FDAARGOS_940>": 32637,
+ "<FIN>": 32314,
+ "<FN20211>": 32429,
+ "<FRB141>": 32427,
+ "<FRB97>": 32169,
+ "<FRM16>": 32408,
+ "<FY-07>": 32123,
+ "<FY158>": 32485,
+ "<G5>": 32241,
+ "<G6>": 32547,
+ "<Gammaproteobacteria>": 32311,
+ "<Gibbsiella quercinecans>": 32212,
+ "<Gibbsiella>": 32590,
+ "<Gullanella>": 32248,
+ "<H4-C11>": 32149,
+ "<HI4320>": 32143,
+ "<HS11286>": 32384,
+ "<HS1>": 32168,
+ "<HYN0051>": 32121,
+ "<Hafnia alvei>": 32482,
+ "<Hafnia paralvei>": 32298,
+ "<Hafnia>": 32234,
+ "<Hafniaceae>": 32262,
+ "<Hoaglandella>": 32405,
+ "<IFB5427>": 32230,
+ "<IP32953>": 32358,
+ "<Iran 50>": 32455,
+ "<Ishikawaella>": 32567,
+ "<J780>": 32453,
+ "<JH01>": 32259,
+ "<JK2.1>": 32390,
+ "<JZ-GX1>": 32128,
+ "<JZB2120001>": 32106,
+ "<Jejubacter calystegiae>": 32104,
+ "<Jejubacter>": 32636,
+ "<K-12 substr. MG1655>": 32501,
+ "<K61>": 32400,
+ "<KACC 18508>": 32404,
+ "<KC-Pc-HB1>": 32452,
+ "<KMM821>": 32338,
+ "<KSNA2>": 32240,
+ "<KUDC3025>": 32209,
+ "<Ka37751>": 32334,
+ "<Kalro>": 32409,
+ "<Klebsiella aerogenes>": 32425,
+ "<Klebsiella africana>": 32491,
+ "<Klebsiella electrica>": 32201,
+ "<Klebsiella huaxiensis>": 32539,
+ "<Klebsiella michiganensis>": 32608,
+ "<Klebsiella oxytoca>": 32291,
+ "<Klebsiella pasteurii>": 32433,
+ "<Klebsiella pneumoniae>": 32317,
+ "<Klebsiella quasipneumoniae>": 32499,
+ "<Klebsiella variicola>": 32188,
+ "<Klebsiella>": 32520,
+ "<Kluyvera ascorbata>": 32214,
+ "<Kluyvera intermedia>": 32503,
+ "<Kluyvera>": 32297,
+ "<Kosakonia arachidis>": 32182,
+ "<Kosakonia cowanii>": 32487,
+ "<Kosakonia oryzae>": 32467,
+ "<Kosakonia oryzendophytica>": 32543,
+ "<Kosakonia pseudosacchari>": 32545,
+ "<Kosakonia radicincitans>": 32382,
+ "<Kosakonia sacchari>": 32124,
+ "<Kosakonia>": 32138,
+ "<KqPF26>": 32196,
+ "<L6>": 32559,
+ "<LEMB11>": 32551,
+ "<LF7a>": 32265,
+ "<LH84-a>": 32632,
+ "<LJ1>": 32483,
+ "<LMG 23823>": 32345,
+ "<LMG 23826>": 32560,
+ "<LMG 24197>": 32194,
+ "<LMG 24199>": 32114,
+ "<LMG 26250>": 32436,
+ "<LMG24200>": 32604,
+ "<LST-1>": 32476,
+ "<LT-1>": 32103,
+ "<LT2>": 32111,
+ "<LTYR-11Z>": 32406,
+ "<LY-1>": 32581,
+ "<Leclercia adecarboxylata>": 32157,
+ "<Leclercia pneumoniae>": 32228,
+ "<Leclercia>": 32598,
+ "<Lelliottia steviae>": 32563,
+ "<Lelliottia>": 32172,
+ "<Leminorella richardii>": 32524,
+ "<Leminorella>": 32502,
+ "<Limnobaculum parvum>": 32540,
+ "<Limnobaculum zhutongyuii>": 32477,
+ "<Limnobaculum>": 32183,
+ "<Lonsdalea britannica>": 32617,
+ "<Lonsdalea populi>": 32537,
+ "<Lonsdalea>": 32457,
+ "<Lsch>": 32417,
+ "<ME23>": 32530,
+ "<MS2>": 32316,
+ "<MiY-A>": 32557,
+ "<Mikella>": 32373,
+ "<Mixta gaviniae>": 32184,
+ "<Mixta hanseatica>": 32460,
+ "<Mixta intestinalis>": 32180,
+ "<Mixta>": 32628,
+ "<Moellerella wisconsensis>": 32267,
+ "<Moellerella>": 32619,
+ "<Moranella>": 32486,
+ "<Morganella morganii>": 32398,
+ "<Morganella>": 32322,
+ "<Morganellaceae>": 32165,
+ "<Mpkobe>": 32580,
+ "<Musicola paradisiaca>": 32430,
+ "<Musicola>": 32176,
+ "<N-5-1>": 32245,
+ "<N2-1>": 32211,
+ "<N268-08>": 32370,
+ "<NA>": 32622,
+ "<NCPPB 569>": 32109,
+ "<NCTC 14382>": 32392,
+ "<NCTC 9529>": 32337,
+ "<NCTC11466>": 32359,
+ "<NCTC12003>": 32407,
+ "<NCTC12148>": 32179,
+ "<NCTC12151>": 32602,
+ "<NCTC12284>": 32278,
+ "<NCTC13188>": 32361,
+ "<NIBIO1392>": 32206,
+ "<OLIH>": 32223,
+ "<Ola 51>": 32130,
+ "<PA13>": 32470,
+ "<PCVAL>": 32599,
+ "<PPO 9019>": 32250,
+ "<PR-310>": 32606,
+ "<PRI-2C>": 32178,
+ "<Pantoea agglomerans>": 32462,
+ "<Pantoea alfalfae>": 32526,
+ "<Pantoea alhagi>": 32247,
+ "<Pantoea ananatis>": 32163,
+ "<Pantoea deleyi>": 32198,
+ "<Pantoea dispersa>": 32493,
+ "<Pantoea eucalypti>": 32255,
+ "<Pantoea eucrina>": 32579,
+ "<Pantoea soli>": 32363,
+ "<Pantoea stewartii>": 32227,
+ "<Pantoea vagans>": 32341,
+ "<Pantoea>": 32122,
+ "<Pectobacteriaceae>": 32629,
+ "<Pectobacterium aquaticum>": 32555,
+ "<Pectobacterium aroidearum>": 32348,
+ "<Pectobacterium atrosepticum>": 32154,
+ "<Pectobacterium brasiliense>": 32127,
+ "<Pectobacterium cacticida>": 32257,
+ "<Pectobacterium carotovorum>": 32640,
+ "<Pectobacterium colocasium>": 32302,
+ "<Pectobacterium odoriferum>": 32496,
+ "<Pectobacterium parmentieri>": 32197,
+ "<Pectobacterium parvum>": 32355,
+ "<Pectobacterium polaris>": 32100,
+ "<Pectobacterium punjabense>": 32115,
+ "<Pectobacterium quasiaquaticum>": 32447,
+ "<Pectobacterium wasabiae>": 32186,
+ "<Pectobacterium>": 32306,
+ "<Photorhabdus akhurstii>": 32134,
+ "<Photorhabdus asymbiotica>": 32315,
+ "<Photorhabdus laumondii>": 32243,
+ "<Photorhabdus thracensis>": 32191,
+ "<Photorhabdus>": 32565,
+ "<Phytobacter diazotrophicus>": 32445,
+ "<Phytobacter>": 32393,
+ "<Plesiomonas shigelloides>": 32458,
+ "<Plesiomonas>": 32601,
+ "<Pluralibacter gergoviae>": 32309,
+ "<Pluralibacter>": 32611,
+ "<Pragia fontium>": 32623,
+ "<Pragia>": 32402,
+ "<Profftia>": 32273,
+ "<Proteus hauseri>": 32609,
+ "<Proteus mirabilis>": 32274,
+ "<Proteus penneri>": 32439,
+ "<Proteus terrae>": 32351,
+ "<Proteus>": 32561,
+ "<Providencia alcalifaciens>": 32238,
+ "<Providencia hangzhouensis>": 32431,
+ "<Providencia heimbachae>": 32585,
+ "<Providencia huaxiensis>": 32171,
+ "<Providencia rettgeri>": 32616,
+ "<Providencia stuartii>": 32331,
+ "<Providencia>": 32324,
+ "<Pseudocitrobacter corydidari>": 32368,
+ "<Pseudocitrobacter>": 32129,
+ "<Pseudomonadota>": 32218,
+ "<Purcelliella>": 32552,
+ "<RB-25>": 32318,
+ "<Rahnella aceris>": 32307,
+ "<Rahnella sikkimica>": 32289,
+ "<Rahnella victoriana>": 32118,
+ "<Rahnella>": 32416,
+ "<Raoultella planticola>": 32438,
+ "<Raoultella terrigena>": 32383,
+ "<Raoultella>": 32534,
+ "<Riesia>": 32633,
+ "<S07-698>": 32260,
+ "<S178-2>": 32202,
+ "<S1>": 32515,
+ "<S2-A69>": 32449,
+ "<SCPM-O-B-7604>": 32380,
+ "<SE6-1>": 32352,
+ "<SGAir0282>": 32136,
+ "<SII>": 32235,
+ "<SK>": 32494,
+ "<SNU WT2>": 32288,
+ "<SOPE>": 32558,
+ "<SRCM103226>": 32181,
+ "<SS95>": 32638,
+ "<SWHEFF_49>": 32512,
+ "<Sakai substr. RIMD 0509952>": 32387,
+ "<Salmonella bongori>": 32395,
+ "<Salmonella enterica>": 32541,
+ "<Salmonella>": 32519,
+ "<Sample 167>": 32454,
+ "<Sb-24>": 32160,
+ "<Scandinavium goeteborgense>": 32369,
+ "<Scandinavium>": 32347,
+ "<Schneideria>": 32607,
+ "<Serratia entomophila>": 32150,
+ "<Serratia ficaria>": 32480,
+ "<Serratia fonticola>": 32625,
+ "<Serratia inhibens>": 32349,
+ "<Serratia liquefaciens>": 32521,
+ "<Serratia nematodiphila>": 32360,
+ "<Serratia plymuthica>": 32116,
+ "<Serratia proteamaculans>": 32508,
+ "<Serratia quinivorans>": 32271,
+ "<Serratia rhizosphaerae>": 32444,
+ "<Serratia rubidaea>": 32570,
+ "<Serratia surfactantfaciens>": 32187,
+ "<Serratia symbiotica>": 32573,
+ "<Serratia ureilytica>": 32507,
+ "<Serratia>": 32389,
+ "<Shigella dysenteriae>": 32344,
+ "<Shigella flexneri>": 32614,
+ "<Shigella sonnei>": 32110,
+ "<Shigella>": 32548,
+ "<Shimwellia blattae>": 32553,
+ "<Shimwellia>": 32569,
+ "<Siccibacter colletis>": 32102,
+ "<Siccibacter>": 32162,
+ "<Sodalis endosymbiont of Henestaris halophilus>": 32270,
+ "<Sodalis glossinidius>": 32538,
+ "<Sodalis praecaptivus>": 32279,
+ "<Sodalis>": 32319,
+ "<SyEd1>": 32495,
+ "<Symbiopectobacterium purcellii>": 32226,
+ "<Symbiopectobacterium>": 32365,
+ "<T6>": 32472,
+ "<TA9759>": 32423,
+ "<TBY01>": 32505,
+ "<THO-011>": 32639,
+ "<TTO1>": 32549,
+ "<Tachikawaea>": 32253,
+ "<Tatumella citrea>": 32510,
+ "<Tatumella>": 32264,
+ "<Trabulsiella odontotermitis>": 32285,
+ "<Trabulsiella>": 32222,
+ "<US>": 32164,
+ "<USDA-ARS-USMARC-60222>": 32254,
+ "<USDA>": 32189,
+ "<UwTKB>": 32207,
+ "<VKH10>": 32615,
+ "<W65>": 32471,
+ "<WCHECl-C4 = WCHECh050004>": 32586,
+ "<WCHKl090001>": 32107,
+ "<WCHPr000369>": 32119,
+ "<WPP14>": 32354,
+ "<Westeberhardia>": 32432,
+ "<Wigglesworthia glossinidia>": 32155,
+ "<Wigglesworthia>": 32621,
+ "<Winslowiella toletana>": 32394,
+ "<Winslowiella>": 32236,
+ "<XL123>": 32133,
+ "<XL95>": 32412,
+ "<Xenorhabdus budapestensis>": 32151,
+ "<Xenorhabdus doucetiae>": 32263,
+ "<Xenorhabdus griffiniae>": 32216,
+ "<Xenorhabdus hominickii>": 32142,
+ "<Xenorhabdus nematophila>": 32269,
+ "<Xenorhabdus poinarii>": 32204,
+ "<Xenorhabdus>": 32232,
+ "<YD25>": 32612,
+ "<YF8>": 32542,
+ "<YRA>": 32550,
+ "<YSD YN2>": 32624,
+ "<Y_sim_228>": 32374,
+ "<Yersinia aldovae>": 32272,
+ "<Yersinia alsatica>": 32328,
+ "<Yersinia canariae>": 32450,
+ "<Yersinia hibernica>": 32301,
+ "<Yersinia intermedia>": 32258,
+ "<Yersinia mollaretii>": 32283,
+ "<Yersinia pestis>": 32532,
+ "<Yersinia pseudotuberculosis>": 32210,
+ "<Yersinia rohdei>": 32268,
+ "<Yersinia ruckeri>": 32420,
+ "<Yersinia similis>": 32635,
+ "<Yersinia>": 32146,
+ "<Yersiniaceae>": 32443,
+ "<ZJ-FGZX1>": 32595,
+ "<ZN2>": 32474,
+ "<[Enterobacter] lignolyticus>": 32562,
+ "<[Pantoea] beijingensis>": 32249,
+ "<morsitans>": 32221,
+ "<obscurior>": 32371,
+ "<secondary endosymbiont of Ctenarytaina eucalypti>": 32414,
+ "<secondary endosymbiont of Heteropsylla cubana>": 32529,
+ "<secondary endosymbiont of Trabutina mannipara>": 32140,
+ "AAA": 32681,
+ "AAC": 32680,
+ "AAG": 32682,
+ "AAU": 32679,
+ "ACG": 32666,
+ "ACU": 32665,
+ "AGA": 32697,
+ "AGC": 32696,
+ "AGG": 32698,
+ "AGU": 32695,
+ "AUA": 32651,
+ "AUC": 32650,
+ "AUG": 32652,
+ "AUU": 32649,
+ "CAA": 32677,
+ "CAC": 32676,
+ "CAG": 32678,
+ "CAU": 32675,
+ "CCA": 32663,
+ "CCC": 32662,
+ "CCG": 32664,
+ "CCU": 32661,
+ "CGA": 32693,
+ "CGC": 32692,
+ "CGG": 32694,
+ "CGU": 32691,
+ "CUA": 32647,
+ "CUC": 32646,
+ "CUG": 32648,
+ "CUU": 32645,
+ "GAA": 32685,
+ "GAC": 32684,
+ "GAG": 32686,
+ "GAU": 32683,
+ "GCA": 32669,
+ "GCC": 32668,
+ "GCG": 32670,
+ "GCU": 32667,
+ "GGA": 32701,
+ "GGC": 32700,
+ "GGG": 32702,
+ "GGU": 32699,
+ "GUA": 32655,
+ "GUC": 32654,
+ "GUG": 32656,
+ "GUU": 32653,
+ "UAA": 32673,
+ "UAC": 32672,
+ "UAG": 32674,
+ "UAU": 32671,
+ "UCA": 32659,
+ "UCC": 32658,
+ "UCG": 32660,
+ "UCU": 32657,
+ "UGA": 32689,
+ "UGC": 32688,
+ "UGG": 32690,
+ "UGU": 32687,
+ "UUA": 32643,
+ "UUC": 32642,
+ "UUG": 32644,
+ "UUU": 32641
+ }
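The mapping in `added_tokens.json` holds two kinds of entries: angle-bracketed labels (taxon names and strain identifiers, with IDs from 32100 to 32640 in this diff) and three-letter RNA codon strings (62 of them, with IDs from 32641 to 32702). The following is a minimal sketch of how such a mapping could be split into those two groups. The `partition` helper and the `sample` dictionary are illustrative assumptions, not part of the commit; `sample` copies only a few entries from the diff.

```python
import re

# A few entries copied from the added_tokens.json diff (not the full file).
sample = {
    "<Pectobacterium polaris>": 32100,
    "<Escherichia>": 32415,
    "<K-12 substr. MG1655>": 32501,
    "<Pectobacterium carotovorum>": 32640,
    "UUU": 32641,
    "AUG": 32652,
    "GGG": 32702,
}

# A codon token is exactly three RNA bases.
CODON = re.compile(r"[ACGU]{3}")


def partition(added_tokens):
    """Split a token-to-ID mapping into bracketed labels and codon tokens."""
    labels, codons = {}, {}
    for token, token_id in added_tokens.items():
        if CODON.fullmatch(token):
            codons[token] = token_id
        elif token.startswith("<") and token.endswith(">"):
            labels[token] = token_id
        else:
            # Hypothetical guard: this diff contains no other token shape.
            raise ValueError(f"unexpected token: {token!r}")
    return labels, codons


labels, codons = partition(sample)
print(len(labels), "label tokens;", len(codons), "codon tokens")
```

To run the same check on the real file, `sample` could be replaced with `json.load(open("added_tokens.json"))`, assuming the file has been downloaded locally.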
special_tokens_map.json ADDED
@@ -0,0 +1,107 @@
+ {
+ "additional_special_tokens": [
+ "<extra_id_0>",
+ "<extra_id_1>",
+ "<extra_id_2>",
+ "<extra_id_3>",
+ "<extra_id_4>",
+ "<extra_id_5>",
+ "<extra_id_6>",
+ "<extra_id_7>",
+ "<extra_id_8>",
+ "<extra_id_9>",
+ "<extra_id_10>",
+ "<extra_id_11>",
+ "<extra_id_12>",
+ "<extra_id_13>",
+ "<extra_id_14>",
+ "<extra_id_15>",
+ "<extra_id_16>",
+ "<extra_id_17>",
+ "<extra_id_18>",
+ "<extra_id_19>",
+ "<extra_id_20>",
+ "<extra_id_21>",
+ "<extra_id_22>",
+ "<extra_id_23>",
+ "<extra_id_24>",
+ "<extra_id_25>",
+ "<extra_id_26>",
+ "<extra_id_27>",
+ "<extra_id_28>",
+ "<extra_id_29>",
+ "<extra_id_30>",
+ "<extra_id_31>",
+ "<extra_id_32>",
+ "<extra_id_33>",
+ "<extra_id_34>",
+ "<extra_id_35>",
+ "<extra_id_36>",
+ "<extra_id_37>",
+ "<extra_id_38>",
+ "<extra_id_39>",
+ "<extra_id_40>",
+ "<extra_id_41>",
+ "<extra_id_42>",
+ "<extra_id_43>",
+ "<extra_id_44>",
+ "<extra_id_45>",
+ "<extra_id_46>",
+ "<extra_id_47>",
+ "<extra_id_48>",
+ "<extra_id_49>",
+ "<extra_id_50>",
+ "<extra_id_51>",
+ "<extra_id_52>",
+ "<extra_id_53>",
+ "<extra_id_54>",
+ "<extra_id_55>",
+ "<extra_id_56>",
+ "<extra_id_57>",
+ "<extra_id_58>",
+ "<extra_id_59>",
+ "<extra_id_60>",
+ "<extra_id_61>",
+ "<extra_id_62>",
+ "<extra_id_63>",
+ "<extra_id_64>",
+ "<extra_id_65>",
+ "<extra_id_66>",
+ "<extra_id_67>",
+ "<extra_id_68>",
+ "<extra_id_69>",
+ "<extra_id_70>",
+ "<extra_id_71>",
+ "<extra_id_72>",
+ "<extra_id_73>",
+ "<extra_id_74>",
+ "<extra_id_75>",
+ "<extra_id_76>",
+ "<extra_id_77>",
+ "<extra_id_78>",
+ "<extra_id_79>",
+ "<extra_id_80>",
+ "<extra_id_81>",
+ "<extra_id_82>",
+ "<extra_id_83>",
+ "<extra_id_84>",
+ "<extra_id_85>",
+ "<extra_id_86>",
+ "<extra_id_87>",
+ "<extra_id_88>",
+ "<extra_id_89>",
+ "<extra_id_90>",
+ "<extra_id_91>",
+ "<extra_id_92>",
+ "<extra_id_93>",
+ "<extra_id_94>",
+ "<extra_id_95>",
+ "<extra_id_96>",
+ "<extra_id_97>",
+ "<extra_id_98>",
+ "<extra_id_99>"
+ ],
+ "eos_token": "</s>",
+ "pad_token": "<pad>",
+ "unk_token": "<unk>"
+ }
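The `additional_special_tokens` list follows a regular pattern, `<extra_id_0>` through `<extra_id_99>`, which matches the sentinel-token naming used by T5-style tokenizers. As a rough sketch, the same structure can be rebuilt programmatically; the variable name `special_tokens_map` is chosen here only for illustration.

```python
import json

# Rebuild the structure of special_tokens_map.json from its regular pattern.
special_tokens_map = {
    "additional_special_tokens": [f"<extra_id_{i}>" for i in range(100)],
    "eos_token": "</s>",
    "pad_token": "<pad>",
    "unk_token": "<unk>",
}

# Serialize in the same key order as the file in the diff.
serialized = json.dumps(special_tokens_map, indent=2)
print(serialized.splitlines()[2])
```

Generating the list this way is handy for checking that none of the 100 sentinel entries is missing or duplicated in a hand-edited copy of the file.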
spiece.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
+ size 791656
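The three lines added for `spiece.model` are a Git LFS pointer, not the SentencePiece model itself: they record the spec version, the SHA-256 digest of the real file, and its size in bytes. Below is a minimal sketch of parsing such a pointer and checking downloaded bytes against it. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any Git LFS tooling.

```python
import hashlib

# Pointer text copied from the spiece.model diff.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
size 791656
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and digest the pointer records."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

With the real file downloaded, `matches_pointer(open("spiece.model", "rb").read(), pointer)` would confirm that the download is intact, assuming the local path is `spiece.model`.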
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff