Paper: Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation (arXiv 2004.09813)
This is a sentence-transformers model finetuned from nreimers/TinyBERT_L-4_H-312_v2. It maps sentences and paragraphs to a 312-dimensional dense vector space and can be used for retrieval.
Full model architecture:

```
SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 312, 'pooling_mode': 'mean', 'include_prompt': True})
)
```
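The `Pooling` module above uses `pooling_mode: 'mean'`, which averages the token embeddings produced by the `BertModel` into a single 312-dimensional sentence vector. The NumPy sketch below illustrates mean pooling; it assumes the common convention that padding positions are excluded via the attention mask, and `mean_pool` is a hypothetical helper, not the library's implementation.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padding positions.

    token_embeddings: (batch, seq_len, hidden), hidden = 312 for this model
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                          # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                          # guard against division by zero
    return summed / counts

# Toy input: 2 sentences, 3 token positions, hidden size 312
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 3, 312))
mask = np.array([[1, 1, 1], [1, 1, 0]])  # the second sentence ends with one padding token
pooled = mean_pool(tokens, mask)
print(pooled.shape)  # (2, 312)
```

Only the unmasked positions contribute, so the second sentence's vector is the mean of its first two token embeddings.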
First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/TinyBERT_L-4_H-312_v2-distilled-from-stsb-roberta-base-v2-projection-dim")
# Run inference
sentences = [
    'A black dog is drinking next to a brown and white dog that is looking at an orange ball in the lake, whilst a horse and rider passes behind.',
    'There are two people running around a track in lane three and the one wearing a blue shirt with a green thing over the eyes is just barely ahead of the guy wearing an orange shirt and sunglasses.',
    'the guy is dead',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 312]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.1955, -0.0013],
#         [ 0.1955,  1.0000, -0.0433],
#         [-0.0013, -0.0433,  1.0000]])
```
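`model.similarity` returns a matrix of pairwise scores; the 1.0000 diagonal is consistent with cosine similarity, the Sentence Transformers default. Assuming cosine similarity, the NumPy sketch below reproduces that computation and uses it for simple top-k retrieval, the use case named in the model description. The helpers `cosine_similarity_matrix` and `top_k` are hypothetical, and random 312-dimensional vectors stand in for real `model.encode` output.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

def top_k(query_emb: np.ndarray, corpus_emb: np.ndarray, k: int = 2) -> list[list[int]]:
    """For each query row, return the indices of the k most similar corpus rows."""
    scores = cosine_similarity_matrix(query_emb, corpus_emb)
    return np.argsort(-scores, axis=1)[:, :k].tolist()

# Random vectors standing in for model.encode(...) output
rng = np.random.default_rng(0)
corpus = rng.normal(size=(5, 312))
query = corpus[[3]] + 0.01 * rng.normal(size=(1, 312))  # a slightly perturbed copy of corpus row 3
print(top_k(query, corpus, k=1))  # [[3]]
```

With the real model, `query` and `corpus` would be the encoded query and document sentences; only the ranking step goes beyond what `model.similarity` already returns.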
Semantic similarity, evaluated on the `sts-dev` and `sts-test` datasets with `EmbeddingSimilarityEvaluator`:

| Metric | sts-dev | sts-test |
|---|---|---|
| pearson_cosine | 0.808 | 0.7473 |
| spearman_cosine | 0.8203 | 0.7517 |
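To our understanding of `EmbeddingSimilarityEvaluator`, `spearman_cosine` is the Spearman rank correlation between the cosine similarity of each sentence pair's embeddings and that pair's gold similarity score (`pearson_cosine` uses Pearson correlation instead). The NumPy sketch below shows the computation on hypothetical toy data; it is an illustration, not the evaluator's code.

```python
import numpy as np

def rank_with_ties(x: np.ndarray) -> np.ndarray:
    """Rank values (1 = smallest), giving tied values the average of their ranks."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(rank_with_ties(a), rank_with_ties(b))[0, 1])

def paired_cosine(e1: np.ndarray, e2: np.ndarray) -> np.ndarray:
    """Cosine similarity between row i of e1 and row i of e2."""
    num = (e1 * e2).sum(axis=1)
    return num / (np.linalg.norm(e1, axis=1) * np.linalg.norm(e2, axis=1))

# Toy STS-style data: gold scores on a 0-5 scale and embedding pairs
gold = np.array([4.8, 0.5, 3.0, 2.0])
e1 = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
e2 = np.array([[1.0, 0.1], [-1.0, 1.0], [1.0, 0.8], [0.2, 1.0]])
print(round(spearman(paired_cosine(e1, e2), gold), 4))  # 1.0 (the two rankings agree exactly)
```

Because only ranks matter, a model can score well on `spearman_cosine` even if its raw cosine values are not on the same scale as the gold labels.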
Training dataset, with columns `sentence` and `label`:

| | sentence | label |
|---|---|---|
| type | string | list |
| modality | text | |

Samples:

| sentence | label |
|---|---|
| A person on a horse jumps over a broken down airplane. | [-0.477225124835968, -0.027898235246539116, 0.6169318556785583, -1.6224359273910522, 0.7474681735038757, ...] |
| Children smiling and waving at camera | [-0.1697935163974762, 0.9077808856964111, -0.8368250727653503, -0.47047966718673706, -0.5604732036590576, ...] |
| A boy is jumping on skateboard in the middle of a red bridge. | [0.6267533898353577, 0.011438215151429176, 0.47103747725486755, 0.4887479841709137, -0.3095979690551758, ...] |
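Each `label` is a list of floats, which in this distillation setup is the teacher model's embedding of the `sentence`. The sketch below shows how such rows could be assembled; `build_distillation_rows` and `toy_teacher` are hypothetical. In practice `teacher_encode` would be the `encode` method of a `SentenceTransformer` teacher, presumably the `stsb-roberta-base-v2` model named in this model's ID, returning 768-dimensional vectors given `projection_dim: 768`. Neither detail is stated explicitly in this card.

```python
from typing import Callable, Sequence

def build_distillation_rows(
    sentences: Sequence[str],
    teacher_encode: Callable[[Sequence[str]], Sequence[Sequence[float]]],
) -> list[dict]:
    """Pair each sentence with the teacher's embedding of it (`sentence` / `label` columns)."""
    labels = teacher_encode(sentences)
    return [
        {"sentence": s, "label": [float(v) for v in vec]}
        for s, vec in zip(sentences, labels)
    ]

# Hypothetical stand-in teacher: returns [character count, space count] instead of a real embedding
def toy_teacher(batch: Sequence[str]) -> list[list[float]]:
    return [[float(len(s)), float(s.count(" "))] for s in batch]

rows = build_distillation_rows(["Children smiling and waving at camera"], toy_teacher)
print(rows[0])
```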
Loss: `MSELoss` with these parameters:

```json
{
    "projection_dim": 768
}
```
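This card does not document what `projection_dim` does. One plausible reading, assumed here, is that a learnable linear layer maps the 312-dimensional student embedding to the 768 dimensions of the teacher labels during training only, since the saved architecture ends at 312-dimensional mean pooling. Under that assumption, the loss is an ordinary mean squared error on the projected embeddings, as in this NumPy sketch (`projected_mse` and the weights `W`, `b` are hypothetical).

```python
import numpy as np

def projected_mse(student_emb: np.ndarray, teacher_emb: np.ndarray,
                  proj_weight: np.ndarray, proj_bias: np.ndarray) -> float:
    """MSE between linearly projected student embeddings and teacher embeddings.

    student_emb: (batch, 312), teacher_emb: (batch, 768),
    proj_weight: (312, 768), proj_bias: (768,)
    """
    projected = student_emb @ proj_weight + proj_bias  # (batch, 768)
    return float(np.mean((projected - teacher_emb) ** 2))

rng = np.random.default_rng(0)
student = rng.normal(size=(4, 312))           # student sentence embeddings
teacher = rng.normal(size=(4, 768))           # teacher embeddings, i.e. the `label` column
W = rng.normal(scale=0.01, size=(312, 768))   # hypothetical learnable projection weights
b = np.zeros(768)                             # hypothetical projection bias
loss = projected_mse(student, teacher, W, b)
print(loss)
```

During training, both the student encoder and the projection would be updated to drive this loss down.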
Evaluation dataset, with columns `sentence` and `label`:

| | sentence | label |
|---|---|---|
| type | string | list |
| modality | text | |

Samples:

| sentence | label |
|---|---|
| Two women are embracing while holding to go packages. | [1.3980050086975098, 0.659657895565033, -0.671194851398468, -0.3568831980228424, 0.08937378972768784, ...] |
| Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink. | [0.08953701704740524, -0.16486810147762299, -0.5275247097015381, -0.13387243449687958, 0.3173069953918457, ...] |
| A man selling donuts to a customer during a world exhibition event held in the city of Angeles | [-0.18134362995624542, -0.27244624495506287, 0.6053312420845032, 0.4879472851753235, -0.4728725850582123, ...] |
Loss: `MSELoss` with these parameters:

```json
{
    "projection_dim": 768
}
```
Non-default hyperparameters:

- `per_device_train_batch_size`: 64
- `num_train_epochs`: 1
- `learning_rate`: 0.0001
- `warmup_steps`: 0.1
- `fp16`: True
- `per_device_eval_batch_size`: 64
- `load_best_model_at_end`: True

All hyperparameters:

- `per_device_train_batch_size`: 64
- `num_train_epochs`: 1
- `max_steps`: -1
- `learning_rate`: 0.0001
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: None
- `warmup_steps`: 0.1
- `optim`: adamw_torch_fused
- `optim_args`: None
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `optim_target_modules`: None
- `gradient_accumulation_steps`: 1
- `average_tokens_across_devices`: True
- `max_grad_norm`: 1.0
- `label_smoothing_factor`: 0.0
- `bf16`: False
- `fp16`: True
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `use_cache`: False
- `neftune_noise_alpha`: None
- `torch_empty_cache_steps`: None
- `auto_find_batch_size`: False
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `include_num_input_tokens_seen`: no
- `log_level`: passive
- `log_level_replica`: warning
- `disable_tqdm`: False
- `project`: huggingface
- `trackio_space_id`: None
- `trackio_bucket_id`: None
- `trackio_static_space_id`: None
- `per_device_eval_batch_size`: 64
- `prediction_loss_only`: True
- `eval_on_start`: False
- `eval_do_concat_batches`: True
- `eval_use_gather_object`: False
- `eval_accumulation_steps`: None
- `include_for_metrics`: []
- `batch_eval_metrics`: False
- `save_only_model`: False
- `save_on_each_node`: False
- `enable_jit_checkpoint`: False
- `push_to_hub`: False
- `hub_private_repo`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_always_push`: False
- `hub_revision`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `restore_callback_states_from_checkpoint`: False
- `full_determinism`: False
- `seed`: 42
- `data_seed`: None
- `use_cpu`: False
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `dataloader_prefetch_factor`: None
- `remove_unused_columns`: True
- `label_names`: None
- `train_sampling_strategy`: random
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `ddp_static_graph`: None
- `ddp_backend`: None
- `ddp_timeout`: 1800
- `fsdp`: []
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `deepspeed`: None
- `debug`: []
- `skip_memory_metrics`: True
- `do_predict`: False
- `resume_from_checkpoint`: None
- `warmup_ratio`: None
- `local_rank`: -1
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

Training logs:

| Epoch | Step | Training Loss | Validation Loss | sts-dev_spearman_cosine | sts-test_spearman_cosine |
|---|---|---|---|---|---|
| 0.032 | 100 | 0.3832 | - | - | - |
| 0.064 | 200 | 0.3556 | - | - | - |
| 0.096 | 300 | 0.3163 | - | - | - |
| 0.128 | 400 | 0.2814 | - | - | - |
| 0.16 | 500 | 0.2573 | 0.2855 | 0.7637 | - |
| 0.192 | 600 | 0.2412 | - | - | - |
| 0.224 | 700 | 0.2285 | - | - | - |
| 0.256 | 800 | 0.2205 | - | - | - |
| 0.288 | 900 | 0.2112 | - | - | - |
| 0.32 | 1000 | 0.2038 | 0.2533 | 0.7988 | - |
| 0.352 | 1100 | 0.1980 | - | - | - |
| 0.384 | 1200 | 0.1932 | - | - | - |
| 0.416 | 1300 | 0.1889 | - | - | - |
| 0.448 | 1400 | 0.1853 | - | - | - |
| 0.48 | 1500 | 0.1835 | 0.2375 | 0.8114 | - |
| 0.512 | 1600 | 0.1780 | - | - | - |
| 0.544 | 1700 | 0.1765 | - | - | - |
| 0.576 | 1800 | 0.1741 | - | - | - |
| 0.608 | 1900 | 0.1714 | - | - | - |
| 0.64 | 2000 | 0.1696 | 0.2292 | 0.8153 | - |
| 0.672 | 2100 | 0.1685 | - | - | - |
| 0.704 | 2200 | 0.1677 | - | - | - |
| 0.736 | 2300 | 0.1663 | - | - | - |
| 0.768 | 2400 | 0.1642 | - | - | - |
| 0.8 | 2500 | 0.1629 | 0.2246 | 0.8187 | - |
| 0.832 | 2600 | 0.1615 | - | - | - |
| 0.864 | 2700 | 0.1616 | - | - | - |
| 0.896 | 2800 | 0.1606 | - | - | - |
| 0.928 | 2900 | 0.1603 | - | - | - |
| 0.96 | 3000 | 0.1603 | 0.2217 | 0.8196 | - |
| 0.992 | 3100 | 0.1591 | - | - | - |
| 1.0 | 3125 | - | 0.2215 | 0.8203 | - |
| -1 | -1 | - | - | - | 0.7517 |
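The log reaches step 3,125 at the end of the single epoch, so the `Epoch` column equals `Step / 3125` (for example, 100 / 3125 = 0.032). With `per_device_train_batch_size: 64`, that works out to about 200,000 training samples, assuming a single device. The hyperparameters specify `lr_scheduler_type: linear` with `warmup_steps: 0.1`. The sketch below shows a linear warmup-then-decay schedule of that kind. It assumes the fractional `warmup_steps` is treated as a ratio of total steps and rounded up (313 steps here); that is an assumption about the trainer's behaviour, not something this card states. `linear_warmup_decay` is a hypothetical helper.

```python
import math

def linear_warmup_decay(step: int, base_lr: float, warmup: int, total: int) -> float:
    """Learning rate rising linearly from 0 to base_lr over `warmup` steps,
    then falling linearly to 0 at `total` steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))

total_steps = 3125                           # last step in the training log
warmup_steps = math.ceil(0.1 * total_steps)  # assumption: 0.1 is a ratio, rounded up
for step in (0, 100, warmup_steps, 1600, 3125):
    print(step, linear_warmup_decay(step, 1e-4, warmup_steps, total_steps))
```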
BibTeX:

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}
```
Base model: nreimers/TinyBERT_L-4_H-312_v2