Building on HF
tomaarsen
AI & ML interests
NLP: text embeddings, information retrieval, named entity recognition, few-shot text classification
Recent Activity
posted an update about 4 hours ago
🐦🔥 I've just published Sentence Transformers v5.2.0! It introduces multi-processing for CrossEncoder (rerankers), multilingual NanoBEIR evaluators, similarity score outputs in mine_hard_negatives, Transformers v5 support and more. Details:
- CrossEncoder multi-processing: Similar to SentenceTransformer and SparseEncoder, you can now use multi-processing with CrossEncoder rerankers. It's useful for multi-GPU and CPU settings and simple to configure: just pass `device=["cuda:0", "cuda:1"]` or `device=["cpu"]*4` to the `model.predict` or `model.rank` calls (see the first sketch after this list).
- Multilingual NanoBEIR Support: You can now use community translations of the tiny NanoBEIR retrieval benchmark instead of only the English one by passing `dataset_id`, e.g. `dataset_id="lightonai/NanoBEIR-de"` for the German benchmark (second sketch below).
- Similarity scores in Hard Negatives Mining: When mining hard negatives to create a strong training dataset, you can now pass `output_scores=True` to get the similarity scores returned alongside the mined negatives. This can be useful for some distillation losses (third sketch below)!
- Transformers v5: This release works with both Transformers v4 and the upcoming v5. In the future, Sentence Transformers will only work with Transformers v5, but not yet!
- Python 3.9 deprecation: Now that Python 3.9 has lost security support, Sentence Transformers no longer supports it.
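To illustrate the first item, here is a minimal sketch of multi-device reranking. The model name, query, and documents are placeholders of my choosing; the `device` list on `model.rank` is the new option described above.

```python
# Hedged sketch: CrossEncoder reranking fanned out over two GPUs.
# The model name and the query/documents are illustrative placeholders.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

query = "How do I bake sourdough bread?"
documents = [
    "Sourdough is leavened with naturally occurring lactobacilli and wild yeast.",
    "The Eiffel Tower was completed in 1889.",
    "A long, cold fermentation in the fridge deepens sourdough flavour.",
]

# New in v5.2.0: pass a list of devices to spread the work over multiple
# processes, e.g. two GPUs here, or device=["cpu"] * 4 for four CPU workers.
ranking = model.rank(query, documents, device=["cuda:0", "cuda:1"])
print(ranking)  # list of {"corpus_id": ..., "score": ...} dicts, best first
```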
Check out the full changelog for more details: https://github.com/huggingface/sentence-transformers/releases/tag/v5.2.0
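For the multilingual NanoBEIR item, the evaluator can be pointed at a community translation. A hedged sketch, assuming a multilingual embedding model (the model name below is just an example):

```python
# Hedged sketch: evaluating on the German NanoBEIR community translation.
# The model name is a placeholder; dataset_id is the new v5.2.0 option.
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Point the evaluator at a community translation instead of the English default.
evaluator = NanoBEIREvaluator(dataset_id="lightonai/NanoBEIR-de")
results = evaluator(model)
print(results)  # dict of retrieval metrics (e.g. NDCG@10, MRR@10) per dataset
```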
I'm quite excited about what's coming. There's a huge draft PR with a notable refactor in the works that should unlock some exciting new capabilities: better multimodality, rerankers, and perhaps some late interaction in the future!
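Finally, for the hard negatives item, the extra similarity scores might be requested like this. A hedged sketch with a tiny toy dataset and assumed column names; real mining would use a much larger corpus.

```python
# Hedged sketch: hard negative mining with similarity scores attached.
# The model name and the two-row toy dataset are illustrative only.
from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import mine_hard_negatives

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

dataset = Dataset.from_dict({
    "anchor": ["What is the capital of France?", "Who wrote Hamlet?"],
    "positive": ["Paris is the capital of France.", "Hamlet was written by William Shakespeare."],
})

# New in v5.2.0: output_scores=True returns the model's similarity scores
# alongside the mined negatives, which some distillation losses can consume.
mined = mine_hard_negatives(
    dataset,
    model,
    num_negatives=1,
    output_scores=True,
)
print(mined)     # mined dataset with negative and score columns
print(mined[0])  # first mined example
```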
Articles
- Introducing RTEB: A New Standard for Retrieval Evaluation
- Welcome EmbeddingGemma, Google's new efficient embedding model
- Training and Finetuning Sparse Embedding Models with Sentence Transformers v5
- Training and Finetuning Reranker Models with Sentence Transformers v4
- Train 400x faster Static Embedding Models with Sentence Transformers
- Finally, a Replacement for BERT: Introducing ModernBERT
- Training and Finetuning Embedding Models with Sentence Transformers v3
- Blazing Fast SetFit Inference with 🤗 Optimum Intel on Xeon
- Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval
- SetFitABSA: Few-Shot Aspect Based Sentiment Analysis using SetFit