Building on HF
tomaarsen
AI & ML interests
NLP: text embeddings, information retrieval, named entity recognition, few-shot text classification
Recent Activity
posted an update about 4 hours ago
🐦🔥 I've just published Sentence Transformers v5.2.0! It introduces multi-processing for CrossEncoder (rerankers), multilingual NanoBEIR evaluators, similarity score outputs in mine_hard_negatives, Transformers v5 support and more. Details:
- CrossEncoder multi-processing: Similar to SentenceTransformer and SparseEncoder, you can now use multi-processing with CrossEncoder rerankers. It's useful for multi-GPU and CPU settings and simple to configure: just pass `device=["cuda:0", "cuda:1"]` or `device=["cpu"]*4` to the `model.predict` or `model.rank` calls (see the first sketch after this list).
- Multilingual NanoBEIR Support: You can now use community translations of the tiny NanoBEIR retrieval benchmark instead of only the English one by passing `dataset_id`, e.g. `dataset_id="lightonai/NanoBEIR-de"` for the German benchmark (second sketch below).
- Similarity scores in Hard Negatives Mining: When mining hard negatives to create a strong training dataset, you can now pass `output_scores=True` to get the similarity scores returned alongside the mined negatives. This can be useful for some distillation losses (third sketch below)!
- Transformers v5: This release works with both Transformers v4 and the upcoming v5. In the future, Sentence Transformers will only work with Transformers v5, but not yet!
- Python 3.9 deprecation: Now that Python 3.9 has lost security support, Sentence Transformers no longer supports it.
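To illustrate the first item, here is a minimal sketch of multi-device reranking. The model name, query, and documents are placeholders of my choosing; the `device` list on `model.rank` is the new option described above.

```python
# Hedged sketch: CrossEncoder reranking fanned out over two GPUs.
# The model name and the query/documents are illustrative placeholders.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

query = "How do I bake sourdough bread?"
documents = [
    "Sourdough is leavened with naturally occurring lactobacilli and wild yeast.",
    "The Eiffel Tower was completed in 1889.",
    "A long, cold fermentation in the fridge deepens sourdough flavour.",
]

# New in v5.2.0: pass a list of devices to spread the work over multiple
# processes, e.g. two GPUs here, or device=["cpu"] * 4 for four CPU workers.
ranking = model.rank(query, documents, device=["cuda:0", "cuda:1"])
print(ranking)  # list of {"corpus_id": ..., "score": ...} dicts, best first
```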
Check out the full changelog for more details: https://github.com/huggingface/sentence-transformers/releases/tag/v5.2.0
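For the multilingual NanoBEIR item, the evaluator can be pointed at a community translation. A hedged sketch, assuming a multilingual embedding model (the model name below is just an example):

```python
# Hedged sketch: evaluating on the German NanoBEIR community translation.
# The model name is a placeholder; dataset_id is the new v5.2.0 option.
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Point the evaluator at a community translation instead of the English default.
evaluator = NanoBEIREvaluator(dataset_id="lightonai/NanoBEIR-de")
results = evaluator(model)
print(results)  # dict of retrieval metrics (e.g. NDCG@10, MRR@10) per dataset
```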
I'm quite excited about what's coming. There's a huge draft PR with a notable refactor in the works that should unlock some exciting new capabilities: better multimodality, rerankers, and perhaps some late interaction in the future!
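Finally, for the hard negatives item, the extra similarity scores might be requested like this. A hedged sketch with a tiny toy dataset and assumed column names; real mining would use a much larger corpus.

```python
# Hedged sketch: hard negative mining with similarity scores attached.
# The model name and the two-row toy dataset are illustrative only.
from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import mine_hard_negatives

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

dataset = Dataset.from_dict({
    "anchor": ["What is the capital of France?", "Who wrote Hamlet?"],
    "positive": ["Paris is the capital of France.", "Hamlet was written by William Shakespeare."],
})

# New in v5.2.0: output_scores=True returns the model's similarity scores
# alongside the mined negatives, which some distillation losses can consume.
mined = mine_hard_negatives(
    dataset,
    model,
    num_negatives=1,
    output_scores=True,
)
print(mined)     # mined dataset with negative and score columns
print(mined[0])  # first mined example
```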
Articles
- Introducing RTEB: A New Standard for Retrieval Evaluation
- Welcome EmbeddingGemma, Google's new efficient embedding model
- Training and Finetuning Sparse Embedding Models with Sentence Transformers v5
- Training and Finetuning Reranker Models with Sentence Transformers v4
- Train 400x faster Static Embedding Models with Sentence Transformers
- Finally, a Replacement for BERT: Introducing ModernBERT
- Training and Finetuning Embedding Models with Sentence Transformers v3
- Blazing Fast SetFit Inference with 🤗 Optimum Intel on Xeon
- Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval
- SetFitABSA: Few-Shot Aspect Based Sentiment Analysis using SetFit