Tom Aarsen's picture
Building on HF

Tom Aarsen

tomaarsen

AI & ML interests

NLP: text embeddings, information retrieval, named entity recognition, few-shot text classification

Recent Activity

liked a model about 2 hours ago
jrc2139/granite-embedding-english-r2-ONNX
posted an update about 4 hours ago
🐦‍🔥 I've just published Sentence Transformers v5.2.0! It introduces multi-processing for CrossEncoder (rerankers), multilingual NanoBEIR evaluators, similarity score outputs in mine_hard_negatives, Transformers v5 support and more. Details: - CrossEncoder multi-processing: Similar to SentenceTransformer and SparseEncoder, you can now use multi-processing with CrossEncoder rerankers. Useful for multi-GPU and CPU settings, and simple to configure: just `device=["cuda:0", "cuda:1"]` or `device=["cpu"]*4` on the `model.predict` or `model.rank` calls. - Multilingual NanoBEIR Support: You can now use community translations of the tiny NanoBEIR retrieval benchmark instead of only the English one, by passing `dataset_id`, e.g. `dataset_id="lightonai/NanoBEIR-de"` for the German benchmark. - Similarity scores in Hard Negatives Mining: When mining for hard negatives to create a strong training dataset, you can now pass `output_scores=True` to get similarity scores returned. This can be useful for some distillation losses! - Transformers v5: This release works with both Transformers v4 and the upcoming v5. In the future, Sentence Transformers will only work with Transformers v5, but not yet! - Python 3.9 deprecation: Now that Python 3.9 has lost security support, Sentence Transformers no longer supports it. Check out the full changelog for more details: https://github.com/huggingface/sentence-transformers/releases/tag/v5.2.0 I'm quite excited about what's coming. There's a huge draft PR with a notable refactor in the works that should bring some exciting support. Specifically, better multimodality, rerankers, and perhaps some late interaction in the future!
View all activity

Organizations

Hugging Face's profile picture Sentence Transformers's profile picture Sentence Transformers - Cross-Encoders's profile picture Hugging Face Internal Testing Organization's profile picture SetFit's profile picture Hugging Face Fellows's profile picture Massive Text Embedding Benchmark's profile picture Open-Source AI Meetup's profile picture Nomic AI's profile picture Hugging Face OSS Metrics's profile picture Blog-explorers's profile picture Sentence Transformers Testing's profile picture mLLM multilingual's profile picture Social Post Explorers's profile picture Answer.AI's profile picture gg-tt's profile picture Distillation Hugs's profile picture Hugging Face Discord Community's profile picture Bert ... but new's profile picture EuroBERT's profile picture Sentence Transformers - Cross-Encoders Testing's profile picture hf-inference's profile picture gg-hf-gm's profile picture Hugging Face MCP Course's profile picture Sentence Transformers - Sparse Encoders's profile picture Sentence Transformers - Sparse Encoders Testing's profile picture Lumberjack's profile picture Late Interaction's profile picture