Quantized Qwen3.5 Collection Verified models. Compatible with Transformers v5.3 and vLLM v0.16.1rc1 (nightly). Under evaluation. • 10 items • Updated 3 days ago • 8
Qwen3-MoE Collection Compressed Qwen3 MoE models with a reduced number of experts. See additional models at https://huggingface.co/bknyaz. • 9 items • Updated 23 days ago • 3
Cerebras REAP Collection Sparse MoE models compressed with the REAP (Router-weighted Expert Activation Pruning) method. • 30 items • Updated 9 days ago • 129
gliner2 family Collection GLiNER2 extends the original GLiNER architecture to support multi-task information extraction with a schema-driven interface. This base model provid… • 4 items • Updated 24 days ago • 34
Article: From Golden Gate Bridge to Broken JSON: Why Anthropic's SAE Steering Fails for Structured Output • 27 days ago • 21
Article: Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand • Dec 4, 2025 • 65
The Bestiary Collection Decensored language models created with Heretic (https://github.com/p-e-w/heretic). • 6 items • Updated Nov 16, 2025 • 100
GLiNER-PII Collection PII detection models developed in collaboration with Wordcab. • 5 items • Updated Jan 29 • 22
gpt-oss Collection Open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. • 2 items • Updated Aug 7, 2025 • 417
SauerkrautLM-Multilingual-(Reason)-ColBERT Collection SauerkrautLM ColBERT is a suite of Late-Interaction retrieval models built with PyLate’s ColBERT architecture and tuned for seven European languages. • 7 items • Updated Aug 3, 2025 • 20