LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models Paper • 2310.08659 • Published Oct 12, 2023 • 28
Transformers.js V4 demos Collection A collection of demos built with Transformers.js V4 • 17 items • Updated about 17 hours ago • 22
Article A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes Aug 17, 2022 • 128
Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI +4 29 days ago • 488
Article I Let a Lobster Run My Jetson: What OpenClaw Taught Me About the Future of Computing 29 days ago • 15
Article From Golden Gate Bridge to Broken JSON: Why Anthropic's SAE Steering Fails for Structured Output Feb 7 • 22
Beyond Transcription: Mechanistic Interpretability in ASR Paper • 2508.15882 • Published Aug 21, 2025 • 87
Article Making LLMs Smaller Without Breaking Them: A GLU-Aware Pruning Approach Nov 24, 2024 • 20
Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation Paper • 2601.22813 • Published Jan 30 • 60