Updates

  • 5/18/2026: I've uploaded new quants that include the MTP tensors (at Q8_0).
  • 3/10/2026: I've uploaded new quants using the new fused Up + Gate conversion. In my testing, this offers up to a 10% boost in prompt processing speed.
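
As a rough illustration of what fusing Up + Gate means, the toy sketch below stacks two projection matrices into one so that a single pass produces both outputs. It is an assumption about the general idea, not the actual conversion code; the names `matvec`, `W_gate`, `W_up`, `W_fused` and the toy sizes are hypothetical.

```python
import random

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

random.seed(0)
d_model, d_ff = 4, 6  # toy sizes, chosen only for illustration
W_gate = [[random.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_ff)]
W_up = [[random.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_ff)]
x = [random.uniform(-1, 1) for _ in range(d_model)]

# Unfused: two separate projections of the same input.
gate, up = matvec(W_gate, x), matvec(W_up, x)

# Fused: stack the rows once (at conversion time), project once, then split.
W_fused = W_gate + W_up
fused = matvec(W_fused, x)
gate_f, up_f = fused[:d_ff], fused[d_ff:]

# Both paths give identical results; only the number of passes differs.
assert gate_f == gate and up_f == up
```

The results are the same either way; any speedup would come from doing one larger projection instead of two smaller ones.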

Description

This repo contains specialized MoE quants for Qwen3.5-122B-A10B. The idea is that, because the FFN tensors are huge compared to the rest of the tensors in the model, it should be possible to achieve better quality at a smaller overall model size than a comparable naive quantization. To that end, the default quantization type is kept at high quality, while the FFN UP + FFN GATE tensors are quantized down along with the FFN DOWN tensors.
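
To see why this trade-off works, here is a minimal sketch that estimates the average bits per weight (BPW) of a mixed quantization as a weighted average. The per-type BPW values are the nominal llama.cpp block sizes. The `ffn_fraction` of 0.93 and the even split across up, gate and down are assumptions made for illustration, not figures from this repo, and `mixture_bpw` is a hypothetical helper.

```python
# Nominal bits per weight for some llama.cpp quant types (block size incl. scales).
BPW = {
    "Q8_0": 8.5, "Q6_K": 6.5625, "Q5_K": 5.5, "Q4_K": 4.5,
    "IQ4_XS": 4.25, "IQ3_S": 3.4375, "IQ2_S": 2.5625, "IQ2_XXS": 2.0625,
}

def mixture_bpw(default, up, gate, down, ffn_fraction=0.93):
    """Estimate the average bits per weight of a mixed quantization.

    ffn_fraction is an assumed share of all weights that live in the FFN
    tensors, split evenly across up, gate and down (hypothetical values).
    """
    ffn_avg = (BPW[up] + BPW[gate] + BPW[down]) / 3
    return (1 - ffn_fraction) * BPW[default] + ffn_fraction * ffn_avg

# High-quality default with smaller FFN types vs. Q8_0 everywhere.
print(round(mixture_bpw("Q8_0", "Q4_K", "Q4_K", "Q5_K"), 2))
print(round(mixture_bpw("Q8_0", "Q8_0", "Q8_0", "Q8_0"), 2))
```

Because the FFN tensors dominate the weight count under this assumption, keeping everything else at Q8_0 adds relatively little to the overall size.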

| Quant | Size | Mixture | PPL | Mean PPL(Q)/PPL(base) − 1 | KLD |
|---|---|---|---|---|---|
| Q8_0 | 123.44 GiB (8.51 BPW) | Q8_0 | 4.817202 ± 0.028383 | +0.0132% | 0.003675 ± 0.000034 |
| Q5_K_M | 87.74 GiB (6.05 BPW) | Q8_0 / Q5_K / Q5_K / Q6_K | 4.823697 ± 0.028442 | +0.1481% | 0.005549 ± 0.000042 |
| Q4_K_M | 73.96 GiB (5.10 BPW) | Q8_0 / Q4_K / Q4_K / Q5_K | 4.829970 ± 0.028461 | +0.2783% | 0.010420 ± 0.000078 |
| IQ4_XS | 58.77 GiB (4.05 BPW) | Q8_0 / IQ3_S / IQ3_S / IQ4_XS | 4.914083 ± 0.028949 | +2.0246% | 0.027803 ± 0.000206 |
| IQ3_S | 45.87 GiB (3.16 BPW) | Q6_K / IQ2_S / IQ2_S / IQ3_S | 5.128301 ± 0.030530 | +6.4722% | 0.074536 ± 0.000521 |
| IQ2_XXS | 34.08 GiB (2.35 BPW) | Q4_K / IQ2_XXS / IQ2_XXS / IQ2_XXS | 5.733499 ± 0.035073 | +19.0371% | 0.185455 ± 0.001112 |
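
Using the sizes and KLD values from the table above, a small sketch like the following could help pick a quant for a given memory budget. The `pick_quant` helper and the `headroom_gib` default are hypothetical; actual memory use also depends on context length, KV cache settings and backend, so treat the result as a starting point only.

```python
# (name, file size in GiB, KLD) copied from the table above.
QUANTS = [
    ("Q8_0", 123.44, 0.003675),
    ("Q5_K_M", 87.74, 0.005549),
    ("Q4_K_M", 73.96, 0.010420),
    ("IQ4_XS", 58.77, 0.027803),
    ("IQ3_S", 45.87, 0.074536),
    ("IQ2_XXS", 34.08, 0.185455),
]

def pick_quant(budget_gib, headroom_gib=8.0):
    """Return the lowest-KLD quant whose file fits in the budget minus headroom.

    headroom_gib is an assumed allowance for KV cache and compute buffers.
    Returns None when nothing fits.
    """
    fitting = [q for q in QUANTS if q[1] + headroom_gib <= budget_gib]
    return min(fitting, key=lambda q: q[2])[0] if fitting else None

for budget in (24, 64, 96, 160):
    print(budget, "GiB ->", pick_quant(budget))
```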

[Figure: KLD graph] [Figure: PPL graph]
