---
license: apache-2.0
---
# Deepseek-R1-0528-W4AFP8
## Model Overview
- **Model Architecture:** DeepseekV3ForCausalLM
- **Input:** Text
- **Output:** Text
- **Model Optimizations:**
  - **Dense weight quantization:** FP8
  - **MoE weight quantization:** INT4
  - **Activation quantization:** FP8
- **Release Date:** 25/10/2025
- **Version:** 1.0
Quantized version of [deepseek-ai/Deepseek-R1-0528](https://huggingface.co/deepseek-ai/Deepseek-R1-0528).

| Model | MMLU |
|-------|------|
| novita/Deepseek-R1-0528-W4AFP8 | 0.8728 |
### Model Optimizations
These models were obtained by quantizing the weights and activations of DeepSeek models to mixed-precision data types: INT4 weights with FP8 activations (W4A8) for the MoE layers, and FP8 weights and activations for the dense layers.
This optimization reduces the weights to 4 or 8 bits per parameter, significantly reducing GPU memory requirements.
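
As a rough back-of-the-envelope illustration of the memory savings (a sketch: the 671B total parameter count is from the DeepSeek-R1 model card, while the dense/MoE split used below is an illustrative assumption, not a measured figure):

```python
# Back-of-the-envelope weight-memory estimate for this mixed-precision scheme.
# Assumption: most parameters sit in the MoE expert weights; the exact split
# is illustrative only.
TOTAL_PARAMS = 671e9   # DeepSeek-R1 total parameter count
MOE_FRACTION = 0.95    # assumed fraction of parameters in MoE layers

moe_bytes   = TOTAL_PARAMS * MOE_FRACTION * 0.5         # INT4 -> 0.5 bytes/param
dense_bytes = TOTAL_PARAMS * (1 - MOE_FRACTION) * 1.0   # FP8  -> 1 byte/param

total_gb = (moe_bytes + dense_bytes) / 1e9
print(f"~{total_gb:.0f} GB of weights")  # ~352 GB, vs ~671 GB if all weights were FP8
```

Under these assumptions the weights occupy roughly 350 GB, which is why the model fits on a single node of 4x H200 (about 564 GB of combined HBM) with headroom left for the KV cache.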
## Use with SGLang
This model can be deployed efficiently using the SGLang backend on as few as 4x NVIDIA H200 GPUs, as shown in the example below.
```bash
python -m sglang.launch_server --model novita/Deepseek-R1-0528-W4AFP8 --mem-fraction-static 0.85 --disable-shared-experts-fusion --tp-size 4
```
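
Once the server is up, it exposes an OpenAI-compatible API (on port 30000 by default). A minimal client sketch, assuming the `openai` Python package is installed and the server above is running locally:

```python
# Query the SGLang server through its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="novita/Deepseek-R1-0528-W4AFP8",
    messages=[{"role": "user", "content": "Explain FP8 quantization in one paragraph."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```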