Cornmonster RandomXiong committed on
Commit bea151f · verified · 1 Parent(s): 810d0e8

Create README.md (#1)

- Create README.md (aeefe970c959954b8c19a7746eaecd209aa093e0)
- Update README.md (d41150b23b345ac5597329f2000e7e30fe2c1670)
- Update README.md (5cceab6699ec08df6404ad6fc16ef7eac37d02c1)

Co-authored-by: XIONG CHAO <RandomXiong@users.noreply.huggingface.co>

Files changed (1): README.md (+36 -0)

README.md (added)
---
license: apache-2.0
---

# Deepseek-V3.1-W4AFP8

## Model Overview
- **Model Architecture:** DeepseekV3ForCausalLM
- **Input:** Text
- **Output:** Text
- **Model Optimizations:**
  - **Dense weight quantization:** FP8
  - **MoE weight quantization:** INT4
  - **Activation quantization:** FP8
- **Release Date:** 25/10/2025
- **Version:** 1.0

Quantized version of [deepseek-ai/DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1).
| Model | MMLU |
|-------|------|
| novita/Deepseek-V3.1-W4AFP8 | 0.8680 |
### Model Optimizations
This model was obtained by quantizing the weights and activations of DeepSeek-V3.1 to mixed-precision data types: W4A8 (INT4 weights with FP8 activations) for the MoE layers, and FP8 for the dense layers.
This optimization reduces the weights to 4 or 8 bits per parameter (down from 16), significantly reducing GPU memory requirements.
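The savings above can be sketched with a back-of-envelope estimate. The parameter counts and the MoE share below are illustrative assumptions (DeepSeek-V3-family models are roughly 671B total parameters, most of them in MoE expert layers), not exact figures for this checkpoint.

```python
# Back-of-envelope weight-memory estimate for mixed-precision quantization.
# The parameter counts and MoE fraction are illustrative assumptions.

def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory in GB for num_params weights stored at bits_per_param."""
    return num_params * bits_per_param / 8 / 1e9

total_params = 671e9           # approximate total parameter count
moe_fraction = 0.97            # assumed share of parameters in MoE expert layers
moe_params = total_params * moe_fraction
dense_params = total_params - moe_params

bf16 = weight_memory_gb(total_params, 16)        # unquantized BF16 baseline
w4afp8 = (weight_memory_gb(moe_params, 4)        # INT4 MoE weights
          + weight_memory_gb(dense_params, 8))   # FP8 dense weights

print(f"BF16 weights:   {bf16:.0f} GB")
print(f"W4AFP8 weights: {w4afp8:.0f} GB")
```

Under these assumptions the quantized weights are roughly a quarter of the BF16 footprint, which is what makes a 4-GPU deployment feasible.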
## Use with SGLang
This model can be deployed efficiently using the SGLang backend on as few as 4x H200 GPUs, as shown in the example below.
```bash
python -m sglang.launch_server --model novita/Deepseek-V3.1-W4AFP8 --mem-fraction-static 0.85 --disable-shared-experts-fusion --tp-size 4
```
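Once the server is up, it exposes an OpenAI-compatible HTTP API (on port 30000 by default). The sketch below assumes the server is running locally on that default port; the host, port, prompt, and `max_tokens` value are placeholders to adjust for your deployment.

```python
# Minimal client sketch for the OpenAI-compatible chat endpoint served by
# sglang.launch_server. Host/port and prompt are placeholder assumptions.
import json
import urllib.request


def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Assemble an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(base_url: str, payload: dict) -> dict:
    """POST the payload to /v1/chat/completions and return the parsed reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_chat_request("novita/Deepseek-V3.1-W4AFP8", "Hello!")
    reply = chat("http://localhost:30000", payload)
    print(reply["choices"][0]["message"]["content"])
```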