Model Card for Qwen-SEA-LION-v4.5-27B-IT-SpecDecoder
Last update: 2026-05-19
SEA-LION is a collection of Large Language Models (LLMs) which have been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
Qwen-SEA-LION-v4.5-27B-IT-SpecDecoder is a draft model using speculative decoding method to employ a lightweight block diffusion model to draft multiple tokens in parallel trained from Qwen-SEA-LION-v4.5-27B-IT. This is the drafter model, which must be paired with aisingapore/Qwen-SEA-LION-v4.5-27B-IT.
Model Details
Model Description
SEA-LION stands for Southeast Asian Languages In One Network.
We performed training on top of z-lab/Qwen3.6-27B-DFlash using the DFlash diffusion algorithm with forward KL divergence loss using aisingapore/Qwen-SEA-LION-v4.5-27B-IT as the target model.
- Developed by: AI Products Pillar, AI Singapore
- Funded by: Singapore NRF
- Shared by: AI Products Pillar, AI Singapore
- Model type: drafter for speculative decoding
- Training Stage: SFT
- License: MIT
- Target model: aisingapore/Qwen-SEA-LION-v4.5-27B-IT
Model Sources
- Repository: SEA-LION v4.5 - an aisingapore Collection
Uses
Out-of-Scope Use
The model has not been aligned for safety. Developers and users should perform their own safety fine-tuning and related security measures. In no event shall the authors be held liable for any claims, damages, or other liabilities arising from the use of the released weights and codes.
Bias, Risks, and Limitations
The model was not tested for robustness against adversarial prompting. It is important for users to be aware that our model exhibits certain limitations that warrant consideration. Like many LLMs, the model can hallucinate and occasionally generates irrelevant content, introducing fictional elements that are not grounded in the provided context. Users should also exercise caution in interpreting and validating the model's responses due to the potential inconsistencies.
How to Get Started with the Model
Use the code below to get started with the model with vLLM.
CUDA_VISIBLE_DEVICES=0 vllm serve aisingapore/Qwen-SEA-LION-v4.5-27B-IT \
--speculative-config '{"method": "dflash", "model": "aisingapore/Qwen-SEA-LION-v4.5-27B-IT-SpecDecoder", "num_speculative_tokens": 16}' \
--attention-backend flash_attn \
--max-num-batched-tokens 32768 \
--gdn-prefill-backend triton
Training Details
Training Data
The instruction fine-tuning text dataset comprises of a collection of OSS & synthetic data.
Training Procedure
Training Hyperparameters
Training regime: Our workflow consists of instruction fine-tuning and model merging.
Evaluation
Testing Data, Factors & Metrics
We evaluated Qwen-SEA-LION-v4.5-27B-IT-SpecDecoder on various dataset using sampled 128 prompts from https://huggingface.co/datasets/aisingapore/SEA-Instruct-2602 .
| Dataset | Baselineaisingapore/Qwen-SEA-LION-v4.5-27B-IT |
w/ MTP | w/ DFlashz-lab/Qwen3.6-27B-Dflash |
w/ SpecDecoderaisingapore/Qwen-SEA-LION-v4.5-27B-IT-SpecDecoder |
|---|---|---|---|---|
| gsm8k | 64.55 tok/s | 149.19 tok/s 2.31x |
283.67 tok/s 4.39x |
324.32 tok/s 5.02x |
| math500 | 65.91 tok/s | 153.19 tok/s 2.32x |
306.96 tok/s 4.66x |
335.22 tok/s 5.09x |
| humaneval | 66.03 tok/s | 155.52 tok/s 2.36x |
374.13 tok/s 5.67x |
397.11 tok/s 6.01x |
| mbpp | 66.44 tok/s | 148.69 tok/s 2.24x |
235.37 tok/s 3.54x |
260.46 tok/s 3.92x |
| mt-bench | 66.40 tok/s | 136.89 tok/s 2.06x |
153.81 tok/s 2.32x |
163.37 tok/s 2.46x |
| Tagalog | 66.45 tok/s | 117.06 tok/s 1.76x |
79.91 tok/s 1.20x |
164.53 tok/s 2.48x |
| Burmese | 66.46 tok/s | 134.61 tok/s 2.03x |
85.47 tok/s 1.29x |
266.73 tok/s 4.01x |
| Tamil | 66.48 tok/s | 111.99 tok/s 1.68x |
76.47 tok/s 1.15x |
175.82 tok/s 2.64x |
| Indonesian | 66.47 tok/s | 119.68 tok/s 1.80x |
90.30 tok/s 1.36x |
134.24 tok/s 2.02x |
| Vietnamese | 66.48 tok/s | 112.04 tok/s 1.69x |
87.51 tok/s 1.32x |
133.95 tok/s 2.01x |
| Thai | 66.28 tok/s | 104.81 tok/s 1.58x |
75.37 tok/s 1.14x |
118.49 tok/s 1.79x |
| Chinese | 66.46 tok/s | 130.32 tok/s 1.96x |
113.88 tok/s 1.71x |
133.43 tok/s 2.01x |
| Malay | 66.44 tok/s | 127.31 tok/s 1.92x |
106.24 tok/s 1.60x |
169.91 tok/s 2.56x |
Note that: Concurrency could degrade throughput to all models. All the parameters setting follows default. num_speculative_tokens is 16
Concurrency (Scaling Performance)
| Dataset | 1 | 8 | 16 | 32 | 64 |
|---|---|---|---|---|---|
| gsm8k | 5.02x | 3.84x | 2.90x | 2.01x | 1.60x |
| math500 | 5.09x | 3.77x | 2.89x | 1.75x | 1.29x |
| humaneval | 6.01x | 3.78x | 3.52x | 2.32x | 2.17x |
| mbpp | 3.92x | 3.61x | 2.79x | 1.77x | 1.27x |
| mt-bench | 2.46x | 3.06x | 2.51x | 1.81x | 1.59x |
| Tagalog | 2.48x | 2.32x | 1.76x | 1.09x | 0.77x |
| Burmese | 4.01x | 2.50x | 1.78x | 1.12x | 0.79x |
| Tamil | 2.65x | 2.54x | 1.90x | 1.13x | 0.77x |
| Indonesian | 2.02x | 2.47x | 1.92x | 1.21x | 0.86x |
| Vietnamese | 2.02x | 2.49x | 1.89x | 1.12x | 0.89x |
| Thai | 1.79x | 2.36x | 1.99x | 1.32x | 1.05x |
| Chinese | 2.01x | 2.65x | 2.02x | 1.40x | 0.97x |
| Malay | 2.56x | 2.57x | 2.08x | 1.28x | 0.92x |
Column headers are concurrency values. Prompts are sampled from SEA-Instruct. Each language contains 128 prompts.
More Information
This is the repository for the commercial instruction-tuned model. The model has not been aligned for safety. Developers and users should perform their own safety fine-tuning and related security measures. In no event shall the authors be held liable for any claims, damages, or other liabilities arising from the use of the released weights and codes.
For more info, please contact us at sealion@aisingapore.org
Team
Ahmed Dabeer, Ahn Jeongmi, Antonyrex Sajeban, Chan Hok Teng Adwin, Cheng Zi Yi Nicholas, Choa Hsueh Mei Esther, Heng Jonathan, Huang Yuli, Jann Railey Estrada Montalan, Lee Chwan Ren, Leong Wai Yi, Leong Wei Qi, Liew Rachel, Limkonchotiwat Peerat, Muhammad Ridzuan Bin Mokhtar, Nagarajan Karthik, Ng Boon Cheong Raymond, Ngee Chia Tai, Ngui Jian Gang, Nguyen Thanh Ngan, Ong Tat-Wee David, Ong Zhi Hao, Pereira Mark, Poon Joseph, Rengarajan Hamsawardhini, Siow Wei Kang Bryan, Susanto Yosephine, Sutaveephamochanon Anocha, Tan Choon Meng, Tan Chor Phin Evelyn, Tan Siao Wei Jessica, Tan Yixian, Tee Jun Yun, Teng Kok Wai Walter, Teo Eng Sipp Leslie, Tjhi William, Wu Donghang, Yeo Yeow Tong, Yong Xianbin, Zhang Haoyang, Zhang Zhou
Acknowledgement
This project is supported by the National Research Foundation Singapore and Infocomm Media Development Authority (IMDA), Singapore under its National Large Language Model Funding Initiative.
Contact
- Downloads last month
- 231
Model tree for aisingapore/Qwen-SEA-LION-v4.5-27B-IT-SpecDecoder
Base model
z-lab/Qwen3.6-27B-DFlash