---
license: apache-2.0
base_model:
- janhq/Jan-v1-2509
- TeichAI/Qwen3-4B-Thinking-2507-GPT-5.1-High-Reasoning-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Gemini-3-Pro-Preview-High-Reasoning-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Claude-4.5-Opus-High-Reasoning-Distill
- Liontix/Qwen3-4B-Claude-Sonnet-4-Reasoning-Distill-Safetensor
- TeichAI/Qwen3-4B-Thinking-2507-Kimi-K2-Thinking-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Lite-Preview-Distill
- Jackrong/gpt-oss-120b-Distill-Qwen3-4B-Thinking
- TeichAI/Qwen3-4B-Thinking-2507-GLM-4.6-Distill
- angelchen/Qwen3-4B-Open-R1-Distill_1
- TeichAI/Qwen3-4B-Thinking-2507-Command-A-Reasoning-Distill
- janhq/Jan-v1-4B
tags:
- 256k context
- Qwen3
- Mixture of Experts
- MOE
- MOE Dense
- 2 experts
- 4Bx12
- All use cases
- bfloat16
- merge
pipeline_tag: text-generation
language:
- en
library_name: transformers
---
Qwen3-48B-A4B-Savant-Commander-GATED-12x-Closed-Open-Source-Distill
Savant Commander is a specialized MOE model that allows you to control which expert(s) are assigned to your use case(s) / prompt(s).
The model is composed of 12 DISTILLS (a compressed 12x4B MOE) of top closed-source models (GPT-5.1, GPT-OSS-120B, Gemini, Claude) and open-source models (Kimi, GLM, Open-R1 / DeepSeek-R1 style, Command-A, Jan), all in one.
256k Context, 2 experts activated.
Ask it about orbital mechanics and prepare to be "schooled".
One test quant (Q4_K_S) is uploaded for the time being; more to follow.
HOW TO ACCESS the EXPERTS:
In your prompts simply add the name(s) of the model(s)/expert(s) you want assigned.
Here is the list [no quotes]:
- "Gemini" [activates all 3 Gemini distills]
- "Claude" [activates both Claude distills]
- "JanV1"
- "CommandA"
- "OPENR1"
- "GLM"
- "Kimi"
- "GPTOSS"
- "GPT51"
To access groups use:
- "AllAI" [all ais]
- "Closed-AI" [only closed source]
- "Open-AI" [only open source]
Access them like this:
Gemini, tell me a horror story.
GLM and JanV1, write me a horror story.
Note: the expert name(s) must appear in the prompt and/or the system role, and can be located anywhere in either.
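If you drive the model programmatically, the keyword works the same way: just put it in the message text. Here is a minimal sketch against an OpenAI-compatible local endpoint (the URL/port and model name are assumptions; adjust them for your setup):

```python
import requests

# Any OpenAI-compatible local server works (LM Studio, llama-server, etc.).
# The URL, port and model id below are assumptions - change them to match your setup.
payload = {
    "model": "qwen3-48b-a4b-savant-commander",  # hypothetical local model id
    "messages": [
        # The expert keyword ("Gemini" here) can go in the system or user message.
        {"role": "user", "content": "Gemini, tell me a horror story."}
    ],
    "temperature": 0.7,
}
resp = requests.post("http://localhost:1234/v1/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"]["content"])
```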
You MAY want to increase the number of active experts in some cases.
IMPORTANT:
- Minimum quant of Q4_K_S (non-imatrix) or IQ3_M (imatrix); below that the model will "snap".
- Higher quants will result in much stronger performance.
- Context window of 4k-8k minimum; temp .7 [higher/lower is okay].
- Try 2-3 regens, as each will be VERY DIFFERENT due to the model design.
- You can use 1 expert or up to 12... tokens/second will drop the more you activate.
ENJOY.
DETAILS:
This is a DENSE MOE (12 x 4B) Mixture of Experts model, built from the strongest Qwen3 4B DISTILL models available, with 2 experts activated by default; you can activate up to all 12 experts if you need the extra "brainpower".
This allows you to run the model at 4, 8, 12, 16, 20, 24 and up to 48B "power levels" as needed.
Even at 1 expert activated (4B parameters, mixed), this model is very strong.
This is a full "thinking" / "reasoning" model.
NOTE: Due to compression during the "MOEing" process, the actual size of the model is SMALLER than a typical 48B model.
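Rough arithmetic behind those "power levels" (nominal only; as the note above says, shared non-expert weights mean the real active parameter count is lower):

```python
# Illustrative arithmetic only: nominal "power level" = active experts x 4B.
# Attention/embedding weights are shared across experts, so actual active
# parameters are lower than this nominal figure.
for k in range(1, 13):
    print(f"{k:2d} expert(s) active -> ~{4 * k}B nominal capacity")
```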
Meet the Team: Mixture of Experts Models
This model comprises the following 12 models ("the experts"), each included in full:
- https://huggingface.co/janhq/Jan-v1-2509
- https://huggingface.co/TeichAI/Qwen3-4B-Thinking-2507-GPT-5.1-High-Reasoning-Distill
- https://huggingface.co/TeichAI/Qwen3-4B-Thinking-2507-Gemini-3-Pro-Preview-High-Reasoning-Distill
- https://huggingface.co/TeichAI/Qwen3-4B-Thinking-2507-Claude-4.5-Opus-High-Reasoning-Distill
- https://huggingface.co/Liontix/Qwen3-4B-Claude-Sonnet-4-Reasoning-Distill-Safetensor
- https://huggingface.co/TeichAI/Qwen3-4B-Thinking-2507-Kimi-K2-Thinking-Distill
- https://huggingface.co/TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill
- https://huggingface.co/TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Lite-Preview-Distill
- https://huggingface.co/Jackrong/gpt-oss-120b-Distill-Qwen3-4B-Thinking
- https://huggingface.co/TeichAI/Qwen3-4B-Thinking-2507-GLM-4.6-Distill
- https://huggingface.co/angelchen/Qwen3-4B-Open-R1-Distill_1
- https://huggingface.co/TeichAI/Qwen3-4B-Thinking-2507-Command-A-Reasoning-Distill
- https://huggingface.co/janhq/Jan-v1-4B
The mixture of experts is set at TWO experts by default, but you can use anywhere from 1 up to all 12.
This "team" has a Captain (the first listed model), and all of the team members, the Captain included, contribute to every "token" choice during generation.
Think of 2, 3 or 4 (or more) master chefs in the kitchen all competing to make the best dish for you.
This results in higher quality generation.
In many cases this also results in higher quality instruction following.
That means the power of every model is available during instruction and output generation.
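For intuition about how the "team" picks tokens, here is a minimal sketch of standard top-k MoE routing (illustrative only; not the exact Qwen3-MoE implementation, and all names and sizes below are made up):

```python
import numpy as np

def moe_layer(hidden, expert_ffns, router_weights, k=2):
    """Route one token's hidden state to the top-k experts and mix their outputs."""
    logits = hidden @ router_weights                      # one score per expert (12 here)
    top_k = np.argsort(logits)[-k:]                       # indices of the k best experts
    weights = np.exp(logits[top_k] - logits[top_k].max())
    weights /= weights.sum()                              # softmax over the chosen experts
    # Each chosen expert processes the token; outputs are blended by router weight.
    return sum(w * expert_ffns[i](hidden) for w, i in zip(weights, top_k))

# Toy setup: 12 "experts", 8-dim hidden state, 2 experts active (the default here).
rng = np.random.default_rng(0)
experts = [lambda h, W=rng.normal(size=(8, 8)): h @ W for _ in range(12)]
router = rng.normal(size=(8, 12))
print(moe_layer(rng.normal(size=8), experts, router, k=2))
```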
CHANGING THE NUMBER OF EXPERTS:
You can set the number of experts in LM Studio (https://lmstudio.ai) on the "load" screen, and in other apps / LLM front ends via an "Experts" or "Number of Experts" setting.
For Text-Generation-Webui (https://github.com/oobabooga/text-generation-webui) you set the number of experts on the model loading page.
For KoboldCpp (https://github.com/LostRuins/koboldcpp) version 1.8+: on the load screen, click on "TOKENS", set the number of experts on that page, and then launch the model.
For server.exe / llama-server.exe (llama.cpp - https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md), add the following to the command line when starting the "llamacpp server" (CLI):
"--override-kv llama.expert_used_count=int:6"
(no quotes, where "6" is the number of experts to use)
When using an API, set "num_experts_used" in the JSON payload (this may be different for different back ends).
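If you run the GGUF through llama-cpp-python instead, the same override can be passed at load time via kv_overrides. This is a sketch only: the filename is an assumption, and the metadata key prefix can differ by architecture, so verify the key (taken from the flag quoted above) against your GGUF's metadata:

```python
from llama_cpp import Llama

# Filename and override key are assumptions - check your GGUF's metadata;
# the key prefix may be architecture-specific rather than "llama.".
llm = Llama(
    model_path="Qwen3-48B-A4B-Savant-Commander.Q4_K_S.gguf",
    n_ctx=8192,
    kv_overrides={"llama.expert_used_count": 6},  # number of active experts
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Claude, explain orbital mechanics."}]
)
print(out["choices"][0]["message"]["content"])
```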
CREDITS:
Special thanks to all the model makers / creators listed above.
Please visit each repo above to see which model(s) contributed to each distill and/or to learn more about the models from their makers.
Special credit goes to MERGEKIT, without you this project / model would not have been possible.
[ https://github.com/arcee-ai/mergekit ]
Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:
In "KoboldCpp" or "oobabooga/text-generation-webui" or "Silly Tavern" ;
Set the "Smoothing_factor" to 1.5
: in KoboldCpp -> Settings->Samplers->Advanced-> "Smooth_F"
: in text-generation-webui -> parameters -> lower right.
: In Silly Tavern this is called: "Smoothing"
NOTE: For "text-generation-webui"
-> if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model)
Source versions (and config files) of my models are here:
OTHER OPTIONS:
Increase rep pen to 1.1-1.15 (you don't need to do this if you use "smoothing_factor")
If the interface/program you are using to run AI MODELS supports "Quadratic Sampling" ("smoothing") just make the adjustment as noted.
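If your front end talks to one of these back ends over an API instead of the UI, the same samplers can usually go in the request body. Below is a sketch against KoboldCpp's native generate endpoint; treat the field names (especially "smoothing_factor") as assumptions and check your version's API docs:

```python
import requests

# Sketch only: field names follow KoboldCpp's native API as commonly documented;
# "smoothing_factor" support depends on the KoboldCpp version you run.
payload = {
    "prompt": "Kimi, write me a horror story.",
    "max_length": 400,
    "temperature": 0.7,
    "rep_pen": 1.1,           # alternative to smoothing, per "OTHER OPTIONS" above
    "smoothing_factor": 1.5,  # the "Smooth_F" / quadratic sampling setting
}
resp = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(resp.json()["results"][0]["text"])
```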
Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers
This is a "Class 1" model.
For all settings used for this model (including specifics for its "class"), example generations, and the advanced settings guide (which often addresses model issues and covers methods to improve model performance for all use cases, including chat, roleplay and others), please see:
You can see all parameters used for generation, in addition to advanced parameters and samplers to get the most out of this model here:
Example Generation:
2 experts, temp .7, top_k 40, top_p .95, min_p .05, rep pen 1.05
QUANT: Q4_K_S, LM Studio.
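For reference, the same example settings expressed through llama-cpp-python (a sketch; the GGUF filename and context size are assumptions):

```python
from llama_cpp import Llama

# Reproduces the example-generation settings above; point model_path at your local quant.
llm = Llama(model_path="Qwen3-48B-A4B-Savant-Commander.Q4_K_S.gguf", n_ctx=8192)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "CommandA, explain Hohmann transfer orbits."}],
    temperature=0.7,
    top_k=40,
    top_p=0.95,
    min_p=0.05,
    repeat_penalty=1.05,
)
print(out["choices"][0]["message"]["content"])
```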