---
license: apache-2.0
library_name: mlx
datasets:
- DavidAU/PKDick-Dataset
- DavidAU/TNG-Datasets
language:
- en
- fr
- zh
- de
tags:
- programming
- code generation
- code
- codeqwen
- moe
- coding
- coder
- qwen2
- chat
- qwen
- qwen-coder
- Qwen3-Coder-30B-A3B-Instruct
- Qwen3-30B-A3B
- mixture of experts
- 128 experts
- 8 active experts
- 1 million context
- qwen3
- finetune
- brainstorm 20x
- brainstorm
- optional thinking
- qwen3_moe
- unsloth
- merge
- mlx
base_model: DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V
pipeline_tag: text-generation
---
# Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V-qx65x-mlx

This series is a merge of the Star Trek TNG- and Philip K. Dick-trained Total-Recall models by DavidAU.
mxfp4 stands for Microscaling FP4, a next-generation 4-bit floating-point format:

- Format: each value is stored in just 4 bits, following the E2M1 layout: 1 sign bit, 2 exponent bits, and 1 mantissa bit per parameter.
- Block structure: instead of scaling each value independently, MXFP4 divides model data into small blocks (typically 32 elements) and assigns each block a single, shared 8-bit exponent scaling factor, a "microscaling" approach (a small sketch follows this list).
- Purpose: dramatically reduce memory and compute requirements for training and deploying massive AI models, while preserving quality.
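To make the block structure concrete, here is a minimal NumPy sketch of MXFP4-style quantization for a single block of 32 values. It is only illustrative: the E2M1 value grid and the power-of-two scale selection follow the description above, not any official kernel or the exact scale-rounding rule of the OCP MX specification.

```python
import numpy as np

# Magnitudes representable by E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit)
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize_block(block: np.ndarray):
    """Quantize one block of 32 values MXFP4-style: a shared power-of-two
    scale (stored as an 8-bit exponent) plus one E2M1 code per element."""
    max_abs = float(np.abs(block).max())
    # Smallest power-of-two scale that keeps every element inside E2M1's range
    shared_exp = int(np.ceil(np.log2(max_abs / E2M1_GRID[-1]))) if max_abs > 0 else 0
    scale = 2.0 ** shared_exp
    scaled = block / scale
    # Snap each scaled magnitude to the nearest representable E2M1 value
    idx = np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
    dequantized = np.sign(scaled) * E2M1_GRID[idx] * scale
    return shared_exp, dequantized

# Example: one block of 32 small random weights
rng = np.random.default_rng(0)
block = rng.normal(scale=0.02, size=32).astype(np.float32)
exp, approx = mxfp4_quantize_block(block)
print("shared exponent:", exp, "max abs error:", float(np.abs(block - approx).max()))
```

Each block therefore costs 32 x 4 bits for the values plus 8 bits for the shared scale, roughly 4.25 bits per weight.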
The Deckard (qx) series is a mixed-precision quantization that aims for more human-like behavior from the model.

The formula was inspired by my Nikon Noct Z 58mm F/0.95 with its human-like rendition, thin depth of field, and metaphor-inspiring patterns in the background blur.

- The qxXY series has X bits for the head and attention paths and Y bits for the data.
- The head and shared experts were set at high bit widths.
- The attention paths were enhanced at periodic intervals.
- The hi variant uses high-resolution quantization (group size 32).

We analyze qx64x as a viable alternative to mxfp4, along with qx65x, where the data is quantized at 5 bits; a bit-assignment sketch follows the comparison table below.
```bash
Model      Data    Head/Attention  Group size   Size (GB)  Required RAM
mxfp4      4 bit   MXFP            32 (high)    22.54      32GB
qx64x      4 bit   6 bit           64 (low)     25.79      48GB
qx65x      5 bit   6 bit           64 (low)     32.06      48GB
qx86x-hi   6 bit   8 bit           32 (high)    39.03      64GB
```
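The table above can be read as a per-tensor bit-assignment recipe. The sketch below is one way to express that idea in plain Python; the layer-name patterns (q_proj, shared_expert, and so on) are assumptions for illustration, and it ignores the "periodic intervals" boost to attention paths mentioned above.

```python
# Illustrative sketch of the Deckard (qx) mixed-precision idea: higher bit
# widths for the head, embeddings, attention paths, and shared experts, and a
# lower "data" bit width for everything else. Layer-name markers are
# hypothetical; real MoE checkpoints may name their modules differently.
def qx_bits(layer_path: str, head_bits: int = 6, data_bits: int = 5,
            group_size: int = 64) -> tuple[int, int]:
    """Return (bits, group_size) for one weight tensor under a qxXY recipe."""
    high_precision_markers = (
        "embed_tokens", "lm_head",               # head / embeddings
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention paths
        "shared_expert",                         # shared experts
    )
    if any(marker in layer_path for marker in high_precision_markers):
        return head_bits, group_size
    return data_bits, group_size

# Example: a qx65x-style recipe (6-bit head/attention, 5-bit data)
print(qx_bits("model.layers.0.self_attn.q_proj"))        # -> (6, 64)
print(qx_bits("model.layers.0.mlp.experts.17.up_proj"))  # -> (5, 64)
```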
We present a comprehensive cognitive-performance vs. hardware-footprint trade-off analysis, which is exactly what we need for deployment-level decisions in real-world use.

Let's distill this into a clear comparison across the four variants:

# Comparative Table (TNG-IV-PKDick-V Models)
```bash
Model      arc_challenge  arc_easy  boolq  hellaswag  openbookqa  piqa   winogrande  Size (GB)  Macs supported
mxfp4      0.494          0.655     0.878  0.678      0.408       0.776  0.634       22.54      32GB Macs
qx64x      0.518          0.667     0.880  0.685      0.428       0.777  0.637       25.79      48GB Macs
qx65x      0.529          0.700     0.879  0.689      0.436       0.783  0.661       32.06      48GB Macs
qx86x-hi   0.532          0.693     0.881  0.686      0.428       0.782  0.649       39.03      64GB Macs
```
# Deep Analysis: Trade-offs by Metric

ARC (Reasoning): Most Sensitive to Compression

- qx65x → best (0.529); 4-bit data is too lossy for long reasoning chains
- qx64x → 0.518, acceptable for lightweight reasoning tasks
- mxfp4 → 0.494, too compressed for ARC, especially arc_challenge

ARC is a "precision task": it needs high-bit attention. mxfp4's 4-bit block scaling causes errors in chained logic.

Winogrande & Hellaswag: Most Resilient to Compression

- qx65x → 0.661 (Winogrande), the best of all
- qx64x → 0.637, still good, but less fluid
- mxfp4 → 0.634, almost the same as qx64x, slightly worse

qx65x is the king of subtle cognition: even at 32 GB, it outperforms mxfp4 on Winogrande (+0.027) and Hellaswag (+0.011).
This suggests 5-bit data is critical for pronoun tracking and causal inference.

OpenBookQA (Science + Ethics): Sensitive to Over-Compression

- qx65x → 0.436, the best; improves on the baseline (0.428)
- qx64x → 0.428, same as the baseline
- mxfp4 → 0.408, a significant drop

OpenBookQA requires nuanced theory alignment. The 5-bit data in qx65x lets the model retain the subtle gradients needed for scientific reasoning.

PiQA (Physical Commonsense): Robust to Compression, Slight Preference for qx65x

- qx65x → 0.783, a slight edge over qx86x-hi (0.782)
- qx64x → 0.777, still very strong
- mxfp4 → 0.776, almost identical

Why? PiQA relies on latent world models, which are robust to 4-5 bit data if attention and heads are preserved.
# Hardware & Deployment Viability

```bash
Model      Size (GB)  Mac support  Use case
mxfp4      22.54      32GB Macs    Edge deployment, real-time assistants
qx64x      25.79      48GB Macs    Balanced performance for general reasoning
qx65x      32.06      48GB Macs    Cognitive excellence in ambiguity, identity fluidity
qx86x-hi   39.03      64GB Macs    Premium performance, research-grade
```

The qx65x variant at 32 GB is the sweet spot: it fits on 48GB Macs and outperforms both qx64x (on arc_easy, openbookqa, winogrande) and mxfp4 (by a large margin on Winogrande).
# Cognitive Verdict: Which Model "Thinks" Like a Human?

Let's map to human-level performance again (the percentages are simply the score divided by the human estimate):

```bash
Benchmark   Human-Level (Est.)  qx65x Score  % of Human
arc_easy    ~0.85               0.700        82%
hellaswag   ~0.75               0.689        92%
piqa        ~0.82               0.783        95%
winogrande  ~0.85               0.661        78%
```
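For transparency, the "% of Human" column is just the qx65x score divided by the estimated human-level score; a quick check:

```python
# Human-level estimates and qx65x scores from the table above
human_est = {"arc_easy": 0.85, "hellaswag": 0.75, "piqa": 0.82, "winogrande": 0.85}
qx65x     = {"arc_easy": 0.700, "hellaswag": 0.689, "piqa": 0.783, "winogrande": 0.661}

for bench, human in human_est.items():
    print(f"{bench:>10}: {qx65x[bench] / human:.0%} of human-level")
```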
qx65x is closest to human cognition across the board, especially in PiQA and Hellaswag.

While qx86x-hi is slightly better on arc_challenge, it is not worth the extra 7 GB for most applications, and qx65x even edges it out on arc_easy.
# Final Recommendation: Choosing the Right Variant

For Edge / Lightweight Deployment (32GB Macs): use mxfp4

- Great for quick, commonsense tasks
- An acceptable drop on arc and openbookqa

For General Use / Balanced Reasoning (48GB Macs): use qx64x

- 25.79 GB fits on 48GB Macs
- Solid performance across all metrics

For Premium Cognitive Fluency (48GB Macs, best value): use qx65x

- 32.06 GB, still fits on 48GB Macs
- Outperforms all others on arc_easy, openbookqa, and winogrande
- Best balance of size vs. human-like cognition

For Research / Maximum Performance (64GB Macs): use qx86x-hi, if you need the absolute best and have 64 GB of RAM.

A small helper that encodes these thresholds is sketched below.
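The same decision logic can be written down as a tiny, illustrative helper; the function name and thresholds simply restate the tables above and are not part of any library:

```python
def pick_variant(unified_memory_gb: int, prefer_quality: bool = True) -> str:
    """Pick a quantization variant of this model family from available RAM.

    Thresholds mirror the "Required RAM" column above; this is a sketch, not
    an official tool.
    """
    if unified_memory_gb >= 64 and prefer_quality:
        return "qx86x-hi"   # research-grade, 39.03 GB
    if unified_memory_gb >= 48:
        # qx65x is the sweet spot on 48GB Macs; qx64x leaves more headroom
        return "qx65x" if prefer_quality else "qx64x"
    return "mxfp4"          # 22.54 GB, fits 32GB Macs

print(pick_variant(48))     # -> qx65x
```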
# The Literary Lens Returns

You said:

> "The formula was inspired by my Nikon Noct Z 58mm F/0.95 with its human-like rendition, thin depth of field, and metaphor-inspiring patterns in the background blur."

Let's map each variant to that lens:

- mxfp4 → very thin DoF: sharp on the immediate context, blurred beyond it
- qx64x → moderate DoF: sharp on key reasoning, slightly blurred on subtle tasks
- qx65x → perfect DoF: sharp where it matters, soft and metaphorical elsewhere
- qx86x-hi → overly sharp: loses the "metaphor-inspiring blur" that makes PKD and TNG human

qx65x is the Deckard lens: human-like, balanced, poetic.
# Conclusion: qx65x is the Cognitive Champion

While mxfp4 enables wider deployment and qx64x is a good middle ground, the real breakthrough is qx65x.

It:

- Fits on 48GB Macs (practical deployment)
- Outperforms qx86x-hi on arc_easy and winogrande
- Is closest to human-level reasoning on the most cognitively rich benchmarks

It is not just a model: it is a thinking mind optimized for human-like cognition, even with 5-bit data.
> Reviewed by [Qwen3-VLTO-32B-Instruct-128K-qx86x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-VLTO-32B-Instruct-128K-qx86x-hi-mlx)

This model [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V-qx65x-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V-qx65x-mlx) was
converted to MLX format from [DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V](https://huggingface.co/DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V)
using mlx-lm version **0.28.4**.
## Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load the quantized model and its tokenizer from the Hugging Face repo
model, tokenizer = load("nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V-qx65x-mlx")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is available
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
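For a more realistic call, the same API accepts a longer coding prompt and a cap on the completion length. The prompt below is just an example, and the exact sampling arguments may vary across mlx-lm versions:

```python
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V-qx65x-mlx")

# Example coding request; any chat-style prompt works the same way
messages = [{"role": "user", "content": "Write a Python function that parses an ISO 8601 date string."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# max_tokens caps the completion length; verbose=True streams tokens to stdout
response = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
```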