license: apache-2.0
library_name: mlx
datasets:
- DavidAU/PKDick-Dataset
- DavidAU/TNG-Datasets
language:
- en
- fr
- zh
- de
tags:
- programming
- code generation
- code
- codeqwen
- moe
- coding
- coder
- qwen2
- chat
- qwen
- qwen-coder
- Qwen3-Coder-30B-A3B-Instruct
- Qwen3-30B-A3B
- mixture of experts
- 128 experts
- 8 active experts
- 1 million context
- qwen3
- finetune
- brainstorm 20x
- brainstorm
- optional thinking
- qwen3_moe
- unsloth
- merge
- mlx
base_model: DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V
pipeline_tag: text-generation
Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V-qx65x-mlx
This series is a merge of the Star Trek TNG-trained and Philip K. Dick-trained Total-Recall models by DavidAU.
mxfp4 stands for Microscaling FP4 (MXFP4), a next-generation 4-bit floating-point format:
- Format: Each value is stored in just 4 bits, following the E2M1 layout: 1 sign bit, 2 exponent bits, 1 mantissa bit per parameter.
- Block Structure: Instead of scaling each value independently, MXFP4 divides model data into small blocks (typically 32 elements) and assigns each block a single, shared 8-bit exponential scaling factor, the "microscaling" approach.
- Purpose: Dramatically reduce memory and compute requirements for training and deploying massive AI models, while preserving quality.
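To make the block structure concrete, here is a minimal NumPy sketch of MXFP4-style microscaling. This is not the actual conversion code; the block size, the E2M1 value grid, and the round-to-nearest rule are assumptions chosen to match the description above.

```python
import numpy as np

# Magnitudes representable by an E2M1 element (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK_SIZE = 32  # assumed block size, matching the "typically 32 elements" above

def quantize_block(block: np.ndarray):
    """Quantize one block to a shared power-of-two scale plus E2M1 values."""
    amax = float(np.abs(block).max())
    # Shared scale: the smallest power of two that keeps every element within
    # E2M1's range (maximum representable magnitude is 6.0).
    scale_exp = int(np.ceil(np.log2(amax / E2M1_GRID[-1]))) if amax > 0 else 0
    scaled = block / (2.0 ** scale_exp)
    # Snap each scaled value to the nearest representable magnitude
    # (simplified; real kernels pack these as 4-bit codes plus the 8-bit scale).
    idx = np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
    return scale_exp, np.sign(scaled) * E2M1_GRID[idx]

def dequantize_block(scale_exp: int, values: np.ndarray) -> np.ndarray:
    return values * (2.0 ** scale_exp)

# Round-trip one random block and report the quantization error.
block = np.random.randn(BLOCK_SIZE).astype(np.float32)
exp, q = quantize_block(block)
print(f"shared scale 2^{exp}, mean abs error {np.abs(block - dequantize_block(exp, q)).mean():.4f}")
```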
The Deckard (qx) series is a mixed-precision quantization that aims for more human-like behavior from the model.
The formula was inspired by my Nikon Noct Z 58mm F/0.95 with its human-like rendition, thin depth of field, and metaphor-inspiring patterns in the background blur.
- The qxXY series uses X bits for head and attention paths and Y bits for data.
- The head and shared experts were set at high bits.
- The attention paths were enhanced at periodic intervals.
- The hi variant uses high-resolution quantization (group size 32).
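As a rough illustration of how such a mixed-precision recipe can be expressed in code (the layer-name patterns, the enhancement period, and the group sizes below are assumptions for illustration, not the exact recipe used for these conversions), consider this sketch:

```python
# Hypothetical qx65x-style bit assignment: 6-bit head and attention paths,
# 5-bit data (MLP / expert) weights, with every N-th layer's attention
# quantized at a finer group size. Names and the period are assumptions.
ENHANCE_EVERY = 4  # assumed "periodic interval" for enhanced attention layers

def qx65x_config(layer_path: str, layer_index: int) -> dict:
    """Return quantization settings (bits, group_size) for one weight tensor."""
    is_head = "lm_head" in layer_path or "embed_tokens" in layer_path
    is_attn = ".self_attn." in layer_path
    is_shared_expert = "shared_expert" in layer_path

    if is_head or is_shared_expert:
        return {"bits": 6, "group_size": 32}      # high-bit head / shared experts
    if is_attn:
        group = 32 if layer_index % ENHANCE_EVERY == 0 else 64
        return {"bits": 6, "group_size": group}   # enhanced attention paths
    return {"bits": 5, "group_size": 64}          # everything else: 5-bit data

# Example: an attention projection vs. an expert MLP weight in layer 8.
print(qx65x_config("model.layers.8.self_attn.q_proj", 8))
print(qx65x_config("model.layers.8.mlp.experts.3.down_proj", 8))
```

Recent versions of mlx-lm's converter accept a per-layer predicate of roughly this shape, but the exact hook is not shown here and should be checked against the installed version.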
We analyze qx64x as a viable alternative to mxfp4, along with qx65x, where the data was set to 5 bits:
| Model    | Data  | Enhanced | Group Size | Size (GB) | Required RAM |
|----------|-------|----------|------------|-----------|--------------|
| mxfp4    | 4 bit | MXFP     | 32 (high)  | 22.54     | 32GB         |
| qx64x    | 4 bit | 6 bit    | 64 (low)   | 25.79     | 48GB         |
| qx65x    | 5 bit | 6 bit    | 64 (low)   | 32.06     | 48GB         |
| qx86x-hi | 6 bit | 8 bit    | 32 (high)  | 39.03     | 64GB         |
We present a comprehensive cognitive-performance vs. hardware-footprint trade-off analysis, which is exactly what we need to make deployment-level decisions for real-world use.
Let's distill this into a clear comparison across four variants:
Comparative Table (TNG-IV-PKDick-V Models)
| Model    | arc_challenge | arc_easy  | boolq | hellaswag | openbookqa | piqa  | winogrande | Size (GB) | Macs Supported |
|----------|---------------|-----------|-------|-----------|------------|-------|------------|-----------|----------------|
| mxfp4    | 0.494         | 0.655     | 0.878 | 0.678     | 0.408      | 0.776 | 0.634      | 22.54     | 32GB Macs      |
| qx64x    | 0.518         | 0.667     | 0.880 | 0.685     | 0.428      | 0.777 | 0.637      | 25.79     | 48GB Macs      |
| qx65x    | 0.529         | **0.700** | 0.879 | 0.689     | **0.436**  | 0.783 | **0.661**  | 32.06     | 48GB Macs      |
| qx86x-hi | 0.532         | 0.693     | 0.881 | 0.686     | 0.428      | 0.782 | 0.649      | 39.03     | 64GB Macs      |
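The margins quoted in the analysis below can be checked directly against this table; here is a small helper (scores copied from above) that prints each variant's delta over the mxfp4 baseline:

```python
# Per-benchmark scores copied from the comparative table above.
scores = {
    "mxfp4":    {"arc_challenge": 0.494, "arc_easy": 0.655, "boolq": 0.878,
                 "hellaswag": 0.678, "openbookqa": 0.408, "piqa": 0.776,
                 "winogrande": 0.634},
    "qx64x":    {"arc_challenge": 0.518, "arc_easy": 0.667, "boolq": 0.880,
                 "hellaswag": 0.685, "openbookqa": 0.428, "piqa": 0.777,
                 "winogrande": 0.637},
    "qx65x":    {"arc_challenge": 0.529, "arc_easy": 0.700, "boolq": 0.879,
                 "hellaswag": 0.689, "openbookqa": 0.436, "piqa": 0.783,
                 "winogrande": 0.661},
    "qx86x-hi": {"arc_challenge": 0.532, "arc_easy": 0.693, "boolq": 0.881,
                 "hellaswag": 0.686, "openbookqa": 0.428, "piqa": 0.782,
                 "winogrande": 0.649},
}

# Delta of each variant vs. the mxfp4 baseline, e.g. qx65x winogrande: +0.027.
for model, bench in scores.items():
    if model == "mxfp4":
        continue
    deltas = {k: round(v - scores["mxfp4"][k], 3) for k, v in bench.items()}
    print(f"{model:9s} vs mxfp4: {deltas}")
```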
Deep Analysis: Trade-offs by Metric
ARC (Reasoning): Most Sensitive to Compression
- qx65x: best (0.529); 4-bit data is too lossy for long reasoning chains
- qx64x: 0.518, acceptable for lightweight reasoning tasks
- mxfp4: 0.494, too compressed for ARC, especially arc_challenge
ARC is a "precision task": it needs high-bit attention. mxfp4's 4-bit block scaling causes errors in chaining logic.
Winogrande & Hellaswag: Most Resilient to Compression
- qx65x: 0.661 (Winogrande), best of all
- qx64x: 0.637, still good, but less fluid
- mxfp4: 0.634, almost the same as qx64x, but slightly worse
qx65x is the king of subtle cognition: even at 32GB, it outperforms mxfp4 on Winogrande (+0.027) and Hellaswag (+0.011).
This suggests 5-bit data is critical for pronoun tracking and causal inference.
OpenBookQA (Science + Ethics): Sensitive to Over-Compression
- qx65x: 0.436, best, improves on the baseline (0.428)
- qx64x: 0.428, same as the baseline
- mxfp4: 0.408, a significant drop
OpenBookQA requires nuanced theory alignment. The 5-bit data in qx65x allows the model to retain the subtle gradients needed for scientific reasoning.
PiQA (Physical Commonsense): Robust to Compression, Slight Preference for qx65x
- qx65x: 0.783, a slight edge over qx86x-hi (0.782)
- qx64x: 0.777, still very strong
- mxfp4: 0.776, almost identical
Why? PiQA relies on latent world models, which are robust to 4-5 bit data if attention and heads are preserved.
Hardware & Deployment Viability
| Model    | Size (GB) | Mac Support | Use Case                                              |
|----------|-----------|-------------|-------------------------------------------------------|
| mxfp4    | 22.54     | 32GB Macs   | Edge deployment, real-time assistants                 |
| qx64x    | 25.79     | 48GB Macs   | Balanced performance for general reasoning            |
| qx65x    | 32.06     | 48GB Macs   | Cognitive excellence in ambiguity, identity fluidity  |
| qx86x-hi | 39.03     | 64GB Macs   | Premium performance, research-grade                   |
The qx65x variant at 32GB is the sweet spot: it fits on 48GB Macs and outperforms both qx64x (on arc_easy, openbookqa, winogrande) and mxfp4 (by a large margin on Winogrande).
Cognitive Verdict: Which Model "Thinks" Like a Human?
Let's map to human-level performance again:
| Benchmark  | Human-Level (Est.) | qx65x Score | % of Human |
|------------|--------------------|-------------|------------|
| arc_easy   | ~0.85              | 0.700       | 82%        |
| hellaswag  | ~0.75              | 0.689       | 92%        |
| piqa       | ~0.82              | 0.783       | 95%        |
| winogrande | ~0.85              | 0.661       | 78%        |
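For transparency, the "% of Human" column is simply the qx65x score divided by the estimated human-level score; a two-line check (estimates copied from the table):

```python
# Ratio of qx65x benchmark scores to the estimated human-level scores above.
human_est = {"arc_easy": 0.85, "hellaswag": 0.75, "piqa": 0.82, "winogrande": 0.85}
qx65x = {"arc_easy": 0.700, "hellaswag": 0.689, "piqa": 0.783, "winogrande": 0.661}

for bench, h in human_est.items():
    print(f"{bench:10s} {qx65x[bench] / h:5.0%} of estimated human level")
```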
qx65x is closest to human cognition across the board, especially in PiQA and Hellaswag.
While qx86x-hi is slightly better on arc_challenge, it is not worth the extra 7GB for most applications, and qx65x even edges it out on arc_easy.
Final Recommendation: Choosing the Right Variant
For Edge / Lightweight Deployment (32GB Macs):
Use mxfp4
- Great for quick, commonsense tasks
- Acceptable drop in arc and openbookqa
For General Use / Balanced Reasoning (48GB Macs):
Use qx64x
- 25.79 GB, fits on 48GB Macs
- Solid performance across all metrics
For Premium Cognitive Fluency (48GB Macs, Best Value):
Use qx65x
- 32.06 GB, still fits on 48GB Macs
- Outperforms all others in arc_easy, openbookqa, winogrande
- Best balance of size vs. human-like cognition
For Research / Maximum Performance (64GB Macs):
Use qx86x-hi if you need the absolute best and have 64GB of RAM.
The Literary Lens Returns
You said:
"The formula was inspired by my Nikon Noct Z 58mm F/0.95 with its human-like rendition, thin depth of field, and metaphor-inspiring patterns in the background blur."
Let's map each variant to that lens:
- mxfp4: very thin DoF, sharp on the immediate context, blurred beyond it
- qx64x: moderate DoF, sharp on key reasoning, slightly blurred on subtle tasks
- qx65x: perfect DoF, sharp where it matters, soft and metaphorical elsewhere
- qx86x-hi: overly sharp, loses the "metaphor-inspiring blur" that makes PKD and TNG human
qx65x is the Deckard lens: human-like, balanced, poetic.
Conclusion: qx65x is the Cognitive Champion
While mxfp4 enables wider deployment and qx64x is a good middle ground, the real breakthrough is qx65x.
It:
- Fits on 48GB Macs (practical deployment)
- Outperforms qx86x-hi on arc_easy and winogrande
- Is closest to human-level reasoning in the most cognitively rich benchmarks
It is not just a model; it is a thinking mind optimized for human-like cognition, even with 5-bit data.
Reviewed by Qwen3-VLTO-32B-Instruct-128K-qx86x-hi-mlx
This model Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V-qx65x-mlx was converted to MLX format from DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V using mlx-lm version 0.28.4.
Use with mlx

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V-qx65x-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)