---
license: apache-2.0
library_name: mlx
datasets:
- DavidAU/PKDick-Dataset
- DavidAU/TNG-Datasets
language:
- en
- fr
- zh
- de
tags:
- programming
- code generation
- code
- codeqwen
- moe
- coding
- coder
- qwen2
- chat
- qwen
- qwen-coder
- Qwen3-Coder-30B-A3B-Instruct
- Qwen3-30B-A3B
- mixture of experts
- 128 experts
- 8 active experts
- 1 million context
- qwen3
- finetune
- brainstorm 20x
- brainstorm
- optional thinking
- qwen3_moe
- unsloth
- merge
- mlx
base_model: DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V
pipeline_tag: text-generation
---
# Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V-qx86x-hi-mlx

We're now examining a merged model, TNG-IV-PKDick-V, which combines the cognitive strengths of Star Trek TNG (ethical clarity, binary reasoning) and Philip K. Dick (existential ambiguity, mental-model flexibility). The same qx86x-hi quantization is applied to every variant, so score differences isolate the effect of the training fusion rather than the quantization.

Let's go deep: how does merging two distinct cognitive styles affect reasoning?
📊 Benchmark Comparison (all 42B MoE qx86x-hi variants)

| Model | arc_challenge | arc_easy | boolq | hellaswag | openbookqa | piqa | winogrande |
|---|---|---|---|---|---|---|---|
| Baseline | 0.533 | 0.690 | 0.882 | 0.684 | 0.428 | 0.781 | 0.646 |
| ST-TNG-IV | 0.537 | 0.689 | 0.882 | 0.689 | 0.432 | 0.780 | 0.654 |
| PKDick-V | 0.531 | 0.695 | 0.882 | 0.689 | 0.432 | 0.784 | 0.657 |
| TNG-IV-PKDick-V | 0.532 | 0.693 | 0.881 | 0.686 | 0.428 | 0.782 | 0.649 |
🔥 The merged model avoids catastrophic forgetting, preserving the strengths of both sources, at the cost of only a small drop in absolute performance on a few tasks.
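To make that claim concrete, here is a minimal Python sketch that recomputes each benchmark's delta between the merged model and the better of its two parents. The scores are copied by hand from the table above; nothing here is pulled from the evaluation harness itself.

```python
# Scores copied from the comparison table above (42B MoE, qx86x-hi quantization).
scores = {
    "ST-TNG-IV":       {"arc_challenge": 0.537, "arc_easy": 0.689, "boolq": 0.882,
                        "hellaswag": 0.689, "openbookqa": 0.432, "piqa": 0.780, "winogrande": 0.654},
    "PKDick-V":        {"arc_challenge": 0.531, "arc_easy": 0.695, "boolq": 0.882,
                        "hellaswag": 0.689, "openbookqa": 0.432, "piqa": 0.784, "winogrande": 0.657},
    "TNG-IV-PKDick-V": {"arc_challenge": 0.532, "arc_easy": 0.693, "boolq": 0.881,
                        "hellaswag": 0.686, "openbookqa": 0.428, "piqa": 0.782, "winogrande": 0.649},
}

merged = scores["TNG-IV-PKDick-V"]
for task in merged:
    # Compare the merged model against whichever parent is stronger on this task.
    best_parent = max(scores["ST-TNG-IV"][task], scores["PKDick-V"][task])
    delta = merged[task] - best_parent
    print(f"{task:14s} merged={merged[task]:.3f} best_parent={best_parent:.3f} delta={delta:+.3f}")
```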
# 🧠 1. What Does the Merge Do?

The TNG-IV-PKDick-V model is a cognitive fusion, combining:

- ✅ TNG's strength in ethical clarity and binary decision-making
- ✅ PKD's strength in existential ambiguity and contextual fluidity

Let's break it down benchmark by benchmark:
📈 ARC (Reasoning)

```
ST-TNG-IV: 0.537
PKDick-V:  0.531
Merged:    0.532 → almost the midpoint
```

💡 The merge doesn't penalize ARC: at 0.532 the merged model sits essentially at the parents' midpoint ((0.537 + 0.531) / 2 = 0.534), suggesting the MoE routing successfully balances both styles.
🧪 BoolQ (Binary Fact-checking)

```
All models: ~0.881–0.882
Merged:     0.881 → minimal drop
```

✅ TNG-IV excels here, and the merged model retains that high binary accuracy, likely due to TNG's training on clear moral questions.
🌐 Hellaswag (Ambiguous Commonsense Inference)

```
PKDick-V:  0.689
ST-TNG-IV: 0.689
Merged:    0.686 → slightly lower, but still very strong
```

🧩 This is the most telling benchmark: merging the two styles slightly reduces performance, though not by much.

💡 Why? The merged model may be pulled between TNG's "clear answer" and PKD's "multiple interpretations."
📚 OpenBookQA (Science + Ethics)

```
ST-TNG-IV: 0.432
PKDick-V:  0.432
Merged:    0.428 → slight drop
```

🎯 This is a fusion weak point: OpenBookQA requires both scientific knowledge and ethical interpretation, and merging may cause routing conflicts between the two.
🧱 PiQA (Physical Commonsense)

```
PKDick-V:  0.784 ✅
ST-TNG-IV: 0.780
Merged:    0.782 ✅
```

🏆 The merged model hits a sweet spot here: it combines PKD's physical world modeling with TNG's clarity, beating ST-TNG-IV and landing only 0.002 below PKDick-V.
🧩 Winogrande (Coreference Resolution)

```
PKDick-V:  0.657 ✅
ST-TNG-IV: 0.654 → only 0.003 lower
Merged:    0.649 → slight drop
```

💔 This is the biggest cost of the merge: Winogrande requires fluid identity tracking, and the fusion introduces a slight rigidity.

🧠 The merged model may sometimes default to TNG's clarity and discard PKD's ambiguity, leading to slightly less accurate pronoun binding.
# 🧠 Cognitive Interpretation: The Merged Mind

The TNG-IV-PKDick-V model is like a philosopher-scientist hybrid, trained to reason with both ethical precision and existential uncertainty.

✅ What It Preserves:

- Strong PiQA: the best of both worlds (PKD's world modeling + TNG's clarity)
- Good BoolQ: retains strong binary responses
- Robust ARC: reasoning is preserved

❌ What It Slightly Sacrifices:

- Winogrande: the merge brings together conflicting pronoun-resolution strategies
- Hellaswag & OpenBookQA: the blending of cognitive modes causes minor degradation
🎯 Final Verdict: Is the Merge Worth It?

| Metric | Merged model performance | Verdict |
|--------|--------------------------|---------|
| arc_challenge / arc_easy | 0.532 / 0.693 → near the parents' peak | ✅ Worth it |
| boolq | 0.881 → minimal loss | ✅ Worth it |
| hellaswag | 0.686 → lower than either parent alone (0.689) | ⚠️ Slight trade-off |
| openbookqa | 0.428 → slightly lower than either parent alone (0.432) | ⚠️ Slight trade-off |
| piqa | 0.782 → best compromise | ✅ Excellent |
| winogrande | 0.649 → biggest drop (from 0.657) | ❌ Slight cost |
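As a rough sanity check on the overall verdict, the sketch below computes an unweighted mean over the seven benchmarks for each variant, again using the scores from the comparison table. The equal weighting is an illustrative choice on our part, not something the original evaluation defines.

```python
# Unweighted mean across the seven benchmarks, using the comparison-table scores.
# Equal weighting is illustrative only; it is not part of the published evaluation.
scores = {
    "Baseline":        [0.533, 0.690, 0.882, 0.684, 0.428, 0.781, 0.646],
    "ST-TNG-IV":       [0.537, 0.689, 0.882, 0.689, 0.432, 0.780, 0.654],
    "PKDick-V":        [0.531, 0.695, 0.882, 0.689, 0.432, 0.784, 0.657],
    "TNG-IV-PKDick-V": [0.532, 0.693, 0.881, 0.686, 0.428, 0.782, 0.649],
}

for name, vals in scores.items():
    print(f"{name:16s} mean = {sum(vals) / len(vals):.4f}")
```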
🧠 The merged model is a cognitive hybrid:

- Strong at physical reasoning (PiQA): ahead of ST-TNG-IV and within 0.002 of PKDick-V
- Slightly weaker on highly ambiguous tasks (Winogrande, Hellaswag)
- Nearly matches the PKD/TNG peak on ARC and BoolQ

✅ Verdict: The merge is worthwhile overall. It yields a model that is competitive on PiQA, balanced on ARC and BoolQ, and pays only a small price on Winogrande.

💡 It's like a human who can both make clear ethical decisions and ponder existential ambiguity, perhaps even more balanced than either pure variant.
🌑 The PKD-TNG Merge: A Metaphor for Human Cognition

> Philip K. Dick → "What if reality isn't real?"
> Star Trek TNG → "And the logical thing to do is..."

The merged model embodies:

- TNG's ethics → helps make decisions
- PKD's ambiguity → allows for reconsideration

This is how humans reason: we don't live in pure certainty (TNG) or pure doubt (PKD). We oscillate, sometimes decisive, sometimes uncertain.

🔥 The TNG-IV-PKDick-V merge is not just a technical fusion; it's cognitively human.
> Reviewed with [Qwen3-VLTO-32B-Instruct-128K-qx86x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-VLTO-32B-Instruct-128K-qx86x-hi-mlx)

This model [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V-qx86x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V-qx86x-hi-mlx) was
converted to MLX format from [DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V](https://huggingface.co/DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V)
using mlx-lm version **0.28.4**.
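For readers who want to produce their own MLX quantizations of the source model, the general workflow goes through mlx-lm's `convert` API. The sketch below uses generic, illustrative quantization arguments and a hypothetical output directory name; the exact qx86x-hi mixed-precision recipe used for this upload comes from the uploader's own settings and is not reproduced by these values.

```python
# Sketch of a generic quantized conversion with mlx-lm (illustrative settings only;
# the published qx86x-hi weights were made with the uploader's own mixed-precision recipe).
from mlx_lm import convert

convert(
    hf_path="DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V",
    mlx_path="Qwen3-Yoyo-V4-42B-TOTAL-RECALL-mlx-q6",  # hypothetical local output directory
    quantize=True,
    q_bits=6,         # example bit width, not the qx86x-hi layout
    q_group_size=32,  # example group size
)
```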
## Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load the quantized MLX weights and tokenizer from the Hub (or a local path).
model, tokenizer = load("nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V-qx86x-hi-mlx")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is available.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
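If the default response length is too short for your use case, `generate` also accepts a `max_tokens` argument (the value below is arbitrary):

```python
# Cap the response length explicitly.
response = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
```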