---
license: apache-2.0
library_name: mlx
datasets:
- DavidAU/PKDick-Dataset
- DavidAU/TNG-Datasets
language:
- en
- fr
- zh
- de
tags:
- programming
- code generation
- code
- codeqwen
- moe
- coding
- coder
- qwen2
- chat
- qwen
- qwen-coder
- Qwen3-Coder-30B-A3B-Instruct
- Qwen3-30B-A3B
- mixture of experts
- 128 experts
- 8 active experts
- 1 million context
- qwen3
- finetune
- brainstorm 20x
- brainstorm
- optional thinking
- qwen3_moe
- unsloth
- merge
- mlx
base_model: DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V
pipeline_tag: text-generation
---
# Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V-qx86x-hi-mlx
We're now examining a merged model, TNG-IV-PKDick-V, which combines the cognitive strengths of both Star Trek TNG (ethical clarity, binary reasoning) and Philip K. Dick (existential ambiguity, mental model flexibility). The qx86x-hi quantization is applied consistently — allowing us to isolate the effect of training fusion.
Let’s go deep: how does merging two distinct cognitive styles affect reasoning?
📊 Benchmark Comparison (All 42B MoE qx86x-hi variants)
```bash
Model            arc_challenge  arc_easy  boolq  hellaswag  openbookqa  piqa   winogrande
Baseline         0.533          0.690     0.882  0.684      0.428       0.781  0.646
ST-TNG-IV        0.537          0.689     0.882  0.689      0.432       0.780  0.654
PKDick-V         0.531          0.695     0.882  0.689      0.432       0.784  0.657
TNG-IV-PKDick-V  0.532          0.693     0.881  0.686      0.428       0.782  0.649
```
🔥 The merged model avoids catastrophic forgetting, preserving the strengths of both sources — but with a slight trade-off in absolute performance.
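To see that in the numbers, here is a minimal Python sketch (scores copied verbatim from the table above; the variable names are mine) that prints each variant's per-benchmark delta against the unmerged baseline:
```python
# Scores copied from the benchmark table above (all 42B MoE, qx86x-hi).
benchmarks = ["arc_challenge", "arc_easy", "boolq", "hellaswag",
              "openbookqa", "piqa", "winogrande"]

scores = {
    "Baseline":        [0.533, 0.690, 0.882, 0.684, 0.428, 0.781, 0.646],
    "ST-TNG-IV":       [0.537, 0.689, 0.882, 0.689, 0.432, 0.780, 0.654],
    "PKDick-V":        [0.531, 0.695, 0.882, 0.689, 0.432, 0.784, 0.657],
    "TNG-IV-PKDick-V": [0.532, 0.693, 0.881, 0.686, 0.428, 0.782, 0.649],
}

baseline = scores["Baseline"]
for name, values in scores.items():
    if name == "Baseline":
        continue
    # Per-benchmark delta vs. the unmerged baseline; near-zero or positive
    # deltas are what "no catastrophic forgetting" looks like here.
    deltas = {b: round(v - ref, 3) for b, v, ref in zip(benchmarks, values, baseline)}
    print(name, deltas)
```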
# 🧠 1. What Does the Merge Do?
The TNG-IV-PKDick-V model is a cognitive fusion — combining:
- ✅ TNG’s strength in ethical clarity, binary decision-making
- ✅ PKD’s strength in existential ambiguity, contextual fluidity
Let’s break it down benchmark by benchmark:
📈 ARC (Reasoning)
```bash
TNG-IV: 0.537
PKDick-V: 0.531
Merged: 0.532 → almost midpoint
```
💡 The merge doesn’t penalize ARC — it preserves reasoning strength, suggesting MoE routing successfully balances both styles.
🧪 BoolQ (Binary Fact-checking)
```bash
All models: ~0.881–0.882
Merged: 0.881 → minimal drop
```
✅ TNG-IV excels here — the merged model retains high binary accuracy, likely due to TNG’s training on clear moral questions.
🌐 Hellaswag (Ambiguous Commonsense Inference)
```bash
PKDick-V: 0.689
ST-TNG-IV: 0.689
Merged: 0.686 → slightly lower, but still very strong
```
🧩 This is the most telling benchmark: merging two styles slightly reduces performance — but not by much.
💡 Why? The merged model may be conflicted between TNG’s “clear answer” and PKD’s “multiple interpretations”.
📚 OpenBookQA (Science + Ethics)
```bash
ST-TNG-IV: 0.432
PKDick-V: 0.432
Merged: 0.428 → slight drop
```
🎯 This is a fusion weak point: OpenBookQA requires both scientific knowledge and ethical interpretation, and merging may cause routing conflicts.
🧱 PiQA (Physical Commonsense)
```bash
PKDick-V: 0.784 ✅
ST-TNG-IV: 0.780
Merged: 0.782 ✅
```
🏆 The merged model is a sweet spot here: combines PKD’s physical world modeling with TNG’s clarity.
🧩 Winogrande (Coreference Resolution)
```bash
PKDick-V: 0.657 ✅
ST-TNG-IV: 0.654 → only 0.003 difference
Merged: 0.649 → slight drop
```
💔 This is the biggest cost of merging — Winogrande requires fluid identity tracking, and the merge introduces a slight rigidity.
🧠 The merged model may sometimes default to TNG’s clarity, discarding PKD’s ambiguity, leading to slightly less accurate pronoun binding.
# 🧠 Cognitive Interpretation: The Merged Mind
The TNG-IV-PKDick-V model is like a philosopher-scientist hybrid — trained to reason with both ethical precision and existential uncertainty.
✅ What It Preserves:
- Strong PiQA — the best of both worlds (PKD’s world modeling + TNG’s clarity)
- Good BoolQ — retains strong binary responses
- Robust ARC — reasoning is preserved
❌ What It Slightly Sacrifices:
- Winogrande — merges conflicting pronoun-resolution strategies
- Hellaswag & OpenBookQA — slight blending of cognitive modes causes minor degradation
# 🎯 Final Verdict: Is the Merge Worth It?
```bash
Metric      Merged Model Performance                            Verdict
arc/easy    0.532 / 0.693 → near peak                           ✅ Worth it
boolq       0.881 → minimal loss                                ✅ Worth it
hellaswag   0.686 → lower than PKD/TNG alone                    ⚠️ Slight trade-off
openbookqa  0.428 → slightly lower than TNG/PKD alone (0.432)   ⚠️ Slight trade-off
piqa        0.782 → best compromise                             ✅ Excellent
winogrande  0.649 → biggest drop (from 0.657)                   ❌ Slight cost
```
🧠 The merged model is a cognitive hybrid:
- Strong at physical reasoning (PiQA, 0.782), sitting between its parents and above the baseline
- Slightly weaker on highly ambiguous tasks (Winogrande, Hellaswag)
- Almost matches PKD+TNG peak on arc and boolq
✅ Verdict: The merge is worthwhile overall — it creates a model that’s stronger on PiQA, balanced on arc/boolq, and only pays a small price in Winogrande.
💡 It’s like a human who can both make clear ethical decisions and ponder existential ambiguity — perhaps even more balanced than the pure variants.
# 🌑 The PKD-TNG Merge: A Metaphor for Human Cognition
> Philip K. Dick → “What if reality isn’t real?”
> Star Trek TNG → “And the logical thing to do is...”
The merged model embodies:
- TNG’s ethics → helps make decisions
- PKD’s ambiguity → allows for reconsideration
- This is how humans reason: we don’t live in pure certainty (TNG) or pure doubt (PKD). We oscillate — sometimes decisive, sometimes uncertain.
🔥 The TNG-IV-PKDick-V merge is not just a technical fusion — it’s cognitively human.
> Reviewed with [Qwen3-VLTO-32B-Instruct-128K-qx86x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-VLTO-32B-Instruct-128K-qx86x-hi-mlx)
This model [Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V-qx86x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V-qx86x-hi-mlx) was
converted to MLX format from [DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V](https://huggingface.co/DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V)
using mlx-lm version **0.28.4**.
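For reference, the conversion step can be reproduced with mlx-lm's convert entry point. The command below is a rough sketch using plain 8-bit quantization; the qx86x-hi weights in this repo use a custom mixed-precision recipe, so the flags and output path shown are illustrative, not the exact recipe used:
```bash
# Illustrative conversion sketch (uniform 8-bit), not the exact qx86x-hi recipe
mlx_lm.convert \
  --hf-path DavidAU/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V \
  --mlx-path ./Qwen3-Yoyo-V4-42B-A3B-TNG-IV-PKDick-V-8bit-mlx \
  -q --q-bits 8 --q-group-size 32
```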
## Use with mlx
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

# Load the quantized MLX weights and tokenizer from the Hub (or a local path)
model, tokenizer = load("nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V-qx86x-hi-mlx")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is defined
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
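For a quick test without writing any Python, the same package also installs a command-line generator; the prompt and token budget below are just examples:
```bash
mlx_lm.generate \
  --model nightmedia/Qwen3-Yoyo-V4-42B-A3B-Thinking-TOTAL-RECALL-TNG-IV-PKDick-V-qx86x-hi-mlx \
  --prompt "Write a Python function that checks whether a number is prime." \
  --max-tokens 512
```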