Qwen3-8B-Drama-Thinking

This model is a full-parameter fine-tune of Qwen/Qwen3-8B on a custom drama thinking dataset with explicit creative reasoning chains.

Model Description

  • Base Model: Qwen3-8B (8 billion parameters)
  • Training Method: Full Parameter Fine-tuning (NOT LoRA)
  • Training Framework: ms-swift
  • Training Data: Custom Drama Thinking Dataset (6,319 samples, avg ~5,000 tokens)
  • Specialization: Screenwriting with explicit <think>...</think> creative reasoning
  • Hardware: 2x NVIDIA H100 80GB SXM5
  • Training Time: 2 hours 46 minutes (3 epochs)
  • Training Cost: ~$17.86

Key Features

🎬 Professional Screenwriting Assistant

This model generates dramatic scripts with explicit creative deliberation:

  • ✅ Thinking Process Visible: Uses <think>...</think> tags to show internal reasoning
  • ✅ Deep Character Psychology: Analyzes motivations, defense mechanisms, subtext
  • ✅ Structural Planning: Three-act structure, emotional arcs, pacing decisions
  • ✅ Visual Storytelling: Symbolism, atmosphere, cinematographic choices
  • ✅ Professional Format: Correct screenplay formatting (scene headers, action lines, dialogue)

📊 Performance Comparison

Compared to base Qwen3-8B:

| Metric | Base Model | Fine-Tuned | Improvement |
|---|---|---|---|
| Output Length | 1,071 tokens | 3,874 tokens | +262% |
| Thinking Depth | 5/10 | 9/10 | +80% |
| Creative Reasoning | 500 tokens | 3,400 tokens | +580% |
| Craft Analysis | Generic | Professional | Qualitative leap |

🎯 Unique Value Proposition

This is not just a text generator - it's a creative thinking partner that externalizes the entire screenwriting process: from title analysis to character psychology to structural planning to final execution.

Training Details

Training Configuration

Model:              Qwen/Qwen3-8B
Template:           qwen3_thinking
Training Type:      Full Parameter (all 8B parameters)
Max Length:         8192 tokens (for long thinking chains)
Batch Size:         1 per device × 2 GPUs
Gradient Accum:     8 steps (effective batch size: 16)
Learning Rate:      1e-5
Epochs:             3
Optimization:       DeepSpeed Zero3 + Gradient Checkpointing
                    Liger Kernel, BF16 mixed precision
Loss Scale:         ignore_empty_think
GPU Memory:         ~74.62 GB per H100 (stable)
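
These figures are internally consistent: with a per-device batch of 1, 2 GPUs, and 8 gradient-accumulation steps, the effective batch size is 16, which over 6,319 samples and 3 epochs yields the 1,185 optimizer steps reported below. A minimal sketch of that arithmetic (plain Python, no ms-swift dependency):

import math

# Figures quoted elsewhere in this card
samples = 6_319
epochs = 3
per_device_batch = 1
num_gpus = 2
grad_accum = 8

# Effective batch size = per-device batch x GPUs x gradient accumulation
effective_batch = per_device_batch * num_gpus * grad_accum  # 16

# Optimizer steps per epoch, then in total
steps_per_epoch = math.ceil(samples / effective_batch)      # 395
total_steps = steps_per_epoch * epochs                       # 1,185

print(effective_batch, total_steps)  # 16 1185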

Dataset Characteristics

  • Samples: 6,319 dramatic script continuations
  • Average Length: ~5,000 tokens per sample
  • Max Length: ~6,100 tokens
  • Format: Conversations with <think>...</think> reasoning tags
  • Content:
    • Script opening scenes (title, description, initial dialogue)
    • Extensive creative deliberation (3,000+ tokens of thinking)
    • Script continuation with proper formatting
  • Style: Dramatic, emotionally intense scenarios (conflicts, reconciliation, tragedy)
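
For orientation, a single record in this conversations format might look roughly like the sketch below. It is an illustrative reconstruction based on the bullets above and the usage prompt later in this card; the field names and wording are assumptions, not the published schema of the (private) dataset.

# Hypothetical example of one training record (field names assumed;
# follows the common "messages" conversation layout used by ms-swift).
example_record = {
    "messages": [
        {
            "role": "user",
            "content": (
                "Title: The Reunion\n"
                "Description: Two estranged siblings meet at their childhood home after 20 years.\n\n"
                "INT. FAMILY LIVING ROOM - DAY\n\n"
                "SARAH (35) stands by the window, looking out at the garden.\n"
                "MICHAEL (38) enters, hesitant."
            ),
        },
        {
            "role": "assistant",
            "content": (
                "<think>\n"
                "Several thousand tokens of creative deliberation: title analysis,\n"
                "character psychology, three-act planning, visual symbolism...\n"
                "</think>\n\n"
                "INT. FAMILY LIVING ROOM - DAY\n\n"
                "SARAH turns from the window. A long silence.\n"
            ),
        },
    ]
}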

Training Metrics

  • Final Loss: 0.844
  • Average Loss: 0.978
  • Loss Trajectory: 1.602 (start) → 0.82-0.83 (end)
  • Training Speed: ~8 seconds/iteration
  • Total Steps: 1,185
  • Checkpoints: 5 saved (400, 800, 900, 1000, 1185)

Usage

Quick Start (ms-swift)

# Install ms-swift
pip install ms-swift

# Inference (interactive mode)
swift infer \
    --ckpt_dir FutureMa/Qwen3-8B-Drama-Thinking \
    --template qwen3_thinking \
    --max_new_tokens 4096 \
    --temperature 0.7

Python API

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import PtEngine, InferRequest, RequestConfig

# Initialize engine
engine = PtEngine(
    model_id_or_path="FutureMa/Qwen3-8B-Drama-Thinking",
    max_batch_size=1,
    torch_dtype="bfloat16"
)

# Create prompt
prompt = """Title: The Reunion
Description: Two estranged siblings meet at their childhood home after 20 years.

INT. FAMILY LIVING ROOM - DAY

SARAH (35) stands by the window, looking out at the garden.
MICHAEL (38) enters, hesitant."""

# Generate (streaming)
messages = [{'role': 'user', 'content': prompt}]
request = InferRequest(messages=messages)
config = RequestConfig(max_tokens=4096, temperature=0.7, stream=True)

for response in engine.infer([request], config)[0]:
    if response:
        print(response.choices[0].delta.content, end='', flush=True)

Using with Transformers (requires adapting checkpoint)

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "FutureMa/Qwen3-8B-Drama-Thinking",
    torch_dtype="bfloat16",
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

messages = [
    {"role": "system", "content": "You are a creative screenwriter assistant specialized in dramatic storytelling with internal reasoning."},
    {"role": "user", "content": "Write a scene about forgiveness..."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
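
Because the model front-loads a long <think> block, it is often convenient to split the reasoning from the screenplay after decoding. A minimal sketch using plain string handling (not part of the transformers or ms-swift APIs; it assumes the <think>/</think> tags survive decoding):

def split_thinking(generated_text: str):
    """Separate the <think>...</think> reasoning from the screenplay that follows."""
    open_tag, close_tag = "<think>", "</think>"
    start = generated_text.find(open_tag)
    end = generated_text.find(close_tag)
    if start == -1 or end == -1:
        # No complete thinking block (e.g. the generation was truncated)
        return None, generated_text
    thinking = generated_text[start + len(open_tag):end].strip()
    script = generated_text[end + len(close_tag):].strip()
    return thinking, script

# With the transformers example above:
full_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
thinking, script = split_thinking(full_text)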

Example Output

Input Prompt:

Title: The Last Apology
Description: A daughter arrives at her estranged father's deathbed...

Model Output (excerpt):

<think>
Okay. Let's go. I'm sitting here with the opening of *The Last Apology* fresh in my mind,
and I need to figure out exactly where this goes next - not just the next scene, but the next
emotional beat, the next structural pivot...

First, the title: *The Last Apology*. That's not just poetic - it's structural. It tells me
this story is about delayed recognition, about the finality of words left unsaid...

[3,400 tokens of deep creative analysis including:]
- Title deconstruction and thematic implications
- Character psychology analysis
- Three-act structure planning
- Visual language and symbolism
- Multiple narrative paths considered
- Professional screenwriting techniques
</think>

INT. HOSPITAL ROOM - NIGHT

ANNA (28), in a wrinkled business suit, hesitates at the doorway.

DAVID (65) lies in bed, breathing labored...

[Script continues with proper formatting]

Intended Use

✅ Recommended Use Cases

  1. Screenwriting Education: Learn professional creative thinking process
  2. Script Ideation: Generate story frameworks and narrative alternatives
  3. Story Consulting: Explore "what if" scenarios with explicit reasoning
  4. Creative Brainstorming: Understand decision-making in storytelling
  5. Draft Development: Plan structure before execution

โŒ Not Recommended For

  1. Final Shooting Scripts: Requires human refinement for production
  2. Comedy/Action Genres: Training bias toward dramatic content
  3. Long-form Series: Single-pass generation may lack consistency
  4. Immediate Production: Dialogue needs naturalization

Evaluation Results

Quantitative Metrics (vs. Base Model)

| Aspect | Fine-Tuned | Base Model | Improvement |
|---|---|---|---|
| Thinking Depth | 9/10 | 5/10 | +80% |
| Script Format | 9/10 | 8/10 | +13% |
| Dramatic Craft | 8.5/10 | 8/10 | +6% |
| Character Psychology | 9/10 | 6/10 | +50% |
| Decision Transparency | 9/10 | 5/10 | +80% |
| Overall | 8.1/10 | 6.9/10 | +17% |

Qualitative Improvements

  • ✅ Professional Voice: Sounds like an experienced screenwriter
  • ✅ Structural Thinking: Explicit three-act planning
  • ✅ Meta-Awareness: "This isn't just a script. It's a reckoning."
  • ✅ Non-Linear Reasoning: Considers alternatives, backtracks, refines
  • ✅ Craft-Oriented: Explains why choices serve the story

Limitations

  1. Thinking Verbosity: Generates ~3,400 tokens of thinking (87% of output)

    • May be excessive for quick tasks
    • Consider using max_new_tokens to limit length
  2. Incomplete Execution: Token budget consumed by thinking

    • Many planned scenes not fully generated
    • May need a 6,000-8,000 token budget for complete scripts (see the sketch after this list)
  3. Dialogue Naturalness: More direct/literary than conversational

    • Training data style influences output
    • May need post-processing for natural speech
  4. Training Data Bias: Skews toward melodramatic scenarios

    • Less suited for subtle/realistic dialogue
    • Best for emotionally intense stories
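
One hedged way to work around limitations 1 and 2 is to give the model a generous budget up front and then check whether the thinking block actually closed. The sketch below reuses the ms-swift engine, request, and RequestConfig objects from the Usage section plus the split_thinking helper sketched there; the 8,000-token budget is a suggestion derived from the limitation above, not a tested setting, and the response layout is not verified against every ms-swift version.

# Suggested (untested) budget for complete scripts, per limitation 2.
long_config = RequestConfig(max_tokens=8000, temperature=0.7)

result = engine.infer([request], long_config)[0]   # non-streaming call
full_text = result.choices[0].message.content

thinking, script = split_thinking(full_text)
if thinking is None or not script:
    # </think> never closed or no screenplay followed: the budget was still
    # consumed by reasoning, so retry with a larger max_tokens or a shorter prompt.
    print("Generation appears truncated; consider raising max_tokens.")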

Training Insights

What Made This Successful

  1. 8192 Token Context: Essential for capturing full thinking chains

    • An initial assumption of 2048 tokens would have truncated the data
    • Average sample length: ~5,000 tokens
  2. DeepSpeed Zero3: Required (not optional)

    • Single H100: Would need ~109-114 GB (OOM)
    • Zero3 sharding: ~74.62 GB per card ✅ (see the memory sketch after this list)
  3. Full Parameter Training: Worth the cost

    • Deeper capability transfer than LoRA
    • Better thinking process internalization
    • Cost: $17.86 (2.8 hours) vs ~$5 for LoRA
  4. Quality Training Data: 6,319 long-form reasoning examples

    • Actual creative process in <think> tags
    • High-quality dramatic writing
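
The ZeRO-3 point follows from simple parameter math. A rough back-of-the-envelope sketch (assumes BF16 weights and gradients plus FP32 Adam states, and ignores activations and framework overhead, so the totals are ballpark estimates rather than measured values):

# Rough full-parameter training memory estimate for an 8B-parameter model.
params = 8e9
weights_bf16 = 2 * params                # BF16 weights
grads_bf16 = 2 * params                  # BF16 gradients
adam_states_fp32 = (4 + 4 + 4) * params  # FP32 master weights + two moment buffers

total_gb = (weights_bf16 + grads_bf16 + adam_states_fp32) / 1024**3
# ~119 GB: well past a single 80 GB H100, and in the same ballpark as the
# ~109-114 GB figure above (the exact number depends on which states stay in BF16).
print(f"Unsharded: ~{total_gb:.0f} GB")

# ZeRO-3 shards weights, gradients, and optimizer states across GPUs, so the
# per-card share is roughly total / num_gpus plus activations and overhead,
# which is consistent with the ~74.62 GB per H100 observed during training.
print(f"Per GPU with ZeRO-3 over 2 cards: ~{total_gb / 2:.0f} GB + activations")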

Citation

@misc{qwen3-drama-thinking-2025,
  author = {FutureMa},
  title = {Qwen3-8B-Drama-Thinking: Full Parameter Fine-tuning for Creative Screenwriting},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/FutureMa/Qwen3-8B-Drama-Thinking}},
  note = {Full parameter fine-tuning on 6,319 drama samples with explicit reasoning chains}
}

Acknowledgments

  • Base Model: Qwen Team - Qwen3-8B
  • Training Framework: ms-swift - ModelScope SWIFT
  • Infrastructure: Lambda Cloud - 2x H100 80GB SXM5
  • Dataset: Custom Drama Thinking Dataset (6,319 samples)

Model Card Contact

For questions or feedback:

  • HuggingFace: @FutureMa
  • GitHub Issues: Report via ms-swift repository

Training Date: 2025-12-08
Training Duration: 2h 46m
Model Size: ~16GB (BF16 precision)
Recommended VRAM: 16GB+ for inference
