---
title: Docker Model Runner
emoji: 🐳
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
pinned: false
---
# Docker Model Runner

An Anthropic-compatible API with Interleaved Thinking support.

## Hardware

- CPU Basic: 2 vCPU · 16 GB RAM
## Quick Start

```bash
pip install anthropic

export ANTHROPIC_BASE_URL=https://likhonsheikhdev-docker-model-runner.hf.space
export ANTHROPIC_API_KEY=any-key
```

```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1000,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hi, how are you?"}]
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking:\n{block.thinking}\n")
    elif block.type == "text":
        print(f"Text:\n{block.text}\n")
```
## Interleaved Thinking

Enable thinking to get reasoning steps interleaved with responses:

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)

message = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={
        "type": "enabled",
        "budget_tokens": 200
    },
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Response contains interleaved thinking and text blocks
for block in message.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")
```
## Streaming with Thinking

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)

with client.messages.stream(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=[{"role": "user", "content": "Hello!"}]
) as stream:
    for event in stream:
        if hasattr(event, "type"):
            if event.type == "content_block_start":
                print(f"\n[{event.content_block.type}]", end=" ")
            elif event.type == "content_block_delta":
                if hasattr(event.delta, "thinking"):
                    print(event.delta.thinking, end="")
                elif hasattr(event.delta, "text"):
                    print(event.delta.text, end="")
```
## Multi-Turn with Thinking History

**Important:** In multi-turn conversations, append the complete model response (including thinking blocks) to maintain reasoning-chain continuity.

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)

messages = [{"role": "user", "content": "What is 2+2?"}]

# First turn
response = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=messages
)

# Append the full response (including thinking) to the history
messages.append({
    "role": "assistant",
    "content": response.content  # Includes both thinking and text blocks
})

# Second turn
messages.append({"role": "user", "content": "Now multiply that by 3"})

response2 = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=messages
)
```
## Supported Models

| Model | Description |
|---|---|
| MiniMax-M2 | Agentic capabilities, advanced reasoning |
| MiniMax-M2-Stable | High concurrency and commercial use |
## API Compatibility

### Parameters

| Parameter | Status |
|---|---|
| model | ✅ Fully supported |
| messages | ⚠️ Partial (text, tool calls) |
| max_tokens | ✅ Fully supported |
| stream | ✅ Fully supported |
| system | ✅ Fully supported |
| temperature | ✅ Range (0.0, 1.0] |
| thinking | ✅ Fully supported |
| thinking.budget_tokens | ✅ Fully supported |
| tools | ✅ Fully supported |
| tool_choice | ✅ Fully supported |
| top_p | ✅ Fully supported |
| metadata | ✅ Fully supported |
| top_k | ⚪ Ignored |
| stop_sequences | ⚪ Ignored |
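As a client-side illustration of the table above, the sketch below is a hypothetical helper (not part of this Space) that drops the parameters the server ignores and enforces the documented temperature range; all names here are illustrative:

```python
# Parameters the server honors, per the compatibility table above.
SUPPORTED_PARAMS = {
    "model", "messages", "max_tokens", "stream", "system", "temperature",
    "thinking", "tools", "tool_choice", "top_p", "metadata",
}

def prepare_payload(**params):
    """Validate temperature against the documented (0.0, 1.0] range and
    strip parameters the server ignores (top_k, stop_sequences)."""
    temp = params.get("temperature")
    if temp is not None and not (0.0 < temp <= 1.0):
        raise ValueError("temperature must be in (0.0, 1.0]")
    return {k: v for k, v in params.items() if k in SUPPORTED_PARAMS}

payload = prepare_payload(
    model="MiniMax-M2",
    max_tokens=256,
    temperature=0.7,
    top_k=40,  # ignored server-side, so dropped client-side
    messages=[{"role": "user", "content": "Hi"}],
)
```

Sending top_k or stop_sequences is harmless (they are silently ignored), but filtering them keeps request payloads honest about what actually affects the output.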
### Message Types

| Type | Status |
|---|---|
| text | ✅ Supported |
| thinking | ✅ Supported |
| tool_use | ✅ Supported |
| tool_result | ✅ Supported |
| image | ❌ Not supported |
| document | ❌ Not supported |
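The supported block types can be dispatched on directly when handling raw JSON. Below is a minimal sketch that walks a response body in the standard Anthropic wire format; the dict shown is a hand-written stand-in, not actual server output:

```python
# Hand-written stand-in for a Messages API response body (standard wire format).
response = {
    "content": [
        {"type": "thinking", "thinking": "The user wants the current time."},
        {"type": "text", "text": "Let me check the time."},
        {"type": "tool_use", "id": "toolu_01", "name": "get_time", "input": {}},
    ]
}

def summarize_blocks(content):
    """Return (type, payload) pairs for each supported block type."""
    out = []
    for block in content:
        if block["type"] == "thinking":
            out.append(("thinking", block["thinking"]))
        elif block["type"] == "text":
            out.append(("text", block["text"]))
        elif block["type"] == "tool_use":
            out.append(("tool_use", block["name"]))
        elif block["type"] == "tool_result":
            out.append(("tool_result", block.get("content")))
    return out

print(summarize_blocks(response["content"]))
```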
## Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /v1/messages | POST | Anthropic Messages API |
| /v1/chat/completions | POST | OpenAI Chat API |
| /v1/models | GET | List models |
| /health | GET | Health check |
| /info | GET | API info |
## cURL Example

```bash
curl -X POST https://likhonsheikhdev-docker-model-runner.hf.space/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -d '{
    "model": "MiniMax-M2",
    "max_tokens": 1024,
    "thinking": {"type": "enabled", "budget_tokens": 100},
    "messages": [
      {"role": "user", "content": "Explain AI briefly"}
    ]
  }'
```