---
title: Docker Model Runner
emoji: 🐳
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
pinned: false
---

# Docker Model Runner

An Anthropic API-compatible server with interleaved thinking support.

## Hardware

- **CPU Basic**: 2 vCPU · 16 GB RAM

## Quick Start

```bash
pip install anthropic
export ANTHROPIC_BASE_URL=https://likhonsheikhdev-docker-model-runner.hf.space
export ANTHROPIC_API_KEY=any-key
```
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1000,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hi, how are you?"}]
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking:\n{block.thinking}\n")
    elif block.type == "text":
        print(f"Text:\n{block.text}\n")
```

## Interleaved Thinking

Enable thinking to get reasoning steps interleaved with responses:

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)

message = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={
        "type": "enabled",
        "budget_tokens": 200
    },
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Response contains interleaved thinking and text blocks
for block in message.content:
    if block.type == "thinking":
        print(f"💭 Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"📝 Response: {block.text}")
```

## Streaming with Thinking

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)

with client.messages.stream(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=[{"role": "user", "content": "Hello!"}]
) as stream:
    for event in stream:
        if hasattr(event, 'type'):
            if event.type == 'content_block_start':
                print(f"\n[{event.content_block.type}]", end=" ")
            elif event.type == 'content_block_delta':
                if hasattr(event.delta, 'thinking'):
                    print(event.delta.thinking, end="")
                elif hasattr(event.delta, 'text'):
                    print(event.delta.text, end="")
```
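The event-by-event printing above can also be folded into complete blocks before display. A minimal sketch of that accumulation logic, using plain dicts to stand in for the SDK's event objects (the mock events below are illustrative, not actual server output):

```python
def accumulate_blocks(events):
    """Fold content_block_start/delta events into complete content blocks."""
    blocks = []
    for event in events:
        if event["type"] == "content_block_start":
            # A new block begins; start it with an empty body.
            blocks.append({"type": event["block_type"], "body": ""})
        elif event["type"] == "content_block_delta":
            # Deltas always extend the most recently started block.
            blocks[-1]["body"] += event["delta"]
    return blocks

# Mock event stream: one thinking block followed by one text block.
events = [
    {"type": "content_block_start", "block_type": "thinking"},
    {"type": "content_block_delta", "delta": "User greets; "},
    {"type": "content_block_delta", "delta": "reply warmly."},
    {"type": "content_block_start", "block_type": "text"},
    {"type": "content_block_delta", "delta": "Hello!"},
]

blocks = accumulate_blocks(events)
print(blocks)
```

This keeps streaming latency for display while still ending up with the same block list a non-streaming call would return.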

## Multi-Turn with Thinking History

**Important:** In multi-turn conversations, append the complete model response (including thinking blocks) to maintain reasoning chain continuity.

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)

messages = [{"role": "user", "content": "What is 2+2?"}]

# First turn
response = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=messages
)

# Append full response (including thinking) to history
messages.append({
    "role": "assistant",
    "content": response.content  # Includes both thinking and text blocks
})

# Second turn
messages.append({"role": "user", "content": "Now multiply that by 3"})

response2 = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=messages
)
```
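When history is stored outside the SDK (for example in a database between turns), the same rule applies: serialize every block, thinking included. A sketch using plain dicts in place of the SDK's block objects (the block shapes are assumptions based on the Messages API, and the mock content is made up):

```python
def to_history_blocks(content):
    """Convert response content blocks into plain dicts for the message history."""
    history = []
    for block in content:
        if block["type"] == "thinking":
            history.append({"type": "thinking", "thinking": block["thinking"]})
        elif block["type"] == "text":
            history.append({"type": "text", "text": block["text"]})
    return history

# Mock first-turn response content (stand-ins for SDK block objects).
content = [
    {"type": "thinking", "thinking": "2+2 is basic arithmetic."},
    {"type": "text", "text": "2 + 2 = 4."},
]

messages = [{"role": "user", "content": "What is 2+2?"}]
# Keep the thinking block in history so the reasoning chain stays intact.
messages.append({"role": "assistant", "content": to_history_blocks(content)})
messages.append({"role": "user", "content": "Now multiply that by 3"})
print(len(messages))
```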

## Supported Models

| Model | Description |
|-------|-------------|
| MiniMax-M2 | Agentic capabilities, advanced reasoning |
| MiniMax-M2-Stable | High concurrency and commercial use |

## API Compatibility

### Parameters

| Parameter | Status |
|-----------|--------|
| model | ✅ Fully supported |
| messages | ✅ Partial (text, tool calls) |
| max_tokens | ✅ Fully supported |
| stream | ✅ Fully supported |
| system | ✅ Fully supported |
| temperature | ✅ Range (0.0, 1.0] |
| thinking | ✅ Fully supported |
| thinking.budget_tokens | ✅ Fully supported |
| tools | ✅ Fully supported |
| tool_choice | ✅ Fully supported |
| top_p | ✅ Fully supported |
| metadata | ✅ Fully supported |
| top_k | ⚪ Ignored |
| stop_sequences | ⚪ Ignored |
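Given the table above, a client-side check can catch out-of-range or silently ignored parameters before a network round trip. A minimal sketch (the helper name and message strings are hypothetical; the rules mirror the table):

```python
def check_params(params):
    """Return a list of issues for a request dict, per the compatibility table."""
    issues = []
    t = params.get("temperature")
    # temperature must lie in the half-open range (0.0, 1.0].
    if t is not None and not (0.0 < t <= 1.0):
        issues.append("temperature must be in (0.0, 1.0]")
    # These parameters are accepted but have no effect on this server.
    for ignored in ("top_k", "stop_sequences"):
        if ignored in params:
            issues.append(f"{ignored} is ignored by this server")
    return issues

print(check_params({"temperature": 0.0}))              # out of range
print(check_params({"temperature": 0.7, "top_k": 40})) # top_k has no effect
```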

### Message Types

| Type | Status |
|------|--------|
| text | ✅ Supported |
| thinking | ✅ Supported |
| tool_use | ✅ Supported |
| tool_result | ✅ Supported |
| image | ❌ Not supported |
| document | ❌ Not supported |
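Since `tool_use` and `tool_result` are supported, a tool round trip follows the standard Anthropic Messages shape: the model emits a `tool_use` block, and the next user message carries a `tool_result` referencing its id. A sketch with plain dicts (the id, tool name, and result value are made up for illustration):

```python
# Assistant turn containing the model's tool call (mock values).
assistant_turn = {
    "role": "assistant",
    "content": [{
        "type": "tool_use",
        "id": "toolu_01A",            # hypothetical tool-call id
        "name": "get_weather",        # hypothetical tool
        "input": {"city": "Dhaka"},
    }],
}

# The follow-up user turn must echo the tool_use id in tool_use_id.
tool_result_turn = {
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": assistant_turn["content"][0]["id"],
        "content": "31°C, humid",     # mock tool output
    }],
}

print(tool_result_turn["content"][0]["tool_use_id"])
```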

## Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /v1/messages | POST | Anthropic Messages API |
| /v1/chat/completions | POST | OpenAI Chat Completions API |
| /v1/models | GET | List models |
| /health | GET | Health check |
| /info | GET | API info |

## cURL Example

```bash
curl -X POST https://likhonsheikhdev-docker-model-runner.hf.space/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -d '{
    "model": "MiniMax-M2",
    "max_tokens": 1024,
    "thinking": {"type": "enabled", "budget_tokens": 100},
    "messages": [
      {"role": "user", "content": "Explain AI briefly"}
    ]
  }'
```