---
title: Docker Model Runner
emoji: 🐳
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
pinned: false
---

# Docker Model Runner

An Anthropic API-compatible server with interleaved thinking support.

## Hardware

- **CPU Basic**: 2 vCPU · 16 GB RAM

## Quick Start

```bash
pip install anthropic
export ANTHROPIC_BASE_URL=https://likhonsheikhdev-docker-model-runner.hf.space
export ANTHROPIC_API_KEY=any-key
```
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1000,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hi, how are you?"}]
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking:\n{block.thinking}\n")
    elif block.type == "text":
        print(f"Text:\n{block.text}\n")
```

## Interleaved Thinking

Enable thinking to get reasoning steps interleaved with responses:

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)

message = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={
        "type": "enabled",
        "budget_tokens": 200
    },
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Response contains interleaved thinking and text blocks
for block in message.content:
    if block.type == "thinking":
        print(f"💭 Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"📝 Response: {block.text}")
```

## Streaming with Thinking

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)

with client.messages.stream(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=[{"role": "user", "content": "Hello!"}]
) as stream:
    for event in stream:
        if hasattr(event, 'type'):
            if event.type == 'content_block_start':
                print(f"\n[{event.content_block.type}]", end=" ")
            elif event.type == 'content_block_delta':
                if hasattr(event.delta, 'thinking'):
                    print(event.delta.thinking, end="")
                elif hasattr(event.delta, 'text'):
                    print(event.delta.text, end="")
```
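The event-by-event printing above can also be folded into complete blocks before display. A minimal sketch of that accumulation logic, using plain dicts to stand in for the SDK's event objects (the mock events below are illustrative, not actual server output):

```python
def accumulate_blocks(events):
    """Fold content_block_start/delta events into complete content blocks."""
    blocks = []
    for event in events:
        if event["type"] == "content_block_start":
            # A new block begins; start it with an empty body.
            blocks.append({"type": event["block_type"], "body": ""})
        elif event["type"] == "content_block_delta":
            # Deltas always extend the most recently started block.
            blocks[-1]["body"] += event["delta"]
    return blocks

# Mock event stream: one thinking block followed by one text block.
events = [
    {"type": "content_block_start", "block_type": "thinking"},
    {"type": "content_block_delta", "delta": "User greets; "},
    {"type": "content_block_delta", "delta": "reply warmly."},
    {"type": "content_block_start", "block_type": "text"},
    {"type": "content_block_delta", "delta": "Hello!"},
]

blocks = accumulate_blocks(events)
print(blocks)
```

This keeps streaming latency for display while still ending up with the same block list a non-streaming call would return.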

## Multi-Turn with Thinking History

**Important:** In multi-turn conversations, append the complete model response (including thinking blocks) to maintain reasoning chain continuity.

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)

messages = [{"role": "user", "content": "What is 2+2?"}]

# First turn
response = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=messages
)

# Append full response (including thinking) to history
messages.append({
    "role": "assistant",
    "content": response.content  # Includes both thinking and text blocks
})

# Second turn
messages.append({"role": "user", "content": "Now multiply that by 3"})

response2 = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=messages
)
```
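When history is stored outside the SDK (for example in a database between turns), the same rule applies: serialize every block, thinking included. A sketch using plain dicts in place of the SDK's block objects (the block shapes are assumptions based on the Messages API, and the mock content is made up):

```python
def to_history_blocks(content):
    """Convert response content blocks into plain dicts for the message history."""
    history = []
    for block in content:
        if block["type"] == "thinking":
            history.append({"type": "thinking", "thinking": block["thinking"]})
        elif block["type"] == "text":
            history.append({"type": "text", "text": block["text"]})
    return history

# Mock first-turn response content (stand-ins for SDK block objects).
content = [
    {"type": "thinking", "thinking": "2+2 is basic arithmetic."},
    {"type": "text", "text": "2 + 2 = 4."},
]

messages = [{"role": "user", "content": "What is 2+2?"}]
# Keep the thinking block in history so the reasoning chain stays intact.
messages.append({"role": "assistant", "content": to_history_blocks(content)})
messages.append({"role": "user", "content": "Now multiply that by 3"})
print(len(messages))
```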

## Supported Models

| Model | Description |
|-------|-------------|
| MiniMax-M2 | Agentic capabilities, advanced reasoning |
| MiniMax-M2-Stable | High concurrency and commercial use |

## API Compatibility

### Parameters

| Parameter | Status |
|-----------|--------|
| model | ✅ Fully supported |
| messages | ✅ Partial (text, tool calls) |
| max_tokens | ✅ Fully supported |
| stream | ✅ Fully supported |
| system | ✅ Fully supported |
| temperature | ✅ Range (0.0, 1.0] |
| thinking | ✅ Fully supported |
| thinking.budget_tokens | ✅ Fully supported |
| tools | ✅ Fully supported |
| tool_choice | ✅ Fully supported |
| top_p | ✅ Fully supported |
| metadata | ✅ Fully supported |
| top_k | ⚪ Ignored |
| stop_sequences | ⚪ Ignored |
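Given the table above, a client-side check can catch out-of-range or silently ignored parameters before a network round trip. A minimal sketch (the helper name and message strings are hypothetical; the rules mirror the table):

```python
def check_params(params):
    """Return a list of issues for a request dict, per the compatibility table."""
    issues = []
    t = params.get("temperature")
    # temperature must lie in the half-open range (0.0, 1.0].
    if t is not None and not (0.0 < t <= 1.0):
        issues.append("temperature must be in (0.0, 1.0]")
    # These parameters are accepted but have no effect on this server.
    for ignored in ("top_k", "stop_sequences"):
        if ignored in params:
            issues.append(f"{ignored} is ignored by this server")
    return issues

print(check_params({"temperature": 0.0}))              # out of range
print(check_params({"temperature": 0.7, "top_k": 40})) # top_k has no effect
```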

### Message Types

| Type | Status |
|------|--------|
| text | ✅ Supported |
| thinking | ✅ Supported |
| tool_use | ✅ Supported |
| tool_result | ✅ Supported |
| image | ❌ Not supported |
| document | ❌ Not supported |
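Since `tool_use` and `tool_result` are supported, a tool round trip follows the standard Anthropic Messages shape: the model emits a `tool_use` block, and the next user message carries a `tool_result` referencing its id. A sketch with plain dicts (the id, tool name, and result value are made up for illustration):

```python
# Assistant turn containing the model's tool call (mock values).
assistant_turn = {
    "role": "assistant",
    "content": [{
        "type": "tool_use",
        "id": "toolu_01A",            # hypothetical tool-call id
        "name": "get_weather",        # hypothetical tool
        "input": {"city": "Dhaka"},
    }],
}

# The follow-up user turn must echo the tool_use id in tool_use_id.
tool_result_turn = {
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": assistant_turn["content"][0]["id"],
        "content": "31°C, humid",     # mock tool output
    }],
}

print(tool_result_turn["content"][0]["tool_use_id"])
```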

## Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /v1/messages | POST | Anthropic Messages API |
| /v1/chat/completions | POST | OpenAI Chat Completions API |
| /v1/models | GET | List models |
| /health | GET | Health check |
| /info | GET | API info |

## cURL Example

```bash
curl -X POST https://likhonsheikhdev-docker-model-runner.hf.space/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -d '{
    "model": "MiniMax-M2",
    "max_tokens": 1024,
    "thinking": {"type": "enabled", "budget_tokens": 100},
    "messages": [
      {"role": "user", "content": "Explain AI briefly"}
    ]
  }'
```