```
▐▛██▜▌  Claude Code v2.1.23
▝████▘  Kimi-K2.5 · API Usage Billing
        ~/dev/vllm · /model to try Opus 4.5

❯ hey
● Hello! How can I help you today?

❯ what model are you?
● I'm Claude Kimi-K2.5, running in a local environment on Linux.
```
It took a while to download, plus some vLLM hybrid-inference magic, but I got it running on my desktop workstation.
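For the curious, here is roughly what that looks like through vLLM's offline Python API. This is a minimal sketch, not my exact setup: the model repo id is a placeholder, and the offload size and `speculative_config` keys vary by vLLM version and hardware, so treat them as assumptions.

```python
from vllm import LLM, SamplingParams

# Hybrid GPU + CPU-RAM inference: weights that don't fit in VRAM
# are spilled to system memory via cpu_offload_gb.
llm = LLM(
    model="zai-org/GLM-4.7-Flash-AWQ",  # hypothetical repo id; use your local AWQ checkpoint
    quantization="awq",                  # 4-bit AWQ weights (~18.5 GB)
    max_model_len=200_000,               # 200k-token context window
    cpu_offload_gb=16,                   # GB of weights to keep in CPU RAM (tune to your box)
    # MTP speculative decoding; the exact method string and keys
    # depend on your vLLM version.
    speculative_config={"method": "mtp", "num_speculative_tokens": 1},
)

params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(["what model are you?"], params)
print(outputs[0].outputs[0].text)
```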
GLM-4.7-Flash is fast, good and cheap. 3,074 tokens/sec peak at 200k tokens context window on my desktop PC. Works with Claude Code and opencode for hours. No errors, drop-in replacement of the Anthropic cloud AI. MIT licensed, open weights, free for commercial use and modifications. Supports speculative decoding using MTP, which is highly effective in mitigating latency. Great for on device AI coding as AWQ 4bit at 18.5 GB. Hybrid inference on a single consumer GPU + CPU RAM.
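The drop-in part works because Claude Code can be pointed at any Anthropic-compatible endpoint via the documented ANTHROPIC_BASE_URL environment variable. Here's a minimal sketch using the Anthropic Python SDK the same way; it assumes your local stack (vLLM behind a proxy, or any server exposing the Anthropic Messages API) listens on localhost:8000, and the URL, key, and model name are all placeholders:

```python
import anthropic

# Same redirect Claude Code itself uses when you export
# ANTHROPIC_BASE_URL=http://localhost:8000 before launching it.
client = anthropic.Anthropic(
    base_url="http://localhost:8000",  # placeholder local endpoint
    api_key="local",                   # any non-empty string works for a local server
)

reply = client.messages.create(
    model="GLM-4.7-Flash",             # placeholder; whatever name your server registers
    max_tokens=256,
    messages=[{"role": "user", "content": "what model are you?"}],
)
print(reply.content[0].text)
```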