litert-community/Qwen3-0.6B
Main Model Card: Qwen/Qwen3-0.6B
This model card provides a few variants of the Qwen3-0.6B model that are ready for deployment on Android and Desktop.
How to Use
Android (Google AI Edge Gallery)
You can either install Google AI Edge Gallery through the Open Beta in the Play Store or install the APK from GitHub.
To build the demo app from source, please follow the instructions from the GitHub repository.
Android (LiteRT-LM)
1. Add the dependency
Make sure you have the necessary dependency in your Gradle file.
dependencies {
    implementation("com.google.ai.edge.litertlm:litertlm:<LATEST_VERSION>")
}
2. Inference with the LiteRT-LM API
import com.google.ai.edge.litertlm.*

suspend fun main() {
    Engine.setNativeMinLogSeverity(LogSeverity.ERROR) // Hide native logs for a TUI app.

    val engineConfig = EngineConfig(
        modelPath = "/path/to/your/model.litertlm", // Replace with your model path.
        backend = Backend.CPU, // Or Backend.GPU
    )

    Engine(engineConfig).use { engine ->
        engine.initialize()
        engine.createConversation().use { conversation ->
            // Send an initial message and stream the response.
            // See the Content class for other content variants.
            val message = Message.of(Content.Text("Tell me a joke."))
            conversation.sendMessageAsync(message).collect { print(it) }

            // Interactive loop: read user input and stream each reply.
            while (true) {
                print("\n>>> ")
                conversation.sendMessageAsync(Message.of(readln())).collect { print(it) }
            }
        }
    }
}
Try running this model on the NPU by using this .litertlm file and setting your EngineConfig's backend to NPU, as sketched below. To check whether your phone's NPU is supported, see this guide.
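As a minimal sketch, assuming the same EngineConfig API as in the sample above (the model path is a placeholder):

    val npuEngineConfig = EngineConfig(
        modelPath = "/path/to/your/npu_model.litertlm", // Placeholder: use the NPU .litertlm file linked above.
        backend = Backend.NPU, // Requires a supported NPU; see the guide linked above.
    )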
Desktop
For Desktop applications, C++ is currently the recommended path; see the following code sample.
// Create the engine settings.
auto engine_settings = EngineSettings::CreateDefault(
    model_assets,
    /*backend=*/litert::lm::Backend::CPU);

// The same steps to create Engine and Conversation as above...

// Send a text message to the LLM.
absl::StatusOr<Message> model_message = (*conversation)->SendMessage(
    JsonMessage{
        {"role", "user"},
        {"content", {  // The content field is an array of message parts.
            {{"type", "text"}, {"text", "Tell me a joke."}}
        }},
    });
CHECK_OK(model_message);

// Print the model's response.
std::cout << *model_message << std::endl;
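Note that the "content" field is an array, so a single JsonMessage can carry multiple parts; since Qwen3-0.6B is a text-only model, only "type": "text" parts apply here.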
Performance
Android
Benchmarked on a Vivo X300 Pro.
| Backend | Quantization scheme | Context length | Prefill (tokens/sec) | Decode (tokens/sec) | Model size (MB) | Model File |
|---|---|---|---|---|---|---|
| CPU | dynamic_int8 | 4096 | 165 | 9 | 586 | |
| GPU | dynamic_int8 | 4096 | 580 | 21 | 586 | |
| NPU | a16w8 | 4096 | 1,472 | 36 | 992 | |
Notes:
- Model size: measured as the size of the file on disk.
- Inference on CPU is accelerated via the LiteRT XNNPACK delegate with 4 threads.
- Benchmarks are run with the cache enabled and initialized; during the first run, latency and memory usage may differ.