BERT Torrent Classifier
A fine-tuned BERT-tiny model for classifying torrent content into media types.
Model Details
- Base model: prajjwal1/bert-tiny
- Task: Multi-class text classification
- Labels: audio, video, software, book, other
- Format: ONNX (with embedded weights)
- Size: ~17MB
Training
- Training data: ~10k torrent names, labeled by consensus voting across 4 LLMs
- LLM ensemble: qwen2.5:3b, gemma3:4b, mistral:7b, qwen3-coder:30b
- Consensus rules: 4-agree = high confidence, 3v1 = majority vote, 2v2 = discarded
- Accuracy: ~92% on held-out test set
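The consensus rules above can be sketched as a small voting function. This is only an illustration, not the actual labeling pipeline: the `Consensus` enum and `consensus` function are hypothetical names, and the handling of splits other than 2v2 (for example 2-1-1) is an assumption, since the rules listed here only mention the 2v2 case.

```rust
use std::collections::HashMap;

/// Outcome of combining four LLM votes for one torrent name.
/// (Hypothetical type, introduced for illustration only.)
#[derive(Debug, PartialEq)]
enum Consensus {
    /// All four models chose the same label.
    HighConfidence(String),
    /// Three models agreed and one dissented.
    Majority(String),
    /// No label reached three votes, so the sample is dropped.
    Discarded,
}

/// Applies the 4-agree / 3v1 / 2v2 rules to exactly four votes.
fn consensus(votes: [&str; 4]) -> Consensus {
    // Count how many models voted for each label.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &vote in votes.iter() {
        *counts.entry(vote).or_insert(0) += 1;
    }

    // Pick the label with the most votes. Ties only occur at two votes
    // or fewer, and those cases are discarded anyway.
    let (label, count) = counts
        .into_iter()
        .max_by_key(|&(_, c)| c)
        .expect("votes array is never empty");

    match count {
        4 => Consensus::HighConfidence(label.to_string()),
        3 => Consensus::Majority(label.to_string()),
        // Assumption: any split without a 3-vote majority (2v2, 2-1-1,
        // 1-1-1-1) is treated like the documented 2v2 case.
        _ => Consensus::Discarded,
    }
}
```

With this sketch, four votes of `video` give `HighConfidence("video")`, a 3v1 split gives `Majority` with the winning label, and a 2v2 split gives `Discarded`.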
Usage
This model is designed for use with mimmo, a Rust library for torrent content classification. The ONNX model is embedded directly in the binary at compile time.
// Model is automatically downloaded during build
const MODEL_BYTES: &[u8] = include_bytes!("../models/bert/model_embedded.onnx");
const TOKENIZER_JSON: &str = include_str!("../models/bert/tokenizer.json");
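Once the embedded model has produced one logit per class, the last step of a multi-class classifier is usually a softmax followed by an argmax. The sketch below shows only that post-processing step in plain Rust and does not reproduce mimmo's code. The `LABELS` order is an assumption made for illustration (the authoritative index-to-label mapping would be in config.json), and `softmax` and `top_label` are hypothetical helper names.

```rust
/// Assumed label order, for illustration only. The real index-to-label
/// mapping should be read from the model's config.json.
const LABELS: [&str; 5] = ["audio", "video", "software", "book", "other"];

/// Numerically stable softmax: subtract the maximum logit before
/// exponentiating so large values cannot overflow.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

/// Returns the most probable label together with its probability.
fn top_label(logits: &[f32; 5]) -> (&'static str, f32) {
    let probs = softmax(logits);
    let (idx, &p) = probs
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .expect("there are always five logits");
    (LABELS[idx], p)
}
```

Under the assumed label order, calling `top_label(&[0.1, 2.5, -1.0, 0.3, 0.0])` selects `video`, because the second logit is the largest.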
Performance
- Inference: <10ms per sample (CPU)
- Used as ML fallback when pattern matching is inconclusive
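The fallback arrangement described above can be pictured as a two-stage classifier: a cheap pattern check runs first, and the model is consulted only when that check gives no answer. The following is a rough sketch under stated assumptions, not mimmo's actual API. `classify_by_pattern`, its extension table, and `classify` are all hypothetical, and the ML stage is passed in as a closure so the example stays self-contained.

```rust
/// Hypothetical pattern-matching stage: looks only at the text after the
/// last dot and returns None when it is not a known extension.
fn classify_by_pattern(name: &str) -> Option<&'static str> {
    let ext = name.rsplit('.').next()?.to_ascii_lowercase();
    match ext.as_str() {
        "mkv" | "mp4" | "avi" => Some("video"),
        "mp3" | "flac" => Some("audio"),
        "epub" | "pdf" => Some("book"),
        "exe" | "msi" => Some("software"),
        _ => None, // inconclusive: defer to the ML model
    }
}

/// Tries pattern matching first and calls the ML fallback only when the
/// pattern stage is inconclusive.
fn classify(name: &str, ml_fallback: impl Fn(&str) -> &'static str) -> &'static str {
    classify_by_pattern(name).unwrap_or_else(|| ml_fallback(name))
}
```

In this sketch, a name ending in `.mkv` never reaches the model, while a name without a recognizable extension is passed to the closure, which is where the BERT classifier would be invoked.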
Files
- model_embedded.onnx - ONNX model with embedded weights
- tokenizer.json - HuggingFace tokenizer
- vocab.txt - Vocabulary file
- config.json - Model configuration
License
MIT