# DEIMv2: Real-Time Object Detection Meets DINOv3
Pre-trained DEIMv2 models with PyTorch checkpoints, ONNX exports, and TensorRT FP16 engines.
## Model Zoo
| Model | AP | Params | GFLOPs | Checkpoint | ONNX | TensorRT |
|---|---|---|---|---|---|---|
| Atto | 23.8 | 0.5M | 0.8 | ✅ | ✅ | ✅ |
| Femto | 31.0 | 1.0M | 1.7 | ✅ | ✅ | ✅ |
| Pico | 38.5 | 1.5M | 5.2 | ✅ | ✅ | ✅ |
| N | 43.0 | 3.6M | 6.8 | ✅ | ✅ | ✅ |
| S | 50.9 | 9.7M | 25.6 | ✅ | ✅ | ✅ |
| M | 53.0 | 18.1M | 52.2 | ✅ | ✅ | ✅ |
| L | 56.0 | 32.2M | 96.7 | ✅ | ✅ | ✅ |
| X | 57.8 | 50.3M | 151.6 | ✅ | ✅ | ✅ |
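
The usage examples below reference only the S-size files; to enumerate every available checkpoint, ONNX export, and TensorRT engine in the repository, you can list its contents. A short sketch using the `huggingface_hub` API:

```python
from huggingface_hub import list_repo_files

# Print every model artifact hosted in the repo, filtered by extension.
for name in sorted(list_repo_files("carpedm20/DEIMv2")):
    if name.endswith((".pth", ".onnx", ".engine")):
        print(name)
```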
## Files

- `*.pth` - PyTorch checkpoints (EMA weights)
- `*.onnx` - ONNX models (opset 17, dynamic batch)
- `*.engine` - TensorRT FP16 engines (built on an RTX 4090 with TensorRT 10.14)
## Input Shapes
| Model | Input Size |
|---|---|
| Atto | 320x320 |
| Femto | 416x416 |
| Pico, N, S, M, L, X | 640x640 |
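
If preprocessing is done programmatically, the table reduces to a small lookup. This is a convenience helper, not part of the official code:

```python
# Square input resolution per model size; everything from Pico up uses 640.
INPUT_SIZES = {"atto": 320, "femto": 416}

def input_size(variant: str) -> int:
    """Return the expected input resolution for a DEIMv2 variant (e.g. 's', 'atto')."""
    return INPUT_SIZES.get(variant.lower(), 640)
```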
## Usage
### PyTorch

```python
from huggingface_hub import hf_hub_download
import torch

# Download checkpoint
ckpt_path = hf_hub_download("carpedm20/DEIMv2", "deimv2_dinov3_s_coco.pth")
checkpoint = torch.load(ckpt_path, map_location='cpu')
state_dict = checkpoint['ema']['module']
```
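
The checkpoint stores weights only; the model definition lives in the DEIMv2 GitHub repository, and once a DEIMv2-S model has been instantiated from that code the weights load with the standard `model.load_state_dict(state_dict)`. On recent PyTorch versions you may need to pass `weights_only=False` to `torch.load` if the default safe-loading mode rejects the file. A quick self-contained check of what the checkpoint contains:

```python
# Inspect the checkpoint: 'ema' -> 'module' holds the EMA weights used for eval.
print(list(checkpoint.keys()))

# Parameter count should roughly match the Model Zoo table (9.7M for the S model).
n_params = sum(v.numel() for v in state_dict.values())
print(f"{len(state_dict)} tensors, {n_params / 1e6:.1f}M parameters")
```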
### ONNX Runtime

```python
import onnxruntime as ort
from huggingface_hub import hf_hub_download

onnx_path = hf_hub_download("carpedm20/DEIMv2", "deimv2_dinov3_s_coco.onnx")
session = ort.InferenceSession(onnx_path)
```
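
A minimal inference sketch for the S model (640x640 input, per the table above). The file name `example.jpg`, the plain 0-1 scaling, and the optional original-size input are assumptions based on DEIM/RT-DETR-style exports rather than the official pipeline; inspect `session.get_inputs()` and `session.get_outputs()` and adjust accordingly.

```python
import numpy as np
from PIL import Image

# Preprocess: resize to 640x640, scale to [0, 1], NCHW float32.
# (Normalization scheme is an assumption; adjust if the export expects mean/std.)
image = Image.open("example.jpg").convert("RGB")
tensor = np.asarray(image.resize((640, 640)), dtype=np.float32)
tensor = tensor.transpose(2, 0, 1)[None] / 255.0

# Build the input feed from the graph's declared inputs instead of hardcoding
# names; DEIM-style exports often take (image, original_size) for box rescaling.
feed = {session.get_inputs()[0].name: tensor}
if len(session.get_inputs()) > 1:
    feed[session.get_inputs()[1].name] = np.array([[image.width, image.height]],
                                                  dtype=np.int64)

outputs = session.run(None, feed)  # typically labels, boxes, scores
```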
### TensorRT

```python
import tensorrt as trt
from huggingface_hub import hf_hub_download

engine_path = hf_hub_download("carpedm20/DEIMv2", "deimv2_dinov3_s_coco.engine")
# Load engine with TensorRT runtime
```
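
A minimal sketch of that loading step with the TensorRT 10 Python API (running inference additionally requires allocating device buffers, e.g. with `cuda-python` or PyCUDA, which is omitted here):

```python
# Deserialize the serialized engine and create an execution context.
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open(engine_path, "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
```

TensorRT engines are tied to the GPU and TensorRT version they were built with; the provided engines target an RTX 4090 and TensorRT 10.14, so for other setups rebuild from the ONNX export, for example with `trtexec --onnx=deimv2_dinov3_s_coco.onnx --fp16 --saveEngine=deimv2_dinov3_s_coco.engine`.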
## Citation

```bibtex
@article{huang2025deimv2,
  title={Real-Time Object Detection Meets DINOv3},
  author={Huang, Shihua and Hou, Yongjie and Liu, Longfei and Yu, Xuanlong and Shen, Xi},
  journal={arXiv},
  year={2025}
}
```
## License

Apache 2.0. See the DEIMv2 GitHub repository for details.