---
license: apache-2.0
language: en
library_name: pytorch
pipeline_tag: object-detection
tags:
- rtdetr
- object-detection
- knowledge-distillation
- taco-dataset
- dinov3
- convnext
---

# RT-DisDINOv3-ConvNext: A Distilled RT-DETR-L Model

This model is an **RT-DETR-L** whose backbone and encoder were pre-trained via knowledge distillation from a **DINOv3 ConvNeXt-Base** teacher model. Distillation was performed by matching feature maps on images from the [TACO (Trash Annotations in Context)](https://tacodataset.org/) dataset.

This pre-trained checkpoint carries the distilled knowledge and is intended as a starting point for fine-tuning on downstream object detection tasks, potentially yielding better performance than standard pre-trained weights.

This work is part of the **RT-DisDINOv3** project. For full details on the training pipeline, baseline comparisons, and analysis, please visit the [main GitHub repository](https://github.com/your-username/your-repo-name). <!-- TODO: Add your GitHub repo link here -->

## How to Use

You can load these distilled weights and apply them to the original RT-DETR-L model's backbone and encoder before fine-tuning:
```python
import torch
from torch.hub import load_state_dict_from_url

# 1. Load the original RT-DETR-L model architecture.
# torch.hub.load fetches the 'lyuwenyu/RT-DETR' GitHub repository, which
# must expose an 'rtdetrv2_l' entry point in its hubconf.py.
rtdetr_l = torch.hub.load('lyuwenyu/RT-DETR', 'rtdetrv2_l', pretrained=True)
model = rtdetr_l.model

# 2. Download the distilled weights from this Hugging Face Hub repository.
MODEL_URL = "https://huggingface.co/hnamt/RT-DisDINOv3-ConvNext-Base/resolve/main/distilled_rtdetr_convnext_teacher_BEST.pth"
distilled_state_dict = load_state_dict_from_url(MODEL_URL, map_location='cpu')['model']

# 3. Load the weights into the model's backbone and encoder.
# strict=False ensures that only the matching keys (backbone + encoder)
# are loaded; the decoder keeps its original weights.
model.load_state_dict(distilled_state_dict, strict=False)

print("Successfully loaded and applied distilled knowledge from ConvNeXt teacher!")

# The model is now ready for fine-tuning on your own dataset. For example:
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# model.train()
# ... your fine-tuning loop ...
```
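
Because `strict=False` silently skips any non-matching keys, it is worth verifying what was actually loaded. `load_state_dict` returns a named tuple of `missing_keys` and `unexpected_keys`, so a quick sanity check looks like this (that only decoder keys go missing is an assumption about this checkpoint's key layout):

```python
# Capture the return value of the non-strict load: an IncompatibleKeys
# named tuple with .missing_keys and .unexpected_keys.
result = model.load_state_dict(distilled_state_dict, strict=False)

# Keys in the model but not in the checkpoint (expected: the detection
# decoder/head, which was not part of the distillation).
print(f"{len(result.missing_keys)} missing keys, e.g. {result.missing_keys[:3]}")

# Keys in the checkpoint but not in the model (ideally empty; many entries
# here usually signal a key-prefix mismatch such as a stray 'module.').
print(f"{len(result.unexpected_keys)} unexpected keys: {result.unexpected_keys[:3]}")
```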

## Training Details

- **Student Model**: RT-DETR-L (`rtdetrv2_l` from [lyuwenyu/RT-DETR](https://github.com/lyuwenyu/RT-DETR)).
- **Teacher Model**: DINOv3 ConvNeXt-Base (`facebook/dinov3-convnext-base-pretrain-lvd1689m`).
- **Dataset for Distillation**: TACO dataset images.
- **Distillation Procedure**: The student model's backbone and encoder were trained to minimize the Mean Squared Error (MSE) between their output feature maps and those of the teacher model; a sketch of this objective is shown below.
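
To make the procedure concrete, here is a minimal sketch of one feature-map MSE distillation step. The channel sizes, the `proj` layer, and the spatial resizing are illustrative assumptions: student and teacher feature maps generally differ in shape, so some alignment is needed, but the actual project code may align them differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical channel dims for student (RT-DETR encoder) and teacher
# (DINOv3 ConvNeXt-Base) feature maps.
STUDENT_CHANNELS, TEACHER_CHANNELS = 256, 1024

# 1x1 conv projecting student features into the teacher's channel space
# (an assumption; the pipeline could instead project the teacher).
proj = nn.Conv2d(STUDENT_CHANNELS, TEACHER_CHANNELS, kernel_size=1)

def distillation_loss(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    """MSE between projected student features and frozen teacher features."""
    s = proj(student_feat)  # (B, C_teacher, H_s, W_s)
    # Resize if the two backbones produce different spatial resolutions.
    if s.shape[-2:] != teacher_feat.shape[-2:]:
        s = F.interpolate(s, size=teacher_feat.shape[-2:], mode="bilinear", align_corners=False)
    # detach(): the teacher is frozen and receives no gradients.
    return F.mse_loss(s, teacher_feat.detach())

# Example with dummy tensors standing in for real feature maps:
loss = distillation_loss(torch.randn(2, 256, 32, 32), torch.randn(2, 1024, 16, 16))
loss.backward()
```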

## Evaluation Results

After the distillation pre-training, the model was fine-tuned on the TACO dataset. The results show a significant improvement over the baseline.

| Model                          | mAP@50-95 | mAP@50    | Speed (ms) | Notes                                       |
| ------------------------------ | :-------: | :-------: | :--------: | ------------------------------------------- |
| RT-DETR-L (Baseline)           | 2.80%     | 4.60%     | 50.05      | Fine-tuned from COCO pre-trained weights.   |
| **RT-DisDINOv3 (w/ ConvNeXt)** | **3.60%** | **5.30%** | 49.80      | **+28.6% relative mAP gain over baseline.** |
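
The mAP values follow the standard COCO protocol. For reference, here is a minimal sketch of how such metrics can be computed with `torchmetrics` (an illustrative assumption, not necessarily the tooling used in this project):

```python
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

# Defaults to the COCO IoU grid 0.50:0.95, matching mAP@50-95 above.
metric = MeanAveragePrecision()

# Dummy single-image example; in practice, loop over the TACO validation set.
preds = [{
    "boxes": torch.tensor([[10.0, 10.0, 50.0, 50.0]]),  # xyxy format
    "scores": torch.tensor([0.9]),
    "labels": torch.tensor([0]),
}]
target = [{
    "boxes": torch.tensor([[12.0, 11.0, 48.0, 52.0]]),
    "labels": torch.tensor([0]),
}]
metric.update(preds, target)
results = metric.compute()
print(f"mAP@50-95: {results['map']:.3f}, mAP@50: {results['map_50']:.3f}")
```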

## License

The weights in this repository are released under the Apache 2.0 License. Please be aware that the models used for training (RT-DETR, DINOv3) have their own licenses.