AD-Copilot: A Vision-Language Assistant for Industrial Anomaly Detection via Visual In-context Comparison
Paper: arXiv:2603.13779
AD-Copilot compares a query image against a normal reference image and describes any anomalies it finds. It can be used with the Hugging Face `transformers` library as follows:
```python
from transformers import AutoModelForImageTextToText, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the model and its processor
model = AutoModelForImageTextToText.from_pretrained(
    "jiang-cc/AD-Copilot",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("jiang-cc/AD-Copilot")

# Pair a normal reference image with the query image to inspect
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "<path_to_reference_image>"},
            {"type": "image", "image": "<path_to_query_image>"},
            {"type": "text", "text": "The first image is a normal reference. Is there any anomaly in the second image? If so, describe it."},
        ],
    }
]

# Build model inputs from the chat template and the image list
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    return_tensors="pt",
).to(model.device)

# Generate, then decode only the newly generated tokens
output_ids = model.generate(**inputs, max_new_tokens=512)
response = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:],
    skip_special_tokens=True,
)[0]
print(response)
```
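When scoring many query images against the same normal reference, the `messages` list can be built programmatically. The helper below is a minimal sketch: the function name `build_messages` and the example file names are this card's own illustration, while the message schema and prompt text come from the snippet above.

```python
def build_messages(reference_path: str, query_path: str) -> list:
    """Pair a normal reference image with a query image in the chat format above."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": reference_path},
                {"type": "image", "image": query_path},
                {
                    "type": "text",
                    "text": (
                        "The first image is a normal reference. "
                        "Is there any anomaly in the second image? "
                        "If so, describe it."
                    ),
                },
            ],
        }
    ]

# Each query image reuses the same normal reference (example file names)
queries = ["query_01.png", "query_02.png"]
batches = [build_messages("reference.png", q) for q in queries]
```

Each element of `batches` can then be passed through `apply_chat_template`, `process_vision_info`, and `model.generate` exactly as in the snippet above.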
BibTeX:
```bibtex
@article{jiang2026ad,
  title   = {AD-Copilot: A Vision-Language Assistant for Industrial Anomaly Detection via Visual In-context Comparison},
  author  = {Jiang, Xi and Guo, Yue and Li, Jian and Liu, Yong and Gao, Bin-Bin and Deng, Hanqiu and Liu, Jun and Zhao, Heng and Wang, Chengjie and Zheng, Feng},
  journal = {arXiv preprint arXiv:2603.13779},
  year    = {2026}
}
```