AD-Copilot: A Vision-Language Assistant for Industrial Anomaly Detection via Visual In-context Comparison
Paper: arXiv:2603.13779
AD-Copilot compares a query image against a normal reference image and describes any anomalies it finds. It can be used with the Hugging Face `transformers` library as follows:
```python
from transformers import AutoModelForImageTextToText, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the model and its processor
model = AutoModelForImageTextToText.from_pretrained(
    "jiang-cc/AD-Copilot",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("jiang-cc/AD-Copilot")

# Pair a normal reference image with the query image to inspect
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "<path_to_reference_image>"},
            {"type": "image", "image": "<path_to_query_image>"},
            {"type": "text", "text": "The first image is a normal reference. Is there any anomaly in the second image? If so, describe it."},
        ],
    }
]

# Build model inputs from the chat template and the image list
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    return_tensors="pt",
).to(model.device)

# Generate, then decode only the newly generated tokens
output_ids = model.generate(**inputs, max_new_tokens=512)
response = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:],
    skip_special_tokens=True,
)[0]
print(response)
```
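When scoring many query images against the same normal reference, the `messages` list can be built programmatically. The helper below is a minimal sketch: the function name `build_messages` and the example file names are this card's own illustration, while the message schema and prompt text come from the snippet above.

```python
def build_messages(reference_path: str, query_path: str) -> list:
    """Pair a normal reference image with a query image in the chat format above."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": reference_path},
                {"type": "image", "image": query_path},
                {
                    "type": "text",
                    "text": (
                        "The first image is a normal reference. "
                        "Is there any anomaly in the second image? "
                        "If so, describe it."
                    ),
                },
            ],
        }
    ]

# Each query image reuses the same normal reference (example file names)
queries = ["query_01.png", "query_02.png"]
batches = [build_messages("reference.png", q) for q in queries]
```

Each element of `batches` can then be passed through `apply_chat_template`, `process_vision_info`, and `model.generate` exactly as in the snippet above.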
BibTeX:
```bibtex
@article{jiang2026ad,
  title   = {AD-Copilot: A Vision-Language Assistant for Industrial Anomaly Detection via Visual In-context Comparison},
  author  = {Jiang, Xi and Guo, Yue and Li, Jian and Liu, Yong and Gao, Bin-Bin and Deng, Hanqiu and Liu, Jun and Zhao, Heng and Wang, Chengjie and Zheng, Feng},
  journal = {arXiv preprint arXiv:2603.13779},
  year    = {2026}
}
```