ID-RAG: industrial defect retrieval-augmented generation for industrial surface defect detection
Citations: Web of Science 0 · Scopus 0
Abstract

General-purpose vision-language models (VLMs) often fail at industrial surface anomaly detection due to hallucinations and imprecise localization arising from missing grounded context. To address these limitations, we present Industrial Defect Retrieval-Augmented Generation (ID-RAG), a framework that reinterprets retrieval-augmented generation for industrial inspection. Rather than querying an external database, ID-RAG dynamically retrieves two types of image-internal evidence: (i) a constrained semantic classification over a domain vocabulary and (ii) a statistical anomaly prior from a state-of-the-art feature-based model. The retrieved evidence is fused into an augmented prompt that grounds the VLM and enables precise, evidence-backed outputs. ID-RAG operates in zero-shot mode by internally generating, using the VLM’s built-in text-to-image (T2I) capability, one or more template candidates and validating them against a pre-vetted reference standard image with a quality assurance (QA) module; the accepted template is then used to compute the anomaly prior. Evaluated on the MVTec AD texture categories, zero-shot ID-RAG is competitive with specialized detectors in detection while substantially improving localization over naive VLMs. These results indicate that dynamically retrieved, image-grounded evidence is an effective strategy for adapting foundation models to high-stakes industrial inspection.
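The pipeline sketched in the abstract (retrieve two kinds of image-internal evidence, QA-gate a generated template, then fuse everything into a grounding prompt) can be outlined as follows. This is a minimal illustrative sketch, not the paper's implementation: all function names, the defect vocabulary, the feature representation, and the QA threshold are hypothetical stand-ins, and the actual VLM, T2I generator, and feature-based anomaly model are replaced with trivial placeholders.

```python
# Hypothetical sketch of the ID-RAG flow; real models are replaced by stubs.
DEFECT_VOCABULARY = ["crack", "scratch", "hole", "contamination", "none"]

def retrieve_semantic_label(class_scores: dict) -> str:
    """Evidence (i): constrained classification over the domain vocabulary."""
    return max(DEFECT_VOCABULARY, key=lambda t: class_scores.get(t, 0.0))

def feature_distance(a, b) -> float:
    """Placeholder statistic: mean absolute difference between feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def qa_accepts(candidate_feats, reference_feats, threshold: float = 0.2) -> bool:
    """QA module: validate a T2I-generated template against the vetted reference."""
    return feature_distance(candidate_feats, reference_feats) <= threshold

def build_augmented_prompt(label: str, prior: float) -> str:
    """Fuse both retrieved evidence types into a prompt that grounds the VLM."""
    return (f"Inspect this surface. Retrieved evidence: likely defect class "
            f"'{label}'; anomaly prior score {prior:.2f}. "
            f"Report defect presence and location, citing this evidence.")

# --- Toy usage ---
reference = [0.5, 0.5, 0.5]             # features of the pre-vetted reference image
candidates = [[0.9, 0.1, 0.4],          # fails QA (too far from reference)
              [0.52, 0.48, 0.5]]        # passes QA
template = next(c for c in candidates if qa_accepts(c, reference))

image_feats = [0.8, 0.4, 0.6]           # features of the query image
label = retrieve_semantic_label({"crack": 0.7, "scratch": 0.2})
prior = feature_distance(image_feats, template)   # evidence (ii): anomaly prior
prompt = build_augmented_prompt(label, prior)
print(prompt)
```

The key design point the abstract emphasizes is that both evidence sources are derived from the image itself (or from an internally generated, QA-validated template) rather than from an external retrieval corpus.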

Keywords

Industrial anomaly detection; Vision-language models; Multimodal learning; Retrieval-augmented generation; Zero-shot; Quality assurance
Title
ID-RAG: industrial defect retrieval-augmented generation for industrial surface defect detection
Authors
Lee, Mingyu; Choi, Jongwon
DOI
10.1007/s00138-026-01797-x
Publication date
2026-02
Type
Article
Journal
Machine Vision and Applications
Volume 37, Issue 2