Detailed View
- WEB OF SCIENCE: 0
- SCOPUS: 0
Abstract
General-purpose vision-language models (VLMs) often fail at industrial surface anomaly detection due to hallucinations and imprecise localization arising from missing grounded context. To address these limitations, we present Industrial Defect Retrieval-Augmented Generation (ID-RAG), a framework that reinterprets retrieval-augmented generation for industrial inspection. Rather than querying an external database, ID-RAG dynamically retrieves two types of image-internal evidence: (i) a constrained semantic classification over a domain vocabulary and (ii) a statistical anomaly prior from a state-of-the-art feature-based model. The retrieved evidence is fused into an augmented prompt that grounds the VLM and enables precise, evidence-backed outputs. ID-RAG operates in zero-shot mode by using the VLM’s built-in text-to-image (T2I) capability to internally generate one or more template candidates and validating them against a pre-vetted reference standard image with a quality assurance (QA) module; the accepted template is then used to compute the anomaly prior. Evaluated on the MVTec AD texture categories, zero-shot ID-RAG is competitive with specialized detectors in detection while substantially improving localization over naive VLMs. These results indicate that dynamically retrieved, image-grounded evidence is an effective strategy for adapting foundation models to high-stakes industrial inspection.
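The evidence-fusion step the abstract describes, retrieving a constrained semantic label and a statistical anomaly prior, then fusing both into an augmented prompt, can be sketched as below. This is a minimal illustration only: all function names, the nearest-prototype classifier, the mean-absolute-difference prior, and the prompt wording are assumptions, not the authors' actual implementation.

```python
# Illustrative sketch of ID-RAG-style evidence fusion (assumed, simplified).

def classify_defect(image_features, vocabulary):
    """Constrained semantic classification: pick the label from a fixed
    domain vocabulary whose prototype is nearest to the image features
    (stub: squared Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for label, prototype in vocabulary.items():
        dist = sum((f - p) ** 2 for f, p in zip(image_features, prototype))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def anomaly_prior(image_features, template_features):
    """Statistical anomaly prior: scalar score from a feature-based
    comparison against the accepted template (stub: mean absolute diff)."""
    diffs = [abs(f - t) for f, t in zip(image_features, template_features)]
    return sum(diffs) / len(diffs)

def build_augmented_prompt(label, prior, threshold=0.5):
    """Fuse the two pieces of image-internal evidence into a grounded
    prompt for the VLM (threshold is a hypothetical tuning parameter)."""
    verdict = "likely anomalous" if prior > threshold else "likely normal"
    return (
        f"Retrieved evidence -- semantic class: {label}; "
        f"anomaly prior: {prior:.2f} ({verdict}). "
        "Describe and localize any surface defect, citing this evidence."
    )

if __name__ == "__main__":
    # Toy 2-D feature vectors standing in for real image embeddings.
    vocab = {"scratch": [0.9, 0.1], "crack": [0.1, 0.9], "normal": [0.5, 0.5]}
    feats = [0.85, 0.15]      # query-image features
    template = [0.5, 0.5]     # features of the QA-accepted template
    label = classify_defect(feats, vocab)
    print(build_augmented_prompt(label, anomaly_prior(feats, template)))
```

The template features here would, in the framework described, come from a T2I-generated candidate that passed the QA check against the reference standard image.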
Keywords
- Title
- ID-RAG: industrial defect retrieval-augmented generation for industrial surface defect detection
- Authors
- Lee, Mingyu; Choi, Jongwon
- Publication Date
- 2026-02
- Type
- Article
- Volume
- 37
- Issue
- 2