Detailed View
- WEB OF SCIENCE: 0
- SCOPUS: 0
Abstract
General-purpose vision-language models (VLMs) often fail at industrial surface anomaly detection due to hallucinations and imprecise localization arising from missing grounded context. To address these limitations, we present Industrial Defect Retrieval-Augmented Generation (ID-RAG), a framework that reinterprets retrieval-augmented generation for industrial inspection. Rather than querying an external database, ID-RAG dynamically retrieves two types of image-internal evidence: (i) a constrained semantic classification over a domain vocabulary and (ii) a statistical anomaly prior from a state-of-the-art feature-based model. The retrieved evidence is fused into an augmented prompt that grounds the VLM and enables precise, evidence-backed outputs. ID-RAG operates in zero-shot mode by using the VLM’s built-in text-to-image (T2I) capability to internally generate one or more template candidates and validating them against a pre-vetted reference standard image with a quality assurance (QA) module; the accepted template is then used to compute the anomaly prior. Evaluated on the MVTec AD texture categories, zero-shot ID-RAG is competitive with specialized detectors in detection while substantially improving localization over naive VLMs. These results indicate that dynamically retrieved, image-grounded evidence is an effective strategy for adapting foundation models to high-stakes industrial inspection.
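The evidence-fusion step the abstract describes, retrieving a constrained semantic label and a statistical anomaly prior, then fusing both into an augmented prompt, can be sketched as below. This is a minimal illustration only: all function names, the nearest-prototype classifier, the mean-absolute-difference prior, and the prompt wording are assumptions, not the authors' actual implementation.

```python
# Illustrative sketch of ID-RAG-style evidence fusion (assumed, simplified).

def classify_defect(image_features, vocabulary):
    """Constrained semantic classification: pick the label from a fixed
    domain vocabulary whose prototype is nearest to the image features
    (stub: squared Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for label, prototype in vocabulary.items():
        dist = sum((f - p) ** 2 for f, p in zip(image_features, prototype))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def anomaly_prior(image_features, template_features):
    """Statistical anomaly prior: scalar score from a feature-based
    comparison against the accepted template (stub: mean absolute diff)."""
    diffs = [abs(f - t) for f, t in zip(image_features, template_features)]
    return sum(diffs) / len(diffs)

def build_augmented_prompt(label, prior, threshold=0.5):
    """Fuse the two pieces of image-internal evidence into a grounded
    prompt for the VLM (threshold is a hypothetical tuning parameter)."""
    verdict = "likely anomalous" if prior > threshold else "likely normal"
    return (
        f"Retrieved evidence -- semantic class: {label}; "
        f"anomaly prior: {prior:.2f} ({verdict}). "
        "Describe and localize any surface defect, citing this evidence."
    )

if __name__ == "__main__":
    # Toy 2-D feature vectors standing in for real image embeddings.
    vocab = {"scratch": [0.9, 0.1], "crack": [0.1, 0.9], "normal": [0.5, 0.5]}
    feats = [0.85, 0.15]      # query-image features
    template = [0.5, 0.5]     # features of the QA-accepted template
    label = classify_defect(feats, vocab)
    print(build_augmented_prompt(label, anomaly_prior(feats, template)))
```

The template features here would, in the framework described, come from a T2I-generated candidate that passed the QA check against the reference standard image.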
Keywords
- Title
- ID-RAG: industrial defect retrieval-augmented generation for industrial surface defect detection
- Authors
- Lee, Mingyu; Choi, Jongwon
- Publication Date
- 2026-02
- Type
- Article
- Volume
- 37
- Issue
- 2