Self-supervised multi-modal training from uncurated images and reports enables monitoring AI in radiology

Park, Sangjoon; Lee, Eun Sun; Shin, Kyung Sook; Lee, Jeong Eun; Ye, Jong Chul

doi:10.1016/j.media.2023.103021

Citations: Web of Science 13; Scopus 13

Abstract

The escalating demand for artificial intelligence (AI) systems that can monitor and supervise human errors and abnormalities in healthcare presents unique challenges. Recent advances in vision-language models, which understand both visual and textual concepts and their semantic correspondences, have brought the challenges of such monitoring AI into focus. However, vision-language models have had limited success in the medical domain. Current vision-language models and learning strategies for photographic images and captions require a web-scale corpus of image-text pairs, which is often infeasible to assemble in the medical domain. To address this, we present the medical cross-attention vision-language model (Medical X-VL), which incorporates key components tailored to the medical domain: self-supervised unimodal models for the medical domain with a fusion encoder to bridge them, momentum distillation, sentence-wise contrastive learning for medical reports, and sentence similarity-adjusted hard negative mining. We experimentally demonstrate that our model enables various zero-shot tasks for monitoring AI, ranging from zero-shot classification to zero-shot error correction. Our model outperformed current state-of-the-art models on two medical image datasets, suggesting a novel clinical application of our monitoring AI model to alleviate human errors. Our method also shows a more specialized capacity for fine-grained understanding, a distinct advantage that is particularly relevant to the medical domain. © 2023
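The abstract names sentence similarity-adjusted hard negative mining but does not spell out how it works. The following is a hypothetical sketch, not the Medical X-VL implementation. It starts from ALBEF-style hard negative sampling, where a negative report is drawn for each image in proportion to its image-text similarity, and assumes the "adjustment" scales each candidate's weight by `1 - report_similarity`. Under that assumption, near-duplicate reports (common in radiology, e.g. two normal studies) are not pushed apart as if they were true negatives. The function names, the `temperature` value, and the exact adjustment are all assumptions.

```python
import math
import random


def hard_negative_weights(image_text_sim, report_sim, i, temperature=0.1):
    """Hypothetical sampling weights over candidate reports for image i.

    image_text_sim: similarities between image i and every report in the batch.
    report_sim: similarities in [0, 1] between report i (the positive) and every report.
    The (1 - report_sim) adjustment is an assumption, not the paper's formula.
    """
    weights = []
    for j, (s, r) in enumerate(zip(image_text_sim, report_sim)):
        if j == i:
            weights.append(0.0)  # the paired report is the positive, never a negative
            continue
        hardness = math.exp(s / temperature)  # reports closer to the image are harder negatives
        weights.append(hardness * max(0.0, 1.0 - r))  # suppress near-duplicates of the positive
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no valid negatives for this image")
    return [w / total for w in weights]


def sample_hard_negative(image_text_sim, report_sim, i, rng=random):
    """Draw one negative report index for image i from the adjusted weights."""
    weights = hard_negative_weights(image_text_sim, report_sim, i)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```

For example, with `image_text_sim = [0.9, 0.5, 0.8]` and `report_sim = [1.0, 0.2, 1.0]`, report 2 is the most image-like candidate but is textually identical to the positive, so the adjusted weights become `[0.0, 1.0, 0.0]` and report 1 is always sampled. Without the adjustment, report 2 would dominate and the model would be trained to reject a report that is effectively correct.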

Keywords

Error detection; Monitoring AI; Radiograph; Vision-language model

Publication date: 2024-01

Type: Article

Journal: Medical Image Analysis

Volume: 91

ScholarWorks@Chung-Ang University