- Park, Sangjoon;
- Lee, Eun Sun;
- Shin, Kyung Sook;
- Lee, Jeong Eun;
- Ye, Jong Chul
Web of Science: 13
Scopus: 13
Abstract
The escalating demand for artificial intelligence (AI) systems that can monitor and supervise human errors and abnormalities in healthcare presents unique challenges. Recent advances in vision-language models reveal the challenges of monitoring AI by understanding both visual and textual concepts and their semantic correspondences. However, there has been limited success in the application of vision-language models in the medical domain. Current vision-language models and learning strategies for photographic images and captions call for a web-scale corpus of image-text pairs, which is often not feasible in the medical domain. To address this, we present a model named medical cross-attention vision-language model (Medical X-VL), which leverages key components tailored to the medical domain. The model is based on the following components: self-supervised unimodal models in the medical domain and a fusion encoder to bridge them, momentum distillation, sentencewise contrastive learning for medical reports, and sentence similarity-adjusted hard negative mining. We experimentally demonstrated that our model enables various zero-shot tasks for monitoring AI, ranging from zero-shot classification to zero-shot error correction. Our model outperformed current state-of-the-art models on two medical image datasets, suggesting a novel clinical application of our monitoring AI model to alleviate human errors. Our method demonstrates a more specialized capacity for fine-grained understanding, which presents a distinct advantage particularly applicable to the medical domain. © 2023
- Title: Self-supervised multi-modal training from uncurated images and reports enables monitoring AI in radiology
- Authors: Park, Sangjoon; Lee, Eun Sun; Shin, Kyung Sook; Lee, Jeong Eun; Ye, Jong Chul
- Publication date: 2024-01
- Type: Article
- Volume: 91