Effective audio-visual event localization using CLIP-based global context regulation for mitigating event overconfidence

Abstract

Audio-visual event localization identifies events that are both visible and audible in a video by jointly modeling the auditory and visual modalities across temporal segments. A challenge arises when the audio and visual contexts are inconsistent even though each modality carries a clear signal (e.g., the on-screen visuals show a baby crying while an off-screen woman is speaking), so the two modalities convey conflicting information. In such cases, both modalities receive high significance scores, causing the model to misclassify background segments as events. To address this, we propose a CLIP-based global context regulation method that leverages a pre-trained AudioCLIP encoder. The approach regulates event-relevance scores through post-processing and performs well even with limited training data containing such inconsistencies. We also introduce a benchmark dataset annotated for inconsistent cases to enable robust evaluation. Experimental results show that our model outperforms existing methods and achieves state-of-the-art performance in event localization. These findings highlight the importance of regulating event overconfidence under multimodal inconsistency, contributing to more accurate event localization in real-world applications. Our code and dataset are available at: https://github.com/PangRAK/GCRN
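The abstract describes the regulation mechanism only at a high level. The Python sketch below illustrates the general idea of post-processing per-segment event scores with a consistency gate computed in a shared audio-visual embedding space; it is a minimal illustration under stated assumptions, and the gating function, the alpha blend, and all names in it are hypothetical, not the authors' GCRN formulation (see the linked repository for the actual implementation).

import numpy as np

def _cos(a, b):
    # Cosine similarity along the last axis; supports broadcasting.
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + 1e-8)
    return (a * b).sum(axis=-1)

def regulate_event_scores(scores, audio_emb, visual_emb, alpha=0.5):
    # Hypothetical sketch, not the paper's exact method.
    # scores: (T,) per-segment event-relevance scores from a localizer.
    # audio_emb, visual_emb: (T, d) segment embeddings assumed to live in a
    # shared CLIP-style space (e.g., from an AudioCLIP-like encoder).
    # Global context: mean of all segment embeddings from both modalities.
    global_ctx = np.concatenate([audio_emb, visual_emb], axis=0).mean(axis=0)
    # Per-segment cross-modal consistency and agreement with global context.
    av_consistency = _cos(audio_emb, visual_emb)
    ctx_agreement = 0.5 * (_cos(audio_emb, global_ctx) + _cos(visual_emb, global_ctx))
    # Gate in [0, 1]: small when the modalities conflict, so conflicting
    # segments cannot keep uniformly high event scores.
    gate = np.clip(0.5 * (av_consistency + ctx_agreement), 0.0, 1.0)
    return scores * (alpha + (1.0 - alpha) * gate)

# Toy usage: the first half of the segments has consistent audio/visual
# embeddings; the conflicting second half is down-weighted.
rng = np.random.default_rng(0)
T, d = 10, 32
scores = np.full(T, 0.9)
audio = rng.normal(size=(T, d))
visual = rng.normal(size=(T, d))
visual[:5] = audio[:5] + 0.1 * rng.normal(size=(5, d))
print(regulate_event_scores(scores, audio, visual).round(3))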

Keywords

Audio-visual event localization; Context regulation; Cross-modality attention; Multimodal learning
Authors
Lee, Sang-Rak; Moon, A-Seong; Sohn, Bong-Soo; Lee, Jaesung
DOI
10.1016/j.patcog.2026.113312
Publication date
2026-09
Type
Article
Journal
Pattern Recognition
Volume
177