DRGI: Disentangled Representation Graph Infomax for Video Retrieval

Abstract

Vision-language models pretrained on image-text pairs have demonstrated strong performance in text-to-video retrieval through contrastive learning. However, videos contain much richer temporal and spatial information than their paired captions, so each caption in the training set corresponds to only a subset of the frames in its video. This discrepancy is amplified in negative pairs, where a negative caption can be partially relevant to the video despite being mismatched. Such hard negatives deviate from the conventional unimodal contrastive setting and require further attention. To this end, we propose Disentangled Representation Graph Infomax (DRGI), a model-agnostic framework that better exploits hard negatives. DRGI constructs fully connected graphs from disentangled video and text representations, where graph attention captures inter-node dependencies within each modality. We optimize an InfoMax objective between node-level and graph-level representations using Deep Graph Infomax. Hard negatives are treated as semantically corrupted graphs, encouraging the model to separate misleading patterns from true alignments. Extensive experiments on MSR-VTT, LSMDC, MSVD, and ActivityNet demonstrate that DRGI consistently outperforms its base models, achieving state-of-the-art performance with up to 2.3% improvement in R@1 on MSR-VTT. Moreover, our plug-and-play framework can be integrated into existing CLIP-based retrieval models, adding only 0.05% of parameters during training with no additional inference cost. Our code is available at https://github.com/kang7734/_DRGI_
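
The node-versus-graph objective described in the abstract can be illustrated with a minimal sketch of a Deep Graph Infomax style loss. This is not the authors' implementation: the function names, the mean-plus-sigmoid readout, the bilinear discriminator, and the toy vectors are all assumptions chosen for illustration, and the graph attention step is omitted. Nodes of the true graph are scored against the graph summary as positives, while nodes of a hard-negative (corrupted) graph are scored against the same summary as negatives.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def readout(nodes):
    """Graph-level summary: sigmoid of the mean node vector (assumed readout)."""
    dim = len(nodes[0])
    return [sigmoid(sum(n[d] for n in nodes) / len(nodes)) for d in range(dim)]


def bilinear_score(node, weight, summary):
    """Discriminator logit node^T W summary (assumed bilinear form)."""
    return sum(
        node[i] * weight[i][j] * summary[j]
        for i in range(len(node))
        for j in range(len(summary))
    )


def dgi_loss(pos_nodes, neg_nodes, weight):
    """Binary cross-entropy between true-graph nodes and corrupted-graph nodes.

    pos_nodes: node vectors of the true graph.
    neg_nodes: node vectors of the corrupted graph (here, a hard negative).
    Both are scored against the summary of the true graph.
    """
    summary = readout(pos_nodes)
    loss = 0.0
    for node in pos_nodes:
        # -log(sigmoid(s)) == softplus(-s)
        loss += softplus(-bilinear_score(node, weight, summary))
    for node in neg_nodes:
        # -log(1 - sigmoid(s)) == softplus(s)
        loss += softplus(bilinear_score(node, weight, summary))
    return loss / (len(pos_nodes) + len(neg_nodes))


# Toy, hand-made vectors (hypothetical, not taken from any dataset).
identity = [[1.0, 0.0], [0.0, 1.0]]
true_graph = [[2.0, 1.0], [1.5, 2.0]]
hard_negative_graph = [[-2.0, -1.0], [-1.0, -2.0]]

separated = dgi_loss(true_graph, hard_negative_graph, identity)
confused = dgi_loss(hard_negative_graph, true_graph, identity)
print(separated < confused)
```

In this toy setting the loss is lower when the true-graph nodes agree with their own summary and the corrupted nodes do not, which is the behavior the objective is meant to reward. In a trained model the weight matrix and the node representations would be learned rather than fixed by hand.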

Keywords

disentangled representation; graph attention network; hard negative sample; text video retrieval
Title
DRGI: Disentangled Representation Graph Infomax for Video Retrieval
Authors
Kang, Seong-Min; Lee, Na-Hyun; Park, Ji-Ho; Cho, Yoon-Sik
DOI
10.1109/ACCESS.2026.3662719
Date issued
2026
Type
Article
Journal
IEEE Access
Volume
14
Pages
26504-26515
