대규모 언어 모델 기반 애니메이션 숏폼 리프레이밍
LLM-based Animation Reframing for Short-Form Video

Abstract

Because most multimodal large language models (MLLMs) are trained on real-world data, they struggle to accurately interpret animated videos that feature stylized visual characteristics such as exaggerated geometry and simplified shading. As a result, video reframing often places key characters outside the frame, and recognition performance degrades when characters are not included in the pre-training data. To address this issue, this paper proposes a training-free short-form transformation pipeline that jointly interprets animated video content and script-based text prompts while using character images as visual queries. The proposed approach first performs scene extraction using an MLLM, then detects objects based on the visual queries, and finally applies Adaptive Zooming to mitigate the object loss and cropping errors that may occur during reframing.
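
The abstract does not describe how Adaptive Zooming is computed, so the following is only a minimal sketch of one plausible reading: given a detected character box, choose a vertical (9:16) crop window that zooms in on small characters and zooms out, up to the full frame height, for large ones. The function name `adaptive_crop` and the parameters `margin` and `min_scale` are hypothetical and do not come from the paper.

```python
def adaptive_crop(frame_w, frame_h, box, aspect=9 / 16, margin=0.15, min_scale=0.5):
    """Return a crop window (left, top, right, bottom) with width/height == aspect.

    Illustrative assumption only, not the paper's algorithm.
    box       -- detected character box as (x1, y1, x2, y2) in pixels
    margin    -- hypothetical padding added on each side, as a fraction of box size
    min_scale -- hypothetical lower bound on crop size, relative to the largest crop
    """
    x1, y1, x2, y2 = box

    # Largest crop of the target aspect ratio that still fits inside the frame.
    max_w = min(frame_w, frame_h * aspect)

    # Size the padded box would need, expressed as a crop width.
    pad_w = (x2 - x1) * (1 + 2 * margin)
    pad_h = (y2 - y1) * (1 + 2 * margin)
    want_w = max(pad_w, pad_h * aspect, min_scale * max_w)

    # Zoom out no further than the frame allows.
    crop_w = min(want_w, max_w)
    crop_h = crop_w / aspect

    # Center on the character, then shift the window back inside the frame.
    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2
    left = min(max(cx - crop_w / 2, 0.0), frame_w - crop_w)
    top = min(max(cy - crop_h / 2, 0.0), frame_h - crop_h)
    return left, top, left + crop_w, top + crop_h


if __name__ == "__main__":
    # A character near the right edge of a 1920x1080 frame.
    print(adaptive_crop(1920, 1080, (1700, 400, 1900, 800)))
```

In this sketch a character near the frame edge keeps the window inside the frame by shifting it rather than shrinking it, which is one simple way to avoid cutting the character off. A box too large for any 9:16 window falls back to the full-height crop and may still be clipped.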

Keywords

Large Language Model, Segment Retrieval, Visual Query Localization, Animation
Title
대규모 언어 모델 기반 애니메이션 숏폼 리프레이밍
Title (other language)
LLM-based Animation Reframing for Short-Form Video
Authors
이강희, 양해준, 배재형, 김탁훈, 최종원
DOI
10.5909/JBE.2026.31.1.152
Publication date
2026-01
Type
Y
Journal
방송공학회 논문지 (Journal of Broadcast Engineering)
Volume
31
Issue
1
Pages
152-161
