Abstract
Because most multimodal large language models (MLLMs) are trained on real-world data, they struggle to accurately interpret animated videos, which feature stylized visual characteristics such as exaggerated geometry and simplified shading. As a result, video reframing often places key characters outside the frame, and recognition performance degrades when characters are not included in the pre-training data. To address this issue, this paper proposes a training-free short-form transformation pipeline that jointly interprets animated video content and script-based text prompts while using character images as visual queries. The proposed approach first performs scene extraction using an MLLM, then detects objects based on the visual queries, and finally applies Adaptive Zooming to mitigate the object loss and cropping errors that may occur during reframing.
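The abstract does not say how Adaptive Zooming is computed. As a rough illustration only, the sketch below shows one plausible reading of the final step: given character bounding boxes from the detection stage, pick a 9:16 crop window that is just large enough to contain them, zooming out to the largest window that fits the frame when needed. The function name `adaptive_crop` and the parameters `margin` and `min_scale` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


def adaptive_crop(frame_w: float, frame_h: float, boxes: List[Box],
                  aspect: float = 9 / 16, margin: float = 0.1,
                  min_scale: float = 0.5) -> Box:
    """Return a crop window whose width / height equals `aspect`.

    Hypothetical sketch, not the paper's method: the window grows until it
    covers the padded union of `boxes`, but never exceeds the frame.
    """
    # Largest window of the target aspect ratio that fits inside the frame.
    max_h = min(frame_h, frame_w / aspect)

    if not boxes:
        # No detections: fall back to a centered, fully zoomed-out crop.
        cx, cy, h = frame_w / 2, frame_h / 2, max_h
    else:
        ux0 = min(b[0] for b in boxes)
        uy0 = min(b[1] for b in boxes)
        ux1 = max(b[2] for b in boxes)
        uy1 = max(b[3] for b in boxes)
        cx, cy = (ux0 + ux1) / 2, (uy0 + uy1) / 2
        need_w = (ux1 - ux0) * (1 + margin)
        need_h = (uy1 - uy0) * (1 + margin)
        # Smallest height that covers the padded union at the target aspect,
        # bounded below by min_scale (limits zoom-in) and above by the frame.
        h = min(max(need_h, need_w / aspect, max_h * min_scale), max_h)

    w = h * aspect
    # Shift the window so it stays inside the frame.
    x0 = min(max(cx - w / 2, 0.0), frame_w - w)
    y0 = min(max(cy - h / 2, 0.0), frame_h - h)
    return (x0, y0, x0 + w, y0 + h)
```

For a 1920x1080 frame with one character box at `(100, 400, 300, 700)`, this returns a window that fully contains the box. When characters are spread wider than any 9:16 window can cover, the sketch simply returns the largest window centered on them; a real pipeline would need a policy for that case, which the abstract does not describe.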
Keywords
- Title
- 대규모 언어 모델 기반 애니메이션 숏폼 리프레이밍
- Title (other language)
- LLM-based Animation Reframing for Short-Form Video
- Authors
- 이강희; 양해준; 배재형; 김탁훈; 최종원
- Publication date
- 2026-01
- Type
- Y
- Journal
- 방송공학회 논문지 (Journal of Broadcast Engineering)
- Volume
- 31
- Issue
- 1
- Pages
- 152 ~ 161