- Lee, Seonghak;
- Park, Jisoo;
- Kwon, Junseok
Abstract
We propose WildTalker∞, a framework for synthesizing high-quality audio-driven 3D talking portraits that addresses the challenges of real-world scenarios. Conventional techniques struggle with transient visual artifacts and auditory disturbances, which degrade both the realism and the audio-visual synchronization of the synthesized portraits. To overcome these limitations, WildTalker∞ introduces flow-guided temporal masking, a strategy designed to identify and mitigate transient movements in dynamic scenes and thereby improve visual coherence. WildTalker∞ also integrates a multi-scale spectral subtraction method for robust audio denoising, which improves lip synchronization and preserves natural audio-visual alignment under challenging acoustic conditions. Comprehensive experiments show that WildTalker∞ considerably improves the synthesis quality of audio-driven 3D talking portraits and achieves superior lip synchronization in dynamic environments with both auditory and visual complexity. Comparative results show that WildTalker∞ consistently outperforms state-of-the-art methods in both controlled and uncontrolled scenarios, underscoring its practical efficacy and broad applicability.
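The abstract names multi-scale spectral subtraction for audio denoising but does not describe how WildTalker∞ implements it. As a rough illustration only, the sketch below applies classic magnitude spectral subtraction at several FFT sizes and averages the time-domain results. The percentile-based noise estimate, the over-subtraction factor `alpha`, the spectral floor `beta`, the `scales` tuple, and all function names are assumptions chosen for this example, not the authors' method or settings.

```python
import numpy as np


def _periodic_hann(n_fft):
    # Periodic Hann window: with hop = n_fft // 2, the shifted copies sum to 1.
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)


def stft(x, n_fft):
    """Split x into 50%-overlapping Hann-windowed frames and return their spectra."""
    hop = n_fft // 2
    window = _periodic_hann(n_fft)
    xp = np.pad(x, (hop, hop))  # pad so the first and last samples are fully covered
    n_frames = 1 + (len(xp) - n_fft) // hop
    frames = np.stack([xp[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1), window, hop


def istft(spec, window, hop, length):
    """Overlap-add the inverse FFT of each frame, normalising by the summed window."""
    n_fft = len(window)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    total = hop * (len(frames) - 1) + n_fft
    out, norm = np.zeros(total), np.zeros(total)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window
    out = out / np.maximum(norm, 1e-8)
    return out[hop:hop + length]  # drop the padding added in stft()


def spectral_subtract(x, n_fft, alpha=2.0, beta=0.02, noise_percentile=10):
    """Single-scale magnitude spectral subtraction that keeps the noisy phase."""
    spec, window, hop = stft(x, n_fft)
    mag = np.abs(spec)
    # Assumption: the per-bin noise floor is a low percentile of magnitudes over time.
    # This suits intermittent speech but would also suppress perfectly steady tones.
    noise = np.percentile(mag, noise_percentile, axis=0)
    # Over-subtract by alpha, then clamp to a small floor (beta) to limit musical noise.
    clean_mag = np.maximum(mag - alpha * noise, beta * noise)
    return istft(clean_mag * np.exp(1j * np.angle(spec)), window, hop, len(x))


def multiscale_spectral_subtraction(x, scales=(256, 512, 1024), **kwargs):
    """Hypothetical multi-scale variant: denoise at several FFT sizes, then average."""
    return np.mean([spectral_subtract(x, n_fft, **kwargs) for n_fft in scales], axis=0)


# Hypothetical demo: a 440 Hz tone present for only part of the clip, plus white noise.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
clean = np.where((t >= 0.25) & (t < 0.625), np.sin(2 * np.pi * 440 * t), 0.0)
noisy = clean + 0.1 * rng.standard_normal(fs)
denoised = multiscale_spectral_subtraction(noisy)
```

Short windows track fast changes in the signal, while long windows resolve frequency more finely; averaging across scales is one simple way to combine them. The abstract does not say whether WildTalker∞ fuses its scales like this or in some other way.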
- Title: WildTalker∞: Pushing the Limits of 3D Talking Portrait Synthesis in Unconstrained Environments
- Authors: Lee, Seonghak; Park, Jisoo; Kwon, Junseok
- Published: 2026
- Type: Article
- Journal: IEEE Transactions on Audio, Speech and Language Processing
- Volume: 34
- Pages: 1259-1271