WildTalker∞: Pushing the Limits of 3D Talking Portrait Synthesis in Unconstrained Environments

Abstract

We propose WildTalker∞, a framework for synthesizing high-quality talking portraits that addresses challenges encountered in real-world scenarios. Conventional techniques struggle with transient visual artifacts and auditory disturbances, which compromise the realism and audio-visual synchronization of the synthesized portraits. To overcome these limitations, WildTalker∞ introduces flow-guided temporal masking, a strategy designed to identify and mitigate transient movements within dynamic scenes, thereby improving visual coherence. WildTalker∞ also integrates a multi-scale spectral subtraction method for robust audio denoising, which improves lip synchronization and preserves natural audio-visual alignment under challenging acoustic conditions. Experiments show that WildTalker∞ advances the synthesis quality of audio-driven 3D talking portraits and achieves superior lip synchronization in dynamic environments with complex auditory and visual conditions. Comparative results confirm that WildTalker∞ consistently outperforms state-of-the-art methods in both controlled and uncontrolled scenarios, underscoring its practical efficacy and broad applicability.
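The abstract names two mechanisms without giving their internals. The sketch below illustrates the general ideas under stated assumptions and is not the authors' implementation. `temporal_masks` is a hypothetical flow-guided mask that flags a pixel as transient when its optical-flow magnitude exceeds `ratio` times the frame's median flow (an assumed rule, chosen so that uniform camera or head motion is not flagged on its own). `multiscale_spectral_subtract` applies classical magnitude spectral subtraction at several frame lengths, estimating the noise spectrum from an assumed noise-only leading segment of `noise_len` samples, keeping the noisy phase, and averaging the per-scale outputs (the averaging fusion is also an assumption). All function and parameter names are illustrative.

```python
import cmath
import math
from statistics import median


def temporal_masks(flow_mags, ratio=3.0):
    """Hypothetical flow-guided mask: 1 = keep pixel, 0 = treat as transient.

    flow_mags: list of frames, each a 2D list of optical-flow magnitudes.
    A pixel is flagged when its flow exceeds `ratio` times the frame median,
    so motion shared by the whole frame is not flagged by itself.
    """
    masks = []
    for frame in flow_mags:
        ref = median([v for row in frame for v in row]) + 1e-8
        masks.append([[0 if v > ratio * ref else 1 for v in row] for row in frame])
    return masks


def dft(x):
    # Naive O(n^2) discrete Fourier transform (kept dependency-free).
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    # Inverse DFT; the input spectra come from real frames, so keep the real part.
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def spectral_subtract(signal, frame_len, noise_frames, floor=0.01):
    """Single-scale magnitude spectral subtraction on non-overlapping frames."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    spectra = [dft(f) for f in frames]
    # Noise magnitude per bin, averaged over the assumed noise-only frames.
    noise = [sum(abs(s[k]) for s in spectra[:noise_frames]) / noise_frames
             for k in range(frame_len)]
    out = []
    for s in spectra:
        cleaned = []
        for k, c in enumerate(s):
            # Subtract the noise magnitude, clamp to a spectral floor, keep noisy phase.
            mag = max(abs(c) - noise[k], floor * abs(c))
            cleaned.append(cmath.rect(mag, cmath.phase(c)))
        out.extend(idft(cleaned))
    return out


def multiscale_spectral_subtract(signal, frame_lens=(8, 16), noise_len=32, floor=0.01):
    """Run spectral subtraction at several frame lengths and average the results."""
    for frame_len in frame_lens:
        if len(signal) % frame_len or noise_len % frame_len or noise_len < frame_len:
            raise ValueError("signal and noise lengths must be multiples of each frame length")
    outs = [spectral_subtract(signal, L, noise_len // L, floor) for L in frame_lens]
    return [sum(vals) / len(outs) for vals in zip(*outs)]
```

For example, a constant "hum" of `[1.0] * 64` is reduced to the spectral floor (about `0.01` per sample), while a signal whose first 32 samples are silent passes through essentially unchanged. Short frames resolve transients in time and long frames resolve stationary noise in frequency, which is one plausible motivation for combining scales; a practical version would use overlapping windowed STFT frames and an FFT rather than this naive DFT.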

Keywords

Flow-guided temporal masking; multi-scale spectral subtraction; talking portraits synthesis
Authors
Lee, Seonghak; Park, Jisoo; Kwon, Junseok
DOI
10.1109/TASLPRO.2026.3664208
Publication Year
2026
Type
Article
Journal
IEEE Transactions on Audio, Speech and Language Processing
Volume
34
Pages
1259–1271