Orthogonal Disentanglement for Generalizable Audio Deepfake Detection
- Park, Jisoo;
- Lee, Seonghak;
- Kwon, Junseok
Abstract
Recent advances in speech synthesis and voice conversion have produced highly realistic audio deepfakes, challenging existing detectors to generalize beyond seen attack types. Most current approaches overfit to forgery-specific artifacts, resulting in degraded performance on unseen attacks. In this paper, we present a lightweight yet effective framework that enhances generalization through orthogonal latent disentanglement and pseudo-fake augmentation. First, we decompose the latent embedding space into forgery-agnostic and forgery-specific subspaces using an orthogonal loss that explicitly decorrelates their representations, encouraging the detector to focus on domain-invariant cues. Second, we introduce a latent pseudo-fake generation method that perturbs the learned embeddings near the decision boundary, enriching the feature space and regularizing the classifier. Unlike prior works that rely on adversarial domain adaptation or reconstruction decoders, our method is simple, stable, and compatible with existing Conformer-based backbones. Experiments on ASVspoof 2019 LA and a cross-dataset evaluation with unseen vocoders demonstrate that our model achieves superior generalization to unseen attacks without additional data or complex training schemes.
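The abstract does not give formulas, so the following is only a minimal sketch of the two ideas it names, not the authors' implementation. It assumes the orthogonal loss can be approximated by the squared Frobenius norm of the batch cross-covariance between the two subspaces, and that pseudo-fakes can be formed by moving bona fide embeddings just across a linear decision boundary. The function names, the linear classifier `(w, b)`, `margin`, and `noise_std` are all hypothetical.

```python
import numpy as np


def orthogonal_loss(z_agnostic, z_specific):
    """Squared Frobenius norm of the batch cross-covariance.

    ASSUMPTION: one common way to decorrelate two embeddings; the paper's
    exact loss is not stated in the abstract.
    z_agnostic: (batch, d_a), z_specific: (batch, d_s).
    """
    # Center every feature over the batch.
    za = z_agnostic - z_agnostic.mean(axis=0, keepdims=True)
    zs = z_specific - z_specific.mean(axis=0, keepdims=True)
    # (d_a, d_s) cross-covariance between the two subspaces.
    cross = za.T @ zs / z_agnostic.shape[0]
    # Zero when every agnostic feature is uncorrelated with every specific one.
    return float(np.sum(cross ** 2))


def pseudo_fake(z_real, w, b, margin=0.1, noise_std=0.05, rng=None):
    """Create latent pseudo-fakes near a linear boundary w.z + b = 0.

    ASSUMPTION: a linear classifier whose positive score means bona fide.
    Each embedding is projected onto the boundary, pushed `margin` to the
    negative (spoof) side, and optionally jittered with Gaussian noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    w_norm = np.linalg.norm(w)
    score = z_real @ w + b                        # (batch,)
    # Orthogonal projection of each row onto the hyperplane.
    z_proj = z_real - np.outer(score / w_norm**2, w)
    # Step slightly past the boundary along the unit normal.
    z_fake = z_proj - margin * (w / w_norm)
    return z_fake + noise_std * rng.normal(size=z_real.shape)
```

In a training loop under these assumptions, `orthogonal_loss` would be added to the classification loss with a weighting coefficient, and the outputs of `pseudo_fake` would be labeled as spoof and appended to the batch of embeddings fed to the classifier.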
- Title: Orthogonal Disentanglement for Generalizable Audio Deepfake Detection
- Authors: Park, Jisoo; Lee, Seonghak; Kwon, Junseok
- Publication date: 2026
- Type: Conference Paper
- Journal name: International Conference on Information Networking
- Pages: 975-978