Orthogonal Disentanglement for Generalizable Audio Deepfake Detection

Park, Jisoo; Lee, Seonghak; Kwon, Junseok

doi:10.1109/ICOIN68469.2026.11480561

Citations (SCOPUS): 0

Abstract

Recent advances in speech synthesis and voice conversion have produced highly realistic audio deepfakes, challenging existing detectors to generalize beyond seen attack types. Most current approaches overfit to forgery-specific artifacts, resulting in degraded performance on unseen attacks. In this paper, we present a lightweight yet effective framework that enhances generalization through orthogonal latent disentanglement and pseudo-fake augmentation. First, we decompose the latent embedding space into forgery-agnostic and forgery-specific subspaces using an orthogonal loss that explicitly decorrelates their representations, encouraging the detector to focus on domain-invariant cues. Second, we introduce a latent pseudo-fake generation method that perturbs the learned embeddings near the decision boundary, enriching the feature space and regularizing the classifier. Unlike prior work that relies on adversarial domain adaptation or reconstruction decoders, our method is simple, stable, and compatible with existing Conformer-based backbones. Experiments on ASVspoof 2019 LA and a cross-dataset evaluation with unseen vocoders demonstrate that our model achieves superior generalization to unseen attacks without additional data or complex training schemes.
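
The abstract does not give the exact form of either component, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the orthogonal loss is the mean squared cosine similarity between each sample's forgery-agnostic and forgery-specific embedding, and it stands in for "perturbing near the decision boundary" with a simple mix of a real and a fake embedding using a coefficient close to 0.5. The names orthogonal_loss and pseudo_fake, the parameters low and high, and both formulas are hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_loss(agnostic, specific, eps=1e-8):
    """Mean squared cosine similarity between paired embeddings.

    ASSUMPTION: this penalty form is illustrative; the paper's actual
    orthogonal loss is not specified in the abstract.
    The value is 0 when every pair is orthogonal and approaches 1 when
    every pair is parallel.
    """
    total = 0.0
    for a, s in zip(agnostic, specific):
        cos = dot(a, s) / (math.sqrt(dot(a, a)) * math.sqrt(dot(s, s)) + eps)
        total += cos * cos
    return total / len(agnostic)


def pseudo_fake(real, fake, rng, low=0.4, high=0.6):
    """Mix one real and one fake embedding with a weight near 0.5.

    ASSUMPTION: interpolation toward the midpoint is a stand-in for
    "perturbing embeddings near the decision boundary"; the paper's
    generation rule may differ.
    """
    lam = rng.uniform(low, high)
    return [lam * r + (1.0 - lam) * f for r, f in zip(real, fake)]


# Toy usage with 2-dimensional embeddings.
rng = random.Random(0)
agnostic = [[1.0, 0.0], [0.0, 2.0]]   # first pair orthogonal, second parallel
specific = [[0.0, 3.0], [0.0, 1.0]]
loss = orthogonal_loss(agnostic, specific)
mixed = pseudo_fake([0.0, 0.0], [1.0, 1.0], rng)
```

In a training loop, a penalty of this kind would typically be added to the classification loss with a weighting factor, and the mixed embeddings would be fed to the classifier as extra fake-labeled samples; both choices are assumptions here rather than details confirmed by the record.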

Keywords

Audio deepfake detection; Disentanglement; Generalizable detector; Representation learning

Publication date: 2026

Type: Conference Paper

Published in: International Conference on Information Networking

Pages: 975-978