- 이지민
- 장항배
Abstract
The use of large language models (LLMs) is growing rapidly, and prompt injection and jailbreak attacks have emerged as critical security threats. However, existing studies are predominantly English-centric, and research specific to Korean remains limited. To investigate the linguistic vulnerabilities of LLMs, this study compares the attack success rate (ASR) of prompts written in English, plain Korean, and honorific Korean. We conduct roleplay, parameter-manipulation, and MAC GCG attacks against both general-purpose and Korean-specialized LLMs. Across the three attack types and three models, the average ASR for English and plain Korean is approximately 1.5%, versus about 2.0% for honorific Korean. Under roleplay attacks in particular, the English–Llama combination records an ASR of 19.23% and the honorific-Korean–kanana combination 10.26%, revealing double-digit vulnerabilities for specific language–model pairs. These findings quantitatively demonstrate the security weaknesses of general-purpose LLMs in multilingual settings and suggest that honorific Korean can serve as a new vector for bypassing safety alignment. By introducing sociolinguistic factors into LLM security analysis, this study refines safety evaluation criteria for multilingual LLMs and provides empirical evidence for strengthening the security of Korean-language models.
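The abstract reports ASR both per language–model–attack cell and as an average "across the three attack types and three models." The record does not state how a response is judged a successful jailbreak or how the average is weighted. The sketch below therefore assumes that ASR is the percentage of attack prompts judged successful, and that the per-language average is an unweighted mean over the attack–model cells. The function names and the counts in the usage example are hypothetical and do not come from the paper.

```python
from statistics import mean


def attack_success_rate(successes: int, attempts: int) -> float:
    """ASR in percent: share of attack prompts judged to have bypassed safety alignment.

    How a single response is judged "successful" is an assumption left to the caller.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return 100.0 * successes / attempts


def mean_asr(cells: dict[tuple[str, str], tuple[int, int]]) -> float:
    """Unweighted mean ASR over (attack type, model) cells for one prompt language.

    Each value is a hypothetical (successes, attempts) pair; equal weighting per
    cell is an assumption, not necessarily the paper's aggregation rule.
    """
    return mean(attack_success_rate(s, n) for s, n in cells.values())


# Hypothetical counts for one language, two cells only (illustration, not paper data).
example_cells = {
    ("roleplay", "model_a"): (2, 50),  # 4.0% ASR
    ("roleplay", "model_b"): (0, 50),  # 0.0% ASR
}
print(mean_asr(example_cells))
```

With equal weighting, a single high-ASR cell such as a roleplay attack on one model can dominate an otherwise near-zero average. This is one reason to report per-cell figures alongside the mean.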
Keywords
- Title (Korean)
- 다국어 Prompt Injection 기반 Jailbreak 취약점 분석 : 영어, 한국어 평어체, 경어체 비교를 중심으로
- Title (English)
- Multilingual Prompt-Injection-Based Jailbreak Vulnerability Analysis: A Comparative Study of English, Korean Plain Speech, and Honorific Style
- Authors
- 이지민; 장항배
- Publication date
- 2025-12
- Type
- Y
- Journal
- 융합보안 논문지 (Convergence Security Journal)
- Volume
- 25
- Issue
- 5
- Pages
- 117–129