- Choe, Taeyoung;
- Shin, Mincheol;
- Kim, Kwangyoung;
- Yang, Myungseok;
- Man, Ka Lok;
- ... Kim, Mucheol
Web of Science: 0
Scopus: 0
Abstract
Integrating flood-response datasets across municipalities is often hindered by heterogeneous and non-standard variable names, a challenge amplified in Korean by local naming conventions and linguistic variation. This study addresses scalable schema alignment to standardize municipal flood datasets with reduced manual effort while maintaining semantic consistency for downstream modeling. We propose a BERT-based schema matching framework that augments standardized attribute names with paraphrases generated by a generative language model and filtered to reduce semantic drift. Both standardized and target variable names are encoded using a flood-domain-adapted Korean BERT model, and candidate correspondences are retrieved via cosine-similarity ranking to produce top-k match suggestions for automated or human-in-the-loop alignment. Experiments on real flood-related tables from Busan and Incheon, evaluated jointly to diversify variable expressions, show that augmentation substantially improves top-k retrieval accuracy. In the combined evaluation, Hit@5 improves from 0.71 to 0.95, supporting more reliable schema harmonization for simulation-ready inputs.
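The retrieval and evaluation steps described in the abstract (cosine-similarity ranking, top-k suggestions, Hit@k) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three-dimensional vectors stand in for BERT embeddings, and the variable names (`rainfall_mm`, `rain_amt`, and so on) and function names (`cosine`, `top_k`, `hit_at_k`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, candidates, k):
    """Return the names of the k candidates most similar to query_vec."""
    ranked = sorted(
        candidates,
        key=lambda name: cosine(query_vec, candidates[name]),
        reverse=True,
    )
    return ranked[:k]


def hit_at_k(queries, candidates, gold, k):
    """Fraction of queries whose gold match appears among the top-k suggestions."""
    hits = sum(
        1 for name, vec in queries.items()
        if gold[name] in top_k(vec, candidates, k)
    )
    return hits / len(queries)


# Assumption: toy embeddings for standardized attribute names (made up).
standard = {
    "rainfall_mm": [1.0, 0.0, 0.0],
    "water_level_m": [0.0, 1.0, 0.0],
    "station_id": [0.0, 0.0, 1.0],
}

# Assumption: toy embeddings for municipal (target) variable names (made up).
targets = {
    "rain_amt": [0.9, 0.1, 0.0],
    "wl": [0.6, 0.5, 0.0],  # ambiguous: ranks rainfall_mm above water_level_m
    "stn": [0.1, 0.0, 0.9],
}

# Hypothetical ground-truth alignment used only for scoring.
gold = {"rain_amt": "rainfall_mm", "wl": "water_level_m", "stn": "station_id"}

hit1 = hit_at_k(targets, standard, gold, k=1)
hit2 = hit_at_k(targets, standard, gold, k=2)
```

In this toy setup the ambiguous name `wl` misses at k=1 but is recovered at k=2, which is why a top-k suggestion list suits human-in-the-loop review: the reviewer picks from a short list instead of relying on a single best guess.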
- Title: BERT-Based Schema Matching for Integrating Heterogeneous Flood Data: A Case Study in Korea
- Authors: Choe, Taeyoung; Shin, Mincheol; Kim, Kwangyoung; Yang, Myungseok; Man, Ka Lok; Kim, Mucheol
- Publication date: 2026-03
- Type: Article
- Journal: SYSTEMS
- Volume: 14
- Issue: 3