BERT-Based Schema Matching for Integrating Heterogeneous Flood Data: A Case Study in Korea
  • Choe, Taeyoung
  • Shin, Mincheol
  • Kim, Kwangyoung
  • Yang, Myungseok
  • Man, Ka Lok
  • Kim, Mucheol
Abstract

Integrating flood-response datasets across municipalities is often hindered by heterogeneous and non-standard variable names, a challenge amplified in Korean by local naming conventions and linguistic variation. This study addresses scalable schema alignment to standardize municipal flood datasets with reduced manual effort while maintaining semantic consistency for downstream modeling. We propose a BERT-based schema matching framework that augments standardized attribute names with paraphrases generated by a generative language model and filtered to reduce semantic drift. Both standardized and target variable names are encoded using a flood-domain-adapted Korean BERT model, and candidate correspondences are retrieved via cosine-similarity ranking to produce top-k match suggestions for automated or human-in-the-loop alignment. Experiments on real flood-related tables from Busan and Incheon, evaluated jointly to diversify variable expressions, show that augmentation substantially improves top-k retrieval accuracy. In the combined evaluation, Hit@5 improves from 0.71 to 0.95, supporting more reliable schema harmonization for simulation-ready inputs.
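The retrieval and evaluation steps described in the abstract (encode names, rank by cosine similarity, report Hit@k) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `embed` is a toy character-bigram stand-in for the flood-domain-adapted Korean BERT encoder, scoring each standardized attribute by its best-matching paraphrase is an assumed aggregation rule, and all attribute names and paraphrases are hypothetical.

```python
import math
from collections import Counter


def embed(name, n=2):
    """Toy stand-in for the BERT encoder (assumption): bag of character n-grams."""
    padded = f" {name.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(g, 0) for g, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(target, standard_index, k=5):
    """Rank standardized attributes for one target variable name.

    standard_index maps each standardized attribute to its name plus
    paraphrases; an attribute is scored by its best-matching variant
    (an assumed aggregation rule).
    """
    query = embed(target)
    scored = [
        (max(cosine(query, embed(v)) for v in variants), attribute)
        for attribute, variants in standard_index.items()
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [attribute for _, attribute in scored[:k]]


def hit_at_k(gold, standard_index, k=5):
    """Fraction of target names whose true attribute appears in the top k."""
    hits = sum(
        1 for target, truth in gold.items()
        if truth in top_k(target, standard_index, k)
    )
    return hits / len(gold)


# Hypothetical standardized attributes with augmented paraphrases.
standard_index = {
    "rainfall": ["rainfall", "precipitation amount"],
    "water_level": ["water level", "river stage"],
    "flood_depth": ["flood depth", "inundation depth"],
}

# Hypothetical municipal variable names mapped to their true attribute.
gold = {
    "precip_amount": "rainfall",
    "river_stage_m": "water_level",
    "inundation_depth": "flood_depth",
}

print(top_k("inundation depth", standard_index, k=2))
print(hit_at_k(gold, standard_index, k=3))
```

In a real pipeline, `embed` would be replaced by pooled encoder outputs and the ranked list would feed either automated alignment or a human reviewer, as the abstract describes.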

Keywords

disaster management; schema matching; LLMs
DOI
10.3390/systems14030267
Publication date
2026-03
Type
Article
Journal
SYSTEMS, volume 14, issue 3
