Correcting OCR Errors based on Error Patterns
(Korean title: 오류 패턴 기반의 OCR 오류 수정)

Abstract

Advances in Optical Character Recognition (OCR) have made it possible to digitize analog documents. OCR achieves very high recognition accuracy on standardized documents, but errors still occur frequently in documents with complex layouts, so a post-OCR error correction step is required. Most OCR errors recur on the same characters, which makes information about past OCR errors valuable for correcting new ones. However, few studies have exploited such OCR error information. This study analyzes OCR error data to identify error patterns and then proposes an OCR error correction method based on neural machine translation. To validate the proposed method, experiments were conducted on the English datasets of the ICDAR 2017 and ICDAR 2019 Post-OCR Text Correction competitions. The model that used OCR error information achieved a higher improvement rate than the model without it, and improved on the previous state of the art by up to 8 percentage points.
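The abstract's premise is that OCR errors recur on the same characters, so error patterns can be mined from aligned OCR output and ground-truth pairs. The Python sketch below illustrates that idea and one way to score a correction; it is not the authors' implementation. Aligning with `difflib.SequenceMatcher` is an assumption standing in for whatever aligner the paper uses, and the improvement rate is assumed here to be the relative reduction in character-level Levenshtein distance to the ground truth, in the spirit of the ICDAR post-OCR evaluation; the paper's exact metric may differ. The function names are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_error_patterns(ocr_text: str, gt_text: str) -> Counter:
    """Count (ocr_span, ground_truth_span) confusions for one text pair.

    Assumption: difflib alignment stands in for the paper's aligner.
    """
    patterns = Counter()
    matcher = SequenceMatcher(None, ocr_text, gt_text, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert' spans are errors
            patterns[(ocr_text[i1:i2], gt_text[j1:j2])] += 1
    return patterns


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (unit-cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def improvement_rate(ocr_text: str, corrected_text: str, gt_text: str) -> float:
    """Relative reduction in edit distance; negative if correction hurts.

    Assumed metric. Undefined when the OCR text is already correct,
    so 0.0 is returned by convention in that case.
    """
    before = levenshtein(ocr_text, gt_text)
    after = levenshtein(corrected_text, gt_text)
    if before == 0:
        return 0.0
    return (before - after) / before


if __name__ == "__main__":
    # Toy pairs with typical OCR confusions ("h" read as "b", "m" read as "rn").
    pairs = [("tbe cat", "the cat"), ("tbe dog", "the dog"), ("rnoon", "moon")]
    totals = Counter()
    for ocr, gt in pairs:
        totals.update(char_error_patterns(ocr, gt))
    print(totals.most_common(1))  # [(('b', 'h'), 2)]
    print(improvement_rate("tbe cst", "the cst", "the cat"))  # 0.5
```

Aggregating the counters over a training corpus yields a table of frequent confusions such as `b→h` or `rn→m`; how such patterns are fed into the neural machine translation model is specific to the paper and is not reproduced here.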

Keywords

Optical Character Recognition (OCR), OCR post-processing, OCR error analysis, OCR error correction, spell correction
Authors: 김나라, 조용석, 박호현
DOI: 10.5626/JOK.2024.51.3.271
Publication date: 2024-03
Journal: 정보과학회논문지 (Journal of KIISE)
Volume: 51
Issue: 3
Pages: 271–279