- 김나라
- 조용석
- 박호현
WEB OF SCIENCE: 0
SCOPUS: 0
Abstract
Advances in Optical Character Recognition (OCR) have made it possible to digitize analog documents, and OCR achieves very high recognition accuracy on standardized documents. However, OCR errors still occur frequently in complex documents, so an OCR error correction step is required. Most OCR errors recur for the same characters, which makes OCR error information valuable for error correction. Nevertheless, few studies have made use of this information. This study analyzes OCR error data to identify error patterns and proposes an OCR error correction method based on neural machine translation. To validate the proposed method, experiments were carried out using the English dataset from the ICDAR 2017/2019 Post-OCR text correction competition. The results show that the model using OCR error information achieved a higher improvement rate than the model without it, and improved on the existing state of the art by up to 8 percentage points.
- Title: 오류 패턴 기반의 OCR 오류 수정
- Title (other language): Correcting OCR Errors based on Error Patterns
- Authors: 김나라; 조용석; 박호현
- Publication date: 2024-03
- Journal: 정보과학회논문지 (Journal of KIISE)
- Volume: 51
- Issue: 3
- Pages: 271-279