Handwritten and printed text separation in historical documents

Prikhodina, Anastasia

oai:EVASTAR-Karlsruhe.de:1000141960

Authors: Anastasia Prikhodina
Publication date: 17 January 2022
Publisher: Karlsruher Institut für Technologie

Abstract

Historical documents present many challenges for Optical Character Recognition (OCR) systems, especially documents of poor quality that contain handwritten annotations, stamps, signatures, and historical fonts. Because most OCR systems recognize either machine-printed or handwritten text, the printed and handwritten parts have to be separated before the respective recognition system is applied. This thesis addresses the segmentation of handwritten and printed text in historical Latin text documents. Data in which handwritten and machine-printed components appear on the same page, or even overlap each other, is scarce, as are the corresponding pixel-wise annotations. To alleviate this, the data synthesis method proposed in [12] was applied and new datasets were generated. The newly created images and their pixel-level labels were used to train the fully convolutional model (FCN) introduced in [5]. The newly trained model has shown better results in the separation of machine-printed and handwritten text in historical documents.

Last time updated on 29/06/2022

This paper was published in KITopen.
