Scanning Single Shot Detector for Math in Document Images

Mali, Parag Shrikrishna

oai:repository.rit.edu:theses-11368

Authors: Parag Shrikrishna Mali
Publication date: 1 August 2019
Publisher: RIT Digital Institutional Repository

Abstract

We introduce the Scanning Single Shot Detector (ScanSSD) for detecting both embedded and displayed math expressions in document images using a single-stage network that does not require page layout, font, or character information. ScanSSD uses sliding windows to generate sub-images of large document page images rendered at 600 dpi and applies a Single Shot Detector (SSD) to each sub-image. Detection results from the sub-images are pooled to generate page-level results. For pooling sub-image-level detections, we introduce new methods based on the confidence scores and density of detections. ScanSSD is a modular architecture that can be easily applied to detecting other objects in document images. For the math expression detection task, we have created a new dataset called TFD-ICDAR 2019 from the existing GTDB datasets. Our dataset has 569 pages for training with 26,396 math expressions and 236 pages for testing with 11,885 math expressions. ScanSSD achieves an 80.19% F-score at IOU50 and a 72.96% F-score at IOU75 on the TFD-ICDAR 2019 test dataset. An earlier version of ScanSSD placed 2nd in the ICDAR 2019 competition on Typeset Formula Detection (TFD). Our data and code are publicly available at https://github.com/MaliParag/TFD-ICDAR2019 and https://github.com/MaliParag/ScanSSD, respectively.
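The abstract's scanning and evaluation steps can be illustrated with a minimal Python sketch. It is not taken from the ScanSSD repository: the function names, the window size, and the stride are hypothetical, and the greedy one-to-one matching used for the F-score is an assumption rather than the competition's official protocol. The pooling methods based on confidence and detection density are not reproduced here, because the abstract does not specify them.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def sliding_windows(page_w: int, page_h: int,
                    win: int = 1200, stride: int = 1000) -> List[Tuple[int, int, int, int]]:
    """Tile a page with windows; win and stride are illustrative values only."""
    windows = []
    for y in range(0, page_h, stride):
        for x in range(0, page_w, stride):
            # Windows at the right and bottom edges are clipped to the page.
            windows.append((x, y, min(x + win, page_w), min(y + win, page_h)))
    return windows


def to_page_coords(box: Box, window: Tuple[int, int, int, int]) -> Box:
    """Shift a box detected inside a window back into page coordinates."""
    ox, oy = window[0], window[1]
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def f_score(dets: List[Box], gts: List[Box], thr: float = 0.5) -> float:
    """F-score with greedy one-to-one matching at an IoU threshold (assumed protocol)."""
    if not dets or not gts:
        return 0.0
    unmatched = list(gts)
    tp = 0
    for d in dets:
        best = max(unmatched, key=lambda g: iou(d, g), default=None)
        if best is not None and iou(d, best) >= thr:
            tp += 1
            unmatched.remove(best)  # each ground-truth box matches at most once
    precision = tp / len(dets)
    recall = tp / len(gts)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, `thr=0.5` corresponds to the IOU50 setting and `thr=0.75` to IOU75. In a full pipeline, the boxes returned by the detector for each window would be passed through `to_page_coords` before any page-level pooling.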

Last time updated on 12/01/2024

This paper was published in RIT Scholar Works.
