Using object detection to extract structured content from documents

Abstract

Structured content such as figures, tables, graphs, captions, and other graphical material often captures the essence of a document. Experienced readers often review the graphical material first to quickly grasp what a document contains. Identifying and extracting a document's structured content, e.g., its graphical components, is therefore important in building a deeper semantic understanding of the document. Techniques presented herein automatically extract the structured content of documents. Machine-learning techniques, e.g., object detection and computer vision, are used to recognize and extract the structured content. The techniques work regardless of the tool used to create the document. For example, the document can be a PDF file, a screenshot, or the output of a computer-aided design tool. The techniques work across fields of study, publishing conventions, languages, and written scripts, and are robust to different formats of graphical content, e.g., vector or raster graphics.
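
The abstract does not describe the detector or the extraction step in detail, so the following is only a minimal sketch of how detector output might be turned into extracted regions. It assumes a hypothetical detector that has already produced labeled, scored bounding boxes for a page image. The names `Detection`, `select_regions`, and `crop`, the thresholds, and the class-agnostic greedy non-maximum suppression are illustrative assumptions, not the method of the disclosure.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


@dataclass(frozen=True)
class Detection:
    """One hypothetical detector output for a page image."""
    label: str    # e.g. "figure", "table", "caption"
    score: float  # detector confidence in [0, 1]
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, right - left) * max(0, bottom - top)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def select_regions(
    detections: Sequence[Detection],
    min_score: float = 0.5,  # assumed confidence threshold
    max_iou: float = 0.5,    # assumed overlap threshold
) -> List[Detection]:
    """Drop low-confidence boxes, then apply greedy, class-agnostic
    non-maximum suppression so each region is reported once."""
    kept: List[Detection] = []
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        if det.score < min_score:
            break  # all remaining detections score lower still
        if all(iou(det.box, k.box) <= max_iou for k in kept):
            kept.append(det)
    return kept


def crop(page: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Cut the pixels inside `box` out of a page stored as rows of pixels."""
    left, top, right, bottom = box
    return [list(row[left:right]) for row in page[top:bottom]]
```

Because the sketch operates on pixels rather than on the source file format, the same steps would apply to a rendered PDF page, a screenshot, or a rasterized vector drawing, which is consistent with the tool-independence described above.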

This paper was published in Technical Disclosure Commons.
