Repository landing page

We are not able to resolve this OAI Identifier to the repository landing page. If you are the repository manager for this record, please head to the Dashboard and adjust the settings.

A Survey on Region Extractors from Web Documents

Abstract

Extracting information from web documents has become a research area in which new proposals sprout out year after year. This has motivated several researchers to work on surveys that attempt to provide an overall picture of the many existing proposals. Unfortunately, none of these surveys provide a complete picture, because they do not take region extractors into account. These tools are kind of preprocessors, because they help information extractors focus on the regions of a web document that contain relevant information. With the increasing complexity of web documents, region extractors are becoming a must to extract information from many websites. Beyond information extraction, region extractors have also found their way into information retrieval, focused web crawling, topic distillation, adaptive content delivery, mashups, and metasearch engines. In this paper, we survey the existing proposals regarding region extractors and compare them side by side.Ministerio de Educación y Ciencia TIN2007-64119Junta de Andalucía P07-TIC-2602Junta de Andalucía P08- TIC-4100Ministerio de Ciencia e Innovación TIN2008-04718-EMinisterio de Ciencia e Innovación TIN2010-21744Ministerio de Economía, Industria y Competitividad TIN2010-09809-EMinisterio de Ciencia e Innovación TIN2010-10811-EMinisterio de Ciencia e Innovación TIN2010-09988-

Similar works

Full text

thumbnail-image

idUS. Depósito de Investigación Universidad de Sevilla

redirect
Last time updated on 01/04/2023

Having an issue?

Is data on this page outdated, violates copyrights or anything else? Report the problem now and we will take corresponding actions after reviewing your request.