Entity-Centric Stream Filtering and Ranking: Filtering and Unfilterable Documents

Abstract

Cumulative Citation Recommendation (CCR) is defined as: given a stream of documents on one hand and Knowledge Base (KB) entities on the other, filter, rank and recommend citation-worthy documents. The pipeline encountered in systems that approach this problem involves four stages: filtering, classification, ranking (or scoring), and evaluation. Filtering is only an initial step that reduces the web-scale corpus into a working set of documents more manageable for the subsequent stages. Nevertheless, this step has a large impact on the maximum recall that can be attained. This study analyzes in depth the main factors that affect recall in the filtering stage. We investigate the impact of choices for corpus cleansing, entity profile construction, entity type, document type, and relevance grade. Because failing on recall in this first step of the pipeline cannot be repaired later on, we identify and characterize the citation-worthy documents that do not pass the filtering stage by examining their contents.
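
To make the role of the filtering stage concrete, the following is a minimal sketch, not the method used in the paper. It assumes that an entity profile is a set of name variants and that a document passes the filter when it contains any variant as a case-insensitive substring. The function names, the toy documents, and the name variants are all hypothetical and chosen only for illustration.

```python
def passes_filter(text, profile):
    """Return True if the text mentions any name variant in the profile."""
    lowered = text.lower()
    return any(name.lower() in lowered for name in profile)


def filtering_recall(docs, profile, relevant_ids):
    """Fraction of citation-worthy documents that survive the filter."""
    if not relevant_ids:
        return 0.0
    passed = {doc_id for doc_id, text in docs.items()
              if passes_filter(text, profile)}
    return len(passed & relevant_ids) / len(relevant_ids)


def unfilterable(docs, profile, relevant_ids):
    """Citation-worthy documents that the filter fails to retrieve."""
    return {doc_id for doc_id in relevant_ids
            if not passes_filter(docs[doc_id], profile)}


# Hypothetical toy stream: d1 and d2 are assumed to be citation-worthy.
docs = {
    "d1": "Barack Obama visited Berlin.",
    "d2": "The president spoke about health care.",
    "d3": "Weather report for Amsterdam.",
}
relevant = {"d1", "d2"}

canonical_profile = {"Barack Obama"}                  # canonical name only
richer_profile = {"Barack Obama", "the president"}    # adds a name variant

recall_canonical = filtering_recall(docs, canonical_profile, relevant)
recall_richer = filtering_recall(docs, richer_profile, relevant)
missed = unfilterable(docs, canonical_profile, relevant)
```

In this toy setting, the canonical-name profile misses d2, so no later classification or ranking stage can recover it. Widening the profile raises recall at the cost of admitting more documents into the working set.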

This paper was published in CWI's Institutional Repository.