Efficient supervised and semi-supervised approaches for affiliations disambiguation

Cuxac, Pascal; Bonvallot, Valérie; Lamirel, Jean-Charles

Publication date: 23 October 2012
Publisher: HAL CCSD

Abstract

The disambiguation of named entities is a challenge in many fields, such as scientometrics, social networks, record linkage, citation analysis, and the semantic web. Name ambiguities can arise from misspellings, typographical or OCR mistakes, abbreviations, and omissions. Searching for the names of persons or organizations is therefore difficult, since a single name can appear in different forms. This paper proposes two approaches to disambiguate the affiliations of authors of scientific papers in bibliographic databases. The first assumes that a training corpus is available and uses a naive Bayesian model. The second assumes that no learning resource is available and uses a semi-supervised approach that mixes soft clustering and Bayesian learning. The results are encouraging and are already partially applied in a scientific survey department. However, we are aware that our approach may have limitations: it cannot efficiently process highly unbalanced data, but solutions are possible for future developments.
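To make the supervised setting in the abstract more concrete, the following is a minimal sketch of a naive Bayes classifier that maps raw affiliation strings to a canonical organization label. It is a generic multinomial naive Bayes with Laplace smoothing, not the authors' model. The tokenizer, the class and function names, the labels, and the training strings are all hypothetical, chosen only for illustration. The semi-supervised variant (soft clustering followed by Bayesian learning) is not covered here.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text):
    """Lowercase the string and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class NaiveBayes:
    """Multinomial naive Bayes over word tokens (illustrative sketch only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.class_docs = Counter()              # label -> number of training strings
        self.token_counts = defaultdict(Counter) # label -> token -> count
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_docs[label] += 1
            for tok in tokens(text):
                self.token_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior of each label for one string."""
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        # Tokens never seen in training carry no evidence, so they are skipped.
        known = [t for t in tokens(text) if t in self.vocab]
        scores = {}
        for label, n_docs in self.class_docs.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for tok in known:
                score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical training corpus: variant spellings of two made-up organizations.
train = [
    ("Dept. of Physics, Univ. Alpha, Paris, France", "ALPHA_PHYS"),
    ("Univ Alpha, Physics Department, Paris", "ALPHA_PHYS"),
    ("Beta Institute of Chemistry, Lyon, France", "BETA_CHEM"),
    ("Inst. Chemistry Beta, Lyon", "BETA_CHEM"),
]
model = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])

print(model.predict("Physics Dept, Alpha University, Paris"))
print(model.predict("Chemistry Inst Beta Lyon"))
```

With this toy corpus, the two unseen variants are assigned to ALPHA_PHYS and BETA_CHEM respectively. A real bibliographic database would likely need richer features (for example character n-grams, to tolerate OCR and spelling errors) and some handling of class imbalance, which the abstract names as a limitation.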

HAL Descartes

oai:HAL:hal-00956386v1

Last time updated on 14/04/2021

This paper was published in HAL Descartes.
