A large multilingual and multi-domain dataset for recommender systems

Giorgia Di Tommaso; Stefano Faralli; Paola Velardi

oai:iris.uniroma1.it:11573/1112827

Publication date: 1 January 2018

Abstract

This paper presents a multi-domain interests dataset for training and testing Recommender Systems, together with the methodology used to create the dataset from Twitter messages in English and Italian. The English dataset includes an average of 90 preferences per user on music, books, movies, celebrities, sport, politics and much more, for about half a million users. Preferences are either extracted from the messages of users who use Spotify, Goodreads and other similar content-sharing platforms, or induced from their "topical" friends, i.e., followees representing an interest rather than a social relation between peers. In addition, preferred items are matched with the Wikipedia articles describing them. This unique feature of our dataset provides a means to derive a semantic categorization of the preferred items, exploiting available semantic resources linked to Wikipedia such as the Wikipedia Category Graph, DBpedia, BabelNet and others.

Last time updated on 12/06/2018

This paper was published in Archivio della ricerca- Università di Roma La Sapienza.
