UniRank: Unimodal Bandit Algorithm for Online Ranking

Gauthier, Camille-Sovanneary; Gaudel, Romaric; Fromont, Elisa

oai:HAL:hal-03740981v1

Authors: Camille-Sovanneary Gauthier, Romaric Gaudel, Elisa Fromont
Publication date: 17 July 2022
Publisher: HAL CCSD

Abstract

We tackle, in the multiple-play bandit setting, the online ranking problem of assigning $L$ items to $K$ predefined positions on a web page in order to maximize the number of user clicks. We propose a generic algorithm, UniRank, that tackles state-of-the-art click models. The regret bound of this algorithm is a direct consequence of the unimodality-like property of the bandit setting with respect to a graph where nodes are ordered sets of indistinguishable items. The main contribution of UniRank is its $O\left(L/\Delta \log T\right)$ regret for $T$ consecutive assignments, where $\Delta$ relates to the reward gap between two items. This regret bound is based on the usually implicit condition that two items may not have the same attractiveness. Experiments against state-of-the-art learning algorithms, specialized or not for different click models, show that our method has better regret performance than other generic algorithms on real-life and synthetic datasets.
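
The setting described in the abstract can be illustrated with a small sketch. This is not the UniRank algorithm; it only shows what "assigning L items to K positions" and "regret over T consecutive assignments" mean under one assumed click model. The position-based model used here (click probability = item attractiveness times position examination probability), the function names, and the values of `theta` and `kappa` are all hypothetical choices made for illustration and do not come from the paper.

```python
from itertools import permutations


def expected_clicks(ranking, theta, kappa):
    """Expected number of clicks for one ranking.

    Assumes a position-based click model: the item shown at position
    `pos` is clicked with probability theta[item] * kappa[pos].
    """
    return sum(theta[item] * kappa[pos] for pos, item in enumerate(ranking))


def best_ranking(theta, kappa):
    """Brute-force the ranking of K distinct items with the most expected clicks."""
    num_positions = len(kappa)
    candidates = permutations(range(len(theta)), num_positions)
    return max(candidates, key=lambda r: expected_clicks(r, theta, kappa))


def cumulative_regret(rankings, theta, kappa):
    """Sum, over the played rankings, of the expected clicks lost versus the best ranking."""
    best = expected_clicks(best_ranking(theta, kappa), theta, kappa)
    return sum(best - expected_clicks(r, theta, kappa) for r in rankings)


# Hypothetical instance: L = 4 items with pairwise distinct attractiveness,
# K = 2 positions with decreasing examination probability.
theta = [0.9, 0.5, 0.3, 0.1]
kappa = [1.0, 0.6]

# T = 2 consecutive assignments: the optimal ranking, then its two items swapped.
played = [(0, 1), (1, 0)]
regret = cumulative_regret(played, theta, kappa)
```

In this toy instance the attractiveness values are pairwise distinct, matching the condition stated in the abstract; a learning algorithm would have to estimate them from observed clicks instead of reading them directly.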

Last time updated on 19/08/2022

This paper was published in HAL Descartes.
