
Bandits with many optimal arms

Abstract

We consider a stochastic bandit problem with a possibly infinite number of arms. We write p∗ for the proportion of optimal arms and Δ for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal learning rates both in the cumulative regret setting and in the best-arm identification setting in terms of the problem parameters T (the budget), p∗ and Δ. For the objective of minimizing the cumulative regret, we provide a lower bound of order Ω(log(T)/(p∗Δ)) and a UCB-style algorithm with a matching upper bound up to a factor of log(1/Δ). Our algorithm needs p∗ to calibrate its parameters, and we prove that this knowledge is necessary, since adapting to p∗ in this setting is impossible. For best-arm identification we also provide a lower bound of order Ω(exp(−cTΔ²p∗)) on the probability of outputting a sub-optimal arm, where c > 0 is an absolute constant. We also provide an elimination algorithm with an upper bound matching the lower bound up to a factor of order log(T) in the exponential, and that does not need p∗ or Δ as a parameter. Our results apply directly to the three related problems of competing against the j-th best arm, identifying an ϵ-good arm, and finding an arm with mean larger than a quantile of a known order.
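The abstract does not spell out the algorithm, but the role of p∗ can be illustrated with a simple subsample-then-UCB sketch: draw enough arms uniformly that, with high probability, at least one optimal arm is in the subsample, then run standard UCB1 on that finite set. The function names, the confidence parameter `delta_conf`, and the choice of UCB1 are illustrative assumptions, not details taken from the paper.

```python
import math
import random

def subsample_ucb(draw_arm, sample_reward, T, p_star, delta_conf=0.1):
    """Hedged sketch: subsample K arms so that an optimal arm is included
    with probability at least 1 - delta_conf, then run UCB1 on them.

    `draw_arm()` returns a fresh arm handle; `sample_reward(arm)` returns a
    reward in [0, 1]. Returns the total reward collected over T rounds.
    """
    # P(no optimal arm among K uniform draws) <= (1 - p_star)^K,
    # so K = ceil(log(1/delta_conf) / p_star) suffices.
    K = max(1, math.ceil(math.log(1.0 / delta_conf) / p_star))
    arms = [draw_arm() for _ in range(K)]
    counts = [0] * K
    means = [0.0] * K
    total = 0.0
    for t in range(1, T + 1):
        if t <= K:
            i = t - 1  # play each subsampled arm once first
        else:
            # UCB1 index: empirical mean plus exploration bonus
            i = max(range(K),
                    key=lambda j: means[j] + math.sqrt(2 * math.log(t) / counts[j]))
        r = sample_reward(arms[i])
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]
        total += r
    return total
```

Note the calibration issue stated in the abstract: the subsample size K depends explicitly on p∗, which is why knowledge of p∗ matters for this style of algorithm in the regret-minimization setting.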


This paper was published in CWI's Institutional Repository.
