We consider a stochastic bandit problem with a possibly infinite number of arms. We write p∗ for the proportion of optimal arms and Δ for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal learning rates both in the cumulative regret setting and in the best-arm identification setting, in terms of the problem parameters T (the budget), p∗, and Δ. For the objective of minimizing the cumulative regret, we provide a lower bound of order Ω(log(T)/(p∗Δ)) and a UCB-style algorithm with a matching upper bound up to a factor of log(1/Δ). Our algorithm needs p∗ to calibrate its parameters, and we prove that this knowledge is necessary, since adapting to p∗ in this setting is impossible. For best-arm identification we also provide a lower bound of order Ω(exp(−cTΔ²p∗)) on the probability of outputting a sub-optimal arm, where c > 0 is an absolute constant. We also provide an elimination algorithm with an upper bound matching the lower bound up to a factor of order log(T) in the exponential, and which does not need p∗ or Δ as a parameter. Our results apply directly to the three related problems of competing against the j-th best arm, identifying an ϵ-good arm, and finding an arm with mean larger than a quantile of a known order.
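To make the role of p∗ concrete, here is a minimal illustrative sketch (not the paper's exact algorithm) of the standard subsample-then-UCB idea for many-armed bandits: draw roughly log(T)/p∗ arms uniformly at random, so that with high probability the subsample contains at least one optimal arm, then run plain UCB1 on the subsample. The function name, the Bernoulli reward model, and the subsample size are all assumptions made for illustration.

```python
import math
import random

def ucb_with_subsampling(arm_means, p_star, T, seed=0):
    """Illustrative sketch: subsample m ~ log(T)/p_star arms, then run UCB1.

    arm_means : list of true Bernoulli means, one per arm (stand-in for an
                infinite reservoir of arms)
    p_star    : assumed proportion of optimal arms, used to size the subsample
    T         : total budget of pulls
    Returns the total reward collected over the T pulls.
    """
    rng = random.Random(seed)
    # With m draws, the chance that no optimal arm is sampled is
    # (1 - p_star)^m, which is small for m of order log(T)/p_star.
    m = max(1, math.ceil(math.log(T) / p_star))
    sampled = [rng.randrange(len(arm_means)) for _ in range(m)]

    counts = [0] * m
    sums = [0.0] * m
    total_reward = 0.0
    for t in range(1, T + 1):
        if t <= m:
            i = t - 1  # initialization: pull each sampled arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus
            i = max(range(m),
                    key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2.0 * math.log(t) / counts[j]))
        # Draw a Bernoulli reward with the chosen arm's true mean
        r = 1.0 if rng.random() < arm_means[sampled[i]] else 0.0
        counts[i] += 1
        sums[i] += r
        total_reward += r
    return total_reward
```

Note how the knowledge of p∗ enters only through the subsample size m; this is the kind of calibration the abstract refers to when it says the algorithm needs p∗ as a parameter.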