
Contextual bandits with cross-learning

Abstract

© 2019 Neural Information Processing Systems Foundation. All rights reserved. In the classical contextual bandits problem, in each round t, a learner observes some context c, chooses some action a to perform, and receives some reward r_{a,t}(c). We consider the variant of this problem where, in addition to receiving the reward r_{a,t}(c), the learner also learns the values of r_{a,t}(c′) for all other contexts c′, i.e., the rewards that would have been achieved by performing that action under different contexts. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions, which has gained a lot of attention lately as many platforms have switched to running first-price auctions. We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve Õ(√(CKT)) regret against all stationary policies, where C is the number of contexts, K the number of actions, and T the number of rounds. We demonstrate algorithms for the contextual bandits problem with cross-learning that remove the dependence on C and achieve regret Õ(√(KT)). We simulate our algorithms on real auction data from an ad exchange running first-price auctions, showing that they outperform traditional contextual bandit algorithms.
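The key structural idea in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' algorithm; it is a minimal UCB1 variant, under assumed Gaussian reward noise and invented example parameters, that exploits cross-learning: one play of action a reveals its reward under every context, so per-(context, action) statistics accumulate C times faster and the learner's regret no longer degrades with the number of contexts.

```python
import math
import random

def ucb_cross_learning(means, T, noise=0.1, seed=0):
    """Toy UCB1 with cross-learning.

    means[c][a] = true mean reward of action a in context c (hypothetical
    instance). Returns the cumulative regret over T rounds.
    """
    rng = random.Random(seed)
    C, K = len(means), len(means[0])
    counts = [[0] * K for _ in range(C)]    # observations credited to (c, a)
    totals = [[0.0] * K for _ in range(C)]  # summed observed rewards
    regret = 0.0
    for t in range(1, T + 1):
        c = rng.randrange(C)  # context arrives

        def index(a):
            # Standard UCB1 index; unplayed arms are tried first.
            if counts[c][a] == 0:
                return float("inf")
            avg = totals[c][a] / counts[c][a]
            return avg + math.sqrt(2 * math.log(t) / counts[c][a])

        a = max(range(K), key=index)
        regret += max(means[c]) - means[c][a]
        # Cross-learning: a single play of action a reveals its (noisy)
        # reward under every context c2, not just the realized context c.
        for c2 in range(C):
            r = means[c2][a] + rng.gauss(0, noise)
            counts[c2][a] += 1
            totals[c2][a] += r
    return regret

# Hypothetical instance: two contexts whose best actions differ.
means = [[0.2, 0.8], [0.9, 0.1]]
print(ucb_cross_learning(means, 2000))
```

Note how the update loop over `c2` is the only change from ordinary per-context UCB1: without it, each (context, action) pair would have to be explored separately, which is where the √C factor in the classical bound comes from.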

This paper was published in DSpace@MIT.
