Incorporating Behavioral Constraints in Online AI Systems

Balakrishnan, Avinash; Bouneffouf, Djallel; Mattei, Nicholas; Rossi, Francesca

oai:ojs.aaai.org:article/3762

Authors: Avinash Balakrishnan, Djallel Bouneffouf, Nicholas Mattei, Francesca Rossi
Publication date: 17 July 2019
Publisher: Association for the Advancement of Artificial Intelligence

Abstract

AI systems that learn through reward feedback about the actions they take are increasingly deployed in domains that have a significant impact on our daily lives. However, in many cases the online rewards should not be the only guiding criterion, as there are additional constraints and/or priorities imposed by regulations, values, preferences, or ethical principles. We detail a novel online agent that learns a set of behavioral constraints by observation and uses these learned constraints as a guide when making decisions in an online setting while still being reactive to reward feedback. To define this agent, we propose a novel extension to the classical contextual multi-armed bandit setting and we provide a new algorithm called Behavior Constrained Thompson Sampling (BCTS) that allows for online learning while obeying exogenous constraints. Our agent learns a constrained policy that implements the observed behavioral constraints demonstrated by a teacher agent, and then uses this constrained policy to guide the reward-based online exploration and exploitation. We characterize the upper bound on the expected regret of the contextual bandit algorithm that underlies our agent and provide a case study with real-world data in two application domains. Our experiments show that the designed agent is able to act within the set of behavioral constraints without significantly degrading its overall reward performance.
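
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' BCTS algorithm. It uses standard linear Thompson sampling for contextual bandits and assumes a simple way of combining a teacher-derived policy with the online one: a fixed weight `lam` that blends the two scores. The class `LinearThompsonArm`, the function `choose_arm`, the parameter `lam`, and the exploration scale `v` are all hypothetical names introduced here for illustration; the abstract does not specify how the constrained policy guides the online agent.

```python
import numpy as np


class LinearThompsonArm:
    """Bayesian linear-regression posterior for one arm (standard linear TS)."""

    def __init__(self, dim, v=0.5):
        self.B = np.eye(dim)    # regularized Gram matrix of observed contexts
        self.f = np.zeros(dim)  # sum of reward-weighted contexts
        self.v = v              # exploration scale (hypothetical default)

    def mean(self):
        # Posterior mean of the arm's weight vector: B^{-1} f
        return np.linalg.solve(self.B, self.f)

    def sample(self, rng):
        # Draw a weight vector from N(mean, v^2 * B^{-1})
        cov = self.v ** 2 * np.linalg.inv(self.B)
        return rng.multivariate_normal(self.mean(), cov)

    def update(self, context, reward):
        self.B += np.outer(context, context)
        self.f += reward * context


def choose_arm(context, online_arms, teacher_arms, lam, rng):
    """Blend online and teacher scores (assumed rule, not taken from the paper).

    lam = 1.0 ignores the teacher; lam = 0.0 follows the teacher's
    posterior means only.
    """
    scores = []
    for online, teacher in zip(online_arms, teacher_arms):
        online_score = online.sample(rng) @ context
        teacher_score = teacher.mean() @ context
        scores.append(lam * online_score + (1.0 - lam) * teacher_score)
    return int(np.argmax(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, n_arms = 2, 2
    teacher_arms = [LinearThompsonArm(dim) for _ in range(n_arms)]
    online_arms = [LinearThompsonArm(dim) for _ in range(n_arms)]
    x = np.array([1.0, 0.0])
    # Pretend the teacher demonstrated that arm 1 is acceptable in context x.
    for _ in range(5):
        teacher_arms[1].update(x, 1.0)
    arm = choose_arm(x, online_arms, teacher_arms, lam=0.5, rng=rng)
    online_arms[arm].update(x, reward=1.0)  # reward here is made up
```

In this sketch the teacher arms would be fitted first from demonstrations, and the online arms are then updated from reward feedback while `lam` controls how strongly the learned behavior steers each decision.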

Last time updated on 30/11/2020

This paper was published in Association for the Advancement of Artificial Intelligence: AAAI Publications.
