A policy search method for temporal logic specified reinforcement learning tasks

Li, Xiao; Ma, Yao; Belta, Calin

oai:open.bu.edu:2144/29603

Authors: Xiao Li, Yao Ma, Calin Belta
Publication date: 1 January 2017

Abstract

Reward engineering is an important aspect of reinforcement learning. Whether or not the users’ intentions are correctly encapsulated in the reward function can significantly impact the learning outcome. Current methods rely on manually crafted reward functions that often require parameter tuning to obtain the desired behavior. This tuning can be expensive when exploration requires the system to interact with the physical world. In this paper, we explore the use of temporal logic (TL) to specify tasks in reinforcement learning. A TL formula can be translated to a real-valued function that measures its level of satisfaction against a trajectory. We take advantage of this function and propose temporal logic policy search (TLPS), a model-free learning technique that finds a policy that satisfies the TL specification. A set of simulated experiments is conducted to evaluate the proposed approach.
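To illustrate the idea of translating a TL formula into a real-valued satisfaction measure, the following is a minimal sketch and not the authors' formulation. It assumes the min/max quantitative semantics commonly used for temporal logics over finite trajectories. The abstract does not define the particular logic used, so the operator set, the helper names (`predicate`, `eventually`, `always`, and so on), and the example trajectories are all hypothetical. A positive value means the trajectory satisfies the formula, a negative value means it violates it, and the magnitude indicates the margin.

```python
from typing import Callable, Sequence

State = Sequence[float]
Trajectory = Sequence[State]
# Hypothetical type: maps a finite trajectory to a real-valued satisfaction score.
Robustness = Callable[[Trajectory], float]


def predicate(f: Callable[[State], float]) -> Robustness:
    """Atomic predicate f(s) > 0, evaluated at the first state of the trajectory."""
    return lambda traj: f(traj[0])


def negation(phi: Robustness) -> Robustness:
    """Negation flips the sign of the score."""
    return lambda traj: -phi(traj)


def conjunction(phi: Robustness, psi: Robustness) -> Robustness:
    """Both must hold, so the weaker of the two scores decides."""
    return lambda traj: min(phi(traj), psi(traj))


def disjunction(phi: Robustness, psi: Robustness) -> Robustness:
    """Either may hold, so the stronger of the two scores decides."""
    return lambda traj: max(phi(traj), psi(traj))


def eventually(phi: Robustness) -> Robustness:
    """Best score of phi over all suffixes of the trajectory."""
    return lambda traj: max(phi(traj[t:]) for t in range(len(traj)))


def always(phi: Robustness) -> Robustness:
    """Worst score of phi over all suffixes of the trajectory."""
    return lambda traj: min(phi(traj[t:]) for t in range(len(traj)))


# Assumed example task: eventually reach x > 1.0 while always keeping x < 1.75.
reach = eventually(predicate(lambda s: s[0] - 1.0))
safe = always(predicate(lambda s: 1.75 - s[0]))
spec = conjunction(reach, safe)

good = [[0.0], [0.5], [1.25], [1.5]]  # reaches the goal and stays below the limit
bad = [[0.0], [0.5], [0.75]]          # never reaches the goal

print(spec(good))  # positive: specification satisfied
print(spec(bad))   # negative: specification violated
```

A score of this kind could serve as the objective that a policy search method maximizes over sampled trajectories. How TLPS itself uses the function is described in the paper, not in this sketch.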

Last time updated on 09/07/2019

This paper was published in Boston University Institutional Repository (OpenBU).
