
Beyond the one-step greedy approach in reinforcement learning

Abstract

The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g., n-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has to our knowledge not been carefully analyzed yet. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms fit well into our unified framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study.
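To make the abstract's central idea concrete, the following is a minimal sketch of h-step (multi-step) greedy policy improvement in a tabular MDP with known dynamics. It is an illustration under our own assumptions, not the paper's implementation: the array shapes, function names, and the backward-induction lookahead solver are all hypothetical choices. With h = 1 it reduces to the classic one-step greedy step of Policy Iteration; with h > 1 each action is chosen to maximize the optimal h-horizon return, using the current value estimate as the terminal value.

```python
import numpy as np

def h_step_greedy_policy(P, R, V, gamma, h):
    """Return the h-step greedy policy w.r.t. the value estimate V.

    P: (A, S, S) transition probabilities, R: (A, S) rewards,
    V: (S,) current value estimate, gamma: discount, h: lookahead depth.
    The inner h-horizon control problem is solved by backward induction
    (finite-horizon value iteration) with V as the terminal value.
    """
    W = V.copy()                       # value at the lookahead horizon
    for _ in range(h - 1):             # backward induction for steps 2..h
        W = np.max(R + gamma * P @ W, axis=0)
    Q = R + gamma * P @ W              # first-step action values
    return np.argmax(Q, axis=0)        # (S,) greedy action per state

def policy_evaluation(P, R, pi, gamma):
    """Exact evaluation of a deterministic policy pi: V = (I - gamma*P_pi)^-1 R_pi."""
    S = P.shape[1]
    P_pi = P[pi, np.arange(S)]         # (S, S) transitions under pi
    R_pi = R[pi, np.arange(S)]         # (S,)  rewards under pi
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)

def h_step_policy_iteration(P, R, gamma, h, iters=100):
    """Policy Iteration with the one-step greedy step replaced by h-step greedy."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        pi = h_step_greedy_policy(P, R, V, gamma, h)
        V = policy_evaluation(P, R, pi, gamma)
    return pi, V
```

In this sketch a larger h trades extra per-iteration lookahead computation for improvement steps that look further than one transition ahead, which is the trade-off the paper's analysis formalizes.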

This paper was published in HAL-Rennes 1.
