A Graph-Based Reinforcement Learning Method with Converged State Exploration and Exploitation

Li, Han; Chen, Tianding; Teng, Hualiang; Jiang, Yingtao

Repository landing page

research

oai:digitalscholarship.unlv.edu:fac_articles-1582

A Graph-Based Reinforcement Learning Method with Converged State Exploration and Exploitation

Authors: Han Li
Tianding Chen
Hualiang Teng
Yingtao Jiang
Publication date: 1 January 2019
Publisher: Digital Scholarship@UNLV
Doi

Abstract

In any classical value-based reinforcement learning method, an agent, despite of its continuous interactions with the environment, is yet unable to quickly generate a complete and independent description of the entire environment, leaving the learning method to struggle with a difficult dilemma of choosing between the two tasks, namely exploration and exploitation. This problem becomes more pronounced when the agent has to deal with a dynamic environment, of which the configuration and/or parameters are constantly changing. In this paper, this problem is approached by first mapping a reinforcement learning scheme to a directed graph, and the set that contains all the states already explored shall continue to be exploited in the context of such a graph. We have proved that the two tasks of exploration and exploitation eventually converge in the decision-making process, and thus, there is no need to face the exploration vs. exploitation tradeoff as all the existing reinforcement learning methods do. Rather this observation indicates that a reinforcement learning scheme is essentially the same as searching for the shortest path in a dynamic environment, which is readily tackled by a modified Floyd-Warshall algorithm as proposed in the paper. The experimental results have confirmed that the proposed graph-based reinforcement learning algorithm has significantly higher performance than both standard Q-learning algorithm and improved Q-learning algorithm in solving mazes, rendering it an algorithm of choice in applications involving dynamic environments

Similar works

Full text

Open in the Core reader

Download PDF

University of Nevada, Las Vegas Repository

oai:digitalscholarship.unlv.ed...

Last time updated on 09/07/2019

This paper was published in University of Nevada, Las Vegas Repository.

Having an issue?

Is data on this page outdated, violates copyrights or anything else? Report the problem now and we will take corresponding actions after reviewing your request.