Playing FrozenLake with a simple RL Agent (Q-Table)

This is a simple RL agent playing FrozenLake in the OpenAI Gym. A Q-Table is useful when the state space and action space are small. For realistic, complex problems, you should consider using a Q-Network instead of a Q-Table. I'll cover a Q-Network based RL agent in the next post.

Requirements:

  • gym - Reinforcement Learning toolkit
  • numpy - scientific computing library with Python
  • matplotlib - Python 2D plotting library

Python Script

First, we need to register our game environment. We disable the 'is_slippery' option this time, which makes the environment deterministic, so the agent's action in a given state produces the same result every time.

import gym
import numpy as np
import matplotlib.pyplot as plt
from gym.envs.registration import register

register(
	id='FrozenLake-v3',
	entry_point='gym.envs.toy_text:FrozenLakeEnv',
	kwargs={'map_name': '4x4', 'is_slippery': False}
)
env = gym.make('FrozenLake-v3')

Next, initialize the Q-Table with all zeros. Our Q-Table has 16 x 4 entries (size of the map * number of actions). The discount factor is used to decay future rewards. You can adjust the number of episodes to train more or less.

# Initialize table with all zeros
Q = np.zeros([env.observation_space.n, env.action_space.n])
# Set learning parameters
dis = .99 # discount factor
num_episodes = 2000

Here is the main script of the RL agent. E-greedy is used as the exploration & exploitation strategy: with probability e, the agent chooses a random action to explore; otherwise it chooses the action that maximizes the expected reward, i.e. it exploits.

# create lists to contain total rewards and steps per episode
rList = []
for i in range(num_episodes):
	# E-greedy exploration probability, decaying as training progresses
	e = 1. / ((i // 100) + 1)

	# Reset environment and get first new observation
	state = env.reset()
	rAll = 0
	done = False

	# The Q-Table learning algorithm
	while not done:
		# Choose an action by E-greedy
		if np.random.rand(1) < e:
			action = env.action_space.sample()
		else:
			action = np.argmax(Q[state, :])

		"""
		# Using random noise
		action = np.argmax(Q[state,:] + np.random.randn(1,env.action_space.n) / (i+1))
		"""

		# Get new state and reward from environment
		new_state, reward, done, _ = env.step(action)

		# Update Q-Table with new knowledge (no learning rate is needed in a deterministic environment)
		Q[state, action] = reward + dis * np.max(Q[new_state, :])

		rAll += reward
		state = new_state

	rList.append(rAll)
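
After training, you can check how often the agent reached the goal and look at the learned Q-Table. Here is a minimal sketch using the rList collected above and the matplotlib import from the top of the script; a bar chart of the per-episode reward is one simple option:

# Fraction of episodes that ended with reward 1 (the agent reached the goal)
print("Success rate: " + str(sum(rList) / num_episodes))
print("Final Q-Table Values")
print(Q)

# Reward of each episode: 1 = reached the goal, 0 = fell into a hole
plt.bar(range(len(rList)), rList, color="blue")
plt.show()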

Now, enjoy FrozenLake with your RL Agent!
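
If you want to watch the trained agent, you can run one more episode that always takes the greedy action from the learned Q-Table and renders each step. This is just a quick sketch, not part of the script above:

# Play one more episode, always taking the greedy action from the learned Q-Table
state = env.reset()
env.render()
for _ in range(100):  # step limit in case the learned policy never reaches the goal
	action = np.argmax(Q[state, :])
	state, reward, done, _ = env.step(action)
	env.render()
	if done:
		break
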
Full code is on Github

  1. Playing FrozenLake with simple RL Agent (Q-Table) (You're here)
  2. Playing FrozenLake with simple RL Agent (Q-Network) with Tensorflow
Written on March 21, 2017