# Playing FrozenLake with a simple RL Agent (Q-Table)

This is a simple RL agent playing FrozenLake in the OpenAI gym. A Q-Table is useful when the state space and action space are small. For realistic, complex problems, you should consider using a Q-Network instead of a Q-Table. I’ll cover a Q-Network based RL agent in the next post.

### Requirements

- gym - Reinforcement Learning toolkit
- numpy - scientific computing library for Python
- matplotlib - Python 2D plotting library

### Python Script

First, we need to register our game environment. We disable the ‘is_slippery’ option this time. This makes our environment deterministic, so the agent’s action in a given state produces the same result every time.

```python
import gym
import numpy as np
import matplotlib.pyplot as plt
from gym.envs.registration import register

register(
    id='FrozenLake-v3',
    entry_point='gym.envs.toy_text:FrozenLakeEnv',
    kwargs={'map_name': '4x4', 'is_slippery': False}
)
env = gym.make('FrozenLake-v3')
```

Next, initialize the Q-Table with all zeros. Our Q-Table has 16 x 4 entries (size of the map x number of actions). The discount factor is used to decay future rewards. You can adjust the number of episodes to train more or less.

```python
# Initialize table with all zeros
Q = np.zeros([env.observation_space.n, env.action_space.n])
# Set learning parameters
dis = .99  # discount factor
num_episodes = 2000
```
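As a quick sanity check on these parameters, the sketch below (plain NumPy, no gym required) shows the resulting table shape for the 4x4 map and how the discount factor shrinks the value of a reward received k steps in the future:

```python
import numpy as np

# 4x4 FrozenLake: 16 states, 4 actions
Q = np.zeros([16, 4])
print(Q.shape)  # -> (16, 4)

dis = .99
# A reward of 1 received k steps from now is worth dis**k today
for k in [1, 5, 10]:
    print(k, round(dis ** k, 4))  # 0.99, 0.951, 0.9044
```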

Here is the main script of the RL agent. E-greedy is used as the exploration/exploitation strategy. With probability e, the agent chooses a random action (exploration); otherwise it chooses the action that maximizes the expected reward (exploitation).
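The epsilon used below decays stepwise as training progresses, so the agent explores a lot early on and exploits more later. A small sketch of the schedule, evaluated at a few sample episode indices:

```python
# Epsilon drops every 100 episodes: e = 1 / (i // 100 + 1)
for i in [0, 99, 100, 500, 1900]:
    e = 1. / ((i // 100) + 1)
    print(i, e)  # 1.0, 1.0, 0.5, ~0.167, 0.05
```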

```python
# Create a list to contain total rewards per episode
rList = []
for i in range(num_episodes):
    # Probability of E-greedy (decays as training progresses)
    e = 1. / ((i // 100) + 1)

    # Reset environment and get first new observation
    state = env.reset()
    rAll = 0
    done = False

    # The Q-Table learning algorithm
    while not done:
        # Choose an action by E-greedy
        if np.random.rand(1) < e:
            action = env.action_space.sample()
        else:
            action = np.argmax(Q[state, :])

        """
        # Alternative: exploration with random noise
        action = np.argmax(Q[state, :] + np.random.randn(1, env.action_space.n) / (i + 1))
        """

        # Get new state and reward from environment
        new_state, reward, done, _ = env.step(action)

        # Update Q-Table with new knowledge; the environment is
        # deterministic, so no learning rate is needed
        Q[state, action] = reward + dis * np.max(Q[new_state, :])

        rAll += reward
        state = new_state

    rList.append(rAll)
```
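FrozenLake returns a reward of 1.0 only when the agent reaches the goal, so the mean of rList is the success rate over all episodes. A minimal sketch with a toy stand-in list (the real rList is filled by the training loop above); you could also plot rList with matplotlib from the Requirements:

```python
# Toy stand-in for the rList produced by the training loop above
rList = [0.0, 0.0, 1.0, 1.0, 1.0]

# Each entry is 1.0 on success, 0.0 otherwise, so the mean is the success rate
success_rate = sum(rList) / len(rList)
print("Success rate:", success_rate)  # -> 0.6
```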

Now, enjoy FrozenLake with your RL agent!
The full code is on GitHub.

1. Playing FrozenLake with simple RL Agent (Q-Table) (you’re here)
2. Playing FrozenLake with simple RL Agent (Q-Network) with Tensorflow
Written on March 21, 2017