
FrozenLake: Revise the unattainable reward_threshold to an attainable value #2205

Merged
merged 1 commit into openai:master on Mar 31, 2021
Conversation

ZhiqingXiao
Contributor


**Issue:** The current `reward_threshold` for `FrozenLake-v0` and `FrozenLake8x8-v0` is too high to be attained.

Commit: df515de   @joschu  

**Solution:** Reduce the `reward_threshold` values so that they are attainable.
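
For context (this sketch is not the PR diff, and the lowered value shown is only a placeholder), the change amounts to editing the `register(...)` call for these environments in `gym/envs/__init__.py`, roughly:

```python
# Illustrative only: the real registration lives in gym/envs/__init__.py, and the
# exact new threshold comes from the merged commit, not from this sketch.
from gym.envs.registration import register

register(
    id='FrozenLake-v0',
    entry_point='gym.envs.toy_text:FrozenLakeEnv',
    kwargs={'map_name': '4x4'},
    max_episode_steps=100,
    reward_threshold=0.70,  # placeholder: any value below the ~0.74 optimum computed below
)
```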

**Reference:** Code to compute the theoretically optimal expected returns:

```python
import gym
import numpy as np

env = gym.make('FrozenLake-v0')
print(env.observation_space.n)    # 16
print(env.action_space.n)         # 4
print(env.spec.reward_threshold)  # 0.78, should be smaller
print(env.spec.max_episode_steps) # 100

# Finite-horizon backward induction over the 100-step episode limit.
# v[t, s]    : optimal expected return from state s at time step t
# q[t, s, a] : optimal expected return for taking action a in state s at time step t
v = np.zeros((101, 16), dtype=float)
q = np.zeros((101, 16, 4), dtype=float)
pi = np.zeros((101, 16), dtype=float)
for t in range(99, -1, -1):  # backward in time
    for s in range(16):
        for a in range(4):
            for p, next_s, r, d in env.P[s][a]:  # (probability, next state, reward, done)
                q[t, s, a] += p * (r + (1. - float(d)) * v[t + 1, next_s])
        v[t, s] = q[t, s].max()
        pi[t, s] = q[t, s].argmax()
print(v[0, 0])  # ~0.74 < 0.78
```

```python
import gym
import numpy as np

env = gym.make('FrozenLake8x8-v0')
print(env.observation_space.n)    # 64
print(env.action_space.n)         # 4
print(env.spec.reward_threshold)  # 0.99, should be smaller
print(env.spec.max_episode_steps) # 200

# Same backward induction for the 8x8 map with a 200-step horizon.
v = np.zeros((201, 64), dtype=float)
q = np.zeros((201, 64, 4), dtype=float)
pi = np.zeros((201, 64), dtype=float)
for t in range(199, -1, -1):  # backward in time
    for s in range(64):
        for a in range(4):
            for p, next_s, r, d in env.P[s][a]:
                q[t, s, a] += p * (r + (1. - float(d)) * v[t + 1, next_s])
        v[t, s] = q[t, s].max()
        pi[t, s] = q[t, s].argmax()
print(v[0, 0])  # ~0.91 < 0.99
```
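
As a quick sanity check (not part of the original PR), one can roll out the greedy time-dependent policy `pi` recovered above and confirm that the empirical mean return approaches `v[0, 0]`, well below the old threshold:

```python
# Monte Carlo check of the tabulated optimal policy (assumes `env` and `pi`
# from the FrozenLake8x8-v0 snippet above are still in scope; uses the
# classic gym step/reset API of that era).
returns = []
for _ in range(10000):
    s = env.reset()
    total = 0.
    for t in range(env.spec.max_episode_steps):
        s, r, done, _ = env.step(int(pi[t, s]))
        total += r
        if done:
            break
    returns.append(total)
print(np.mean(returns))  # should approach v[0, 0] (~0.91 for the 8x8 map)
```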
@ZhiqingXiao changed the title from "Revise the unattainable reward_threshold to an attainable value" to "FrozenLake: Revise the unattainable reward_threshold to an attainable value" on Mar 28, 2021
@joschu
Contributor

joschu commented Mar 31, 2021

Thanks, will merge

@joschu merged commit 151ba40 into openai:master on Mar 31, 2021
@ZhiqingXiao
Contributor Author

Thanks for confirming and merging :)

copybara-service bot pushed a commit to tensorflow/agents that referenced this pull request Aug 17, 2021
…st. These changes need to go together because the build has been broken for both reasons over the past 12-24 hours.

- Pillow was likely (I have not looked into it yet) being installed as a dependency of another required package; it is no longer a dependency of that package, which is why the failure began.

- OpenAI Gym no longer loads FrozenLake-v0, and FrozenLake-v1 is new and might not be available to all users. Moving the kwargs test to use KellyCoinflip resolves the problem. Here is what happened with FrozenLake-v0: openai/gym#2205 and openai/gym#2315

PiperOrigin-RevId: 391328529
Change-Id: I09e6e962a32330b24b1fc44ebe222fb1d842d5c3
zlig pushed a commit to zlig/gym that referenced this pull request Sep 6, 2021
…ai#2205)

@Arcify mentioned this pull request on Feb 17, 2022