MountainCarContinuous v0

Calvin Hass edited this page Oct 29, 2017 · 10 revisions

Overview

Details

Description

An underpowered car must climb a one-dimensional hill to reach a target. Unlike MountainCar v0, the action (engine force applied) is allowed to be a continuous value.

The target is on top of a hill on the right-hand side of the car. If the car reaches it or goes beyond, the episode terminates.

On the left-hand side, there is another hill. Climbing this hill can be used to gain potential energy and accelerate towards the target. On top of this second hill, the car cannot go further than a position equal to -1.2 (the minimum position listed under Observation), as if there were a wall. Hitting this limit does not generate a penalty (it might in a more challenging version).

Source

This environment corresponds to the continuous version of the mountain car environment described in Andrew Moore's PhD thesis (apart from the reward function).

Such a continuous version has been used in several research papers, e.g.:
http://image.diku.dk/igel/paper/VMRLMAttNMCP.pdf

Recently, it has been used to compare DDPG to CMA-ES in this paper:
http://arxiv.org/abs/1606.09152

Environment

Observation

Type: Box(2)

| Num | Observation  | Min   | Max  |
|-----|--------------|-------|------|
| 0   | Car Position | -1.2  | 0.6  |
| 1   | Car Velocity | -0.07 | 0.07 |

Note that velocity has been constrained to facilitate exploration, but this constraint might be relaxed in a more challenging version.
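As a minimal sketch of how the observation bounds above could be enforced, the snippet below clips a (position, velocity) pair to the listed ranges. The names `POSITION_BOUNDS`, `VELOCITY_BOUNDS`, `clip`, and `clip_observation` are hypothetical and chosen for illustration only; they are not taken from the environment's source.

```python
# Bounds copied from the observation table; the constant names are assumptions.
POSITION_BOUNDS = (-1.2, 0.6)
VELOCITY_BOUNDS = (-0.07, 0.07)


def clip(value, low, high):
    """Return value limited to the closed interval [low, high]."""
    return max(low, min(high, value))


def clip_observation(position, velocity):
    """Clip a (position, velocity) pair to the documented observation ranges."""
    return (
        clip(position, *POSITION_BOUNDS),
        clip(velocity, *VELOCITY_BOUNDS),
    )
```

Relaxing the velocity constraint, as suggested above for a more challenging version, would amount to widening `VELOCITY_BOUNDS` in this sketch.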

Actions

Type: Box(1)

| Num | Action |
|-----|--------|
| 0   | Push car to the left (negative value) or to the right (positive value) |

Reward

The reward is 100 for reaching the target on the hill on the right-hand side, minus the sum of the squared actions taken from start to goal.

This reward function raises an exploration challenge: if the agent does not reach the target soon enough, it will learn that it is better not to move at all, and will then never find the target.

Note that this reward is unusual with respect to most published work, where the goal was to reach the target as fast as possible, hence favouring a bang-bang strategy.
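To make the reward described above concrete, here is a small sketch that computes the total return of one episode from its list of actions. The function name `episode_return` and the parameters `goal_bonus` and `action_cost` are hypothetical. `action_cost` defaults to 1.0 to match the description on this page; an actual implementation may scale the action penalty differently.

```python
def episode_return(actions, reached_goal, goal_bonus=100.0, action_cost=1.0):
    """Total return of one episode.

    actions      -- sequence of scalar engine forces applied during the episode
    reached_goal -- True if the car reached the target
    goal_bonus   -- reward for reaching the target (100 per the description)
    action_cost  -- assumed scaling of the squared-action penalty
    """
    effort = sum(a * a for a in actions)  # sum of squared actions
    bonus = goal_bonus if reached_goal else 0.0
    return bonus - action_cost * effort
```

Under these assumptions, an agent that never moves gets a return of 0, while an agent that pushes but fails to reach the target gets a negative return. This is why a policy that has not yet found the target may settle on doing nothing.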

Starting State

Position between -0.6 and -0.4, zero velocity.
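A starting state of this kind could be drawn as in the sketch below. Uniform sampling over the interval is an assumption, since this page only gives the range. The function name `sample_start_state` and its `rng` parameter are hypothetical.

```python
import random


def sample_start_state(rng=random):
    """Return (position, velocity) for the start of an episode.

    Position is drawn from [-0.6, -0.4] (uniformly, by assumption);
    velocity always starts at zero.
    """
    position = rng.uniform(-0.6, -0.4)
    velocity = 0.0
    return position, velocity
```

Passing a seeded `random.Random` instance as `rng` makes the starting positions reproducible across runs.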

Episode Termination

Position greater than or equal to 0.5, i.e. the car has reached the target or gone beyond it. A constraint on velocity might be added in a more challenging version.

Adding a maximum number of steps might be a good idea.
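The termination rule, together with the optional step limit suggested above, can be sketched as follows. The name `is_done` and the `max_steps` parameter are hypothetical. No step limit is part of the description on this page, so `max_steps` defaults to `None` (no limit).

```python
def is_done(position, step_count, goal_position=0.5, max_steps=None):
    """Return True if the episode should end.

    The episode ends when the car reaches or passes the target position.
    If max_steps is given (an assumed extension), it also ends once
    step_count reaches that limit.
    """
    if position >= goal_position:
        return True
    if max_steps is not None and step_count >= max_steps:
        return True
    return False
```

With a step limit in place, an agent that never finds the target still produces episodes of bounded length, which keeps training loops from running indefinitely.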

Solved Requirements

Get a reward over 90. This value might be tuned.