Minimalistic Gridworld Environment (MiniGrid)
There are other gridworld Gym environments out there, but this one is designed to be particularly simple, lightweight and fast. The code has very few dependencies, making it less likely to break or fail to install. It loads no external sprites/textures, and it can run at up to 6000 FPS on a quad-core i7 laptop, which means you can run your experiments faster. Batteries are included: a known-working RL implementation is supplied in this repository to help you get started.
Requirements:
Please use this bibtex if you want to cite this repository in your publications:
@misc{gym_minigrid,
  author = {Chevalier-Boisvert, Maxime and Willems, Lucas},
  title = {Minimalistic Gridworld Environment for OpenAI Gym},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/maximecb/gym-minigrid}},
}
This environment was built as part of work done at MILA.
Clone this repository and install the dependencies with pip3:
git clone https://github.com/maximecb/gym-minigrid.git
cd gym-minigrid
pip3 install -e .
To run the standalone UI application, which allows you to manually control the agent with the arrow keys:
./standalone.py
The environment to run can be selected with the --env-name option, e.g.:
./standalone.py --env-name MiniGrid-Empty-8x8-v0
If you want to train an agent with reinforcement learning, I recommend using the code found in the pytorch-a2c-ppo repository. This code has been tested and is known to work with this environment. The default hyper-parameters are also known to converge.
A sample training command is:
cd pytorch-a2c-ppo
python3 -m scripts.train --env MiniGrid-Empty-8x8-v0 --algo ppo
MiniGrid is built to support tasks involving natural language and sparse rewards.
The observations are dictionaries with three fields: an 'image' field containing
a partially observable view of the environment, a 'mission' field containing a
textual string that describes the objective the agent must reach to get a reward,
and a 'direction' field that can be used as an optional compass. Using dictionaries
makes it easy to add information to observations if you need to, without having
to force everything into a single tensor.
If your RL code expects one single tensor for observations, please take a look at
FlatObsWrapper in
gym_minigrid/wrappers.py.
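As a rough illustration of what flattening a dictionary observation involves, the sketch below packs a mock observation into one flat list. It does not reproduce FlatObsWrapper. The `flatten_obs` helper, the integer 'direction' value, the 4-way one-hot encoding, the character-code mission encoding and `MAX_MISSION_LEN` are all assumptions made for this example.

```python
# Hypothetical sketch: this is not the FlatObsWrapper implementation.
MAX_MISSION_LEN = 32  # assumed fixed length for the encoded mission text
NUM_DIRECTIONS = 4    # assumed: one slot per compass direction


def flatten_obs(obs):
    """Pack a dict observation into one flat list of numbers."""
    # Flatten the nested image (rows x cols x 3 values) in row-major order.
    image = [v for row in obs['image'] for cell in row for v in cell]

    # One-hot encode the direction (assumed to be an integer in 0..3).
    direction = [0] * NUM_DIRECTIONS
    direction[obs['direction']] = 1

    # Encode the mission as character codes, truncated or zero-padded.
    text = obs['mission'][:MAX_MISSION_LEN]
    mission = [ord(c) for c in text] + [0] * (MAX_MISSION_LEN - len(text))

    return image + direction + mission


# Mock observation with a 7x7 view and 3 values per cell.
mock_obs = {
    'image': [[[0, 0, 0] for _ in range(7)] for _ in range(7)],
    'mission': 'go to the red door',
    'direction': 2,
}
flat = flatten_obs(mock_obs)
print(len(flat))  # 147 image values + 4 direction + 32 mission = 183
```

A fixed-length output like this is what most tensor-based RL code expects, at the cost of a crude text encoding.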
The partially observable view of the environment uses a compact and efficient
encoding, with just 3 input values per visible grid cell, 7x7x3 values total.
If you want to obtain an array of RGB pixels instead, see the get_obs_render method in
gym_minigrid/minigrid.py.
Structure of the world:
Actions in the basic environment:
By default, sparse rewards are given for reaching a green goal tile. A
reward of 1 is given for success, and zero for failure. There is also an
environment-specific time step limit for completing the task.
You can define your own reward function by creating a class derived
from MiniGridEnv. Extending the environment with new object types or actions
should be very easy. If you wish to do this, take a look at the
gym_minigrid/minigrid.py source file.
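The subclassing pattern can be sketched without the library itself. `BaseGridEnv` below is a stand-in written only for this example, not MiniGridEnv; the `_reward` hook, the `step_count` and `max_steps` attributes, and the step-based scaling are assumptions, so check gym_minigrid/minigrid.py for the real names before adapting it.

```python
class BaseGridEnv:
    """Stand-in for an environment base class (hypothetical, not MiniGridEnv)."""

    def __init__(self, max_steps):
        self.max_steps = max_steps  # environment-specific time step limit
        self.step_count = 0

    def _reward(self):
        # Default sparse reward: 1 on success.
        return 1.0

    def step(self, reached_goal):
        """Advance one step and return (reward, done)."""
        self.step_count += 1
        if reached_goal:
            return self._reward(), True
        # No reward otherwise; the episode ends once the step limit is hit.
        return 0.0, self.step_count >= self.max_steps


class TimePenaltyEnv(BaseGridEnv):
    """Derived class that overrides the reward function."""

    def _reward(self):
        # Assumed shaping: scale the reward down as more steps are used.
        return 1.0 - 0.5 * (self.step_count / self.max_steps)


env = TimePenaltyEnv(max_steps=10)
for _ in range(4):
    env.step(reached_goal=False)
reward, done = env.step(reached_goal=True)
print(reward, done)  # 0.75 True
```

Keeping the reward in a single overridable method means a derived class changes the incentive without touching the step logic.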
The environments listed below are implemented in the gym_minigrid/envs directory. Each environment provides one or more configurations registered with OpenAI Gym. Each environment is also programmatically tunable in terms of size and complexity, which is useful for curriculum learning or for fine-tuning difficulty.
Registered configurations:
- MiniGrid-Empty-6x6-v0
- MiniGrid-Empty-8x8-v0
- MiniGrid-Empty-16x16-v0

This environment is an empty room, and the goal of the agent is to reach the green goal square, which provides a sparse reward. A small penalty is subtracted for the number of steps taken to reach the goal. With small rooms, this environment is useful for validating that your RL algorithm works correctly; with large rooms, it is useful for experimenting with sparse rewards.
Registered configurations:
- MiniGrid-DoorKey-5x5-v0
- MiniGrid-DoorKey-6x6-v0
- MiniGrid-DoorKey-8x8-v0
- MiniGrid-DoorKey-16x16-v0

This environment has a key that the agent must pick up in order to unlock a door and then get to the green goal square. Because of the sparse reward, this environment is difficult to solve using classical RL algorithms. It is useful for experimenting with curiosity or curriculum learning.
Registered configurations:
- MiniGrid-MultiRoom-N2-S4-v0 (two small rooms)
- MiniGrid-MultiRoom-N6-v0 (six rooms)

This environment has a series of connected rooms with doors that must be opened in order to get to the next room. The final room has the green goal square the agent must get to. This environment is extremely difficult to solve using classical RL. However, by gradually increasing the number of rooms and building a curriculum, the environment can be solved.
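One possible way to schedule such a curriculum is to add a room whenever the agent's recent success rate clears a threshold. The sketch below shows only that scheduling logic; `next_num_rooms`, the 0.8 threshold and the simulated success flags are illustrative assumptions, not part of this repository.

```python
def next_num_rooms(num_rooms, recent_successes, max_rooms=6, threshold=0.8):
    """Return the room count to train on next.

    recent_successes is a list of booleans, one per recent episode.
    """
    if not recent_successes:
        return num_rooms
    success_rate = sum(recent_successes) / len(recent_successes)
    if success_rate >= threshold and num_rooms < max_rooms:
        return num_rooms + 1
    return num_rooms


# Simulated success flags per training stage (made-up data for illustration).
stages = [
    [True] * 9 + [False],      # 90% success: advance
    [True] * 5 + [False] * 5,  # 50% success: stay
    [True] * 8 + [False] * 2,  # 80% success: advance
]
rooms = 2
for episodes in stages:
    rooms = next_num_rooms(rooms, episodes)
print(rooms)  # 4
```

In a real training loop, the returned room count would be used to construct the next environment instance.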
Registered configurations:
- MiniGrid-Fetch-5x5-N2-v0
- MiniGrid-Fetch-6x6-N2-v0
- MiniGrid-Fetch-8x8-N3-v0

This environment has multiple objects of assorted types and colors. The agent receives a textual string as part of its observation telling it which object to pick up. Picking up the wrong object produces a negative reward.
Registered configurations:
- MiniGrid-GoToDoor-5x5-v0
- MiniGrid-GoToDoor-6x6-v0
- MiniGrid-GoToDoor-8x8-v0

This environment is a room with four doors, one on each wall. The agent
receives a textual (mission) string as input, telling it which door to go to
(e.g. "go to the red door"). It receives a positive reward for performing the
wait action next to the correct door, as indicated in the mission string.
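Because the mission is plain text, a debugging script or a hand-written baseline may want to recover the target from it. The sketch below assumes every mission follows the template of the example above ("go to the <color> door"); the `parse_goto_door` helper is hypothetical, and the real environment may phrase missions differently.

```python
import re

# Assumed mission template, based on the example "go to the red door".
MISSION_PATTERN = re.compile(r'^go to the (\w+) door$')


def parse_goto_door(mission):
    """Return the target door color, or None if the text does not match."""
    match = MISSION_PATTERN.match(mission.strip().lower())
    return match.group(1) if match else None


print(parse_goto_door('go to the red door'))  # red
print(parse_goto_door('open the blue door'))  # None
```

A learned agent would instead feed the mission string to a text encoder; a parser like this is only useful for sanity checks.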
Registered configurations:
- MiniGrid-PutNear-6x6-N2-v0
- MiniGrid-PutNear-8x8-N3-v0

The agent is instructed through a textual string to pick up an object and place it next to another object. This environment is easy to solve with two objects, but difficult to solve with more, as it involves both textual understanding and spatial reasoning involving multiple objects.
Registered configurations:
- MiniGrid-RedBlueDoors-6x6-v0
- MiniGrid-RedBlueDoors-8x8-v0

The agent is randomly placed within a room with one red and one blue door facing opposite directions. The agent has to open the red door and then open the blue door, in that order. The purpose of this environment is to test memory. When facing one door, the agent cannot see the door behind it. Hence, the agent needs to remember whether or not it has previously opened the other door in order to reliably complete the task.
Registered configurations:
- MiniGrid-LockedRoom-v0

The environment has six rooms, one of which is locked. The agent receives a textual mission string as input, telling it which room to go to in order to get the key that opens the locked room. It then has to go into the locked room in order to reach the final goal. This environment is extremely difficult to solve with vanilla reinforcement learning alone.