Minimalistic Gridworld Environment (MiniGrid)

Maxime Chevalier-Boisvert ed662a5597 Implemented mechanism to prevent seeing through walls před 6 roky
figures c8b35cb515 Updated README před 6 roky
gym_minigrid ed662a5597 Implemented mechanism to prevent seeing through walls před 6 roky
pytorch_rl e00764ebf8 Lazily importing visdom před 6 roky
.gitignore 8fc72cda9f Fixed basicrl code před 7 roky
LICENSE 51a5d9079d Initial commit před 7 roky
README.md 3096a3b7f4 Updated README před 6 roky
run_tests.py 25fe4664fa Modified environments so they all produce observations in a dict před 6 roky
setup.py 17eba78e33 Fixed pyqt5 package version před 6 roky
standalone.py c4049b13ed Added Playground-v0 environment for experiments před 6 roky

README.md

Minimalistic Gridworld Environment (MiniGrid)

There are other gridworld Gym environments out there, but this one is designed to be particularly simple, lightweight and fast. The code has very few dependencies, making it less likely to break or fail to install. It loads no external sprites/textures, and it can run at up to 6000 FPS on a quad-core i7 laptop, which means you can run your experiments faster. Batteries are included: a known-working RL implementation is supplied in this repository to help you get started.

Requirements:

  • Python 3
  • OpenAI Gym
  • NumPy
  • PyQT 5 for graphics

This environment has been built at the MILA as part of the Baby AI Game project.

Installation

Clone this repository and install the other dependencies with pip3:

git clone https://github.com/maximecb/gym-minigrid.git
cd gym-minigrid
pip3 install -e .

Optionally, if you wish use the reinforcement learning code included under /pytorch_rl, you should install PyTorch as follows:

# PyTorch
conda install pytorch torchvision -c pytorch

Note: the pytorch_rl code is a custom fork of this repository, which was modified to work with this environment.

Basic Usage

To run the standalone UI application, which allows you to manually control the agent with the arrow keys:

./standalone.py

The environment being run can be selected with the --env-name option, eg:

./standalone.py --env-name MiniGrid-Empty-8x8-v0

Basic reinforcement learning code is provided in the pytorch_rl subdirectory. You can perform training using the A2C algorithm with:

python3 pytorch_rl/main.py --env-name MiniGrid-Empty-6x6-v0 --no-vis --num-processes 48 --algo a2c

You can view the result of training using the enjoy.py script:

python3 pytorch_rl/enjoy.py --env-name MiniGrid-Empty-6x6-v0 --load-dir ./trained_models/a2c

Design

MiniGrid is built to support tasks involving natural language and sparse rewards. The observations are dictionaries, with an 'image' field, partially observable view of the environment, and a 'mission' field which is a textual string describing the objective the agent should reach to get a reward. Using dictionaries makes it easy for you to add additional information to observations if you need to, without having to force everything into a single tensor. If your RL code expects a tensor for observations, please take a look at FlatObsWrapper in gym_minigrid/wrappers.py.

The partially observable view of the environment uses a compact and efficient encoding, with just 3 input values per visible grid cell, 147 values total. If you want to obtain an array of RGB pixels instead, see the getObsRender method in gym_minigrid/minigrid.py.

Structure of the world:

  • The world is an NxM grid of tiles
  • Each tile in the grid world contains zero or one object
    • Cells that do not contain an object have the value None
  • Each object has an associated discrete color (string)
  • Each object has an associated type (string)
    • Provided object types are: wall, door, locked_doors, key, ball, box and goal
  • The agent can pick up and carry exactly one object (eg: ball or key)

Actions in the basic environment:

  • Turn left
  • Turn right
  • Move forward
  • Toggle (pick up or interact with objects)
  • Wait (noop, do nothing)

By default, sparse rewards for reaching a goal square are provided, but you can define your own reward function by creating a class derived from MiniGridEnv. Extending the environment with new object types or action should be very easy very easy. If you wish to do this, you should take a look at the gym_minigrid/minigrid.py source file.

Included Environments

The environments listed below are implemented in the gym_minigrid/envs directory. Each environment provides one or more configurations registered with OpenAI gym. Each environment is also programmatically tunable in terms of size/complexity, which is useful for curriculum learning or to fine-tune difficulty.

Empty environment

Registered configurations:

  • MiniGrid-Empty-6x6-v0
  • MiniGrid-Empty-8x8-v0
  • MiniGrid-Empty-16x16-v0

This environment is an empty room, and the goal of the agent is to reach the green goal square, which provides a sparse reward. A small penalty is subtracted for the number of steps to reach the goal. This environment is useful, with small rooms, to validate that your RL algorithm works correctly, and with large rooms to experiment with sparse rewards.

Door & key environment

Registered configurations:

  • MiniGrid-DoorKey-5x5-v0
  • MiniGrid-DoorKey-6x6-v0
  • MiniGrid-DoorKey-8x8-v0
  • MiniGrid-DoorKey-16x16-v0

This environment has a key that the agent must pick up in order to unlock a goal and then get to the green goal square. This environment is difficult, because of the sparse reward, to solve using classical RL algorithms. It is useful to experiment with curiosity or curriculum learning.

Multi-room environment

Registered configurations:

  • MiniGrid-MultiRoom-N2-S4-v0 (two small rooms)
  • MiniGrid-MultiRoom-N6-v0 (six room)

This environment has a series of connected rooms with doors that must be opened in order to get to the next room. The final room has the green goal square the agent must get to. This environment is extremely difficult to solve using classical RL. However, by gradually increasing the number of rooms and building a curriculum, the environment can be solved.

Fetch environment

Registered configurations:

  • MiniGrid-Fetch-5x5-N2-v0
  • MiniGrid-Fetch-6x6-N2-v0
  • MiniGrid-Fetch-8x8-N3-v0

This environment has multiple objects of assorted types and colors. The agent receives a textual string as part of its observation telling it which object to pick up. Picking up the wrong object produces a negative reward.

Go-to-door environment

Registered configurations:

  • MiniGrid-GoToDoor-5x5-v0
  • MiniGrid-GoToDoor-6x6-v0
  • MiniGrid-GoToDoor-8x8-v0

This environment is a room with four doors, one on each wall. The agent receives a textual (mission) string as input, telling it which door to go to, (eg: "go to the red door"). It receives a positive reward for performing the wait action next to the correct door, as indicated in the mission string.

Put-near environment

Registered configurations:

  • MiniGrid-PutNear-6x6-N2-v0
  • MiniGrid-PutNear-8x8-N3-v0

The agent is instructed through a textual string to pick up an object and place it next to another object. This environment is easy to solve with two objects, but difficult to solve with more, as it involves both textual understanding and spatial reasoning involving multiple objects.

Locked Room Environment

Registed configurations:

  • MiniGrid-LockedRoom-v0

The environment has six rooms, one of which is locked. The agent receives a textual mission string as input, telling it which room to go to in order to get the key that opens the locked room. It then has to go into the locked room in order to reach the final goal. This environment is extremely difficult to solve with vanilla reinforcement learning alone.

Four room question answering environment

Registered configurations:

  • MiniGrid-FourRoomQA-v0

This environment is inspired by the Embodied Question Answering paper. The question are of the form:

Are there any keys in the red room?

There are four colored rooms, and the agent starts at a random position in the grid. Multiple objects of different types and colors are also placed at random positions in random rooms. A question and answer pair is generated, the question is given to the agent as an observation, and the agent has a limited number of time steps to explore the environment and produce a response. This environment can be easily modified to add more question types or to diversify the way the questions are phrased.