# Minimalistic Gridworld Environment (MiniGrid)

There are other gridworld Gym environments out there, but this one is
designed to be particularly simple, lightweight and fast. The code has very few
dependencies, making it less likely to break or fail to install. It loads no
external sprites/textures, and it can run at up to 6000 FPS on a quad-core i7
laptop, which means you can run your experiments faster. Batteries are
included: a known-working RL implementation is supplied in this repository to
help you get started.

Requirements:
- Python 3
- OpenAI Gym
- NumPy
- PyQt 5 for graphics

This environment was built at [MILA](https://mila.quebec/en/) as part of the
[Baby AI Game](https://github.com/maximecb/baby-ai-game) project.

## Installation

Clone this repository and install the other dependencies with `pip3`:

```
git clone https://github.com/maximecb/gym-minigrid.git
cd gym-minigrid
pip3 install -e .
```

Optionally, if you wish to use the reinforcement learning code included under
[/pytorch_rl](/pytorch_rl), you can install its dependencies as follows:

```
cd pytorch_rl

# PyTorch
conda install pytorch torchvision -c soumith

# Other requirements
pip3 install -r requirements.txt
```

Note: the pytorch_rl code is a custom fork of
[this repository](https://github.com/ikostrikov/pytorch-a2c-ppo-acktr),
which was modified to work with this environment.

## Basic Usage

To run the standalone UI application, which allows you to manually control the
agent with the arrow keys:

```
./standalone.py
```

The environment being run can be selected with the `--env-name` option, e.g.:

```
./standalone.py --env-name MiniGrid-Empty-8x8-v0
```

Basic reinforcement learning code is provided in the `pytorch_rl` subdirectory.
You can perform training using the A2C algorithm with:

```
python3 pytorch_rl/main.py --env-name MiniGrid-Empty-6x6-v0 --no-vis --num-processes 48 --algo a2c
```

You can view the result of training using the `enjoy.py` script:

```
python3 pytorch_rl/enjoy.py --env-name MiniGrid-Empty-6x6-v0 --load-dir ./trained_models/a2c
```

## Design

MiniGrid is built to support tasks involving natural language and sparse rewards.
The observations are dictionaries, with an 'image' field containing a partially
observable view of the environment, and a 'mission' field containing a text
string that describes the objective the agent must reach to get a reward.
Using dictionaries makes it easy to add information to observations if you
need to, without having to force everything into a single tensor.
If your RL code expects a tensor for observations, please take a look at
`FlatObsWrapper` in
[gym_minigrid/wrappers.py](/gym_minigrid/wrappers.py).

The partially observable view of the environment uses a compact and efficient
encoding, with just 3 input values per visible grid cell, 147 values total.
If you want to obtain an array of RGB pixels instead, see the `getObsRender` method in
[gym_minigrid/minigrid.py](gym_minigrid/minigrid.py).

Structure of the world:
- The world is an NxM grid of tiles
- Each tile in the grid world contains zero or one object
  - Cells that do not contain an object have the value `None`
- Each object has an associated discrete color (string)
- Each object has an associated type (string)
  - Provided object types are: wall, door, locked_doors, key, ball, box and goal
- The agent can pick up and carry exactly one object (e.g. ball or key)

Actions in the basic environment:
- Turn left
- Turn right
- Move forward
- Toggle (pick up or interact with objects)
- Wait (noop, do nothing)

By default, sparse rewards for reaching a goal square are provided, but you can
define your own reward function by creating a class derived from MiniGridEnv.
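
The observation layout described in the Design section can be sketched in plain Python. This is only an illustration built from stand-in data: it does not call the MiniGrid API and is not how `FlatObsWrapper` is implemented. The 7x7 view size is an assumption inferred from the stated 3 values per cell and 147 values total, and the helper names and mission text are hypothetical.

```python
# Illustrative sketch only: stand-in data, not the real MiniGrid API.
VALUES_PER_CELL = 3  # from the text above: 3 input values per visible cell
VIEW_SIZE = 7        # assumption: a 7x7 agent view, since 7 * 7 * 3 = 147


def make_dummy_observation(mission):
    """Build a dictionary shaped like the observations described above."""
    image = [
        [[0] * VALUES_PER_CELL for _ in range(VIEW_SIZE)]
        for _ in range(VIEW_SIZE)
    ]
    return {"image": image, "mission": mission}


def flatten_image(obs):
    """Flatten the nested 'image' field into a single list of values."""
    return [value for row in obs["image"] for cell in row for value in cell]


# Hypothetical mission string, used only for illustration.
obs = make_dummy_observation("get to the green goal square")
flat = flatten_image(obs)

print(sorted(obs.keys()))  # ['image', 'mission']
print(len(flat))           # 147
```

Keeping the mission as a separate dictionary entry is what lets the text objective travel alongside the grid view without being packed into the same array.
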
Extending the environment with new object types or actions should be very easy.
If you wish to do this, you should take a look at the
[gym_minigrid/minigrid.py](gym_minigrid/minigrid.py) source file.

## Included Environments

The environments listed below are implemented in the [gym_minigrid/envs](/gym_minigrid/envs) directory.
Each environment provides one or more configurations registered with OpenAI Gym. Each environment
is also programmatically tunable in terms of size/complexity, which is useful for curriculum learning
or to fine-tune difficulty.

### Empty environment

Registered configurations:
- `MiniGrid-Empty-6x6-v0`
- `MiniGrid-Empty-8x8-v0`
- `MiniGrid-Empty-16x16-v0`
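
Since the registered names above differ only in grid size, a simple curriculum can be expressed as an ordered list of environment IDs. The sketch below is a hedged illustration: `empty_env_id` and `curriculum` are hypothetical names, and only the three sizes listed above are assumed to be registered.

```python
# Sizes taken from the registered configurations listed above.
EMPTY_SIZES = [6, 8, 16]


def empty_env_id(size):
    """Format the ID of an Empty configuration for a square grid (hypothetical helper)."""
    return f"MiniGrid-Empty-{size}x{size}-v0"


# Order the configurations from smallest to largest grid.
curriculum = [empty_env_id(size) for size in sorted(EMPTY_SIZES)]

for env_id in curriculum:
    print(env_id)
```

Each ID in the list can then be passed to the `--env-name` option shown under Basic Usage, moving to the next entry once the agent performs well on the current one.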