@@ -1,6 +1,7 @@
 # Minimalistic Gridworld Environment (MiniGrid)
 
-[](https://travis-ci.org/maximecb/gym-minigrid)
+[](https://pre-commit.com/)
+[](https://github.com/psf/black)
 
 There are other gridworld Gym environments out there, but this one is
 designed to be particularly simple, lightweight and fast. The code has very few
@@ -10,10 +11,10 @@ laptop, which means you can run your experiments faster. A known-working RL
 implementation can be found [in this repository](https://github.com/lcswillems/torch-rl).
 
 Requirements:
-- Python 3.7+
-- OpenAI Gym 0.25
-- NumPy
-- Matplotlib (optional, only needed for display)
+- Python 3.7 to 3.10
+- OpenAI Gym v0.22 to v0.25
+- NumPy 1.18+
+- Matplotlib 3.0+ (optional, only needed for display)
 
 Please use this bibtex if you want to cite this repository in your publications:
 
@@ -29,6 +30,8 @@ Please use this bibtex if you want to cite this repository in your publications:
 ```
 
 List of publications & submissions using MiniGrid or BabyAI (please open a pull request to add missing entries):
+- [Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity](https://arxiv.org/abs/2202.02886) (Arizona State University, ICML 2022)
+- [How to Stay Curious while avoiding Noisy TVs using Aleatoric Uncertainty Estimation](https://proceedings.mlr.press/v162/mavor-parker22a.html) (University College London, Boston University, ICML 2022)
 - [In a Nutshell, the Human Asked for This: Latent Goals for Following Temporal Specifications](https://openreview.net/pdf?id=rUwm9wCjURV) (Imperial College London, ICLR 2022)
 - [Interesting Object, Curious Agent: Learning Task-Agnostic Exploration](https://arxiv.org/abs/2111.13119) (Meta AI Research, NeurIPS 2021)
 - [Safe Policy Optimization with Local Generalized Linear Function Approximations](https://arxiv.org/abs/2111.04894) (IBM Research, Tsinghua University, NeurIPS 2021)
@@ -46,16 +49,13 @@ List of publications & submissions using MiniGrid or BabyAI (please open a pull
 - [RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments](https://openreview.net/forum?id=rkg-TJBFPB) (FAIR, ICLR 2020)
 - [Learning to Request Guidance in Emergent Communication](https://arxiv.org/pdf/1912.05525.pdf) (University of Amsterdam, Dec 2019)
 - [Working Memory Graphs](https://arxiv.org/abs/1911.07141) (MSR, Nov 2019)
-- [Fast Task-Adaptation for Tasks Labeled Using
-Natural Language in Reinforcement Learning](https://arxiv.org/pdf/1910.04040.pdf) (Oct 2019, University of Antwerp)
-- [Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck
-](https://arxiv.org/abs/1910.12911) (MSR, NeurIPS, Oct 2019)
+- [Fast Task-Adaptation for Tasks Labeled Using Natural Language in Reinforcement Learning](https://arxiv.org/pdf/1910.04040.pdf) (Oct 2019, University of Antwerp)
+- [Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck](https://arxiv.org/abs/1910.12911) (MSR, NeurIPS, Oct 2019)
 - [Recurrent Independent Mechanisms](https://arxiv.org/pdf/1909.10893.pdf) (Mila, Sept 2019)
 - [Learning Effective Subgoals with Multi-Task Hierarchical Reinforcement Learning](http://surl.tirl.info/proceedings/SURL-2019_paper_10.pdf) (Tsinghua University, August 2019)
 - [Mastering emergent language: learning to guide in simulated navigation](https://arxiv.org/abs/1908.05135) (University of Amsterdam, Aug 2019)
 - [Transfer Learning by Modeling a Distribution over Policies](https://arxiv.org/abs/1906.03574) (Mila, June 2019)
-- [Reinforcement Learning with Competitive Ensembles
-of Information-Constrained Primitives](https://arxiv.org/abs/1906.10667) (Mila, June 2019)
+- [Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives](https://arxiv.org/abs/1906.10667) (Mila, June 2019)
 - [Learning distant cause and effect using only local and immediate credit assignment](https://arxiv.org/abs/1905.11589) (Incubator 491, May 2019)
 - [Practical Open-Loop Optimistic Planning](https://arxiv.org/abs/1904.04700) (INRIA, April 2019)
 - [Learning World Graphs to Accelerate Hierarchical Reinforcement Learning](https://arxiv.org/abs/1907.00664) (Salesforce Research, 2019)
@@ -93,18 +93,19 @@ pip3 install -e .
 There is a UI application which allows you to manually control the agent with the arrow keys:
 
 ```
-./manual_control.py
+./gym-minigrid/manual_control.py
 ```
 
 The environment being run can be selected with the `--env` option, eg:
 
 ```
-./manual_control.py --env MiniGrid-Empty-8x8-v0
+./gym-minigrid/manual_control.py --env MiniGrid-Empty-8x8-v0
 ```
 
 ## Reinforcement Learning
 
-If you want to train an agent with reinforcement learning, I recommend using the code found in the [torch-rl](https://github.com/lcswillems/torch-rl) repository. This code has been tested and is known to work with this environment. The default hyper-parameters are also known to converge.
+If you want to train an agent with reinforcement learning, I recommend using the code found in the [torch-rl](https://github.com/lcswillems/torch-rl) repository.
+This code has been tested and is known to work with this environment. The default hyper-parameters are also known to converge.
 
 A sample training command is:
 
@@ -123,9 +124,9 @@ field which can be used as an optional compass. Using dictionaries makes it
 easy for you to add additional information to observations
 if you need to, without having to encode everything into a single tensor.
 
-There are a variery of wrappers to change the observation format available in [gym_minigrid/wrappers.py](/gym_minigrid/wrappers.py). If your RL code expects one single tensor for observations, take a look at
-`FlatObsWrapper`. There is also an `ImgObsWrapper` that gets rid of the 'mission' field in observations,
-leaving only the image field tensor.
+There are a variety of wrappers to change the observation format available in [gym_minigrid/wrappers.py](/gym_minigrid/wrappers.py).
+If your RL code expects a single tensor for observations, take a look at `FlatObsWrapper`.
+There is also an `ImgObsWrapper` that gets rid of the 'mission' field in observations, leaving only the image field tensor.
 
 Please note that the default observation format is a partially observable view of the environment using a
 compact and efficient encoding, with 3 input values per visible grid cell, 7x7x3 values total.
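The wrapper paragraph in the hunk above describes turning the dictionary observation into one flat tensor, or keeping only the image. As an illustration only, here is a minimal pure-Python sketch of that idea, assuming a 7x7x3 image and a mission alphabet of lowercase letters plus space. The constants `ALPHABET` and `MAX_STR_LEN` and all helper names are hypothetical; this is not the actual `FlatObsWrapper` or `ImgObsWrapper` code in `gym_minigrid/wrappers.py`.

```python
from typing import Dict, List, Sequence

# Hypothetical constants for this sketch; the real FlatObsWrapper may differ.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # lowercase letters plus space
MAX_STR_LEN = 96  # assumed maximum mission length, in characters


def image_only(obs: Dict) -> Sequence:
    """Drop the 'mission' field and keep only the image (the ImgObsWrapper idea)."""
    return obs["image"]


def flatten_image(image: Sequence[Sequence[Sequence[int]]]) -> List[int]:
    """Flatten a nested width x height x 3 grid into one flat list."""
    return [value for column in image for cell in column for value in cell]


def one_hot_mission(mission: str, max_len: int = MAX_STR_LEN) -> List[int]:
    """One-hot encode each mission character, zero-padded to max_len characters."""
    if len(mission) > max_len:
        raise ValueError(f"mission is longer than {max_len} characters")
    width = len(ALPHABET)
    encoded = [0] * (max_len * width)
    for pos, char in enumerate(mission.lower()):
        # str.index raises ValueError for characters outside ALPHABET.
        encoded[pos * width + ALPHABET.index(char)] = 1
    return encoded


def flat_observation(obs: Dict) -> List[int]:
    """Concatenate the flattened image and the encoded mission (the FlatObsWrapper idea)."""
    return flatten_image(image_only(obs)) + one_hot_mission(obs["mission"])


if __name__ == "__main__":
    # A fake 7x7 partial view where every cell is the triple (1, 0, 0).
    fake_obs = {
        "image": [[[1, 0, 0] for _ in range(7)] for _ in range(7)],
        "mission": "go to the red door",
    }
    flat = flat_observation(fake_obs)
    print(len(flat))  # 7*7*3 + 96*27 = 147 + 2592 = 2739
```

The 7x7x3 view alone contributes 147 values; under these assumed constants the mission encoding dominates the vector length, which is one reason image-only observations are often preferred when language is not needed.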
@@ -164,9 +165,9 @@ Actions in the basic environment:
 - Done (task completed, optional)
 
 Default tile/observation encoding:
-- Each tile is encoded as a 3 dimensional tuple: (OBJECT_IDX, COLOR_IDX, STATE)
-- OBJECT_TO_IDX and COLOR_TO_IDX mapping can be found in [gym_minigrid/minigrid.py](gym_minigrid/minigrid.py)
-- e.g. door STATE -> 0: open, 1: closed, 2: locked
+- Each tile is encoded as a 3-dimensional tuple: `(OBJECT_IDX, COLOR_IDX, STATE)`
+- The `OBJECT_TO_IDX` and `COLOR_TO_IDX` mappings can be found in [gym_minigrid/minigrid.py](gym_minigrid/minigrid.py)
+- `STATE` refers to the door state, with 0=open, 1=closed and 2=locked
 
 By default, sparse rewards are given for reaching a green goal tile. A
 reward of 1 is given for success, and zero for failure. There is also an
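To make the `(OBJECT_IDX, COLOR_IDX, STATE)` tile encoding from the hunk above concrete, here is a small decoding sketch. The index tables are reproduced only so the example is self-contained and should be treated as assumptions to check against `gym_minigrid/minigrid.py` for your installed version; `describe_tile` and `DOOR_STATE` are hypothetical helpers, not part of the library.

```python
from typing import Tuple

# Assumed to match OBJECT_TO_IDX / COLOR_TO_IDX in gym_minigrid/minigrid.py;
# verify against the version you have installed.
OBJECT_TO_IDX = {
    "unseen": 0, "empty": 1, "wall": 2, "floor": 3, "door": 4, "key": 5,
    "ball": 6, "box": 7, "goal": 8, "lava": 9, "agent": 10,
}
COLOR_TO_IDX = {"red": 0, "green": 1, "blue": 2, "purple": 3, "yellow": 4, "grey": 5}
DOOR_STATE = {0: "open", 1: "closed", 2: "locked"}  # as stated in the README

# Reverse lookups: index -> name.
IDX_TO_OBJECT = {idx: name for name, idx in OBJECT_TO_IDX.items()}
IDX_TO_COLOR = {idx: name for name, idx in COLOR_TO_IDX.items()}


def describe_tile(tile: Tuple[int, int, int]) -> str:
    """Turn one (OBJECT_IDX, COLOR_IDX, STATE) triple into readable text."""
    obj_idx, color_idx, state = tile
    obj = IDX_TO_OBJECT[obj_idx]
    if obj in ("unseen", "empty"):
        return obj  # this sketch ignores color and state for these cells
    color = IDX_TO_COLOR[color_idx]
    if obj == "door":
        # STATE is only interpreted for doors, following the README.
        return f"{DOOR_STATE[state]} {color} door"
    return f"{color} {obj}"


if __name__ == "__main__":
    for tile in [(4, 0, 2), (8, 1, 0), (1, 0, 0)]:
        print(tile, "->", describe_tile(tile))
```

Applied cell by cell, the same lookup can be used to print a 7x7 partial view while debugging an agent.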
@@ -185,18 +186,6 @@ or to fine-tune difficulty.
 
 ### Empty environment
 
-Registered configurations:
-- `MiniGrid-Empty-5x5-v0`
-- `MiniGrid-Empty-Random-5x5-v0`
-- `MiniGrid-Empty-6x6-v0`
-- `MiniGrid-Empty-Random-6x6-v0`
-- `MiniGrid-Empty-8x8-v0`
-- `MiniGrid-Empty-16x16-v0`
-
-<p align="center">
-<img src="/figures/empty-env.png" width=250>
-</p>
-
 This environment is an empty room, and the goal of the agent is to reach the
 green goal square, which provides a sparse reward. A small penalty is
 subtracted for the number of steps to reach the goal. This environment is
@@ -206,115 +195,121 @@ The random variants of the environment have the agent starting at a random
 position for each episode, while the regular variants have the agent always
 starting in the corner opposite to the goal.
 
-### Four rooms environment
-
-Registered configurations:
-- `MiniGrid-FourRooms-v0`
-
 <p align="center">
-<img src="/figures/four-rooms-env.png" width=380>
+ <img src="figures/empty-env.png" width=250 alt="Figure of the empty environment">
 </p>
 
+Registered configurations:
+- `MiniGrid-Empty-5x5-v0`
+- `MiniGrid-Empty-Random-5x5-v0`
+- `MiniGrid-Empty-6x6-v0`
+- `MiniGrid-Empty-Random-6x6-v0`
+- `MiniGrid-Empty-8x8-v0`
+- `MiniGrid-Empty-16x16-v0`
+
+### Four rooms environment
+
 Classic four room reinforcement learning environment. The agent must navigate
 in a maze composed of four rooms interconnected by 4 gaps in the walls. To
 obtain a reward, the agent must reach the green goal square. Both the agent
 and the goal square are randomly placed in any of the four rooms.
 
-### Door & key environment
-
-Registered configurations:
-- `MiniGrid-DoorKey-5x5-v0`
-- `MiniGrid-DoorKey-6x6-v0`
-- `MiniGrid-DoorKey-8x8-v0`
-- `MiniGrid-DoorKey-16x16-v0`
-
 <p align="center">
-<img src="/figures/door-key-env.png">
+ <img src="figures/four-rooms-env.png" width=380 alt="Figure of the four room environment">
 </p>
 
+Registered configurations:
+- `MiniGrid-FourRooms-v0`
+
+### Door & key environment
+
 This environment has a key that the agent must pick up in order to unlock
 a goal and then get to the green goal square. This environment is difficult,
 because of the sparse reward, to solve using classical RL algorithms. It is
 useful to experiment with curiosity or curriculum learning.
 
-### Multi-room environment
-
-Registered configurations:
-- `MiniGrid-MultiRoom-N2-S4-v0` (two small rooms)
-- `MiniGrid-MultiRoom-N4-S5-v0` (four rooms)
-- `MiniGrid-MultiRoom-N6-v0` (six rooms)
-
 <p align="center">
-<img src="/figures/multi-room.gif" width=416 height=424>
+ <img src="figures/door-key-env.png" alt="Figure of the door key environment">
 </p>
 
+Registered configurations:
+- `MiniGrid-DoorKey-5x5-v0`
+- `MiniGrid-DoorKey-6x6-v0`
+- `MiniGrid-DoorKey-8x8-v0`
+- `MiniGrid-DoorKey-16x16-v0`
+
+### Multi-room environment
+
 This environment has a series of connected rooms with doors that must be
 opened in order to get to the next room. The final room has the green goal
 square the agent must get to. This environment is extremely difficult to
 solve using RL alone. However, by gradually increasing the number of
 rooms and building a curriculum, the environment can be solved.
 
-### Fetch environment
+<p align="center">
+ <img src="figures/multi-room.gif" width=416 height=424 alt="Figure of the Multi-room environment">
+</p>
 
 Registered configurations:
-- `MiniGrid-Fetch-5x5-N2-v0`
-- `MiniGrid-Fetch-6x6-N2-v0`
-- `MiniGrid-Fetch-8x8-N3-v0`
+- `MiniGrid-MultiRoom-N2-S4-v0` (two small rooms)
+- `MiniGrid-MultiRoom-N4-S5-v0` (four rooms)
+- `MiniGrid-MultiRoom-N6-v0` (six rooms)
 
-<p align="center">
-<img src="/figures/fetch-env.png" width=450>
-</p>
+### Fetch environment
 
 This environment has multiple objects of assorted types and colors. The
 agent receives a textual string as part of its observation telling it
 which object to pick up. Picking up the wrong object terminates the
 episode with zero reward.
 
-### Go-to-door environment
+<p align="center">
+ <img src="figures/fetch-env.png" width=450 alt="Figure of the fetch environment">
+</p>
 
 Registered configurations:
-- `MiniGrid-GoToDoor-5x5-v0`
-- `MiniGrid-GoToDoor-6x6-v0`
-- `MiniGrid-GoToDoor-8x8-v0`
+- `MiniGrid-Fetch-5x5-N2-v0`
+- `MiniGrid-Fetch-6x6-N2-v0`
+- `MiniGrid-Fetch-8x8-N3-v0`
 
-<p align="center">
-<img src="/figures/gotodoor-6x6.png" width=400>
-</p>
+### Go-to-door environment
 
 This environment is a room with four doors, one on each wall. The agent
 receives a textual (mission) string as input, telling it which door to go to,
 (eg: "go to the red door"). It receives a positive reward for performing the
 `done` action next to the correct door, as indicated in the mission string.
 
-### Put-near environment
+<p align="center">
+ <img src="figures/gotodoor-6x6.png" width=400 alt="Figure of the go-to-door environment">
+</p>
 
 Registered configurations:
-- `MiniGrid-PutNear-6x6-N2-v0`
-- `MiniGrid-PutNear-8x8-N3-v0`
+- `MiniGrid-GoToDoor-5x5-v0`
+- `MiniGrid-GoToDoor-6x6-v0`
+- `MiniGrid-GoToDoor-8x8-v0`
+
+### Put-near environment
 
 The agent is instructed through a textual string to pick up an object and
 place it next to another object. This environment is easy to solve with two
 objects, but difficult to solve with more, as it involves both textual
 understanding and spatial reasoning involving multiple objects.
 
-### Red and blue doors environment
-
 Registered configurations:
-- `MiniGrid-RedBlueDoors-6x6-v0`
-- `MiniGrid-RedBlueDoors-8x8-v0`
+- `MiniGrid-PutNear-6x6-N2-v0`
+- `MiniGrid-PutNear-8x8-N3-v0`
+
+### Red and blue doors environment
 
 The agent is randomly placed within a room with one red and one blue door
 facing opposite directions. The agent has to open the red door and then open
 the blue door, in that order. Note that, surprisingly, this environment is
 solvable without memory.
 
-### Memory environment
-
 Registered configurations:
-- `MiniGrid-MemoryS17Random-v0`
-- `MiniGrid-MemoryS13Random-v0`
-- `MiniGrid-MemoryS13-v0`
-- `MiniGrid-MemoryS11-v0`
+- `MiniGrid-RedBlueDoors-6x6-v0`
+- `MiniGrid-RedBlueDoors-8x8-v0`
+
+### Memory environment
 
 This environment is a memory test. The agent starts in a small room
 where it sees an object. It then has to go through a narrow hallway
@@ -323,10 +318,13 @@ one of which is the same as the object in the starting room. The
 agent has to remember the initial object, and go to the matching
 object at split.
 
-### Locked room environment
-
 Registered configurations:
-- `MiniGrid-LockedRoom-v0`
+- `MiniGrid-MemoryS17Random-v0`
+- `MiniGrid-MemoryS13Random-v0`
+- `MiniGrid-MemoryS13-v0`
+- `MiniGrid-MemoryS11-v0`
+
+### Locked room environment
 
 The environment has six rooms, one of which is locked. The agent receives
 a textual mission string as input, telling it which room to go to in order
@@ -334,24 +332,10 @@ to get the key that opens the locked room. It then has to go into the locked
 room in order to reach the final goal. This environment is extremely difficult
 to solve with vanilla reinforcement learning alone.
 
-### Key corridor environment
-
 Registered configurations:
-- `MiniGrid-KeyCorridorS3R1-v0`
-- `MiniGrid-KeyCorridorS3R2-v0`
-- `MiniGrid-KeyCorridorS3R3-v0`
-- `MiniGrid-KeyCorridorS4R3-v0`
-- `MiniGrid-KeyCorridorS5R3-v0`
-- `MiniGrid-KeyCorridorS6R3-v0`
+- `MiniGrid-LockedRoom-v0`
 
-<p align="center">
- <img src="figures/KeyCorridorS3R1.png" width="250">
- <img src="figures/KeyCorridorS3R2.png" width="250">
- <img src="figures/KeyCorridorS3R3.png" width="250">
- <img src="figures/KeyCorridorS4R3.png" width="250">
- <img src="figures/KeyCorridorS5R3.png" width="250">
- <img src="figures/KeyCorridorS6R3.png" width="250">
-</p>
+### Key corridor environment
 
 This environment is similar to the locked room environment, but there are
 multiple registered environment configurations of increasing size,
@@ -361,38 +345,48 @@ hidden in another room, and the agent has to explore the environment to find
 it. The mission string does not give the agent any clues as to where the
 key is placed. This environment can be solved without relying on language.
 
-### Unlock environment
+<p align="center">
+ <img src="figures/KeyCorridorS3R1.png" width=250 alt="Figure of the Key Corridor for config S3R1">
+ <img src="figures/KeyCorridorS3R2.png" width=250 alt="Figure of the Key Corridor for config S3R2">
+ <img src="figures/KeyCorridorS3R3.png" width=250 alt="Figure of the Key Corridor for config S3R3">
+ <img src="figures/KeyCorridorS4R3.png" width=250 alt="Figure of the Key Corridor for config S4R3">
+ <img src="figures/KeyCorridorS5R3.png" width=250 alt="Figure of the Key Corridor for config S5R3">
+ <img src="figures/KeyCorridorS6R3.png" width=250 alt="Figure of the Key Corridor for config S6R3">
+</p>
 
 Registered configurations:
-- `MiniGrid-Unlock-v0`
+- `MiniGrid-KeyCorridorS3R1-v0`
+- `MiniGrid-KeyCorridorS3R2-v0`
+- `MiniGrid-KeyCorridorS3R3-v0`
+- `MiniGrid-KeyCorridorS4R3-v0`
+- `MiniGrid-KeyCorridorS5R3-v0`
+- `MiniGrid-KeyCorridorS6R3-v0`
 
-<p align="center">
- <img src="figures/Unlock.png" width="200">
-</p>
+### Unlock environment
 
 The agent has to open a locked door. This environment can be solved without
 relying on language.
 
-### Unlock pickup environment
+<p align="center">
+ <img src="figures/Unlock.png" width=200 alt="Figure of the unlock environment">
+</p>
 
 Registered configurations:
-- `MiniGrid-UnlockPickup-v0`
+- `MiniGrid-Unlock-v0`
 
-<p align="center">
- <img src="figures/UnlockPickup.png" width="250">
-</p>
+### Unlock pickup environment
 
 The agent has to pick up a box which is placed in another room, behind a
 locked door. This environment can be solved without relying on language.
 
-### Blocked unlock pickup environment
+<p align="center">
+ <img src="figures/UnlockPickup.png" width=250 alt="Figure of the unlock pickup environment">
+</p>
 
 Registered configurations:
-- `MiniGrid-BlockedUnlockPickup-v0`
+- `MiniGrid-UnlockPickup-v0`
 
-<p align="center">
- <img src="figures/BlockedUnlockPickup.png" width="250">
-</p>
+### Blocked unlock pickup environment
 
 The agent has to pick up a box which is placed in another room, behind a
 locked door. The door is also blocked by a ball which the agent has to move
@@ -400,18 +394,18 @@ before it can unlock the door. Hence, the agent has to learn to move the ball,
 pick up the key, open the door and pick up the object in the other room.
 This environment can be solved without relying on language.
 
-## Obstructed maze environment
+<p align="center">
+ <img src="figures/BlockedUnlockPickup.png" width=250 alt="Figure of the blocked-unlock-pickup environment">
+</p>
 
 Registered configurations:
-- `MiniGrid-ObstructedMaze-1Dl-v0`
-- `MiniGrid-ObstructedMaze-1Dlh-v0`
-- `MiniGrid-ObstructedMaze-1Dlhb-v0`
-- `MiniGrid-ObstructedMaze-2Dl-v0`
-- `MiniGrid-ObstructedMaze-2Dlh-v0`
-- `MiniGrid-ObstructedMaze-2Dlhb-v0`
-- `MiniGrid-ObstructedMaze-1Q-v0`
-- `MiniGrid-ObstructedMaze-2Q-v0`
-- `MiniGrid-ObstructedMaze-Full-v0`
+- `MiniGrid-BlockedUnlockPickup-v0`
+
+## Obstructed maze environment
+
+The agent has to pick up a box which is placed in a corner of a 3x3 maze.
+The doors are locked, the keys are hidden in boxes and doors are obstructed
+by balls. This environment can be solved without relying on language.
 
 <p align="center">
 <img src="figures/ObstructedMaze-1Dl.png" width="250">
@@ -425,57 +419,51 @@ Registered configurations:
|
|
<img src="figures/ObstructedMaze-4Q.png" width="250">
|
|
<img src="figures/ObstructedMaze-4Q.png" width="250">
|
|
</p>
|
|
</p>
|
|
|
|
|
|
-The agent has to pick up a box which is placed in a corner of a 3x3 maze.
|
|
|
|
-The doors are locked, the keys are hidden in boxes and doors are obstructed
|
|
|
|
-by balls. This environment can be solved without relying on language.
|
|
|
|
|
|
+Registered configurations:
|
|
|
|
+- `MiniGrid-ObstructedMaze-1Dl-v0`
|
|
|
|
+- `MiniGrid-ObstructedMaze-1Dlh-v0`
|
|
|
|
+- `MiniGrid-ObstructedMaze-1Dlhb-v0`
|
|
|
|
+- `MiniGrid-ObstructedMaze-2Dl-v0`
|
|
|
|
+- `MiniGrid-ObstructedMaze-2Dlh-v0`
|
|
|
|
+- `MiniGrid-ObstructedMaze-2Dlhb-v0`
|
|
|
|
+- `MiniGrid-ObstructedMaze-1Q-v0`
|
|
|
|
+- `MiniGrid-ObstructedMaze-2Q-v0`
|
|
|
|
+- `MiniGrid-ObstructedMaze-Full-v0`
|
|
|
|
|
|
## Distributional shift environment
|
|
## Distributional shift environment
|
|
|
|
|
|
-Registered configurations:
|
|
|
|
-- `MiniGrid-DistShift1-v0`
|
|
|
|
-- `MiniGrid-DistShift2-v0`
|
|
|
|
-
|
|
|
|
This environment is based on one of the DeepMind [AI safety gridworlds](https://github.com/deepmind/ai-safety-gridworlds).
|
|
This environment is based on one of the DeepMind [AI safety gridworlds](https://github.com/deepmind/ai-safety-gridworlds).
|
|
The agent starts in the top-left corner and must reach the goal which is in the top-right corner, but has to avoid stepping
|
|
The agent starts in the top-left corner and must reach the goal which is in the top-right corner, but has to avoid stepping
|
|
into lava on its way. The aim of this environment is to test an agent's ability to generalize. There are two slightly
|
|
into lava on its way. The aim of this environment is to test an agent's ability to generalize. There are two slightly
|
|
different variants of the environment, so that the agent can be trained on one variant and tested on the other.
|
|
different variants of the environment, so that the agent can be trained on one variant and tested on the other.
|
|
|
|
|
|
<p align="center">
|
|
<p align="center">
|
|
- <img src="figures/DistShift1.png" width="200">
|
|
|
|
- <img src="figures/DistShift2.png" width="200">
|
|
|
|
|
|
+ <img src="figures/DistShift1.png" width=200 alt="Figure of the DistShift1 environment">
|
|
|
|
+ <img src="figures/DistShift2.png" width=200 alt="Figure of the DistShift2 environment">
|
|
</p>
|
|
</p>
|
|
|
|
|
|
-## Lava gap environment
|
|
|
|
-
|
|
|
|
Registered configurations:
|
|
Registered configurations:
|
|
-- `MiniGrid-LavaGapS5-v0`
|
|
|
|
-- `MiniGrid-LavaGapS6-v0`
|
|
|
|
-- `MiniGrid-LavaGapS7-v0`
|
|
|
|
|
|
+- `MiniGrid-DistShift1-v0`
|
|
|
|
+- `MiniGrid-DistShift2-v0`
|
|
|
|
|
|
-<p align="center">
|
|
|
|
- <img src="figures/LavaGapS6.png" width="200">
|
|
|
|
-</p>
|
|
|
|
|
|
+## Lava gap environment
|
|
|
|
|
|
The agent has to reach the green goal square at the opposite corner of the room,
and must pass through a narrow gap in a vertical strip of deadly lava. Touching
the lava terminates the episode with a zero reward. This environment is useful
for studying safety and safe exploration.

-## Lava crossing environment
-
Registered configurations:
-- `MiniGrid-LavaCrossingS9N1-v0`
-- `MiniGrid-LavaCrossingS9N2-v0`
-- `MiniGrid-LavaCrossingS9N3-v0`
-- `MiniGrid-LavaCrossingS11N5-v0`
+- `MiniGrid-LavaGapS5-v0`
+- `MiniGrid-LavaGapS6-v0`
+- `MiniGrid-LavaGapS7-v0`

<p align="center">
- <img src="figures/LavaCrossingS9N1.png" width="200">
- <img src="figures/LavaCrossingS9N2.png" width="200">
- <img src="figures/LavaCrossingS9N3.png" width="200">
- <img src="figures/LavaCrossingS11N5.png" width="250">
+ <img src="figures/LavaGapS6.png" width=200 alt="Figure of the LavaGap environment">
</p>

+## Lava crossing environment
+
The agent has to reach the green goal square on the other corner of the room
while avoiding rivers of deadly lava which terminate the episode in failure.
Each lava stream runs across the room either horizontally or vertically, and
@@ -483,27 +471,49 @@ has a single crossing point which can be safely used; Luckily, a path to the
goal is guaranteed to exist. This environment is useful for studying safety and
safe exploration.

+<p align="center">
+ <img src="figures/LavaCrossingS9N1.png" width=200 alt="Figure of the LavaCrossingS9N1 environment">
+ <img src="figures/LavaCrossingS9N2.png" width=200 alt="Figure of the LavaCrossingS9N2 environment">
+ <img src="figures/LavaCrossingS9N3.png" width=200 alt="Figure of the LavaCrossingS9N3 environment">
+ <img src="figures/LavaCrossingS11N5.png" width=250 alt="Figure of the LavaCrossingS11N5 environment">
+</p>
+
+Registered configurations:
+- `MiniGrid-LavaCrossingS9N1-v0`
+- `MiniGrid-LavaCrossingS9N2-v0`
+- `MiniGrid-LavaCrossingS9N3-v0`
+- `MiniGrid-LavaCrossingS11N5-v0`
+
## Simple crossing environment

+Similar to the `LavaCrossing` environment, the agent has to reach the green
+goal square in the opposite corner of the room; however, lava is replaced by
+walls. This MDP is therefore much easier and may be useful for quickly
+testing your algorithms.
+
+<p align="center">
+ <img src="figures/SimpleCrossingS9N1.png" width=200 alt="Figure of the SimpleCrossingS9N1 environment">
+ <img src="figures/SimpleCrossingS9N2.png" width=200 alt="Figure of the SimpleCrossingS9N2 environment">
+ <img src="figures/SimpleCrossingS9N3.png" width=200 alt="Figure of the SimpleCrossingS9N3 environment">
+ <img src="figures/SimpleCrossingS11N5.png" width=250 alt="Figure of the SimpleCrossingS11N5 environment">
+</p>
+
Registered configurations:
- `MiniGrid-SimpleCrossingS9N1-v0`
- `MiniGrid-SimpleCrossingS9N2-v0`
- `MiniGrid-SimpleCrossingS9N3-v0`
- `MiniGrid-SimpleCrossingS11N5-v0`

-<p align="center">
- <img src="figures/SimpleCrossingS9N1.png" width="200">
- <img src="figures/SimpleCrossingS9N2.png" width="200">
- <img src="figures/SimpleCrossingS9N3.png" width="200">
- <img src="figures/SimpleCrossingS11N5.png" width="250">
-</p>
+### Dynamic obstacles environment

-Similar to the `LavaCrossing` environment, the agent has to reach the green
-goal square on the other corner of the room, however lava is replaced by
-walls. This MDP is therefore much easier and and maybe useful for quickly
-testing your algorithms.
+This environment is an empty room with moving obstacles.
+The goal of the agent is to reach the green goal square without colliding with any obstacle.
+If the agent collides with an obstacle, a large penalty is subtracted from the reward and the episode ends.
+This environment is useful for testing dynamic obstacle avoidance for mobile robots with reinforcement learning under partial observability.

-### Dynamic obstacles environment
+<p align="center">
+ <img src="figures/dynamic_obstacles.gif" alt="GIF of the Dynamic Obstacles environment">
+</p>

Registered configurations:
- `MiniGrid-Dynamic-Obstacles-5x5-v0`
@@ -512,9 +522,3 @@ Registered configurations:
- `MiniGrid-Dynamic-Obstacles-Random-6x6-v0`
- `MiniGrid-Dynamic-Obstacles-8x8-v0`
- `MiniGrid-Dynamic-Obstacles-16x16-v0`
-
-<p align="center">
-<img src="/figures/dynamic_obstacles.gif">
-</p>
-
-This environment is an empty room with moving obstacles. The goal of the agent is to reach the green goal square without colliding with any obstacle. A large penalty is subtracted if the agent collides with an obstacle and the episode finishes. This environment is useful to test Dynamic Obstacle Avoidance for mobile robots with Reinforcement Learning in Partial Observability.
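Both lava environments described above rely on the same property: at least one route to the goal never touches lava (a single gap in LavaGap, a guaranteed crossing in LavaCrossing). As a rough illustration only, and not MiniGrid's own level generator, the self-contained Python sketch below builds a toy LavaGap-style room as a list of strings and uses breadth-first search to check that the goal is reachable. The character encoding (`#` wall, `L` lava, `A` agent, `G` goal), the fixed middle column for the lava strip, and the helper names `make_lava_gap_grid` and `goal_reachable` are all hypothetical.

```python
from collections import deque


def make_lava_gap_grid(size, gap_row):
    """Build a toy LavaGap-style room (hypothetical encoding, not MiniGrid's).

    '#' = wall, 'L' = lava, '.' = floor, 'A' = agent start, 'G' = goal.
    Walls surround the room; a vertical lava strip fills the middle column
    except for one free cell at row ``gap_row``.
    """
    if size < 5 or not 1 <= gap_row <= size - 2:
        raise ValueError("need size >= 5 and 1 <= gap_row <= size - 2")
    grid = [["." for _ in range(size)] for _ in range(size)]
    for i in range(size):
        grid[0][i] = grid[size - 1][i] = "#"  # top and bottom walls
        grid[i][0] = grid[i][size - 1] = "#"  # left and right walls
    lava_col = size // 2
    for row in range(1, size - 1):
        if row != gap_row:
            grid[row][lava_col] = "L"  # lava everywhere in the strip except the gap
    grid[1][1] = "A"  # agent starts in the top-left corner
    grid[size - 2][size - 2] = "G"  # goal in the opposite corner
    return ["".join(row) for row in grid]


def goal_reachable(grid):
    """Breadth-first search from 'A' to 'G' that never enters walls or lava."""
    starts = [(r, c) for r, row in enumerate(grid) for c, ch in enumerate(row) if ch == "A"]
    if not starts:
        return False
    start = starts[0]
    queue = deque([start])
    seen = {start}
    while queue:
        r, c = queue.popleft()
        if grid[r][c] == "G":
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            # Bounds check keeps the search safe even without a wall border.
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[nr]) and (nr, nc) not in seen:
                if grid[nr][nc] not in "#L":
                    seen.add((nr, nc))
                    queue.append((nr, nc))
    return False


room = make_lava_gap_grid(size=7, gap_row=3)
print("\n".join(room))
print(goal_reachable(room))  # True: the single gap is enough
```

Replacing the gap row with a solid lava row makes `goal_reachable` return `False`, which is exactly the situation the LavaCrossing description says cannot occur.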