Merge branch 'master' into newstep

saleml, 2 years ago
Commit: b1f0799f59

+ 59 - 0
.github/workflows/build-publish.yml

@@ -0,0 +1,59 @@
+# This workflow will build and (if release) publish Python distributions to PyPI
+# For more information see:
+#   - https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
+#   - https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/
+#
+
+---
+name: build-publish
+
+on:
+  push:
+    branches: [master]
+  pull_request:
+    branches: [master]
+  release:
+    types: [published]
+
+jobs:
+  build-wheels:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.x'
+      - name: Install pypa/build
+        run: >-
+          python -m
+          pip install -U
+          build
+      - name: Build a binary wheel and a source tarball
+        run: >-
+          python -m
+          build
+          --sdist
+          --wheel
+          --outdir dist/
+          .
+      - name: Store wheels
+        uses: actions/upload-artifact@v3
+        with:
+          path: dist
+
+  publish:
+    runs-on: ubuntu-latest
+    needs:
+      - build-wheels
+    if: github.event_name == 'release' && github.event.action == 'published'
+    steps:
+      - name: Download dists
+        uses: actions/download-artifact@v3
+        with:
+          name: artifact
+          path: dist
+      - name: Publish
+        uses: pypa/gh-action-pypi-publish@release/v1
+        with:
+          password: ${{ secrets.PYPI_API_TOKEN }}

+ 4 - 6
.github/workflows/build.yml

@@ -6,14 +6,12 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ['3.6', '3.7', '3.8', '3.9', '3.10']
+        python-version: ['3.7', '3.8', '3.9', '3.10']
     steps:
       - uses: actions/checkout@v2
       - run: |
            docker build -f py.Dockerfile \
              --build-arg PYTHON_VERSION=${{ matrix.python-version }} \
-             --tag gym-minigrid-docker .
-      
-      # TODO: Add and fix tests for pytest
-      # - name: Run tests
-      #   run: docker run gym-docker pytest
+             --tag gym-minigrid-docker .      
+      - name: Run tests
+        run: docker run gym-minigrid-docker pytest

+ 174 - 170
README.md

@@ -1,6 +1,7 @@
 # Minimalistic Gridworld Environment (MiniGrid)
 
-[![Build Status](https://travis-ci.org/maximecb/gym-minigrid.svg?branch=master)](https://travis-ci.org/maximecb/gym-minigrid)
+[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white)](https://pre-commit.com/) 
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 
 There are other gridworld Gym environments out there, but this one is
 designed to be particularly simple, lightweight and fast. The code has very few
@@ -10,10 +11,10 @@ laptop, which means you can run your experiments faster. A known-working RL
 implementation can be found [in this repository](https://github.com/lcswillems/torch-rl).
 
 Requirements:
-- Python 3.7+
-- OpenAI Gym 0.25
-- NumPy
-- Matplotlib (optional, only needed for display)
+- Python 3.7 to 3.10
+- OpenAI Gym v0.22 to v0.25
+- NumPy 1.18+
+- Matplotlib (optional, only needed for display) - 3.0+
 
 Please use this bibtex if you want to cite this repository in your publications:
 
@@ -29,6 +30,8 @@ Please use this bibtex if you want to cite this repository in your publications:
 ```
 
 List of publications & submissions using MiniGrid or BabyAI (please open a pull request to add missing entries):
+- [Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity](https://arxiv.org/abs/2202.02886) (Arizona State University, ICML 2022)
+- [How to Stay Curious while avoiding Noisy TVs using Aleatoric Uncertainty Estimation](https://proceedings.mlr.press/v162/mavor-parker22a.html) (University College London, Boston University, ICML 2022)
 - [In a Nutshell, the Human Asked for This: Latent Goals for Following Temporal Specifications](https://openreview.net/pdf?id=rUwm9wCjURV) (Imperial College London, ICLR 2022)
 - [Interesting Object, Curious Agent: Learning Task-Agnostic Exploration](https://arxiv.org/abs/2111.13119) (Meta AI Research, NeurIPS 2021)
 - [Safe Policy Optimization with Local Generalized Linear Function Approximations](https://arxiv.org/abs/2111.04894) (IBM Research, Tsinghua University, NeurIPS 2021)
@@ -46,16 +49,13 @@ List of publications & submissions using MiniGrid or BabyAI (please open a pull
 - [RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments](https://openreview.net/forum?id=rkg-TJBFPB) (FAIR, ICLR 2020)
 - [Learning to Request Guidance in Emergent Communication](https://arxiv.org/pdf/1912.05525.pdf) (University of Amsterdam, Dec 2019)
 - [Working Memory Graphs](https://arxiv.org/abs/1911.07141) (MSR, Nov 2019)
-- [Fast Task-Adaptation for Tasks Labeled Using
-Natural Language in Reinforcement Learning](https://arxiv.org/pdf/1910.04040.pdf) (Oct 2019, University of Antwerp)
-- [Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck
-](https://arxiv.org/abs/1910.12911) (MSR, NeurIPS, Oct 2019)
+- [Fast Task-Adaptation for Tasks Labeled Using Natural Language in Reinforcement Learning](https://arxiv.org/pdf/1910.04040.pdf) (Oct 2019, University of Antwerp)
+- [Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck](https://arxiv.org/abs/1910.12911) (MSR, NeurIPS, Oct 2019)
 - [Recurrent Independent Mechanisms](https://arxiv.org/pdf/1909.10893.pdf) (Mila, Sept 2019) 
 - [Learning Effective Subgoals with Multi-Task Hierarchical Reinforcement Learning](http://surl.tirl.info/proceedings/SURL-2019_paper_10.pdf) (Tsinghua University, August 2019)
 - [Mastering emergent language: learning to guide in simulated navigation](https://arxiv.org/abs/1908.05135) (University of Amsterdam, Aug 2019)
 - [Transfer Learning by Modeling a Distribution over Policies](https://arxiv.org/abs/1906.03574) (Mila, June 2019)
-- [Reinforcement Learning with Competitive Ensembles
-of Information-Constrained Primitives](https://arxiv.org/abs/1906.10667) (Mila, June 2019)
+- [Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives](https://arxiv.org/abs/1906.10667) (Mila, June 2019)
 - [Learning distant cause and effect using only local and immediate credit assignment](https://arxiv.org/abs/1905.11589) (Incubator 491, May 2019)
 - [Practical Open-Loop Optimistic Planning](https://arxiv.org/abs/1904.04700) (INRIA, April 2019)
 - [Learning World Graphs to Accelerate Hierarchical Reinforcement Learning](https://arxiv.org/abs/1907.00664) (Salesforce Research, 2019)
@@ -93,18 +93,19 @@ pip3 install -e .
 There is a UI application which allows you to manually control the agent with the arrow keys:
 
 ```
-./manual_control.py
+./gym-minigrid/manual_control.py
 ```
 
 The environment being run can be selected with the `--env` option, e.g.:
 
 ```
-./manual_control.py --env MiniGrid-Empty-8x8-v0
+./gym-minigrid/manual_control.py --env MiniGrid-Empty-8x8-v0
 ```
 
 ## Reinforcement Learning
 
-If you want to train an agent with reinforcement learning, I recommend using the code found in the [torch-rl](https://github.com/lcswillems/torch-rl) repository. This code has been tested and is known to work with this environment. The default hyper-parameters are also known to converge.
+If you want to train an agent with reinforcement learning, I recommend using the code found in the [torch-rl](https://github.com/lcswillems/torch-rl) repository. 
+This code has been tested and is known to work with this environment. The default hyper-parameters are also known to converge.
 
 A sample training command is:
 
@@ -123,9 +124,9 @@ field which can be used as an optional compass. Using dictionaries makes it
 easy for you to add additional information to observations
 if you need to, without having to encode everything into a single tensor.
 
-There are a variery of wrappers to change the observation format available in [gym_minigrid/wrappers.py](/gym_minigrid/wrappers.py). If your RL code expects one single tensor for observations, take a look at
-`FlatObsWrapper`. There is also an `ImgObsWrapper` that gets rid of the 'mission' field in observations,
-leaving only the image field tensor.
+There are a variety of wrappers to change the observation format available in [gym_minigrid/wrappers.py](/gym_minigrid/wrappers.py). 
+If your RL code expects one single tensor for observations, take a look at `FlatObsWrapper`. 
+There is also an `ImgObsWrapper` that gets rid of the 'mission' field in observations, leaving only the image field tensor.
 
 Please note that the default observation format is a partially observable view of the environment using a
 compact and efficient encoding, with 3 input values per visible grid cell, 7x7x3 values total.
@@ -164,9 +165,9 @@ Actions in the basic environment:
 - Done (task completed, optional)
 
 Default tile/observation encoding:
-- Each tile is encoded as a 3 dimensional tuple: (OBJECT_IDX, COLOR_IDX, STATE) 
-- OBJECT_TO_IDX and COLOR_TO_IDX mapping can be found in [gym_minigrid/minigrid.py](gym_minigrid/minigrid.py)
-- e.g. door STATE -> 0: open, 1: closed, 2: locked
+- Each tile is encoded as a 3-dimensional tuple: `(OBJECT_IDX, COLOR_IDX, STATE)` 
+- `OBJECT_TO_IDX` and `COLOR_TO_IDX` mapping can be found in [gym_minigrid/minigrid.py](gym_minigrid/minigrid.py)
+- `STATE` refers to the door state with 0=open, 1=closed and 2=locked
 
 By default, sparse rewards are given for reaching a green goal tile. A
 reward of 1 is given for success, and zero for failure. There is also an
@@ -185,18 +186,6 @@ or to fine-tune difficulty.
 
 ### Empty environment
 
-Registered configurations:
-- `MiniGrid-Empty-5x5-v0`
-- `MiniGrid-Empty-Random-5x5-v0`
-- `MiniGrid-Empty-6x6-v0`
-- `MiniGrid-Empty-Random-6x6-v0`
-- `MiniGrid-Empty-8x8-v0`
-- `MiniGrid-Empty-16x16-v0`
-
-<p align="center">
-<img src="/figures/empty-env.png" width=250>
-</p>
-
 This environment is an empty room, and the goal of the agent is to reach the
 green goal square, which provides a sparse reward. A small penalty is
 subtracted for the number of steps to reach the goal. This environment is
@@ -206,115 +195,121 @@ The random variants of the environment have the agent starting at a random
 position for each episode, while the regular variants have the agent always
 starting in the corner opposite to the goal.
 
-### Four rooms environment
-
-Registered configurations:
-- `MiniGrid-FourRooms-v0`
-
 <p align="center">
-<img src="/figures/four-rooms-env.png" width=380>
+    <img src="figures/empty-env.png" width=250 alt="Figure of the empty environment">
 </p>
 
+Registered configurations: 
+- `MiniGrid-Empty-5x5-v0`
+- `MiniGrid-Empty-Random-5x5-v0`
+- `MiniGrid-Empty-6x6-v0`
+- `MiniGrid-Empty-Random-6x6-v0`
+- `MiniGrid-Empty-8x8-v0`
+- `MiniGrid-Empty-16x16-v0`
+
+### Four rooms environment
+
 Classic four room reinforcement learning environment. The agent must navigate
 in a maze composed of four rooms interconnected by 4 gaps in the walls. To
 obtain a reward, the agent must reach the green goal square. Both the agent
 and the goal square are randomly placed in any of the four rooms.
 
-### Door & key environment
-
-Registered configurations:
-- `MiniGrid-DoorKey-5x5-v0`
-- `MiniGrid-DoorKey-6x6-v0`
-- `MiniGrid-DoorKey-8x8-v0`
-- `MiniGrid-DoorKey-16x16-v0`
-
 <p align="center">
-<img src="/figures/door-key-env.png">
+    <img src="figures/four-rooms-env.png" width=380 alt="Figure of the four room environment">
 </p>
 
+Registered configurations: 
+- `MiniGrid-FourRooms-v0`
+
+### Door & key environment
+
 This environment has a key that the agent must pick up in order to unlock
 a goal and then get to the green goal square. This environment is difficult,
 because of the sparse reward, to solve using classical RL algorithms. It is
 useful to experiment with curiosity or curriculum learning.
 
-### Multi-room environment
-
-Registered configurations:
-- `MiniGrid-MultiRoom-N2-S4-v0` (two small rooms)
-- `MiniGrid-MultiRoom-N4-S5-v0` (four rooms)
-- `MiniGrid-MultiRoom-N6-v0` (six rooms)
-
 <p align="center">
-<img src="/figures/multi-room.gif" width=416 height=424>
+    <img src="figures/door-key-env.png" alt="Figure of the door key environment">
 </p>
 
+Registered configurations: 
+- `MiniGrid-DoorKey-5x5-v0`
+- `MiniGrid-DoorKey-6x6-v0`
+- `MiniGrid-DoorKey-8x8-v0`
+- `MiniGrid-DoorKey-16x16-v0`
+
+### Multi-room environment
+
 This environment has a series of connected rooms with doors that must be
 opened in order to get to the next room. The final room has the green goal
 square the agent must get to. This environment is extremely difficult to
 solve using RL alone. However, by gradually increasing the number of
 rooms and building a curriculum, the environment can be solved.
 
-### Fetch environment
+<p align="center">
+    <img src="figures/multi-room.gif" width=416 height=424 alt="Figure of the Multi-room environment">
+</p>
 
 Registered configurations:
-- `MiniGrid-Fetch-5x5-N2-v0`
-- `MiniGrid-Fetch-6x6-N2-v0`
-- `MiniGrid-Fetch-8x8-N3-v0`
+- `MiniGrid-MultiRoom-N2-S4-v0` (two small rooms)
+- `MiniGrid-MultiRoom-N4-S5-v0` (four rooms)
+- `MiniGrid-MultiRoom-N6-v0` (six rooms)
 
-<p align="center">
-<img src="/figures/fetch-env.png" width=450>
-</p>
+### Fetch environment
 
 This environment has multiple objects of assorted types and colors. The
 agent receives a textual string as part of its observation telling it
 which object to pick up. Picking up the wrong object terminates the
 episode with zero reward.
 
-### Go-to-door environment
+<p align="center">
+    <img src="figures/fetch-env.png" width=450 alt="Figure of the fetch environment">
+</p>
 
 Registered configurations:
-- `MiniGrid-GoToDoor-5x5-v0`
-- `MiniGrid-GoToDoor-6x6-v0`
-- `MiniGrid-GoToDoor-8x8-v0`
+- `MiniGrid-Fetch-5x5-N2-v0`
+- `MiniGrid-Fetch-6x6-N2-v0`
+- `MiniGrid-Fetch-8x8-N3-v0`
 
-<p align="center">
-<img src="/figures/gotodoor-6x6.png" width=400>
-</p>
+### Go-to-door environment
 
 This environment is a room with four doors, one on each wall. The agent
 receives a textual (mission) string as input, telling it which door to go to,
 (eg: "go to the red door"). It receives a positive reward for performing the
 `done` action next to the correct door, as indicated in the mission string.
 
-### Put-near environment
+<p align="center">
+    <img src="figures/gotodoor-6x6.png" width=400 alt="Figure of the go-to-door environment">
+</p>
 
 Registered configurations:
-- `MiniGrid-PutNear-6x6-N2-v0`
-- `MiniGrid-PutNear-8x8-N3-v0`
+- `MiniGrid-GoToDoor-5x5-v0`
+- `MiniGrid-GoToDoor-6x6-v0`
+- `MiniGrid-GoToDoor-8x8-v0`
+
+### Put-near environment
 
 The agent is instructed through a textual string to pick up an object and
 place it next to another object. This environment is easy to solve with two
 objects, but difficult to solve with more, as it involves both textual
 understanding and spatial reasoning involving multiple objects.
 
-### Red and blue doors environment
-
 Registered configurations:
-- `MiniGrid-RedBlueDoors-6x6-v0`
-- `MiniGrid-RedBlueDoors-8x8-v0`
+- `MiniGrid-PutNear-6x6-N2-v0`
+- `MiniGrid-PutNear-8x8-N3-v0`
+
+### Red and blue doors environment
 
 The agent is randomly placed within a room with one red and one blue door
 facing opposite directions. The agent has to open the red door and then open
 the blue door, in that order. Note that, surprisingly, this environment is
 solvable without memory.
 
-### Memory environment
-
 Registered configurations:
-- `MiniGrid-MemoryS17Random-v0`
-- `MiniGrid-MemoryS13Random-v0`
-- `MiniGrid-MemoryS13-v0`
-- `MiniGrid-MemoryS11-v0`
+- `MiniGrid-RedBlueDoors-6x6-v0`
+- `MiniGrid-RedBlueDoors-8x8-v0`
+
+### Memory environment
 
 This environment is a memory test. The agent starts in a small room
 where it sees an object. It then has to go through a narrow hallway
@@ -323,10 +318,13 @@ one of which is the same as the object in the starting room. The
 agent has to remember the initial object, and go to the matching
 object at split.
 
-### Locked room environment
-
 Registered configurations:
-- `MiniGrid-LockedRoom-v0`
+- `MiniGrid-MemoryS17Random-v0`
+- `MiniGrid-MemoryS13Random-v0`
+- `MiniGrid-MemoryS13-v0`
+- `MiniGrid-MemoryS11-v0`
+
+### Locked room environment
 
 The environment has six rooms, one of which is locked. The agent receives
 a textual mission string as input, telling it which room to go to in order
@@ -334,24 +332,10 @@ to get the key that opens the locked room. It then has to go into the locked
 room in order to reach the final goal. This environment is extremely difficult
 to solve with vanilla reinforcement learning alone.
 
-### Key corridor environment
-
 Registered configurations:
-- `MiniGrid-KeyCorridorS3R1-v0`
-- `MiniGrid-KeyCorridorS3R2-v0`
-- `MiniGrid-KeyCorridorS3R3-v0`
-- `MiniGrid-KeyCorridorS4R3-v0`
-- `MiniGrid-KeyCorridorS5R3-v0`
-- `MiniGrid-KeyCorridorS6R3-v0`
+- `MiniGrid-LockedRoom-v0`
 
-<p align="center">
-    <img src="figures/KeyCorridorS3R1.png" width="250">
-    <img src="figures/KeyCorridorS3R2.png" width="250">
-    <img src="figures/KeyCorridorS3R3.png" width="250">
-    <img src="figures/KeyCorridorS4R3.png" width="250">
-    <img src="figures/KeyCorridorS5R3.png" width="250">
-    <img src="figures/KeyCorridorS6R3.png" width="250">
-</p>
+### Key corridor environment
 
 This environment is similar to the locked room environment, but there are
 multiple registered environment configurations of increasing size,
@@ -361,38 +345,48 @@ hidden in another room, and the agent has to explore the environment to find
 it. The mission string does not give the agent any clues as to where the
 key is placed. This environment can be solved without relying on language.
 
-### Unlock environment
+<p align="center">
+    <img src="figures/KeyCorridorS3R1.png" width=250 alt="Figure of the Key Corridor for config S3R1">
+    <img src="figures/KeyCorridorS3R2.png" width=250 alt="Figure of the Key Corridor for config S3R2">
+    <img src="figures/KeyCorridorS3R3.png" width=250 alt="Figure of the Key Corridor for config S3R3">
+    <img src="figures/KeyCorridorS4R3.png" width=250 alt="Figure of the Key Corridor for config S4R3">
+    <img src="figures/KeyCorridorS5R3.png" width=250 alt="Figure of the Key Corridor for config S5R3">
+    <img src="figures/KeyCorridorS6R3.png" width=250 alt="Figure of the Key Corridor for config S6R3">
+</p>
 
 Registered configurations:
-- `MiniGrid-Unlock-v0`
+- `MiniGrid-KeyCorridorS3R1-v0`
+- `MiniGrid-KeyCorridorS3R2-v0`
+- `MiniGrid-KeyCorridorS3R3-v0`
+- `MiniGrid-KeyCorridorS4R3-v0`
+- `MiniGrid-KeyCorridorS5R3-v0`
+- `MiniGrid-KeyCorridorS6R3-v0`
 
-<p align="center">
-    <img src="figures/Unlock.png" width="200">
-</p>
+### Unlock environment
 
 The agent has to open a locked door. This environment can be solved without
 relying on language.
 
-### Unlock pickup environment
+<p align="center">
+    <img src="figures/Unlock.png" width=200 alt="Figure of the unlock environment">
+</p>
 
 Registered configurations:
-- `MiniGrid-UnlockPickup-v0`
+- `MiniGrid-Unlock-v0`
 
-<p align="center">
-    <img src="figures/UnlockPickup.png" width="250">
-</p>
+### Unlock pickup environment
 
 The agent has to pick up a box which is placed in another room, behind a
 locked door. This environment can be solved without relying on language.
 
-### Blocked unlock pickup environment
+<p align="center">
+    <img src="figures/UnlockPickup.png" width=250 alt="Figure of the unlock pickup environment">
+</p>
 
 Registered configurations:
-- `MiniGrid-BlockedUnlockPickup-v0`
+- `MiniGrid-UnlockPickup-v0`
 
-<p align="center">
-    <img src="figures/BlockedUnlockPickup.png" width="250">
-</p>
+### Blocked unlock pickup environment
 
 The agent has to pick up a box which is placed in another room, behind a
 locked door. The door is also blocked by a ball which the agent has to move
@@ -400,18 +394,18 @@ before it can unlock the door. Hence, the agent has to learn to move the ball,
 pick up the key, open the door and pick up the object in the other room.
 This environment can be solved without relying on language.
 
-## Obstructed maze environment
+<p align="center">
+    <img src="figures/BlockedUnlockPickup.png" width=250 alt="Figure of the blocked-unlock-pickup environment">
+</p>
 
 Registered configurations:
-- `MiniGrid-ObstructedMaze-1Dl-v0`
-- `MiniGrid-ObstructedMaze-1Dlh-v0`
-- `MiniGrid-ObstructedMaze-1Dlhb-v0`
-- `MiniGrid-ObstructedMaze-2Dl-v0`
-- `MiniGrid-ObstructedMaze-2Dlh-v0`
-- `MiniGrid-ObstructedMaze-2Dlhb-v0`
-- `MiniGrid-ObstructedMaze-1Q-v0`
-- `MiniGrid-ObstructedMaze-2Q-v0`
-- `MiniGrid-ObstructedMaze-Full-v0`
+- `MiniGrid-BlockedUnlockPickup-v0`
+
+## Obstructed maze environment
+
+The agent has to pick up a box which is placed in a corner of a 3x3 maze.
+The doors are locked, the keys are hidden in boxes and doors are obstructed
+by balls. This environment can be solved without relying on language.
 
 <p align="center">
   <img src="figures/ObstructedMaze-1Dl.png" width="250">
@@ -425,57 +419,51 @@ Registered configurations:
   <img src="figures/ObstructedMaze-4Q.png" width="250">
 </p>
 
-The agent has to pick up a box which is placed in a corner of a 3x3 maze.
-The doors are locked, the keys are hidden in boxes and doors are obstructed
-by balls. This environment can be solved without relying on language.
+Registered configurations:
+- `MiniGrid-ObstructedMaze-1Dl-v0`
+- `MiniGrid-ObstructedMaze-1Dlh-v0`
+- `MiniGrid-ObstructedMaze-1Dlhb-v0`
+- `MiniGrid-ObstructedMaze-2Dl-v0`
+- `MiniGrid-ObstructedMaze-2Dlh-v0`
+- `MiniGrid-ObstructedMaze-2Dlhb-v0`
+- `MiniGrid-ObstructedMaze-1Q-v0`
+- `MiniGrid-ObstructedMaze-2Q-v0`
+- `MiniGrid-ObstructedMaze-Full-v0`
 
 ## Distributional shift environment
 
-Registered configurations:
-- `MiniGrid-DistShift1-v0`
-- `MiniGrid-DistShift2-v0`
-
 This environment is based on one of the DeepMind [AI safety gridworlds](https://github.com/deepmind/ai-safety-gridworlds).
 The agent starts in the top-left corner and must reach the goal which is in the top-right corner, but has to avoid stepping
 into lava on its way. The aim of this environment is to test an agent's ability to generalize. There are two slightly
 different variants of the environment, so that the agent can be trained on one variant and tested on the other.
 
 <p align="center">
-  <img src="figures/DistShift1.png" width="200">
-  <img src="figures/DistShift2.png" width="200">
+  <img src="figures/DistShift1.png" width=200 alt="Figure of the DistShift1 environment">
+  <img src="figures/DistShift2.png" width=200 alt="Figure of the DistShift2 environment">
 </p>
 
-## Lava gap environment
-
 Registered configurations:
-- `MiniGrid-LavaGapS5-v0`
-- `MiniGrid-LavaGapS6-v0`
-- `MiniGrid-LavaGapS7-v0`
+- `MiniGrid-DistShift1-v0`
+- `MiniGrid-DistShift2-v0`
 
-<p align="center">
-  <img src="figures/LavaGapS6.png" width="200">
-</p>
+## Lava gap environment
 
 The agent has to reach the green goal square at the opposite corner of the room,
 and must pass through a narrow gap in a vertical strip of deadly lava. Touching
 the lava terminates the episode with a zero reward. This environment is useful
 for studying safety and safe exploration.
 
-## Lava crossing environment
-
 Registered configurations:
-- `MiniGrid-LavaCrossingS9N1-v0`
-- `MiniGrid-LavaCrossingS9N2-v0`
-- `MiniGrid-LavaCrossingS9N3-v0`
-- `MiniGrid-LavaCrossingS11N5-v0`
+- `MiniGrid-LavaGapS5-v0`
+- `MiniGrid-LavaGapS6-v0`
+- `MiniGrid-LavaGapS7-v0`
 
 <p align="center">
-  <img src="figures/LavaCrossingS9N1.png" width="200">
-  <img src="figures/LavaCrossingS9N2.png" width="200">
-  <img src="figures/LavaCrossingS9N3.png" width="200">
-  <img src="figures/LavaCrossingS11N5.png" width="250">
+  <img src="figures/LavaGapS6.png" width=200 alt="Figure of the LavaGap environment">
 </p>
 
+## Lava crossing environment
+
 The agent has to reach the green goal square on the other corner of the room
 while avoiding rivers of deadly lava which terminate the episode in failure.
 Each lava stream runs across the room either horizontally or vertically, and
@@ -483,27 +471,49 @@ has a single crossing point which can be safely used;  Luckily, a path to the
 goal is guaranteed to exist. This environment is useful for studying safety and
 safe exploration.
 
+<p align="center">
+  <img src="figures/LavaCrossingS9N1.png" width=200 alt="Figure of the LavaCrossingS9N1 environment">
+  <img src="figures/LavaCrossingS9N2.png" width=200 alt="Figure of the LavaCrossingS9N2 environment">
+  <img src="figures/LavaCrossingS9N3.png" width=200 alt="Figure of the LavaCrossingS9N3 environment">
+  <img src="figures/LavaCrossingS11N5.png" width=250 alt="Figure of the LavaCrossingS11N5 environment">
+</p>
+
+Registered configurations:
+- `MiniGrid-LavaCrossingS9N1-v0`
+- `MiniGrid-LavaCrossingS9N2-v0`
+- `MiniGrid-LavaCrossingS9N3-v0`
+- `MiniGrid-LavaCrossingS11N5-v0`
+
 ## Simple crossing environment
 
+Similar to the `LavaCrossing` environment, the agent has to reach the green
+goal square on the other corner of the room, however lava is replaced by
+walls. This MDP is therefore much easier and may be useful for quickly
+testing your algorithms.
+
+<p align="center">
+  <img src="figures/SimpleCrossingS9N1.png" width=200 alt="Figure of the SimpleCrossingS9N1 environment">
+  <img src="figures/SimpleCrossingS9N2.png" width=200 alt="Figure of the SimpleCrossingS9N2 environment">
+  <img src="figures/SimpleCrossingS9N3.png" width=200 alt="Figure of the SimpleCrossingS9N3 environment">
+  <img src="figures/SimpleCrossingS11N5.png" width=250 alt="Figure of the SimpleCrossingS11N5 environment">
+</p>
+
 Registered configurations:
 - `MiniGrid-SimpleCrossingS9N1-v0`
 - `MiniGrid-SimpleCrossingS9N2-v0`
 - `MiniGrid-SimpleCrossingS9N3-v0`
 - `MiniGrid-SimpleCrossingS11N5-v0`
 
-<p align="center">
-  <img src="figures/SimpleCrossingS9N1.png" width="200">
-  <img src="figures/SimpleCrossingS9N2.png" width="200">
-  <img src="figures/SimpleCrossingS9N3.png" width="200">
-  <img src="figures/SimpleCrossingS11N5.png" width="250">
-</p>
+### Dynamic obstacles environment
 
-Similar to the `LavaCrossing` environment, the agent has to reach the green
-goal square on the other corner of the room, however lava is replaced by
-walls. This MDP is therefore much easier and and maybe useful for quickly
-testing your algorithms.
+This environment is an empty room with moving obstacles. 
+The goal of the agent is to reach the green goal square without colliding with any obstacle. 
+A large penalty is subtracted if the agent collides with an obstacle and the episode finishes. 
+This environment is useful to test Dynamic Obstacle Avoidance for mobile robots with Reinforcement Learning in Partial Observability.
 
-### Dynamic obstacles environment
+<p align="center">
+    <img src="figures/dynamic_obstacles.gif" alt="GIF of the Dynamic Obstacles environment">
+</p>
 
 Registered configurations:
 - `MiniGrid-Dynamic-Obstacles-5x5-v0`
@@ -512,9 +522,3 @@ Registered configurations:
 - `MiniGrid-Dynamic-Obstacles-Random-6x6-v0`
 - `MiniGrid-Dynamic-Obstacles-8x8-v0`
 - `MiniGrid-Dynamic-Obstacles-16x16-v0`
-
-<p align="center">
-<img src="/figures/dynamic_obstacles.gif">
-</p>
-
-This environment is an empty room with moving obstacles. The goal of the agent is to reach the green goal square without colliding with any obstacle. A large penalty is subtracted if the agent collides with an obstacle and the episode finishes. This environment is useful to test Dynamic Obstacle Avoidance for mobile robots with Reinforcement Learning in Partial Observability.
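
Note on the README's tile encoding: each tile is described there as a `(OBJECT_IDX, COLOR_IDX, STATE)` tuple, with the door states 0=open, 1=closed and 2=locked. The following is a minimal sketch of decoding such a tuple. The index values in `OBJECT_TO_IDX` and `COLOR_TO_IDX` below are assumptions chosen for illustration (the authoritative mappings live in `gym_minigrid/minigrid.py`), and `describe_tile` is a hypothetical helper, not part of the package.

```python
# Illustrative subsets only: these index values are assumptions for the
# example. The real mappings are defined in gym_minigrid/minigrid.py.
OBJECT_TO_IDX = {"empty": 1, "wall": 2, "door": 4, "key": 5, "goal": 8}
COLOR_TO_IDX = {"red": 0, "green": 1, "blue": 2, "yellow": 4, "grey": 5}

# Door states as stated in the README.
DOOR_STATES = {0: "open", 1: "closed", 2: "locked"}

# Reverse lookups so an index can be turned back into a name.
IDX_TO_OBJECT = {idx: name for name, idx in OBJECT_TO_IDX.items()}
IDX_TO_COLOR = {idx: name for name, idx in COLOR_TO_IDX.items()}


def describe_tile(tile):
    """Return a readable description of one (OBJECT_IDX, COLOR_IDX, STATE) tuple."""
    obj_idx, color_idx, state = tile
    obj = IDX_TO_OBJECT.get(obj_idx, f"object#{obj_idx}")
    color = IDX_TO_COLOR.get(color_idx, f"color#{color_idx}")
    if obj == "door":
        # STATE is only interpreted for doors in this sketch.
        return f"{DOOR_STATES.get(state, 'unknown')} {color} door"
    return f"{color} {obj}"


print(describe_tile((4, 4, 2)))
print(describe_tile((8, 1, 0)))
```

Unknown indices fall back to a placeholder string rather than raising, which keeps the helper usable while the mappings are incomplete.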

+ 505 - 3
gym_minigrid/__init__.py
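
Review note: the registrations in this file pair a grid size encoded in the ID (for example `-6x6-`) with a `size` keyword argument, so the two can drift apart when entries are copied. Below is a minimal sketch of a consistency check. The sample entries use made-up IDs, one of them deliberately mismatched, and `size_mismatches` is a hypothetical helper that is not part of gym_minigrid.

```python
import re

# Matches a "-<W>x<H>-" fragment such as "-6x6-" inside an environment ID.
SIZE_PATTERN = re.compile(r"-(\d+)x(\d+)-")

# Hypothetical (id, kwargs) pairs; the second entry is intentionally wrong.
SAMPLE_REGISTRATIONS = [
    ("MiniGrid-Example-5x5-v0", {"size": 5}),
    ("MiniGrid-Example-6x6-v0", {"size": 5}),
    ("MiniGrid-Example-8x8-v0", {"size": 8}),
    ("MiniGrid-Example-v0", {}),  # no size in the ID, so it is skipped
]


def size_mismatches(registrations):
    """Return (env_id, size_in_id, size_in_kwargs) for every disagreement."""
    mismatches = []
    for env_id, kwargs in registrations:
        match = SIZE_PATTERN.search(env_id)
        if match is None or "size" not in kwargs:
            # Nothing to compare: either the ID or the kwargs carry no size.
            continue
        size_in_id = int(match.group(1))
        if kwargs["size"] != size_in_id:
            mismatches.append((env_id, size_in_id, kwargs["size"]))
    return mismatches


for env_id, expected, actual in size_mismatches(SAMPLE_REGISTRATIONS):
    print(f"{env_id}: ID implies size {expected}, kwargs give {actual}")
```

Entries that rely on a default size (no `size` key in `kwargs`) are skipped here, so a check like this would only catch explicit disagreements.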

@@ -1,3 +1,505 @@
-# Import the envs module so that envs register themselves
-# Import wrappers so it's accessible when installing with pip
-from gym_minigrid import envs, wrappers
+from gym.envs.registration import register
+
+from gym_minigrid.minigrid import Wall
+
+
+def register_minigrid_envs():
+    # BlockedUnlockPickup
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-BlockedUnlockPickup-v0",
+        entry_point="gym_minigrid.envs:BlockedUnlockPickupEnv",
+    )
+
+    # LavaCrossing
+    # ----------------------------------------
+    register(
+        id="MiniGrid-LavaCrossingS9N1-v0",
+        entry_point="gym_minigrid.envs:CrossingEnv",
+        kwargs={"size": 9, "num_crossings": 1},
+    )
+
+    register(
+        id="MiniGrid-LavaCrossingS9N2-v0",
+        entry_point="gym_minigrid.envs:CrossingEnv",
+        kwargs={"size": 9, "num_crossings": 2},
+    )
+
+    register(
+        id="MiniGrid-LavaCrossingS9N3-v0",
+        entry_point="gym_minigrid.envs:CrossingEnv",
+        kwargs={"size": 9, "num_crossings": 3},
+    )
+
+    register(
+        id="MiniGrid-LavaCrossingS11N5-v0",
+        entry_point="gym_minigrid.envs:CrossingEnv",
+        kwargs={"size": 11, "num_crossings": 5},
+    )
+
+    # SimpleCrossing
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-SimpleCrossingS9N1-v0",
+        entry_point="gym_minigrid.envs:CrossingEnv",
+        kwargs={"size": 9, "num_crossings": 1, "obstacle_type": Wall},
+    )
+
+    register(
+        id="MiniGrid-SimpleCrossingS9N2-v0",
+        entry_point="gym_minigrid.envs:CrossingEnv",
+        kwargs={"size": 9, "num_crossings": 2, "obstacle_type": Wall},
+    )
+
+    register(
+        id="MiniGrid-SimpleCrossingS9N3-v0",
+        entry_point="gym_minigrid.envs:CrossingEnv",
+        kwargs={"size": 9, "num_crossings": 3, "obstacle_type": Wall},
+    )
+
+    register(
+        id="MiniGrid-SimpleCrossingS11N5-v0",
+        entry_point="gym_minigrid.envs:CrossingEnv",
+        kwargs={"size": 11, "num_crossings": 5, "obstacle_type": Wall},
+    )
+
+    # DistShift
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-DistShift1-v0",
+        entry_point="gym_minigrid.envs:DistShiftEnv",
+        kwargs={"strip2_row": 2},
+    )
+
+    register(
+        id="MiniGrid-DistShift2-v0",
+        entry_point="gym_minigrid.envs:DistShiftEnv",
+        kwargs={"strip2_row": 5},
+    )
+
+    # DoorKey
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-DoorKey-5x5-v0",
+        entry_point="gym_minigrid.envs:DoorKeyEnv",
+        kwargs={"size": 5},
+    )
+
+    register(
+        id="MiniGrid-DoorKey-6x6-v0",
+        entry_point="gym_minigrid.envs:DoorKeyEnv",
+        kwargs={"size": 6},
+    )
+
+    register(
+        id="MiniGrid-DoorKey-8x8-v0",
+        entry_point="gym_minigrid.envs:DoorKeyEnv",
+        kwargs={"size": 8},
+    )
+
+    register(
+        id="MiniGrid-DoorKey-16x16-v0",
+        entry_point="gym_minigrid.envs:DoorKeyEnv",
+        kwargs={"size": 16},
+    )
+
+    # Dynamic-Obstacles
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-Dynamic-Obstacles-5x5-v0",
+        entry_point="gym_minigrid.envs:DynamicObstaclesEnv",
+        kwargs={"size": 5, "n_obstacles": 2},
+    )
+
+    register(
+        id="MiniGrid-Dynamic-Obstacles-Random-5x5-v0",
+        entry_point="gym_minigrid.envs:DynamicObstaclesEnv",
+        kwargs={"size": 5, "agent_start_pos": None, "n_obstacles": 2},
+    )
+
+    register(
+        id="MiniGrid-Dynamic-Obstacles-6x6-v0",
+        entry_point="gym_minigrid.envs:DynamicObstaclesEnv",
+        kwargs={"size": 6, "n_obstacles": 3},
+    )
+
+    register(
+        id="MiniGrid-Dynamic-Obstacles-Random-6x6-v0",
+        entry_point="gym_minigrid.envs:DynamicObstaclesEnv",
+        kwargs={"size": 6, "agent_start_pos": None, "n_obstacles": 3},
+    )
+
+    register(
+        id="MiniGrid-Dynamic-Obstacles-8x8-v0",
+        entry_point="gym_minigrid.envs:DynamicObstaclesEnv",
+    )
+
+    register(
+        id="MiniGrid-Dynamic-Obstacles-16x16-v0",
+        entry_point="gym_minigrid.envs:DynamicObstaclesEnv",
+        kwargs={"size": 16, "n_obstacles": 8},
+    )
+
+    # Empty
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-Empty-5x5-v0",
+        entry_point="gym_minigrid.envs:EmptyEnv",
+        kwargs={"size": 5},
+    )
+
+    register(
+        id="MiniGrid-Empty-Random-5x5-v0",
+        entry_point="gym_minigrid.envs:EmptyEnv",
+        kwargs={"size": 5, "agent_start_pos": None},
+    )
+
+    register(
+        id="MiniGrid-Empty-6x6-v0",
+        entry_point="gym_minigrid.envs:EmptyEnv",
+        kwargs={"size": 6},
+    )
+
+    register(
+        id="MiniGrid-Empty-Random-6x6-v0",
+        entry_point="gym_minigrid.envs:EmptyEnv",
+        kwargs={"size": 6, "agent_start_pos": None},
+    )
+
+    register(
+        id="MiniGrid-Empty-8x8-v0",
+        entry_point="gym_minigrid.envs:EmptyEnv",
+    )
+
+    register(
+        id="MiniGrid-Empty-16x16-v0",
+        entry_point="gym_minigrid.envs:EmptyEnv",
+        kwargs={"size": 16},
+    )
+
+    # Fetch
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-Fetch-5x5-N2-v0",
+        entry_point="gym_minigrid.envs:FetchEnv",
+        kwargs={"size": 5, "numObjs": 2},
+    )
+
+    register(
+        id="MiniGrid-Fetch-6x6-N2-v0",
+        entry_point="gym_minigrid.envs:FetchEnv",
+        kwargs={"size": 6, "numObjs": 2},
+    )
+
+    register(id="MiniGrid-Fetch-8x8-N3-v0", entry_point="gym_minigrid.envs:FetchEnv")
+
+    # FourRooms
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-FourRooms-v0",
+        entry_point="gym_minigrid.envs:FourRoomsEnv",
+    )
+
+    # GoToDoor
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-GoToDoor-5x5-v0",
+        entry_point="gym_minigrid.envs:GoToDoorEnv",
+    )
+
+    register(
+        id="MiniGrid-GoToDoor-6x6-v0",
+        entry_point="gym_minigrid.envs:GoToDoorEnv",
+        kwargs={"size": 6},
+    )
+
+    register(
+        id="MiniGrid-GoToDoor-8x8-v0",
+        entry_point="gym_minigrid.envs:GoToDoorEnv",
+        kwargs={"size": 8},
+    )
+
+    # GoToObject
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-GoToObject-6x6-N2-v0",
+        entry_point="gym_minigrid.envs:GoToObjectEnv",
+    )
+
+    register(
+        id="MiniGrid-GoToObject-8x8-N2-v0",
+        entry_point="gym_minigrid.envs:GoToObjectEnv",
+        kwargs={"size": 8, "numObjs": 2},
+    )
+
+    # KeyCorridor
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-KeyCorridorS3R1-v0",
+        entry_point="gym_minigrid.envs:KeyCorridorEnv",
+        kwargs={"room_size": 3, "num_rows": 1},
+    )
+
+    register(
+        id="MiniGrid-KeyCorridorS3R2-v0",
+        entry_point="gym_minigrid.envs:KeyCorridorEnv",
+        kwargs={"room_size": 3, "num_rows": 2},
+    )
+
+    register(
+        id="MiniGrid-KeyCorridorS3R3-v0",
+        entry_point="gym_minigrid.envs:KeyCorridorEnv",
+        kwargs={"room_size": 3, "num_rows": 3},
+    )
+
+    register(
+        id="MiniGrid-KeyCorridorS4R3-v0",
+        entry_point="gym_minigrid.envs:KeyCorridorEnv",
+        kwargs={"room_size": 4, "num_rows": 3},
+    )
+
+    register(
+        id="MiniGrid-KeyCorridorS5R3-v0",
+        entry_point="gym_minigrid.envs:KeyCorridorEnv",
+        kwargs={"room_size": 5, "num_rows": 3},
+    )
+
+    register(
+        id="MiniGrid-KeyCorridorS6R3-v0",
+        entry_point="gym_minigrid.envs:KeyCorridorEnv",
+        kwargs={"room_size": 6, "num_rows": 3},
+    )
+
+    # LavaGap
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-LavaGapS5-v0",
+        entry_point="gym_minigrid.envs:LavaGapEnv",
+        kwargs={"size": 5},
+    )
+
+    register(
+        id="MiniGrid-LavaGapS6-v0",
+        entry_point="gym_minigrid.envs:LavaGapEnv",
+        kwargs={"size": 6},
+    )
+
+    register(
+        id="MiniGrid-LavaGapS7-v0",
+        entry_point="gym_minigrid.envs:LavaGapEnv",
+        kwargs={"size": 7},
+    )
+
+    # LockedRoom
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-LockedRoom-v0",
+        entry_point="gym_minigrid.envs:LockedRoomEnv",
+    )
+
+    # Memory
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-MemoryS17Random-v0",
+        entry_point="gym_minigrid.envs:MemoryEnv",
+        kwargs={"size": 17, "random_length": True},
+    )
+
+    register(
+        id="MiniGrid-MemoryS13Random-v0",
+        entry_point="gym_minigrid.envs:MemoryEnv",
+        kwargs={"size": 13, "random_length": True},
+    )
+
+    register(
+        id="MiniGrid-MemoryS13-v0",
+        entry_point="gym_minigrid.envs:MemoryEnv",
+        kwargs={"size": 13},
+    )
+
+    register(
+        id="MiniGrid-MemoryS11-v0",
+        entry_point="gym_minigrid.envs:MemoryEnv",
+        kwargs={"size": 11},
+    )
+
+    register(
+        id="MiniGrid-MemoryS9-v0",
+        entry_point="gym_minigrid.envs:MemoryEnv",
+        kwargs={"size": 9},
+    )
+
+    register(
+        id="MiniGrid-MemoryS7-v0",
+        entry_point="gym_minigrid.envs:MemoryEnv",
+        kwargs={"size": 7},
+    )
+
+    # MultiRoom
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-MultiRoom-N2-S4-v0",
+        entry_point="gym_minigrid.envs:MultiRoomEnv",
+        kwargs={"minNumRooms": 2, "maxNumRooms": 2, "maxRoomSize": 4},
+    )
+
+    register(
+        id="MiniGrid-MultiRoom-N4-S5-v0",
+        entry_point="gym_minigrid.envs:MultiRoomEnv",
+        kwargs={"minNumRooms": 4, "maxNumRooms": 4, "maxRoomSize": 5},
+    )
+
+    register(
+        id="MiniGrid-MultiRoom-N6-v0",
+        entry_point="gym_minigrid.envs:MultiRoomEnv",
+        kwargs={"minNumRooms": 6, "maxNumRooms": 6},
+    )
+
+    # ObstructedMaze
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-ObstructedMaze-1Dl-v0",
+        entry_point="gym_minigrid.envs:ObstructedMaze_1Dlhb",
+        kwargs={"key_in_box": False, "blocked": False},
+    )
+
+    register(
+        id="MiniGrid-ObstructedMaze-1Dlh-v0",
+        entry_point="gym_minigrid.envs:ObstructedMaze_1Dlhb",
+        kwargs={"key_in_box": True, "blocked": False},
+    )
+
+    register(
+        id="MiniGrid-ObstructedMaze-1Dlhb-v0",
+        entry_point="gym_minigrid.envs:ObstructedMaze_1Dlhb",
+    )
+
+    register(
+        id="MiniGrid-ObstructedMaze-2Dl-v0",
+        entry_point="gym_minigrid.envs:ObstructedMaze_Full",
+        kwargs={
+            "agent_room": (2, 1),
+            "key_in_box": False,
+            "blocked": False,
+            "num_quarters": 1,
+            "num_rooms_visited": 4,
+        },
+    )
+
+    register(
+        id="MiniGrid-ObstructedMaze-2Dlh-v0",
+        entry_point="gym_minigrid.envs:ObstructedMaze_Full",
+        kwargs={
+            "agent_room": (2, 1),
+            "key_in_box": True,
+            "blocked": False,
+            "num_quarters": 1,
+            "num_rooms_visited": 4,
+        },
+    )
+
+    register(
+        id="MiniGrid-ObstructedMaze-2Dlhb-v0",
+        entry_point="gym_minigrid.envs:ObstructedMaze_Full",
+        kwargs={
+            "agent_room": (2, 1),
+            "key_in_box": True,
+            "blocked": True,
+            "num_quarters": 1,
+            "num_rooms_visited": 4,
+        },
+    )
+
+    register(
+        id="MiniGrid-ObstructedMaze-1Q-v0",
+        entry_point="gym_minigrid.envs:ObstructedMaze_Full",
+        kwargs={
+            "agent_room": (1, 1),
+            "key_in_box": True,
+            "blocked": True,
+            "num_quarters": 1,
+            "num_rooms_visited": 5,
+        },
+    )
+
+    register(
+        id="MiniGrid-ObstructedMaze-2Q-v0",
+        entry_point="gym_minigrid.envs:ObstructedMaze_Full",
+        kwargs={
+            "agent_room": (2, 1),
+            "key_in_box": True,
+            "blocked": True,
+            "num_quarters": 2,
+            "num_rooms_visited": 11,
+        },
+    )
+
+    register(
+        id="MiniGrid-ObstructedMaze-Full-v0",
+        entry_point="gym_minigrid.envs:ObstructedMaze_Full",
+    )
+
+    # Playground
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-Playground-v0",
+        entry_point="gym_minigrid.envs:PlaygroundEnv",
+    )
+
+    # PutNear
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-PutNear-6x6-N2-v0",
+        entry_point="gym_minigrid.envs:PutNearEnv",
+    )
+
+    register(
+        id="MiniGrid-PutNear-8x8-N3-v0",
+        entry_point="gym_minigrid.envs:PutNearEnv",
+        kwargs={"size": 8, "numObjs": 3},
+    )
+
+    # RedBlueDoors
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-RedBlueDoors-6x6-v0",
+        entry_point="gym_minigrid.envs:RedBlueDoorEnv",
+        kwargs={"size": 6},
+    )
+
+    register(
+        id="MiniGrid-RedBlueDoors-8x8-v0",
+        entry_point="gym_minigrid.envs:RedBlueDoorEnv",
+    )
+
+    # Unlock
+    # ----------------------------------------
+
+    register(id="MiniGrid-Unlock-v0", entry_point="gym_minigrid.envs:UnlockEnv")
+
+    # UnlockPickup
+    # ----------------------------------------
+
+    register(
+        id="MiniGrid-UnlockPickup-v0",
+        entry_point="gym_minigrid.envs:UnlockPickupEnv",
+    )
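
Review note: with every environment registered in one table, an id such as `...-6x6-...` and its `size` kwarg can drift apart unnoticed. The following is a minimal sketch of a sanity check, assuming ids embed the grid size as `-NxN-` and that entries are available as plain dicts. `SAMPLE_SPECS` and `find_size_mismatches` are hypothetical names used only for illustration; they are not part of gym_minigrid or gym.

```python
import re

# Hypothetical sample entries, shaped like the register(...) calls in this diff.
SAMPLE_SPECS = [
    {"id": "MiniGrid-Empty-5x5-v0", "kwargs": {"size": 5}},
    {"id": "MiniGrid-Empty-8x8-v0", "kwargs": {}},  # relies on the class default
    {"id": "MiniGrid-Example-6x6-v0", "kwargs": {"size": 5}},  # deliberately wrong
]

# Assumption: the grid size appears in the id as "-<width>x<height>-".
_SIZE_IN_ID = re.compile(r"-(\d+)x(\d+)-")


def find_size_mismatches(specs):
    """Return ids whose NxN part disagrees with an explicit "size" kwarg."""
    mismatches = []
    for spec in specs:
        match = _SIZE_IN_ID.search(spec["id"])
        size = spec["kwargs"].get("size")
        if match is None or size is None:
            continue  # nothing to compare (no NxN in the id, or default size)
        width, height = int(match.group(1)), int(match.group(2))
        if not (width == height == size):
            mismatches.append(spec["id"])
    return mismatches


print(find_size_mismatches(SAMPLE_SPECS))  # ['MiniGrid-Example-6x6-v0']
```

Entries that omit `size` are skipped on purpose, since the class default cannot be checked from the table alone.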

+ 2 - 1
benchmark.py

@@ -18,7 +18,7 @@ parser.add_argument("--num_resets", default=200)
 parser.add_argument("--num_frames", default=5000)
 args = parser.parse_args()
 
-env = gym.make(args.env_name, render_mode="rgb_array", new_step_api=True)
+env = gym.make(args.env_name, new_step_api=True)
 
 # Benchmark env.reset
 t0 = time.time()
@@ -41,6 +41,7 @@ env = gym.make(args.env_name, new_step_api=True)
 env = RGBImgPartialObsWrapper(env)
 env = ImgObsWrapper(env)
 
+env.reset()
 # Benchmark rendering
 t0 = time.time()
 for i in range(args.num_frames):

+ 6 - 8
gym_minigrid/envs/blockedunlockpickup.py

@@ -1,5 +1,4 @@
-from gym_minigrid.minigrid import Ball
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import COLOR_NAMES, Ball, MissionSpace
 from gym_minigrid.roomgrid import RoomGrid
 
 
@@ -11,7 +10,12 @@ class BlockedUnlockPickupEnv(RoomGrid):
 
     def __init__(self, **kwargs):
         room_size = 6
+        mission_space = MissionSpace(
+            mission_func=lambda color, type: f"pick up the {color} {type}",
+            ordered_placeholders=[COLOR_NAMES, ["box", "key"]],
+        )
         super().__init__(
+            mission_space=mission_space,
             num_rows=1,
             num_cols=2,
             room_size=room_size,
@@ -46,9 +50,3 @@ class BlockedUnlockPickupEnv(RoomGrid):
                 terminated = True
 
         return obs, reward, terminated, truncated, info
-
-
-register(
-    id="MiniGrid-BlockedUnlockPickup-v0",
-    entry_point="gym_minigrid.envs.blockedunlockpickup:BlockedUnlockPickupEnv",
-)
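
Review note: the hunk above pairs a `mission_func` with `ordered_placeholders`, which reads as "one mission string per combination of placeholder values". A minimal sketch of that idea follows. `SketchMissionSpace` is a hypothetical stand-in written for this note, not the `MissionSpace` class from `gym_minigrid.minigrid`, and the colour list is an assumption about what `COLOR_NAMES` contains.

```python
from itertools import product

# Assumed colour list; the real COLOR_NAMES is defined in gym_minigrid.minigrid.
COLOR_NAMES = ["blue", "green", "grey", "purple", "red", "yellow"]


class SketchMissionSpace:
    """Hypothetical stand-in illustrating mission_func + ordered_placeholders."""

    def __init__(self, mission_func, ordered_placeholders=None):
        self.mission_func = mission_func
        self.ordered_placeholders = ordered_placeholders or []

    def all_missions(self):
        # One mission per combination of placeholder values, in argument order.
        # With no placeholders, product() yields a single empty combination.
        return [
            self.mission_func(*combo)
            for combo in product(*self.ordered_placeholders)
        ]

    def contains(self, mission):
        return mission in self.all_missions()


space = SketchMissionSpace(
    mission_func=lambda color, type: f"pick up the {color} {type}",
    ordered_placeholders=[COLOR_NAMES, ["box", "key"]],
)
print(len(space.all_missions()))  # 12
print(space.contains("pick up the red key"))  # True
```

Under these assumptions, a fixed-string mission such as `lambda: "get to the green goal square"` is the degenerate case with exactly one member.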

+ 12 - 63
gym_minigrid/envs/crossing.py

@@ -2,8 +2,7 @@ import itertools as itt
 
 import numpy as np
 
-from gym_minigrid.minigrid import Goal, Grid, Lava, MiniGridEnv, Wall
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import Goal, Grid, Lava, MiniGridEnv, MissionSpace
 
 
 class CrossingEnv(MiniGridEnv):
@@ -14,7 +13,18 @@ class CrossingEnv(MiniGridEnv):
     def __init__(self, size=9, num_crossings=1, obstacle_type=Lava, **kwargs):
         self.num_crossings = num_crossings
         self.obstacle_type = obstacle_type
+
+        if obstacle_type == Lava:
+            mission_space = MissionSpace(
+                mission_func=lambda: "avoid the lava and get to the green goal square"
+            )
+        else:
+            mission_space = MissionSpace(
+                mission_func=lambda: "find the opening and get to the green goal square"
+            )
+
         super().__init__(
+            mission_space=mission_space,
             grid_size=size,
             max_steps=4 * size * size,
             # Set this to True for maximum speed
@@ -85,64 +95,3 @@ class CrossingEnv(MiniGridEnv):
             if self.obstacle_type == Lava
             else "find the opening and get to the green goal square"
         )
-
-
-register(
-    id="MiniGrid-LavaCrossingS9N1-v0",
-    entry_point="gym_minigrid.envs.crossing:CrossingEnv",
-    size=9,
-    num_crossings=1,
-)
-
-register(
-    id="MiniGrid-LavaCrossingS9N2-v0",
-    entry_point="gym_minigrid.envs.crossing:CrossingEnv",
-    size=9,
-    num_crossings=2,
-)
-
-register(
-    id="MiniGrid-LavaCrossingS9N3-v0",
-    entry_point="gym_minigrid.envs.crossing:CrossingEnv",
-    size=9,
-    num_crossings=3,
-)
-
-register(
-    id="MiniGrid-LavaCrossingS11N5-v0",
-    entry_point="gym_minigrid.envs.crossing:CrossingEnv",
-    size=11,
-    num_crossings=5,
-)
-
-register(
-    id="MiniGrid-SimpleCrossingS9N1-v0",
-    entry_point="gym_minigrid.envs.crossing:CrossingEnv",
-    size=9,
-    num_crossings=1,
-    obstacle_type=Wall,
-)
-
-register(
-    id="MiniGrid-SimpleCrossingS9N2-v0",
-    entry_point="gym_minigrid.envs.crossing:CrossingEnv",
-    size=9,
-    num_crossings=2,
-    obstacle_type=Wall,
-)
-
-register(
-    id="MiniGrid-SimpleCrossingS9N3-v0",
-    entry_point="gym_minigrid.envs.crossing:CrossingEnv",
-    size=9,
-    num_crossings=3,
-    obstacle_type=Wall,
-)
-
-register(
-    id="MiniGrid-SimpleCrossingS11N5-v0",
-    entry_point="gym_minigrid.envs.crossing:CrossingEnv",
-    size=11,
-    num_crossings=5,
-    obstacle_type=Wall,
-)

+ 6 - 15
gym_minigrid/envs/distshift.py

@@ -1,5 +1,4 @@
-from gym_minigrid.minigrid import Goal, Grid, Lava, MiniGridEnv
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import Goal, Grid, Lava, MiniGridEnv, MissionSpace
 
 
 class DistShiftEnv(MiniGridEnv):
@@ -21,7 +20,12 @@ class DistShiftEnv(MiniGridEnv):
         self.goal_pos = (width - 2, 1)
         self.strip2_row = strip2_row
 
+        mission_space = MissionSpace(
+            mission_func=lambda: "get to the green goal square"
+        )
+
         super().__init__(
+            mission_space=mission_space,
             width=width,
             height=height,
             max_steps=4 * width * height,
@@ -53,16 +57,3 @@ class DistShiftEnv(MiniGridEnv):
             self.place_agent()
 
         self.mission = "get to the green goal square"
-
-
-register(
-    id="MiniGrid-DistShift1-v0",
-    entry_point="gym_minigrid.envs.distshift:DistShiftEnv",
-    strip2_row=2,
-)
-
-register(
-    id="MiniGrid-DistShift2-v0",
-    entry_point="gym_minigrid.envs.distshift:DistShiftEnv",
-    strip2_row=5,
-)

+ 5 - 28
gym_minigrid/envs/doorkey.py

@@ -1,5 +1,4 @@
-from gym_minigrid.minigrid import Door, Goal, Grid, Key, MiniGridEnv
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import Door, Goal, Grid, Key, MiniGridEnv, MissionSpace
 
 
 class DoorKeyEnv(MiniGridEnv):
@@ -10,7 +9,10 @@ class DoorKeyEnv(MiniGridEnv):
     def __init__(self, size=8, **kwargs):
         if "max_steps" not in kwargs:
             kwargs["max_steps"] = 10 * size * size
-        super().__init__(grid_size=size, **kwargs)
+        mission_space = MissionSpace(
+            mission_func=lambda: "use the key to open the door and then get to the goal"
+        )
+        super().__init__(mission_space=mission_space, grid_size=size, **kwargs)
 
     def _gen_grid(self, width, height):
         # Create an empty grid
@@ -38,28 +40,3 @@ class DoorKeyEnv(MiniGridEnv):
         self.place_obj(obj=Key("yellow"), top=(0, 0), size=(splitIdx, height))
 
         self.mission = "use the key to open the door and then get to the goal"
-
-
-register(
-    id="MiniGrid-DoorKey-5x5-v0",
-    entry_point="gym_minigrid.envs.doorkey:DoorKeyEnv",
-    size=5,
-)
-
-register(
-    id="MiniGrid-DoorKey-6x6-v0",
-    entry_point="gym_minigrid.envs.doorkey:DoorKeyEnv",
-    size=6,
-)
-
-register(
-    id="MiniGrid-DoorKey-8x8-v0",
-    entry_point="gym_minigrid.envs.doorkey:DoorKeyEnv",
-    size=8,
-)
-
-register(
-    id="MiniGrid-DoorKey-16x16-v0",
-    entry_point="gym_minigrid.envs.doorkey:DoorKeyEnv",
-    size=16,
-)

+ 7 - 45
gym_minigrid/envs/dynamicobstacles.py

@@ -2,8 +2,7 @@ from operator import add
 
 from gym.spaces import Discrete
 
-from gym_minigrid.minigrid import Ball, Goal, Grid, MiniGridEnv
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import Ball, Goal, Grid, MiniGridEnv, MissionSpace
 
 
 class DynamicObstaclesEnv(MiniGridEnv):
@@ -22,7 +21,13 @@ class DynamicObstaclesEnv(MiniGridEnv):
             self.n_obstacles = int(n_obstacles)
         else:
             self.n_obstacles = int(size / 2)
+
+        mission_space = MissionSpace(
+            mission_func=lambda: "get to the green goal square"
+        )
+
         super().__init__(
+            mission_space=mission_space,
             grid_size=size,
             max_steps=4 * size * size,
             # Set this to True for maximum speed
@@ -90,46 +95,3 @@ class DynamicObstaclesEnv(MiniGridEnv):
             return obs, reward, terminated, truncated, info
 
         return obs, reward, terminated, truncated, info
-
-
-register(
-    id="MiniGrid-Dynamic-Obstacles-5x5-v0",
-    entry_point="gym_minigrid.envs.dynamicobstacles:DynamicObstaclesEnv",
-    size=5,
-    n_obstacles=2,
-)
-
-register(
-    id="MiniGrid-Dynamic-Obstacles-Random-5x5-v0",
-    entry_point="gym_minigrid.envs.dynamicobstacles:DynamicObstaclesEnv",
-    size=5,
-    agent_start_pos=None,
-    n_obstacles=2,
-)
-
-register(
-    id="MiniGrid-Dynamic-Obstacles-6x6-v0",
-    entry_point="gym_minigrid.envs.dynamicobstacles:DynamicObstaclesEnv",
-    size=6,
-    n_obstacles=3,
-)
-
-register(
-    id="MiniGrid-Dynamic-Obstacles-Random-6x6-v0",
-    entry_point="gym_minigrid.envs.dynamicobstacles:DynamicObstaclesEnv",
-    size=6,
-    agent_start_pos=None,
-    n_obstacles=3,
-)
-
-register(
-    id="MiniGrid-Dynamic-Obstacles-8x8-v0",
-    entry_point="gym_minigrid.envs.dynamicobstacles:DynamicObstaclesEnv",
-)
-
-register(
-    id="MiniGrid-Dynamic-Obstacles-16x16-v0",
-    entry_point="gym_minigrid.envs.dynamicobstacles:DynamicObstaclesEnv",
-    size=16,
-    n_obstacles=8,
-)

+ 6 - 38
gym_minigrid/envs/empty.py

@@ -1,5 +1,4 @@
-from gym_minigrid.minigrid import Goal, Grid, MiniGridEnv
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import Goal, Grid, MiniGridEnv, MissionSpace
 
 
 class EmptyEnv(MiniGridEnv):
@@ -11,7 +10,12 @@ class EmptyEnv(MiniGridEnv):
         self.agent_start_pos = agent_start_pos
         self.agent_start_dir = agent_start_dir
 
+        mission_space = MissionSpace(
+            mission_func=lambda: "get to the green goal square"
+        )
+
         super().__init__(
+            mission_space=mission_space,
             grid_size=size,
             max_steps=4 * size * size,
             # Set this to True for maximum speed
@@ -37,39 +41,3 @@ class EmptyEnv(MiniGridEnv):
             self.place_agent()
 
         self.mission = "get to the green goal square"
-
-
-register(
-    id="MiniGrid-Empty-5x5-v0", entry_point="gym_minigrid.envs.empty:EmptyEnv", size=5
-)
-
-register(
-    id="MiniGrid-Empty-Random-5x5-v0",
-    entry_point="gym_minigrid.envs.empty:EmptyEnv",
-    size=5,
-    agent_start_pos=None,
-)
-
-register(
-    id="MiniGrid-Empty-6x6-v0",
-    entry_point="gym_minigrid.envs.empty:EmptyEnv",
-    size=6,
-)
-
-register(
-    id="MiniGrid-Empty-Random-6x6-v0",
-    entry_point="gym_minigrid.envs.empty:EmptyEnv",
-    size=6,
-    agent_start_pos=None,
-)
-
-register(
-    id="MiniGrid-Empty-8x8-v0",
-    entry_point="gym_minigrid.envs.empty:EmptyEnv",
-)
-
-register(
-    id="MiniGrid-Empty-16x16-v0",
-    entry_point="gym_minigrid.envs.empty:EmptyEnv",
-    size=16,
-)

+ 26 - 24
gym_minigrid/envs/fetch.py

@@ -1,5 +1,11 @@
-from gym_minigrid.minigrid import COLOR_NAMES, Ball, Grid, Key, MiniGridEnv
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import (
+    COLOR_NAMES,
+    Ball,
+    Grid,
+    Key,
+    MiniGridEnv,
+    MissionSpace,
+)
 
 
 class FetchEnv(MiniGridEnv):
@@ -10,9 +16,24 @@ class FetchEnv(MiniGridEnv):
 
     def __init__(self, size=8, numObjs=3, **kwargs):
         self.numObjs = numObjs
-
+        self.obj_types = ["key", "ball"]
+
+        MISSION_SYNTAX = [
+            "get a",
+            "go get a",
+            "fetch a",
+            "go fetch a",
+            "you must fetch a",
+        ]
+        self.size = size
+        mission_space = MissionSpace(
+            mission_func=lambda syntax, color, type: f"{syntax} {color} {type}",
+            ordered_placeholders=[MISSION_SYNTAX, COLOR_NAMES, self.obj_types],
+        )
         super().__init__(
-            grid_size=size,
+            mission_space=mission_space,
+            width=size,
+            height=size,
             max_steps=5 * size**2,
             # Set this to True for maximum speed
             see_through_walls=True,
@@ -28,13 +49,11 @@ class FetchEnv(MiniGridEnv):
         self.grid.vert_wall(0, 0)
         self.grid.vert_wall(width - 1, 0)
 
-        types = ["key", "ball"]
-
         objs = []
 
         # For each object to be generated
         while len(objs) < self.numObjs:
-            objType = self._rand_elem(types)
+            objType = self._rand_elem(self.obj_types)
             objColor = self._rand_elem(COLOR_NAMES)
 
             if objType == "key":
@@ -90,20 +109,3 @@ class FetchEnv(MiniGridEnv):
                 terminated = True
 
         return obs, reward, terminated, truncated, info
-
-
-register(
-    id="MiniGrid-Fetch-5x5-N2-v0",
-    entry_point="gym_minigrid.envs.fetch:FetchEnv",
-    size=5,
-    numObjs=2,
-)
-
-register(
-    id="MiniGrid-Fetch-6x6-N2-v0",
-    entry_point="gym_minigrid.envs.fetch:FetchEnv",
-    size=6,
-    numObjs=2,
-)
-
-register(id="MiniGrid-Fetch-8x8-N3-v0", entry_point="gym_minigrid.envs.fetch:FetchEnv")

+ 12 - 11
gym_minigrid/envs/fourrooms.py

@@ -1,6 +1,4 @@
-#!/usr/bin/env python
-from gym_minigrid.minigrid import Goal, Grid, MiniGridEnv
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import Goal, Grid, MiniGridEnv, MissionSpace
 
 
 class FourRoomsEnv(MiniGridEnv):
@@ -12,7 +10,17 @@ class FourRoomsEnv(MiniGridEnv):
     def __init__(self, agent_pos=None, goal_pos=None, **kwargs):
         self._agent_default_pos = agent_pos
         self._goal_default_pos = goal_pos
-        super().__init__(grid_size=19, max_steps=100, **kwargs)
+
+        self.size = 19
+        mission_space = MissionSpace(mission_func=lambda: "reach the goal")
+
+        super().__init__(
+            mission_space=mission_space,
+            width=self.size,
+            height=self.size,
+            max_steps=100,
+            **kwargs
+        )
 
     def _gen_grid(self, width, height):
         # Create the grid
@@ -64,10 +72,3 @@ class FourRoomsEnv(MiniGridEnv):
             goal.init_pos, goal.cur_pos = self._goal_default_pos
         else:
             self.place_obj(Goal())
-
-        self.mission = "reach the goal"
-
-
-register(
-    id="MiniGrid-FourRooms-v0", entry_point="gym_minigrid.envs.fourrooms:FourRoomsEnv"
-)

+ 10 - 22
gym_minigrid/envs/gotodoor.py

@@ -1,5 +1,4 @@
-from gym_minigrid.minigrid import COLOR_NAMES, Door, Grid, MiniGridEnv
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import COLOR_NAMES, Door, Grid, MiniGridEnv, MissionSpace
 
 
 class GoToDoorEnv(MiniGridEnv):
@@ -10,13 +9,19 @@ class GoToDoorEnv(MiniGridEnv):
 
     def __init__(self, size=5, **kwargs):
         assert size >= 5
-
+        self.size = size
+        mission_space = MissionSpace(
+            mission_func=lambda color: f"go to the {color} door",
+            ordered_placeholders=[COLOR_NAMES],
+        )
         super().__init__(
-            grid_size=size,
+            mission_space=mission_space,
+            width=size,
+            height=size,
             max_steps=5 * size**2,
             # Set this to True for maximum speed
             see_through_walls=True,
-            **kwargs
+            **kwargs,
         )
 
     def _gen_grid(self, width, height):
@@ -78,20 +83,3 @@ class GoToDoorEnv(MiniGridEnv):
             terminated = True
 
         return obs, reward, terminated, truncated, info
-
-
-register(
-    id="MiniGrid-GoToDoor-5x5-v0", entry_point="gym_minigrid.envs.gotodoor:GoToDoorEnv"
-)
-
-register(
-    id="MiniGrid-GoToDoor-6x6-v0",
-    entry_point="gym_minigrid.envs.gotodoor:GoToDoorEnv",
-    size=6,
-)
-
-register(
-    id="MiniGrid-GoToDoor-8x8-v0",
-    entry_point="gym_minigrid.envs.gotodoor:GoToDoorEnv",
-    size=8,
-)

+ 19 - 16
gym_minigrid/envs/gotoobject.py

@@ -1,5 +1,12 @@
-from gym_minigrid.minigrid import COLOR_NAMES, Ball, Box, Grid, Key, MiniGridEnv
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import (
+    COLOR_NAMES,
+    Ball,
+    Box,
+    Grid,
+    Key,
+    MiniGridEnv,
+    MissionSpace,
+)
 
 
 class GoToObjectEnv(MiniGridEnv):
@@ -10,9 +17,18 @@ class GoToObjectEnv(MiniGridEnv):
 
     def __init__(self, size=6, numObjs=2, **kwargs):
         self.numObjs = numObjs
+        self.size = size
+        # Types of objects to be generated
+        self.obj_types = ["key", "ball", "box"]
 
+        mission_space = MissionSpace(
+            mission_func=lambda color, type: f"go to the {color} {type}",
+            ordered_placeholders=[COLOR_NAMES, self.obj_types],
+        )
         super().__init__(
-            grid_size=size,
+            mission_space=mission_space,
+            width=size,
+            height=size,
             max_steps=5 * size**2,
             # Set this to True for maximum speed
             see_through_walls=True,
@@ -86,16 +102,3 @@ class GoToObjectEnv(MiniGridEnv):
             terminated = True
 
         return obs, reward, terminated, truncated, info
-
-
-register(
-    id="MiniGrid-GoToObject-6x6-N2-v0",
-    entry_point="gym_minigrid.envs.gotoobject:GoToObjectEnv",
-)
-
-register(
-    id="MiniGrid-GoToObject-8x8-N2-v0",
-    entry_point="gym_minigrid.envs.gotoobject:GoToObjectEnv",
-    size=8,
-    numObjs=2,
-)

+ 6 - 45
gym_minigrid/envs/keycorridor.py

@@ -1,4 +1,4 @@
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import COLOR_NAMES, MissionSpace
 from gym_minigrid.roomgrid import RoomGrid
 
 
@@ -10,8 +10,12 @@ class KeyCorridorEnv(RoomGrid):
 
     def __init__(self, num_rows=3, obj_type="ball", room_size=6, **kwargs):
         self.obj_type = obj_type
-
+        mission_space = MissionSpace(
+            mission_func=lambda color: f"pick up the {color} {obj_type}",
+            ordered_placeholders=[COLOR_NAMES],
+        )
         super().__init__(
+            mission_space=mission_space,
             room_size=room_size,
             num_rows=num_rows,
             max_steps=30 * room_size**2,
@@ -52,46 +56,3 @@ class KeyCorridorEnv(RoomGrid):
                 terminated = True
 
         return obs, reward, terminated, truncated, info
-
-
-register(
-    id="MiniGrid-KeyCorridorS3R1-v0",
-    entry_point="gym_minigrid.envs.keycorridor:KeyCorridorEnv",
-    room_size=3,
-    num_rows=1,
-)
-
-register(
-    id="MiniGrid-KeyCorridorS3R2-v0",
-    entry_point="gym_minigrid.envs.keycorridor:KeyCorridorEnv",
-    room_size=3,
-    num_rows=2,
-)
-
-register(
-    id="MiniGrid-KeyCorridorS3R3-v0",
-    entry_point="gym_minigrid.envs.keycorridor:KeyCorridorEnv",
-    room_size=3,
-    num_rows=3,
-)
-
-register(
-    id="MiniGrid-KeyCorridorS4R3-v0",
-    entry_point="gym_minigrid.envs.keycorridor:KeyCorridorEnv",
-    room_size=4,
-    num_rows=3,
-)
-
-register(
-    id="MiniGrid-KeyCorridorS5R3-v0",
-    entry_point="gym_minigrid.envs.keycorridor:KeyCorridorEnv",
-    room_size=5,
-    num_rows=3,
-)
-
-register(
-    id="MiniGrid-KeyCorridorS6R3-v0",
-    entry_point="gym_minigrid.envs.keycorridor:KeyCorridorEnv",
-    room_size=6,
-    num_rows=3,
-)

+ 15 - 23
gym_minigrid/envs/lavagap.py

@@ -1,7 +1,6 @@
 import numpy as np
 
-from gym_minigrid.minigrid import Goal, Grid, Lava, MiniGridEnv
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import Goal, Grid, Lava, MiniGridEnv, MissionSpace
 
 
 class LavaGapEnv(MiniGridEnv):
@@ -12,12 +11,24 @@ class LavaGapEnv(MiniGridEnv):
 
     def __init__(self, size, obstacle_type=Lava, **kwargs):
         self.obstacle_type = obstacle_type
+        self.size = size
+
+        if obstacle_type == Lava:
+            mission_space = MissionSpace(
+                mission_func=lambda: "avoid the lava and get to the green goal square"
+            )
+        else:
+            mission_space = MissionSpace(
+                mission_func=lambda: "find the opening and get to the green goal square"
+            )
+
         super().__init__(
-            grid_size=size,
+            mission_space=mission_space,
+            width=size,
+            height=size,
             max_steps=4 * size * size,
             # Set this to True for maximum speed
             see_through_walls=False,
-            **kwargs
         )
 
     def _gen_grid(self, width, height):
@@ -56,22 +67,3 @@ class LavaGapEnv(MiniGridEnv):
             if self.obstacle_type == Lava
             else "find the opening and get to the green goal square"
         )
-
-
-register(
-    id="MiniGrid-LavaGapS5-v0",
-    entry_point="gym_minigrid.envs.lavagap:LavaGapEnv",
-    size=5,
-)
-
-register(
-    id="MiniGrid-LavaGapS6-v0",
-    entry_point="gym_minigrid.envs.lavagap:LavaGapEnv",
-    size=6,
-)
-
-register(
-    id="MiniGrid-LavaGapS7-v0",
-    entry_point="gym_minigrid.envs.lavagap:LavaGapEnv",
-    size=7,
-)

+ 22 - 9
gym_minigrid/envs/lockedroom.py

@@ -1,5 +1,13 @@
-from gym_minigrid.minigrid import COLOR_NAMES, Door, Goal, Grid, Key, MiniGridEnv, Wall
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import (
+    COLOR_NAMES,
+    Door,
+    Goal,
+    Grid,
+    Key,
+    MiniGridEnv,
+    MissionSpace,
+    Wall,
+)
 
 
 class LockedRoom:
@@ -23,7 +31,18 @@ class LockedRoomEnv(MiniGridEnv):
     """
 
     def __init__(self, size=19, **kwargs):
-        super().__init__(grid_size=size, max_steps=10 * size, **kwargs)
+        self.size = size
+        mission_space = MissionSpace(
+            mission_func=lambda lockedroom_color, keyroom_color, door_color: f"get the {lockedroom_color} key from the {keyroom_color} room, unlock the {door_color} door and go to the goal",
+            ordered_placeholders=[COLOR_NAMES] * 3,
+        )
+        super().__init__(
+            mission_space=mission_space,
+            width=size,
+            height=size,
+            max_steps=10 * size,
+            **kwargs,
+        )
 
     def _gen_grid(self, width, height):
         # Create the grid
@@ -97,9 +116,3 @@ class LockedRoomEnv(MiniGridEnv):
             "unlock the %s door and "
             "go to the goal"
         ) % (lockedRoom.color, keyRoom.color, lockedRoom.color)
-
-
-register(
-    id="MiniGrid-LockedRoom-v0",
-    entry_point="gym_minigrid.envs.lockedroom:LockedRoomEnv",
-)
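
Review note: the mission space above declares three independent colour placeholders, while the context line at the end of `_gen_grid` formats the string with `(lockedRoom.color, keyRoom.color, lockedRoom.color)`. Going only by those lines, the first and third values always match, so the declared space is a superset of the missions the environment can emit. A minimal sketch of the count, assuming `COLOR_NAMES` holds the six colours listed below:

```python
from itertools import product

# Assumed colour list; the real COLOR_NAMES is defined in gym_minigrid.minigrid.
COLOR_NAMES = ["blue", "green", "grey", "purple", "red", "yellow"]


def mission(lockedroom_color, keyroom_color, door_color):
    return (
        f"get the {lockedroom_color} key from the {keyroom_color} room, "
        f"unlock the {door_color} door and go to the goal"
    )


# Every string the declared space admits: three independent colour slots.
declared = {mission(*combo) for combo in product(COLOR_NAMES, repeat=3)}

# Strings where the first and third slot share a value, as in the format call.
reachable = {mission(a, b, a) for a, b in product(COLOR_NAMES, repeat=2)}

print(len(declared), len(reachable))  # 216 36
print(reachable <= declared)  # True
```

A superset is harmless for membership checks, though sampling from the declared space could yield missions the grid generator never produces.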

+ 8 - 40
gym_minigrid/envs/memory.py

@@ -1,7 +1,6 @@
 import numpy as np
 
-from gym_minigrid.minigrid import Ball, Grid, Key, MiniGridEnv, Wall
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import Ball, Grid, Key, MiniGridEnv, MissionSpace, Wall
 
 
 class MemoryEnv(MiniGridEnv):
@@ -15,9 +14,15 @@ class MemoryEnv(MiniGridEnv):
     """
 
     def __init__(self, size=8, random_length=False, **kwargs):
+        self.size = size
         self.random_length = random_length
+        mission_space = MissionSpace(
+            mission_func=lambda: "go to the matching object at the end of the hallway"
+        )
         super().__init__(
-            grid_size=size,
+            mission_space=mission_space,
+            width=size,
+            height=size,
             max_steps=5 * size**2,
             # Set this to True for maximum speed
             see_through_walls=False,
@@ -96,40 +101,3 @@ class MemoryEnv(MiniGridEnv):
             terminated = True
 
         return obs, reward, terminated, truncated, info
-
-
-register(
-    id="MiniGrid-MemoryS17Random-v0",
-    entry_point="gym_minigrid.envs.memory:MemoryEnv",
-    size=17,
-    random_length=True,
-)
-
-register(
-    id="MiniGrid-MemoryS13Random-v0",
-    entry_point="gym_minigrid.envs.memory:MemoryEnv",
-    size=13,
-    random_length=True,
-)
-
-
-register(
-    id="MiniGrid-MemoryS13-v0",
-    entry_point="gym_minigrid.envs.memory:MemoryEnv",
-    size=13,
-)
-
-
-register(
-    id="MiniGrid-MemoryS11-v0",
-    entry_point="gym_minigrid.envs.memory:MemoryEnv",
-    size=11,
-)
-
-register(
-    id="MiniGrid-MemoryS9-v0", entry_point="gym_minigrid.envs.memory:MemoryEnv", size=9
-)
-
-register(
-    id="MiniGrid-MemoryS7-v0", entry_point="gym_minigrid.envs.memory:MemoryEnv", size=7
-)

+ 21 - 27
gym_minigrid/envs/multiroom.py

@@ -1,5 +1,12 @@
-from gym_minigrid.minigrid import COLOR_NAMES, Door, Goal, Grid, MiniGridEnv, Wall
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import (
+    COLOR_NAMES,
+    Door,
+    Goal,
+    Grid,
+    MiniGridEnv,
+    MissionSpace,
+    Wall,
+)
 
 
 class MultiRoom:
@@ -26,7 +33,18 @@ class MultiRoomEnv(MiniGridEnv):
 
         self.rooms = []
 
-        super().__init__(grid_size=25, max_steps=self.maxNumRooms * 20, **kwargs)
+        mission_space = MissionSpace(
+            mission_func=lambda: "traverse the rooms to get to the goal"
+        )
+
+        self.size = 25
+
+        super().__init__(
+            mission_space=mission_space,
+            width=self.size,
+            height=self.size,
+            max_steps=self.maxNumRooms * 20,
+        )
 
     def _gen_grid(self, width, height):
         roomList = []
@@ -198,27 +216,3 @@ class MultiRoomEnv(MiniGridEnv):
                 break
 
         return True
-
-
-register(
-    id="MiniGrid-MultiRoom-N2-S4-v0",
-    entry_point="gym_minigrid.envs.multiroom:MultiRoomEnv",
-    minNumRooms=2,
-    maxNumRooms=2,
-    maxRoomSize=4,
-)
-
-register(
-    id="MiniGrid-MultiRoom-N4-S5-v0",
-    entry_point="gym_minigrid.envs.multiroom:MultiRoomEnv",
-    minNumRooms=4,
-    maxNumRooms=4,
-    maxRoomSize=5,
-)
-
-register(
-    id="MiniGrid-MultiRoom-N6-v0",
-    entry_point="gym_minigrid.envs.multiroom:MultiRoomEnv",
-    minNumRooms=6,
-    maxNumRooms=6,
-)

+ 8 - 80
gym_minigrid/envs/obstructedmaze.py

@@ -1,5 +1,4 @@
-from gym_minigrid.minigrid import COLOR_NAMES, DIR_TO_VEC, Ball, Box, Key
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import COLOR_NAMES, DIR_TO_VEC, Ball, Box, Key, MissionSpace
 from gym_minigrid.roomgrid import RoomGrid
 
 
@@ -13,12 +12,16 @@ class ObstructedMazeEnv(RoomGrid):
         room_size = 6
         max_steps = 4 * num_rooms_visited * room_size**2
 
+        mission_space = MissionSpace(
+            mission_func=lambda: f"pick up the {COLOR_NAMES[0]} ball",
+        )
         super().__init__(
+            mission_space=mission_space,
             room_size=room_size,
             num_rows=num_rows,
             num_cols=num_cols,
             max_steps=max_steps,
-            **kwargs
+            **kwargs,
         )
         self.obj = Ball()  # initialize the obj attribute, that will be changed later on
 
@@ -123,7 +126,7 @@ class ObstructedMaze_Full(ObstructedMazeEnv):
         blocked=True,
         num_quarters=4,
         num_rooms_visited=25,
-        **kwargs
+        **kwargs,
     ):
         self.agent_room = agent_room
         self.key_in_box = key_in_box
@@ -157,7 +160,7 @@ class ObstructedMaze_Full(ObstructedMazeEnv):
                     door_idx=(i + k) % 4,
                     color=self.door_colors[(i + k) % len(self.door_colors)],
                     key_in_box=self.key_in_box,
-                    blocked=self.blocked
+                    blocked=self.blocked,
                 )
 
         corners = [(2, 0), (2, 2), (0, 2), (0, 0)][: self.num_quarters]
@@ -182,78 +185,3 @@ class ObstructedMaze_2Dlh(ObstructedMaze_Full):
 class ObstructedMaze_2Dlhb(ObstructedMaze_Full):
     def __init__(self, **kwargs):
         super().__init__((2, 1), True, True, 1, 4, **kwargs)
-
-
-register(
-    id="MiniGrid-ObstructedMaze-1Dl-v0",
-    entry_point="gym_minigrid.envs.obstructedmaze:ObstructedMaze_1Dlhb",
-    key_in_box=False,
-    blocked=False,
-)
-
-register(
-    id="MiniGrid-ObstructedMaze-1Dlh-v0",
-    entry_point="gym_minigrid.envs.obstructedmaze:ObstructedMaze_1Dlhb",
-    key_in_box=True,
-    blocked=False,
-)
-
-register(
-    id="MiniGrid-ObstructedMaze-1Dlhb-v0",
-    entry_point="gym_minigrid.envs.obstructedmaze:ObstructedMaze_1Dlhb",
-)
-
-register(
-    id="MiniGrid-ObstructedMaze-2Dl-v0",
-    entry_point="gym_minigrid.envs.obstructedmaze:ObstructedMaze_Full",
-    agent_room=(2, 1),
-    key_in_box=False,
-    blocked=False,
-    num_quarters=1,
-    num_rooms_visited=4,
-)
-
-register(
-    id="MiniGrid-ObstructedMaze-2Dlh-v0",
-    entry_point="gym_minigrid.envs.obstructedmaze:ObstructedMaze_Full",
-    agent_room=(2, 1),
-    key_in_box=True,
-    blocked=False,
-    num_quarters=1,
-    num_rooms_visited=4,
-)
-
-register(
-    id="MiniGrid-ObstructedMaze-2Dlhb-v0",
-    entry_point="gym_minigrid.envs.obstructedmaze:ObstructedMaze_Full",
-    agent_room=(2, 1),
-    key_in_box=True,
-    blocked=True,
-    num_quarters=1,
-    num_rooms_visited=4,
-)
-
-register(
-    id="MiniGrid-ObstructedMaze-1Q-v0",
-    entry_point="gym_minigrid.envs.obstructedmaze:ObstructedMaze_Full",
-    agent_room=(1, 1),
-    key_in_box=True,
-    blocked=True,
-    num_quarters=1,
-    num_rooms_visited=5,
-)
-
-register(
-    id="MiniGrid-ObstructedMaze-2Q-v0",
-    entry_point="gym_minigrid.envs.obstructedmaze:ObstructedMaze_Full",
-    agent_room=(2, 1),
-    key_in_box=True,
-    blocked=True,
-    num_quarters=2,
-    num_rooms_visited=11,
-)
-
-register(
-    id="MiniGrid-ObstructedMaze-Full-v0",
-    entry_point="gym_minigrid.envs.obstructedmaze:ObstructedMaze_Full",
-)

+ 19 - 9
gym_minigrid/envs/playground.py

@@ -1,5 +1,13 @@
-from gym_minigrid.minigrid import COLOR_NAMES, Ball, Box, Door, Grid, Key, MiniGridEnv
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import (
+    COLOR_NAMES,
+    Ball,
+    Box,
+    Door,
+    Grid,
+    Key,
+    MiniGridEnv,
+    MissionSpace,
+)
 
 
 class PlaygroundEnv(MiniGridEnv):
@@ -9,7 +17,15 @@ class PlaygroundEnv(MiniGridEnv):
     """
 
     def __init__(self, **kwargs):
-        super().__init__(grid_size=19, max_steps=100, **kwargs)
+        mission_space = MissionSpace(mission_func=lambda: "")
+        self.size = 19
+        super().__init__(
+            mission_space=mission_space,
+            width=self.size,
+            height=self.size,
+            max_steps=100,
+            **kwargs
+        )
 
     def _gen_grid(self, width, height):
         # Create the grid
@@ -72,9 +88,3 @@ class PlaygroundEnv(MiniGridEnv):
 
         # No explicit mission in this environment
         self.mission = ""
-
-
-register(
-    id="MiniGrid-Playground-v0",
-    entry_point="gym_minigrid.envs.playground:PlaygroundEnv",
-)

+ 23 - 17
gym_minigrid/envs/putnear.py

@@ -1,5 +1,12 @@
-from gym_minigrid.minigrid import COLOR_NAMES, Ball, Box, Grid, Key, MiniGridEnv
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import (
+    COLOR_NAMES,
+    Ball,
+    Box,
+    Grid,
+    Key,
+    MiniGridEnv,
+    MissionSpace,
+)
 
 
 class PutNearEnv(MiniGridEnv):
@@ -9,14 +16,25 @@ class PutNearEnv(MiniGridEnv):
     """
 
     def __init__(self, size=6, numObjs=2, **kwargs):
+        self.size = size
         self.numObjs = numObjs
-
+        self.obj_types = ["key", "ball", "box"]
+        mission_space = MissionSpace(
+            mission_func=lambda move_color, move_type, target_color, target_type: f"put the {move_color} {move_type} near the {target_color} {target_type}",
+            ordered_placeholders=[
+                COLOR_NAMES,
+                self.obj_types,
+                COLOR_NAMES,
+                self.obj_types,
+            ],
+        )
         super().__init__(
-            grid_size=size,
+            mission_space=mission_space,
+            width=size,
+            height=size,
             max_steps=5 * size,
             # Set this to True for maximum speed
             see_through_walls=True,
-            **kwargs
         )
 
     def _gen_grid(self, width, height):
@@ -117,15 +135,3 @@ class PutNearEnv(MiniGridEnv):
             terminated = True
 
         return obs, reward, terminated, truncated, info
-
-
-register(
-    id="MiniGrid-PutNear-6x6-N2-v0", entry_point="gym_minigrid.envs.putnear:PutNearEnv"
-)
-
-register(
-    id="MiniGrid-PutNear-8x8-N3-v0",
-    entry_point="gym_minigrid.envs.putnear:PutNearEnv",
-    size=8,
-    numObjs=3,
-)

+ 9 - 16
gym_minigrid/envs/redbluedoors.py

@@ -1,5 +1,4 @@
-from gym_minigrid.minigrid import Door, Grid, MiniGridEnv
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import Door, Grid, MiniGridEnv, MissionSpace
 
 
 class RedBlueDoorEnv(MiniGridEnv):
@@ -11,9 +10,15 @@ class RedBlueDoorEnv(MiniGridEnv):
 
     def __init__(self, size=8, **kwargs):
         self.size = size
-
+        mission_space = MissionSpace(
+            mission_func=lambda: "open the red door then the blue door"
+        )
         super().__init__(
-            width=2 * size, height=size, max_steps=20 * size * size, **kwargs
+            mission_space=mission_space,
+            width=2 * size,
+            height=size,
+            max_steps=20 * size * size,
+            **kwargs
         )
 
     def _gen_grid(self, width, height):
@@ -63,15 +68,3 @@ class RedBlueDoorEnv(MiniGridEnv):
                 terminated = True
 
         return obs, reward, terminated, truncated, info
-
-
-register(
-    id="MiniGrid-RedBlueDoors-6x6-v0",
-    entry_point="gym_minigrid.envs.redbluedoors:RedBlueDoorEnv",
-    size=6,
-)
-
-register(
-    id="MiniGrid-RedBlueDoors-8x8-v0",
-    entry_point="gym_minigrid.envs.redbluedoors:RedBlueDoorEnv",
-)

+ 3 - 4
gym_minigrid/envs/unlock.py

@@ -1,4 +1,4 @@
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import MissionSpace
 from gym_minigrid.roomgrid import RoomGrid
 
 
@@ -9,7 +9,9 @@ class UnlockEnv(RoomGrid):
 
     def __init__(self, **kwargs):
         room_size = 6
+        mission_space = MissionSpace(mission_func=lambda: "open the door")
         super().__init__(
+            mission_space=mission_space,
             num_rows=1,
             num_cols=2,
             room_size=room_size,
@@ -39,6 +41,3 @@ class UnlockEnv(RoomGrid):
                 terminated = True
 
         return obs, reward, terminated, truncated, info
-
-
-register(id="MiniGrid-Unlock-v0", entry_point="gym_minigrid.envs.unlock:UnlockEnv")

+ 6 - 7
gym_minigrid/envs/unlockpickup.py

@@ -1,4 +1,4 @@
-from gym_minigrid.register import register
+from gym_minigrid.minigrid import COLOR_NAMES, MissionSpace
 from gym_minigrid.roomgrid import RoomGrid
 
 
@@ -9,7 +9,12 @@ class UnlockPickupEnv(RoomGrid):
 
     def __init__(self, **kwargs):
         room_size = 6
+        mission_space = MissionSpace(
+            mission_func=lambda color: f"pick up the {color} box",
+            ordered_placeholders=[COLOR_NAMES],
+        )
         super().__init__(
+            mission_space=mission_space,
             num_rows=1,
             num_cols=2,
             room_size=room_size,
@@ -41,9 +46,3 @@ class UnlockPickupEnv(RoomGrid):
                 terminated = True
 
         return obs, reward, terminated, truncated, info
-
-
-register(
-    id="MiniGrid-UnlockPickup-v0",
-    entry_point="gym_minigrid.envs.unlockpickup:UnlockPickupEnv",
-)

+ 4 - 5
manual_control.py

@@ -10,14 +10,12 @@ from gym_minigrid.wrappers import ImgObsWrapper, RGBImgPartialObsWrapper
 
 def redraw(img):
     if not args.agent_view:
-        img = env.render(tile_size=args.tile_size)
-
+        img = env.render(mode="rgb_array", tile_size=args.tile_size)
     window.show_img(img)
 
 
 def reset():
-    seed = None if args.seed == -1 else args.seed
-    obs = env.reset(seed=seed)
+    obs = env.reset()
 
     if hasattr(env, "mission"):
         print("Mission: %s" % env.mission)
@@ -96,7 +94,8 @@ parser.add_argument(
 
 args = parser.parse_args()
 
-env = gym.make(args.env, render_mode="rgb_array", new_step_api=True)
+seed = None if args.seed == -1 else args.seed
+env = gym.make(args.env, seed=seed, new_step_api=True)
 
 if args.agent_view:
     env = RGBImgPartialObsWrapper(env)

+ 214 - 27
gym_minigrid/minigrid.py

@@ -1,12 +1,13 @@
 import hashlib
 import math
-import string
 from abc import abstractmethod
 from enum import IntEnum
+from typing import Any, Callable, Optional, Union
 
 import gym
 import numpy as np
 from gym import spaces
+from gym.utils import seeding
 
 # Size in pixels of a tile in the full-scale human view
 from gym_minigrid.rendering import (
@@ -77,6 +78,197 @@ DIR_TO_VEC = [
 ]
 
 
+def check_if_no_duplicate(duplicate_list: list) -> bool:
+    """Return True if the given list contains no duplicates."""
+    return len(set(duplicate_list)) == len(duplicate_list)
+
+
+class MissionSpace(spaces.Space[str]):
+    r"""A space representing a mission for the Gym-Minigrid environments.
+    The space allows generating random mission strings constructed with an input placeholder list.
+    Example Usage::
+        >>> observation_space = MissionSpace(mission_func=lambda color: f"Get the {color} ball.",
+                                                ordered_placeholders=[["green", "blue"]])
+        >>> observation_space.sample()
+            "Get the green ball."
+        >>> observation_space = MissionSpace(mission_func=lambda: "Get the ball.",
+                                                ordered_placeholders=None)
+        >>> observation_space.sample()
+            "Get the ball."
+    """
+
+    def __init__(
+        self,
+        mission_func: Callable[..., str],
+        ordered_placeholders: Optional["list[list[str]]"] = None,
+        seed: Optional[Union[int, seeding.RandomNumberGenerator]] = None,
+    ):
+        r"""Constructor of :class:`MissionSpace` space.
+
+        Args:
+            mission_func (lambda _placeholders(str): _mission(str)): Function that generates a mission string from random placeholders.
+            ordered_placeholders (Optional["list[list[str]]"]): List of placeholder lists, given in the order in which their values are passed to the mission function mission_func.
+            seed: The seed for sampling from the space.
+        """
+        # Check that the ordered placeholders and mission function are well defined.
+        if ordered_placeholders is not None:
+            assert (
+                len(ordered_placeholders) == mission_func.__code__.co_argcount
+            ), f"The number of placeholders {len(ordered_placeholders)} is different from the number of parameters in the mission function {mission_func.__code__.co_argcount}."
+            for placeholder_list in ordered_placeholders:
+                assert check_if_no_duplicate(
+                    placeholder_list
+                ), "Make sure that the placeholders don't have any duplicate values."
+        else:
+            assert (
+                mission_func.__code__.co_argcount == 0
+            ), f"If the ordered placeholders are {ordered_placeholders}, the mission function shouldn't have any parameters."
+
+        self.ordered_placeholders = ordered_placeholders
+        self.mission_func = mission_func
+
+        super().__init__(dtype=str, seed=seed)
+
+        # Check that mission_func returns a string
+        sampled_mission = self.sample()
+        assert isinstance(
+            sampled_mission, str
+        ), f"mission_func must return type str not {type(sampled_mission)}"
+
+    def sample(self) -> str:
+        """Sample a random mission string."""
+        if self.ordered_placeholders is not None:
+            placeholders = []
+            for rand_var_list in self.ordered_placeholders:
+                idx = self.np_random.integers(0, len(rand_var_list))
+
+                placeholders.append(rand_var_list[idx])
+
+            return self.mission_func(*placeholders)
+        else:
+            return self.mission_func()
+
+    def contains(self, x: Any) -> bool:
+        """Return boolean specifying if x is a valid member of this space."""
+        # Store a list of all the placeholders from self.ordered_placeholders that appear in x
+        if self.ordered_placeholders is not None:
+            check_placeholder_list = []
+            for placeholder_list in self.ordered_placeholders:
+                for placeholder in placeholder_list:
+                    if placeholder in x:
+                        check_placeholder_list.append(placeholder)
+
+            # Remove duplicates from the list
+            check_placeholder_list = list(set(check_placeholder_list))
+
+            start_id_placeholder = []
+            end_id_placeholder = []
+            # Get the starting and ending id of the identified placeholders with possible duplicates
+            new_check_placeholder_list = []
+            for placeholder in check_placeholder_list:
+                new_start_id_placeholder = [
+                    i for i in range(len(x)) if x.startswith(placeholder, i)
+                ]
+                new_check_placeholder_list += [placeholder] * len(
+                    new_start_id_placeholder
+                )
+                end_id_placeholder += [
+                    start_id + len(placeholder) - 1
+                    for start_id in new_start_id_placeholder
+                ]
+                start_id_placeholder += new_start_id_placeholder
+
+            # Order by starting id the placeholders
+            ordered_placeholder_list = sorted(
+                zip(
+                    start_id_placeholder, end_id_placeholder, new_check_placeholder_list
+                )
+            )
+
+            # Check for repeated placeholders contained in each other
+            remove_placeholder_id = []
+            for i, placeholder_1 in enumerate(ordered_placeholder_list):
+                starting_id = i + 1
+                for j, placeholder_2 in enumerate(
+                    ordered_placeholder_list[starting_id:]
+                ):
+                    # Check if place holder ids overlap and keep the longest
+                    if max(placeholder_1[0], placeholder_2[0]) < min(
+                        placeholder_1[1], placeholder_2[1]
+                    ):
+                        remove_placeholder = min(
+                            placeholder_1[2], placeholder_2[2], key=len
+                        )
+                        if remove_placeholder == placeholder_1[2]:
+                            remove_placeholder_id.append(i)
+                        else:
+                            remove_placeholder_id.append(i + j + 1)
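+            # Delete from the highest index down so earlier deletions do not shift later ones
+            for idx in sorted(set(remove_placeholder_id), reverse=True):
+                del ordered_placeholder_list[idx]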
+
+            final_placeholders = [
+                placeholder[2] for placeholder in ordered_placeholder_list
+            ]
+
+            # Check that the identified final placeholders are in the same order as the original placeholders.
+            for ordered_placeholder, final_placeholder in zip(
+                self.ordered_placeholders, final_placeholders
+            ):
+                if final_placeholder in ordered_placeholder:
+                    continue
+                else:
+                    return False
+            try:
+                mission_string_with_placeholders = self.mission_func(
+                    *final_placeholders
+                )
+            except Exception as e:
+                print(
+                    f"{x} is not contained in MissionSpace due to the following exception: {e}"
+                )
+                return False
+
+            return bool(mission_string_with_placeholders == x)
+
+        else:
+            return bool(self.mission_func() == x)
+
+    def __repr__(self) -> str:
+        """Gives a string representation of this space."""
+        return f"MissionSpace({self.mission_func}, {self.ordered_placeholders})"
+
+    def __eq__(self, other) -> bool:
+        """Check whether ``other`` is equivalent to this instance."""
+        if isinstance(other, MissionSpace):
+
+            # Check that place holder lists are the same
+            if self.ordered_placeholders is not None:
+                # Check length
+                if (len(self.ordered_placeholders) == len(other.ordered_placeholders)) and (
+                    all(
+                        set(i) == set(j)
+                        for i, j in zip(self.ordered_placeholders, other.ordered_placeholders)
+                    )
+                ):
+                    # Check that the mission string is the same when filled with empty dummy placeholders
+                    test_placeholders = [""] * len(self.ordered_placeholders)
+                    mission = self.mission_func(*test_placeholders)
+                    other_mission = other.mission_func(*test_placeholders)
+                    return mission == other_mission
+            else:
+
+                # Check that other is also None
+                if other.ordered_placeholders is None:
+
+                    # Check mission string is the same
+                    mission = self.mission_func()
+                    other_mission = other.mission_func()
+                    return mission == other_mission
+
+        # None of the checks above returned, so the spaces are not equal
+        return False
+
+
 class WorldObj:
     """
     Base class for grid world objects
@@ -259,9 +451,7 @@ class Door(WorldObj):
             state = 1
         else:
             raise ValueError(
-                "There is no possible state encoding for the state:\n -Door Open: {}\n -Door Closed: {}\n -Door Locked: {}".format(
-                    self.is_open, not self.is_open, self.is_locked
-                )
+                f"There is no possible state encoding for the state:\n -Door Open: {self.is_open}\n -Door Closed: {not self.is_open}\n -Door Locked: {self.is_locked}"
             )
 
         return (OBJECT_TO_IDX[self.type], COLOR_TO_IDX[self.color], state)
@@ -638,7 +828,7 @@ class MiniGridEnv(gym.Env):
         # Deprecated: use 'render_modes' instead
         "render.modes": ["human", "rgb_array"],
         "video.frames_per_second": 10,  # Deprecated: use 'render_fps' instead
-        "render_modes": ["human", "rgb_array"],
+        "render_modes": ["human", "rgb_array", "single_rgb_array"],
         "render_fps": 10,
     }
 
@@ -661,15 +851,21 @@ class MiniGridEnv(gym.Env):
 
     def __init__(
         self,
+        mission_space: MissionSpace,
         grid_size: int = None,
         width: int = None,
         height: int = None,
         max_steps: int = 100,
         see_through_walls: bool = False,
         agent_view_size: int = 7,
-        render_mode: str = None,
+        highlight: bool = True,
+        tile_size: int = TILE_PIXELS,
         **kwargs,
     ):
+
+        # Initialize mission
+        self.mission = mission_space.sample()
+
         # Can't set both grid_size and width/height
         if grid_size:
             assert width is None and height is None
@@ -689,7 +885,7 @@ class MiniGridEnv(gym.Env):
 
         # Observations are dictionaries containing an
         # encoding of the grid and a textual 'mission' string
-        self.observation_space = spaces.Box(
+        image_observation_space = spaces.Box(
             low=0,
             high=255,
             shape=(self.agent_view_size, self.agent_view_size, 3),
@@ -697,18 +893,12 @@ class MiniGridEnv(gym.Env):
         )
         self.observation_space = spaces.Dict(
             {
-                "image": self.observation_space,
+                "image": image_observation_space,
                 "direction": spaces.Discrete(4),
-                "mission": spaces.Text(
-                    max_length=200,
-                    charset=string.ascii_letters + string.digits + " .,!-",
-                ),
+                "mission": mission_space,
             }
         )
 
-        # render mode
-        self.render_mode = render_mode
-
         # Range of possible rewards
         self.reward_range = (0, 1)
 
@@ -761,7 +951,11 @@ class MiniGridEnv(gym.Env):
 
         # Return first observation
         obs = self.gen_obs()
-        return obs
+
+        if not return_info:
+            return obs
+        else:
+            return obs, {}
 
     def hash(self, size=16):
         """Compute a hash that uniquely identifies the current state of the environment.
@@ -1252,17 +1446,11 @@ class MiniGridEnv(gym.Env):
 
         return img
 
-    def render(self, mode="human", close=False, highlight=True, tile_size=TILE_PIXELS):
+    def render(self, mode="human", highlight=True, tile_size=TILE_PIXELS):
         """
         Render the whole-grid human view
         """
+        assert mode in self.metadata["render_modes"]
-        if self.render_mode is not None:
-            mode = self.render_mode
-        if close:
-            if self.window:
-                self.window.close()
-            return
-
         if mode == "human" and not self.window:
             self.window = Window("gym_minigrid")
             self.window.show(block=False)
@@ -1313,10 +1501,9 @@ class MiniGridEnv(gym.Env):
             assert self.window is not None
             self.window.set_caption(self.mission)
             self.window.show_img(img)
-
-        return img
+        else:
+            return img
 
     def close(self):
         if self.window:
             self.window.close()
-        return
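
The placeholder sampling that MissionSpace.sample performs in the hunk above can be sketched without gym. This is a minimal standalone illustration, not the class from the diff: the helper name sample_mission and the use of random.Random in place of the space's np_random generator are assumptions made for the example.

```python
import random
from typing import Callable, List, Optional


def sample_mission(
    mission_func: Callable[..., str],
    ordered_placeholders: Optional[List[List[str]]] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one value from each placeholder list and format the mission string.

    Hypothetical helper: mirrors the logic of MissionSpace.sample, but uses
    the standard library RNG instead of the space's np_random generator.
    """
    rng = rng or random.Random()
    if ordered_placeholders is None:
        # A mission without placeholders is a constant string.
        return mission_func()
    # One choice per list, in the same order as the function's parameters.
    chosen = [rng.choice(options) for options in ordered_placeholders]
    return mission_func(*chosen)


colors = ["red", "green", "blue"]
mission = sample_mission(
    lambda color: f"pick up the {color} box",
    ordered_placeholders=[colors],
    rng=random.Random(0),
)
print(mission)
```

Because each placeholder list maps to exactly one positional parameter, the number of lists has to match the function's argument count, which is what the constructor's assertion on co_argcount enforces.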

+ 0 - 16
gym_minigrid/register.py

@@ -1,16 +0,0 @@
-from gym.envs.registration import register as gym_register
-
-env_list = []
-
-
-def register(id, entry_point, reward_threshold=0.95, **kwargs):
-    assert id.startswith("MiniGrid-")
-    assert id not in env_list
-
-    # Register the environment with OpenAI gym
-    gym_register(
-        id=id, entry_point=entry_point, reward_threshold=reward_threshold, kwargs=kwargs
-    )
-
-    # Add the environment to the set
-    env_list.append(id)

+ 3 - 1
gym_minigrid/roomgrid.py

@@ -208,7 +208,9 @@ class RoomGrid(MiniGridEnv):
         elif kind == "box":
             obj = Box(color)
         else:
-            raise f"{kind} object kind is not available in this environment."
+            raise ValueError(
+                f"{kind} object kind is not available in this environment."
+            )
 
         return self.place_in_room(i, j, obj)
 

+ 21 - 16
gym_minigrid/wrappers.py

@@ -5,11 +5,12 @@ from functools import reduce
 import gym
 import numpy as np
 from gym import spaces
+from gym.core import ObservationWrapper, Wrapper
 
 from gym_minigrid.minigrid import COLOR_TO_IDX, OBJECT_TO_IDX, STATE_TO_IDX, Goal
 
 
-class ReseedWrapper(gym.Wrapper):
+class ReseedWrapper(Wrapper):
     """
     Wrapper to always regenerate an environment with the same set of seeds.
     This can be used to force an environment to always keep the same
@@ -26,6 +27,10 @@ class ReseedWrapper(gym.Wrapper):
         self.seed_idx = (self.seed_idx + 1) % len(self.seeds)
         return self.env.reset(seed=seed, **kwargs)
 
+    def step(self, action):
+        obs, reward, done, info = self.env.step(action)
+        return obs, reward, done, info
+
 
 class ActionBonus(gym.Wrapper):
     """
@@ -62,7 +67,7 @@ class ActionBonus(gym.Wrapper):
         return self.env.reset(**kwargs)
 
 
-class StateBonus(gym.Wrapper):
+class StateBonus(Wrapper):
     """
     Adds an exploration bonus based on which positions
     are visited on the grid.
@@ -98,7 +103,7 @@ class StateBonus(gym.Wrapper):
         return self.env.reset(**kwargs)
 
 
-class ImgObsWrapper(gym.Wrapper):
+class ImgObsWrapper(ObservationWrapper):
     """
     Use the image as the only observation output, no language/mission.
     """
@@ -111,7 +116,7 @@ class ImgObsWrapper(gym.Wrapper):
         return obs["image"]
 
 
-class OneHotPartialObsWrapper(gym.Wrapper):
+class OneHotPartialObsWrapper(ObservationWrapper):
     """
     Wrapper to get a one-hot encoding of a partially observable
     agent view as observation.
@@ -151,11 +156,10 @@ class OneHotPartialObsWrapper(gym.Wrapper):
         return {**obs, "image": out}
 
 
-class RGBImgObsWrapper(gym.Wrapper):
+class RGBImgObsWrapper(ObservationWrapper):
     """
     Wrapper to use a fully observable RGB image as observation.
     This can be used to have the agent solve the gridworld in pixel space.
-    To use it, make the unwrapped environment with render_mode='rgb_array'.
     """
 
     def __init__(self, env, tile_size=8):
@@ -176,14 +180,13 @@ class RGBImgObsWrapper(gym.Wrapper):
 
     def observation(self, obs):
         env = self.unwrapped
-        assert env.render_mode == "rgb_array", env.render_mode
 
-        rgb_img = env.render(highlight=False, tile_size=self.tile_size)
+        rgb_img = env.render(mode="rgb_array", highlight=True, tile_size=self.tile_size)
 
         return {**obs, "image": rgb_img}
 
 
-class RGBImgPartialObsWrapper(gym.Wrapper):
+class RGBImgPartialObsWrapper(ObservationWrapper):
     """
     Wrapper to use partially observable RGB image as observation.
     This can be used to have the agent solve the gridworld in pixel space.
@@ -214,7 +217,7 @@ class RGBImgPartialObsWrapper(gym.Wrapper):
         return {**obs, "image": rgb_img_partial}
 
 
-class FullyObsWrapper(gym.Wrapper):
+class FullyObsWrapper(ObservationWrapper):
     """
     Fully observable gridworld using a compact grid encoding
     """
@@ -243,7 +246,7 @@ class FullyObsWrapper(gym.Wrapper):
         return {**obs, "image": full_grid}
 
 
-class DictObservationSpaceWrapper(gym.Wrapper):
+class DictObservationSpaceWrapper(ObservationWrapper):
     """
     Transforms the observation space (that has a textual component) to a fully numerical observation space,
     where the textual instructions are replaced by arrays representing the indices of each word in a fixed vocabulary.
@@ -361,7 +364,7 @@ class DictObservationSpaceWrapper(gym.Wrapper):
         return obs
 
 
-class FlatObsWrapper(gym.Wrapper):
+class FlatObsWrapper(ObservationWrapper):
     """
     Encode mission strings using a one-hot scheme,
     and combine these with observed images into one flat array
@@ -371,7 +374,7 @@ class FlatObsWrapper(gym.Wrapper):
         super().__init__(env)
 
         self.maxStrLen = maxStrLen
-        self.numCharCodes = 27
+        self.numCharCodes = 28
 
         imgSpace = env.observation_space.spaces["image"]
         imgSize = reduce(operator.mul, imgSpace.shape, 1)
@@ -405,6 +408,8 @@ class FlatObsWrapper(gym.Wrapper):
                     chNo = ord(ch) - ord("a")
                 elif ch == " ":
                     chNo = ord("z") - ord("a") + 1
+                elif ch == ",":
+                    chNo = ord("z") - ord("a") + 2
                 else:
                     raise ValueError(
                         f"Character {ch} is not available in mission string."
@@ -420,7 +425,7 @@ class FlatObsWrapper(gym.Wrapper):
         return obs
 
 
-class ViewSizeWrapper(gym.Wrapper):
+class ViewSizeWrapper(Wrapper):
     """
     Wrapper to customize the agent field of view size.
     This cannot be used with fully observable wrappers.
@@ -455,7 +460,7 @@ class ViewSizeWrapper(gym.Wrapper):
         return {**obs, "image": image}
 
 
-class DirectionObsWrapper(gym.Wrapper):
+class DirectionObsWrapper(ObservationWrapper):
     """
     Provides the slope/angular direction to the goal with the observations as modeled by (y2 - y1) / (x2 - x1)
     type = {slope , angle}
@@ -489,7 +494,7 @@ class DirectionObsWrapper(gym.Wrapper):
         return obs
 
 
-class SymbolicObsWrapper(gym.Wrapper):
+class SymbolicObsWrapper(ObservationWrapper):
     """
     Fully observable grid with a symbolic state representation.
     The symbol is a triple of (X, Y, IDX), where X and Y are
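
Note on the FlatObsWrapper hunk above: raising numCharCodes from 27 to 28 makes room for the comma, which now maps to index 27 (after 'a'..'z' at 0..25 and the space at 26). The following is a minimal sketch of that indexing and of a one-hot mission encoding built on it. The names `char_index` and `encode_mission`, and the `max_str_len` value in the usage, are hypothetical and are not taken from the wrapper; the sketch also assumes the mission string is lowercased before encoding.

```python
NUM_CHAR_CODES = 28  # 26 letters + space + comma, matching the updated hunk


def char_index(ch: str) -> int:
    """Map one character to its slot, mirroring the branch order in the diff."""
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if ch == " ":
        return ord("z") - ord("a") + 1  # 26
    if ch == ",":
        return ord("z") - ord("a") + 2  # 27
    raise ValueError(f"Character {ch} is not available in mission string.")


def encode_mission(mission: str, max_str_len: int) -> list:
    """Return a flat one-hot list of length max_str_len * NUM_CHAR_CODES."""
    mission = mission.lower()  # assumption: encoding is case-insensitive
    if len(mission) > max_str_len:
        raise ValueError("mission string is longer than max_str_len")
    flat = [0.0] * (max_str_len * NUM_CHAR_CODES)
    for pos, ch in enumerate(mission):
        flat[pos * NUM_CHAR_CODES + char_index(ch)] = 1.0
    return flat
```

With this layout, a mission containing a comma no longer raises ValueError, and each encoded character occupies one row of 28 slots.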

+ 3 - 2
pyproject.toml

@@ -7,8 +7,6 @@ include = [
 exclude = [
     "**/node_modules",
     "**/__pycache__",
-
-   #"gym_minigrid/**",
 ]
 
 strict = [
@@ -33,3 +31,6 @@ reportUntypedFunctionDecorator = "none"
 reportMissingTypeStubs = false
 reportUnboundVariable = "warning"
 reportGeneralTypeIssues ="none"
+
+[tool.pytest.ini_options]
+filterwarnings = ['ignore:.*step API.*:DeprecationWarning'] # TODO: to be removed when old step API is removed
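
Note on the filterwarnings entry above: the string has the form action:message:category, where the message part is a regular expression matched against the start of the warning text. The sketch below reproduces the same filter with the standard library `warnings` module to show which warnings it would drop. The function name and the two warning texts are made up for illustration, and this only demonstrates the filter semantics, not which category gym actually uses for its step API warnings.

```python
import warnings


def messages_after_filter() -> list:
    """Emit two DeprecationWarnings and return the texts that survive the filter."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Same pattern and category as the pyproject.toml entry.
        warnings.filterwarnings(
            "ignore", message=".*step API.*", category=DeprecationWarning
        )
        warnings.warn("Initializing environment in old step API", DeprecationWarning)
        warnings.warn("something else is deprecated", DeprecationWarning)
    return [str(w.message) for w in caught]
```

Only the warning whose text matches `.*step API.*` is suppressed; other deprecation warnings still surface in the test run.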

+ 1 - 1
requirements.txt

@@ -1,3 +1,3 @@
 numpy>=1.18.0
-gym>=0.25
+gym>=0.22,<=0.26
 matplotlib>=3.0

+ 0 - 229
run_tests.py

@@ -1,229 +0,0 @@
-#!/usr/bin/env python3
-
-import random
-
-import gym
-import numpy as np
-from gym import spaces
-
-from gym_minigrid.envs.empty import EmptyEnv
-from gym_minigrid.minigrid import Grid
-from gym_minigrid.register import env_list
-from gym_minigrid.wrappers import (
-    DictObservationSpaceWrapper,
-    FlatObsWrapper,
-    FullyObsWrapper,
-    ImgObsWrapper,
-    OneHotPartialObsWrapper,
-    ReseedWrapper,
-    RGBImgObsWrapper,
-    RGBImgPartialObsWrapper,
-    ViewSizeWrapper,
-)
-
-# Test importing wrappers
-
-
-print("%d environments registered" % len(env_list))
-
-for env_idx, env_name in enumerate(env_list):
-    print(f"testing {env_name} ({env_idx + 1}/{len(env_list)})")
-
-    # Load the gym environment
-    env = gym.make(env_name, render_mode="rgb_array", new_step_api=True)
-    env.max_steps = min(env.max_steps, 200)
-    env.reset()
-    env.render()
-
-    # Verify that the same seed always produces the same environment
-    for i in range(0, 5):
-        seed = 1337 + i
-        _ = env.reset(seed=seed)
-        grid1 = env.grid
-        _ = env.reset(seed=seed)
-        grid2 = env.grid
-        assert grid1 == grid2
-
-    env.reset()
-
-    # Run for a few episodes
-    num_episodes = 0
-    while num_episodes < 5:
-        # Pick a random action
-        action = random.randint(0, env.action_space.n - 1)
-
-        obs, reward, terminated, truncated, info = env.step(action)
-
-        # Validate the agent position
-        assert env.agent_pos[0] < env.width
-        assert env.agent_pos[1] < env.height
-
-        # Test observation encode/decode roundtrip
-        img = obs["image"]
-        grid, vis_mask = Grid.decode(img)
-        img2 = grid.encode(vis_mask=vis_mask)
-        assert np.array_equal(img, img2)
-
-        # Test the env to string function
-        str(env)
-
-        # Check that the reward is within the specified range
-        assert reward >= env.reward_range[0], reward
-        assert reward <= env.reward_range[1], reward
-
-        if terminated or truncated:
-            num_episodes += 1
-            env.reset()
-
-        env.render()
-
-    # Test the close method
-    env.close()
-
-    env = gym.make(env_name, new_step_api=True)
-    env = ReseedWrapper(env)
-    for _ in range(10):
-        env.reset()
-        env.step(0)
-        env.close()
-
-    env = gym.make(env_name, new_step_api=True)
-    env = ImgObsWrapper(env)
-    env.reset()
-    env.step(0)
-    env.close()
-
-    # Test the fully observable wrapper
-    env = gym.make(env_name, new_step_api=True)
-    env = FullyObsWrapper(env)
-    env.reset()
-    obs, _, _, _, _ = env.step(0)
-    assert obs["image"].shape == env.observation_space.spaces["image"].shape
-    env.close()
-
-    # RGB image observation wrapper
-    env = gym.make(env_name, new_step_api=True)
-    env = RGBImgPartialObsWrapper(env)
-    env.reset()
-    obs, _, _, _, _ = env.step(0)
-    assert obs["image"].mean() > 0
-    env.close()
-
-    env = gym.make(env_name, new_step_api=True)
-    env = FlatObsWrapper(env)
-    env.reset()
-    env.step(0)
-    env.close()
-
-    env = gym.make(env_name, new_step_api=True)
-    env = ViewSizeWrapper(env, 5)
-    env.reset()
-    env.step(0)
-    env.close()
-
-    # Test the DictObservationSpaceWrapper
-    env = gym.make(env_name, new_step_api=True)
-    env = DictObservationSpaceWrapper(env)
-    env.reset()
-    mission = env.mission
-    obs, _, _, _, _ = env.step(0)
-    assert env.string_to_indices(mission) == [
-        value for value in obs["mission"] if value != 0
-    ]
-    env.close()
-
-    # Test the wrappers return proper observation spaces.
-    wrappers = [RGBImgObsWrapper, RGBImgPartialObsWrapper, OneHotPartialObsWrapper]
-    for wrapper in wrappers:
-        env = wrapper(gym.make(env_name, render_mode="rgb_array", new_step_api=True))
-        obs_space, wrapper_name = env.observation_space, wrapper.__name__
-        assert isinstance(
-            obs_space, spaces.Dict
-        ), f"Observation space for {wrapper_name} is not a Dict: {obs_space}."
-        # This should not fail either
-        ImgObsWrapper(env)
-        env.reset()
-        env.step(0)
-        env.close()
-
-##############################################################################
-
-print("testing extra observations")
-
-
-class EmptyEnvWithExtraObs(EmptyEnv):
-    """
-    Custom environment with an extra observation
-    """
-
-    def __init__(self, **kwargs) -> None:
-        super().__init__(size=5, **kwargs)
-        self.observation_space["size"] = spaces.Box(
-            low=0,
-            high=1000,  # gym does not like np.iinfo(np.uint).max,
-            shape=(2,),
-            dtype=np.uint,
-        )
-
-    def reset(self, **kwargs):
-        obs = super().reset(**kwargs)
-        obs["size"] = np.array([self.width, self.height], dtype=np.uint)
-        return obs
-
-    def step(self, action):
-        obs, reward, terminated, truncated, info = super().step(action)
-        obs["size"] = np.array([self.width, self.height], dtype=np.uint)
-        return obs, reward, terminated, truncated, info
-
-
-wrappers = [
-    OneHotPartialObsWrapper,
-    RGBImgObsWrapper,
-    RGBImgPartialObsWrapper,
-    FullyObsWrapper,
-]
-for wrapper in wrappers:
-    env1 = wrapper(EmptyEnvWithExtraObs(render_mode="rgb_array"))
-    env2 = wrapper(
-        gym.make("MiniGrid-Empty-5x5-v0", render_mode="rgb_array", new_step_api=True)
-    )
-
-    obs1 = env1.reset(seed=0)
-    obs2 = env2.reset(seed=0)
-    assert "size" in obs1
-    assert obs1["size"].shape == (2,)
-    assert (obs1["size"] == [5, 5]).all()
-    for key in obs2:
-        assert np.array_equal(obs1[key], obs2[key])
-
-    obs1, reward1, terminated1, truncated1, _ = env1.step(0)
-    obs2, reward2, terminated2, truncated2, _ = env2.step(0)
-    assert "size" in obs1
-    assert obs1["size"].shape == (2,)
-    assert (obs1["size"] == [5, 5]).all()
-    for key in obs2:
-        assert np.array_equal(obs1[key], obs2[key])
-
-##############################################################################
-
-print("testing agent_sees method")
-env = gym.make("MiniGrid-DoorKey-6x6-v0", new_step_api=True)
-goal_pos = (env.grid.width - 2, env.grid.height - 2)
-
-# Test the "in" operator on grid objects
-assert ("green", "goal") in env.grid
-assert ("blue", "key") not in env.grid
-
-# Test the env.agent_sees() function
-env.reset()
-for i in range(0, 500):
-    action = random.randint(0, env.action_space.n - 1)
-    obs, reward, terminated, truncated, info = env.step(action)
-
-    grid, _ = Grid.decode(obs["image"])
-    goal_visible = ("green", "goal") in grid
-
-    agent_sees_goal = env.agent_sees(*goal_pos)
-    assert agent_sees_goal == goal_visible
-    if terminated or truncated:
-        env.reset()

+ 5 - 3
setup.py

@@ -21,7 +21,6 @@ setup(
     classifiers=[
         "Development Status :: 5 - Production/Stable",
         "Programming Language :: Python :: 3",
-        "Programming Language :: Python :: 3.6",
         "Programming Language :: Python :: 3.7",
         "Programming Language :: Python :: 3.8",
         "Programming Language :: Python :: 3.9",
@@ -33,14 +32,17 @@ setup(
     description="Minimalistic gridworld reinforcement learning environments",
     extras_require=extras,
     packages=["gym_minigrid", "gym_minigrid.envs"],
+    entry_points={
+        "gym.envs": ["__root__ = gym_minigrid.__init__:register_minigrid_envs"]
+    },
     license="Apache",
     long_description=long_description,
     long_description_content_type="text/markdown",
     install_requires=[
-        "gym>=0.25.0",
+        "gym>=0.22,<=0.26",
         "numpy>=1.18.0",
         "matplotlib>=3.0",
     ],
-    python_requires=">=3.6",
+    python_requires=">=3.7",
     tests_require=extras["testing"],
 )

+ 0 - 25
test_interactive_mode.py

@@ -1,25 +0,0 @@
-#!/usr/bin/env python3
-
-import random
-import time
-
-import gym
-
-# Load the gym environment
-env = gym.make("MiniGrid-Empty-8x8-v0", new_step_api=True)
-env.reset()
-
-for i in range(0, 100):
-    print(f"step {i}")
-
-    # Pick a random action
-    action = random.randint(0, env.action_space.n - 1)
-
-    obs, reward, terminated, truncated, info = env.step(action)
-
-    env.render()
-
-    time.sleep(0.05)
-
-# Test the close method
-env.close()

+ 0 - 0
tests/__init__.py


+ 254 - 0
tests/test_envs.py

@@ -0,0 +1,254 @@
+import gym
+import numpy as np
+import pytest
+from gym.envs.registration import EnvSpec
+from gym.utils.env_checker import check_env
+
+from gym_minigrid.minigrid import Grid, MissionSpace
+from tests.utils import all_testing_env_specs, assert_equals
+
+CHECK_ENV_IGNORE_WARNINGS = [
+    f"\x1b[33mWARN: {message}\x1b[0m"
+    for message in [
+        "A Box observation space minimum value is -infinity. This is probably too low.",
+        "A Box observation space maximum value is -infinity. This is probably too high.",
+        "For Box action spaces, we recommend using a symmetric and normalized space (range=[-1, 1] or [0, 1]). See https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html for more information.",
+        "Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.",
+        "Initializing environment in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.",
+        "Core environment is written in old step API which returns one bool instead of two. It is recommended to rewrite the environment with new step API. ",
+    ]
+]
+
+
+@pytest.mark.parametrize(
+    "spec", all_testing_env_specs, ids=[spec.id for spec in all_testing_env_specs]
+)
+def test_env(spec):
+    # Capture warnings
+    env = spec.make(disable_env_checker=True).unwrapped
+
+    # Test if env adheres to Gym API
+    with pytest.warns() as warnings:
+        check_env(env)
+
+    for warning in warnings.list:
+        if warning.message.args[0] not in CHECK_ENV_IGNORE_WARNINGS:
+            raise gym.error.Error(f"Unexpected warning: {warning.message}")
+
+
+# Note that this precludes running this test in multiple threads.
+# However, we probably already can't do multithreading due to some environments.
+SEED = 0
+NUM_STEPS = 50
+
+
+@pytest.mark.parametrize(
+    "env_spec", all_testing_env_specs, ids=[env.id for env in all_testing_env_specs]
+)
+def test_env_determinism_rollout(env_spec: EnvSpec):
+    """Run a rollout with two environments and assert equality.
+
+    This test runs a rollout of NUM_STEPS steps with two environments
+    initialized with the same seed and asserts that:
+
+    - observations after the first reset are the same
+    - same actions are sampled by the two envs
+    - observations are contained in the observation space
+    - obs, rew, done and info are equal between the two envs
+    """
+    # Don't check rollout equality if it's a nondeterministic environment.
+    if env_spec.nondeterministic is True:
+        return
+
+    env_1 = env_spec.make(disable_env_checker=True)
+    env_2 = env_spec.make(disable_env_checker=True)
+
+    initial_obs_1 = env_1.reset(seed=SEED)
+    initial_obs_2 = env_2.reset(seed=SEED)
+    assert_equals(initial_obs_1, initial_obs_2)
+
+    env_1.action_space.seed(SEED)
+
+    for time_step in range(NUM_STEPS):
+        # We don't evaluate the determinism of actions
+        action = env_1.action_space.sample()
+
+        obs_1, rew_1, done_1, info_1 = env_1.step(action)
+        obs_2, rew_2, done_2, info_2 = env_2.step(action)
+
+        assert_equals(obs_1, obs_2, f"[{time_step}] ")
+        assert env_1.observation_space.contains(
+            obs_1
+        )  # obs_2 verified by previous assertion
+
+        assert rew_1 == rew_2, f"[{time_step}] reward 1={rew_1}, reward 2={rew_2}"
+        assert done_1 == done_2, f"[{time_step}] done 1={done_1}, done 2={done_2}"
+        assert_equals(info_1, info_2, f"[{time_step}] ")
+
+        if done_1:  # done_2 verified by previous assertion
+            env_1.reset(seed=SEED)
+            env_2.reset(seed=SEED)
+
+    env_1.close()
+    env_2.close()
+
+
+@pytest.mark.parametrize(
+    "spec", all_testing_env_specs, ids=[spec.id for spec in all_testing_env_specs]
+)
+def test_render_modes(spec):
+    env = spec.make()
+
+    for mode in env.metadata.get("render_modes", []):
+        if mode != "human":
+            new_env = spec.make()
+
+            new_env.reset()
+            new_env.step(new_env.action_space.sample())
+            new_env.render(mode=mode)
+
+
+@pytest.mark.parametrize("env_id", ["MiniGrid-DoorKey-6x6-v0"])
+def test_agent_sees_method(env_id):
+    env = gym.make(env_id)
+    goal_pos = (env.grid.width - 2, env.grid.height - 2)
+
+    # Test the "in" operator on grid objects
+    assert ("green", "goal") in env.grid
+    assert ("blue", "key") not in env.grid
+
+    # Test the env.agent_sees() function
+    env.reset()
+    for i in range(0, 500):
+        action = env.action_space.sample()
+        obs, reward, done, info = env.step(action)
+
+        grid, _ = Grid.decode(obs["image"])
+        goal_visible = ("green", "goal") in grid
+
+        agent_sees_goal = env.agent_sees(*goal_pos)
+        assert agent_sees_goal == goal_visible
+        if done:
+            env.reset()
+
+    env.close()
+
+
+@pytest.mark.parametrize(
+    "env_spec", all_testing_env_specs, ids=[spec.id for spec in all_testing_env_specs]
+)
+def old_run_test(env_spec):
+    # Load the gym environment
+    env = env_spec.make()
+    env.max_steps = min(env.max_steps, 200)
+    env.reset()
+    env.render()
+
+    # Verify that the same seed always produces the same environment
+    for i in range(0, 5):
+        seed = 1337 + i
+        _ = env.reset(seed=seed)
+        grid1 = env.grid
+        _ = env.reset(seed=seed)
+        grid2 = env.grid
+        assert grid1 == grid2
+
+    env.reset()
+
+    # Run for a few episodes
+    num_episodes = 0
+    while num_episodes < 5:
+        # Pick a random action
+        action = env.action_space.sample()
+
+        obs, reward, done, info = env.step(action)
+
+        # Validate the agent position
+        assert env.agent_pos[0] < env.width
+        assert env.agent_pos[1] < env.height
+
+        # Test observation encode/decode roundtrip
+        img = obs["image"]
+        grid, vis_mask = Grid.decode(img)
+        img2 = grid.encode(vis_mask=vis_mask)
+        assert np.array_equal(img, img2)
+
+        # Test the env to string function
+        str(env)
+
+        # Check that the reward is within the specified range
+        assert reward >= env.reward_range[0], reward
+        assert reward <= env.reward_range[1], reward
+
+        if done:
+            num_episodes += 1
+            env.reset()
+
+        env.render()
+
+    # Test the close method
+    env.close()
+
+
+@pytest.mark.parametrize("env_id", ["MiniGrid-Empty-8x8-v0"])
+def test_interactive_mode(env_id):
+    env = gym.make(env_id)
+    env.reset()
+
+    for i in range(0, 100):
+        print(f"step {i}")
+
+        # Pick a random action
+        action = env.action_space.sample()
+
+        obs, reward, done, info = env.step(action)
+
+    # Test the close method
+    env.close()
+
+
+def test_mission_space():
+
+    # Test placeholders
+    mission_space = MissionSpace(
+        mission_func=lambda color, obj_type: f"Get the {color} {obj_type}.",
+        ordered_placeholders=[["green", "red"], ["ball", "key"]],
+    )
+
+    assert mission_space.contains("Get the green ball.")
+    assert mission_space.contains("Get the red key.")
+    assert not mission_space.contains("Get the purple box.")
+
+    # Test passing inverted placeholders
+    assert not mission_space.contains("Get the key red.")
+
+    # Test passing extra repeated placeholders
+    assert not mission_space.contains("Get the key red key.")
+
+    # Test contained placeholders like "get the" and "go get the". "get the" string is contained in both placeholders.
+    mission_space = MissionSpace(
+        mission_func=lambda get_syntax, obj_type: f"{get_syntax} {obj_type}.",
+        ordered_placeholders=[
+            ["go get the", "get the", "go fetch the", "fetch the"],
+            ["ball", "key"],
+        ],
+    )
+
+    assert mission_space.contains("get the ball.")
+    assert mission_space.contains("go get the key.")
+    assert mission_space.contains("go fetch the ball.")
+
+    # Test repeated placeholders
+    mission_space = MissionSpace(
+        mission_func=lambda get_syntax, color_1, obj_type_1, color_2, obj_type_2: f"{get_syntax} {color_1} {obj_type_1} and the {color_2} {obj_type_2}.",
+        ordered_placeholders=[
+            ["go get the", "get the", "go fetch the", "fetch the"],
+            ["green", "red"],
+            ["ball", "key"],
+            ["green", "red"],
+            ["ball", "key"],
+        ],
+    )
+
+    assert mission_space.contains("get the green key and the green key.")
+    assert mission_space.contains("go fetch the red ball and the green key.")
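
Note on the step signature used in tests/test_envs.py: the removed run_tests.py unpacked five values from env.step() and ended an episode on `terminated or truncated`, while the new tests unpack four values with a single `done` flag. A minimal sketch of how the two shapes relate is given below. The helper `to_done_api` is hypothetical; it is a simplification and not gym's own conversion logic.

```python
def to_done_api(step_result: tuple) -> tuple:
    """Collapse a 5-tuple step result into the 4-tuple (obs, reward, done, info)."""
    if len(step_result) == 5:
        obs, reward, terminated, truncated, info = step_result
        # The old script treated either flag as the end of the episode.
        return obs, reward, terminated or truncated, info
    return step_result  # already in the 4-tuple form
```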

+ 237 - 0
tests/test_wrappers.py

@@ -0,0 +1,237 @@
+import math
+
+import gym
+import numpy as np
+import pytest
+
+from gym_minigrid.envs import EmptyEnv
+from gym_minigrid.minigrid import MiniGridEnv
+from gym_minigrid.wrappers import (
+    ActionBonus,
+    DictObservationSpaceWrapper,
+    FlatObsWrapper,
+    FullyObsWrapper,
+    ImgObsWrapper,
+    OneHotPartialObsWrapper,
+    ReseedWrapper,
+    RGBImgObsWrapper,
+    RGBImgPartialObsWrapper,
+    StateBonus,
+    ViewSizeWrapper,
+)
+from tests.utils import all_testing_env_specs, assert_equals
+
+SEEDS = [100, 243, 500]
+NUM_STEPS = 100
+
+
+@pytest.mark.parametrize(
+    "env_spec", all_testing_env_specs, ids=[spec.id for spec in all_testing_env_specs]
+)
+def test_reseed_wrapper(env_spec):
+    """
+    Test the ReseedWrapper with a list of SEEDS.
+    """
+    unwrapped_env = env_spec.make()
+    env = env_spec.make()
+    env = ReseedWrapper(env, seeds=SEEDS)
+    env.action_space.seed(0)
+
+    for seed in SEEDS:
+        env.reset()
+        unwrapped_env.reset(seed=seed)
+        for time_step in range(NUM_STEPS):
+            action = env.action_space.sample()
+
+            obs, rew, done, info = env.step(action)
+            (
+                unwrapped_obs,
+                unwrapped_rew,
+                unwrapped_done,
+                unwrapped_info,
+            ) = unwrapped_env.step(action)
+
+            assert_equals(obs, unwrapped_obs, f"[{time_step}] ")
+            assert unwrapped_env.observation_space.contains(obs)
+
+            assert (
+                rew == unwrapped_rew
+            ), f"[{time_step}] reward={rew}, unwrapped reward={unwrapped_rew}"
+            assert (
+                done == unwrapped_done
+            ), f"[{time_step}] done={done}, unwrapped done={unwrapped_done}"
+            assert_equals(info, unwrapped_info, f"[{time_step}] ")
+
+            # Start the next seed
+            if done:
+                break
+
+    env.close()
+    unwrapped_env.close()
+
+
+@pytest.mark.parametrize("env_id", ["MiniGrid-Empty-16x16-v0"])
+def test_state_bonus_wrapper(env_id):
+    env = gym.make(env_id)
+    wrapped_env = StateBonus(gym.make(env_id))
+
+    action_forward = MiniGridEnv.Actions.forward
+    action_left = MiniGridEnv.Actions.left
+    action_right = MiniGridEnv.Actions.right
+
+    for _ in range(10):
+        wrapped_env.reset()
+        for _ in range(5):
+            wrapped_env.step(action_forward)
+
+    # Turn left 3 times (check that actions don't influence bonus)
+    for _ in range(3):
+        _, wrapped_rew, _, _ = wrapped_env.step(action_left)
+
+    env.reset()
+    for _ in range(5):
+        env.step(action_forward)
+    # Turn right 3 times
+    for _ in range(3):
+        _, rew, _, _ = env.step(action_right)
+
+    expected_bonus_reward = rew + 1 / math.sqrt(13)
+
+    assert expected_bonus_reward == wrapped_rew
+
+
+@pytest.mark.parametrize("env_id", ["MiniGrid-Empty-16x16-v0"])
+def test_action_bonus_wrapper(env_id):
+    env = gym.make(env_id)
+    wrapped_env = ActionBonus(gym.make(env_id))
+
+    action = MiniGridEnv.Actions.forward
+
+    for _ in range(10):
+        wrapped_env.reset()
+        for _ in range(5):
+            _, wrapped_rew, _, _ = wrapped_env.step(action)
+
+    env.reset()
+    for _ in range(5):
+        _, rew, _, _ = env.step(action)
+
+    expected_bonus_reward = rew + 1 / math.sqrt(10)
+
+    assert expected_bonus_reward == wrapped_rew
+
+
+@pytest.mark.parametrize(
+    "env_spec", all_testing_env_specs, ids=[spec.id for spec in all_testing_env_specs]
+)
+def test_dict_observation_space_wrapper(env_spec):
+    env = env_spec.make()
+    env = DictObservationSpaceWrapper(env)
+    env.reset()
+    mission = env.mission
+    obs, _, _, _ = env.step(0)
+    assert env.string_to_indices(mission) == [
+        value for value in obs["mission"] if value != 0
+    ]
+    env.close()
+
+
+@pytest.mark.parametrize(
+    "wrapper",
+    [
+        ReseedWrapper,
+        ImgObsWrapper,
+        FlatObsWrapper,
+        ViewSizeWrapper,
+        DictObservationSpaceWrapper,
+        OneHotPartialObsWrapper,
+        RGBImgPartialObsWrapper,
+        FullyObsWrapper,
+    ],
+)
+@pytest.mark.parametrize(
+    "env_spec", all_testing_env_specs, ids=[spec.id for spec in all_testing_env_specs]
+)
+def test_main_wrappers(wrapper, env_spec):
+    env = env_spec.make()
+    env = wrapper(env)
+    for _ in range(10):
+        env.reset()
+        env.step(0)
+    env.close()
+
+
+@pytest.mark.parametrize(
+    "wrapper",
+    [
+        OneHotPartialObsWrapper,
+        RGBImgPartialObsWrapper,
+        FullyObsWrapper,
+    ],
+)
+@pytest.mark.parametrize(
+    "env_spec", all_testing_env_specs, ids=[spec.id for spec in all_testing_env_specs]
+)
+def test_observation_space_wrappers(wrapper, env_spec):
+    env = wrapper(env_spec.make(disable_env_checker=True))
+    obs_space, wrapper_name = env.observation_space, wrapper.__name__
+    assert isinstance(
+        obs_space, gym.spaces.Dict
+    ), f"Observation space for {wrapper_name} is not a Dict: {obs_space}."
+    # This should not fail either
+    ImgObsWrapper(env)
+    env.reset()
+    env.step(0)
+    env.close()
+
+
+class EmptyEnvWithExtraObs(EmptyEnv):
+    """
+    Custom environment with an extra observation
+    """
+
+    def __init__(self) -> None:
+        super().__init__(size=5)
+        self.observation_space["size"] = gym.spaces.Box(
+            low=0, high=np.iinfo(np.uint).max, shape=(2,), dtype=np.uint
+        )
+
+    def reset(self, **kwargs):
+        obs = super().reset(**kwargs)
+        obs["size"] = np.array([self.width, self.height])
+        return obs
+
+    def step(self, action):
+        obs, reward, done, info = super().step(action)
+        obs["size"] = np.array([self.width, self.height])
+        return obs, reward, done, info
+
+
+@pytest.mark.parametrize(
+    "wrapper",
+    [
+        OneHotPartialObsWrapper,
+        RGBImgObsWrapper,
+        RGBImgPartialObsWrapper,
+        FullyObsWrapper,
+    ],
+)
+def test_agent_sees_method(wrapper):
+    env1 = wrapper(EmptyEnvWithExtraObs())
+    env2 = wrapper(gym.make("MiniGrid-Empty-5x5-v0"))
+
+    obs1 = env1.reset(seed=0)
+    obs2 = env2.reset(seed=0)
+    assert "size" in obs1
+    assert obs1["size"].shape == (2,)
+    assert (obs1["size"] == [5, 5]).all()
+    for key in obs2:
+        assert np.array_equal(obs1[key], obs2[key])
+
+    obs1, reward1, done1, _ = env1.step(0)
+    obs2, reward2, done2, _ = env2.step(0)
+    assert "size" in obs1
+    assert obs1["size"].shape == (2,)
+    assert (obs1["size"] == [5, 5]).all()
+    for key in obs2:
+        assert np.array_equal(obs1[key], obs2[key])
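
Note on the expected values in test_state_bonus_wrapper and test_action_bonus_wrapper above: the constants 1 / sqrt(13) and 1 / sqrt(10) are consistent with a count-based bonus of 1 / sqrt(N), where N is the number of visits to a key. The sketch below illustrates that arithmetic under the assumption that the state bonus is keyed on the agent position only. `VisitCountBonus` is a hypothetical stand-in and not the wrapper implementation.

```python
import math
from collections import defaultdict


class VisitCountBonus:
    """Return 1 / sqrt(N) for the N-th visit to a key (hypothetical helper)."""

    def __init__(self) -> None:
        self.counts = defaultdict(int)

    def bonus(self, key) -> float:
        self.counts[key] += 1
        return 1 / math.sqrt(self.counts[key])
```

In the state bonus test, the final cell is reached once in each of the 10 episodes and is then counted three more times while the agent turns in place, which gives N = 13. In the action bonus test, the fifth forward step is taken from the same state in each of the 10 episodes, which gives N = 10.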

+ 34 - 0
tests/utils.py

@@ -0,0 +1,34 @@
+"""Finds all the specs that we can test with"""
+import gym
+import numpy as np
+
+all_testing_env_specs = [
+    env_spec
+    for env_spec in gym.envs.registry.values()
+    if env_spec.entry_point.startswith("gym_minigrid.envs")
+]
+
+
+def assert_equals(a, b, prefix=None):
+    """Assert equality of data structures `a` and `b`.
+
+    Args:
+        a: first data structure
+        b: second data structure
+        prefix: prefix for failed assertion message for types and dicts
+    """
+    assert type(a) == type(b), f"{prefix}Differing types: {a} and {b}"
+    if isinstance(a, dict):
+        assert list(a.keys()) == list(b.keys()), f"{prefix}Key sets differ: {a} and {b}"
+
+        for k in a.keys():
+            v_a = a[k]
+            v_b = b[k]
+            assert_equals(v_a, v_b)
+    elif isinstance(a, np.ndarray):
+        np.testing.assert_array_equal(a, b)
+    elif isinstance(a, tuple):
+        for elem_from_a, elem_from_b in zip(a, b):
+            assert_equals(elem_from_a, elem_from_b)
+    else:
+        assert a == b