+<h1>MiniGrid-docs<a class="headerlink" href="#minigrid-docs" title="Permalink to this headline">#</a></h1>
+<p>This repo contains the <span class="xref myst">NEW website</span> for <a class="reference external" href="https://github.com/Farama-Foundation/MiniGrid">MiniGrid</a>. This site is currently in Beta and we are in the process of adding/editing information.</p>
+<p>The documentation uses Sphinx. However, the pages are written in plain Markdown (MyST), not reStructuredText.</p>
+<p>If you are modifying a non-environment page, please open a pull request against this repo directly. Otherwise, follow the steps below:</p>
+<h2>Instructions for modifying environment pages<a class="headerlink" href="#instructions-for-modifying-environment-pages" title="Permalink to this headline">#</a></h2>
+<section id="editing-an-environment-page">
+<h3>Editing an environment page<a class="headerlink" href="#editing-an-environment-page" title="Permalink to this headline">#</a></h3>
+<p>Fork <a class="reference external" href="https://github.com/Farama-Foundation/MiniGrid">MiniGrid</a> and edit the docstring in the environment’s Python file. Then, pip install your fork and run <code class="docutils literal notranslate"><span class="pre">docs/scripts/gen_mds.py</span></code> in this repo. This will automatically generate an md documentation file for the environment.</p>
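+<p>Concretely, the regeneration step might look like this; the local fork path below is hypothetical:</p>

```shell
# Install your fork of the environment package (hypothetical path),
# then regenerate the environment md pages from the docstrings.
pip install -e /path/to/your/fork
python docs/scripts/gen_mds.py  # run from the root of this docs repo
```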
+</section>
+</section>
+<section id="build-the-documentation">
+<h2>Build the Documentation<a class="headerlink" href="#build-the-documentation" title="Permalink to this headline">#</a></h2>
+<p>Install the required packages and Gym (or your fork):</p>
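+<p>The command block appears to have been lost in conversion. A hedged reconstruction; the requirements file name is an assumption:</p>

```shell
# Hypothetical file name -- check the repo root for the actual requirements file.
pip install -r requirements.txt
pip install gym  # or: pip install -e /path/to/your/gym-fork
```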
+<h1>Wrappers<a class="headerlink" href="#wrappers" title="Permalink to this headline">#</a></h1>
+<p>MiniGrid is built to support tasks involving natural language and sparse rewards.
+Observations are dictionaries with an ‘image’ field containing a partially observable
+view of the environment, a ‘mission’ field containing a textual string that
+describes the objective the agent should reach to get a reward, and a ‘direction’
+field that can be used as an optional compass. Using dictionaries makes it
+easy to add extra information to observations when you need to,
+without having to encode everything into a single tensor.</p>
+<p>A variety of wrappers for changing the observation format are available in <span class="xref myst">minigrid/wrappers.py</span>.
+If your RL code expects a single tensor for observations, take a look at <code class="docutils literal notranslate"><span class="pre">FlatObsWrapper</span></code>.
+There is also an <code class="docutils literal notranslate"><span class="pre">ImgObsWrapper</span></code> that removes the ‘mission’ field from observations, leaving only the image tensor.</p>
+<p>Please note that the default observation format is a partially observable view of the environment using a
+compact and efficient encoding, with 3 input values per visible grid cell, 7x7x3 values total.
+These values are <strong>not pixels</strong>. If you want to obtain an array of RGB pixels as observations instead,
+use the <code class="docutils literal notranslate"><span class="pre">RGBImgPartialObsWrapper</span></code>. You can use it as follows:</p>
+<h1>Installation<a class="headerlink" href="#installation" title="Permalink to this headline">#</a></h1>
+<p>There is now a <a class="reference external" href="https://pypi.org/project/minigrid/">pip package</a> available, which is updated periodically:</p>
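+<p>The install command itself appears to have been dropped during conversion; given the package linked above, it is presumably:</p>

```shell
pip install minigrid
```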
+<p>Alternatively, to get the latest version of MiniGrid, you can clone this repository and install the dependencies with <code class="docutils literal notranslate"><span class="pre">pip3</span></code>:</p>
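+<p>The commands here were likewise lost; a likely reconstruction (the editable-install flag is an assumption):</p>

```shell
git clone https://github.com/Farama-Foundation/MiniGrid.git
cd MiniGrid
pip3 install -e .
```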
+<h2>List of publications & submissions using MiniGrid or BabyAI (please open a pull request to add missing entries):<a class="headerlink" href="#list-of-publications-submissions-using-minigrid-or-babyai-please-open-a-pull-request-to-add-missing-entries" title="Permalink to this headline">#</a></h2>
+<ul class="simple">
+<li><p><a class="reference external" href="https://proceedings.mlr.press/v162/paischer22a.html">History Compression via Language Models in Reinforcement Learning.</a> (Johannes Kepler University Linz, PMLR 2022)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/abs/2202.02886">Leveraging Approximate Symbolic Models for Reinforcement Learning via Skill Diversity</a> (Arizona State University, ICML 2022)</p></li>
+<li><p><a class="reference external" href="https://proceedings.mlr.press/v162/mavor-parker22a.html">How to Stay Curious while avoiding Noisy TVs using Aleatoric Uncertainty Estimation</a> (University College London, Boston University, ICML 2022)</p></li>
+<li><p><a class="reference external" href="https://openreview.net/pdf?id=rUwm9wCjURV">In a Nutshell, the Human Asked for This: Latent Goals for Following Temporal Specifications</a> (Imperial College London, ICLR 2022)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/abs/2111.04894">Safe Policy Optimization with Local Generalized Linear Function Approximations</a> (IBM Research, Tsinghua University, NeurIPS 2021)</p></li>
+<li><p><a class="reference external" href="https://papers.nips.cc/paper/2020/file/ec3183a7f107d1b8dbb90cb3c01ea7d5-Paper.pdf">Information-theoretic Task Selection for Meta-Reinforcement Learning</a> (University of Leeds, NeurIPS 2020)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/pdf/2012.08621.pdf">BeBold: Exploration Beyond the Boundary of Explored Regions</a> (UCB, December 2020)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/abs/2010.08843">Approximate Information State for Approximate Planning and Reinforcement Learning in Partially Observed Systems</a> (McGill, October 2020)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/pdf/2010.03934.pdf">Prioritized Level Replay</a> (FAIR, October 2020)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/pdf/2008.12760.pdf">AllenAct: A Framework for Embodied AI Research</a> (Allen Institute for AI, August 2020)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/pdf/1912.05525.pdf">Learning to Request Guidance in Emergent Communication</a> (University of Amsterdam, Dec 2019)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/abs/1911.07141">Working Memory Graphs</a> (MSR, Nov 2019)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/pdf/1910.04040.pdf">Fast Task-Adaptation for Tasks Labeled Using Natural Language in Reinforcement Learning</a> (Oct 2019, University of Antwerp)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/abs/1910.12911">Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck</a> (MSR, NeurIPS, Oct 2019)</p></li>
+<li><p><a class="reference external" href="http://surl.tirl.info/proceedings/SURL-2019_paper_10.pdf">Learning Effective Subgoals with Multi-Task Hierarchical Reinforcement Learning</a> (Tsinghua University, August 2019)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/abs/1908.05135">Mastering emergent language: learning to guide in simulated navigation</a> (University of Amsterdam, Aug 2019)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/abs/1906.03574">Transfer Learning by Modeling a Distribution over Policies</a> (Mila, June 2019)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/abs/1906.10667">Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives</a> (Mila, June 2019)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/abs/1905.11589">Learning distant cause and effect using only local and immediate credit assignment</a> (Incubator 491, May 2019)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/abs/1904.04700">Practical Open-Loop Optimistic Planning</a> (INRIA, April 2019)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/abs/1907.00664">Learning World Graphs to Accelerate Hierarchical Reinforcement Learning</a> (Salesforce Research, 2019)</p></li>
+<li><p><a class="reference external" href="https://mila.quebec/wp-content/uploads/2019/05/WebPage.pdf">Variational State Encoding as Intrinsic Motivation in Reinforcement Learning</a> (Mila, TARL 2019)</p></li>
+<li><p><a class="reference external" href="https://tarl2019.github.io/assets/papers/modhe2019unsupervised.pdf">Unsupervised Discovery of Decision States Through Intrinsic Control</a> (Georgia Tech, TARL 2019)</p></li>
+<li><p><a class="reference external" href="https://openreview.net/forum?id=SkgQBn0cF7">Modeling the Long Term Future in Model-Based Reinforcement Learning</a> (Mila, ICLR 2019)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/pdf/1902.10646.pdf">Unifying Ensemble Methods for Q-learning via Social Choice Theory</a> (Max Planck Institute, Feb 2019)</p></li>
+<li><p><a class="reference external" href="https://personalrobotics.cs.washington.edu/workshops/mlmp2018/assets/docs/18_CameraReadySubmission.pdf">Planning Beyond The Sensing Horizon Using a Learned Context</a> (MLMP@IROS, 2018)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/abs/1811.07882">Guiding Policies with Language via Meta-Learning</a> (UC Berkeley, Nov 2018)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/abs/1811.06889">On the Complexity of Exploration in Goal-Driven Navigation</a> (CMU, NeurIPS, Nov 2018)</p></li>
+<li><p><a class="reference external" href="https://openreview.net/forum?id=rJg8yhAqKm">Transfer and Exploration via the Information Bottleneck</a> (Mila, Nov 2018)</p></li>
+<li><p><a class="reference external" href="https://gupea.ub.gu.se/bitstream/2077/62445/1/gupea_2077_62445_1.pdf">Creating safer reward functions for reinforcement learning agents in the gridworld</a> (University of Gothenburg, 2018)</p></li>
+<li><p><a class="reference external" href="https://arxiv.org/abs/1810.08272">BabyAI: First Steps Towards Grounded Language Learning With a Human In the Loop</a> (Mila, ICLR, Oct 2018)</p></li>
+</ul>
+<p>This environment has been built as part of work done at <a class="reference external" href="https://mila.quebec">Mila</a>. The Dynamic obstacles environment has been added as part of work done at <a class="reference external" href="https://www.ias.informatik.tu-darmstadt.de/">IAS in TU Darmstadt</a> and the University of Genoa for mobile robot navigation with dynamic obstacles.</p>
+</section>
+<section id="rewards">
+<h2>Rewards<a class="headerlink" href="#rewards" title="Permalink to this headline">#</a></h2>
+<p>A reward of ‘1’ is given for success, and ‘0’ for failure.</p>
+</section>
+<section id="termination">
+<h2>Termination<a class="headerlink" href="#termination" title="Permalink to this headline">#</a></h2>
+<p>The episode ends if any one of the following conditions is met:</p>
+<ol class="arabic simple">
+<li><p>The agent picks up the correct box.</p></li>
+<li><p>Timeout (see <code class="docutils literal notranslate"><span class="pre">max_steps</span></code>).</p></li>
+</ol>
+</section>
+<section id="registered-configurations">
+<h2>Registered Configurations<a class="headerlink" href="#registered-configurations" title="Permalink to this headline">#</a></h2>
+<ul class="simple">
+<li><p>The agent can pick up and carry exactly one object (e.g., a ball or a key)</p></li>
+<li><p>To open a locked door, the agent has to be carrying a key matching the door’s color</p></li>
+</ul>
+<p>Actions in the basic environment:</p>
+<ul class="simple">
+<li><p>Turn left</p></li>
+<li><p>Turn right</p></li>
+<li><p>Move forward</p></li>
+<li><p>Pick up an object</p></li>
+<li><p>Drop the object being carried</p></li>
+<li><p>Toggle (open doors, interact with objects)</p></li>
+<li><p>Done (task completed, optional)</p></li>
+</ul>
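+<p>These actions correspond to integer indices in the order listed. A sketch of the mapping, assumed to match MiniGrid’s action enum at the time of writing (verify against your installed version):</p>

```python
from enum import IntEnum

class Actions(IntEnum):
    """Action indices in the order listed above (assumed to match MiniGrid's enum)."""
    left = 0      # turn left
    right = 1     # turn right
    forward = 2   # move forward
    pickup = 3    # pick up an object
    drop = 4      # drop the object being carried
    toggle = 5    # open doors / interact with objects
    done = 6      # signal task completion (optional)

# An agent policy passes one of these indices to env.step(), e.g. env.step(Actions.forward)
print(int(Actions.toggle))  # -> 5
```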
+<p>Default tile/observation encoding:</p>
+<ul class="simple">
+<li><p>Each tile is encoded as a 3 dimensional tuple: <code class="docutils literal notranslate"><span class="pre">(OBJECT_IDX,</span> <span class="pre">COLOR_IDX,</span> <span class="pre">STATE)</span></code></p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">OBJECT_TO_IDX</span></code> and <code class="docutils literal notranslate"><span class="pre">COLOR_TO_IDX</span></code> mapping can be found in <span class="xref myst">minigrid/minigrid.py</span></p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">STATE</span></code> refers to the door state with 0=open, 1=closed and 2=locked</p></li>
+</ul>
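+<p>For illustration, a sketch of decoding one tile triple. The dictionaries below are an assumed excerpt of the constants in <span class="xref myst">minigrid/minigrid.py</span>; verify them against your installed version:</p>

```python
# Assumed excerpts of the mappings in minigrid/minigrid.py -- check your version.
OBJECT_TO_IDX = {"unseen": 0, "empty": 1, "wall": 2, "floor": 3, "door": 4,
                 "key": 5, "ball": 6, "box": 7, "goal": 8, "lava": 9, "agent": 10}
COLOR_TO_IDX = {"red": 0, "green": 1, "blue": 2, "purple": 3, "yellow": 4, "grey": 5}
STATE_NAMES = {0: "open", 1: "closed", 2: "locked"}  # door states, as described above

IDX_TO_OBJECT = {v: k for k, v in OBJECT_TO_IDX.items()}
IDX_TO_COLOR = {v: k for k, v in COLOR_TO_IDX.items()}

def describe_tile(encoding):
    """Render an (OBJECT_IDX, COLOR_IDX, STATE) triple as a readable string."""
    obj_idx, color_idx, state = encoding
    name = IDX_TO_OBJECT[obj_idx]
    desc = f"{IDX_TO_COLOR[color_idx]} {name}"
    if name == "door":  # the STATE slot is only meaningful for doors
        desc += f" ({STATE_NAMES[state]})"
    return desc

print(describe_tile((4, 4, 2)))  # -> yellow door (locked)
```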
+<p>By default, sparse rewards are given for reaching a green goal tile. A
+reward of 1 is given for success, and 0 for failure. There is also an
+environment-specific time-step limit for completing the task.
+You can define your own reward function by creating a class derived
+from <code class="docutils literal notranslate"><span class="pre">MiniGridEnv</span></code>. Extending the environment with new object types or new actions
+should be easy. If you wish to do this, take a look at the
+<span class="xref myst">minigrid/minigrid.py</span> source file.</p>
+</section>
+<section id="rewards">
+<h2>Rewards<a class="headerlink" href="#rewards" title="Permalink to this headline">#</a></h2>
+<p>A reward of ‘1’ is given for success, and ‘0’ for failure.</p>
+</section>
+<section id="termination">
+<h2>Termination<a class="headerlink" href="#termination" title="Permalink to this headline">#</a></h2>
+<p>The episode ends if any one of the following conditions is met:</p>
+<ol class="arabic simple">
+<li><p>The agent reaches the goal.</p></li>
+<li><p>The agent falls into lava.</p></li>
+<li><p>Timeout (see <code class="docutils literal notranslate"><span class="pre">max_steps</span></code>).</p></li>
+</ol>
+</section>
+<section id="registered-configurations">
+<h2>Registered Configurations<a class="headerlink" href="#registered-configurations" title="Permalink to this headline">#</a></h2>
+</section>
+<section id="rewards">
+<h2>Rewards<a class="headerlink" href="#rewards" title="Permalink to this headline">#</a></h2>
+<p>A reward of ‘1’ is given for success, and ‘0’ for failure.</p>
+</section>
+<section id="termination">
+<h2>Termination<a class="headerlink" href="#termination" title="Permalink to this headline">#</a></h2>
+<p>The episode ends if any one of the following conditions is met:</p>
+<ol class="arabic simple">
+<li><p>The agent reaches the goal.</p></li>
+<li><p>Timeout (see <code class="docutils literal notranslate"><span class="pre">max_steps</span></code>).</p></li>
+</ol>
+</section>
+<section id="registered-configurations">
+<h2>Registered Configurations<a class="headerlink" href="#registered-configurations" title="Permalink to this headline">#</a></h2>
+</section>
+<section id="rewards">
+<h2>Rewards<a class="headerlink" href="#rewards" title="Permalink to this headline">#</a></h2>
+<p>A reward of ‘1’ is given for success, and ‘0’ for failure. A ‘-1’ penalty is
+subtracted if the agent collides with an obstacle.</p>
+</section>
+<section id="termination">
+<h2>Termination<a class="headerlink" href="#termination" title="Permalink to this headline">#</a></h2>
+<p>The episode ends if any one of the following conditions is met:</p>
+<ol class="arabic simple">
+<li><p>The agent reaches the goal.</p></li>
+<li><p>The agent collides with an obstacle.</p></li>
+<li><p>Timeout (see <code class="docutils literal notranslate"><span class="pre">max_steps</span></code>).</p></li>
+</ol>
+</section>
+<section id="registered-configurations">
+<h2>Registered Configurations<a class="headerlink" href="#registered-configurations" title="Permalink to this headline">#</a></h2>
+</section>
+<section id="rewards">
+<h2>Rewards<a class="headerlink" href="#rewards" title="Permalink to this headline">#</a></h2>
+<p>A reward of ‘1’ is given for success, and ‘0’ for failure.</p>
+</section>
+<section id="termination">
+<h2>Termination<a class="headerlink" href="#termination" title="Permalink to this headline">#</a></h2>
+<p>The episode ends if any one of the following conditions is met:</p>
+<ol class="arabic simple">
+<li><p>The agent reaches the goal.</p></li>
+<li><p>Timeout (see <code class="docutils literal notranslate"><span class="pre">max_steps</span></code>).</p></li>
+</ol>
+</section>
+<section id="registered-configurations">
+<h2>Registered Configurations<a class="headerlink" href="#registered-configurations" title="Permalink to this headline">#</a></h2>
+</section>
+<section id="rewards">
+<h2>Rewards<a class="headerlink" href="#rewards" title="Permalink to this headline">#</a></h2>
+<p>A reward of ‘1’ is given for success, and ‘0’ for failure.</p>
+</section>
+<section id="termination">
+<h2>Termination<a class="headerlink" href="#termination" title="Permalink to this headline">#</a></h2>
+<p>The episode ends if any one of the following conditions is met:</p>
+<ol class="arabic simple">
+<li><p>The agent picks up the correct object.</p></li>
+<li><p>The agent picks up the wrong object.</p></li>
+<li><p>Timeout (see <code class="docutils literal notranslate"><span class="pre">max_steps</span></code>).</p></li>
+</ol>
+</section>
+<section id="registered-configurations">
+<h2>Registered Configurations<a class="headerlink" href="#registered-configurations" title="Permalink to this headline">#</a></h2>
+</section>
+<section id="rewards">
+<h2>Rewards<a class="headerlink" href="#rewards" title="Permalink to this headline">#</a></h2>
+<p>A reward of ‘1’ is given for success, and ‘0’ for failure.</p>
+</section>
+<section id="termination">
+<h2>Termination<a class="headerlink" href="#termination" title="Permalink to this headline">#</a></h2>
+<p>The episode ends if any one of the following conditions is met:</p>
+<ol class="arabic simple">
+<li><p>The agent reaches the goal.</p></li>
+<li><p>Timeout (see <code class="docutils literal notranslate"><span class="pre">max_steps</span></code>).</p></li>
+</ol>
+</section>
+<section id="registered-configurations">
+<h2>Registered Configurations<a class="headerlink" href="#registered-configurations" title="Permalink to this headline">#</a></h2>
+</section>
+<section id="rewards">
+<h2>Rewards<a class="headerlink" href="#rewards" title="Permalink to this headline">#</a></h2>
+<p>A reward of ‘1’ is given for success, and ‘0’ for failure.</p>
+</section>
+<section id="termination">
+<h2>Termination<a class="headerlink" href="#termination" title="Permalink to this headline">#</a></h2>
+<p>The episode ends if any one of the following conditions is met:</p>
+<ol class="arabic simple">
+<li><p>The agent stands next to the correct door and performs the <code class="docutils literal notranslate"><span class="pre">done</span></code> action.</p></li>
+<li><p>Timeout (see <code class="docutils literal notranslate"><span class="pre">max_steps</span></code>).</p></li>
+</ol>
+</section>
+<section id="registered-configurations">
+<h2>Registered Configurations<a class="headerlink" href="#registered-configurations" title="Permalink to this headline">#</a></h2>
+</section>
+<section id="rewards">
+<h2>Rewards<a class="headerlink" href="#rewards" title="Permalink to this headline">#</a></h2>
+<p>A reward of ‘1’ is given for success, and ‘0’ for failure.</p>
+</section>
+<section id="termination">
+<h2>Termination<a class="headerlink" href="#termination" title="Permalink to this headline">#</a></h2>
+<p>The episode ends if any one of the following conditions is met:</p>
+<ol class="arabic simple">
+<li><p>The agent picks up the correct object.</p></li>
+<li><p>Timeout (see <code class="docutils literal notranslate"><span class="pre">max_steps</span></code>).</p></li>
+</ol>
+</section>
+<section id="registered-configurations">
+<h2>Registered Configurations<a class="headerlink" href="#registered-configurations" title="Permalink to this headline">#</a></h2>
+</section>
+<section id="rewards">
+<h2>Rewards<a class="headerlink" href="#rewards" title="Permalink to this headline">#</a></h2>
+<p>A reward of ‘1’ is given for success, and ‘0’ for failure.</p>
+</section>
+<section id="termination">
+<h2>Termination<a class="headerlink" href="#termination" title="Permalink to this headline">#</a></h2>
+<p>The episode ends if any one of the following conditions is met:</p>
+<ol class="arabic simple">
+<li><p>The agent reaches the goal.</p></li>
+<li><p>The agent falls into lava.</p></li>
+<li><p>Timeout (see <code class="docutils literal notranslate"><span class="pre">max_steps</span></code>).</p></li>
+</ol>
+</section>
+<section id="registered-configurations">
+<h2>Registered Configurations<a class="headerlink" href="#registered-configurations" title="Permalink to this headline">#</a></h2>
+</section>
+<section id="rewards">
+<h2>Rewards<a class="headerlink" href="#rewards" title="Permalink to this headline">#</a></h2>
+<p>A reward of ‘1’ is given for success, and ‘0’ for failure.</p>
+</section>
+<section id="termination">
+<h2>Termination<a class="headerlink" href="#termination" title="Permalink to this headline">#</a></h2>
+<p>The episode ends if any one of the following conditions is met:</p>
+<ol class="arabic simple">
+<li><p>The agent reaches the goal.</p></li>
+<li><p>Timeout (see <code class="docutils literal notranslate"><span class="pre">max_steps</span></code>).</p></li>
+</ol>
+</section>
+<section id="registered-configurations">
+<h2>Registered Configurations<a class="headerlink" href="#registered-configurations" title="Permalink to this headline">#</a></h2>
+</section>
+<section id="rewards">
+<h2>Rewards<a class="headerlink" href="#rewards" title="Permalink to this headline">#</a></h2>
+<p>A reward of ‘1’ is given for success, and ‘0’ for failure.</p>
+</section>
+<section id="termination">
+<h2>Termination<a class="headerlink" href="#termination" title="Permalink to this headline">#</a></h2>
+<p>The episode ends if any one of the following conditions is met:</p>
+<ol class="arabic simple">
+<li><p>The agent reaches the correct matching object.</p></li>
+<li><p>The agent reaches the wrong matching object.</p></li>
+<li><p>Timeout (see <code class="docutils literal notranslate"><span class="pre">max_steps</span></code>).</p></li>
+</ol>
+</section>
+<section id="registered-configurations">
+<h2>Registered Configurations<a class="headerlink" href="#registered-configurations" title="Permalink to this headline">#</a></h2>
+</section>
+<section id="rewards">
+<h2>Rewards<a class="headerlink" href="#rewards" title="Permalink to this headline">#</a></h2>
+<p>A reward of ‘1’ is given for success, and ‘0’ for failure.</p>
+</section>
+<section id="termination">
+<h2>Termination<a class="headerlink" href="#termination" title="Permalink to this headline">#</a></h2>
+<p>The episode ends if any one of the following conditions is met:</p>
+<ol class="arabic simple">
+<li><p>The agent reaches the goal.</p></li>
+<li><p>Timeout (see <code class="docutils literal notranslate"><span class="pre">max_steps</span></code>).</p></li>
+</ol>
+</section>
+<section id="registered-configurations">
+<h2>Registered Configurations<a class="headerlink" href="#registered-configurations" title="Permalink to this headline">#</a></h2>
+<p>S: size of the map (SxS).
+N: number of rooms.</p>
+<ul class="simple">
+<li><p><code class="docutils literal notranslate"><span class="pre">MiniGrid-MultiRoom-N2-S4-v0</span></code> (two small rooms)</p></li>