Added environments to README

Maxime Chevalier-Boisvert, 7 years ago
Parent
Current commit
ea0e67e005
1 file changed, with 32 insertions and 10 deletions

+ 32 - 10
README.md

@@ -144,6 +144,21 @@ square the agent must get to. This environment is extremely difficult to
 solve using classical RL. However, by gradually increasing the number of
 rooms and building a curriculum, the environment can be solved.
 
+### Fetch environment
+
+Registered configurations:
+- `MiniGrid-Fetch-5x5-N2-v0`
+- `MiniGrid-Fetch-8x8-N3-v0`
+
+<p align="center">
+<img src="/figures/fetch-env.png" width=450>
+</p>
+
+This environment has multiple objects of assorted types and colors. The
+agent receives a textual string as part of its observation telling it
+which object to pick up. Picking up the wrong object produces a negative
+reward.
+
 ### Go-to-door environment
 
 Registered configurations:
@@ -160,20 +175,27 @@ receives a textual (mission) string as input, telling it which door to go to,
 (eg: "go to the red door"). It receives a positive reward for performing the
 `wait` action next to the correct door, as indicated in the mission string.
 
-### Fetch environment
+### Put-near environment
 
 Registered configurations:
-- `MiniGrid-Fetch-5x5-N2-v0`
-- `MiniGrid-Fetch-8x8-N3-v0`
+- `MiniGrid-PutNear-6x6-N2-v0`
+- `MiniGrid-PutNear-8x8-N3-v0`
 
-<p align="center">
-<img src="/figures/fetch-env.png" width=450>
-</p>
+The agent is instructed through a textual string to pick up an object and
+place it next to another object. This environment is easy to solve with two
+objects, but difficult to solve with more, as it involves both textual
+understanding and spatial reasoning involving multiple objects.
 
-This environment has multiple objects of assorted types and colors. The
-agent receives a textual string as part of its observation telling it
-which object to pick up. Picking up the wrong object produces a negative
-reward.
+### Locked room environment
+
+Registered configurations:
+- `MiniGrid-LockedRoom-v0`
+
+The environment has six rooms, one of which is locked. The agent receives
+a textual mission string as input, telling it which room to go to in order
+to get the key that opens the locked room. It then has to go into the locked
+room in order to reach the final goal. This environment is extremely difficult
+to solve with vanilla reinforcement learning alone.
 
 ### Four room question answering environment