@@ -3,7 +3,7 @@
 There are other gridworld Gym environments out there, but this one is
 designed to be particularly simple, lightweight and fast. The code has very few
 dependencies, making it less likely to break or fail to install. It loads no
-external sprites/textures, and it can run at up to 5800 FPS on a quad-core
+external sprites/textures, and it can run at up to 6000 FPS on a quad-core i7
 laptop, which means you can run your experiments faster. Batteries are
 included: a known-working RL implementation is supplied in this repository
 to help you get started.
@@ -72,21 +72,42 @@ python3 pytorch_rl/enjoy.py --env-name MiniGrid-Empty-6x6-v0 --load-dir ./traine
 
 ## Design
 
-The environment is partially observable and uses a compact and efficient
-encoding, with just 3 inputs per visible grid cell. It is also easy to
-produce an array of pixels for observations if desired.
-
-Each cell/tile in the grid world contains one object, each object has an
-associated discrete color. The objects currently supported are walls, doors,
-locked doors, keys, balls, boxes and a goal square. The basic version of the
-environment has 5 possible actions: turn left, turn right, move
-forward, pickup/toggle to interact with objects, and a wait/noop action. The
-agent can carry one carryable item at a time (eg: ball or key). By default,
-only sparse rewards for reaching the goal square are provided.
-
-Extending the environment with new object types and dynamics should be
-very easy. If you wish to do this, you should take a look at
-the [gym_minigrid/minigrid.py](gym_minigrid/minigrid.py) source file.
+MiniGrid is built to support tasks involving natural language and sparse
+rewards. The observations are dictionaries, with an 'image' field, a partially
+observable view of the environment, and a 'mission' field, a textual string
+describing the objective the agent should achieve to get a reward. Using
+dictionaries makes it easy for you to add additional information to
+observations if you need to, without having to force everything into a single
+tensor. If your RL code expects a tensor for observations, please take a look
+at `FlatObsWrapper` in
+[gym_minigrid/wrappers.py](/gym_minigrid/wrappers.py).
+
+The partially observable view of the environment uses a compact and efficient
+encoding, with just 3 input values per visible grid cell, 147 values total.
+If you want to obtain an array of RGB pixels instead, see the `getObsRender`
+method in [gym_minigrid/minigrid.py](gym_minigrid/minigrid.py).
+
+Structure of the world:
+- The world is an NxM grid of tiles
+- Each tile in the grid world contains zero or one object
+  - Cells that do not contain an object have the value `None`
+- Each object has an associated discrete color (string)
+- Each object has an associated type (string)
+  - Provided object types are: wall, door, locked_door, key, ball, box and goal
+- The agent can pick up and carry at most one object (eg: ball or key)
+
+Actions in the basic environment:
+- Turn left
+- Turn right
+- Move forward
+- Toggle (pick up or interact with objects)
+- Wait (noop, do nothing)
+
+By default, sparse rewards for reaching a goal square are provided, but you can
+define your own reward function by creating a class derived from MiniGridEnv.
+Extending the environment with new object types or actions should be very easy.
+If you wish to do this, you should take a look at the
+[gym_minigrid/minigrid.py](gym_minigrid/minigrid.py) source file.
				  
 ## Included Environments
 
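The dictionary-observation design added in this diff can be illustrated without installing the package. The sketch below is a standalone mock, not the library's actual code: the 7x7 view size and the mission string are assumptions chosen so the per-cell arithmetic matches the 147-value figure quoted above.

```python
# Hypothetical standalone sketch of the observation layout; the real
# encoding lives in gym_minigrid/minigrid.py.
VIEW_SIZE = 7        # visible cells per side (assumed default)
VALUES_PER_CELL = 3  # per-cell values in the compact encoding

image = [[(0, 0, 0) for _ in range(VIEW_SIZE)] for _ in range(VIEW_SIZE)]
obs = {"image": image, "mission": "get to the green goal square"}

# 7 * 7 * 3 = 147 values total, matching the figure in the README text.
flat = [v for row in obs["image"] for cell in row for v in cell]
print(len(flat))  # 147
```

RL code that expects a single tensor would consume something like `flat`; the real `FlatObsWrapper` additionally encodes the mission string alongside the image values.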