
Moved rl code to pytorch-rl. Fixed warnings. Fixed issue w/ flat obs.

Maxime Chevalier-Boisvert 7 years ago
parent
commit
80b3178610

+ 6 - 6
README.md

@@ -28,10 +28,10 @@ pip3 install -e .
 ```
 
 Optionally, if you wish to use the reinforcement learning code included
-under [/basicrl](/basicrl), you can install its dependencies as follows:
+under [/pytorch-rl](/pytorch-rl), you can install its dependencies as follows:
 
 ```
-cd basicrl
+cd pytorch-rl
 
 # PyTorch
 conda install pytorch torchvision -c soumith
@@ -49,7 +49,7 @@ cd ..
 pip3 install -r requirements.txt
 ```
 
-Note: the basicrl code is a custom fork of [this repository](https://github.com/ikostrikov/pytorch-a2c-ppo-acktr),
+Note: the pytorch-rl code is a custom fork of [this repository](https://github.com/ikostrikov/pytorch-a2c-ppo-acktr),
 which was modified to work with this environment.
 
 ## Basic Usage
@@ -66,17 +66,17 @@ The environment being run can be selected with the `--env-name` option, eg:
 ./standalone.py --env-name MiniGrid-Empty-8x8-v0
 ```
 
-Basic reinforcement learning code is provided in the `basicrl` subdirectory.
+Basic reinforcement learning code is provided in the `pytorch-rl` subdirectory.
 You can perform training using the ACKTR algorithm with:
 
 ```
-python3 basicrl/main.py --env-name MiniGrid-Empty-6x6-v0 --no-vis --num-processes 32 --algo acktr
+python3 pytorch-rl/main.py --env-name MiniGrid-Empty-6x6-v0 --no-vis --num-processes 32 --algo acktr
 ```
 
 You can view the result of training using the `enjoy.py` script:
 
 ```
-python3 basicrl/enjoy.py --env-name MiniGrid-Empty-6x6-v0 --load-dir ./trained_models/acktr
+python3 pytorch-rl/enjoy.py --env-name MiniGrid-Empty-6x6-v0 --load-dir ./trained_models/acktr
 ```
 
 ## Design

basicrl/LICENSE → pytorch-rl/LICENSE


basicrl/README.md → pytorch-rl/README.md


basicrl/arguments.py → pytorch-rl/arguments.py


+ 3 - 3
basicrl/distributions.py

@@ -19,7 +19,7 @@ class Categorical(nn.Module):
     def sample(self, x, deterministic):
         x = self(x)
 
-        probs = F.softmax(x)
+        probs = F.softmax(x, dim=1)
         if deterministic is False:
             action = probs.multinomial()
         else:
@@ -29,8 +29,8 @@ class Categorical(nn.Module):
     def logprobs_and_entropy(self, x, actions):
         x = self(x)
 
-        log_probs = F.log_softmax(x)
-        probs = F.softmax(x)
+        log_probs = F.log_softmax(x, dim=1)
+        probs = F.softmax(x, dim=1)
 
         action_log_probs = log_probs.gather(1, actions)
 

basicrl/enjoy.py → pytorch-rl/enjoy.py


basicrl/envs.py → pytorch-rl/envs.py


basicrl/imgs/a2c_beamrider.png → pytorch-rl/imgs/a2c_beamrider.png


basicrl/imgs/a2c_breakout.png → pytorch-rl/imgs/a2c_breakout.png


basicrl/imgs/a2c_qbert.png → pytorch-rl/imgs/a2c_qbert.png


basicrl/imgs/a2c_seaquest.png → pytorch-rl/imgs/a2c_seaquest.png


basicrl/imgs/acktr_beamrider.png → pytorch-rl/imgs/acktr_beamrider.png


basicrl/imgs/acktr_breakout.png → pytorch-rl/imgs/acktr_breakout.png


basicrl/imgs/acktr_qbert.png → pytorch-rl/imgs/acktr_qbert.png


basicrl/imgs/acktr_seaquest.png → pytorch-rl/imgs/acktr_seaquest.png


basicrl/imgs/ppo_halfcheetah.png → pytorch-rl/imgs/ppo_halfcheetah.png


basicrl/imgs/ppo_hopper.png → pytorch-rl/imgs/ppo_hopper.png


basicrl/imgs/ppo_reacher.png → pytorch-rl/imgs/ppo_reacher.png


basicrl/imgs/ppo_walker.png → pytorch-rl/imgs/ppo_walker.png


basicrl/kfac.py → pytorch-rl/kfac.py


+ 12 - 2
basicrl/main.py

@@ -64,8 +64,10 @@ def main():
     else:
         envs = DummyVecEnv(envs)
 
-    if len(envs.observation_space.shape) == 1:
-        envs = VecNormalize(envs)
+    # Maxime: commented this out because it very much changes the behavior
+    # of the code for seemingly arbitrary reasons
+    #if len(envs.observation_space.shape) == 1:
+    #    envs = VecNormalize(envs)
 
     obs_shape = envs.observation_space.shape
     obs_shape = (obs_shape[0] * args.num_stack, *obs_shape[1:])
@@ -79,6 +81,14 @@ def main():
             "Recurrent policy is not implemented for the MLP controller"
         actor_critic = MLPPolicy(obs_numel, envs.action_space)
 
+    # Maxime: log some info about the model and its size
+    modelSize = 0
+    for p in actor_critic.parameters():
+        pSize = reduce(operator.mul, p.size(), 1)
+        modelSize += pSize
+    print(str(actor_critic))
+    print('Total model size: %d' % modelSize)
+
     if envs.action_space.__class__.__name__ == "Discrete":
         action_shape = 1
     else:
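
The model-size logging added above relies on `reduce` and `operator.mul`, which assumes `from functools import reduce` and `import operator` are available elsewhere in `main.py` (they are not shown in this hunk). Here is a self-contained sketch of the same counting logic, using a hypothetical stand-in model rather than the repository's `MLPPolicy`:

```
import operator
from functools import reduce

import torch.nn as nn

# Toy stand-in model with illustrative layer sizes; the repository
# constructs an MLPPolicy at this point instead.
actor_critic = nn.Sequential(nn.Linear(147, 64), nn.Tanh(), nn.Linear(64, 7))

# Same counting logic as the lines added to main.py: multiply out each
# parameter's shape and sum over all parameters.
model_size = 0
for p in actor_critic.parameters():
    p_size = reduce(operator.mul, p.size(), 1)
    model_size += p_size

print(str(actor_critic))
print('Total model size: %d' % model_size)
# Equivalent shortcut: sum(p.numel() for p in actor_critic.parameters())
```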

basicrl/model.py → pytorch-rl/model.py


basicrl/requirements.txt → pytorch-rl/requirements.txt


basicrl/storage.py → pytorch-rl/storage.py


basicrl/utils.py → pytorch-rl/utils.py


basicrl/visualize.py → pytorch-rl/visualize.py