Latest commit: 6ded215f0b by Maxime Chevalier-Boisvert, "Added code to use Gym Monitor for video recording" (6 years ago)
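This is a PyTorch implementation of Advantage Actor-Critic (A2C), Proximal Policy Optimization (PPO), and Actor-Critic using Kronecker-Factored Trust Region (ACKTR).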
Also see the OpenAI posts: A2C/ACKTR and PPO for more information.
This implementation is inspired by the OpenAI baselines for A2C, ACKTR and PPO. It uses the same hyperparameters and the same model, since they were well tuned for Atari games.
I highly recommend PyBullet as a free open source alternative to MuJoCo for continuous control tasks.
All environments are operated through exactly the same Gym interface. See their documentation for a comprehensive list.
To install the requirements, run:
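As a rough illustration of what "the same Gym interface" means in practice, the sketch below drives a toy environment through the classic `reset()` / `step(action)` loop. `CountdownEnv` and `run_episode` are hypothetical stand-ins written for this example; they are not part of this repository or of Gym. The four-value return of `step` follows the older Gym convention, which is an assumption about the installed version.

```python
import random


class CountdownEnv:
    """Hypothetical toy environment that mimics the classic Gym API."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.steps_left = horizon

    def reset(self):
        # Start a new episode and return the first observation.
        self.steps_left = self.horizon
        return self.steps_left

    def step(self, action):
        # Advance one step; the reward is 1.0 when the action is 1, else 0.0.
        self.steps_left -= 1
        reward = 1.0 if action == 1 else 0.0
        done = self.steps_left == 0
        # Assumed older Gym convention: (observation, reward, done, info).
        return self.steps_left, reward, done, {}


def run_episode(env, policy):
    """Roll out one episode with `policy` and return the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _info = env.step(policy(obs))
        total += reward
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    print(run_episode(CountdownEnv(), lambda obs: rng.randint(0, 1)))
```

Because the loop only touches `reset` and `step`, any environment exposing those two methods could be swapped in without changing `run_episode`.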
# PyTorch
conda install pytorch torchvision -c soumith
# Baselines for Atari preprocessing
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
# Other requirements
pip install -r requirements.txt
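After running the install steps above, a quick check like the following can report which packages Python still cannot find. This is only a sketch: the list of import names (`torch`, `gym`, `baselines`, `visdom`) is an assumption based on the tools mentioned in this README, not a reading of `requirements.txt`, and `missing_modules` is a hypothetical helper.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; adjust them to match requirements.txt.
    wanted = ["torch", "gym", "baselines", "visdom"]
    missing = missing_modules(wanted)
    print("missing:", ", ".join(missing) if missing else "none")
```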
Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request. Also see a todo list below.
I'm also looking for volunteers to run all experiments on Atari and MuJoCo (with multiple random seeds).
It's extremely difficult to reproduce results for Reinforcement Learning methods. See "Deep Reinforcement Learning that Matters" for more information. I tried to reproduce the OpenAI results as closely as possible. However, major differences in performance can be caused even by minor differences between the TensorFlow and PyTorch libraries.
Start a Visdom server with `python -m visdom.server`; it will serve http://localhost:8097/ by default.
python main.py --env-name "PongNoFrameskip-v4"
python main.py --env-name "PongNoFrameskip-v4" --algo ppo --use-gae --lr 2.5e-4 --clip-param 0.1 --num-processes 8 --num-steps 128 --num-mini-batch 4 --vis-interval 1 --log-interval 1
python main.py --env-name "PongNoFrameskip-v4" --algo acktr --num-processes 32 --num-steps 20
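The PPO command above passes `--clip-param 0.1`, which presumably sets the clipping range in PPO's clipped surrogate objective. The sketch below writes that objective out per sample in plain Python, following the formula from the PPO paper. It is not this repository's code, and the function names are hypothetical.

```python
def clipped_surrogate(ratio, advantage, clip_param=0.1):
    """Per-sample PPO objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    # Restrict the probability ratio to [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_policy_loss(ratios, advantages, clip_param=0.1):
    """Negative mean of the clipped objective, suitable for minimization."""
    terms = [
        clipped_surrogate(r, a, clip_param)
        for r, a in zip(ratios, advantages)
    ]
    return -sum(terms) / len(terms)
```

Here `ratio` stands for the new policy's probability of the taken action divided by the old policy's probability. With a positive advantage the gain from raising the ratio is capped at `1 + clip_param`, which removes the incentive to move far from the old policy in a single update.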
python main.py --env-name "Reacher-v1" --num-stack 1 --num-frames 1000000
python main.py --env-name "Reacher-v1" --algo ppo --use-gae --vis-interval 1 --log-interval 1 --num-stack 1 --num-steps 2048 --num-processes 1 --lr 3e-4 --entropy-coef 0 --ppo-epoch 10 --num-mini-batch 32 --gamma 0.99 --tau 0.95 --num-frames 1000000
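The PPO command above combines `--use-gae` with `--gamma 0.99` and `--tau 0.95`. Assuming `--tau` plays the role of the lambda parameter in Generalized Advantage Estimation, the computation can be sketched in plain Python as follows. `gae_advantages` is a hypothetical helper for illustration and does not reproduce how `storage.py` computes returns.

```python
def gae_advantages(rewards, values, next_value, dones, gamma=0.99, tau=0.95):
    """Compute GAE advantages and returns for one rollout of length T.

    rewards[t], values[t] and dones[t] describe step t; dones[t] is True
    when the episode ended at step t. next_value is the value estimate
    for the state that follows the last step.
    """
    steps = len(rewards)
    advantages = [0.0] * steps
    gae = 0.0
    # Walk backwards so each step can reuse the accumulated future term.
    for t in reversed(range(steps)):
        mask = 0.0 if dones[t] else 1.0
        v_next = next_value if t == steps - 1 else values[t + 1]
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * v_next * mask - values[t]
        # Exponentially weighted sum of future TD errors.
        gae = delta + gamma * tau * mask * gae
        advantages[t] = gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

Setting `tau` to 0 reduces each advantage to the one-step TD error, while `tau` equal to 1 gives the discounted return minus the value baseline.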
ACKTR requires some modifications to be made specifically for MuJoCo. At the moment, however, I want to keep this code as unified as possible, so I'm looking for better ways to integrate it into the codebase.
Load a pretrained model from my Google Drive.
Pretrained models for other games are also available on request. Send me an email or create an issue, and I will upload them.
Disclaimer: I might have used different hyper-parameters to train these models.
python enjoy.py --load-dir trained_models/a2c --env-name "PongNoFrameskip-v4" --num-stack 4
python enjoy.py --load-dir trained_models/ppo --env-name "Reacher-v1" --num-stack 1
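The commands in this README use `--num-stack 4` for Pong and `--num-stack 1` for Reacher, which presumably controls how many consecutive observations are stacked into the policy input. A minimal sketch of that idea, using a hypothetical `FrameStack` helper (not this repository's implementation), could look like this. Filling the stack with the first observation on reset is a choice made for this example.

```python
from collections import deque


class FrameStack:
    """Keep the most recent `num_stack` observations, oldest first."""

    def __init__(self, num_stack):
        self.frames = deque(maxlen=num_stack)

    def reset(self, first_obs):
        # Fill the whole stack with the first observation of the episode.
        self.frames.clear()
        self.frames.extend([first_obs] * self.frames.maxlen)
        return list(self.frames)

    def push(self, obs):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(obs)
        return list(self.frames)
```

With `num_stack=1` the helper simply passes the latest observation through. That fits a task like Reacher if its observation already includes velocities, whereas a single Atari frame carries no motion information.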