@@ -4,8 +4,8 @@ This example builds a singularity container for [Facebook Research's PyTorch GAN
Zoo](https://github.com/facebookresearch/pytorch_GAN_zoo).

The singularity container will allow you to call all the scripts from the
-project and includes are requirements. The container supports CUDA versions
-10.1, 10.2 and 11.1 on the host.
+project and includes all requirements. The container supports CUDA version 11.1
+on the host.

## Building

@@ -36,16 +36,23 @@ singularity exec pytorch_GAN_zoo.sif eval.py
Any flags or command line arguments can be declared after the script name.
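+For example, a script's usage summary can be printed as follows (a minimal
+illustration; it assumes the scripts use standard Python argument parsing, so
+`--help` lists the available options):
+
+```bash
+singularity exec pytorch_GAN_zoo.sif eval.py --help
+```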

When training, you will need to supply the `--nv` flag to singularity so that
-the host GPU may be used. You will also need to select a singularity app, using
-the `--app` flag to select the appropriate CUDA version. The available apps are
-`cu101`, `cu102`, and `cu111` for CUDA 10.1, 10.2 and 11.1 respectively.
+the host GPU may be used.
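+
+A quick way to check that the GPU is visible inside the container is to run
+`nvidia-smi` (a hedged sketch; this assumes the NVIDIA driver utilities are
+installed on the host, which `--nv` binds into the container):
+
+```bash
+singularity exec --nv pytorch_GAN_zoo.sif nvidia-smi
+```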

-For example, to pre-process the dtd dataset and train a PGAN model on a host
-with CUDA 10.2 you could run the following commands.
+### Multiple GPUs
+
+PyTorch GAN Zoo natively supports [parallelisation across multiple
+GPUs](https://github.com/facebookresearch/pytorch_GAN_zoo/issues/57). The
+devices to use can be selected using the `CUDA_VISIBLE_DEVICES` environment
+variable. CUDA-compatible GPUs are numbered from zero. For example, to use the
+first and third CUDA accelerators you would set `CUDA_VISIBLE_DEVICES=0,2`.
+
+To pass this environment variable to singularity, the `--env-file` flag must be
+used, as [passing environment variables with commas is not supported by the
+`--env` flag](https://github.com/apptainer/singularity/issues/6088).

```bash
-singularity exec --app cu102 pytorch_GAN_zoo.sif datasets.py dtd <path to dtd dataset>/images/
-singularity exec --nv --app cu102 pytorch_GAN_zoo.sif train.py PGAN -c config_dtd.json --restart --no_vis -n dtd
+echo 'CUDA_VISIBLE_DEVICES=0,1' > env.txt
+singularity exec --env-file env.txt pytorch_GAN_zoo.sif ...
```
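+
+For example, the two flags can be combined to train on the first two GPUs
+(a hedged sketch reusing `env.txt` from above; the model, config file, and
+run name follow the DTD example below):
+
+```bash
+singularity exec --nv --env-file env.txt pytorch_GAN_zoo.sif train.py PGAN -c config_dtd.json --restart --no_vis -n dtd
+```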

### Models
@@ -60,16 +67,14 @@ In each example the `--restart` flag is used so that checkpoints are
periodically written during the training. The `--no_vis` flag is used to disable
visdom visualisations.

-As above, these examples assume the host has CUDA 10.2 installed.
-
#### DTD

The DTD dataset requires no preprocessing, so the datasets script simply creates
a configuration file.

```bash
-singularity exec --app cu102 pytorch_GAN_zoo.sif datasets.py dtd <path to dtd>/images
-singularity exec --nv --app cu102 pytorch_GAN_zoo.sif train.py PGAN -c config_dtd.json --restart --no_vis -n dtd
+singularity exec pytorch_GAN_zoo.sif datasets.py dtd <path to dtd>/images
+singularity exec --nv pytorch_GAN_zoo.sif train.py PGAN -c config_dtd.json --restart --no_vis -n dtd
```

Where `<path to dtd>` is the path of the directory extracted from the dtd
@@ -82,8 +87,8 @@ A processed dataset will be written to a directory declared using the `-o` flag,
`cifar10` in this example.

```bash
-singularity exec --app cu102 pytorch_GAN_zoo.sif datasets.py cifar10 <path to cifar-10> -o cifar10
-singularity exec --nv --app cu102 pytorch_GAN_zoo.sif train.py -c config_cifar10.json --restart --no_vis -n cifar10
+singularity exec pytorch_GAN_zoo.sif datasets.py cifar10 <path to cifar-10> -o cifar10
+singularity exec --nv pytorch_GAN_zoo.sif train.py PGAN -c config_cifar10.json --restart --no_vis -n cifar10
```

Where `<path to cifar-10>` is the path of the directory containing the pickle
|