Vision Transformer PyTorch

Implementation of Vision Transformer in PyTorch

Files

  • mhsa.py: Implementation of the multi-head self-attention (MHSA) layer (a minimal sketch follows this list)
  • vitconfigs.py: Configs for the base (ViT-B), large (ViT-L) and huge (ViT-H) models as described by Dosovitskiy et al.
  • vit.py: Implementation of the Vision Transformer
  • train.py: Training script for ViT on the ImageNet dataset using DarkLight
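
For orientation, here is a minimal sketch of what a multi-head self-attention layer looks like in PyTorch. The class name and defaults below are illustrative only; see mhsa.py for the implementation actually used in this repo.

import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Illustrative MHSA: project tokens to Q/K/V, attend, merge heads, re-project."""
    def __init__(self, embed_dim=768, num_heads=12):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        B, N, C = x.shape                                  # batch, tokens, embedding dim
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)               # each (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)  # merge heads back
        return self.proj(out)

The three model sizes in vitconfigs.py correspond to Table 1 of Dosovitskiy et al.; the key names below are illustrative, but the numbers come from the paper:

VIT_CONFIGS = {
    "ViT-B": dict(layers=12, hidden_dim=768,  mlp_dim=3072, heads=12),
    "ViT-L": dict(layers=24, hidden_dim=1024, mlp_dim=4096, heads=16),
    "ViT-H": dict(layers=32, hidden_dim=1280, mlp_dim=5120, heads=16),
}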

Environment

Set up an environment with PyTorch and TensorRT. The easiest way is to use an NGC container like the one below (note that a CUDA-capable GPU is required for training):

docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.01-py3
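
If the ImageNet data already lives on the host, it can be mounted into the container at startup with Docker's -v flag (the host path below is a placeholder; the expected directory layout is described in the Training section):

docker run --gpus all -it --rm -v /path/to/imagenet:/workspace/imagenet nvcr.io/nvidia/pytorch:23.01-py3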

Verify ViT forward pass

python3 vit.py  # prints a verification message if the forward pass succeeds

Training

Make the ImageNet dataset available inside the Docker container (e.g. by mounting an external volume). The dataset should have the following layout:

root
├── train
│   ├── timg1.jpg
│   ├── timg2.jpg
│   └── ...
└── val
    ├── vimg1.jpg
    ├── vimg2.jpg
    └── ...

The image filenames encode the ImageNet class label. Set the path to the root directory in train.py.
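
Label parsing from the filename is handled by train.py / DarkLight; purely for illustration, and assuming the common ImageNet convention where the WordNet synset ID precedes the first underscore (e.g. n01440764_10026.JPEG), it amounts to something like:

import os

def label_from_filename(path):
    # Assumed naming scheme (illustrative only): "<synset-id>_<image-id>.JPEG",
    # so the class identifier is the part before the first underscore.
    name = os.path.basename(path)
    return name.split("_", 1)[0]

print(label_from_filename("n01440764_10026.JPEG"))  # -> n01440764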

Run training with

python3 train.py

Visualize training progress with TensorBoard:

tensorboard --logdir=./runs

AI Courses by OpenCV

Want to become an expert in AI? AI Courses by OpenCV is a great place to start.
