{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Megatron GPT Bootcamp\n", "\n", "## Learning objectives\n", "\n", "The objective of this bootcamp is twofold. First, it walks you once through the default Megatron workflow so that you become familiar with how Megatron works. It then focuses on the specifics of local-language needs, in this case Swedish. We give recommendations that you can optionally apply to your own workflow, and include some practical scripts to help you kick-start training local-language Megatron GPT2/3 models.\n", "\n", "\n", "* Standard: Python\n", "* Frameworks: PyTorch + Megatron\n", "\n", "The bootcamp requires more than one GPU, and we recommend using a [DGX](https://www.nvidia.com/en-in/data-center/dgx-systems/)-like cluster with [NVLink / NVSwitch](https://www.nvidia.com/en-in/data-center/nvlink/) support.\n", "\n", "Let's start by checking the GPUs you will run the code on in this bootcamp." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Check how many GPUs you have and their memory capacity\n", "\n", " Wed Aug 25 07:03:55 2021 \n", " +-----------------------------------------------------------------------------+\n", " | NVIDIA-SMI 450.51.05 Driver Version: 450.51.05 CUDA Version: 11.2 |\n", " |-------------------------------+----------------------+----------------------+\n", " | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n", " | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n", " | | | MIG M. |\n", " |===============================+======================+======================|\n", " | 0 Tesla V100-SXM2... On | 00000000:06:00.0 Off | 0 |\n", " | N/A 34C P0 57W / 300W | 0MiB / 16160MiB | 0% Default |\n", " | | | N/A |\n", " +-------------------------------+----------------------+----------------------+\n", " | 1 Tesla V100-SXM2... 
On | 00000000:85:00.0 Off | 0 |\n", " | N/A 30C P0 41W / 300W | 0MiB / 16160MiB | 0% Default |\n", " | | | N/A |\n", " +-------------------------------+----------------------+----------------------+\n", " +-----------------------------------------------------------------------------+\n", " | Processes: |\n", " | GPU GI CI PID Type Process name GPU Memory |\n", " | ID ID Usage |\n", " |=============================================================================|\n", " | No running processes found |\n", " +-----------------------------------------------------------------------------+\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Wed Aug 25 07:03:55 2021 \n", "+-----------------------------------------------------------------------------+\n", "| NVIDIA-SMI 450.51.05 Driver Version: 450.51.05 CUDA Version: 11.2 |\n", "|-------------------------------+----------------------+----------------------+\n", "| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n", "| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n", "| | | MIG M. |\n", "|===============================+======================+======================|\n", "| 0 Tesla V100-SXM2... On | 00000000:06:00.0 Off | 0 |\n", "| N/A 34C P0 57W / 300W | 0MiB / 16160MiB | 0% Default |\n", "| | | N/A |\n", "+-------------------------------+----------------------+----------------------+\n", "| 1 Tesla V100-SXM2... On | 00000000:85:00.0 Off | 0 |\n", "| N/A 30C P0 41W / 300W | 0MiB / 16160MiB | 0% Default |\n", "| | | N/A |\n", "+-------------------------------+----------------------+----------------------+\n", "| 2 Tesla V100-SXM2... On | 00000000:86:00.0 Off | 0 |\n", "| N/A 31C P0 41W / 300W | 0MiB / 16160MiB | 0% Default |\n", "| | | N/A |\n", "+-------------------------------+----------------------+----------------------+\n", "| 3 Tesla V100-SXM2... 
On | 00000000:89:00.0 Off | 0 |\n", "| N/A 33C P0 40W / 300W | 0MiB / 16160MiB | 0% Default |\n", "| | | N/A |\n", "+-------------------------------+----------------------+----------------------+\n", " \n", "+-----------------------------------------------------------------------------+\n", "| Processes: |\n", "| GPU GI CI PID Type Process name GPU Memory |\n", "| ID ID Usage |\n", "|=============================================================================|\n", "| No running processes found |\n", "+-----------------------------------------------------------------------------+\n" ] } ], "source": [ "!nvidia-smi" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Verify that NVLink is active\n", "The output should look similar to the following:\n", "\n", " GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-b29deceb-3745-51d2-2cf3-807ea8ac8e60)\n", " Link 0: 25.781 GB/s\n", " Link 1: 25.781 GB/s\n", " Link 2: 25.781 GB/s\n", " Link 3: 25.781 GB/s\n", " Link 4: 25.781 GB/s\n", " Link 5: 25.781 GB/s\n", " GPU 1: Tesla V100-SXM2-16GB (UUID: GPU-4de46420-3e95-182f-c0c3-d488dda562d8)\n", " Link 0: 25.781 GB/s\n", " Link 1: 25.781 GB/s\n", " Link 2: 25.781 GB/s\n", " Link 3: 25.781 GB/s\n", " Link 4: 25.781 GB/s\n", " Link 5: 25.781 GB/s" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-b29deceb-3745-51d2-2cf3-807ea8ac8e60)\n", "\t Link 0: 25.781 GB/s\n", "\t Link 1: 25.781 GB/s\n", "\t Link 2: 25.781 GB/s\n", "\t Link 3: 25.781 GB/s\n", "\t Link 4: 25.781 GB/s\n", "\t Link 5: 25.781 GB/s\n", "GPU 1: Tesla V100-SXM2-16GB (UUID: GPU-4de46420-3e95-182f-c0c3-d488dda562d8)\n", "\t Link 0: 25.781 GB/s\n", "\t Link 1: 25.781 GB/s\n", "\t Link 2: 25.781 GB/s\n", "\t Link 3: 25.781 GB/s\n", "\t Link 4: 25.781 GB/s\n", "\t Link 5: 25.781 GB/s\n", "GPU 2: Tesla V100-SXM2-16GB (UUID: GPU-8e9b4e82-ac7f-c189-cc17-045a3585def2)\n", "\t Link 0: 25.781 
GB/s\n", "\t Link 1: 25.781 GB/s\n", "\t Link 2: 25.781 GB/s\n", "\t Link 3: 25.781 GB/s\n", "\t Link 4: 25.781 GB/s\n", "\t Link 5: 25.781 GB/s\n", "GPU 3: Tesla V100-SXM2-16GB (UUID: GPU-a3d96d2e-c606-b23f-e9e0-59a3a507fc10)\n", "\t Link 0: 25.781 GB/s\n", "\t Link 1: 25.781 GB/s\n", "\t Link 2: 25.781 GB/s\n", "\t Link 3: 25.781 GB/s\n", "\t Link 4: 25.781 GB/s\n", "\t Link 5: 25.781 GB/s\n" ] } ], "source": [ "# verify nvlink status\n", "!nvidia-smi nvlink --status" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Verify profiling capability\n", "The output should look similar to the following.\n", "Note that every environment check should pass (that is, report OK or Available).\n", "\n", " Sampling Environment Check\n", " Linux Kernel Paranoid Level = 1: OK\n", " Linux Distribution = Ubuntu\n", " Linux Kernel Version = 4.15.0-112-generic: OK\n", " Linux perf_event_open syscall available: OK\n", " Sampling trigger event available: OK\n", " Intel(c) Last Branch Record support: Available\n", " Sampling Environment: OK" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Sampling Environment Check\n", "Linux Kernel Paranoid Level = 1: OK\n", "Linux Distribution = Ubuntu\n", "Linux Kernel Version = 4.15.0-112-generic: OK\n", "Linux perf_event_open syscall available: OK\n", "Sampling trigger event available: OK\n", "Intel(c) Last Branch Record support: Available\n", "Sampling Environment: OK\n" ] } ], "source": [ "# verify profiling capability\n", "!nsys status -e" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Create placeholder folders for the datasets" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import os\n", "os.makedirs('./dataset/EN/32k', exist_ok=True)\n", "os.makedirs('./dataset/EN/50k', exist_ok=True)\n", "os.makedirs('./dataset/SV/32k', exist_ok=True)\n", 
"os.makedirs('./dataset/SV/56k', exist_ok=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tutorial Outline\n", "\n", "The following content will be covered during the bootcamp:\n", "\n", "- [**Introduction and outline of Day 2**](./jupyter_notebook/Day2_0_intro.ipynb)\n", " Megatron 101 in half a day\n", " - [Estimate hours/days needed to execute one end-to-end run per Megatron configuration](./jupyter_notebook/Day2-1_EstimateComputeDaysNeeded.ipynb)\n", " - [Understanding the core of Megatron - mpu](./jupyter_notebook/Day2-2_MegatronFundementals.ipynb)\n", " - [About GPT's tokenizer](./jupyter_notebook/Day2-3_GPT_vocab_merge_files.ipynb)\n", " - [Jsonify and convert to mmap format](./jupyter_notebook/Day2-4_jsonfy_and_process2mmap.ipynb)\n", " - [Megatron runs vs config](./jupyter_notebook/Day2-5_Observe_GPT_runs_vs_performance.ipynb)\n", " - [Challenge - the best profiler](./jupyter_notebook/Day2-5_Observe_GPT_runs_vs_performance.ipynb#TheChallenge)\n", "\n", "- [**Day 3 outline**](./jupyter_notebook/Day3-0_overview.ipynb)\n", " Getting started on training your own Megatron GPT models!\n", " - [Fetch and extract Swedish data](./jupyter_notebook/Megatron-LM/tools/openwebtext/Day3-1_acquiring_data.ipynb)\n", " - [Find sentence boundary and deduplicate your data](./jupyter_notebook/Megatron-LM/tools/openwebtext/Day3-2_SentenceBoundary_and_Deduplicate.ipynb)\n", " - [Mini challenge - approaching ground truth](./jupyter_notebook/Megatron-LM/tools/openwebtext/Day3-2_SentenceBoundary_and_Deduplicate.ipynb#TheChallenge)\n", " - [Train your own GPTBPE Tokenizer on your own data](./jupyter_notebook/Day3-3_train_own_GPT2BPETokenizer.ipynb)\n", " - [Customize the data preprocessing Python script and convert to mmap](./jupyter_notebook/Day3-4_customize_process2mmap.ipynb)\n", " - [The Challenge - Go Big or go home!](./jupyter_notebook/Day3-5_run_Megatron_with_varying_config.ipynb)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"### Tutorial Duration\n", "The lab material will be presented in a 6-hour session. A link to the material will be available for download at the end of the lab, with the **exception of the CC-100 Swedish preprocessed data used in the labs**. You can, however, download CC-100 data for various languages yourself from the [CC-100 webpage](http://data.statmt.org/cc-100/).\n", "\n", "### Content Level\n", "Intermediate, Advanced\n", "\n", "### Target Audience and Prerequisites\n", "The target audience for this lab is researchers, graduate students, and developers who are interested in learning how to scale their deep learning systems to multiple GPUs to accelerate their scientific applications.\n", "\n", "A basic understanding of deep learning is required. If you are new to deep learning, we recommend going through the [AI for Climate Bootcamp](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/hpc_ai/ai_science_climate) first.\n", " \n", "**Disclaimer**: All the results mentioned in the notebooks were obtained on a *DGX-1 machine equipped with 2, 4, or 8 Tesla V100 GPUs connected via NVLink*. Results will vary on different hardware and also depend on the interconnect bandwidth and the thermal conditions of the machine." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "\n", "## Licensing\n", "\n", "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" }, "toc-autonumbering": false }, "nbformat": 4, "nbformat_minor": 4 }