{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Megatron GPT Bootcamp\n", "\n", "## Learning objectives" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This objective of this bootcamp is designed to onborad you with NVIDIA [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) in a step-wised manner. We will give you the necessary tools and knoweldge to kick-start training your own language model. \n", "\n", "More specifically, In Day 2, We will learn the default [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)'s workflow, highlighting :\n", "\n", " - Given a fixed dataset ( measured by # of tokens ) calculate compute needs in order to plan training runs and request resources.\n", " \n", " - Understanding Megatron-LM's core engine - Model Parallel Unit, this is the key which enable the possibility to train model with up to 1 trillion parameters on a superPOD.\n", " \n", " - Profiling : as we scale, it is important to maintain the performance of GPUs utilization across multi-gpus or multi-node runs.\n", "\n", "In Day 3, we will shift our focus on all the customization we need to incoporate into [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)'s workflow, in order to cater for local langauge needs, in this case Swedish. We will give recommandations which can be optionally applied to your workflow and include some practical, useful scripts to help you kick-start your own journey in training local langauge Megatron GPT2/3 models. \n", "\n", "* Standard: Python\n", "* Frameworks: Pytorch + Megatron-LM \n", "\n", "It is required to have more than one GPU for the bootcamp and we recommend using a [DGX](https://www.nvidia.com/en-in/data-center/dgx-systems/) like cluster with [NVLink / NVSwitch](https://www.nvidia.com/en-in/data-center/nvlink/) support.\n", "\n", "Let's start with testing the GPUs you are running the code on in this bootcamp." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Check # of GPUs you have and GPU memory capacity \n", "\n", " Wed Aug 25 07:03:55 2021 \n", " +-----------------------------------------------------------------------------+\n", " | NVIDIA-SMI 450.51.05 Driver Version: 450.51.05 CUDA Version: 11.2 |\n", " |-------------------------------+----------------------+----------------------+\n", " | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n", " | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n", " | | | MIG M. |\n", " |===============================+======================+======================|\n", " | 0 Tesla V100-SXM2... On | 00000000:06:00.0 Off | 0 |\n", " | N/A 34C P0 57W / 300W | 0MiB / 16160MiB | 0% Default |\n", " | | | N/A |\n", " +-------------------------------+----------------------+----------------------+\n", " | 1 Tesla V100-SXM2... 
{ "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Verify NVLink is active \n", "The output should look similar to the below:\n", "\n", "    GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-b29deceb-3745-51d2-2cf3-807ea8ac8e60)\n", "        Link 0: 25.781 GB/s\n", "        Link 1: 25.781 GB/s\n", "        Link 2: 25.781 GB/s\n", "        Link 3: 25.781 GB/s\n", "        Link 4: 25.781 GB/s\n", "        Link 5: 25.781 GB/s\n", "    GPU 1: Tesla V100-SXM2-16GB (UUID: GPU-4de46420-3e95-182f-c0c3-d488dda562d8)\n", "        Link 0: 25.781 GB/s\n", "        Link 1: 25.781 GB/s\n", "        Link 2: 25.781 GB/s\n", "        Link 3: 25.781 GB/s\n", "        Link 4: 25.781 GB/s\n", "        Link 5: 25.781 GB/s" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GPU 0: A100-SXM4-40GB (UUID: GPU-2e4d2105-718d-3b94-6f0f-25c148681e83)\n", "\t Link 0: 25 GB/s\n", "\t Link 1: 25 GB/s\n", "\t Link 2: 25 GB/s\n", "\t Link 3: 25 GB/s\n", "\t Link 4: 25 GB/s\n", "\t Link 5: 25 GB/s\n", "\t Link 6: 25 GB/s\n", "\t Link 7: 25 GB/s\n", "\t Link 8: 25 GB/s\n", "\t Link 9: 25 GB/s\n", "\t Link 10: 25 GB/s\n", "\t Link 11: 25 GB/s\n", "GPU 1: A100-SXM4-40GB (UUID: GPU-49615223-919e-6f9f-ad79-69d86bc1a13b)\n", "\t Link 0: 25 GB/s\n", "\t Link 1: 25 GB/s\n", "\t Link 2: 25 GB/s\n", "\t Link 3: 25 GB/s\n", "\t Link 4: 25 GB/s\n", "\t Link 5: 25 GB/s\n", "\t Link 6: 25 GB/s\n", "\t Link 7: 25 GB/s\n", "\t Link 8: 25 GB/s\n", "\t Link 9: 25 GB/s\n", "\t Link 10: 25 GB/s\n", "\t Link 11: 25 GB/s\n" ] } ], "source": [ "# verify NVLink status\n", "!nvidia-smi nvlink --status" ] },
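{ "cell_type": "markdown", "metadata": {}, "source": [ "The link speeds above are reported per link and per direction. As a rough, illustrative cross-check (a sketch only, not a rigorous NVLink benchmark), the cell below times repeated tensor copies from GPU 0 to GPU 1 with PyTorch; treat the printed number as indicative, and note that it requires at least two visible GPUs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Rough sketch: time repeated device-to-device copies from GPU 0 to GPU 1.\n", "# Indicative only -- this is not a rigorous NVLink bandwidth benchmark.\n", "import time\n", "import torch\n", "\n", "assert torch.cuda.device_count() >= 2, 'this sketch needs at least 2 GPUs'\n", "x = torch.randn(64 * 1024 * 1024, device='cuda:0')  # 64M fp32 elements = 256 MiB\n", "x.to('cuda:1')                    # warm-up copy\n", "torch.cuda.synchronize('cuda:1')\n", "start = time.time()\n", "for _ in range(10):\n", "    y = x.to('cuda:1')\n", "torch.cuda.synchronize('cuda:1')\n", "elapsed = time.time() - start\n", "moved_gb = 10 * x.numel() * x.element_size() / 1e9\n", "print(f'approx. copy bandwidth: {moved_gb / elapsed:.1f} GB/s')" ] },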
{ "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Verify Profiling Capability \n", "The output should look similar to the below.\n", "Note that we want all environment checks to pass (= OK or Available).\n", "\n", "    Sampling Environment Check\n", "    Linux Kernel Paranoid Level = 1: OK\n", "    Linux Distribution = Ubuntu\n", "    Linux Kernel Version = 4.15.0-112-generic: OK\n", "    Linux perf_event_open syscall available: OK\n", "    Sampling trigger event available: OK\n", "    Intel(c) Last Branch Record support: Available\n", "    Sampling Environment: OK" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Sampling Environment Check\n", "Linux Kernel Paranoid Level = 2: OK\n", "Linux Distribution = Ubuntu\n", "Linux Kernel Version = 4.18.0-305.12.1.el8_4.x86_64: OK\n", "Linux perf_event_open syscall available: OK\n", "Sampling trigger event available: OK\n", "Intel(c) Last Branch Record support: Not Available\n", "Sampling Environment: OK\n" ] } ], "source": [ "# verify profiling capability\n", "!nsys status -e" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Make placeholder folders for the datasets" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# create folders for the English (EN) and Swedish (SV) datasets,\n", "# the Swedish checkpoints, and the profiling reports\n", "import os\n", "os.makedirs('./dataset/EN/32k', exist_ok=True)\n", "os.makedirs('./dataset/EN/50k', exist_ok=True)\n", "os.makedirs('./dataset/SV/32k', exist_ok=True)\n", "os.makedirs('./dataset/SV/56k', exist_ok=True)\n", "os.makedirs('./sv_ckpt/', exist_ok=True)\n", "os.makedirs('./profiles/naive', exist_ok=True)\n", "os.makedirs('./profiles/2ndrun', exist_ok=True)\n", "os.makedirs('./profiles/SV', exist_ok=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Create Your Own Data - Web Crawling \n", "It is mandatory to fetch your own data by web-crawling NVIDIA blog webpages and extracting the raw text from each page. \n", "Please make sure you go through the notebook **[link here](./jupyter_notebook/Megatron-LM/tools/openwebtext/Day3-Website_scrapping.ipynb)** to scrape raw text from NVIDIA blog webpages; a minimal sketch of the idea follows below." ] },
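{ "cell_type": "markdown", "metadata": {}, "source": [ "To illustrate the core idea before you open that notebook, below is a minimal, hypothetical scraping sketch using `requests` and `BeautifulSoup` (both assumed to be installed; the URL is a placeholder you would replace with a real NVIDIA blog post). The linked notebook implements the actual workflow used in this bootcamp." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Minimal, hypothetical scraping sketch -- the real workflow is in the linked notebook.\n", "import requests\n", "from bs4 import BeautifulSoup\n", "\n", "url = 'https://blogs.nvidia.com/'  # placeholder: substitute a real blog post URL\n", "html = requests.get(url, timeout=30).text\n", "soup = BeautifulSoup(html, 'html.parser')\n", "# keep only paragraph text; a real pipeline also strips navigation, boilerplate, etc.\n", "raw_text = '\\n'.join(p.get_text(strip=True) for p in soup.find_all('p'))\n", "print(raw_text[:500])" ] },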
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### Tutorial Outline\n", "\n", "The following contents will be covered during the Bootcamp :\n", "\n", "- **Outlines of Day 2**\n", " Megatron 101 in half a day \n", " - [Estimate hours/days needed to execute one end-to-end run per Megatron configuration](./jupyter_notebook/Day2-1_EstimateComputeDaysNeeded.ipynb)\n", " - [Understanding the core of Megatron - mpu ](./jupyter_notebook/Day2-2_MegatronFundementals.ipynb)\n", " - [About GPT's tokenizer](./jupyter_notebook/Day2-3_GPT_vocab_merge_files.ipynb)\n", " - [jsonfy and convert to mmap format](./jupyter_notebook/Day2-4_jsonfy_and_process2mmap.ipynb)\n", " - [Megatron runs vs config](./jupyter_notebook/Day2-5_Observe_GPT_runs_vs_performance.ipynb)\n", " - [challenge - the best profiler](./jupyter_notebook/Day2-5_Observe_GPT_runs_vs_performance.ipynb#TheChallenge)\n", "\n", "- **Outlines of Day 3**\n", " Getting started on training your own Megatron GPT models !\n", " - [Fetch and extract Swedish data](./jupyter_notebook/Megatron-LM/tools/openwebtext/Day3-1_acquiring_data.ipynb)\n", " - [Find sentence boundary and deduplicate your data](./jupyter_notebook/Megatron-LM/tools/openwebtext/Day3-2_SentenceBoundary_and_Deduplicate.ipynb)\n", " - [mini challenge - approaching groundtruth](./jupyter_notebook/Megatron-LM/tools/openwebtext/Day3-2_SentenceBoundary_and_Deduplicate.ipynb#TheChallenge)\n", " - [Train your own GPTBPE Tokenizer on your own data ](./jupyter_notebook/Day3-3_train_own_GPT2BPETokenizer.ipynb)\n", " - [customize preprocess data python script and convert to mmap](./jupyter_notebook/Day3-4_customize_process2mmap.ipynb)\n", " - [The Challenge - Go Big or go home!](./jupyter_notebook/Day3-5_run_Megatron_with_varying_config.ipynb)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tutorial Duration\n", "The lab material will be presented in a 8 hr session. Link to material is available for download at the end of the gpubootcamp. \n", "\n", "### Content Level\n", "Intermediate , Advanced\n", "\n", "### Target Audience and Prerequisites\n", "The target audience for this lab is researchers/graduate students and developers who are interested in learning about scaling their Deep learning systems to multiple GPUs to accelerate their scientific applications.\n", "\n", "Basic understanding on Deep learning is required, If you are new to Deep learning , it is recommended to go through the [Distributed_Deep_Learning bootcamp](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/ai/Distributed_Deep_Learning/English/python) prior.\n", " \n", "**Disclaimer** : All the results mentioned in the notebooks were tested on a *DGX-1 machine equipped with 2 or 4 or 8 x Tesla V100 connected via NVLink*. The results would vary when using different hardware and would also depend on the Interconnect bandwidth and the thermal conditions of the machine." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "\n", "## Licensing\n", "\n", "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0)." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" }, "toc-autonumbering": false }, "nbformat": 4, "nbformat_minor": 4 }