{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Megatron GPT Bootcamp\n",
    "\n",
    "## Learning objectives"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The objective of this bootcamp is designed for training very large language models with NVIDIA [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) in a step-wised manner. \n",
    "\n",
    "There are two labs, each with a focus point. \n",
    "\n",
    "In Lab 1, we will learn the default [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) workflow, highlighting :\n",
    "\n",
    "   - How to calculate time-to-compute needs for resource planning.\n",
    "    \n",
    "   - Understanding Megatron-LM's core engine - Model Parallel Unit(MPU)\n",
    "    \n",
    "   - Profiling : core concepts on GPUs performance across multi-gpus and/or multi-node runs.\n",
    "\n",
    "In Lab 2, the focus is shifted to the **customization** of [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) workflow. We will walk through and exercise steps for customization of the Megatron-LM's workflow in order to address to local langauge needs.  \n",
    "\n",
    "\n",
    "* Standard: Python\n",
    "* Frameworks: Pytorch + Megatron-LM \n",
    "\n",
    "It is required to have more than one GPU for this bootcamp.\n",
    "\n",
    "This bootcamp is tested on 2 x A100 GPUs with 40G memory. One should also have [NVLink / NVSwitch](https://www.nvidia.com/en-in/data-center/nvlink/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Start by checking available gpus in the environment using nvidia-smi "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!nvidia-smi"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Verify you have 2 x A100 GPUs, each with 40G memory, below is an example of expected outputs : \n",
    "\n",
    "            Wed Sep 15 09:14:15 2021       \n",
    "            +-----------------------------------------------------------------------------+\n",
    "            | NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |\n",
    "            |-------------------------------+----------------------+----------------------+\n",
    "            | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |\n",
    "            | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |\n",
    "            |                               |                      |               MIG M. |\n",
    "            |===============================+======================+======================|\n",
    "            |   0  A100-SXM4-40GB      On   | 00000000:07:00.0 Off |                    0 |\n",
    "            | N/A   24C    P0    57W / 400W |      0MiB / 40536MiB |      4%      Default |\n",
    "            |                               |                      |             Disabled |\n",
    "            +-------------------------------+----------------------+----------------------+\n",
    "            |   1  A100-SXM4-40GB      On   | 00000000:0F:00.0 Off |                    0 |\n",
    "            | N/A   24C    P0    53W / 400W |      0MiB / 40536MiB |      0%      Default |\n",
    "            |                               |                      |             Disabled |\n",
    "            +-------------------------------+----------------------+----------------------+\n",
    "\n",
    "            +-----------------------------------------------------------------------------+\n",
    "            | Processes:                                                                  |\n",
    "            |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |\n",
    "            |        ID   ID                                                   Usage      |\n",
    "            |=============================================================================|\n",
    "            |  No running processes found                                                 |\n",
    "            +-----------------------------------------------------------------------------+\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# verify nvlink status\n",
    "!nvidia-smi nvlink --status"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Verify NVlink is active, below is an example of expected outputs : \n",
    "\n",
    "        GPU 0: A100-SXM4-40GB (UUID: GPU-2e4d2105-718d-3b94-6f0f-25c148681e83)\n",
    "             Link 0: 25 GB/s\n",
    "             Link 1: 25 GB/s\n",
    "             Link 2: 25 GB/s\n",
    "             Link 3: 25 GB/s\n",
    "             Link 4: 25 GB/s\n",
    "             Link 5: 25 GB/s\n",
    "             Link 6: 25 GB/s\n",
    "             Link 7: 25 GB/s\n",
    "             Link 8: 25 GB/s\n",
    "             Link 9: 25 GB/s\n",
    "             Link 10: 25 GB/s\n",
    "             Link 11: 25 GB/s\n",
    "        GPU 1: A100-SXM4-40GB (UUID: GPU-49615223-919e-6f9f-ad79-69d86bc1a13b)\n",
    "             Link 0: 25 GB/s\n",
    "             Link 1: 25 GB/s\n",
    "             Link 2: 25 GB/s\n",
    "             Link 3: 25 GB/s\n",
    "             Link 4: 25 GB/s\n",
    "             Link 5: 25 GB/s\n",
    "             Link 6: 25 GB/s\n",
    "             Link 7: 25 GB/s\n",
    "             Link 8: 25 GB/s\n",
    "             Link 9: 25 GB/s\n",
    "             Link 10: 25 GB/s\n",
    "             Link 11: 25 GB/s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!nsys status -e"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Verify profiling capability, the expected output should look something simialr to the below\n",
    "\n",
    "            Sampling Environment Check\n",
    "            Linux Kernel Paranoid Level = 2: OK\n",
    "            Linux Distribution = Ubuntu\n",
    "            Linux Kernel Version = 4.18.0-305.12.1.el8_4.x86_64: OK\n",
    "            Linux perf_event_open syscall available: OK\n",
    "            Sampling trigger event available: OK\n",
    "            Intel(c) Last Branch Record support: Not Available\n",
    "            Sampling Environment: OK"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "To start with, we need to create folders as placeholders for dataset. We are going to populate these folders later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.makedirs('./dataset/EN/32k', exist_ok=True)\n",
    "os.makedirs('./dataset/EN/50k', exist_ok=True)\n",
    "os.makedirs('./dataset/SV/32k', exist_ok=True)\n",
    "os.makedirs('./dataset/SV/56k', exist_ok=True)\n",
    "os.makedirs('./sv_ckpt/', exist_ok=True)\n",
    "os.makedirs('./profiles/naive', exist_ok=True)\n",
    "os.makedirs('./profiles/2ndrun', exist_ok=True)\n",
    "os.makedirs('./profiles/SV', exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "### Tutorial Outline\n",
    "\n",
    "The following contents will be covered during the Bootcamp :\n",
    "\n",
    "- **Outlines of Lab 1**\n",
    "    Megatron 101 in half a day - Please go through the below notebooks sequentially.\n",
    "    1. [WebCrawling to obtain raw text data](./jupyter_notebook/Megatron-LM/tools/openwebtext/Lab1-1_Website_scrapping.ipynb)\n",
    "    2. [Estimate hours/days needed to execute one end-to-end run per Megatron-LM's configuration](./jupyter_notebook/Lab1-2_EstimateComputeDaysNeeded.ipynb)\n",
    "    3. [Understanding the core of Megatron-LM - MPU ](./jupyter_notebook/Lab1-3_MegatronFundementals.ipynb)\n",
    "    4. [About GPT's tokenizer](./jupyter_notebook/Lab1-4_GPT_vocab_merge_files.ipynb)\n",
    "    5. [Jsonfy and convert to mmap format](./jupyter_notebook/Lab1-5_jsonfy_and_process2mmap.ipynb)\n",
    "    6. [Megatron runs vs config](./jupyter_notebook/Lab1-6_Observe_GPT_runs_vs_performance.ipynb)\n",
    "\n",
    "\n",
    "- **Outlines of Lab 2**\n",
    "    Getting started on training own language Megatron-LM GPT models -- Please go through the below notebooks sequentially.\n",
    "    1. [Fetch and extract Swedish data](./jupyter_notebook/Megatron-LM/tools/openwebtext/Lab2-1_acquiring_data.ipynb)\n",
    "    2. [Find sentence boundary and deduplicate your data](./jupyter_notebook/Megatron-LM/tools/openwebtext/Lab2-2_SentenceBoundary_and_Deduplicate.ipynb)\n",
    "    3. [Train your own GPTBPE Tokenizer on your own data ](./jupyter_notebook/Lab2-3_train_own_GPT2BPETokenizer.ipynb)\n",
    "    4. [customize preprocess data python script and convert to mmap](./jupyter_notebook/Lab2-4_customize_process2mmap.ipynb)\n",
    "    5. [The Challenge - Go Big or go home!](./jupyter_notebook/Lab2-5_run_Megatron_with_varying_config.ipynb)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tutorial Duration\n",
    "The lab material will be presented in a 12 hr session. Link to material is available for download at the end of the gpubootcamp. \n",
    "\n",
    "### Content Level\n",
    "Intermediate , Advanced\n",
    "\n",
    "### Target Audience and Prerequisites\n",
    "The target audience for this lab is researchers/graduate students and developers who are interested in learning about training very large language models on a super computing cluster.\n",
    "\n",
    "Basic understanding on Deep learning and Pytorch is required, if you are new to Deep learning and or new to Pytorch, it is recommended to go through the [Distributed_Deep_Learning bootcamp](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/ai/Distributed_Deep_Learning/English/python) and [Pytorch tutorials](https://pytorch.org/tutorials/) as prior.\n",
    " \n",
    "**Disclaimer** : All the results mentioned in the notebooks were tested on a *DGX-2 machine equipped with 2 x A100 GPUs connected via NVLink*. The results would vary when using different hardware and would also depend on the Interconnect bandwidth and the thermal conditions of the machine."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "--- \n",
    "\n",
    "## Licensing\n",
    "\n",
    "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  },
  "toc-autonumbering": false
 },
 "nbformat": 4,
 "nbformat_minor": 4
}