{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Megatron GPT Bootcamp\n", "\n", "## Learning objectives\n", "\n", "This objective of the bootcamp is to first, help you quickly go through one time the default Magatron workflow to let you familiarize on how Megatron works, thereafter we will be focus on catering to the specifics of local langauge needs, in this case Swedish. We will give recommandations/advices which can be optionally applied to your workflow and include some practical, useful scripts to help you kick-start your own journey in training local langauge Megatron GPT2/3 models. \n", "\n", "\n", "* Standard: Python\n", "* Frameworks: Pytorch + Megatron-LM \n", "\n", "It is required to have more than one GPU for the bootcamp and we recommend using a [DGX](https://www.nvidia.com/en-in/data-center/dgx-systems/) like cluster with [NVLink / NVSwitch](https://www.nvidia.com/en-in/data-center/nvlink/) support.\n", "\n", "Let's start with testing the GPUs you are running the code on in this bootcamp." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## check how many GPUs you have and GPU Mem capacity \n", "\n", " Wed Aug 25 07:03:55 2021 \n", " +-----------------------------------------------------------------------------+\n", " | NVIDIA-SMI 450.51.05 Driver Version: 450.51.05 CUDA Version: 11.2 |\n", " |-------------------------------+----------------------+----------------------+\n", " | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n", " | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n", " | | | MIG M. |\n", " |===============================+======================+======================|\n", " | 0 Tesla V100-SXM2... On | 00000000:06:00.0 Off | 0 |\n", " | N/A 34C P0 57W / 300W | 0MiB / 16160MiB | 0% Default |\n", " | | | N/A |\n", " +-------------------------------+----------------------+----------------------+\n", " | 1 Tesla V100-SXM2... 
On | 00000000:85:00.0 Off | 0 |\n", " | N/A 30C P0 41W / 300W | 0MiB / 16160MiB | 0% Default |\n", " | | | N/A |\n", " +-------------------------------+----------------------+----------------------+\n", " +-----------------------------------------------------------------------------+\n", " | Processes: |\n", " | GPU GI CI PID Type Process name GPU Memory |\n", " | ID ID Usage |\n", " |=============================================================================|\n", " | No running processes found |\n", " +-----------------------------------------------------------------------------+\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Wed Aug 25 07:03:55 2021 \n", "+-----------------------------------------------------------------------------+\n", "| NVIDIA-SMI 450.51.05 Driver Version: 450.51.05 CUDA Version: 11.2 |\n", "|-------------------------------+----------------------+----------------------+\n", "| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n", "| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n", "| | | MIG M. |\n", "|===============================+======================+======================|\n", "| 0 Tesla V100-SXM2... On | 00000000:06:00.0 Off | 0 |\n", "| N/A 34C P0 57W / 300W | 0MiB / 16160MiB | 0% Default |\n", "| | | N/A |\n", "+-------------------------------+----------------------+----------------------+\n", "| 1 Tesla V100-SXM2... On | 00000000:85:00.0 Off | 0 |\n", "| N/A 30C P0 41W / 300W | 0MiB / 16160MiB | 0% Default |\n", "| | | N/A |\n", "+-------------------------------+----------------------+----------------------+\n", "| 2 Tesla V100-SXM2... On | 00000000:86:00.0 Off | 0 |\n", "| N/A 31C P0 41W / 300W | 0MiB / 16160MiB | 0% Default |\n", "| | | N/A |\n", "+-------------------------------+----------------------+----------------------+\n", "| 3 Tesla V100-SXM2... 
On | 00000000:89:00.0 Off | 0 |\n", "| N/A 33C P0 40W / 300W | 0MiB / 16160MiB | 0% Default |\n", "| | | N/A |\n", "+-------------------------------+----------------------+----------------------+\n", " \n", "+-----------------------------------------------------------------------------+\n", "| Processes: |\n", "| GPU GI CI PID Type Process name GPU Memory |\n", "| ID ID Usage |\n", "|=============================================================================|\n", "| No running processes found |\n", "+-----------------------------------------------------------------------------+\n" ] } ], "source": [ "!nvidia-smi" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## verify nvlink active \n", "The output should look similar to the below -\n", "\n", " GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-b29deceb-3745-51d2-2cf3-807ea8ac8e60)\n", " Link 0: 25.781 GB/s\n", " Link 1: 25.781 GB/s\n", " Link 2: 25.781 GB/s\n", " Link 3: 25.781 GB/s\n", " Link 4: 25.781 GB/s\n", " Link 5: 25.781 GB/s\n", " GPU 1: Tesla V100-SXM2-16GB (UUID: GPU-4de46420-3e95-182f-c0c3-d488dda562d8)\n", " Link 0: 25.781 GB/s\n", " Link 1: 25.781 GB/s\n", " Link 2: 25.781 GB/s\n", " Link 3: 25.781 GB/s\n", " Link 4: 25.781 GB/s\n", " Link 5: 25.781 GB/s" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-b29deceb-3745-51d2-2cf3-807ea8ac8e60)\n", "\t Link 0: 25.781 GB/s\n", "\t Link 1: 25.781 GB/s\n", "\t Link 2: 25.781 GB/s\n", "\t Link 3: 25.781 GB/s\n", "\t Link 4: 25.781 GB/s\n", "\t Link 5: 25.781 GB/s\n", "GPU 1: Tesla V100-SXM2-16GB (UUID: GPU-4de46420-3e95-182f-c0c3-d488dda562d8)\n", "\t Link 0: 25.781 GB/s\n", "\t Link 1: 25.781 GB/s\n", "\t Link 2: 25.781 GB/s\n", "\t Link 3: 25.781 GB/s\n", "\t Link 4: 25.781 GB/s\n", "\t Link 5: 25.781 GB/s\n", "GPU 2: Tesla V100-SXM2-16GB (UUID: GPU-8e9b4e82-ac7f-c189-cc17-045a3585def2)\n", "\t Link 0: 25.781 
GB/s\n", "\t Link 1: 25.781 GB/s\n", "\t Link 2: 25.781 GB/s\n", "\t Link 3: 25.781 GB/s\n", "\t Link 4: 25.781 GB/s\n", "\t Link 5: 25.781 GB/s\n", "GPU 3: Tesla V100-SXM2-16GB (UUID: GPU-a3d96d2e-c606-b23f-e9e0-59a3a507fc10)\n", "\t Link 0: 25.781 GB/s\n", "\t Link 1: 25.781 GB/s\n", "\t Link 2: 25.781 GB/s\n", "\t Link 3: 25.781 GB/s\n", "\t Link 4: 25.781 GB/s\n", "\t Link 5: 25.781 GB/s\n" ] } ], "source": [ "# verify nvlink status\n", "!nvidia-smi nvlink --status" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## verify profiling capability \n", "OUTPUT should look something simialr to the below\n", "note that we want all environment check pass ( = OK or available )\n", "\n", " Sampling Environment Check\n", " Linux Kernel Paranoid Level = 1: OK\n", " Linux Distribution = Ubuntu\n", " Linux Kernel Version = 4.15.0-112-generic: OK\n", " Linux perf_event_open syscall available: OK\n", " Sampling trigger event available: OK\n", " Intel(c) Last Branch Record support: Available\n", " Sampling Environment: OK" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Sampling Environment Check\n", "Linux Kernel Paranoid Level = 1: OK\n", "Linux Distribution = Ubuntu\n", "Linux Kernel Version = 4.15.0-112-generic: OK\n", "Linux perf_event_open syscall available: OK\n", "Sampling trigger event available: OK\n", "Intel(c) Last Branch Record support: Available\n", "Sampling Environment: OK\n" ] } ], "source": [ "# verify profiling capacility \n", "!nsys status -e" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## making placeholder folders for dataset" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import os\n", "os.makedirs('./dataset/EN/32k', exist_ok=True)\n", "os.makedirs('./dataset/EN/50k', exist_ok=True)\n", "os.makedirs('./dataset/SV/32k', exist_ok=True)\n", 
"os.makedirs('./dataset/SV/56k', exist_ok=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "# create your own data - web crawling \n", "Please go through the notebook [linked here](./jupyter_notebook/Megatron-LM/tools/openwebtext/Day3-Website_scrapping.ipynb) to scrape data from the NVIDIA blog." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tutorial Outline\n", "\n", "The following contents will be covered during the bootcamp:\n", "\n", "- [**Introduction and outline of Day 2**](./jupyter_notebook/Day2_0_intro.ipynb)\n", " Megatron 101 in half a day \n", " - [Estimate hours/days needed to execute one end-to-end run per Megatron configuration](./jupyter_notebook/Day2-1_EstimateComputeDaysNeeded.ipynb)\n", " - [Understanding the core of Megatron - mpu](./jupyter_notebook/Day2-2_MegatronFundementals.ipynb)\n", " - [About GPT's tokenizer](./jupyter_notebook/Day2-3_GPT_vocab_merge_files.ipynb)\n", " - [jsonfy and convert to mmap format](./jupyter_notebook/Day2-4_jsonfy_and_process2mmap.ipynb)\n", " - [Megatron runs vs config](./jupyter_notebook/Day2-5_Observe_GPT_runs_vs_performance.ipynb)\n", " - [challenge - the best profiler](./jupyter_notebook/Day2-5_Observe_GPT_runs_vs_performance.ipynb#TheChallenge)\n", "\n", "- [**Day 3 outline**](./jupyter_notebook/Day3-0_overview.ipynb)\n", " Getting started on training your own Megatron GPT models!\n", " - [Fetch and extract Swedish data](./jupyter_notebook/Megatron-LM/tools/openwebtext/Day3-1_acquiring_data.ipynb)\n", " - [Find sentence boundaries and deduplicate your data](./jupyter_notebook/Megatron-LM/tools/openwebtext/Day3-2_SentenceBoundary_and_Deduplicate.ipynb)\n", " - [mini challenge - approaching the ground truth](./jupyter_notebook/Megatron-LM/tools/openwebtext/Day3-2_SentenceBoundary_and_Deduplicate.ipynb#TheChallenge)\n", " - [Train your own GPTBPE Tokenizer on your own data](./jupyter_notebook/Day3-3_train_own_GPT2BPETokenizer.ipynb)\n", " - [customize preprocess data python 
script and convert to mmap](./jupyter_notebook/Day3-4_customize_process2mmap.ipynb)\n", " - [The Challenge - Go big or go home!](./jupyter_notebook/Day3-5_run_Megatron_with_varying_config.ipynb)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tutorial Duration\n", "The lab material will be presented in an 8-hour session. A link to the material is available for download at the end of the bootcamp. \n", "\n", "### Content Level\n", "Intermediate, Advanced\n", "\n", "### Target Audience and Prerequisites\n", "The target audience for this lab is researchers, graduate students, and developers who are interested in scaling their deep learning systems to multiple GPUs to accelerate their scientific applications.\n", "\n", "A basic understanding of deep learning is required. If you are new to deep learning, it is recommended to go through the [Distributed_Deep_Learning bootcamp](https://github.com/gpuhackathons-org/gpubootcamp/tree/master/ai/Distributed_Deep_Learning/English/python) first.\n", " \n", "**Disclaimer**: All the results mentioned in the notebooks were tested on a *DGX-1 machine equipped with 2, 4, or 8 x Tesla V100 GPUs connected via NVLink*. Results will vary on different hardware and also depend on the interconnect bandwidth and the thermal conditions of the machine." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "\n", "## Licensing\n", "\n", "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0)." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" }, "toc-autonumbering": false }, "nbformat": 4, "nbformat_minor": 4 }