{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "frozen-circumstances",
   "metadata": {},
   "source": [
    "# Profiling Megatron-LM training\n",
    "---\n",
    "\n",
    "## Learning Objectives\n",
    "\n",
    "The goal of this lab is to profile the Megatron-LM's GPT model training runs with varying training configurations in order to ensure the GPUs performance across multi-GPUs or mult-nodes workload.\n",
    "\n",
    "\n",
    "**Motivation** : Why should we care about profiling ?\n",
    "  \n",
    "The estimated time-to-compute which we went through in `Lab1-2_EstimateComputeDaysNeeded.ipynb` is based on the assumption that the training run will have good GPUs performance across multi-GPUs or multi-nodes jobs. Bad training configurations could result in low or inconsistent GPUs utilization, which in turn, might prolong the training run.\n",
    "\n",
    "In this notebook, we will cover the following : \n",
    "\n",
    "    1. intro to NVIDIA profiling toolchain\n",
    "    2. Run profiling to record training runs - naive vs. improved runs\n",
    "  \n",
    "A challenge will be presented to you at the end of this notebook, you are tasked to beat the profile of the improved run.\n",
    "\n",
    "Use the knowledge gained from going through `Lab1-2_EstimateComputeDaysNeeded.ipynb` and the profiling lecture presentations, it will help you formulate strategies on training configuration in order to obtain winning profile.\n",
    "\n",
    "Note: TAs and the NVIDIA profile expert will be available during this session when you go through this notebook, do reach out to them if you have questions."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fatal-neutral",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "1. intro to NVIDIA profiling toolchain :\n",
    "\n",
    "<center><img src=\"./Megatron-LM/pics/NVprofilingToolchain.JPG\" width=\"800\"/></center>\n",
    "\n",
    "Note: We will be going through intro to NVIDIA profiling with a NVIDIA profiling expert in the lecture presentation."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dependent-corner",
   "metadata": {},
   "source": [
    "The Profiling Workflow :\n",
    "\n",
    "Profiling is an iterative process, we record the profiling run, visualize and analyze the profile in order to find area for improvement and act upon.\n",
    "\n",
    "<center><img src=\"./Megatron-LM/pics/profiling_workflow.JPG\" width=\"700\"/></center>\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "coated-manufacturer",
   "metadata": {},
   "source": [
    "In order to properly analyze the profile obtained via real training runs. \n",
    "First, we need to understanding how Megatron-LM launches the training job.\n",
    "\n",
    "            ------------ Call out terminals as below illustrated ------------------------\n",
    "<center><img src=\"./Megatron-LM/pics/Alt_callout2terminals.JPG\" width=\"600\"/></center>\n",
    "\n",
    "\n",
    "To do a live monitoring during profiling run.\n",
    "\n",
    "Examine the below [profilig video](https://youtu.be/bnN8ZohiZSI), this video will demonstrate how to call out and arrange 2 windows  within jupyter lab, then launch and monitor the profiling training runs with one window ( left ) print out the Megatron-LM training launching procedure, and the other window ( right ), show nvidia-smi live monitoring the performance of the GPUs. The video will also showcase how to call out the saved profile obtained from the training run. and visulize it using Nsight UI."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "beautiful-outreach",
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.lib.display import YouTubeVideo\n",
    "YouTubeVideo('bnN8ZohiZSI', width=600, height=1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "continent-cause",
   "metadata": {},
   "source": [
    "Reference documents : \n",
    "\n",
    "[How to installing Nsight](https://developer.nvidia.com/gameworksdownload#?dn=nsight-systems-2021-4-1)\n",
    "\n",
    "[Nsight User Guide](https://docs.nvidia.com/nsight-systems/UserGuide/index.html)\n",
    "\n",
    "<center><img src=\"./Megatron-LM/pics/multigpu_naive_run.jpg\" width=\"1000\"/></center>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "nonprofit-affairs",
   "metadata": {},
   "source": [
    "Install nvtx library, note that the nvtx tags were already implemented in this repo for your convenience."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "difficult-experience",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install nvtx"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "quality-monitor",
   "metadata": {},
   "source": [
    "For the purpose of profiling, we will clean the following folders after each profiling run, in order to ensure trainining always start from scratch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "subjective-cement",
   "metadata": {},
   "outputs": [],
   "source": [
    "!rm -fr ../sv_ckpt/*\n",
    "!rm -fr ../dataset/EN/*.npy"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "hidden-professor",
   "metadata": {},
   "source": [
    "After the lecture with NVIDIA profiling champion, we are now ready to try out our very first profiling Megatron-LM training job.\n",
    "\n",
    "We start by profiling a naive run with a default configuration.\n",
    "\n",
    "Note: the following were obtained from previous labs :\n",
    "\n",
    "CHECKPOINT_PATH='../sv_ckpt/' ## path to save the checkpoint of the training run\n",
    "\n",
    "DATA_PATH='../dataset/EN/NVblog_text_document' ## obtained from`Lab1-1` and `Lab1-5`\n",
    "\n",
    "VOCAB_FILE='../dataset/EN/50k/gpt2-vocab.json' ## obtained from`Lab1-4`\n",
    "\n",
    "MERGE_FILE='../dataset/EN/50k/gpt2-merges.txt' ## obtained from`Lab1-4`\n",
    "\n",
    "PROFILE_OUTPUT_PATH='../profiles/naive/nsys_naive' ## path to save the profiles of this training run\n",
    "\n",
    "\n",
    "\n",
    "To evoke profiling session, call nsys decorations followed by the normal Megatron-LM training launch script : \n",
    "\n",
    "<center><img src=\"./Megatron-LM/pics/evoke_nsys_profiling.JPG\" width=\"1000\"/></center>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "prompt-database",
   "metadata": {},
   "source": [
    "To examine the naive profiling run bash script, click on [open profile_naive_run.sh ](./Megatron-LM/profile_naive_run.sh)\n",
    "\n",
    "The following code block launches the naive profiling training run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "individual-dictionary",
   "metadata": {},
   "outputs": [],
   "source": [
    "!bash ./Megatron-LM/profile_naive_run.sh"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "undefined-bryan",
   "metadata": {},
   "source": [
    "\n",
    "Below is an example of a successful profiling outputs :\n",
    "\n",
    "        [after training is done] datetime: 2021-09-15 10:17:46 \n",
    "        ------------------------------------------------------------------------------------------------------------------\n",
    "         validation loss at the end of training for val data | lm loss value: 8.895156E+00 | lm loss PPL: 7.296543E+03 | \n",
    "        ------------------------------------------------------------------------------------------------------------------\n",
    "        saving checkpoint at iteration      12 to ../sv_ckpt/\n",
    "          successfully saved checkpoint at iteration      12 to ../sv_ckpt/\n",
    "        *****************************************\n",
    "        Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. \n",
    "        *****************************************\n",
    "        Processing events...\n",
    "        Capturing symbol files...\n",
    "        Saving temporary \"/tmp/nsys-report-4642-8c23-394b-8c2e.qdstrm\" file to disk...\n",
    "        Creating final output files...\n",
    "\n",
    "        Processing [==============================================================100%]\n",
    "        Saved report file to \"/tmp/nsys-report-4642-8c23-394b-8c2e.qdrep\"\n",
    "        Report file moved to \"/proj/guest_at_nsc/users/zcharpy/gpubootcamp/ai/Megatron/English/Python/jupyter_notebook/../profiles/naive/nsys_naive.qdrep\" \n",
    "        \n",
    " \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "clean-generation",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "Visualizing the profiles via Nsight UI. Below is an example of the naive run profile visualized with Nsight UI :\n",
    "\n",
    "\n",
    "Observe during training phrase, the GPUs utilizations are very low ( the light-blue bar ).\n",
    "<center><img src=\"./Megatron-LM/pics/GPUs_naive_run.JPG\" width=\"1000\"/></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "excessive-tract",
   "metadata": {},
   "source": [
    "Below is a ReRun cell for experimentation of varying training configurations in order to obtain different training profiles.\n",
    "\n",
    "Before each re-run, make sure you clear the checkpoint directory by running the blow code block to clear checkpoint files.\n",
    "<a id=\"Rerun_Cell\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "technical-input",
   "metadata": {},
   "outputs": [],
   "source": [
    "!rm -fr ../sv_ckpt/*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dramatic-master",
   "metadata": {},
   "source": [
    "View/Modify the profile_2nd_run.sh, click to [open profile_2nd_run.sh](./Megatron-LM/profile_2nd_run.sh).\n",
    "\n",
    "After viewing/modification, run the below cell block to obtain a new profile."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "yellow-radical",
   "metadata": {},
   "outputs": [],
   "source": [
    "!bash ./Megatron-LM/profile_2nd_run.sh"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "received-blend",
   "metadata": {},
   "source": [
    "Below is an example of a successful profiling outputs :\n",
    "\n",
    "        > finished creating GPT datasets ...\n",
    "        [after dataloaders are built] datetime: 2021-09-16 19:19:01 \n",
    "        done with setup ...\n",
    "        time (ms) | model-and-optimizer-setup: 772.93 | train/valid/test-data-iterators-setup: 1032.39\n",
    "        training ...\n",
    "        [after training is done] datetime: 2021-09-16 19:19:01 \n",
    "        ------------------------------------------------------------------------------------------------------------------\n",
    "         validation loss at the end of training for val data | lm loss value: 1.126569E+01 | lm loss PPL: 7.809596E+04 | \n",
    "        ------------------------------------------------------------------------------------------------------------------\n",
    "        Processing events...\n",
    "        Capturing symbol files...\n",
    "        Saving temporary \"/tmp/nsys-report-3aa1-f1a6-09c2-c853.qdstrm\" file to disk...\n",
    "        Creating final output files...\n",
    "\n",
    "        Processing [==============================================================100%]\n",
    "        Saved report file to \"/tmp/nsys-report-3aa1-f1a6-09c2-c853.qdrep\"\n",
    "        Report file moved to \"/proj/guest_at_nsc/users/zcharpy/gpubootcamp/ai/Megatron/English/Python/jupyter_notebook/../profiles/2ndrun/nsys_improved.qdrep\"\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "conservative-jamaica",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "The improved profiling run output file visulized with Nsight UI :\n",
    "\n",
    "Observe during training phrase, the GPUs utilizations are improved and more consistent( the light-blue bar ).\n",
    "\n",
    "<center><img src=\"./Megatron-LM/pics/2ndrun.JPG\" width=\"1000\"/></center>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "moving-medicare",
   "metadata": {},
   "source": [
    "<a id=\"TheChallenge\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "comprehensive-birmingham",
   "metadata": {},
   "source": [
    "----------------\n",
    "\n",
    "## **The Challenge ** - get the best looking profile\n",
    "\n",
    "\n",
    "Constraints : \n",
    "\n",
    "        - Use the given # of gpus available ( 2 x A100 GPUs 40GB ) \n",
    "        - Only modify the parameters in the **modifiable section**\n",
    "        - Avoid OOM error\n",
    "        - training run must be finished and checkpoint must be saved successfully\n",
    "Task : \n",
    "      Given the above constraints, achieve a good looking profile. \n",
    "      \n",
    "The winning profile visulized on Nsight UI should look similar to the following : \n",
    "\n",
    "Observe the GPUs utilization are above 90% consistently ( the light-blue bars ) throughout **training** phrase ( the **dark-blue** bar ).\n",
    "      \n",
    "<center><img src=\"./Megatron-LM/pics/GoodLookingProfile.JPG\" width=\"1000\"/></center>\n",
    "\n",
    "Jump back to modify the [profiling bash script](./Megatron-LM/profile_2nd_run.sh) and rerun \n",
    "<a href=\"./Lab1-6_Observe_GPT_runs_vs_performance.ipynb#Rerun_Cell\">GO to ReRun Cell</a> \n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "resistant-hearing",
   "metadata": {},
   "source": [
    "--- \n",
    "## Links and Resources\n",
    "Don't forget to check out additional resources such as [NVIDIA Nsight Systems](https://docs.nvidia.com/nsight-systems/index.html), [NVTX Tutorial](https://developer.nvidia.com/blog/nvidia-tools-extension-api-nvtx-annotation-tool-for-profiling-code-in-python-and-c-c/) and [Nsight Systems](https://developer.nvidia.com/blog/transitioning-nsight-systems-nvidia-visual-profiler-nvprof/).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "british-history",
   "metadata": {},
   "source": [
    "-----\n",
    "## <p style=\"text-align:center;border:3px; padding: 1em\"> <a href=../Start_Here.ipynb>HOME</a></p>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "maritime-statistics",
   "metadata": {},
   "source": [
    "-----\n",
    "\n",
    "\n",
    "## Licensing \n",
    "\n",
    "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}