{ "cells": [ { "cell_type": "markdown", "id": "passing-intersection", "metadata": {}, "source": [ "## Scale up model size\n", "---\n", "In previous notebooks, we downloaded and extracted our own Swedish raw text with `Lab2-1_acquiring_data.ipynb`; practiced filtering, cleaning and deduplicating the raw text data with `Lab2-2_SentenceBoundary_and_Deduplicate.ipynb`; trained our own GPTBPETokenizer on the raw Swedish text with `Lab2-3_train_own_GPT2BPETokenizer.ipynb`; and converted the raw text to mmap format, integrating a custom sentence-splitter, in `Lab2-4_customize_process2mmap.ipynb`.\n", "\n", "We have now covered all the essential components needed to customize Megatron-LM's default workflow to accommodate specific language needs (in this case, Swedish). The obvious next step is to train the Megatron-LM GPT model with the processed Swedish data.\n", "However, constrained by the amount of compute available, i.e. the number of GPUs for the training job, there is an upper limit on how big a model you can train.\n", "\n", "We will test out what size model we can train with 2 x A100 40GB GPUs by presenting a challenge!\n", "\n", "## **Challenge** - Go big or go home!\n", "\n", "- Constraints:\n", "    - 2 x A100 40GB GPUs are allocated for this challenge.\n", "    - Only the parameters in the **##### Begin/End of modifiable blocks #####** are allowed to be changed.\n", "    - Avoid OOM (out-of-memory) errors!\n", "    - The training run must finish and the checkpoint must be saved successfully.\n", "\n", "- Task:\n", "    Given the above constraints, train the biggest GPT model possible.\n", "\n", "- Winning criteria: The biggest model wins!\n", "\n", "Note 1: Post the parameters you changed within the **##### Begin/End of modifiable blocks #####** to the bootcamp's Slack channel for verification.\n", "\n", "Note 2: We purposely turned off nsys profiling in this challenge, because profiling introduces a small overhead that would reduce the maximum achievable model size.\n", "\n", "To go directly to the code block and modify the training configuration, click here: Jump to Code Cell and Modify Training Config " ] }, { "cell_type": "markdown", "id": "dated-lithuania", "metadata": {}, "source": [ "\n", "**Hint**:\n", "Use the knowledge gained from `Lab1-6_Observe_GPT_runs_vs_performance.ipynb`, especially the section with the video demonstrating how to profile a live training run." ] }, { "cell_type": "markdown", "id": "narrative-population", "metadata": {}, "source": [ "Modify and rerun the code blocks below to obtain an even bigger GPT model. \n", "\n", "\n", "\n", "Jump to ReRun Cell " ] }, { "cell_type": "markdown", "id": "increased-difference", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "opposed-population", "metadata": {}, "source": [ "Always clean the checkpoint folder to ensure training starts from scratch." ] }, { "cell_type": "code", "execution_count": null, "id": "unsigned-banks", "metadata": {}, "outputs": [], "source": [ "!rm -fr ../sv_ckpt/* \n", "!rm -fr ../dataset/SV/*.npy" ] }, { "cell_type": "code", "execution_count": null, "id": "relative-count", "metadata": {}, "outputs": [], "source": [ "%%writefile ./Megatron-LM/SV_GPT_goingBIG.sh\n", "# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.\n",
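"#\n", "# NOTE: when filling in the modifiable section below, a few consistency rules\n", "# generally apply to Megatron-LM arguments (treat them as rules of thumb):\n", "#   - TENSOR_MP_SIZE x PIPELINE_MP_SIZE must not exceed WORLD_SIZE (here, 2 GPUs).\n", "#   - HIDDEN_SIZE should be divisible by NUM_ATTN_HEADS (and by TENSOR_MP_SIZE).\n", "#   - SEQ_LEN must not exceed MAX_POS_EM.\n", "#   - GLOBAL_BZ should be a multiple of MICRO_BZ times the data-parallel size.\n", "#\n",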
"GPUS_PER_NODE=2 # <--- change this to the number of GPUs you actually have in your system\n", "# Change for multinode config\n", "MASTER_ADDR=localhost\n", "MASTER_PORT=6000\n", "NNODES=1 # <-- we are currently using 1 node with multiple GPUs\n", "NODE_RANK=0\n", "WORLD_SIZE=2 # <--- change this to the number of GPUs you actually have in your system\n", "\n", "### modify this section so that each variable points to your own path \n", "CHECKPOINT_PATH='../sv_ckpt/' ## modify this path if you customize it \n", "DATA_PATH='../dataset/SV/webnyheter2013_32kvocab_text_document' ## modify this path if you customize it \n", "VOCAB_FILE='../dataset/SV/32k/vocab.json' ## modify this path if you customize it \n", "MERGE_FILE='../dataset/SV/32k/merges.txt' ## modify this path if you customize it \n", "PROFILE_OUTPUT_PATH='../profiles/SV/nsys_improved2' # modify this to your own profile path\n", "\n", "################ Beginning of modifiable section ####################\n", "TENSOR_MP_SIZE=\n", "PIPELINE_MP_SIZE=\n", "NUM_LYS=\n", "HIDDEN_SIZE=\n", "NUM_ATTN_HEADS=\n", "SEQ_LEN=\n", "MAX_POS_EM=\n", "MICRO_BZ=\n", "GLOBAL_BZ=\n", "\n", "############## end of modifiable section, do NOT modify anything below this line ####################\n", "\n", "export OMP_NUM_THREADS=1\n", "DISTRIBUTED_ARGS=\"--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES --node_rank $NODE_RANK --master_addr $MASTER_ADDR --master_port $MASTER_PORT\"\n", "\n", "## for nsys run\n", "#nsys profile --stats=false --force-overwrite=true --duration=300 --trace=cudnn,cuda,osrt,nvtx -o $PROFILE_OUTPUT_PATH \\\n", "python -m torch.distributed.launch $DISTRIBUTED_ARGS \\\n", " ./Megatron-LM/pretrain_gpt.py \\\n", " --tensor-model-parallel-size ${TENSOR_MP_SIZE} \\\n", " --pipeline-model-parallel-size ${PIPELINE_MP_SIZE} \\\n", " --num-layers ${NUM_LYS} \\\n", " --hidden-size ${HIDDEN_SIZE} \\\n", " --num-attention-heads ${NUM_ATTN_HEADS} \\\n", " --micro-batch-size ${MICRO_BZ} \\\n", " --global-batch-size ${GLOBAL_BZ} \\\n", " --seq-length ${SEQ_LEN} \\\n", " --max-position-embeddings ${MAX_POS_EM} \\\n", " --train-samples 100 \\\n", " --save ${CHECKPOINT_PATH} \\\n", " --load ${CHECKPOINT_PATH} \\\n", " --data-path ${DATA_PATH} \\\n", " --vocab-file ${VOCAB_FILE} \\\n", " --merge-file ${MERGE_FILE} \\\n", " --data-impl mmap \\\n", " --split 949,50,1 \\\n", " --distributed-backend nccl \\\n", " --lr 0.00015 \\\n", " --lr-decay-style cosine \\\n", " --min-lr 1.0e-5 \\\n", " --weight-decay 1e-2 \\\n", " --clip-grad 1.0 \\\n", " --lr-warmup-fraction .01 \\\n", " --checkpoint-activations \\\n", " --log-interval 10 \\\n", " --save-interval 100 \\\n", " --eval-interval 200 \\\n", " --eval-iters 10 \\\n", " --fp16" ] }, { "cell_type": "markdown", "id": "personalized-permit", "metadata": {}, "source": [ "Check how big your model is by modifying the parameters in [params_cnt.sh](./params_cnt.sh) to match the training parameters above.\n", "\n", "I got 1.6 billion! :) What about you?\n", "\n", "Modify the [params count](./params_cnt.sh) according to your training configuration.\n", "\n", "After modification, run the bash script below to obtain the model size."
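, "\n", "If you want to sanity-check the number before running the script, the sketch below shows one common way to approximate a GPT parameter count from a training configuration. It only covers a standard GPT-2-style decoder and ignores small terms such as biases and layernorm weights, so it is not a substitute for `params_cnt.sh`; the function name and the values used are placeholders for illustration, not the challenge answer.\n", "\n", "```python\n", "# Rough GPT parameter-count estimate (assumes a standard GPT-2-style decoder).\n", "def estimate_gpt_params(num_layers, hidden_size, vocab_size, seq_len):\n", "    transformer = 12 * num_layers * hidden_size ** 2   # ~12*h^2 weights per layer (attention + MLP)\n", "    embeddings = (vocab_size + seq_len) * hidden_size  # token + position embeddings\n", "    return transformer + embeddings\n", "\n", "# Placeholder values - substitute your own NUM_LYS, HIDDEN_SIZE, SEQ_LEN and vocabulary size.\n", "print(estimate_gpt_params(num_layers=32, hidden_size=2048, vocab_size=32000, seq_len=1024))\n", "```"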
] }, { "cell_type": "code", "execution_count": null, "id": "stuffed-possibility", "metadata": {}, "outputs": [], "source": [ "!bash params_cnt.sh " ] }, { "cell_type": "markdown", "id": "nominated-chuck", "metadata": {}, "source": [ "Below is an example of the expected output:\n", "\n", "    1 <-- you may get a different number depending on your training config\n", "    1678049280 <-- you may get a different number depending on your training config\n" ] }, { "cell_type": "markdown", "id": "primary-praise", "metadata": {}, "source": [ "Re-run the cell below to get an even bigger GPT model.\n", "\n", "Remember to modify the [params count](./params_cnt.sh) to check how big your model is.\n", "\n", "To jump back and edit SV_GPT_goingBIG.sh, click here: \n", "Jump back to modify and overwrite SV_GPT_goingBIG.sh \n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "divine-sierra", "metadata": {}, "outputs": [], "source": [ "!bash ./Megatron-LM/SV_GPT_goingBIG.sh" ] }, { "cell_type": "markdown", "id": "perfect-fisher", "metadata": {}, "source": [ "Below is an example of the expected output:\n", "\n", "    > elapsed time for building blendable dataset indices: 0.00 (sec)\n", "    > finished creating GPT datasets ...\n", "    [after dataloaders are built] datetime: 2021-09-15 11:55:58 \n", "    done with setup ...\n", "    training ...\n", "    time (ms) | model-and-optimizer-setup: 929.42 | train/valid/test-data-iterators-setup: 1004.53\n", "    [after training is done] datetime: 2021-09-15 11:55:58 \n", "    ------------------------------------------------------------------------------------------------------------------\n", "    validation loss at the end of training for val data | lm loss value: 1.171452E+01 | lm loss PPL: 1.223352E+05 | \n", "    ------------------------------------------------------------------------------------------------------------------\n", "    Evaluating iter 10/10\n", "    -------------------------------------------------------------------------------------------------------------------\n", "    validation loss at the end of training for test data | lm loss value: 1.171400E+01 | lm loss PPL: 1.222719E+05 | \n", "    -------------------------------------------------------------------------------------------------------------------" ] }, { "cell_type": "markdown", "id": "chief-costume", "metadata": {}, "source": [ "---\n", "\n", "## Links and Resources\n", "Don't forget to read more about [Language Models are Few-Shot Learners](https://arxiv.org/pdf/2005.14165.pdf) and [Efficient Large-Scale Language Model Training on GPU Clusters](https://arxiv.org/pdf/2104.04473.pdf)." ] }, { "cell_type": "markdown", "id": "collectible-turkey", "metadata": {}, "source": [ "-----\n", "## HOME

" ] }, { "cell_type": "markdown", "id": "legendary-forestry", "metadata": {}, "source": [ "-----\n", "\n", "\n", "## Licensing \n", "\n", "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 5 }