{ "cells": [ { "cell_type": "markdown", "id": "alike-prisoner", "metadata": {}, "source": [ "## Scale up model size\n", "---\n", "In previous notebooks, we downloaded and extracted our own Swedish raw text with `Lab2-1_acquiring_data.ipynb`; practiced filtering, cleaning and deduplicating the raw text data with `Lab2-2_SentenceBoundary_and_Deduplicate.ipynb`; trained our own GPT-2 BPE tokenizer and fitted it to the raw Swedish text with `Lab2-3_train_own_GPT2BPETokenizer.ipynb`; and converted the raw text to mmap format, integrating a custom sentence-splitter, in `Lab2-4_customize_process2mmap.ipynb`.\n", "\n", "We have now covered all the essential components needed to customize Megatron-LM's default workflow to accommodate specific language needs (in this case, Swedish). The obvious next step is to train the Megatron-LM GPT model with the processed Swedish data.\n", "\n", "However, constrained by the compute resources available, that is, the number of GPUs available for the training job, there is an upper limit on how big a model you can train.\n", "\n", "We will test how big a model we can train with 2 x A100 40GB GPUs by presenting a challenge!\n", "\n", "## **Challenge** - Go big or go home!\n", "\n", "- Constraints:\n", "    - 2 x A100 40GB GPUs are allocated for this challenge.\n", "    - Only the parameters inside the **Begin/End of modifiable section** block are allowed to be changed.\n", "    - Avoid OOM!\n", "    - The training run must finish and a checkpoint must be saved successfully.\n", "\n", "- Task: Given the above constraints, train as BIG a GPT model as possible.\n", "\n", "- Winning criteria: The biggest model wins, given the above constraints.\n", "\n", "Note 1: Post the parameters you changed inside the **Begin/End of modifiable section** block to the bootcamp's Slack channel for verification.\n", "\n", "Note 2: We purposefully turned off nsys profiling in this challenge, because running nsys introduces a small overhead that would reduce the maximum achievable model size.\n", "\n", "To get started, go directly to the code cell below that writes `SV_GPT_goingBIG.sh` and modify the training configuration." ] },
{ "cell_type": "markdown", "id": "material-finland", "metadata": {}, "source": [ "\n", "**Hint**:\n", "Use the knowledge gained from `Lab1-6_Observe_GPT_runs_vs_performance.ipynb`, especially the section with the video demonstrating how to do live profiling during a training run." ] },
{ "cell_type": "markdown", "id": "driven-drawing", "metadata": {}, "source": [ "Modify and rerun the code blocks below to obtain an even bigger GPT model." ] },
{ "cell_type": "markdown", "id": "adjustable-engineer", "metadata": {}, "source": [ "Always clean the checkpoint folder to ensure training starts from scratch." ] },
{ "cell_type": "code", "execution_count": null, "id": "other-parts", "metadata": {}, "outputs": [], "source": [ "!rm -fr ../sv_ckpt/* \n", "!rm -fr ../dataset/SV/*.npy" ] },
{ "cell_type": "code", "execution_count": null, "id": "invisible-pepper", "metadata": {}, "outputs": [], "source": [ "%%writefile ./Megatron-LM/SV_GPT_goingBIG.sh\n", "# Copyright (c) 2020 NVIDIA Corporation. All rights reserved.\n",
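"#\n", "# (Editor's note, not part of the original challenge script) Rough Megatron-LM\n", "# configuration constraints worth keeping in mind when filling in the modifiable section below:\n", "#   * TENSOR_MP_SIZE x PIPELINE_MP_SIZE must evenly divide the total number of GPUs (2 here);\n", "#     whatever factor remains becomes the data-parallel size.\n", "#   * NUM_ATTN_HEADS should be divisible by TENSOR_MP_SIZE, and HIDDEN_SIZE by NUM_ATTN_HEADS.\n", "#   * GLOBAL_BZ should be divisible by MICRO_BZ x the data-parallel size.\n", "#   * SEQ_LEN must not exceed MAX_POS_EM.\n", "#\n",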
"GPUS_PER_NODE=2 # <--- remember to change this to the number of GPUs you actually have in your system\n", "# Change for multinode config\n", "MASTER_ADDR=localhost\n", "MASTER_PORT=6000\n", "NNODES=1 #<-- currently we are using 1 node with multiple GPUs\n", "NODE_RANK=0\n", "WORLD_SIZE=2 # <--- remember to change this to the number of GPUs you actually have in your system\n", "\n", "### modify this section to point each file to its own path \n", "CHECKPOINT_PATH='../sv_ckpt/' ## modify this path if you customize it \n", "DATA_PATH='../dataset/SV/webnyheter2013_32kvocab_text_document' ## modify this path if you customize it \n", "VOCAB_FILE='../dataset/SV/32k/vocab.json' ## modify this path if you customize it \n", "MERGE_FILE='../dataset/SV/32k/merges.txt' ## modify this path if you customize it \n", "PROFILE_OUTPUT_PATH='../profiles/SV/nsys_improved2' # modify this to your own profile path\n", "\n",
"################ Beginning of modifiable section ####################\n", "TENSOR_MP_SIZE=\n", "PIPELINE_MP_SIZE=\n", "NUM_LYS=\n", "HIDDEN_SIZE=\n", "NUM_ATTN_HEADS=\n", "SEQ_LEN=\n", "MAX_POS_EM=\n", "MICRO_BZ=\n", "GLOBAL_BZ=\n", "\n", "############## End of modifiable section, do NOT modify anything below this line ####################\n", "\n",
"export OMP_NUM_THREADS=1\n", "DISTRIBUTED_ARGS=\"--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES --node_rank $NODE_RANK --master_addr $MASTER_ADDR --master_port $MASTER_PORT\"\n", "\n", "## for nsys run\n", "#nsys profile --stats=false --force-overwrite=true --duration=300 --trace=cudnn,cuda,osrt,nvtx -o $PROFILE_OUTPUT_PATH \\\n", "python -m torch.distributed.launch $DISTRIBUTED_ARGS \\\n", " ./Megatron-LM/pretrain_gpt.py \\\n", " --tensor-model-parallel-size ${TENSOR_MP_SIZE} \\\n", " --pipeline-model-parallel-size ${PIPELINE_MP_SIZE} \\\n", " --num-layers ${NUM_LYS} \\\n", " --hidden-size ${HIDDEN_SIZE} \\\n", " --num-attention-heads ${NUM_ATTN_HEADS} \\\n", " --micro-batch-size ${MICRO_BZ} \\\n", " --global-batch-size ${GLOBAL_BZ} \\\n", " --seq-length ${SEQ_LEN} \\\n", " --max-position-embeddings ${MAX_POS_EM} \\\n", " --train-samples 100 \\\n", " --save ${CHECKPOINT_PATH} \\\n", " --load ${CHECKPOINT_PATH} \\\n", " --data-path ${DATA_PATH} \\\n", " --vocab-file ${VOCAB_FILE} \\\n", " --merge-file ${MERGE_FILE} \\\n", " --data-impl mmap \\\n", " --split 949,50,1 \\\n", " --distributed-backend nccl \\\n", " --lr 0.00015 \\\n", " --lr-decay-style cosine \\\n", " --min-lr 1.0e-5 \\\n", " --weight-decay 1e-2 \\\n", " --clip-grad 1.0 \\\n", " --lr-warmup-fraction .01 \\\n", " --checkpoint-activations \\\n", " --log-interval 10 \\\n", " --save-interval 100 \\\n", " --eval-interval 200 \\\n", " --eval-iters 10 \\\n", " --fp16" ] },
{ "cell_type": "markdown", "id": "formal-turner", "metadata": {}, "source": [ "Check how big your model is by modifying the parameters in [params_cnt.sh](./params_cnt.sh) to match the training parameters above.\n", "\n", "I got 1.6 billion :) what about you?\n", "\n", "Modify the [params count](./params_cnt.sh) according to your training configuration.\n", "\n", "After modification, run the bash script below to obtain the model size."
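,
"\n", "As a quick cross-check of `params_cnt.sh`, the short sketch below estimates the parameter count in plain Python. It assumes a GPT-2-style architecture matching the flags above (learned position embeddings, tied input/output embeddings, roughly 12*h^2 + 13*h parameters per transformer layer); the configuration values are illustrative placeholders, not the challenge answer:\n", "\n", "```python\n", "# Rough GPT parameter-count estimate (a sketch; params_cnt.sh remains the reference).\n", "vocab_size = 32000   # padded 32k vocabulary (placeholder)\n", "seq_len    = 1024    # SEQ_LEN / MAX_POS_EM (placeholder)\n", "num_layers = 32      # NUM_LYS (placeholder)\n", "hidden     = 2048    # HIDDEN_SIZE (placeholder)\n", "\n", "embeddings = (vocab_size + seq_len) * hidden                    # token + position embeddings\n", "per_layer  = 12 * hidden**2 + 13 * hidden                       # attention + MLP weights, biases, layernorms\n", "total      = embeddings + num_layers * per_layer + 2 * hidden   # plus the final layernorm\n", "\n", "print(f'approx. parameters: {total:,} (~{total / 1e9:.2f} billion)')\n", "```"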
] }, { "cell_type": "code", "execution_count": null, "id": "welcome-donor", "metadata": {}, "outputs": [], "source": [ "!bash params_cnt.sh " ] },
{ "cell_type": "markdown", "id": "noticed-trinity", "metadata": {}, "source": [ "Below is an example of the expected output:\n", "\n", "    1           <-- you may get a different number depending on your training configuration\n", "    1678049280  <-- you may get a different number depending on your training configuration\n" ] },
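{ "cell_type": "markdown", "id": "memory-rule-of-thumb", "metadata": {}, "source": [ "Before launching a bigger configuration, it can also help to sanity-check memory (an editorial rule of thumb, not part of the original lab): with `--fp16` and Adam, the model states alone cost roughly 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights, momentum and variance), and they are split across the TENSOR_MP_SIZE x PIPELINE_MP_SIZE model-parallel GPUs; activation memory comes on top, which is why `--checkpoint-activations` is kept on. A minimal sketch with placeholder values:\n", "\n", "```python\n", "# Rule-of-thumb memory for model states with Adam + fp16 (assumes no ZeRO/distributed optimizer).\n", "params = 1.6e9             # parameter count reported by params_cnt.sh (placeholder)\n", "model_parallel_gpus = 2    # TENSOR_MP_SIZE * PIPELINE_MP_SIZE (placeholder)\n", "gib = params * 16 / model_parallel_gpus / 1024**3\n", "print(f'model states per GPU: ~{gib:.1f} GiB (activations and CUDA overhead not included)')\n", "```" ] },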
{ "cell_type": "markdown", "id": "convenient-ontario", "metadata": {}, "source": [ "Re-run the cell below to train an even bigger GPT model.\n", "\n", "Remember to modify the [params count](./params_cnt.sh) to check how big your model is.\n", "\n", "To try a new configuration, jump back to the cell that writes `SV_GPT_goingBIG.sh`, modify it, rerun it to overwrite the script, and then rerun the training cell below." ] },
{ "cell_type": "code", "execution_count": null, "id": "representative-kentucky", "metadata": {}, "outputs": [], "source": [ "!bash ./Megatron-LM/SV_GPT_goingBIG.sh" ] },
{ "cell_type": "markdown", "id": "unnecessary-african", "metadata": {}, "source": [ "Below is an example of the expected output:\n", "\n", "    > elapsed time for building blendable dataset indices: 0.00 (sec)\n", "    > finished creating GPT datasets ...\n", "    [after dataloaders are built] datetime: 2021-09-15 11:55:58 \n", "    done with setup ...\n", "    training ...\n", "    time (ms) | model-and-optimizer-setup: 929.42 | train/valid/test-data-iterators-setup: 1004.53\n", "    [after training is done] datetime: 2021-09-15 11:55:58 \n", "    ------------------------------------------------------------------------------------------------------------------\n", "    validation loss at the end of training for val data | lm loss value: 1.171452E+01 | lm loss PPL: 1.223352E+05 | \n", "    ------------------------------------------------------------------------------------------------------------------\n", "    Evaluating iter 10/10\n", "    -------------------------------------------------------------------------------------------------------------------\n", "    validation loss at the end of training for test data | lm loss value: 1.171400E+01 | lm loss PPL: 1.222719E+05 | \n", "    -------------------------------------------------------------------------------------------------------------------" ] },
{ "cell_type": "markdown", "id": "pretty-handle", "metadata": {}, "source": [ "---\n", "\n", "## Links and Resources\n", "Don't forget to read more on [Language Models are Few-Shot Learners](https://arxiv.org/pdf/2005.14165.pdf) and [Efficient Large-Scale Language Model Training on GPU Clusters](https://arxiv.org/pdf/2104.04473.pdf)." ] },
{ "cell_type": "markdown", "id": "caroline-induction", "metadata": {}, "source": [ "-----\n", "## HOME" ] },
{ "cell_type": "markdown", "id": "ranking-pillow", "metadata": {}, "source": [ "-----\n", "\n", "\n", "## Licensing \n", "\n", "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 5 }