{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "special-singer",
   "metadata": {},
   "source": [
    "# Estimate Time\n",
    "---\n",
    "\n",
    "## Learning Objectives\n",
    "The goal of this lab is to estimate compute time needed for an end to end training run.\n",
    "\n",
    "**Motivation**: In order to request for computing resources for a training job on a cluster, one must provide information such as, the number of nodes/gpus and the estimated time of the training job run.\n",
    "\n",
    "Training time (in seconds) is approximated with this equation : 8*T*P/n*X\n",
    "\n",
    "- T = dataset size measured in numbers of tokens in the dataset\n",
    "- P = model parameters for GPT3 varients\n",
    "- n = number of GPUs in the compute cluster\n",
    "- x = achieved teraflops per GPU \n",
    "\n",
    "\n",
    "The above equation was extracted from this paper : [Efficient Large-Scale Language Model Training on GPU Clusters](https://arxiv.org/pdf/2104.04473.pdf)\n",
    "\n",
    "---------------------------------------------------------------------------------------------------\n",
    "\n",
    "Assets provided below for you convenience : \n",
    "\n",
    "<center><img src=\"./Megatron-LM/pics/GPT3_all.png\" width=\"700\"/></center>\n",
    "\n",
    "<center><img src=\"./Megatron-LM/pics/achieved_teraflops_per_gpu.JPG\" width=\"700\"/></center>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "complicated-reproduction",
   "metadata": {},
   "source": [
    "---\n",
    "## Sanity check - \n",
    "\n",
    "<left><img src=\"./Megatron-LM/pics/TrainingTimeEstimate.JPG\" width=\"500\"/></left>\n",
    "\n",
    "Two scenarios were extracted from the above paper (screenshot above) : [Efficient Large-Scale Language Model Training on GPU Clusters](https://arxiv.org/pdf/2104.04473.pdf) \n",
    "\n",
    "**Scenario 1** -\n",
    "\n",
    "T = 300Billion tokens # assumed data size measured in tokens\n",
    "\n",
    "P = 175 Billion GPT3 model\n",
    "\n",
    "n = 1024 GPUs\n",
    "\n",
    "x = 140 teraFLOP/s per GPU\n",
    "\n",
    "Question : How many hours/ days will you need given the scenaio above for you to compute an end to end training job ?\n",
    "\n",
    "Answer : We should observe around **34 days** for an end to end training run\n",
    "\n",
    "\n",
    "**Scenario 2** - \n",
    "\n",
    "T =  450 Billion tokens  \n",
    "\n",
    "P = 1 Trillion parameters GPT 3 model\n",
    "\n",
    "n = 3072 \n",
    "\n",
    "x = 163 teraFLOP/s per GPU \n",
    "\n",
    "Question: How many hours/ days will you need given this scenaio above for you to compute an end to end training job ?\n",
    "\n",
    "Answer: We should observe around **84 days** for an end to end training run\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "hundred-array",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The following code block contain automatic functions which assist calculating time-to-compute for an end to end training run.\n",
    "import numpy as np\n",
    "# T = dataset size measured in numbers of tokens in the dataset\n",
    "# P = model parameters for GPT3 varients\n",
    "# n = number of GPUs in the compute cluster\n",
    "# x = achieved teraflops per GPU \n",
    "\n",
    "def calculate_days_needed(T , P , n ,x):\n",
    "    if x is None:\n",
    "        return 'not a good SuperPOD use case, let us try a bigger model :)'\n",
    "    else:        \n",
    "        tot=8*T*P\n",
    "        div=n*x\n",
    "        compute_sec=tot/div\n",
    "        #convert compute seconds to days\n",
    "        to_days=round(compute_sec/(3600*24),1)\n",
    "        return to_days\n",
    "## sanity check against the two scenarios above \n",
    "T=[300*1e+9, 450*1e+9]\n",
    "n=[1024,3072]\n",
    "GPT3_models_labels=[  'gpt3_175B','gpt3_1Trillion']\n",
    "GPT3_model_params=[ 175*1e+9,1*1e+12 ]\n",
    "GPT3_model_params_str=['175 Billion','1Trillion']\n",
    "#according to the table above\n",
    "GPT3_X=[140*1e+12,163*1e+12]\n",
    "print(\"all below are measured with dataset size **300 billion** measured in tokens \\n\")\n",
    "scene=1\n",
    "for gpt3_name, gpt3_params, gpt3_param_str, x, n_,t in zip(GPT3_models_labels,GPT3_model_params,GPT3_model_params_str, GPT3_X ,n,T):\n",
    "    days_needed=calculate_days_needed(t,gpt3_params,n_,x)\n",
    "    print(\" ----------------------------scenario {}-----------------------------------\".format(scene))\n",
    "    print(\" language model :{} with {} number of parameters , it will need {} days to compute \\n\".format(gpt3_name, gpt3_param_str, str(days_needed)))\n",
    "    scene+=1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "noted-sense",
   "metadata": {},
   "source": [
    "Below is an example of expected outputs :\n",
    "\n",
    "     ----------------------------scenario 1-----------------------------------\n",
    "     language model :gpt3_175B with 175 Billion number of parameters , it will need 33.9 days to compute \n",
    "\n",
    "     ----------------------------scenario 2-----------------------------------\n",
    "     language model :gpt3_1Trillion with 1Trillion number of parameters , it will need 83.2 days to compute\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "indie-schema",
   "metadata": {},
   "source": [
    "---\n",
    "**Exercise** -\n",
    "\n",
    "For a GPT3 model size of 70B parameters with approximatedly 300 Billion tokens in an existing dataset\n",
    "You have requested 1/4 of the total number of gpus available in [BerzeLiUs](https://www.nsc.liu.se/support/systems/berzelius-getting-started/).\n",
    "\n",
    "\n",
    "Question -\n",
    "\n",
    "How many hours/days would you need to do an end to end training run ? \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cosmetic-gregory",
   "metadata": {},
   "outputs": [],
   "source": [
    "T=<FILL_IN> \n",
    "p=<FILL_IN> \n",
    "n=<FILL_IN> \n",
    "x=<FILL_IN> \n",
    "gpt3_params=<FILL_IN> \n",
    "calculate_days_needed(T,gpt3_params,n,x)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "viral-upper",
   "metadata": {},
   "source": [
    "--- \n",
    "\n",
    "## Links and Resources\n",
    "Don't forget to check out additional resources such as [Efficient Large-Scale Language Model Training on GPU Clusters](https://arxiv.org/pdf/2104.04473.pdf ), [Language Models are Few-Shot Learners](https://arxiv.org/pdf/2005.14165.pdf) and [Scaling Laws for Neural Language Models](https://arxiv.org/pdf/2001.08361.pdf)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "spiritual-dancing",
   "metadata": {},
   "source": [
    "-----\n",
    "## <p style=\"text-align:center;border:3px; padding: 1em\"> <a href=../Start_Here.ipynb>HOME</a>&nbsp; &nbsp; &nbsp; <a href=./Lab1-3_MegatronFundementals.ipynb>NEXT</a></p>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "silent-kruger",
   "metadata": {},
   "source": [
    "-----\n",
    "\n",
    "\n",
    "## Licensing \n",
    "\n",
    "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}