{ "cells": [ { "cell_type": "markdown", "id": "proud-packet", "metadata": {}, "source": [ "# 1_Estimate compute hours/days needed to execute one end-to-end run\n", "---\n", "\n", "## Learning Objectives\n", "- **The goal of this lab is to:**\n", "Understand how to reserve compute resources for a given data volume and model configuration for a training run. This is important not only for cluster capacity planning, but also for strategic research planning (how many end-to-end experiments one can run given the available compute capacity and duration).\n", "\n", " - Extract the formula from the paper [Efficient Large-Scale Language Model Training on GPU Clusters](https://arxiv.org/pdf/2104.04473.pdf) for given [GPT-3 variants](https://arxiv.org/pdf/2005.14165.pdf), based on the assumed [teraFLOP/s reference table](https://arxiv.org/pdf/2104.04473.pdf)\n", " - Understand how to estimate the compute needed for a given dataset volume (measured in number of tokens)\n", " - Apply the estimate to your own (or an imaginary) data volume and compute cluster set-up\n", "---------------------------------------------------------------------------------------------------\n", "\n", "Assume the following notation:\n", "- T = dataset size, measured in number of tokens\n", "- P = number of model parameters of the GPT-3 variant\n", "- n = number of GPUs in the compute cluster\n", "- x = achieved throughput per GPU, in FLOP/s\n", "\n", "The paper approximates the end-to-end training time as roughly 8 * T * P / (n * x) seconds.\n", "\n", "You will need the following tables from the above papers for the estimation:\n", "\n", "*(table image not included: GPT-3 model variants and configurations)*\n", "\n", "*(table image not included: achieved teraFLOP/s per GPU)*\n", "\n", "*(table image not included: estimated training times)*
" ] }, { "cell_type": "markdown", "id": "political-oriental", "metadata": {}, "source": [ "---\n", "## Let's do a sanity check\n", "Scenario 1 - given 300 billion tokens, 1024 GPUs, 175 billion model parameters, and an assumed 140 teraFLOP/s per GPU, we should observe around **34 days** for an end-to-end training run.\n", "\n", "Scenario 2 - given 450 billion tokens, 3072 GPUs, 1 trillion model parameters, and an assumed 163 teraFLOP/s per GPU, we should observe around **84 days** for an end-to-end training run.\n" ] },
{ "cell_type": "code", "execution_count": 16, "id": "linear-collector", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " ----------------------------------------------------------------------------------------\n", " language model gpt3_175B with 175 Billion parameters will need 33.9 days to train \n", "\n", " ----------------------------------------------------------------------------------------\n", " language model gpt3_1Trillion with 1 Trillion parameters will need 83.2 days to train \n", "\n" ] } ], "source": [ "def calculate_days_needed(T, P, n, x):\n", "    # estimated end-to-end training time: roughly 8*T*P / (n*x) seconds, converted to days\n", "    if x is None:\n", "        return 'not a good SuperPOD use case, let us try a bigger model :)'\n", "    tot = 8 * T * P      # total floating-point operations for the full run\n", "    div = n * x          # aggregate cluster throughput in FLOP/s\n", "    compute_sec = tot / div\n", "    # convert compute seconds to days\n", "    return round(compute_sec / (3600 * 24), 1)\n", "\n", "# sanity check against the figures reported in the paper\n", "T = [300 * 1e+9, 450 * 1e+9]\n", "n = [1024, 3072]\n", "GPT3_models_labels = ['gpt3_175B', 'gpt3_1Trillion']\n", "GPT3_model_params = [175 * 1e+9, 1 * 1e+12]\n", "GPT3_model_params_str = ['175 Billion', '1 Trillion']\n", "# achieved teraFLOP/s per GPU, according to the table above\n", "GPT3_X = [140 * 1e+12, 163 * 1e+12]\n", "for gpt3_name, gpt3_params, gpt3_param_str, x, n_, t in zip(GPT3_models_labels, GPT3_model_params, GPT3_model_params_str, GPT3_X, n, T):\n", "    days_needed = calculate_days_needed(t, gpt3_params, n_, x)\n", "    print(\" ----------------------------------------------------------------------------------------\")\n", "    print(\" language model {} with {} parameters will need {} days to train \\n\".format(gpt3_name, gpt3_param_str, days_needed))\n" ] },
{ "cell_type": "markdown", "id": "fancy-hollywood", "metadata": {}, "source": [ "---\n", "## Exercise\n", "Question: for a GPT-3 model with 70B parameters and approximately 300 billion tokens in the existing dataset, given a quarter of the Berzelius compute availability, how many hours/days would you need for training?\n", "\n", "When you are ready, uncollapse the hidden cell below to check your answer against the solution.\n", "**. . .**\n" ] },
{ "cell_type": "code", "execution_count": 5, "id": "interior-technology", "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true, "source_hidden": true } }, "outputs": [ { "data": { "text/plain": [ "115.7" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "T = 300 * 1e+9  # number of tokens in the dataset\n", "n = int(480 * 0.25)  # a quarter of Berzelius' 480 GPUs\n", "x = 140 * 1e+12  # assumed achieved FLOP/s per GPU (140 teraFLOP/s)\n", "gpt3_params = 70 * 1e+9\n", "calculate_days_needed(T, gpt3_params, n, x)" ] },
{ "cell_type": "markdown", "id": "distinguished-electricity", "metadata": {}, "source": [ "---\n", "## Up Next:\n", "\n", "[Understanding the core of Megatron - mpu](./Day2-2_MegatronFundementals.ipynb)\n", "\n", "## Back To Start Menu\n", "[start menu](../Start_Here.ipynb)" ] },
{ "cell_type": "markdown", "id": "brilliant-delta", "metadata": {}, "source": [ "-----\n", "\n", "\n", "## Licensing \n", "\n", "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 5 }
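As a standalone reference outside the notebook, the estimation rule used in this lab (training time of roughly 8\*T\*P / (n\*x) seconds) can be condensed into a minimal Python sketch; the three calls reproduce the two sanity-check scenarios and the exercise solution quoted above:

```python
def calculate_days_needed(T, P, n, x):
    """Estimated end-to-end training days: 8*T*P total FLOPs over n*x FLOP/s."""
    compute_sec = 8 * T * P / (n * x)
    return round(compute_sec / (3600 * 24), 1)

# Scenario 1: 300B tokens, 175B-parameter model, 1024 GPUs at 140 teraFLOP/s each
print(calculate_days_needed(300e9, 175e9, 1024, 140e12))   # -> 33.9
# Scenario 2: 450B tokens, 1T-parameter model, 3072 GPUs at 163 teraFLOP/s each
print(calculate_days_needed(450e9, 1e12, 3072, 163e12))    # -> 83.2
# Exercise: 300B tokens, 70B-parameter model, a quarter of Berzelius (120 GPUs)
print(calculate_days_needed(300e9, 70e9, 120, 140e12))     # -> 115.7
```

Note that the linear dependence on T, P, and 1/(n*x) means you can also invert the formula, for example to ask how many tokens fit in a fixed compute budget.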