
# 1. Estimate compute hours/days needed to execute one end-to-end run
---

## Learning Objectives
- **The goal of this lab is to:**
Understand how to reserve compute resources for a training run, given a data volume and a model configuration. This matters not only for cluster capacity planning but also for strategic research planning (how many end-to-end experiments one can run given the compute capacity and duration).

    - Extract the formula from the paper [Efficient Large-Scale Language Model Training on GPU Clusters](https://arxiv.org/pdf/2104.04473.pdf) and apply it to the [GPT3 variants](https://arxiv.org/pdf/2005.14165.pdf), based on the assumed [teraFLOP/s reference table](https://arxiv.org/pdf/2104.04473.pdf)
    - Understand how to estimate the compute needed per dataset volume (measured in number of tokens)
    - Apply the estimate to your own or an imaginary data volume and compute cluster set-up
---------------------------------------------------------------------------------------------------

Assume the following information:

- T = dataset size, measured in number of tokens
- P = number of model parameters for the GPT3 variant
- n = number of GPUs in the compute cluster
- x = achieved teraFLOP/s per GPU

You will need the following tables from the papers above for the estimation:

<center><img src="./Megatron-LM/pics/GPT3_all.png" width="700"/></center>

<center><img src="./Megatron-LM/pics/achieved_teraflops_per_gpu.JPG" width="700"/></center>

<center><img src="./Megatron-LM/pics/TrainingTimeEstimate.JPG" width="500"/></center>
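The training-time table above can be summarised in one line. Using the symbols defined earlier, and with x expressed in FLOP/s rather than teraFLOP/s, the estimate used throughout this lab is approximately:

$$
\text{training time (seconds)} \approx \frac{8\,T\,P}{n\,x}
$$

The factor 8 is taken from the paper. As we read it, it accounts for a forward pass, a backward pass, and an extra forward pass for activation recomputation, so treat the result as a rough estimate rather than an exact figure.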

---
## Let's do a sanity check
Scenario 1 - Given 300 billion tokens, 1024 GPUs, and 175 billion model parameters, assuming 140 teraFLOP/s per GPU,
we should observe around **34 days** for an end-to-end training run.

Scenario 2 - Given 450 billion tokens, 3072 GPUs, and 1 trillion model parameters, assuming 163 teraFLOP/s per GPU,
we should observe around **84 days** for an end-to-end training run.
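As a hand check, scenario 1 can be worked through by substituting its numbers into the estimate (total operations divided by the aggregate cluster throughput) and converting seconds to days by dividing by 86400:

$$
\frac{8 \times (300 \times 10^{9}) \times (175 \times 10^{9})}{1024 \times (140 \times 10^{12})} \approx 2.93 \times 10^{6}\ \text{s} \approx 33.9\ \text{days}
$$

This is consistent with the roughly 34 days quoted above.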


In [16]:
import numpy as np

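def calculate_days_needed(T, P, n, x):
    """Estimate end-to-end training time in days.

    T: number of tokens in the dataset
    P: number of model parameters
    n: number of GPUs in the compute cluster
    x: achieved FLOP/s per GPU (e.g. 140*1e+12 for 140 teraFLOP/s), or None
    """
    if x is None:
        return 'not a good SuperPOD use case, let us try a bigger model :)'
    else:
        # total floating-point operations for the run: 8 * T * P
        tot = 8*T*P
        # aggregate throughput of the cluster in FLOP/s
        div = n*x
        compute_sec = tot/div
        # convert compute seconds to days
        to_days = round(compute_sec/(3600*24), 1)
        return to_days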

## sanity check against the figures reported in the paper (table above)
T=[300*1e+9, 450*1e+9]  # number of tokens per scenario
n=[1024,3072]  # number of GPUs per scenario
GPT3_models_labels=['gpt3_175B','gpt3_1Trillion']
GPT3_model_params=[175*1e+9,1*1e+12]
GPT3_model_params_str=['175 Billion','1Trillion']
# achieved FLOP/s per GPU, according to the table above
GPT3_X=[140*1e+12,163*1e+12]
print("dataset sizes: 300 billion tokens (gpt3_175B) and 450 billion tokens (gpt3_1Trillion) \n")
for gpt3_name, gpt3_params, gpt3_param_str, x, n_,t in zip(GPT3_models_labels,GPT3_model_params,GPT3_model_params_str, GPT3_X ,n,T):
    days_needed=calculate_days_needed(t,gpt3_params,n_,x)
    print(" ----------------------------------------------------------------------------------------")
    print(" language model :{} with {} number of parameters , it will need {} days to compute \n".format(gpt3_name, gpt3_param_str, str(days_needed)))


dataset sizes: 300 billion tokens (gpt3_175B) and 450 billion tokens (gpt3_1Trillion) 

 ----------------------------------------------------------------------------------------
 language model :gpt3_175B with 175 Billion number of parameters , it will need 33.9 days to compute 

 ----------------------------------------------------------------------------------------
 language model :gpt3_1Trillion with 1Trillion number of parameters , it will need 83.2 days to compute 



---
## Exercise -
Question -
For a GPT3 model size of 70B parameters with approximately 300 billion tokens in the existing dataset,
given 1/4 of the BerzeLiUs compute availability,   
how many hours/days would you need to compute?
When you are ready, check against the solution by expanding the collapsed cell below.
**. . .**


In [5]:
T=300*1e+9 # number of tokens in the dataset
n=int(480*0.25) # number of GPUs: 1/4 of the Berzelius maximum of 480 GPUs
x=140*1e+12 # assumed achieved FLOP/s per GPU (140 teraFLOP/s)
gpt3_params=70*1e+9 # number of model parameters
calculate_days_needed(T,gpt3_params,n,x)

115.7
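The exercise also asks for hours, and a natural follow-up is the inverse question: how many GPUs would bring the run under a target duration? The sketch below is one way to explore both. The helper names `training_seconds` and `gpus_needed` are hypothetical (not part of the lab code), the 30-day target is an arbitrary example, and the calculation assumes the achieved 140 teraFLOP/s per GPU stays constant as the GPU count changes, which is unlikely to hold exactly in practice.

```python
import math

SECONDS_PER_DAY = 3600 * 24

def training_seconds(tokens, params, n_gpus, flops_per_gpu):
    """Estimated end-to-end training time in seconds: 8*T*P / (n*x)."""
    return 8 * tokens * params / (n_gpus * flops_per_gpu)

def gpus_needed(tokens, params, flops_per_gpu, target_days):
    """Smallest GPU count that meets target_days.

    Assumption: flops_per_gpu does not degrade as more GPUs are added.
    """
    total_flops = 8 * tokens * params
    return math.ceil(total_flops / (flops_per_gpu * target_days * SECONDS_PER_DAY))

T = 300e9   # tokens in the dataset
P = 70e9    # model parameters
x = 140e12  # assumed achieved FLOP/s per GPU

secs = training_seconds(T, P, n_gpus=120, flops_per_gpu=x)
hours = secs / 3600
days = secs / SECONDS_PER_DAY
print("hours:", round(hours, 1), "days:", round(days, 1))

# hypothetical target: finish the same run within 30 days
needed = gpus_needed(T, P, x, target_days=30)
print("GPUs needed for a 30-day run:", needed)
```

With the exercise inputs this gives about 2777.8 hours (115.7 days) on 120 GPUs, and 463 GPUs for the 30-day target, which is close to the full 480-GPU Berzelius allocation quoted in the solution cell.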

---
## Up Next : 

[Understanding the core of Megatron - mpu ](./Day2-2_MegatronFundementals.ipynb)

## Back To Start Menu
[start menu](../Start_Here.ipynb)

-----


## Licensing 

This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). 