{ "cells": [ { "cell_type": "markdown", "id": "special-singer", "metadata": {}, "source": [ "# Estimate Time\n", "---\n", "\n", "## Learning Objectives\n", "The goal of this lab is to estimate compute time needed for an end to end training run.\n", "\n", "**Motivation**: In order to request for computing resources for a training job on a cluster, one must provide information such as, the number of nodes/gpus and the estimated time of the training job run.\n", "\n", "Training time (in seconds) is approximated with this equation : 8*T*P/n*X\n", "\n", "- T = dataset size measured in numbers of tokens in the dataset\n", "- P = model parameters for GPT3 varients\n", "- n = number of GPUs in the compute cluster\n", "- x = achieved teraflops per GPU \n", "\n", "\n", "The above equation was extracted from this paper : [Efficient Large-Scale Language Model Training on GPU Clusters](https://arxiv.org/pdf/2104.04473.pdf)\n", "\n", "---------------------------------------------------------------------------------------------------\n", "\n", "Assets provided below for you convenience : \n", "\n", "