{ "cells": [ { "cell_type": "markdown", "id": "emerging-victoria", "metadata": {}, "source": [ "# Estimate Time\n", "---\n", "\n", "## Learning Objectives\n", "The goal of this lab is to estimate the compute time needed for an end-to-end training run.\n", "\n", "**Motivation**: To request computing resources for a training job on a supercomputing cluster, one must provide information such as the number of nodes/GPUs requested as well as the estimated time for the training run.\n", "\n", "Note: now that we have obtained the toy text data via web scraping in the previous lab, the next step is to request compute resources. To train a very large language model, large compute resources must be requested and approved in advance. For the cluster admin to allocate resources to the training job, the applicant must provide, at a minimum, the number of GPUs needed for the job as well as an estimate of how long, i.e., how much time, one end-to-end training run will take.\n", "\n", "This is what we are trying to achieve in this notebook: estimate the training time given the parameters below.\n", "\n", "Training time (in seconds) is approximated by this equation: 8\*T\*P / (n\*X)\n", "\n", "- T = dataset size, measured in number of tokens\n", "- P = number of model parameters for the GPT-3 variants\n", "- n = number of GPUs in the compute cluster\n", "- X = achieved teraFLOP/s per GPU\n", "\n", "For example, with T = 300 billion tokens, P = 175 billion parameters, n = 1024 GPUs, and X = 140 teraFLOP/s per GPU, the estimate is 8 × 300e9 × 175e9 / (1024 × 140e12) ≈ 2.9e6 seconds, i.e., roughly 34 days.\n", "\n", "The above equation was taken from this paper: [Efficient Large-Scale Language Model Training on GPU Clusters](https://arxiv.org/pdf/2104.04473.pdf)\n", "\n", "---------------------------------------------------------------------------------------------------\n", "\n", "Assets provided below for your convenience:\n", "\n", "