{ "cells": [ { "cell_type": "markdown", "id": "strong-match", "metadata": {}, "source": [ "# Estimate compute hours/days needed to execute one end-to-end run\n", "---\n", "\n", "## Learning Objectives\n", "The goal of this lab is size the problem :\n", "Understanding how to calculate hours/days needed in order to reserve compute resources for the training job per given existing data volume and desired model size. \n", "It is important for both the admin in the compute cluster to do capacity forecasting and for researchers to plan their experiments strategically.\n", "\n", "- Extracting the formular from the paper [Efficient Large-Scale Language Model Training on GPU Clusters](https://arxiv.org/pdf/2104.04473.pdf), per given [GPT3 variants](https://arxiv.org/pdf/2005.14165.pdf) based on assumed [Teraflops reference table](https://arxiv.org/pdf/2104.04473.pdf)\n", "\n", "- Understanding how to estimate compute resource needed per dataset volume ( measured in # of tokens ) and a chosen model size\n", "\n", "- Apply to your own imagenary data volume and a figurative compute cluster set-ups\n", "---------------------------------------------------------------------------------------------------\n", "\n", "- assuming the following information \n", "- T = dataset size measured in numbers of tokens in the dataset\n", "- P = model parameters for GPT3 varients\n", "- n = number of GPUs in the compute cluster\n", "- x = achieved teraflops per GPU \n", "\n", "Training time (in seconds) is approximated with this equation : 8*T*P/n*X\n", "you will need the following tables from the above papers for the estimation \n", "\n", "