{ "cells": [ { "cell_type": "markdown", "id": "proud-packet", "metadata": {}, "source": [ "# 1_Estimate compute hours/days needed to execute one end-to-end run\n", "---\n", "\n", "## Learning Objectives\n", "- **The goal of this lab is to:**\n", "Understand how to reserve compute resources for a given data volume and model configuration for a training run. This is important not only for cluster capacity planning but also for strategic research planning (i.e., how many end-to-end experiments one can run given the available compute capacity and time).\n", "\n", " - Extract the formula from the paper [Efficient Large-Scale Language Model Training on GPU Clusters](https://arxiv.org/pdf/2104.04473.pdf) and apply it to the [GPT-3 variants](https://arxiv.org/pdf/2005.14165.pdf), using the [achieved-teraflops reference table](https://arxiv.org/pdf/2104.04473.pdf)\n", " - Understand how to estimate the compute needed for a given dataset volume (measured in number of tokens)\n", " - Apply the estimate to your own (or hypothetical) data volume and compute cluster setup\n", "---------------------------------------------------------------------------------------------------\n", "\n", "Assume the following notation:\n", "- T = dataset size, measured in number of tokens\n", "- P = number of model parameters, per GPT-3 variant\n", "- n = number of GPUs in the compute cluster\n", "- x = achieved teraflops per GPU\n", "\n", "With these quantities, the paper approximates the end-to-end training time as roughly 8TP / (nx).\n", "\n", "You will need the following tables from the papers above for the estimation:\n", "\n", "