|
@@ -1,639 +0,0 @@
|
|
|
-{
|
|
|
- "cells": [
|
|
|
- {
|
|
|
- "cell_type": "markdown",
|
|
|
- "id": "alternate-collar",
|
|
|
- "metadata": {},
|
|
|
- "source": [
|
|
|
- "# \n",
|
|
|
- "\n",
|
|
|
- "# 5 Monitor GPT training performance with varying config\n",
|
|
|
- "---\n",
|
|
|
- "\n",
|
|
|
- "## Learning Objectives\n",
|
|
|
- "- **The goal of this lab is to monitor the performance of your training runs with different GPT training configurations **\n",
|
|
|
- " - motivation : why should we care ? \n",
|
|
|
- " \n",
|
|
|
- " Answer : bad config result in very low / inconsistent gpus utilizations which in turn, slow down training and therefore longer experiments per run, it's a lose-lose-lose situation on all sides.\n",
|
|
|
- " ![see example](./Megatron-LM/pics/naive_run.JPG)\n",
|
|
|
- " \n",
|
|
|
- " - example : naive run vs. improved run \n",
|
|
|
- " starts with multiGPUs --> multinode ( if we get at least 2 nodes per person / team ) \n",
|
|
|
- " - exercise : beat the record !\n",
|
|
|
- "\n",
|
|
|
- "it is possible to obtain more than 90% GPU utilizations overall with high tensorcore ops sustained throughout forward and backward training throughout all gpus used in training. \n"
|
|
|
- ]
|
|
|
- },
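A quick way to check whether a configuration actually keeps the GPUs busy is to poll `nvidia-smi` while a training run is in flight. The helper below is a minimal sketch (the query fields are standard `nvidia-smi` options; the sampling interval, sample count, and the `sample_gpu_utilization` name are arbitrary choices for illustration) that prints SM utilization and memory use for every visible GPU:

```python
# Minimal sketch: poll per-GPU utilization while a training run is active.
# Assumes nvidia-smi is on PATH; interval and sample count are arbitrary choices.
import subprocess
import time

def sample_gpu_utilization(samples=12, interval_s=5):
    """Print SM utilization (%) and memory use for every visible GPU, once per interval."""
    query = ["nvidia-smi",
             "--query-gpu=index,utilization.gpu,memory.used",
             "--format=csv,noheader,nounits"]
    for _ in range(samples):
        out = subprocess.run(query, capture_output=True, text=True, check=True).stdout
        for line in out.strip().splitlines():
            idx, util, mem = [field.strip() for field in line.split(",")]
            print(f"GPU {idx}: {util}% SM util, {mem} MiB used")
        print("-" * 40)
        time.sleep(interval_s)

# Run this from a second notebook or terminal while a training cell below is executing.
# sample_gpu_utilization()
```

Utilization that hovers well below 90%, or swings wildly between GPUs, is the signature of the naive configuration discussed above.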
|
|
|
- {
|
|
|
- "cell_type": "code",
|
|
|
- "execution_count": null,
|
|
|
- "id": "fifty-swimming",
|
|
|
- "metadata": {},
|
|
|
- "outputs": [],
|
|
|
- "source": [
|
|
|
- "!rm -fr ./Megatron-LM/sv_ckpt/*"
|
|
|
- ]
|
|
|
- },
|
|
|
- {
|
|
|
- "cell_type": "markdown",
|
|
|
- "id": "copyrighted-belarus",
|
|
|
- "metadata": {},
|
|
|
- "source": [
|
|
|
- "## Let's verify the environment is ready "
|
|
|
- ]
|
|
|
- },
|
|
|
- {
|
|
|
- "cell_type": "code",
|
|
|
- "execution_count": null,
|
|
|
- "id": "chronic-bradley",
|
|
|
- "metadata": {},
|
|
|
- "outputs": [],
|
|
|
- "source": [
|
|
|
- "!nvidia-smi"
|
|
|
- ]
|
|
|
- },
|
|
|
- {
|
|
|
- "cell_type": "code",
|
|
|
- "execution_count": null,
|
|
|
- "id": "personalized-walker",
|
|
|
- "metadata": {},
|
|
|
- "outputs": [],
|
|
|
- "source": [
|
|
|
- "!nvidia-smi nvlink --status "
|
|
|
- ]
|
|
|
- },
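The same environment checks can also be made from Python. This is a small sketch assuming the CUDA-enabled PyTorch build that Megatron-LM already requires; it lists the visible devices and confirms that the NCCL backend (used as `distributed_backend` in the runs below) is available:

```python
# Minimal sketch: confirm the multi-GPU environment from Python.
# Assumes a CUDA-enabled PyTorch build with NCCL support, as required by Megatron-LM.
import torch
import torch.distributed as dist

def describe_environment():
    print(f"CUDA available : {torch.cuda.is_available()}")
    print(f"GPU count      : {torch.cuda.device_count()}")
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"  GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")
    print(f"NCCL available : {dist.is_nccl_available()}")

describe_environment()
```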
|
|
|
- {
|
|
|
- "cell_type": "markdown",
|
|
|
- "id": "minimal-extreme",
|
|
|
- "metadata": {},
|
|
|
- "source": []
|
|
|
- },
|
|
|
- {
|
|
|
- "cell_type": "code",
|
|
|
- "execution_count": null,
|
|
|
- "id": "prostate-trouble",
|
|
|
- "metadata": {},
|
|
|
- "outputs": [],
|
|
|
- "source": []
|
|
|
- },
|
|
|
- {
|
|
|
- "cell_type": "code",
|
|
|
- "execution_count": 2,
|
|
|
- "id": "industrial-index",
|
|
|
- "metadata": {},
|
|
|
- "outputs": [
|
|
|
- {
|
|
|
- "name": "stdout",
|
|
|
- "output_type": "stream",
|
|
|
- "text": [
|
|
|
- "Collecting data...\n",
|
|
|
- "using world size: 8, data-parallel-size: 8, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 \n",
|
|
|
- "using torch.float32 for parameters ...\n",
|
|
|
- "------------------------ arguments ------------------------\n",
|
|
|
- " accumulate_allreduce_grads_in_fp32 .............. False\n",
|
|
|
- " adam_beta1 ...................................... 0.9\n",
|
|
|
- " adam_beta2 ...................................... 0.999\n",
|
|
|
- " adam_eps ........................................ 1e-08\n",
|
|
|
- " adlr_autoresume ................................. False\n",
|
|
|
- " adlr_autoresume_interval ........................ 1000\n",
|
|
|
- " apply_query_key_layer_scaling ................... True\n",
|
|
|
- " apply_residual_connection_post_layernorm ........ False\n",
|
|
|
- " attention_dropout ............................... 0.1\n",
|
|
|
- " attention_softmax_in_fp32 ....................... False\n",
|
|
|
- " bert_binary_head ................................ True\n",
|
|
|
- " bert_load ....................................... None\n",
|
|
|
- " bf16 ............................................ False\n",
|
|
|
- " bias_dropout_fusion ............................. True\n",
|
|
|
- " bias_gelu_fusion ................................ True\n",
|
|
|
- " biencoder_projection_dim ........................ 0\n",
|
|
|
- " biencoder_shared_query_context_model ............ False\n",
|
|
|
- " block_data_path ................................. None\n",
|
|
|
- " checkpoint_activations .......................... True\n",
|
|
|
- " checkpoint_num_layers ........................... 1\n",
|
|
|
- " clip_grad ....................................... 1.0\n",
|
|
|
- " consumed_train_samples .......................... 0\n",
|
|
|
- " consumed_valid_samples .......................... 0\n",
|
|
|
- " data_impl ....................................... mmap\n",
|
|
|
- " data_parallel_size .............................. 8\n",
|
|
|
- " data_path ....................................... ['../dataset/EN/NVblogs_text_document']\n",
|
|
|
- " dataloader_type ................................. single\n",
|
|
|
- " DDP_impl ........................................ local\n",
|
|
|
- " decoder_seq_length .............................. None\n",
|
|
|
- " distribute_checkpointed_activations ............. False\n",
|
|
|
- " distributed_backend ............................. nccl\n",
|
|
|
- " embedding_path .................................. None\n",
|
|
|
- " encoder_seq_length .............................. 512\n",
|
|
|
- " eod_mask_loss ................................... False\n",
|
|
|
- " eval_interval ................................... 100\n",
|
|
|
- " eval_iters ...................................... 10\n",
|
|
|
- " evidence_data_path .............................. None\n",
|
|
|
- " exit_duration_in_mins ........................... None\n",
|
|
|
- " exit_interval ................................... None\n",
|
|
|
- " ffn_hidden_size ................................. 4096\n",
|
|
|
- " finetune ........................................ False\n",
|
|
|
- " fp16 ............................................ False\n",
|
|
|
- " fp16_lm_cross_entropy ........................... False\n",
|
|
|
- " fp32_residual_connection ........................ False\n",
|
|
|
- " global_batch_size ............................... 8\n",
|
|
|
- " hidden_dropout .................................. 0.1\n",
|
|
|
- " hidden_size ..................................... 1024\n",
|
|
|
- " hysteresis ...................................... 2\n",
|
|
|
- " ict_head_size ................................... None\n",
|
|
|
- " ict_load ........................................ None\n",
|
|
|
- " img_dim ......................................... 224\n",
|
|
|
- " indexer_batch_size .............................. 128\n",
|
|
|
- " indexer_log_interval ............................ 1000\n",
|
|
|
- " init_method_std ................................. 0.02\n",
|
|
|
- " init_method_xavier_uniform ...................... False\n",
|
|
|
- " initial_loss_scale .............................. 4294967296\n",
|
|
|
- " kv_channels ..................................... 64\n",
|
|
|
- " layernorm_epsilon ............................... 1e-05\n",
|
|
|
- " lazy_mpu_init ................................... None\n",
|
|
|
- " load ............................................ ./Megatron-LM/sv_ckpt/\n",
|
|
|
- " local_rank ...................................... 0\n",
|
|
|
- " log_batch_size_to_tensorboard ................... False\n",
|
|
|
- " log_interval .................................... 10\n",
|
|
|
- " log_learning_rate_to_tensorboard ................ True\n",
|
|
|
- " log_loss_scale_to_tensorboard ................... True\n",
|
|
|
- " log_num_zeros_in_grad ........................... False\n",
|
|
|
- " log_params_norm ................................. False\n",
|
|
|
- " log_timers_to_tensorboard ....................... False\n",
|
|
|
- " log_validation_ppl_to_tensorboard ............... False\n",
|
|
|
- " loss_scale ...................................... None\n",
|
|
|
- " loss_scale_window ............................... 1000\n",
|
|
|
- " lr .............................................. 0.00015\n",
|
|
|
- " lr_decay_iters .................................. None\n",
|
|
|
- " lr_decay_samples ................................ None\n",
|
|
|
- " lr_decay_style .................................. cosine\n",
|
|
|
- " lr_warmup_fraction .............................. 0.01\n",
|
|
|
- " lr_warmup_iters ................................. 0\n",
|
|
|
- " lr_warmup_samples ............................... 0\n",
|
|
|
- " make_vocab_size_divisible_by .................... 128\n",
|
|
|
- " mask_prob ....................................... 0.15\n",
|
|
|
- " masked_softmax_fusion ........................... True\n",
|
|
|
- " max_position_embeddings ......................... 512\n",
|
|
|
- " merge_file ...................................... ../dataset/EN/50k/gpt2-merges.txt\n",
|
|
|
- " micro_batch_size ................................ 1\n",
|
|
|
- " min_loss_scale .................................. 1.0\n",
|
|
|
- " min_lr .......................................... 1e-05\n",
|
|
|
- " mmap_warmup ..................................... False\n",
|
|
|
- " no_load_optim ................................... None\n",
|
|
|
- " no_load_rng ..................................... None\n",
|
|
|
- " no_save_optim ................................... None\n",
|
|
|
- " no_save_rng ..................................... None\n",
|
|
|
- " num_attention_heads ............................. 16\n",
|
|
|
- " num_channels .................................... 3\n",
|
|
|
- " num_classes ..................................... 1000\n",
|
|
|
- " num_layers ...................................... 16\n",
|
|
|
- " num_layers_per_virtual_pipeline_stage ........... None\n",
|
|
|
- " num_workers ..................................... 2\n",
|
|
|
- " onnx_safe ....................................... None\n",
|
|
|
- " openai_gelu ..................................... False\n",
|
|
|
- " optimizer ....................................... adam\n",
|
|
|
- " override_lr_scheduler ........................... False\n",
|
|
|
- " params_dtype .................................... torch.float32\n",
|
|
|
- " patch_dim ....................................... 16\n",
|
|
|
- " pipeline_model_parallel_size .................... 1\n",
|
|
|
- " query_in_block_prob ............................. 0.1\n",
|
|
|
- " rampup_batch_size ............................... None\n",
|
|
|
- " rank ............................................ 0\n",
|
|
|
- " reset_attention_mask ............................ False\n",
|
|
|
- " reset_position_ids .............................. False\n",
|
|
|
- " retriever_report_topk_accuracies ................ []\n",
|
|
|
- " retriever_score_scaling ......................... False\n",
|
|
|
- " retriever_seq_length ............................ 256\n",
|
|
|
- " sample_rate ..................................... 1.0\n",
|
|
|
- " save ............................................ ./Megatron-LM/sv_ckpt/\n",
|
|
|
- " save_interval ................................... 100\n",
|
|
|
- " scatter_gather_tensors_in_pipeline .............. True\n",
|
|
|
- " seed ............................................ 1234\n",
|
|
|
- " seq_length ...................................... 512\n",
|
|
|
- " sgd_momentum .................................... 0.9\n",
|
|
|
- " short_seq_prob .................................. 0.1\n",
|
|
|
- " split ........................................... 949,50,1\n",
|
|
|
- " tensor_model_parallel_size ...................... 1\n",
|
|
|
- " tensorboard_dir ................................. None\n",
|
|
|
- " tensorboard_log_interval ........................ 1\n",
|
|
|
- " tensorboard_queue_size .......................... 1000\n",
|
|
|
- " titles_data_path ................................ None\n",
|
|
|
- " tokenizer_type .................................. GPT2BPETokenizer\n",
|
|
|
- " train_iters ..................................... None\n",
|
|
|
- " train_samples ................................... 100\n",
|
|
|
- " use_checkpoint_lr_scheduler ..................... False\n",
|
|
|
- " use_contiguous_buffers_in_ddp ................... False\n",
|
|
|
- " use_cpu_initialization .......................... None\n",
|
|
|
- " use_one_sent_docs ............................... False\n",
|
|
|
- " virtual_pipeline_model_parallel_size ............ None\n",
|
|
|
- " vocab_extra_ids ................................. 0\n",
|
|
|
- " vocab_file ...................................... ../dataset/EN/50k/gpt2-vocab.json\n",
|
|
|
- " weight_decay .................................... 0.01\n",
|
|
|
- " world_size ...................................... 8\n",
|
|
|
- "-------------------- end of arguments ---------------------\n",
|
|
|
- "setting number of micro-batches to constant 1\n",
|
|
|
- "> building GPT2BPETokenizer tokenizer ...\n",
|
|
|
- " > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304)\n",
|
|
|
- "> initializing torch distributed ...\n",
|
|
|
- "> initializing tensor model parallel with size 1\n",
|
|
|
- "> initializing pipeline model parallel with size 1\n",
|
|
|
- "> setting random seeds to 1234 ...\n",
|
|
|
- "> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234\n",
|
|
|
- "> compiling dataset index builder ...\n",
|
|
|
- "make: Entering directory '/home/zcharpy/bootcamp/jupyter_notebook/Megatron-LM/megatron/data'\n",
|
|
|
- "make: Nothing to be done for 'default'.\n",
|
|
|
- "make: Leaving directory '/home/zcharpy/bootcamp/jupyter_notebook/Megatron-LM/megatron/data'\n",
|
|
|
- ">>> done with dataset index builder. Compilation time: 0.573 seconds\n",
|
|
|
- "WARNING: constraints for invoking optimized fused softmax kernel are not met. We default back to unfused kernel invocations.\n",
|
|
|
- "> compiling and loading fused kernels ...\n",
|
|
|
- "Detected CUDA files, patching ldflags\n",
|
|
|
- "Emitting ninja build file /home/zcharpy/bootcamp/jupyter_notebook/Megatron-LM/megatron/fused_kernels/build/build.ninja...\n",
|
|
|
- "Building extension module scaled_upper_triang_masked_softmax_cuda...\n",
|
|
|
- "Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)\n",
|
|
|
- "ninja: no work to do.\n",
|
|
|
- "Loading extension module scaled_upper_triang_masked_softmax_cuda...\n",
|
|
|
- "Detected CUDA files, patching ldflags\n",
|
|
|
- "Emitting ninja build file /home/zcharpy/bootcamp/jupyter_notebook/Megatron-LM/megatron/fused_kernels/build/build.ninja...\n",
|
|
|
- "Building extension module scaled_masked_softmax_cuda...\n",
|
|
|
- "Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)\n",
|
|
|
- "ninja: no work to do.\n",
|
|
|
- "Loading extension module scaled_masked_softmax_cuda...\n",
|
|
|
- "Detected CUDA files, patching ldflags\n",
|
|
|
- "Emitting ninja build file /home/zcharpy/bootcamp/jupyter_notebook/Megatron-LM/megatron/fused_kernels/build/build.ninja...\n",
|
|
|
- "Building extension module fused_mix_prec_layer_norm_cuda...\n",
|
|
|
- "Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)\n",
|
|
|
- "ninja: no work to do.\n",
|
|
|
- "Loading extension module fused_mix_prec_layer_norm_cuda...\n",
|
|
|
- ">>> done with compiling and loading fused kernels. Compilation time: 31.516 seconds\n",
|
|
|
- "time to initialize megatron (seconds): 13.548\n",
|
|
|
- "[after megatron is initialized] datetime: 2021-08-26 00:28:13 \n",
|
|
|
- "building GPT model ...\n",
|
|
|
- " > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 253577216\n",
|
|
|
- "setting training iterations to 12\n",
|
|
|
- "> learning rate decay style: cosine\n",
|
|
|
- "WARNING: could not find the metadata file ./Megatron-LM/sv_ckpt/latest_checkpointed_iteration.txt \n",
|
|
|
- " will not load any checkpoints and will start from random\n",
|
|
|
- "time (ms) | load-checkpoint: 30.87\n",
|
|
|
- "[after model, optimizer, and learning rate scheduler are built] datetime: 2021-08-26 00:28:14 \n",
|
|
|
- "> building train, validation, and test datasets ...\n",
|
|
|
- " > datasets target sizes (minimum size):\n",
|
|
|
- " train: 100\n",
|
|
|
- " validation: 80\n",
|
|
|
- " test: 80\n",
|
|
|
- "> building train, validation, and test datasets for GPT ...\n",
|
|
|
- " > building dataset index ...\n",
|
|
|
- " reading sizes...\n",
|
|
|
- " reading pointers...\n",
|
|
|
- " reading document index...\n",
|
|
|
- " creating numpy buffer of mmap...\n",
|
|
|
- " creating memory view of numpy buffer...\n",
|
|
|
- " > finished creating indexed dataset in 0.003097 seconds\n",
|
|
|
- " number of documents: 74\n",
|
|
|
- " > dataset split:\n",
|
|
|
- " train:\n",
|
|
|
- " document indices in [0, 70) total of 70 documents\n",
|
|
|
- " validation:\n",
|
|
|
- " document indices in [70, 74) total of 4 documents\n",
|
|
|
- " test:\n",
|
|
|
- " document indices in [74, 74) total of 0 documents\n",
|
|
|
- " > loading doc-idx mapping from ../dataset/EN/NVblogs_text_document_train_indexmap_100ns_512sl_1234s_doc_idx.npy\n",
|
|
|
- " > loading sample-idx mapping from ../dataset/EN/NVblogs_text_document_train_indexmap_100ns_512sl_1234s_sample_idx.npy\n",
|
|
|
- " > loading shuffle-idx mapping from ../dataset/EN/NVblogs_text_document_train_indexmap_100ns_512sl_1234s_shuffle_idx.npy\n",
|
|
|
- " loaded indexed file in 0.018 seconds\n",
|
|
|
- " total number of samples: 142\n",
|
|
|
- " total number of epochs: 1\n",
|
|
|
- " > loading doc-idx mapping from ../dataset/EN/NVblogs_text_document_valid_indexmap_80ns_512sl_1234s_doc_idx.npy\n",
|
|
|
- " > loading sample-idx mapping from ../dataset/EN/NVblogs_text_document_valid_indexmap_80ns_512sl_1234s_sample_idx.npy\n",
|
|
|
- " > loading shuffle-idx mapping from ../dataset/EN/NVblogs_text_document_valid_indexmap_80ns_512sl_1234s_shuffle_idx.npy\n",
|
|
|
- " loaded indexed file in 0.022 seconds\n",
|
|
|
- " total number of samples: 86\n",
|
|
|
- " total number of epochs: 11\n",
|
|
|
- "> finished creating GPT datasets ...\n",
|
|
|
- "[after dataloaders are built] datetime: 2021-08-26 00:28:24 \n",
|
|
|
- "done with setup ...\n",
|
|
|
- "training ...\n",
|
|
|
- "time (ms) | model-and-optimizer-setup: 548.83 | train/valid/test-data-iterators-setup: 10068.67\n",
|
|
|
- "[before the start of training step] datetime: 2021-08-26 00:28:24 \n",
|
|
|
- " iteration 10/ 12 | consumed samples: 80 | elapsed time per iteration (ms): 2141.9 | learning rate: 2.363E-05 | global batch size: 8 | lm loss: 9.601698E+00 | loss scale: 1.0 | grad norm: 1.856 | number of skipped iterations: 0 | number of nan iterations: 0 |\n",
|
|
|
- "time (ms) | forward-compute: 1252.78 | backward-compute: 643.07 | backward-params-all-reduce: 146.14 | backward-embedding-all-reduce: 0.06 | optimizer: 94.92 | batch-generator: 12.73\n",
|
|
|
- "[Rank 0] (after 10 iterations) memory (MB) | allocated: 3869.28369140625 | max allocated: 5229.60595703125 | reserved: 7306.0 | max reserved: 7306.0\n",
|
|
|
- "[after training is done] datetime: 2021-08-26 00:28:47 \n",
|
|
|
- "------------------------------------------------------------------------------------------------------------------saving checkpoint at iteration 12 to ./Megatron-LM/sv_ckpt/\n",
|
|
|
- "\n",
|
|
|
- " validation loss at the end of training for val data | lm loss value: 8.891883E+00 | lm loss PPL: 7.272700E+03 | \n",
|
|
|
- "------------------------------------------------------------------------------------------------------------------\n",
|
|
|
- " successfully saved checkpoint at iteration 12 to ./Megatron-LM/sv_ckpt/\n",
|
|
|
- "*****************************************\n",
|
|
|
- "Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. \n",
|
|
|
- "*****************************************\n",
|
|
|
- "Processing events...\n",
|
|
|
- "Capturing symbol files...\n",
|
|
|
- "Saving temporary \"/tmp/nsys-report-5e6b-aa12-9711-df33.qdstrm\" file to disk...\n",
|
|
|
- "Creating final output files...\n",
|
|
|
- "\n",
|
|
|
- "Processing [==============================================================100%]\n",
|
|
|
- "Saved report file to \"/tmp/nsys-report-5e6b-aa12-9711-df33.qdrep\"\n",
|
|
|
- "Exporting 1665820 events: [===============================================100%]\n",
|
|
|
- "\n",
|
|
|
- "Exported successfully to\n",
|
|
|
- "/tmp/nsys-report-5e6b-aa12-9711-df33.sqlite\n",
|
|
|
- "\n",
|
|
|
- "\n",
|
|
|
- "CUDA API Statistics:\n",
|
|
|
- "\n",
|
|
|
- " Time(%) Total Time (ns) Num Calls Average Minimum Maximum Name \n",
|
|
|
- " ------- --------------- --------- ----------- ------- ----------- -------------------------------\n",
|
|
|
- " 56.2 285320355235 1696 168231341.5 3530 29703799378 cudaDeviceSynchronize \n",
|
|
|
- " 29.6 149955233137 4392 34142812.6 4864 17084387260 cudaMalloc \n",
|
|
|
- " 7.9 39849705340 2960 13462738.3 1275 2244485039 cudaFree \n",
|
|
|
- " 1.7 8434557988 132848 63490.3 4159 108060983 cudaLaunchKernel \n",
|
|
|
- " 1.6 8056133957 1136 7091667.2 5385 73711234 cudaHostAlloc \n",
|
|
|
- " 1.4 7314617708 23844 306769.7 6186 758561184 cudaMemcpyAsync \n",
|
|
|
- " 0.4 1978894053 720 2748464.0 352582 35632270 cudaIpcOpenMemHandle \n",
|
|
|
- " 0.3 1558283230 1064 1464551.9 5957 26847462 cudaFreeHost \n",
|
|
|
- " 0.3 1519497814 720 2110413.6 240178 21756247 cudaIpcCloseMemHandle \n",
|
|
|
- " 0.2 869040030 15104 57537.1 7662 32041419 cuLaunchKernel \n",
|
|
|
- " 0.1 760874918 14944 50915.1 5032 103123066 cudaMemsetAsync \n",
|
|
|
- " 0.1 416824863 63658 6547.9 361 79176267 cudaStreamIsCapturing_v10000 \n",
|
|
|
- " 0.1 289499706 16483 17563.5 1293 40910490 cudaEventQuery \n",
|
|
|
- " 0.1 283699184 21864 12975.6 643 60138070 cudaEventRecord \n",
|
|
|
- " 0.0 163990222 44832 3657.9 279 24748108 cudaStreamGetCaptureInfo_v10010\n",
|
|
|
- " 0.0 149197647 512 291401.7 2613 37658462 cudaStreamCreateWithPriority \n",
|
|
|
- " 0.0 138131763 4504 30668.7 523 32220166 cudaEventDestroy \n",
|
|
|
- " 0.0 113211871 2732 41439.2 2572 30272284 cudaStreamSynchronize \n",
|
|
|
- " 0.0 75811639 2848 26619.3 3170 29446146 cudaMemset \n",
|
|
|
- " 0.0 68895475 40 1722386.9 101401 32335797 cuModuleLoadData \n",
|
|
|
- " 0.0 36296853 1824 19899.6 6299 6783812 cudaMemcpy \n",
|
|
|
- " 0.0 13397443 40 334936.1 21288 4248804 cuModuleUnload \n",
|
|
|
- " 0.0 12237981 4520 2707.5 381 2838236 cudaEventCreateWithFlags \n",
|
|
|
- " 0.0 11913924 2784 4279.4 1276 3291031 cudaStreamWaitEvent \n",
|
|
|
- " 0.0 100758 24 4198.3 1730 10252 cuInit \n",
|
|
|
- "\n",
|
|
|
- "\n",
|
|
|
- "\n",
|
|
|
- "CUDA Kernel Statistics:\n",
|
|
|
- "\n",
|
|
|
- " Time(%) Total Time (ns) Instances Average Minimum Maximum Name \n",
|
|
|
- " ------- --------------- --------- ------------ ------- ----------- ----------------------------------------------------------------------------------------------------\n",
|
|
|
- " 88.2 260838216539 88 2964070642.5 9664 29673260794 ncclKernel_AllReduce_RING_LL_Sum_uint8_t(ncclWorkElem) \n",
|
|
|
- " 6.4 18834798439 280 67267137.3 13952 423636076 ncclKernel_AllReduce_RING_LL_Sum_float(ncclWorkElem) \n",
|
|
|
- " 1.3 3737595881 13056 286274.2 100127 445790 volta_sgemm_128x32_tn \n",
|
|
|
- " 0.6 1922114218 6240 308031.1 88128 4093366 volta_sgemm_128x32_nt \n",
|
|
|
- " 0.6 1914679497 4528 422853.2 268063 4114387 volta_sgemm_128x64_tn \n",
|
|
|
- " 0.6 1628497447 6144 265054.9 91936 432223 volta_sgemm_128x32_nn \n",
|
|
|
- " 0.3 1019927165 16 63745447.8 12160 185146385 ncclKernel_AllReduce_RING_LL_Sum_int64_t(ncclWorkElem) \n",
|
|
|
- " 0.3 911914712 1632 558771.3 5824 807487 void multi_tensor_apply_kernel<TensorListMetadata<4>, AdamFunctor<float>, float, float, float, floa…\n",
|
|
|
- " 0.2 624933625 5888 106136.8 96128 125600 volta_sgemm_64x64_nn \n",
|
|
|
- " 0.2 516397176 9608 53746.6 2495 2498036 void at::native::vectorized_elementwise_kernel<4, at::native::MulScalarFunctor<float, float>, at::d…\n",
|
|
|
- " 0.1 370199942 96 3856249.4 3595413 4220668 volta_sgemm_64x32_sliced1x4_nn \n",
|
|
|
- " 0.1 365762037 5888 62119.9 53312 81952 volta_sgemm_64x64_tn \n",
|
|
|
- " 0.1 317760917 3072 103437.8 94048 120768 volta_sgemm_64x64_nt \n",
|
|
|
- " 0.1 291143645 5888 49447.0 45408 58304 void at::native::unrolled_elementwise_kernel<at::native::(anonymous namespace)::masked_fill_kernel<…\n",
|
|
|
- " 0.1 253378189 192 1319678.1 725760 1933403 void at::native::(anonymous namespace)::CatArrayBatchedCopy<float, unsigned int, 1, 128, 1>(float*,…\n",
|
|
|
- " 0.1 251234285 1440 174468.3 3585 228607 void multi_tensor_apply_kernel<TensorListMetadata<2>, ScaleFunctor<float, float>, float>(int, int v…\n",
|
|
|
- " 0.1 211542704 4352 48608.2 45728 53376 void (anonymous namespace)::softmax_warp_forward<float, float, float, 9, false>(float*, float const…\n",
|
|
|
- " 0.1 165985735 3168 52394.5 7648 57663 void at::native::(anonymous namespace)::fused_dropout_kernel_vec<float, float, unsigned int, 1, 4>(…\n",
|
|
|
- " 0.1 154413148 7088 21785.2 7424 259423 void at::native::unrolled_elementwise_kernel<at::native::AddFunctor<float>, at::detail::Array<char*…\n",
|
|
|
- " 0.0 126251751 6144 20548.8 12864 29088 void at::native::reduce_kernel<128, 4, at::native::ReduceOp<float, at::native::func_wrapper_t<float…\n",
|
|
|
- " 0.0 119034048 1344 88567.0 5120 108128 void multi_tensor_apply_kernel<TensorListMetadata<1>, L2NormFunctor<float>, float*, float*, bool, i…\n",
|
|
|
- " 0.0 118285804 6352 18621.8 2560 733214 void at::native::vectorized_elementwise_kernel<4, at::native::AddFunctor<float>, at::detail::Array<…\n",
|
|
|
- " 0.0 105684444 4352 24284.1 21791 37184 kernel_1 \n",
|
|
|
- " 0.0 98824724 1904 51903.7 2496 70079 void at::native::vectorized_elementwise_kernel<4, at::native::MulFunctor<float>, at::detail::Array<…\n",
|
|
|
- " 0.0 97740010 1536 63632.8 62784 68160 void (anonymous namespace)::softmax_warp_backward<float, float, float, 9, false>(float*, float cons…\n",
|
|
|
- " 0.0 82467209 8880 9286.8 6656 16928 void cuApplyLayerNorm<float, float, float>(float*, float*, float*, float const*, int, int, float, f…\n",
|
|
|
- " 0.0 76246947 1632 46719.9 5920 52928 void at::native::vectorized_elementwise_kernel<4, at::native::(anonymous namespace)::masked_scale_k…\n",
|
|
|
- " 0.0 66689595 5952 11204.6 8864 18848 kernel_2 \n",
|
|
|
- " 0.0 65696173 4560 14407.1 2335 344991 void at::native::vectorized_elementwise_kernel<4, at::native::FillFunctor<float>, at::detail::Array…\n",
|
|
|
- " 0.0 54335809 1536 35374.9 33120 40640 kernel_4 \n",
|
|
|
- " 0.0 51338187 4352 11796.5 8576 16288 void at::native::unrolled_elementwise_kernel<at::native::copy_device_to_device(at::TensorIterator&,…\n",
|
|
|
- " 0.0 45344126 176 257637.1 254302 260671 void at::native::unrolled_elementwise_kernel<at::native::DivFunctor<float>, at::detail::Array<char*…\n",
|
|
|
- " 0.0 44337536 176 251917.8 250559 253407 void at::native::vectorized_elementwise_kernel<4, at::native::exp_kernel_cuda(at::TensorIterator&):…\n",
|
|
|
- " 0.0 44304532 1536 28844.1 25920 36992 void at::native::(anonymous namespace)::CatArrayBatchedCopy<float, unsigned int, 4, 64, 64>(float*,…\n",
|
|
|
- " 0.0 33892087 3168 10698.3 8480 18304 void cuComputePartGradGammaBeta<float, float, float>(float const*, float const*, int, int, float co…\n",
|
|
|
- " 0.0 29677485 3168 9367.9 8032 18464 void cuComputeGradInput<float, float, float>(float const*, float const*, int, int, float const*, fl…\n",
|
|
|
- " 0.0 28793796 3072 9373.0 6816 16351 kernel_3 \n",
|
|
|
- " 0.0 25638427 176 145672.9 141248 151744 void at::native::reduce_kernel<512, 1, at::native::ReduceOp<float, at::native::MaxOps<float>, unsig…\n",
|
|
|
- " 0.0 24934651 192 129868.0 6208 254687 void at::native::unrolled_elementwise_kernel<at::native::MulFunctor<float>, at::detail::Array<char*…\n",
|
|
|
- " 0.0 24768005 624 39692.3 3872 133248 void at::native::reduce_kernel<512, 1, at::native::ReduceOp<float, at::native::func_wrapper_t<float…\n",
|
|
|
- " 0.0 14595499 520 28068.3 11168 322080 _ZN2at6native89_GLOBAL__N__65_tmpxft_00001007_00000000_13_DistributionNormal_compute_86_cpp1_ii_7d8…\n",
|
|
|
- " 0.0 14342499 3168 4527.3 4031 11296 void cuComputeGradGammaBeta<float, float>(float const*, float const*, int, int, int, float*, float*)\n",
|
|
|
- " 0.0 4617682 192 24050.4 18080 34848 void at::native::(anonymous namespace)::embedding_backward_feature_kernel<float, float, long>(long*…\n",
|
|
|
- " 0.0 4383059 352 12451.9 11072 14975 void at::native::(anonymous namespace)::indexSelectLargeIndex<float, long, unsigned int, 2, 2, -2, …\n",
|
|
|
- " 0.0 2511679 192 13081.7 10368 22560 kernel_0 \n",
|
|
|
- " 0.0 2431636 272 8939.8 7648 10272 void at::native::index_elementwise_kernel<128, 4, at::native::gpu_index_kernel<at::native::index_ke…\n",
|
|
|
- " 0.0 1403583 352 3987.5 2976 11584 void cub::DeviceReduceSingleTileKernel<cub::DeviceReducePolicy<bool, int, int, cub::Sum>::Policy600…\n",
|
|
|
- " 0.0 1350339 352 3836.2 2911 5216 void cub::DeviceSelectSweepKernel<cub::DispatchSelectIf<cub::CountingInputIterator<long, long>, cub…\n",
|
|
|
- " 0.0 1254812 176 7129.6 6464 9217 void at::native::triu_tril_kernel<float, int, false>(at::cuda::detail::TensorInfo<float, int>, at::…\n",
|
|
|
- " 0.0 1248508 448 2786.8 2399 3840 void (anonymous namespace)::elementwise_kernel_with_index<int, at::native::arange_cuda_out(at::Tens…\n",
|
|
|
- " 0.0 991701 272 3646.0 2752 4512 void at::native::vectorized_elementwise_kernel<4, at::native::DivFunctor<float>, at::detail::Array<…\n",
|
|
|
- " 0.0 942900 352 2678.7 2336 3552 void cub::DeviceCompactInitKernel<cub::ScanTileState<int, true>, int*>(cub::ScanTileState<int, true…\n",
|
|
|
- " 0.0 872668 96 9090.3 8352 10048 void at::native::unrolled_elementwise_kernel<at::native::copy_device_to_device(at::TensorIterator&,…\n",
|
|
|
- " 0.0 826148 176 4694.0 4352 5601 void at::native::vectorized_elementwise_kernel<4, at::native::BUnaryFunctor<at::native::CompareLTFu…\n",
|
|
|
- " 0.0 775099 96 8073.9 7424 8992 void at::native::index_elementwise_kernel<128, 4, at::native::gpu_index_kernel<at::native::index_pu…\n",
|
|
|
- " 0.0 765821 176 4351.3 3936 5184 void at::native::vectorized_elementwise_kernel<4, at::native::log_kernel_cuda(at::TensorIterator&):…\n",
|
|
|
- " 0.0 688480 176 3911.8 3456 5024 void at::native::unrolled_elementwise_kernel<at::native::BUnaryFunctor<at::native::CompareLTFunctor…\n",
|
|
|
- " 0.0 608608 176 3458.0 3136 4384 void at::native::unrolled_elementwise_kernel<at::native::BUnaryFunctor<at::native::CompareGEFunctor…\n",
|
|
|
- " 0.0 548866 176 3118.6 2912 3680 void at::native::vectorized_elementwise_kernel<4, at::native::BUnaryFunctor<at::native::AddFunctor<…\n",
|
|
|
- " 0.0 543391 176 3087.4 2816 4000 void at::native::vectorized_elementwise_kernel<4, at::native::BitwiseOrFunctor<bool>, at::detail::A…\n",
|
|
|
- " 0.0 502590 176 2855.6 2655 3488 void at::native::vectorized_elementwise_kernel<4, at::native::FillFunctor<long>, at::detail::Array<…\n",
|
|
|
- " 0.0 383140 96 3991.0 3744 4480 cleanup(float*, float*, float*, float*, bool, int) \n",
|
|
|
- " 0.0 319140 96 3324.4 2912 16032 void at::native::vectorized_elementwise_kernel<4, at::native::BUnaryFunctor<at::native::AddFunctor<…\n",
|
|
|
- " 0.0 313667 96 3267.4 2912 3968 void at::native::vectorized_elementwise_kernel<4, at::native::(anonymous namespace)::pow_tensor_sca…\n",
|
|
|
- " 0.0 287259 96 2992.3 2784 3328 void at::native::vectorized_elementwise_kernel<4, at::native::AUnaryFunctor<at::native::AddFunctor<…\n",
|
|
|
- "\n",
|
|
|
- "\n",
|
|
|
- "\n",
|
|
|
- "CUDA Memory Operation Statistics (by time):\n",
|
|
|
- "\n",
|
|
|
- " Time(%) Total Time (ns) Operations Average Minimum Maximum Operation \n",
|
|
|
- " ------- --------------- ---------- --------- ------- --------- ------------------\n",
|
|
|
- " 89.1 3465094446 1380 2510938.0 1343 757281996 [CUDA memcpy DtoH]\n",
|
|
|
- " 9.9 385763414 21152 18237.7 2431 527263 [CUDA memcpy DtoD]\n",
|
|
|
- " 0.8 31439598 17792 1767.1 1247 18912 [CUDA memset] \n",
|
|
|
- " 0.2 6520963 3136 2079.4 1247 274623 [CUDA memcpy HtoD]\n",
|
|
|
- "\n",
|
|
|
- "\n",
|
|
|
- "\n",
|
|
|
- "CUDA Memory Operation Statistics (by size in KiB):\n",
|
|
|
- "\n",
|
|
|
- " Total Operations Average Minimum Maximum Operation \n",
|
|
|
- " ------------- ---------- -------- ------- ---------- ------------------\n",
|
|
|
- " 4433452.563 17792 249.182 0.004 10240.000 [CUDA memset] \n",
|
|
|
- " 2971617.438 1380 2153.346 0.004 201216.000 [CUDA memcpy DtoH]\n",
|
|
|
- " 19824.688 3136 6.322 0.004 2048.000 [CUDA memcpy HtoD]\n",
|
|
|
- " 120454945.375 21152 5694.731 0.004 201216.000 [CUDA memcpy DtoD]\n",
|
|
|
- "\n",
|
|
|
- "\n",
|
|
|
- "\n",
|
|
|
- "Operating System Runtime API Statistics:\n",
|
|
|
- "\n",
|
|
|
- " Time(%) Total Time (ns) Num Calls Average Minimum Maximum Name \n",
|
|
|
- " ------- --------------- --------- ------------- -------- ----------- ----------------------\n",
|
|
|
- " 31.8 5593434303269 368 15199549737.1 23610 79120616452 pthread_cond_wait \n",
|
|
|
- " 27.9 4910125179481 17570 279460738.7 1963 10130282621 pthread_cond_timedwait\n",
|
|
|
- " 24.3 4278054729224 34186 125140546.7 1290 44497241962 poll \n",
|
|
|
- " 10.3 1803196509173 355 5079426786.4 23310 33853558996 sem_wait \n",
|
|
|
- " 3.6 641269443401 576 1113315005.9 1015 2997228374 futex \n",
|
|
|
- " 0.8 145245328959 688 211112396.7 10065411 1007923775 select \n",
|
|
|
- " 0.6 109784339681 50957 2154450.6 1003 109559784 ioctl \n",
|
|
|
- " 0.2 28042017648 117 239675364.5 164118 2317633343 fork \n",
|
|
|
- " 0.1 15313635731 988 15499631.3 106209 1353294979 pthread_create \n",
|
|
|
- " 0.1 13206505992 11995 1101000.9 999 319033247 read \n",
|
|
|
- " 0.1 12657774292 4603 2749896.7 1004 119470493 pthread_tryjoin_np \n",
|
|
|
- " 0.1 9296666389 2070 4491143.2 1020 742142953 write \n",
|
|
|
- " 0.0 8550210410 167 51198864.7 5064806 1000155935 nanosleep \n",
|
|
|
- " 0.0 5473575733 340 16098752.2 1716 753451009 accept \n",
|
|
|
- " 0.0 4563965549 5670 804932.2 1000 744010044 recv \n",
|
|
|
- " 0.0 2881757706 8761 328930.2 3498 52730526 open64 \n",
|
|
|
- " 0.0 2738750168 970 2823453.8 1029 108977691 pthread_cond_signal \n",
|
|
|
- " 0.0 1674433469 695 2409256.8 3814 447909461 pthread_join \n",
|
|
|
- " 0.0 1307932755 574 2278628.5 999 552483526 waitpid \n",
|
|
|
- " 0.0 787392363 116 6787865.2 1039 118398933 pthread_mutex_lock \n",
|
|
|
- " 0.0 466428334 7253 64308.3 1000 29379070 send \n",
|
|
|
- " 0.0 365132680 3392 107645.2 2498 19909417 mmap \n",
|
|
|
- " 0.0 342197730 563 607811.2 6018 44240427 connect \n",
|
|
|
- " 0.0 249098555 3159 78853.6 1421 34377391 fopen \n",
|
|
|
- " 0.0 217823404 260 837782.3 2407 34361262 pipe2 \n",
|
|
|
- " 0.0 217659306 40 5441482.7 1558 48632942 openat \n",
|
|
|
- " 0.0 180993013 4202 43073.1 1374 32116690 munmap \n",
|
|
|
- " 0.0 77052008 18 4280667.1 61137 13838109 sleep \n",
|
|
|
- " 0.0 61461276 3602 17063.1 2001 20022904 mmap64 \n",
|
|
|
- " 0.0 48242345 204 236482.1 2249 29899269 open \n",
|
|
|
- " 0.0 39481067 228 173162.6 1032 3767436 recvmsg \n",
|
|
|
- " 0.0 38664317 726 53256.6 1774 27390519 socket \n",
|
|
|
- " 0.0 36313974 1633 22237.6 1000 1736639 fgets \n",
|
|
|
- " 0.0 22467169 2648 8484.6 1000 11488792 fread \n",
|
|
|
- " 0.0 20593267 73 282099.5 29914 5176911 sem_timedwait \n",
|
|
|
- " 0.0 19633001 3116 6300.7 1000 7802890 fclose \n",
|
|
|
- " 0.0 8898623 86 103472.4 1007 8486628 fgetc \n",
|
|
|
- " 0.0 6623741 1610 4114.1 999 3032106 sched_yield \n",
|
|
|
- " 0.0 3582154 3 1194051.3 343132 2284636 pthread_rwlock_wrlock \n",
|
|
|
- " 0.0 1912398 26 73553.8 2155 1181798 fopen64 \n",
|
|
|
- " 0.0 1550002 20 77500.1 2507 938762 fwrite_unlocked \n",
|
|
|
- " 0.0 1303265 949 1373.3 1000 123053 fcntl \n",
|
|
|
- " 0.0 551681 160 3448.0 1014 141280 bind \n",
|
|
|
- " 0.0 546706 41 13334.3 1026 224142 getdelim \n",
|
|
|
- " 0.0 244769 73 3353.0 1000 22723 fflush \n",
|
|
|
- " 0.0 198165 80 2477.1 1081 9301 waitid \n",
|
|
|
- " 0.0 172780 122 1416.2 1001 3555 listen \n",
|
|
|
- " 0.0 152498 16 9531.1 1347 20498 sendmsg \n",
|
|
|
- " 0.0 130015 1 130015.0 130015 130015 pthread_cond_broadcast\n",
|
|
|
- " 0.0 117991 3 39330.3 2098 107700 fputs \n",
|
|
|
- " 0.0 114666 2 57333.0 19023 95643 wait \n",
|
|
|
- " 0.0 104388 36 2899.7 1074 7418 signal \n",
|
|
|
- " 0.0 98697 59 1672.8 1009 6189 sigaction \n",
|
|
|
- " 0.0 85207 12 7100.6 4060 27106 pipe \n",
|
|
|
- " 0.0 66940 38 1761.6 1187 3590 pthread_mutex_trylock \n",
|
|
|
- " 0.0 66109 13 5085.3 3490 7378 fputs_unlocked \n",
|
|
|
- " 0.0 64428 54 1193.1 1001 1787 dup \n",
|
|
|
- " 0.0 43299 7 6185.6 3683 16998 wait3 \n",
|
|
|
- " 0.0 3502 2 1751.0 1396 2106 dup2 \n",
|
|
|
- "\n",
|
|
|
- "Report file moved to \"/home/zcharpy/profiles/GPT360M_naive.qdrep\"\n",
|
|
|
- "Report file moved to \"/home/zcharpy/profiles/GPT360M_naive.sqlite\"\n"
|
|
|
- ]
|
|
|
- }
|
|
|
- ],
|
|
|
- "source": [
|
|
|
- "!bash ./Megatron-LM/nsys_test.sh"
|
|
|
- ]
|
|
|
- },
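The log above reports `elapsed time per iteration (ms): 2141.9` with a global batch size of 8 and a sequence length of 512. Converting that into throughput makes configurations easier to compare than raw iteration times; the arithmetic below is a small sketch using the numbers copied from this naive run:

```python
# Convert the iteration timing printed in the log above into throughput numbers.
# Values copied from the naive run: 2141.9 ms/iter, global batch 8, sequence length 512.
iter_time_ms = 2141.9
global_batch = 8
seq_length   = 512

samples_per_sec = global_batch / (iter_time_ms / 1000.0)
tokens_per_sec  = samples_per_sec * seq_length
print(f"{samples_per_sec:.2f} samples/s, {tokens_per_sec:,.0f} tokens/s")
# ~3.7 samples/s and ~1,900 tokens/s across 8 GPUs: plenty of headroom for a better config.
```

Higher samples/s (or tokens/s) at the same model size is the simplest single number to beat when you rerun with a modified configuration.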
|
|
|
- {
|
|
|
- "cell_type": "markdown",
|
|
|
- "id": "upset-skating",
|
|
|
- "metadata": {},
|
|
|
- "source": []
|
|
|
- },
|
|
|
- {
|
|
|
- "cell_type": "markdown",
|
|
|
- "id": "governmental-welding",
|
|
|
- "metadata": {},
|
|
|
- "source": [
|
|
|
- "---\n",
|
|
|
- "## below is a ReRun cell to experiment training configurations\n",
|
|
|
- "<a id=\"Rerun_Cell\"></a>"
|
|
|
- ]
|
|
|
- },
|
|
|
- {
|
|
|
- "cell_type": "code",
|
|
|
- "execution_count": null,
|
|
|
- "id": "requested-clause",
|
|
|
- "metadata": {},
|
|
|
- "outputs": [],
|
|
|
- "source": [
|
|
|
- "!bash ./Megatron-LM/dlprof_2nd_run.sh"
|
|
|
- ]
|
|
|
- },
|
|
|
- {
|
|
|
- "cell_type": "code",
|
|
|
- "execution_count": null,
|
|
|
- "id": "written-trace",
|
|
|
- "metadata": {},
|
|
|
- "outputs": [],
|
|
|
- "source": []
|
|
|
- },
|
|
|
- {
|
|
|
- "cell_type": "markdown",
|
|
|
- "id": "sunrise-borough",
|
|
|
- "metadata": {},
|
|
|
- "source": [
|
|
|
- "----------------\n",
|
|
|
- "\n",
|
|
|
- "## **Challenge ** - the best profile\n",
|
|
|
- "\n",
|
|
|
- "with the exact same compute limitations ( i.e # of gpus you currently have ) \n",
|
|
|
- "\n",
|
|
|
- "task: modify the [profiling bash script](./Megatron-LM/dlprof_2nd_run.sh) and rerun \n",
|
|
|
- "<a href=\"./Day2-5_Observe_GPT_runs_vs_performance.ipynb#Rerun_Cell\">Jump to ReRun Cell</a> \n",
|
|
|
- "monitor the training runs to get an overall >80% gpu utils in **training** runs \n",
|
|
|
- "\n",
|
|
|
- "```\n",
|
|
|
- " TENSOR_MP_SIZE=1\n",
|
|
|
- " PIPELINE_MP_SIZE=1\n",
|
|
|
- "\n",
|
|
|
- " #GPT Config \n",
|
|
|
- " LAYERS= \n",
|
|
|
- " HIDDEN_SIZE=\n",
|
|
|
- " ATTN_HEADS=\n",
|
|
|
- " MICRO_BZ=\n",
|
|
|
- " GB_BZ=\n",
|
|
|
- " SEQ_LEN=\n",
|
|
|
- " MAX_POS_EM=\n",
|
|
|
- "``` \n",
|
|
|
- "<a id=\"TheChallenge\"></a>"
|
|
|
- ]
|
|
|
- },
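When picking `LAYERS`, `HIDDEN_SIZE`, and the other values above, it helps to know how large the resulting model will be before launching a run. The sketch below estimates the parameter count with a standard GPT-style accounting (roughly 12h^2 + 13h per transformer layer, plus token and position embeddings and a final layer norm). It is an illustrative helper, not part of Megatron-LM, but with the lab defaults (16 layers, hidden size 1024, 512 positions, padded vocabulary of 50,304) it reproduces the 253,577,216 parameters printed in the profiling log above:

```python
# Rough GPT parameter-count estimate for a Megatron-style decoder-only model.
# Accounting assumption: 12*h^2 + 13*h per transformer layer (QKV, output projection,
# 4h MLP, biases, two layer norms), plus token + position embeddings and a final layer norm.
def gpt_param_count(num_layers, hidden_size, padded_vocab_size, max_positions):
    per_layer       = 12 * hidden_size**2 + 13 * hidden_size
    embeddings      = (padded_vocab_size + max_positions) * hidden_size
    final_layernorm = 2 * hidden_size
    return num_layers * per_layer + embeddings + final_layernorm

# Default lab config: matches the 253,577,216 parameters reported in the log above.
print(gpt_param_count(num_layers=16, hidden_size=1024,
                      padded_vocab_size=50304, max_positions=512))
```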
|
|
|
- {
|
|
|
- "cell_type": "markdown",
|
|
|
- "id": "streaming-artist",
|
|
|
- "metadata": {},
|
|
|
- "source": [
|
|
|
- "-----\n",
|
|
|
- "\n",
|
|
|
- "\n",
|
|
|
- "## Licensing \n",
|
|
|
- "\n",
|
|
|
- "This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). "
|
|
|
- ]
|
|
|
- }
|
|
|
- ],
|
|
|
- "metadata": {
|
|
|
- "kernelspec": {
|
|
|
- "display_name": "Python 3",
|
|
|
- "language": "python",
|
|
|
- "name": "python3"
|
|
|
- },
|
|
|
- "language_info": {
|
|
|
- "codemirror_mode": {
|
|
|
- "name": "ipython",
|
|
|
- "version": 3
|
|
|
- },
|
|
|
- "file_extension": ".py",
|
|
|
- "mimetype": "text/x-python",
|
|
|
- "name": "python",
|
|
|
- "nbconvert_exporter": "python",
|
|
|
- "pygments_lexer": "ipython3",
|
|
|
- "version": "3.8.8"
|
|
|
- }
|
|
|
- },
|
|
|
- "nbformat": 4,
|
|
|
- "nbformat_minor": 5
|
|
|
-}
|