|
@@ -1,110 +0,0 @@
|
|
|
-## TL;DR Installation
|
|
|
-
|
|
|
-### Kubernetes
|
|
|
-Install Slurm and Kubernetes, along with all dependencies
|
|
|
-```
|
|
|
-ansible-playbook -i host_inventory_file omnia.yml
|
|
|
-```
|
|
|
-
|
|
|
-Install Slurm only
|
|
|
-```
|
|
|
-ansible-playbook -i host_inventory_file omnia.yml --skip-tags "k8s"
|
|
|
-```
|
|
|
-
|
|
|
-Install Kubernetes only
|
|
|
-```
|
|
|
-ansible-playbook -i host_inventory_file omnia.yml --skip-tags "slurm"
|
|
|
-```
|
|
|
-
|
|
|
-Initialize Kubernetes cluster (packages already installed)
|
|
|
-```
|
|
|
-ansible-playbook -i host_inventory_file omnia.yml --skip-tags "slurm" --tags "init"
|
|
|
-```
|
|
|
-
|
|
|
-### Install Kubeflow
|
|
|
-```
|
|
|
-ansible-playbook -i host_inventory_file platforms/kubeflow.yml
|
|
|
-```
|
|
|
-
|
|
|
-# Omnia
|
|
|
-Omnia is a collection of [Ansible](https://www.ansible.com/) playbooks which perform:
|
|
|
-* Installation of [Slurm](https://slurm.schedmd.com/) and/or [Kubernetes](https://kubernetes.io/) on servers already provisioned with a standard [CentOS](https://www.centos.org/) image.
|
|
|
-* Installation of auxiliary scripts for administrator functions such as moving nodes between Slurm and Kubernetes personalities.
|
|
|
-
|
|
|
-Omnia playbooks perform several tasks:
|
|
|
-`common` playbook handles installation of software
|
|
|
-* Add yum repositories:
|
|
|
- - Kubernetes (Google)
|
|
|
- - ELRepo (for Nvidia drivers)
|
|
|
- - EPEL (Extra Packages for Enterprise Linux)
|
|
|
-* Install Packages from repos:
|
|
|
- - bash-completion
|
|
|
- - docker
|
|
|
- - gcc
|
|
|
- - python-pip
|
|
|
- - kubelet
|
|
|
- - kubeadm
|
|
|
- - kubectl
|
|
|
- - nfs-utils
|
|
|
- - nvidia-detect
|
|
|
- - yum-plugin-versionlock
|
|
|
-* Restart and enable system level services
|
|
|
- - Docker
|
|
|
- - Kubelet
|
|
|
-
|
|
|
-`computeGPU` playbook installs Nvidia drivers and nvidia-container-runtime-hook
|
|
|
-* Add yum repositories:
|
|
|
- - Nvidia (container runtime)
|
|
|
-* Install Packages from repos:
|
|
|
- - kmod-nvidia
|
|
|
- - nvidia-container-runtime-hook
|
|
|
-* Restart and enable system level services
|
|
|
- - Docker
|
|
|
- - Kubelet
|
|
|
-* Configuration:
|
|
|
- - Enable GPU Device Plugins (nvidia-container-runtime-hook)
|
|
|
- - Modify kubeadm config to allow GPUs as schedulable resource
|
|
|
-* Restart and enable system level services
|
|
|
- - Docker
|
|
|
- - Kubelet
|
|
|
-
|
|
|
-`master` playbook
|
|
|
-* Install Helm v3
|
|
|
-* (Optional) Add firewall rules for Slurm and Kubernetes
|
|
|
-
|
|
|
-Everything from this point on can be called by using the `init` tag
|
|
|
-```
|
|
|
-ansible-playbook -i host_inventory_file kubernetes/kubernetes.yml --tags "init"
|
|
|
-```
|
|
|
-
|
|
|
-`startmaster` playbook
|
|
|
-* Turn off swap
|
|
|
-* Initialize Kubernetes
|
|
|
- * Head/master
|
|
|
- - Start K8S and pass the startup token to compute/slaves
|
|
|
- - Initialize software defined networking (Calico)
|
|
|
-
|
|
|
-`startworkers` playbook
|
|
|
-* Turn off swap
|
|
|
-* Join k8s cluster
|
|
|
-
|
|
|
-`startservices` playbook
|
|
|
-* Set up K8S Dashboard
|
|
|
-* Add `stable` repo to helm
|
|
|
-* Add `jupyterhub` repo to helm
|
|
|
-* Update helm repos
|
|
|
-* Deploy NFS client Provisioner
|
|
|
-* Deploy Jupyterhub
|
|
|
-* Deploy Prometheus
|
|
|
-* Install MPI Operator
|
|
|
-
|
|
|
-
|
|
|
-### Slurm
|
|
|
-* Download and build Slurm from source
|
|
|
-* Install package dependencies
|
|
|
- - Python3
|
|
|
- - munge
|
|
|
- - MariaDB
|
|
|
- - MariaDB development libraries
|
|
|
-* Build Slurm configuration files
|
|
|
-
|