TL;DR Installation
Kubernetes
Install Kubernetes and all dependencies
ansible-playbook -i host_inventory_file kubernetes/kubernetes.yml
Initialize K8s cluster
ansible-playbook -i host_inventory_file kubernetes/kubernetes.yml --tags "init"
Install Kubeflow
ansible-playbook -i host_inventory_file kubernetes/kubeflow.yaml
Slurm
ansible-playbook -i host_inventory_file slurm/slurm.yml
Omnia
Omnia is a collection of Ansible playbooks which perform:
- Installation of Slurm and/or Kubernetes on servers already provisioned with a standard CentOS image.
- Installation of auxiliary scripts for administrator functions such as moving nodes between Slurm and Kubernetes personalities.
Omnia playbooks perform several tasks:
common
playbook handles installation of software
- Add yum repositories:
- Kubernetes (Google)
- El Repo (for Nvidia drivers)
- EPEL (Extra Packages for Enterprise Linux)
- Install Packages from repos:
- bash-completion
- docker
- gcc
- python-pip
- kubelet
- kubeadm
- kubectl
- nfs-utils
- nvidia-detect
- yum-plugin-versionlock
- Restart and enable system level services
computeGPU
playbook installs Nvidia drivers and nvidia-container-runtime-hook
- Add yum repositories:
- Nvidia (container runtime)
- Install Packages from repos:
- kmod-nvidia
- nvidia-container-runtime-hook
- Restart and enable system level services
- Configuration:
- Enable GPU Device Plugins (nvidia-container-runtime-hook)
- Modify kubeadm config to allow GPUs as schedulable resource
- Restart and enable system level services
master
playbook
- Install Helm v3
- (optional) add firewall rules for Slurm and kubernetes
Everything from this point on can be called by using the init
tag
ansible-playbook -i host_inventory_file kubernetes/kubernetes.yml --tags "init"
startmaster
playbook
- turn off swap
*Initialize Kubernetes
- Head/master
- Start K8S pass startup token to compute/slaves
- Initialize software defined networking (Calico)
startworkers
playbook
- turn off swap
- Join k8s cluster
startservices
playbook
- Setup K8S Dashboard
- Add
stable
repo to helm
- Add
jupyterhub
repo to helm
- Update helm repos
- Deploy NFS client Provisioner
- Deploy Jupyterhub
- Deploy Prometheus
- Install MPI Operator
Slurm
- Downloads and builds Slurm from source
- Install package dependencies
- Python3
- munge
- MariaDB
- MariaDB development libraries
- Build Slurm configuration files