
Adding site content to /site directory (see #194)

Signed-off-by: Luke Wilson <luke.wilson@dell.com>
Luke Wilson 4 years ago
parent
commit
d3cd793d21

+ 6 - 0
site/CONTRIBUTORS.md

@@ -0,0 +1,6 @@
+# Omnia Maintainers
+- Luke Wilson and John Lockman (Dell Technologies)
+<img src="images/delltech.jpg" height="90px" alt="Dell Technologies">
+
+# Omnia Contributors
+<img src="images/delltech.jpg" height="90px" alt="Dell Technologies"> <img src="images/pisa.png" height="100px" alt="Università di Pisa">

+ 105 - 0
site/INSTALL.md

@@ -0,0 +1,105 @@
+## TL;DR Installation
+ 
+### Kubernetes
+Install Kubernetes and all dependencies
+```
+ansible-playbook -i host_inventory_file kubernetes/kubernetes.yml
+```
+
+Initialize K8s cluster
+```
+ansible-playbook -i host_inventory_file kubernetes/kubernetes.yml --tags "init"
+```
+
+### Install Kubeflow 
+```
+ansible-playbook -i host_inventory_file kubernetes/kubeflow.yaml
+```
+
+### Slurm
+```
+ansible-playbook -i host_inventory_file slurm/slurm.yml
+```
+
+# Omnia  
+Omnia is a collection of [Ansible](https://www.ansible.com/) playbooks which perform:
+* Installation of [Slurm](https://slurm.schedmd.com/) and/or [Kubernetes](https://kubernetes.io/) on servers already provisioned with a standard [CentOS](https://www.centos.org/) image.
+* Installation of auxiliary scripts for administrator functions such as moving nodes between Slurm and Kubernetes personalities.
+
+Omnia playbooks perform several tasks:
+
+`common` playbook handles installation of software
+* Add yum repositories:
+    - Kubernetes (Google)
+    - ELRepo (for Nvidia drivers)
+    - EPEL (Extra Packages for Enterprise Linux)
+* Install Packages from repos:
+    - bash-completion
+    - docker
+    - gcc
+    - python-pip
+    - kubelet
+    - kubeadm
+    - kubectl
+    - nfs-utils
+    - nvidia-detect
+    - yum-plugin-versionlock
+* Restart and enable system level services
+    - Docker
+    - Kubelet
+
+`computeGPU` playbook installs Nvidia drivers and nvidia-container-runtime-hook
+* Add yum repositories:
+    - Nvidia (container runtime)
+* Install Packages from repos:
+    - kmod-nvidia
+    - nvidia-container-runtime-hook
+* Restart and enable system level services
+    - Docker
+    - Kubelet
+* Configuration:
+    - Enable GPU Device Plugins (nvidia-container-runtime-hook)
+    - Modify kubeadm config to allow GPUs as a schedulable resource
+* Restart and enable system level services
+    - Docker
+    - Kubelet
+
+`master` playbook
+* Install Helm v3
+* (Optional) Add firewall rules for Slurm and Kubernetes
+
+Everything from this point on can be called by using the `init` tag:
+```
+ansible-playbook -i host_inventory_file kubernetes/kubernetes.yml --tags "init"
+```
+
+`startmaster` playbook
+* Turn off swap
+* Initialize Kubernetes
+    * Head/master
+        - Start K8s and pass the startup token to the compute (worker) nodes
+        - Initialize software-defined networking (Calico)
+
+`startworkers` playbook
+* Turn off swap
+* Join K8s cluster
+
+`startservices` playbook
+* Set up K8s Dashboard
+* Add `stable` repo to Helm
+* Add `jupyterhub` repo to Helm
+* Update Helm repos
+* Deploy NFS client provisioner
+* Deploy JupyterHub
+* Deploy Prometheus
+* Install MPI Operator
+
+
+### Slurm
+* Download and build Slurm from source
+* Install package dependencies
+    - Python3
+    - munge
+    - MariaDB
+    - MariaDB development libraries
+* Build Slurm configuration files
+
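The TL;DR steps in INSTALL.md above can be chained into one wrapper that stops at the first failing playbook. This is a minimal sketch, not part of Omnia: the names `omnia_run`, `omnia_install`, and `OMNIA_DRY_RUN` are hypothetical, and it assumes you want every playbook from the TL;DR list (drop the lines for components you do not need). It defaults to a dry run that only prints the commands.

```shell
# Sketch only: omnia_run, omnia_install and OMNIA_DRY_RUN are hypothetical
# helper names, not something the Omnia repository provides.

# Print the command when OMNIA_DRY_RUN is 1 (the default); otherwise run it.
omnia_run() {
    if [ "${OMNIA_DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run the playbooks in the order the TL;DR section lists them.
# The && chain stops as soon as one playbook fails.
omnia_install() {
    inventory="$1"
    omnia_run ansible-playbook -i "$inventory" kubernetes/kubernetes.yml &&
    omnia_run ansible-playbook -i "$inventory" kubernetes/kubernetes.yml --tags "init" &&
    omnia_run ansible-playbook -i "$inventory" kubernetes/kubeflow.yaml &&
    omnia_run ansible-playbook -i "$inventory" slurm/slurm.yml
}
```

Calling `omnia_install host_inventory_file` prints the four commands for review; `OMNIA_DRY_RUN=0 omnia_install host_inventory_file` would execute them.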

+ 27 - 0
site/PREINSTALL.md

@@ -0,0 +1,27 @@
+# Pre-Installation Preparation
+
+## Assumptions
+Omnia assumes that prior to installation:
+* Systems have a base operating system (currently CentOS 7 or 8)
+* Networks have been cabled and nodes can reach the internet
+* SSH keys for `root` have been installed on all nodes to allow for password-less SSH
+* Ansible is installed on either the master node or a separate deployment node
+```
+yum install ansible
+```
+
+## Example system designs
+Omnia can configure systems which use Ethernet- or InfiniBand-based fabric to connect the compute servers.
+
+![Example system configuration with Ethernet fabric](images/example-system-ethernet.png)
+
+![Example system configuration with InfiniBand fabric](images/example-system-infiniband.png)
+
+## Network Setup
+Omnia assumes that servers are already connected to the network and have access to the internet.
+### Network Topology
+Possible network configurations include:
+* A flat topology where all nodes are connected to a switch which includes an uplink to the internet. This requires multiple externally facing IP addresses.
+* A hierarchical topology where compute nodes are connected to a common switch, but the master node contains a second network connection which is connected to the internet. All outbound/inbound traffic would be routed through the master node. This requires setting up firewall rules for IP masquerade; see [this firewalld example](https://www.server-world.info/en/note?os=CentOS_7&p=firewalld&f=2).
+### IP and Hostname Assignment
+The recommended setup is to assign IP addresses to individual servers. This can be done manually by logging onto each node, or via DHCP.
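Two of the assumptions in PREINSTALL.md above (password-less SSH for `root`, and internet access from every node) can be checked before running any playbook. The following is a rough sketch under stated assumptions: `preflight_node` and `PREFLIGHT_SSH` are hypothetical names, the probe URL is an arbitrary choice, and it assumes `curl` is available on the nodes.

```shell
# Sketch only: preflight_node and PREFLIGHT_SSH are hypothetical names.
# PREFLIGHT_SSH lets you substitute another ssh client (or a stub for testing).

preflight_node() {
    node="$1"
    ssh_cmd="${PREFLIGHT_SSH:-ssh}"

    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so success here means key-based login as root works.
    if ! "$ssh_cmd" -o BatchMode=yes -o ConnectTimeout=5 "root@$node" true; then
        echo "$node: password-less SSH as root failed"
        return 1
    fi

    # Ask the node itself to fetch a page to confirm outbound internet access.
    # The URL is only an example target; curl on the node is an assumption.
    if ! "$ssh_cmd" -o BatchMode=yes -o ConnectTimeout=5 "root@$node" \
            curl -fsS -o /dev/null --max-time 10 https://www.centos.org/; then
        echo "$node: no outbound internet access"
        return 1
    fi

    echo "$node: ok"
}
```

A loop such as `for n in node1 node2; do preflight_node "$n"; done` (with your own hostnames in place of the placeholder `node1 node2`) reports one line per node.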

File diff suppressed because it is too large
+ 43 - 0
site/README.md


+ 4 - 0
site/_config.yml

@@ -0,0 +1,4 @@
+theme: jekyll-theme-minimal
+title: Omnia
+description: Ansible playbook-based tools for deploying Slurm and Kubernetes clusters for High Performance Computing, Machine Learning, Deep Learning, and High-Performance Data Analytics
+logo: images/omnia-logo.png

BIN
site/images/delltech.jpg


BIN
site/images/example-system-ethernet.png


BIN
site/images/example-system-infiniband.png


BIN
site/images/omnia-branch-structure.png


BIN
site/images/omnia-k8s.png


BIN
site/images/omnia-logo.png


BIN
site/images/omnia-overview.png


BIN
site/images/omnia-slurm.png


BIN
site/images/pisa.png


+ 10 - 0
site/metalLB/README.md

@@ -0,0 +1,10 @@
+# MetalLB 
+
+MetalLB is a load-balancer implementation for bare metal Kubernetes clusters, using standard routing protocols.
+https://metallb.universe.tf/
+
+Omnia installs MetalLB by manifest in the playbook `startservices`. A default configuration is provided for the layer2 protocol, along with an example address pool. Modify `metal-config.yaml` to suit your network requirements and apply the changes with:
+
+``` 
+kubectl apply -f metal-config.yaml
+```

+ 21 - 0
site/metalLB/metal-config.yaml

@@ -0,0 +1,21 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  namespace: metallb-system
+  name: config
+data:
+  config: |
+    address-pools:
+    - name: default
+      protocol: layer2
+      addresses:
+      - 192.168.2.150/32
+      - 192.168.2.151/32
+      - 192.168.2.152/32
+      - 192.168.2.153/32
+      - 192.168.2.154/32
+      - 192.168.2.155/32
+      - 192.168.2.156/32
+      - 192.168.2.157/32
+      - 192.168.2.158/32
+      - 192.168.2.159/32
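The ten `/32` entries in `metal-config.yaml` above cover 192.168.2.150 through 192.168.2.159. When adapting the pool to another network, the same file can be generated rather than edited by hand. This is a minimal sketch: `metallb_config` and its three arguments are hypothetical, and it assumes the pool is a contiguous run of addresses that share their first three octets.

```shell
# Sketch only: metallb_config is a hypothetical helper, not part of Omnia.
# It prints a ConfigMap shaped like metal-config.yaml for one contiguous pool.

metallb_config() {
    prefix="$1"   # first three octets, e.g. 192.168.2
    first="$2"    # first host octet in the pool, e.g. 150
    last="$3"     # last host octet in the pool, e.g. 159

    # Fixed header, copied from the structure of metal-config.yaml.
    cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
EOF

    # One /32 entry per address, matching the style of the original file.
    i="$first"
    while [ "$i" -le "$last" ]; do
        printf '      - %s.%s/32\n' "$prefix" "$i"
        i=$((i + 1))
    done
}
```

For example, `metallb_config 192.168.2 150 159 > metal-config.yaml` reproduces the pool shown above, after which `kubectl apply -f metal-config.yaml` applies it. MetalLB's ConfigMap format also accepts a range entry such as `192.168.2.150-192.168.2.159` in place of the individual `/32` lines.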