
Merge pull request #110 from dellhpc/release-v0.2

Omnia Release v0.2
John Lockman committed 4 years ago · commit 986eccbd63
45 changed files with 881 additions and 335 deletions
  1. CONTRIBUTING.md (+70 -46)
  2. LICENSE (+1 -1)
  3. README.md (+9 -3)
  4. docs/CONTRIBUTORS.md (+6 -0)
  5. docs/INSTALL.md (+66 -26)
  6. docs/PREINSTALL.md (+1 -1)
  7. docs/README.md (+23 -4)
  8. docs/_config.yml (+4 -1)
  9. docs/images/delltech.jpg (binary)
  10. docs/images/omnia-branch-structure.png (binary)
  11. docs/images/omnia-k8s.png (binary)
  12. docs/images/omnia-logo.png (binary)
  13. docs/images/omnia-overview.png (binary)
  14. docs/images/omnia-slurm.png (binary)
  15. docs/images/pisa.png (binary)
  16. docs/metalLB/README.md (+10 -0)
  17. docs/metalLB/metal-config.yaml (+21 -0)
  18. examples/PyTorch/pytorch-deploy.yaml (+20 -0)
  19. examples/PyTorch/pytorch-example.py (+54 -0)
  20. examples/k8s-tensorflow-nvidia-ngc-resnet50-multinode-mpioperator.yaml (+69 -0)
  21. kubernetes/host_inventory_file (+20 -20)
  22. kubernetes/jupyterhub.yaml (+22 -0)
  23. kubernetes/kubeflow.yaml (+22 -0)
  24. kubernetes/build-kubernetes-cluster.yml (+14 -0)
  25. kubernetes/roles/common/tasks/main.yml (+15 -15)
  26. kubernetes/roles/computeGPU/tasks/main.yml (+14 -0)
  27. kubernetes/roles/jupyterhub/files/jupyter_config.yaml (+42 -0)
  28. kubernetes/roles/jupyterhub/tasks/main.yml (+26 -0)
  29. kubernetes/roles/kubeflow/tasks/main.yml (+122 -0)
  30. kubernetes/roles/master/tasks/main.yml (+26 -12)
  31. kubernetes/roles/startmaster/tasks/main.yml (+35 -5)
  32. kubernetes/roles/startservices/files/jhub-db-pv.yaml (+0 -16)
  33. kubernetes/roles/startservices/files/jupyter-pvc.yaml (+0 -50)
  34. kubernetes/roles/startservices/files/jupyter_config.yaml (+0 -62)
  35. kubernetes/roles/startservices/files/metal-config.yaml (+10 -10)
  36. kubernetes/roles/startservices/tasks/main.yml (+26 -40)
  37. kubernetes/roles/startworkers/tasks/main.yml (+15 -0)
  38. kubernetes/scuttle (+14 -0)
  39. slurm/roles/slurm-common/tasks/main.yaml (+13 -0)
  40. slurm/roles/slurm-master/tasks/main.yaml (+14 -0)
  41. slurm/roles/start-slurm-workers/tasks/main.yml (+14 -0)
  42. slurm/slurm-cluster.yaml (+0 -23)
  43. slurm/slurm.yml (+36 -0)
  44. tools/change_personality (+14 -0)
  45. tools/install_tools.yml (+13 -0)

File diff suppressed because it is too large
+ 70 - 46
CONTRIBUTING.md


+ 1 - 1
LICENSE

@@ -186,7 +186,7 @@
       same "printed page" as the copyright notice for easier
       identification within third-party archives.
 
-   Copyright [yyyy] [name of copyright owner]
+   Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved. 
 
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.

+ 9 - 3
README.md

@@ -1,11 +1,17 @@
-# Omnia
+<img src="docs/images/omnia-logo.png" width="500px">
+
+![GitHub](https://img.shields.io/github/license/dellhpc/omnia) ![GitHub issues](https://img.shields.io/github/issues-raw/dellhpc/omnia) ![GitHub release (latest by date including pre-releases)](https://img.shields.io/github/v/release/dellhpc/omnia?include_prereleases) ![GitHub last commit (branch)](https://img.shields.io/github/last-commit/dellhpc/omnia/devel) ![GitHub commits since tagged version](https://img.shields.io/github/commits-since/dellhpc/omnia/omnia-v0.2/devel) 
+
 #### Ansible playbook-based deployment of Slurm and Kubernetes on Dell EMC PowerEdge servers running an RPM-based Linux OS
 
 Omnia (Latin: all or everything) is a deployment tool to turn Dell EMC PowerEdge servers with RPM-based Linux images into a functioning Slurm/Kubernetes cluster.
 
 ## Omnia Documentation
-For Omnia documentation, including installation and contribution instructions, see [docs](docs/README.md).
+For Omnia documentation, including installation and contribution instructions, please see the [website](https://dellhpc.github.io/omnia).
 
-### Current maintainers:
+## Current maintainers:
 * Lucas A. Wilson (Dell Technologies)
 * John Lockman (Dell Technologies)
+
+## Omnia Contributors:
+<img src="docs/images/delltech.jpg" height="150px" alt="Dell Technologies"> <img src="docs/images/pisa.png" height="150px" alt="Universita di Pisa">

+ 6 - 0
docs/CONTRIBUTORS.md

@@ -0,0 +1,6 @@
+# Omnia Maintainers
+- Luke Wilson and John Lockman (Dell Technologies)
+<img src="images/delltech.jpg" height="90px" alt="Dell Technologies">
+
+# Omnia Contributors
+<img src="images/delltech.jpg" height="90px" alt="Dell Technologies"> <img src="images/pisa.png" height="100px" alt="Universita di Pisa">

+ 66 - 26
docs/INSTALL.md

@@ -1,7 +1,5 @@
-# Installing Omnia
-
-## TL;DR
-
+## TL;DR Installation
+ 
 ### Kubernetes
 Install Kubernetes and all dependencies
 ```
@@ -12,54 +10,96 @@ Initialize K8s cluster
 ```
 ansible-playbook -i host_inventory_file kubernetes/kubernetes.yml --tags "init"
 ```
+
+### Install Kubeflow 
+```
+ansible-playbook -i host_inventory_file kubernetes/kubeflow.yaml
+```
+
 ### Slurm
 ```
 ansible-playbook -i host_inventory_file slurm/slurm.yml
 ```
 
-## Build/Install
+# Omnia  
 Omnia is a collection of [Ansible](https://www.ansible.com/) playbooks which perform:
 * Installation of [Slurm](https://slurm.schedmd.com/) and/or [Kubernetes](https://kubernetes.io/) on servers already provisioned with a standard [CentOS](https://www.centos.org/) image.
 * Installation of auxiliary scripts for administrator functions such as moving nodes between Slurm and Kubernetes personalities.
 
-### Kubernetes
-
-* Add additional repositories:
+Omnia playbooks perform several tasks:
+The `common` playbook handles installation of software:
+* Add yum repositories:
     - Kubernetes (Google)
-    - El Repo (nvidia drivers)
-    - Nvidia (nvidia-docker)
+    - El Repo (for Nvidia drivers)
     - EPEL (Extra Packages for Enterprise Linux)
-* Install common packages
+* Install Packages from repos:
+    - bash-completion
+    - docker
     - gcc
     - python-pip
-    - docker
     - kubelet
     - kubeadm
     - kubectl
+    - nfs-utils
     - nvidia-detect
+    - yum-plugin-versionlock
+* Restart and enable system level services
+    - Docker
+    - Kubelet
+
+`computeGPU` playbook installs Nvidia drivers and nvidia-container-runtime-hook
+* Add yum repositories:
+    - Nvidia (container runtime)
+* Install Packages from repos:
     - kmod-nvidia
-    - nvidia-x11-drv
-    - nvidia-container-runtime
-    - ksonnet (CLI framework for K8S configs)
-* Enable GPU Device Plugins (nvidia-container-runtime-hook)
-* Modify kubeadm config to allow GPUs as schedulable resource 
-* Start and enable services
+    - nvidia-container-runtime-hook
+* Restart and enable system level services
     - Docker
     - Kubelet
-* Initialize Cluster
+* Configuration:
+    - Enable GPU Device Plugins (nvidia-container-runtime-hook)
+    - Modify kubeadm config to allow GPUs as schedulable resource 
+* Restart and enable system level services
+    - Docker
+    - Kubelet
+
+`master` playbook
+* Install Helm v3
+* (Optional) Add firewall rules for Slurm and Kubernetes
+
+Everything from this point on can be called by using the `init` tag
+```
+ansible-playbook -i host_inventory_file kubernetes/kubernetes.yml --tags "init"
+```
+
+`startmaster` playbook
+* Turn off swap
+* Initialize Kubernetes
     * Head/master
         - Start K8S pass startup token to compute/slaves
-        - Initialize networking (Currently using WeaveNet)
-        - Setup K8S Dashboard
-        - Create dynamic/persistent volumes
-    * Compute/slaves
-        - Join k8s cluster
+        - Initialize software defined networking (Calico)
+
+`startworkers` playbook
+* Turn off swap
+* Join k8s cluster
+
+`startservices` playbook
+* Setup K8S Dashboard
+* Add `stable` repo to helm
+* Add `jupyterhub` repo to helm
+* Update helm repos
+* Deploy NFS client Provisioner
+* Deploy Jupyterhub
+* Deploy Prometheus
+* Install MPI Operator
+
 
 ### Slurm
-* Download and build Slurm source
-* Install necessary dependencies
+* Downloads and builds Slurm from source
+* Install package dependencies
     - Python3
     - munge
     - MariaDB
     - MariaDB development libraries
 * Build Slurm configuration files
+
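As a quick post-install sanity check (not part of this commit; a hedged sketch using standard kubectl and Slurm commands):

```
# On the master, confirm every node joined the K8s cluster and pods are healthy
kubectl get nodes -o wide
kubectl get pods --all-namespaces

# Confirm the Slurm controller sees its compute nodes
sinfo
```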

+ 1 - 1
docs/PREINSTALL.md

@@ -5,7 +5,7 @@ Omnia assumes that prior to installation:
 * Systems have a base operating system (currently CentOS 7 or 8)
 * Network(s) has been cabled and nodes can reach the internet
 * SSH Keys for `root` have been installed on all nodes to allow for password-less SSH
-* Ansible is installed on the master node
+* Ansible is installed on either the master node or a separate deployment node
 ```
 yum install ansible
 ```
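The password-less SSH requirement above can be met with standard OpenSSH tooling; a minimal sketch (hostnames are placeholders):

```
# Generate a key for root on the deployment node (once)
ssh-keygen -t rsa -b 4096 -N "" -f /root/.ssh/id_rsa

# Push the public key to every node in the cluster
for host in compute00{0..5}; do
  ssh-copy-id root@${host}
done
```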

File diff suppressed because it is too large
+ 23 - 4
docs/README.md


+ 4 - 1
docs/_config.yml

@@ -1 +1,4 @@
-theme: jekyll-theme-minimal
+theme: jekyll-theme-minimal
+title: Omnia
+description: Ansible playbook-based tools for deploying Slurm and Kubernetes clusters for High Performance Computing, Machine Learning, Deep Learning, and High-Performance Data Analytics
+logo: images/omnia-logo.png

BIN
docs/images/delltech.jpg


BIN
docs/images/omnia-branch-structure.png


BIN
docs/images/omnia-k8s.png


BIN
docs/images/omnia-logo.png


BIN
docs/images/omnia-overview.png


BIN
docs/images/omnia-slurm.png


BIN
docs/images/pisa.png


+ 10 - 0
docs/metalLB/README.md

@@ -0,0 +1,10 @@
+# MetalLB 
+
+MetalLB is a load-balancer implementation for bare metal Kubernetes clusters, using standard routing protocols.
+https://metallb.universe.tf/
+
+Omnia installs MetalLB by manifest in the playbook `startservices`. A default configuration is provided for the layer2 protocol, along with an example address pool. Modify metal-config.yaml to suit your network requirements and apply the changes with: 
+
+``` 
+kubectl apply -f metal-config.yaml
+```
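To verify the pool works after applying the config, one hedged approach is to expose a throwaway Deployment as a LoadBalancer and confirm MetalLB assigns an EXTERNAL-IP from the pool (nginx is just an arbitrary test image):

```
kubectl create deployment lb-test --image=nginx
kubectl expose deployment lb-test --port=80 --type=LoadBalancer
kubectl get svc lb-test    # EXTERNAL-IP should come from the 192.168.2.150-159 pool
kubectl delete svc,deployment lb-test    # clean up
```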

+ 21 - 0
docs/metalLB/metal-config.yaml

@@ -0,0 +1,21 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  namespace: metallb-system
+  name: config
+data:
+  config: |
+    address-pools:
+    - name: default
+      protocol: layer2
+      addresses:
+      - 192.168.2.150/32
+      - 192.168.2.151/32
+      - 192.168.2.152/32
+      - 192.168.2.153/32
+      - 192.168.2.154/32
+      - 192.168.2.155/32
+      - 192.168.2.156/32
+      - 192.168.2.157/32
+      - 192.168.2.158/32
+      - 192.168.2.159/32

+ 20 - 0
examples/PyTorch/pytorch-deploy.yaml

@@ -0,0 +1,20 @@
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: pytorch-cpu-simple
+  namespace: default
+spec:
+  template:
+    spec:
+      containers:
+      - name: cpu-pytorch
+        image: docker.io/mapler/pytorch-cpu:latest
+        volumeMounts:
+        - mountPath: /pyscript
+          name: torch-job-volume
+        command: ["bash","-c","python /pyscript/pytorchcpu-example.py"]
+      restartPolicy: Never
+      volumes:
+      - name: torch-job-volume
+        hostPath:
+          path: /home/k8s/torch-example

+ 54 - 0
examples/PyTorch/pytorch-example.py

@@ -0,0 +1,54 @@
+import random
+import torch
+
+class DynamicNet(torch.nn.Module):
+    def __init__(self, D_in, H, D_out):
+        """
+        In the constructor we construct three nn.Linear instances that we will use
+        in the forward pass.
+        """
+        super(DynamicNet, self).__init__()
+        self.input_linear = torch.nn.Linear(D_in, H)
+        self.middle_linear = torch.nn.Linear(H, H)
+        self.output_linear = torch.nn.Linear(H, D_out)
+    def forward(self, x):
+        """
+        For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
+        and reuse the middle_linear Module that many times to compute hidden layer
+        representations.
+        Since each forward pass builds a dynamic computation graph, we can use normal
+        Python control-flow operators like loops or conditional statements when
+        defining the forward pass of the model.
+        Here we also see that it is perfectly safe to reuse the same Module many
+        times when defining a computational graph. This is a big improvement from Lua
+        Torch, where each Module could be used only once.
+        """
+        h_relu = self.input_linear(x).clamp(min=0)
+        for _ in range(random.randint(0, 3)):
+            h_relu = self.middle_linear(h_relu).clamp(min=0)
+        y_pred = self.output_linear(h_relu)
+        return y_pred
+
+# N is batch size; D_in is input dimension;
+# H is hidden dimension; D_out is output dimension.
+N, D_in, H, D_out = 64, 1000, 100, 10
+# Create random Tensors to hold inputs and outputs
+x = torch.randn(N, D_in)
+y = torch.randn(N, D_out)
+# Construct our model by instantiating the class defined above
+model = DynamicNet(D_in, H, D_out)
+# Construct our loss function and an Optimizer. Training this strange model with
+# vanilla stochastic gradient descent is tough, so we use momentum
+criterion = torch.nn.MSELoss(reduction='sum')
+optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
+for t in range(500):
+    # Forward pass: Compute predicted y by passing x to the model
+    y_pred = model(x)
+    # Compute and print loss
+    loss = criterion(y_pred, y)
+    if t % 100 == 99:
+        print(t, loss.item())
+    # Zero gradients, perform a backward pass, and update the weights.
+    optimizer.zero_grad()
+    loss.backward()
+    optimizer.step()
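To run this script with the Job above, it has to be staged at the hostPath the Job mounts; note the Job's command references `pytorchcpu-example.py` while this file is `pytorch-example.py`, so one of the two names needs adjusting. A hedged usage sketch:

```
# Stage the script on the node where the pod will run
mkdir -p /home/k8s/torch-example
cp examples/PyTorch/pytorch-example.py /home/k8s/torch-example/pytorchcpu-example.py

# Submit the Job and follow its output
kubectl apply -f examples/PyTorch/pytorch-deploy.yaml
kubectl logs -f job/pytorch-cpu-simple
```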

+ 69 - 0
examples/k8s-tensorflow-nvidia-ngc-resnet50-multinode-mpioperator.yaml

@@ -0,0 +1,69 @@
+apiVersion: kubeflow.org/v1alpha2
+kind: MPIJob
+metadata:
+  name: tensorflow-benchmarks
+spec:
+  slotsPerWorker: 4
+  cleanPodPolicy: Running
+  mpiReplicaSpecs:
+    Launcher:
+      replicas: 1
+      template:
+         spec:
+           containers:
+           - image: nvcr.io/nvidia/tensorflow:19.06-py3
+             imagePullPolicy: IfNotPresent
+             name: tensorflow-benchmarks
+             volumeMounts:
+               - mountPath: /local_mount
+                 name: work-volume
+             command:
+             - mpirun
+             - --allow-run-as-root
+             - -np
+             - "4"
+             - -bind-to
+             - none
+             - -map-by
+             #- slot
+             - numa
+             - -x
+             - NCCL_DEBUG=INFO
+             - -x
+             - LD_LIBRARY_PATH
+             - python
+             - /local_mount/tensorflow/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py
+             - --batch_size=512
+             - --model=resnet50
+             - --variable_update=horovod
+             - --optimizer=momentum
+             - --nodistortions
+             - --gradient_repacking=8
+             - --weight_decay=1e-4
+             - --use_fp16=true
+           volumes:
+             - name: work-volume
+               hostPath:
+                 # directory locally mounted on host
+                 path: /work
+                 type: Directory
+    Worker:
+      replicas: 1
+      template:
+        spec:
+          containers:
+          - image: nvcr.io/nvidia/tensorflow:19.06-py3
+            imagePullPolicy: IfNotPresent
+            name: tensorflow-benchmarks
+            resources:
+              limits:
+                nvidia.com/gpu: 4
+            volumeMounts:
+              - mountPath: /local_mount
+                name: work-volume
+          volumes:
+            - name: work-volume
+              hostPath:
+                # directory locally mounted on host
+                path: /work
+                type: Directory
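Assuming the MPI Operator is installed (the `startservices` role installs it), a hedged sketch for submitting and monitoring the benchmark:

```
kubectl apply -f examples/k8s-tensorflow-nvidia-ngc-resnet50-multinode-mpioperator.yaml
kubectl get mpijobs    # requires the MPIJob CRD from the MPI Operator
kubectl get pods | grep tensorflow-benchmarks
# Launcher pod name below is hypothetical; copy the real one from the previous command
kubectl logs -f tensorflow-benchmarks-launcher-xxxxx
```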

+ 20 - 20
kubernetes/host_inventory_file

@@ -1,20 +1,20 @@
-[master]
-friday
-
-[compute]
-compute000
-compute[002:005]
-
-[gpus]
-#compute001
-compute002
-compute004
-compute005
-
-[workers:children]
-compute
-gpus
-
-[cluster:children]
-master
-workers
+all: 
+  children:
+    cluster:
+      children:
+        master:
+          hosts:
+            compute000:
+        workers:
+          children:
+            compute:
+              hosts:
+                compute003:
+            gpus:
+              hosts:
+                compute002:
+                compute004:
+                compute005:
+      vars:
+        single_node: false
+        master_ip: 10.0.0.100
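The inventory moves from INI to YAML format here. A hedged way to confirm Ansible parses the new group tree and variables as intended:

```
ansible-inventory -i kubernetes/host_inventory_file --graph    # show the master/workers group tree
ansible-inventory -i kubernetes/host_inventory_file --host compute002    # should include master_ip and single_node
```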

+ 22 - 0
kubernetes/jupyterhub.yaml

@@ -0,0 +1,22 @@
+#  Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved. 
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
+---
+#Playbook for installing JupyterHub v1.1.0 in Omnia
+ 
+# Deploy JupyterHub on the K8s master
+- hosts: master
+  gather_facts: false
+  roles:
+    - jupyterhub
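Presumably invoked the same way as the Kubeflow playbook documented in INSTALL.md (inventory filename assumed):

```
ansible-playbook -i host_inventory_file kubernetes/jupyterhub.yaml
```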

+ 22 - 0
kubernetes/kubeflow.yaml

@@ -0,0 +1,22 @@
+#  Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved. 
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
+---
+#Playbook for installing Kubeflow v1.0 on Omnia
+ 
+# Deploy Kubeflow on the K8s master
+- hosts: master
+  gather_facts: false
+  roles:
+    - kubeflow

+ 14 - 0
kubernetes/build-kubernetes-cluster.yml

@@ -1,3 +1,17 @@
+#  Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved.
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
 ---
 #Playbook for kubernetes cluster 
 

+ 15 - 15
kubernetes/roles/common/tasks/main.yml

@@ -1,3 +1,17 @@
+#  Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved. 
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
 ---
 
 - name: add kubernetes repo
@@ -68,20 +82,6 @@
     name: "@Infiniband Support"
     state: present
 
-- name: Install KSonnet
-  unarchive:
-    src: https://github.com/ksonnet/ksonnet/releases/download/v0.13.1/ks_0.13.1_linux_amd64.tar.gz
-    dest: /usr/bin/
-    extra_opts: [--strip-components=1]
-    remote_src: yes
-    exclude:
-      - ks_0.11.0_linux_amd64/CHANGELOG.md
-      - ks_0.11.0_linux_amd64/CODE-OF-CONDUCT.md
-      - ks_0.11.0_linux_amd64/CONTRIBUTING.md
-      - ks_0.11.0_linux_amd64/LICENSE
-      - ks_0.11.0_linux_amd64/README.md
-  tags: install
-
 - name: upgrade pip
   command: /bin/pip install --upgrade pip
   tags: install
@@ -128,7 +128,7 @@
 - name: Start and nfs-lock service
   service:
     name: nfs-lock
-    state: restarted
+    #state: restarted
     enabled: yes
   tags: install
 

+ 14 - 0
kubernetes/roles/computeGPU/tasks/main.yml

@@ -1,3 +1,17 @@
+#  Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved.
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
 ---
 - name: install Nvidia driver
   yum: 

+ 42 - 0
kubernetes/roles/jupyterhub/files/jupyter_config.yaml

@@ -0,0 +1,42 @@
+proxy:
+  secretToken: "1c8572f630701e8792bede122ec9c4179d9087f801e1a85ed32cce69887aec1b"
+
+hub:
+  cookieSecret: "1c8572f630701e8792bede122ec9c4179d9087f801e1a85ed32cce69887aec1b"
+  service:
+    type: LoadBalancer
+  db:
+    type: sqlite-pvc
+  extraConfig:
+    jupyterlab: |
+      c.Spawner.cmd = ['jupyter-labhub']
+
+singleuser:
+  image:
+    name: dellhpc/datasciencelab-base
+    tag: "1.0"
+  profileList:
+    - display_name: "DellHPC Improved Environment"
+      description: "Dell curated Jupyter Stacks"
+      kubespawner_override:
+        image: "dellhpc/datasciencelab-cpu:1.0"
+    - display_name: "DellHPC GPU Environment"
+      description: "Dell curated Jupyter Stacks 1 GPU"
+      kubespawner_override:
+        image: "dellhpc/datasciencelab-gpu:1.0"
+        extra_resource_limits:
+          nvidia.com/gpu: "1"
+  storage:
+    dynamic:
+      storageClass: nfs-client
+  cpu:
+    limit: 1
+  memory:
+    limit: 5G
+    guarantee: 1G
+  defaultUrl: "/lab"
+
+
+prePuller:
+  continuous:
+    enabled: true
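The `secretToken` and `cookieSecret` above are committed as literal values, so anyone deploying this should generate their own. A hedged sketch for producing fresh 32-byte hex secrets (the approach the Zero to JupyterHub docs suggest):

```
openssl rand -hex 32    # run twice; use one value for proxy.secretToken, the other for hub.cookieSecret
```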

+ 26 - 0
kubernetes/roles/jupyterhub/tasks/main.yml

@@ -0,0 +1,26 @@
+#  Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved.
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
+---
+- name: Helm - Add JupyterHub Repo
+  shell: helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
+
+- name: Helm - Update Repo
+  shell: helm repo update
+
+- name: JupyterHub Custom Config (files)  
+  copy: src=jupyter_config.yaml dest=/root/k8s/jupyter_config.yaml owner=root group=root mode=655
+ 
+- name: jupyterHub deploy
+  shell: helm install jupyterhub/jupyterhub  --namespace default --version 0.9.0 --values /root/k8s/jupyter_config.yaml --generate-name --wait --timeout 60m

+ 122 - 0
kubernetes/roles/kubeflow/tasks/main.yml

@@ -0,0 +1,122 @@
+#  Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved.
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
+---
+
+#Configure build and deploy kubeflow v1.0 
+
+- name: Download kfctl v1.0.2 release from the Kubeflow releases page.
+  unarchive:
+    src: https://github.com/kubeflow/kfctl/releases/download/v1.0.2/kfctl_v1.0.2-0-ga476281_linux.tar.gz
+    dest: /usr/bin/
+    remote_src: yes
+
+- name: Delete Omnia Kubeflow Directory if exists
+  file:
+    path: /root/k8s/omnia-kubeflow
+    state: absent
+
+- name: Create Kubeflow Directory
+  file:
+    path: /root/k8s/omnia-kubeflow
+    state: directory
+    recurse: yes
+
+- name: Build Kubeflow Configuration
+  shell: 
+    cmd: /usr/bin/kfctl build -V -f https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_k8s_istio.v1.0.2.yaml
+    chdir: /root/k8s/omnia-kubeflow
+
+- name: Modify Cpu Limit for istio-ingressgateway-service-account 
+  replace:
+    path: /root/k8s/omnia-kubeflow/kustomize/istio-install/base/istio-noauth.yaml
+    after: 'serviceAccountName: istio-ingressgateway-service-account'
+    before: '---'
+    regexp: 'cpu: 100m'
+    replace: 'cpu: 2'
+  
+- name: Modify Mem Limit for istio-ingressgateway-service-account 
+  replace:
+    path: /root/k8s/omnia-kubeflow/kustomize/istio-install/base/istio-noauth.yaml
+    after: 'serviceAccountName: istio-ingressgateway-service-account'
+    before: '---'
+    regexp: 'memory: 128Mi'
+    replace: 'memory: 512Mi'
+
+- name: Modify Cpu Request for istio-ingressgateway-service-account 
+  replace:
+    path: /root/k8s/omnia-kubeflow/kustomize/istio-install/base/istio-noauth.yaml
+    after: 'serviceAccountName: istio-ingressgateway-service-account'
+    before: '---'
+    regexp: 'cpu: 10m'
+    replace: 'cpu: 1'
+  
+- name: Modify Mem Request for istio-ingressgateway-service-account 
+  replace:
+    path: /root/k8s/omnia-kubeflow/kustomize/istio-install/base/istio-noauth.yaml
+    after: 'serviceAccountName: istio-ingressgateway-service-account'
+    before: '---'
+    regexp: 'memory: 40Mi'
+    replace: 'memory: 256Mi'
+
+
+- name: Modify Cpu Limit for kfserving-gateway
+  replace:
+    path: /root/k8s/omnia-kubeflow/kustomize/kfserving-gateway/base/deployment.yaml
+    after: 'serviceAccountName: istio-ingressgateway-service-account'
+    before: 'env:'
+    regexp: 'cpu: 100m'
+    replace: 'cpu: 2'
+  
+- name: Modify Mem Limit for kfserving-gateway
+  replace:
+    path: /root/k8s/omnia-kubeflow/kustomize/kfserving-gateway/base/deployment.yaml
+    after: 'serviceAccountName: istio-ingressgateway-service-account'
+    before: 'env:'
+    regexp: 'memory: 128Mi'
+    replace: 'memory: 512Mi'
+
+- name: Modify Cpu Request for kfserving-gateway
+  replace:
+    path: /root/k8s/omnia-kubeflow/kustomize/kfserving-gateway/base/deployment.yaml
+    after: 'serviceAccountName: istio-ingressgateway-service-account'
+    before: 'env:'
+    regexp: 'cpu: 10m'
+    replace: 'cpu: 1'
+  
+- name: Modify Mem Request for kfserving-gateway
+  replace:
+    path: /root/k8s/omnia-kubeflow/kustomize/kfserving-gateway/base/deployment.yaml
+    after: 'serviceAccountName: istio-ingressgateway-service-account'
+    before: 'env:'
+    regexp: 'memory: 40Mi'
+    replace: 'memory: 256Mi'
+
+
+- name: Change Argo base service from NodePort to LoadBalancer
+  replace:
+    path: /root/k8s/omnia-kubeflow/kustomize/argo/base/service.yaml
+    regexp: 'NodePort'
+    replace: 'LoadBalancer'
+
+- name: Change istio-install base istio-noauth service from NodePort to LoadBalancer
+  replace:
+    path: /root/k8s/omnia-kubeflow/kustomize/istio-install/base/istio-noauth.yaml
+    regexp: 'NodePort'
+    replace: 'LoadBalancer'
+
+- name: Apply Kubeflow Configuration
+  shell: 
+    cmd: /usr/bin/kfctl apply -V -f /root/k8s/omnia-kubeflow/kfctl_k8s_istio.v1.0.2.yaml
+    chdir: /root/k8s/omnia-kubeflow
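Kubeflow brings up many deployments, so `kfctl apply` can take a while to settle. A hedged way to watch progress and confirm the NodePort-to-LoadBalancer edits took effect:

```
kubectl get pods -n kubeflow --watch    # wait for Running/Completed across the board
kubectl get svc -n istio-system         # istio-ingressgateway should receive a LoadBalancer IP
```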

+ 26 - 12
kubernetes/roles/master/tasks/main.yml

@@ -1,16 +1,30 @@
----
-- name: Firewall Rule K8s:6443/tcp
-  command: firewall-cmd  --zone=internal --add-port=6443/tcp --permanent
-  tags: master
-
-- name: Firewall Rule K8s:10250/tcp
-  command: firewall-cmd  --zone=internal --add-port=10250/tcp --permanent
-  tags: master
-
-- name: Firewall Reload
-  command: firewall-cmd  --reload
-  tags: master
+#  Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved. 
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
 
+---
+#- name: Firewall Rule K8s:6443/tcp
+  #command: firewall-cmd  --zone=internal --add-port=6443/tcp --permanent
+  #tags: master
+#
+#- name: Firewall Rule K8s:10250/tcp
+  #command: firewall-cmd  --zone=internal --add-port=10250/tcp --permanent
+  #tags: master
+##
+#- name: Firewall Reload
+  #command: firewall-cmd  --reload
+  #tags: master
+#
 - name: Create /root/bin (if it doesn't exist)
   file:
     path: /root/bin

+ 35 - 5
kubernetes/roles/startmaster/tasks/main.yml

@@ -1,10 +1,24 @@
+#  Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved. 
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
 ---
 - name: Turn Swap OFF (if not already disabled)
   command: /usr/sbin/swapoff -a
   tags: init
 
 - name: Initialize kubeadm
-  command: /bin/kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=10.0.0.1
+  command: /bin/kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address={{ master_ip }}
   #command: /bin/kubeadm init 
   register: init_output 
   tags: init
@@ -14,7 +28,13 @@
   tags: init
 
 - name: Copy Kubernetes Config for root #do this for other users too?
-  copy: src=/etc/kubernetes/admin.conf dest=/root/.kube/config owner=root group=root mode=644
+  copy: 
+    src: /etc/kubernetes/admin.conf 
+    dest: /root/.kube/config 
+    owner: root 
+    group: root 
+    mode: 644
+    remote_src: yes
   tags: init
 
 - name: Cluster token
@@ -32,8 +52,7 @@
     name:   "K8S_TOKEN_HOLDER"
     token:  "{{ K8S_TOKEN.stdout }}"
     hash:   "{{ K8S_MASTER_CA_HASH.stdout }}"
-    #ip:     "{{ ansible_ib0.ipv4.address }}"
-    ip:     "{{ ansible_p3p1.ipv4.address }}"
+    ip:     "{{ master_ip }}"
   tags: init
 
 - name:
@@ -48,7 +67,7 @@
 
 - name:
   debug:
-    msg: "[Master] K8S_MASTER_IP is  {{ hostvars['K8S_TOKEN_HOLDER']['ip'] }}"
+    msg: "[Master] K8S_MASTER_IP is  {{ master_ip }}"
   tags: init
 
 - name: Setup Calico SDN network
@@ -65,6 +84,11 @@
   register: gpu_enable
   tags: init
 
+- name: Deploy Xilinx Device Plugin 
+  shell: kubectl create -f https://raw.githubusercontent.com/Xilinx/FPGA_as_a_Service/master/k8s-fpga-device-plugin/fpga-device-plugin.yml
+  register: fpga_enable
+  tags: init
+
 - name: Create yaml repo for setup
   file: 
     path: /root/k8s 
@@ -91,6 +115,12 @@
   shell: kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}') > /root/k8s/token
   tags: init
 
+- name: Edge / Workstation Install allows pods to schedule on master
+  shell: kubectl taint nodes --all node-role.kubernetes.io/master-
+  when: single_node 
+  tags: init
+
+
 # If more debug information is needed during init uncomment the following 2 lines
 #- debug: var=init_output.stdout_lines
   #tags: init
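For the `single_node` edge/workstation case, a hedged check that the master taint was actually removed so pods can schedule there:

```
kubectl describe node $(hostname) | grep -i taint    # should report <none> after the taint-removal task
```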

+ 0 - 16
kubernetes/roles/startservices/files/jhub-db-pv.yaml

@@ -1,16 +0,0 @@
-apiVersion: v1
-kind: PersistentVolume
-metadata:
-  name: jupyterhub-db-pv
-spec:
-  capacity:
-    storage: 1Gi
-  accessModes:
-  - ReadWriteOnce
-  - ReadOnlyMany
-  - ReadWriteMany
-  nfs:
-    server: 10.0.0.1
-    path: /work/k8s/jhub-db
-  persistentVolumeReclaimPolicy: Recycle
-

+ 0 - 50
kubernetes/roles/startservices/files/jupyter-pvc.yaml

@@ -1,50 +0,0 @@
-apiVersion: v1 
-kind: PersistentVolume 
-metadata: 
-  name: jupyter-nfs
-spec: 
-  capacity: 
-    storage: 1Gi 
-  accessModes: 
-    - ReadWriteMany 
-  nfs: 
-    server: 10.0.0.1
-    path: "/work/jupyter1" 
-
----
-apiVersion: v1 
-kind: PersistentVolume 
-metadata: 
-  name: jupyter-hub-nfs
-spec: 
-  capacity: 
-    storage: 1Gi 
-  accessModes: 
-    - ReadWriteMany 
-  nfs: 
-    server: 10.0.0.1
-    path: "/work/jupyter2" 
- 
---- 
-kind: PersistentVolumeClaim 
-apiVersion: v1 
-metadata: 
-  name: jupyter-nfs-pvc
-spec: 
-  accessModes: 
-    - ReadWriteMany 
-  storageClassName: "nfs" 
-  resources: 
-    requests: 
-
---- 
-kind: PersistentVolumeClaim 
-apiVersion: v1 
-metadata: 
-  name: jupyter-hub-nfs-pvc
-spec: 
-  accessModes: 
-    - ReadWriteMany 
-  storageClassName: "nfs" 
-  resources: 
-    requests: 

+ 0 - 62
kubernetes/roles/startservices/files/jupyter_config.yaml

@@ -1,62 +0,0 @@
-proxy:
-  secretToken: "1c8572f630701e8792bede122ec9c4179d9087f801e1a85ed32cce69887aec1b"
-
-hub:
-  cookieSecret: "1c8572f630701e8792bede122ec9c4179d9087f801e1a85ed32cce69887aec1b"
-  service:
-    type: LoadBalancer
-  db: 
-    type: sqlite-pvc 
-  extraConfig:
-    jupyterlab: |
-      c.Spawner.cmd = ['jupyter-labhub']
-
-singleuser:
-  image:
-    name: jupyter/minimal-notebook
-    tag: 2343e33dec46
-  profileList:
-    - display_name: "Minimal environment"
-      description: "Short and sweet, no bells or whistles, vanilla: Python."
-      default: true
-    - display_name: "Datascience environment"
-      description: "Some additional bells and whistles: Python, R, and Julia."
-      kubespawner_override:
-        image: jupyter/datascience-notebook:2343e33dec46
-    - display_name: "Spark environment"
-      description: "The Jupyter Stacks with Spark"
-      kubespawner_override:
-        image: jupyter/all-spark-notebook:2343e33dec46
-    - display_name: "Learning Data Science"
-      description: "Datascience Environment with Sample Notebooks"
-      kubespawner_override:
-        image: jupyter/datascience-notebook:2343e33dec46
-        lifecycle_hooks:
-          postStart:
-            exec:
-              command:
-                - "sh"
-                - "-c"
-                - >
-                  gitpuller https://github.com/data-8/materials-fa17 master materials-fa;
-    - display_name: "GPU Environment"
-      description: "1 GPU for intro folks"
-      kubespawner_override:
-        image: jupyter/datascience-notebook:2343e33dec46
-        extra_resource_limits:
-          nvidia.com/gpu: "1"
-  storage:
-    dynamic:
-      storageClass: nfs-client
-  cpu:
-    limit: 1
-  memory:
-    limit: 100G
-    guarantee: 1G
-  defaultUrl: "/lab"
-
-
-prePuller:
-  continuous:
-    enabled: true
-

+ 10 - 10
kubernetes/roles/startservices/files/metal-config.yaml

@@ -9,13 +9,13 @@ data:
     - name: default
       protocol: layer2
       addresses:
-      - 10.0.0.150/32
-      - 10.0.0.151/32
-      - 10.0.0.152/32
-      - 10.0.0.153/32
-      - 10.0.0.154/32
-      - 10.0.0.155/32
-      - 10.0.0.156/32
-      - 10.0.0.157/32
-      - 10.0.0.158/32
-      - 10.0.0.159/32
+      - 192.168.2.150/32
+      - 192.168.2.151/32
+      - 192.168.2.152/32
+      - 192.168.2.153/32
+      - 192.168.2.154/32
+      - 192.168.2.155/32
+      - 192.168.2.156/32
+      - 192.168.2.157/32
+      - 192.168.2.158/32
+      - 192.168.2.159/32

+ 26 - 40
kubernetes/roles/startservices/tasks/main.yml

@@ -1,3 +1,17 @@
+#  Copyright 2020 Dell Technologies
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
 ---
 #- name: Kick CoreDNS (this is a hack that needs to be fixed)
   #shell:  kubectl get pods -n kube-system --no-headers=true | awk '/coredns/{print $1}'|xargs kubectl delete -n kube-system pod
@@ -27,58 +41,30 @@
   shell: kubectl apply -f /root/k8s/metal-config.yaml
   tags: init
 
-#- name: Helm - create service account
-  #shell: kubectl create serviceaccount --namespace kube-system tiller
-  #tags: init
-
-#- name: Helm - create clusterRole Binding for tiller-cluster-rule
-  #shell: kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
-  #tags: init
-
-#- name: Helm - create clusterRoleBinding for admin
-  #shell: kubectl create clusterrolebinding tiller-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
-  #tags: init
-
-#- name: Helm - init
-  #shell: helm init  --upgrade
-  #tags: init
-
-#- name: Wait for tiller to start 
-  #shell: kubectl rollout status deployment/tiller-deploy -n kube-system
-  #tags: init
-
-#- name: Helm - patch cluster Role Binding for tiller
-  #shell:  kubectl --namespace kube-system patch deploy tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
-  #tags: init
-
-#- name: Wait for tiller to start 
-  #shell: kubectl rollout status deployment/tiller-deploy -n kube-system
-  #tags: init
-
 - name: Start K8S Dashboard
   shell: kubectl create -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-beta6/aio/deploy/recommended.yaml
   tags: init
 
-- name: Start NFS Client Provisioner
-  shell: helm install stable/nfs-client-provisioner --set nfs.server=10.0.0.1 --set nfs.path=/work --generate-name
+- name: Helm - Add Stable Repo
+  shell: helm repo add stable https://kubernetes-charts.storage.googleapis.com/
   tags: init
 
-- name: JupyterHub Persistent Volume Creation (files)  
-  copy: src=jhub-db-pv.yaml dest=/root/k8s/jhub-db-pv.yaml owner=root group=root mode=655
+- name: Helm - Update Repo
+  shell: helm repo update
   tags: init
 
-- name: jupyterHub Persistent Volume creation
-  shell: kubectl create -f /root/k8s/jhub-db-pv.yaml
+- name: Start NFS Client Provisioner
+  shell: helm install stable/nfs-client-provisioner --set nfs.server=10.0.0.1 --set nfs.path=/work --generate-name
   tags: init
 
-- name: JupyterHub Custom Config (files)  
-  copy: src=jupyter_config.yaml dest=/root/k8s/jupyter_config.yaml owner=root group=root mode=655
-  tags: init
- 
-- name: jupyterHub deploy
-  shell: helm install jupyterhub/jupyterhub  --namespace default --version 0.8.2 --values /root/k8s/jupyter_config.yaml --generate-name
+- name: Set NFS-Client Provisioner as DEFAULT StorageClass
+  shell: "kubectl patch storageclasses.storage.k8s.io nfs-client -p '{\"metadata\": {\"annotations\":{\"storageclass.kubernetes.io/is-default-class\":\"true\"}}}'"
   tags: init
 
 - name: Prometheus deployment
   shell: helm install stable/prometheus --set alertmanager.persistentVolume.storageClass=nfs-client,server.persistentVolume.storageClass=nfs-client,server.service.type=LoadBalancer --generate-name
   tags: init
+
+- name: Install MPI Operator
+  shell: kubectl create -f https://raw.githubusercontent.com/kubeflow/mpi-operator/master/deploy/v1alpha2/mpi-operator.yaml
+  tags: init
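A hedged spot-check that the service stack this role deploys came up:

```
kubectl get storageclass           # nfs-client should be marked (default)
helm list                          # nfs-client-provisioner and prometheus releases
kubectl get pods -n mpi-operator   # namespace assumed from the upstream MPI Operator manifest
```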

+ 15 - 0
kubernetes/roles/startworkers/tasks/main.yml

@@ -1,3 +1,17 @@
+#  Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved. 
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
 ---
 
 - name: Turn Swap OFF (if not already disabled)
@@ -24,6 +38,7 @@
     kubeadm join --token={{ hostvars['K8S_TOKEN_HOLDER']['token'] }}
     --discovery-token-ca-cert-hash sha256:{{ hostvars['K8S_TOKEN_HOLDER']['hash'] }}
     {{ hostvars['K8S_TOKEN_HOLDER']['ip'] }}:6443
+  when: not single_node
   tags: init
 
 

+ 14 - 0
kubernetes/scuttle

@@ -1,3 +1,17 @@
+#  Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved. 
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
 #!/bin/bash
 
 kubeadm reset -f

+ 13 - 0
slurm/roles/slurm-common/tasks/main.yaml

@@ -1,3 +1,16 @@
+#  Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved. 
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
 ---
 
 - name: install packages for slurm

+ 14 - 0
slurm/roles/slurm-master/tasks/main.yaml

@@ -1,3 +1,17 @@
+#  Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved. 
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
 ---
 
 - name: Download Slurm source

+ 14 - 0
slurm/roles/start-slurm-workers/tasks/main.yml

@@ -1,3 +1,17 @@
+#  Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved. 
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
 ---
 - name: Install SLURM RPMs on compute
   yum:

+ 0 - 23
slurm/slurm-cluster.yaml

@@ -1,23 +0,0 @@
----
-#Playbook for installing Slurm on a cluster 
-
-#collect info from everything
-- hosts: all
-
-# Apply Common Installation and Config
-- hosts: cluster
-  gather_facts: false
-  roles:
-    - slurm-common
-
-# Apply Master Config, start services
-- hosts: master
-  gather_facts: false
-  roles:
-    - slurm-master
-
-# Start SLURM workers
-- hosts: compute
-  gather_facts: false
-  roles:
-    - start-slurm-workers

+ 36 - 0
slurm/slurm.yml

@@ -0,0 +1,36 @@
+#  Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved. 
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+---
+#Playbook for installing Slurm on a cluster 
+
+#collect info from everything
+- hosts: all
+
+# Apply Common Installation and Config
+- hosts: cluster
+  gather_facts: false
+  roles:
+    - slurm-common
+
+# Apply Master Config, start services
+- hosts: master
+  gather_facts: false
+  roles:
+    - slurm-master
+
+# Start SLURM workers
+- hosts: compute
+  gather_facts: false
+  roles:
+    - start-slurm-workers

+ 14 - 0
tools/change_personality

@@ -1,3 +1,17 @@
+#  Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved. 
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+
 #!/bin/bash
 
 #Usage: change_personality <k|s> <node_name>
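Per the usage string, a hedged example of flipping a node between personalities (the node name is a placeholder):

```
./change_personality s compute002    # move compute002 to the Slurm personality
./change_personality k compute002    # move it back to Kubernetes
```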

+ 13 - 0
tools/install_tools.yml

@@ -1,3 +1,16 @@
+#  Copyright 2020 Dell Inc. or its subsidiaries. All Rights Reserved. 
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
 ---
 
 - hosts: master