git clone -b release https://github.com/dellhpc/omnia.git
cd omnia
Edit the omnia_config.yml file to provide the MariaDB password and the Kubernetes CNI under mariadb_password and k8s_cni respectively. The default value of the Kubernetes Pod Network CIDR is 10.244.0.0/16. If 10.244.0.0/16 is already in use within your network, select a different Pod Network CIDR. For more information, see https://docs.projectcalico.org/getting-started/kubernetes/quickstart.
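Based on the variables named above, the resulting omnia_config.yml edit might look like the following sketch. The exact key names and defaults should be verified against the file shipped with your Omnia release; the Pod Network CIDR key name is an assumption.

```yaml
# omnia_config.yml (sketch; verify key names against your release)
mariadb_password: "password"            # MariaDB password for Slurm accounting
k8s_cni: "calico"                       # Kubernetes CNI: calico or flannel
k8s_pod_network_cidr: "10.244.0.0/16"   # change if this range is already in use
```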
Change the directory to omnia->appliance: cd omnia/appliance
Edit the appliance_config.yml file to:
a. Provide passwords for Cobbler and AWX under provision_password and awx_password respectively.
Note: The password must be at least eight characters long and no more than 30 characters. Do not use these characters in a password: -, \, ", and '.
b. Change the NIC for the DHCP server under hpc_nic, and the NIC used to connect to the Internet under public_nic. The default values of hpc_nic and public_nic are em1 and em2 respectively.
c. Provide the CentOS-7-x86_64-Minimal-2009 ISO file path under iso_file_path. This ISO file is used by Cobbler to provision the OS on the compute nodes.
Note: It is recommended that you do not rename the ISO image file. Do not change the path of this ISO image file, as provisioning of the OS on the compute nodes may be impacted.
d. Provide a mapping file for DHCP configuration under mapping_file_path. The mapping_file.csv template file is present under omnia/examples. Enter the details in the order: MAC, Hostname, IP. The header in the template file must not be deleted before saving the file. If you want to continue without providing a mapping file, leave the mapping_file_path value blank.
Note: Ensure that duplicate values are not provided for MAC, Hostname, and IP in the mapping file. The Hostname should not contain the following characters: , (comma), . (period), and _ (underscore).
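The constraints in the note above (no duplicate MAC, Hostname, or IP values; no comma, period, or underscore in a hostname) can be checked before running the playbooks. The following is a minimal, hypothetical validator sketch; it is not part of Omnia, and the sample MAC and IP values are illustrative only.

```python
# Hypothetical pre-flight check for the DHCP mapping file (mapping_file.csv).
# Columns follow the documented order: MAC, Hostname, IP.
import csv
import io

FORBIDDEN = set(",._")  # characters the doc disallows in Hostname

def validate_mapping(text):
    """Return a list of error strings; an empty list means the file is valid."""
    errors = []
    rows = list(csv.DictReader(io.StringIO(text)))
    # No duplicate values are allowed in any of the three columns.
    for col in ("MAC", "Hostname", "IP"):
        values = [r[col] for r in rows]
        if len(values) != len(set(values)):
            errors.append(f"duplicate values in column {col}")
    # Hostnames must not contain comma, period, or underscore.
    for r in rows:
        if FORBIDDEN & set(r["Hostname"]):
            errors.append(f"invalid character in hostname {r['Hostname']!r}")
    return errors

sample = (
    "MAC,Hostname,IP\n"
    "aa:bb:cc:dd:ee:01,compute000,172.17.0.44\n"
    "aa:bb:cc:dd:ee:02,compute001,172.17.0.45\n"
)
print(validate_mapping(sample))  # → []
```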
e. Provide a valid DHCP range for the HPC cluster under the variables dhcp_start_ip_range and dhcp_end_ip_range.
f. GMT is the default time zone set during the provisioning of the OS on compute nodes. To change the time zone, edit the timezone variable and enter a time zone. You can set the time zone to EST, CET, MST, CST6CDT, or PST8PDT. For a list of available time zones, see the appliance/common/files/timezone.txt file.
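Putting steps a through f together, an appliance_config.yml edit might look like the following sketch. Key names follow the variables named above; the password, path, and IP range values are placeholders, not recommendations, and should be checked against the file shipped with your Omnia release.

```yaml
# appliance_config.yml (sketch; all values are illustrative placeholders)
provision_password: "password"    # Cobbler password (8-30 characters)
awx_password: "password"          # AWX password (8-30 characters)
hpc_nic: "em1"                    # NIC used by the DHCP server
public_nic: "em2"                 # NIC used to connect to the Internet
iso_file_path: "/root/CentOS-7-x86_64-Minimal-2009.iso"
mapping_file_path: ""             # leave blank to continue without a mapping file
dhcp_start_ip_range: "172.17.0.10"
dhcp_end_ip_range: "172.17.0.100"
timezone: "GMT"                   # e.g. EST, CET, MST, CST6CDT, or PST8PDT
```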
Omnia considers the following usernames as default: cobbler for Cobbler Server, admin for AWX, and slurm for MariaDB.
Run ansible-playbook appliance.yml to install the Omnia appliance. Omnia creates a log file, which is available at /var/log/omnia.log.
Note: If you want to view the Cobbler and AWX passwords provided in the appliance_config.yml file, run ansible-vault view appliance_config.yml --vault-password-file .vault_key.
Omnia role used: provision
Ports used by Cobbler:
To create the Cobbler image, Omnia configures the following:
To access the Cobbler dashboard, enter https://<IP>/cobbler_web, where <IP> is the Global IP address of the management node. For example, enter https://100.98.24.225/cobbler_web to access the Cobbler dashboard.
Note: After the Cobbler Server provisions the operating system on the nodes, IP addresses and host names are assigned by the DHCP service.
Note: If you want to add more nodes, append the new nodes in the existing mapping file. However, do not modify the previous nodes in the mapping file as it may impact the existing cluster.
Omnia role used: web_ui
The port used by AWX is 8081.
The AWX repository is cloned from the GitHub path: https://github.com/ansible/awx.git
Omnia performs the following configurations on AWX:
To access the AWX dashboard, enter http://<IP>:8081, where <IP> is the Global IP address of the management node. For example, enter http://100.98.24.225:8081 to access the AWX dashboard.
Note: The AWX configurations are performed automatically by Omnia. Dell Technologies recommends that you do not change the default configurations provided by Omnia, as the functionality may be impacted.
Note: Although the AWX UI is accessible, hosts are shown only after a few nodes have been provisioned by Cobbler. It takes approximately 10 to 15 minutes for the host details to be displayed after provisioning by Cobbler. If a server is provisioned but you are unable to view the host details on the AWX UI, run the following command from the omnia -> appliance -> tools folder to view the hosts that are reachable:
ansible-playbook -i ../roles/inventory/provisioned_hosts.yml provision_report.yml
Kubernetes and Slurm are installed by deploying the DeployOmnia template on the AWX dashboard.
To install only Kubernetes, enter slurm and select the slurm skip tag. To install only Slurm, select the kubernetes skip tag.
Note: Enter nfs_client in the skip tag section to skip the k8s_nfs_client_setup role of Kubernetes.
Note: If you want to install the JupyterHub and Kubeflow playbooks, you must first install the JupyterHub playbook and then install the Kubeflow playbook.
Note: When the Internet connectivity is unstable or slow, it may take more time to pull the images to create the Kubeflow containers. If the time limit is exceeded, the Apply Kubeflow configurations task may fail. To resolve this issue, you must redeploy the Kubernetes cluster and reinstall Kubeflow by completing the following steps:
In the omnia_config.yml file, change the k8s_cni variable value from calico to flannel.

The DeployOmnia template may not run successfully if:
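The CNI change described in the note above is a one-line edit in omnia_config.yml, using the same key shown earlier in this document:

```yaml
# omnia_config.yml: switch the CNI before redeploying Kubernetes for Kubeflow
k8s_cni: "flannel"   # changed from the default value "calico"
```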
After the DeployOmnia template is run from the AWX UI, the omnia.yml file installs Kubernetes and Slurm, or either Kubernetes or Slurm, as per the selection in the template on the management node. Additionally, appropriate roles are assigned to the compute and manager groups.
The following Kubernetes roles are provided by Omnia when the omnia.yml file is run:
An NFS share directory, /home/k8snfs, is created. Using this directory, compute nodes share the common files.
Note: To change the Kubernetes Pod Network CIDR after deployment, run kubeadm reset -f on the nodes, and then edit the omnia_config.yml file to change the Kubernetes Pod Network CIDR. The suggested IP range is 192.168.0.0/16; ensure that you provide an IP range that is not in use in your host network.

The following Slurm roles are provided by Omnia when the omnia.yml file is run:
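The Pod Network CIDR change mentioned above would be another one-line edit in omnia_config.yml. The key name is an assumption and should be checked against your release:

```yaml
# omnia_config.yml: move the Pod Network CIDR to the suggested range
k8s_pod_network_cidr: "192.168.0.0/16"   # must not overlap your host network
```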
If a new node is provisioned through Cobbler, the node address is automatically displayed on the AWX dashboard. The node is not assigned to any group. You can add the node to the compute group along with the existing nodes and run omnia.yml to add the new node to the cluster and update the configurations on the manager node.