@@ -1,10 +1,10 @@
 **Omnia** (Latin: all or everything) is a deployment tool to configure Dell EMC PowerEdge servers running standard RPM-based Linux OS images into clusters capable of supporting HPC, AI, and data analytics workloads. It uses Slurm, Kubernetes, and other packages to manage jobs and run diverse workloads on the same converged solution. It is a collection of [Ansible](https://ansible.com) playbooks, is open source, and is constantly being extended to enable comprehensive workloads.
 
 #### Current release version
-1.1.1
+1.2
 
 #### Previous release version
-1.1
+1.1.1
 
 ## Blogs about Omnia
 - [Introduction to Omnia](https://infohub.delltechnologies.com/p/omnia-open-source-deployment-of-high-performance-clusters-to-run-simulation-ai-and-data-analytics-workloads/)
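
**Note**: Because Omnia is a collection of Ansible playbooks, a deployment ultimately comes down to a playbook run. A minimal sketch (the inventory file name and location are illustrative assumptions, not a documented Omnia requirement):

```bash
# Run the main Omnia playbook against an Ansible inventory of cluster nodes.
# omnia.yml is the playbook referenced later in this README; "inventory" is
# a placeholder for your own inventory file.
ansible-playbook omnia.yml -i inventory
```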
@@ -27,6 +27,8 @@ Omnia can install Kubernetes or Slurm (or both), along with additional drivers,
 
 ## What's new in this release
+* Extended support for Leap OS on the management station, login, compute, and NFS nodes.
+* Omnia now supports PowerVault configurations with two network interfaces.
 * Provisioning of Rocky custom ISO on supported PowerEdge servers using iDRAC.
 * Configuring Dell EMC networking switches, Mellanox InfiniBand switches, and PowerVault storage devices in the cluster.
 * An option to configure a login node with the same configurations as the compute nodes in the cluster. With appropriate user privileges provided by the cluster administrator, users can log in to the login node and schedule Slurm jobs. The authentication mechanism in the login node uses the FreeIPA solution.
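
**Note**: Once the login node is configured, a user with Slurm access can submit jobs from it. A minimal sketch of a batch script (job name, resource counts, and time limit are illustrative assumptions, not Omnia defaults):

```bash
#!/bin/bash
#SBATCH --job-name=hello      # name shown in the queue
#SBATCH --nodes=1             # run on one compute node
#SBATCH --ntasks=1            # a single task
#SBATCH --time=00:05:00      # five-minute wall-clock limit

srun hostname                 # report which node the task ran on
```

Submit it with `sbatch hello.sh` and watch it with `squeue -u $USER`.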
@@ -46,8 +48,8 @@ The following table lists the software and operating system requirements on the
 
 Requirements | Version
 ---------------------------------- | -------
-OS pre-installed on the management station | CentOS 8.4/ Rocky 8.4
-OS deployed by Omnia on bare-metal Dell EMC PowerEdge Servers | CentOS 7.9 2009 Minimal Edition/ Rocky 8.4 Minimal Edition
+OS pre-installed on the management station | CentOS 8.4/ Rocky 8.5/ Leap 15.3
+OS deployed by Omnia on bare-metal Dell EMC PowerEdge Servers | Rocky 8.5 Minimal Edition/ Leap 15.3
 Cobbler | 3.2.2
 Ansible AWX | 19.1.0
 Slurm Workload Manager | 20.11.2
@@ -55,6 +57,9 @@ Kubernetes on the management station | 1.21.0
 Kubernetes on the manager and compute nodes | 1.16.7 or 1.19.3
 Kubeflow | 1
 Prometheus | 2.23.0
+Ansible | 2.9.21
+Python | 3.6.15
+CRI-O | 1.17.3
 
 ## Hardware managed by Omnia
 The following table lists the supported devices managed by Omnia. Devices other than those listed in the following table will be discovered by Omnia, but the features offered by Omnia will not be applicable to them.
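
**Note**: A quick way to check that a management station matches the pinned versions above, assuming the tools are already installed and on the PATH:

```bash
ansible --version    # expect 2.9.21
python3 --version    # expect 3.6.15
cobbler version      # expect 3.2.2
crio version         # expect 1.17.3
```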
@@ -72,10 +77,11 @@ The following table lists the software and its compatible version managed by Omn
 
 Software | License | Compatible Version | Description
 ----------- | ------- | ---------------- | -----------------
+Leap OS 15.3 | - | 15.3 | Operating system on entire cluster
 CentOS Linux release 7.9.2009 (Core) | - | 7.9 | Operating system on entire cluster except for management station
-Rocky 8.4 | - | 8.4 | Operating system on entire cluster except for management station
+Rocky 8.5 | - | 8.5 | Operating system on entire cluster except for management station
 CentOS Linux release 8.4.2105 | - | 8.4 | Operating system on the management station
-Rocky 8.4 | - | 8.4 | Operating system on the management station
+Rocky 8.5 | - | 8.5 | Operating system on the management station
 MariaDB | GPL 2.0 | 5.5.68 | Relational database used by Slurm
 Slurm | GNU General Public | 20.11.7 | HPC Workload Manager
 Docker CE | Apache-2.0 | 20.10.2 | Docker Service
@@ -195,8 +201,6 @@ If hosts are listed, then an IP address has been assigned to them by DHCP. Howev
 * **Issue**: Hosts are not automatically deleted from awx UI when redeploying the cluster.
 	**Resolution**: Before re-deploying the cluster, ensure that the user manually deletes all hosts from the awx UI.
 
-* **Issue**: Decomissioned compute nodes do not get deleted automatically from the awx UI.
-	**Resolution**: Once a node is decommisioned, ensure that the user manually deletes decomissioned hosts from the awx UI.
 
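
**Note**: The manual cleanup described above can also be scripted against AWX's REST API rather than clicked through the UI. A sketch (the URL, credentials, and host ID are placeholders; `/api/v2/hosts/` is AWX's standard hosts endpoint):

```bash
# List the hosts AWX knows about, then delete one by its ID.
curl -su admin:password https://awx.example.com/api/v2/hosts/
curl -su admin:password -X DELETE https://awx.example.com/api/v2/hosts/<id>/
```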
 # [Frequently asked questions](FAQ.md)
@@ -209,6 +213,8 @@ If hosts are listed, then an IP address has been assigned to them by DHCP. Howev
 * To change the Kubernetes version from 1.16 to 1.19 or 1.19 to 1.16, you must redeploy the entire cluster.
 * The Kubernetes pods will not be able to access the Internet or start when firewalld is enabled on the node. This is a Kubernetes limitation, so the firewalld daemon is disabled on all the nodes as part of omnia.yml execution.
 * Only one storage instance (PowerVault) is currently supported in the HPC cluster.
+* Cobbler web support has been discontinued from Omnia 1.2 onwards.
+
 
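
**Note**: Since omnia.yml disables firewalld (see the limitations above), a quick post-deployment check on any node uses standard systemd commands:

```bash
systemctl is-active firewalld    # expect "inactive" after omnia.yml has run
systemctl is-enabled firewalld   # expect "disabled"
```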
 # Contributing to Omnia
 The Omnia project was started to give members of the [Dell Technologies HPC Community](https://dellhpc.org) a way to easily set up clusters of Dell EMC servers, and to contribute useful tools, fixes, and functionality back to the HPC Community.