
Updating docs

Signed-off-by: cgoveas <cassandra.goveas@dell.com>

+ 4 - 1
README.md

@@ -17,6 +17,7 @@ Omnia (Latin: all or everything) is a deployment tool to turn servers with RPM-b
 ## Pre Requisites before installing Omnia
 - [Python3](https://www.python.org/)
 - [Ansible 2.11.9](https://www.ansible.com/)
+- [Rocky Linux](https://rockylinux.org/)
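+
+To quickly check that the Python and Ansible prerequisites are met (a minimal sketch; exact version output varies by release):
+
+```
+python3 --version    # expect Python 3.x
+ansible --version    # expect ansible [core] 2.11.9
+```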
 
 
 ## Installing Omnia
@@ -25,7 +26,9 @@ Omnia can be used in two ways:
 
 1. To [set up clusters on existing deployed hardware](docs/INSTALL_OMNIA.md) and then [monitor the clusters](docs/MONITOR_CLUSTERS.md)
 
-2. To [deploy OS's, packages and other open source software](docs/INSTALL_OMNIA_CONTROL_PLANE.md)
+2. To [deploy operating systems, packages, and open source software, and to set up security features](docs/INSTALL_OMNIA_CONTROL_PLANE.md)
+
+![Omnia deployment flow](docs/images/Omnia_Flow.png)
 
 
 ## Omnia Documentation

+ 6 - 1
docs/EXAMPLE_SYSTEM_DESIGNS.md

@@ -6,7 +6,12 @@ Omnia can configure systems which use Ethernet or Infiniband-based fabric to con
 ![Example system configuration with Infiniband fabric](images/example-system-infiniband.png)
 
 ## Network Setup
-Omnia assumes that servers are already connected to the network and have access to the internet.
+With Omnia 1.2, only the management station requires internet access. In this configuration, the network topology follows the diagram below:
+![Network Connections when only the Management Station is connected to Internet](images/Omnia_NetworkConfig_NoInet.png)
+
+If you would like all compute nodes to have internet access, follow the network topology below instead:
+![Network Connections when all servers are connected to the internet](images/Omnia_NetworkConfig_Inet.png)
+
 ### Network Topology
 Possible network configurations include:
 * A flat topology where all nodes are connected to a switch which includes an uplink to the internet. This requires multiple externally-facing IP addresses

+ 8 - 6
docs/INSTALL_OMNIA_CONTROL_PLANE.md

@@ -1,9 +1,9 @@
 # Install the Omnia Control Plane
 
-## Typical layout of a HPC cluster supported by Omnia 1.2
+## Typical layout of an HPC cluster supported by Omnia 1.2
 Using Omnia 1.2, you can provision and monitor hardware devices such as servers, storage devices, network switches, and InfiniBand switches in an HPC cluster. To enable Omnia to provision or configure the supported hardware devices, the following connections must be made available in your HPC cluster environment. 
 
-![Typical layout of a HPC cluster](images/typical_layout_hpc_cluster.jpg)
+![Typical layout of an HPC cluster](images/Omnia_Architecture.png)
 
 * Connecting a Pass-Through Switch: Provision and configure a 1GbE pass-through switch which will be used as a pass-through uplink switch. One of the NICs on the management station must be connected to a data port on the pass-through switch, and a second connection must be established from a data port on the pass-through switch to the management port of the ToR network switch.  
 >> **Note:**  Omnia is not responsible for provisioning and configuring the pass-through switch.
@@ -23,11 +23,11 @@ Using Omnia 1.2, you can provision and monitor hardware devices such as servers,
 
 Depending on the pass-through switch configured in your HPC environment, the number of racks is limited by the number of ports available on the pass-through switch. To support additional racks, you can form an L1-L2 topology and configure a network of pass-through switches. A typical layout of an HPC cluster with a network of pass-through switches is shown in the following illustration:  
 
-![Typical layout of a HPC cluster with a network of pass-through switches](images/typical_layout_hpc_clsuter_passthrough_network.jpg)
+![Typical layout of an HPC cluster with a network of pass-through switches](images/Omnia_NetworkConfig_Inet.png)
 
 ## Prerequisites to install the Omnia Control Plane version 1.2
 * Ensure that a stable Internet connection is available on the management station, manager node, login node, and compute nodes. 
-* Rocky 8 /Leap 15.3 is installed on the management station. 		 
+* Rocky 8 is installed on the management station. 		 
 * To provision the bare metal servers, download one of the following ISOs for deployment:
     1. [Leap 15.3](https://get.opensuse.org/leap/)
     2. [Rocky 8](https://rockylinux.org/)
@@ -78,7 +78,7 @@ Depending on the pass-through switch configured in your HPC environment, the num
 	2. `pip uninstall ansible-base` (if Ansible 2.9 is installed)
 	3. `pip uninstall ansible-core` (if Ansible 2.10 or later is installed)
 	
-	>>__Note__: If you are using LeapOS, zypper may need to be updated using this command before running Omnia: `zypper update -y`
+	>>__Note__: If you are using LeapOS, zypper will need to be updated using this command before running Omnia: `zypper update -y`
 
 
 	* On the management station, run the following commands to install Git:
@@ -375,4 +375,6 @@ From Omnia 1.2, the cobbler container OS will follow the OS on the management st
 >> 2. Run `control_plane.yml` to provision leap and create a profile called `leap-x86_64` in the cobbler container.
 >> 3. Set `provision_os` to rocky and `iso_file_path` to `/root/Rocky-8.x-x86_64-minimal.iso`.
 >> 4. Run `control_plane.yml` to provision rocky and create a profile called `rocky-x86_64` in the cobbler container.
- 
+
+
+>> __Note:__ All compute nodes in a cluster must run the same OS. 
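+
+>> A minimal sketch of steps 3 and 4 (assuming `provision_os` and `iso_file_path` live in `input_params/base_vars.yml`; check the input file documented for your release):
+>> ```
+>> # input_params/base_vars.yml (assumed location)
+>> provision_os: rocky
+>> iso_file_path: "/root/Rocky-8.x-x86_64-minimal.iso"
+>> ```
+>> Then rerun the playbook:
+>> ```
+>> ansible-playbook control_plane.yml
+>> ```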

+ 11 - 17
docs/README.md

@@ -15,7 +15,6 @@
 
 ## What Omnia does
 Omnia can build clusters that use Slurm or Kubernetes (or both!) for workload management. Omnia will install software from a variety of sources, including:
-- Standard CentOS and [ELRepo](http://elrepo.org) repositories
 - Helm repositories
 - Source code compilation
 - [OperatorHub](https://operatorhub.io)
@@ -23,19 +22,21 @@ Omnia can build clusters that use Slurm or Kubernetes (or both!) for workload ma
 Whenever possible, Omnia will leverage existing projects rather than reinvent the wheel.
 
 ### Omnia stacks
-Omnia can install Kubernetes or Slurm (or both), along with additional drivers, services, libraries, and user applications.
+Omnia can deploy firmware updates and install Kubernetes or Slurm (or both), along with additional drivers, services, libraries, and user applications.
 ![Omnia Kubernetes Stack](images/omnia-k8s.png)
 
 ![Omnia Slurm Stack](images/omnia-slurm.png)  
 
 ## What's new in this release
-* Extended support of Leap OS on Management station, login, compute and NFS nodes.
-* Omnia now supports Powervault configurations with 2 network interfaces.
-* Omnia now supports multi profile creation and multi cluster provisioning using Cobbler.
-* Provisioning of Rocky custom ISO on supported PowerEdge servers using iDRAC.
-* Configuring Dell EMC networking switches, Mellanox InfiniBand switches, and PowerVault storage devices in the cluster. 
-* An option to configure a login node with the same configurations as the compute nodes in the cluster. With appropriate user privileges provided by the cluster administrator, users can log in to the login node and schedule Slurm jobs. The authentication mechanism in the login node uses the FreeIPA solution.
-* Options to enable the security settings on the iDRAC such as system lockdown mode, secure boot mode, 2-factor authentication (2FA), and LDAP directory services.
+- Support for Rocky 8.x with the latest Python and Ansible on the management station
+- Support for Leap 15.3 on the cluster
+- Support for Rocky 8.x on the cluster
+- Added Grafana integration for better monitoring capability
+- Added Loki log aggregation of /var logs
+- Added Slurm/Kubernetes monitoring capability
+- Added security features to comply with NIST 800-53 Revision 5 and NIST 800-171 Revision 2
+- Added the ability to collect telemetry information from Slurm and iDRAC
+- Added Grafana plugins to view real-time graphs of cluster/node statistics
 
 ## Deploying clusters using the Omnia control plane
 The Omnia Control Plane will automate the entire cluster deployment process, starting with provisioning the operating system on the supported devices and updating the firmware versions of PowerEdge Servers. 
@@ -51,15 +52,8 @@ The following table lists the software and operating system requirements on the
 
 Requirements  |   Version
 ----------------------------------  |   -------
-OS pre-installed on the management station  |  Rocky 8.x/ Leap 15.x
+OS pre-installed on the management station  |  Rocky 8.x
 OS deployed by Omnia on bare-metal Dell EMC PowerEdge Servers | Rocky 8.x Minimal Edition/ Leap 15.x
-Cobbler  |  3.2.2
-Ansible AWX  |  20.0.0
-Slurm Workload Manager  |  20.11.2
-Kubernetes on the management station  |  1.21.0
-Kubernetes on the manager and compute nodes	|	1.16.7 or 1.19.3
-Kubeflow  |  1
-Prometheus  |  2.23.0
 Ansible  |  2.9.21
 Python  |  3.6.15
 

+ 2 - 4
docs/Telemetry_Visualization/TELEMETRY.md

@@ -2,7 +2,7 @@
 
 Using Grafana, users can poll multiple devices and create graphs/visualizations of key system metrics such as temperature, system power consumption, memory usage, IO usage, CPU usage, total memory power, system output power, total fan power, total storage power, system input power, total CPU power, RPM readings, total heat dissipation, power-to-cool ratio, and system airflow efficiency.
 
-A lot of these metrics are collected using iDRAC telemetry. iDRAC telemetry allows you to stream telemetry data from your servers to a centralized log/metrics servers. For more information on iDRAC telemetry, click [here]( https://github.com/dell/iDRAC-Telemetry-Reference-Tools).
+Many of these metrics are collected using iDRAC telemetry. iDRAC telemetry allows you to stream telemetry data from your servers to a centralized log/metrics server. For more information on iDRAC telemetry, click [here](https://github.com/dell/iDRAC-Telemetry-Reference-Tools).
 
 ## Prerequisites
 
@@ -60,6 +60,4 @@ Use any one of the following browsers to access the Grafana UI (https://< Grafan
 ## Adding a New Node to Telemetry
 After initiation, new nodes can be added to telemetry by running the following command from `omnia/telemetry`:
 		
-` ansible-playbook add_idrac_node.yml `
-
-	
+`ansible-playbook add_idrac_node.yml`

+ 1 - 1
docs/control_plane/input_parameters/PROVISION_SERVERS.md

@@ -7,7 +7,7 @@ Edit the following files under the `control_plane/input_params` directory to pro
 	a. `provision_password`- password used while provisioning OS on bare metal servers.  
 	b. `cobbler_password`- password for Cobbler.    
 	c. `idrac_username` and `idrac_password`- iDRAC username and password.   
-	**NOTE**: Minimum length of the password must be at least eight characters and a maximum of 30 characters. Do not use these characters while entering a password: -, \\, "", and \'
+>>	**NOTE**: The password must be 8 to 30 characters long. Do not use the following characters in a password: -, \\, "", and \'
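+
+>>	A minimal sketch of these entries (the file name `login_vars.yml` is an assumption; the placeholder values satisfy the rules above):
+>>	```
+>>	provision_password: "OmniaProv123"    # 8 to 30 characters; avoid -, \, "", and '
+>>	cobbler_password: "CobblerPass123"
+>>	idrac_username: "root"
+>>	idrac_password: "IdracPass123"
+>>	```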
 2. Edit the following variables in the `idrac_vars.yml` file.  
 
 	File name	|	Variables</br> [Required/ Optional]	|	Default, choices	|	Description

BIN
docs/images/Omnia_Architecture.png


BIN
docs/images/Omnia_Flow.png


BIN
docs/images/Omnia_NetworkConfig_Inet.png


BIN
docs/images/Omnia_NetworkConfig_NoInet.png