Browse Source

#1016 Updating docs

Signed-off-by: cgoveas <cassandra.goveas@dell.com>
cgoveas 3 years ago
parent
commit
fa7a78417a
56 changed files with 1509 additions and 1438 deletions
  1. 12 9
      README.md
  2. 14 0
      docs/Device_Configuration/Ethernet_Switches.md
  3. 45 0
      docs/Device_Configuration/Infiniband_Switches.md
  4. 10 0
      docs/Device_Configuration/PowerVault.md
  5. 97 0
      docs/Device_Configuration/Servers.md
  6. 2 4
      docs/EXAMPLE_SYSTEM_DESIGNS.md
  7. 0 228
      docs/INSTALL_OMNIA.md
  8. 34 0
      docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/TOR_Interface_Keys.md
  9. 8 0
      docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/ethernet_tor_vars.md
  10. 10 0
      docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/ethernet_vars.md
  11. 10 0
      docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/ib_vars.md
  12. 12 0
      docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/idrac_2fa.md
  13. 18 0
      docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/idrac_ldap.md
  14. 25 0
      docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/idrac_tools_vars.md
  15. 12 0
      docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/idrac_vars.md
  16. 19 0
      docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/powervault_me4_vars.md
  17. 10 355
      docs/INSTALL_OMNIA_CONTROL_PLANE.md
  18. 21 0
      docs/Input_Parameter_Guide/Control_Plane_Parameters/login_vars.md
  19. 159 0
      docs/Input_Parameter_Guide/Control_Plane_Parameters/opensm_conf.md
  20. 5 115
      docs/Security/ENABLE_SECURITY_MANAGEMENT_STATION.md
  21. 11 0
      docs/Input_Parameter_Guide/Telemetry_Visualization_Parameters/telemetry_base_vars.md
  22. 11 0
      docs/Input_Parameter_Guide/Telemetry_Visualization_Parameters/telemetry_login_vars.md
  23. 19 0
      docs/Input_Parameter_Guide/omnia_config.md
  24. 151 0
      docs/Installation_Guides/INSTALL_OMNIA_CLI.md
  25. 257 0
      docs/Installation_Guides/INSTALL_OMNIA_CONTROL_PLANE.md
  26. 13 0
      docs/Installation_Guides/INSTALL_TELEMETRY.md
  27. 19 0
      docs/LIMITATIONS.md
  28. 30 30
      docs/MONITOR_CLUSTERS.md
  29. 17 0
      docs/PreRequisites/Control_Plane_Security_PreReqs.md
  30. 19 0
      docs/PreRequisites/Login_Node_Security_PreReqs.md
  31. 56 0
      docs/PreRequisites/Omnia_Control_Plane_PreReqs.md
  32. 15 0
      docs/PreRequisites/Telemetry_Visualization_PreReqs.md
  33. 23 168
      docs/README.md
  34. 82 0
      docs/Security/ENABLE_SECURITY_CONTROL_PLANE.md
  35. 2 38
      docs/Security/ENABLE_SECURITY_LOGIN_NODE.md
  36. 0 0
      docs/Security/FreeIPA_User_Creation.md
  37. 11 0
      docs/Support_Matrix/Hardware/Servers.md
  38. 7 0
      docs/Support_Matrix/Hardware/Storage.md
  39. 13 0
      docs/Support_Matrix/Hardware/Switches.md
  40. 65 0
      docs/Support_Matrix/Software/Additional_Software.md
  41. 5 0
      docs/Support_Matrix/Software/Operating_Systems/CentOS.md
  42. 6 0
      docs/Support_Matrix/Software/Operating_Systems/LeapOS.md
  43. 6 0
      docs/Support_Matrix/Software/Operating_Systems/RHEL.md
  44. 5 0
      docs/Support_Matrix/Software/Operating_Systems/Rocky.md
  45. 0 11
      docs/Telemetry_Visualization/VISUALIZATION.md
  46. 84 4
      docs/FAQ.md
  47. 59 0
      docs/Troubleshooting/Troubleshooting_Guide.md
  48. 0 70
      docs/control_plane/device_templates/CONFIGURE_INFINIBAND_SWITCHES.md
  49. 0 43
      docs/control_plane/device_templates/CONFIGURE_POWERSWITCHES.md
  50. 0 42
      docs/control_plane/device_templates/CONFIGURE_POWERVAULT_STORAGE.md
  51. 0 146
      docs/control_plane/device_templates/PROVISION_SERVERS.md
  52. 0 37
      docs/control_plane/input_parameters/INFINIBAND_SWITCHES.md
  53. 0 76
      docs/control_plane/input_parameters/POWERSWITCHES.md
  54. 0 37
      docs/control_plane/input_parameters/POWERVAULT_STORAGE.md
  55. 0 25
      docs/control_plane/input_parameters/PROVISION_SERVERS.md
  56. BIN
      docs/images/nmcli_output.jpg

+ 12 - 9
README.md

@@ -20,16 +20,19 @@ Omnia (Latin: all or everything) is a deployment tool to turn servers with RPM-b
 - [RockyOS](https://rockylinux.org/)
 
 
-## Installing Omnia
+# Using Omnia
 
-Omnia can be used in two ways:
-
-1. To [set up clusters on existing deployed hardware](docs/INSTALL_OMNIA.md) and then [monitor the clusters](docs/MONITOR_CLUSTERS.md)
-
-2. To [deploy OS's, packages, open source software and set up security features](docs/INSTALL_OMNIA_CONTROL_PLANE.md)
-
-![Omnia Slurm Stack](docs/images/Omnia_Flow.png)
+1. Verify that your system meets Omnia's [hardware](docs/Support_Matrix/Hardware) and [software requirements](docs/Support_Matrix/Software/Operating_Systems)
+2. Ensure that all [pre-requisites](docs/PreRequisites) are met.
+3. Fill out all the required [input parameters](docs/Input_Parameter_Guide).
+4. [Run Control_Plane](docs/Installation_Guides/INSTALL_OMNIA_CONTROL_PLANE.md) to provision OS's, [configure devices](docs/Device_Configuration) and set up [security measures](docs/Security).
+5. [Run Omnia](docs/Installation_Guides/INSTALL_OMNIA_CLI.md) to set up Kubernetes and Slurm.
+6. Run the telemetry playbook to [set up](docs/Installation_Guides/INSTALL_TELEMETRY.md) and use [Telemetry and Visualization Services](docs/Telemetry_Visualization)
+   ![Omnia Flow](docs/images/Omnia_Flow.png)
 
+## Troubleshooting Omnia
+* For a list of commonly encountered issues, check out our [FAQs](docs/Troubleshooting/FAQ.md).
+* To troubleshoot Omnia, use our [Troubleshooting Guide](docs/Troubleshooting/Troubleshooting_Guide.md).
 
 ## Omnia Documentation
 For Omnia documentation, please see the [website](https://dellhpc.github.io/omnia).
@@ -43,7 +46,7 @@ For Omnia documentation, please see the [website](https://dellhpc.github.io/omni
 <img src="https://user-images.githubusercontent.com/5414112/153955170-0a4b199a-54f0-42af-939c-03eac76881c0.png" height="100px" alt="Texas Tech University">
 
 ## Contributors
-Thanks goes to everyone who makes Omnia possible ([emoji key](https://allcontributors.org/docs/en/emoji-key)):
+Our thanks go to everyone who makes Omnia possible ([emoji key](https://allcontributors.org/docs/en/emoji-key)):
 <!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section -->
 <!-- prettier-ignore-start -->
 <!-- markdownlint-disable -->

+ 14 - 0
docs/Device_Configuration/Ethernet_Switches.md

@@ -0,0 +1,14 @@
+# Configuring Ethernet Switches
+
+* Enter the information required in `input_params/base_vars.yml`, `input_params/login_vars.yml`, `ethernet_vars.yml` and/or `input_params/ethernet_tor_vars.yml` per the provided [Input Parameter Guides](../Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters).
+
+>>__Note:__ 
+>> * Edit the `ethernet_tor_vars.yml` file for all S3* and S4* PowerSwitches such as the S3048-ON, S4048-ON, S4048T-ON, S4112F-ON, S4112T-ON, and S4128F-ON.  
+>> * Edit the `ethernet_vars.yml` file for Dell PowerSwitch S5232F-ON and all other PowerSwitches except S3* and S4* switches.
+
+## Run `ethernet_template` on the AWX UI.
+1. Run `kubectl get svc -n awx`.
+2. Copy the Cluster-IP address of the awx-ui. 
+3. To retrieve the AWX UI password, run `kubectl get secret awx-admin-password -n awx -o jsonpath="{.data.password}" | base64 --decode`.
+4. Open the default web browser on the control plane and enter `http://<IP>:8052`, where IP is the awx-ui IP address and 8052 is the awx-ui port number. Log in to the AWX UI using the username as `admin` and the retrieved password.  
+5. Under __RESOURCES__ -> __Templates__, launch the **ethernet_template**.
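Steps 1–4 above can be condensed into a short shell sketch. The service name `awx-ui`, the secret name `awx-admin-password`, the `awx` namespace, and port 8052 all come from the steps themselves; the commands assume a working `kubectl` context on the control plane:

```shell
# Look up the AWX UI address and admin password (sketch of steps 1-4 above).
AWX_IP=$(kubectl get svc awx-ui -n awx -o jsonpath='{.spec.clusterIP}')
AWX_PASS=$(kubectl get secret awx-admin-password -n awx \
  -o jsonpath='{.data.password}' | base64 --decode)
echo "Open http://${AWX_IP}:8052 and log in as admin with password: ${AWX_PASS}"
```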

File diff suppressed because it is too large
+ 45 - 0
docs/Device_Configuration/Infiniband_Switches.md


+ 10 - 0
docs/Device_Configuration/PowerVault.md

@@ -0,0 +1,10 @@
+# Configuring Dell EMC PowerVault Storage  
+
+* Enter the information required in `input_params/base_vars.yml`, `input_params/login_vars.yml` and `input_params/powervault_me4_vars` per the provided [Input Parameter Guides](../Input_Parameter_Guide/Control_Plane_Parameters).
+
+## Run `ME4_template` on the AWX UI.
+1. Run `kubectl get svc -n awx`.
+2. Copy the Cluster-IP address of the awx-ui. 
+3. To retrieve the AWX UI password, run `kubectl get secret awx-admin-password -n awx -o jsonpath="{.data.password}" | base64 --decode`.
+4. Open the default web browser on the control plane and enter `http://<IP>:8052`, where IP is the awx-ui IP address and 8052 is the awx-ui port number. Log in to the AWX UI using the username as `admin` and the retrieved password.  
+5. Under __RESOURCES__ -> __Templates__, launch the **powervault_me4_template**.

+ 97 - 0
docs/Device_Configuration/Servers.md

@@ -0,0 +1,97 @@
+# Custom ISO provisioning on Dell EMC PowerEdge Servers
+
+* Enter the information required in `input_params/base_vars.yml`, `input_params/login_vars.yml` and `idrac_vars.yml` per the provided [Input Parameter Guides](../Input_Parameter_Guide).
+
+## Configuring Servers with Out-of-Band Management (Provision Method: iDRAC)
+
+### Generating a Custom ISO
+* Using the Omnia role _control_plane_customiso_, a custom ISO is generated. Based on the parameters entered above, the Kickstart file is configured and added to the custom ISO file. The *unattended_centos7.iso*, *unattended_rocky8.iso* or *unattended_leap15.iso* file is copied to an NFS share on the control plane to provision the PowerEdge servers using iDRAC. 
+
+### Run idrac_template on the AWX UI.
+1. Run `kubectl get svc -n awx`.
+2. Copy the Cluster-IP address of the awx-ui. 
+3. To retrieve the AWX UI password, run `kubectl get secret awx-admin-password -n awx -o jsonpath="{.data.password}" | base64 --decode`.
+4. Open the default web browser on the control plane and enter `http://<IP>:8052`, where IP is the awx-ui IP address and 8052 is the awx-ui port number. Log in to the AWX UI using the username as `admin` and the retrieved password.  
+5. Under __RESOURCES__ -> __Templates__, launch the **idrac_template**.
+
+Omnia role used to provision custom ISO on PowerEdge Servers using iDRAC: *provision_idrac*  
+
+For the `idrac.yml` file to successfully provision the custom ISO on the PowerEdge Servers, ensure that the following prerequisites are met:
+* The **idrac_inventory** file is updated with the iDRAC IP addresses.
+* Required input parameters are updated in the **idrac_vars.yml** file under **omnia/control_plane/input_params** directory.
+* An *unattended_centos7.iso*, *unattended_rocky8.iso* or *unattended_leap15.iso* file is available in an NFS path.
+* The Lifecycle Controller Remote Services of PowerEdge Servers are in the 'ready' state.
+* The Redfish services are enabled in the iDRAC settings under **Services**.
+* The PowerEdge Servers have the iDRAC Enterprise or Datacenter license. If the license is not found, servers will be PXE booted and provisioned using Cobbler.  
+* If `provision_method` is set to PXE in `base_vars.yml`, ensure that all PXE devices have a configured, active NIC. To verify/configure NIC availability: on the server, go to `BIOS Setup -> Network Settings -> PXE Device`. For each listed device (typically 4), configure/check for an active NIC under `PXE device settings`.
+* iDRAC9-based Dell EMC PowerEdge Servers with firmware version 5.00.10.20 or later (with the latest available BIOS).
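As an optional spot-check of the Redfish prerequisite, the Redfish service root (`/redfish/v1`, defined by the DMTF Redfish specification) can be probed before launching the template. The IP address and credentials below are placeholders, not values from this repository:

```shell
# Hypothetical probe: an HTTP 200 from the service root indicates Redfish is enabled.
IDRAC_IP=192.168.1.100   # placeholder iDRAC address
curl -sk -u 'root:calvin' -o /dev/null \
  -w "Redfish HTTP status: %{http_code}\n" \
  "https://${IDRAC_IP}/redfish/v1"
```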
+
+The **provision_idrac** file configures and validates the following:
+* Required input parameters and prerequisites.
+* BIOS and SNMP settings.
+* The latest available version of the iDRAC firmware is updated.
+* If bare metal servers have a RAID controller installed, virtual disks are created for RAID configuration.
+* Availability of iDRAC Enterprise or Datacenter License on iDRAC.  
+
+After the configurations are validated, the **provision_idrac** file provisions the custom ISO on the PowerEdge Servers. After the OS is provisioned successfully, iDRAC IP addresses are updated in the *provisioned_idrac_inventory* in AWX.
+
+>>**Note**:
+>> * The `idrac.yml` file initiates the provisioning of custom ISO on the PowerEdge servers. Wait for some time for the node inventory to be updated on the AWX UI. 
+>> * Due to the latest `catalog.xml` file, firmware updates may fail for certain components. Omnia execution is not interrupted, but an error is logged on AWX. For now, please download those individual updates manually.
+
+### Provisioning newly added PowerEdge servers in the cluster
+To provision newly added servers, wait until the iDRAC IP addresses are automatically added to the *idrac_inventory*. After the iDRAC IP addresses are added, launch the iDRAC template on the AWX UI to provision the custom OS on the servers.  
+
+If you want to re-provision all the servers in the cluster or any of the faulty servers, you must remove the respective iDRAC IP addresses from *provisioned_idrac_inventory* on AWX UI and then launch the iDRAC template. If required, you can delete the *provisioned_idrac_inventory* from the AWX UI to remove the IP addresses of provisioned servers. After the servers are provisioned, *provisioned_idrac_inventory* is created and updated on the AWX UI.
+
+## Configuring Servers with In-Band Management (Provision Method: PXE)
+
+Omnia role used: *provision_cobbler*  
+Ports used by Cobbler:  
+* TCP ports: 69, 8000, 8008
+* UDP ports: 69, 4011
+
+To create the Cobbler image, Omnia configures the following:
+* Firewall settings.
+* The kickstart file of Cobbler to enable the UEFI PXE boot.
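Omnia's *provision_cobbler* role applies these firewall settings itself; as a minimal sketch of what the port list above amounts to (assuming firewalld on the control plane), the equivalent manual commands would be:

```shell
# Open the Cobbler ports listed above: TFTP (69), Cobbler web/API (8000, 8008),
# and PXE (4011).
firewall-cmd --permanent --add-port=69/tcp --add-port=69/udp
firewall-cmd --permanent --add-port=8000/tcp --add-port=8008/tcp
firewall-cmd --permanent --add-port=4011/udp
firewall-cmd --reload
```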
+
+To access the Cobbler dashboard, enter `https://<IP>/cobbler_web` where `<IP>` is the Global IP address of the control plane. For example, enter
+`https://100.98.24.225/cobbler_web` to access the Cobbler dashboard.
+
+>>__Note__: After the Cobbler Server provisions the operating system on the servers, IP addresses and hostnames are assigned by the DHCP service.  
+>>* If a mapping file is not provided, the server hostname is assigned in the following format: **computexxx-xxx**, where "xxx-xxx" is the last two octets of the host IP address. For example, if the host IP address is 172.17.0.11, then the hostname assigned by Omnia is compute0-11.  
+>>* If a mapping file is provided, the hostnames follow the format provided in the mapping file.  
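The default naming rule in the note above can be illustrated with a small shell sketch (the `compute` prefix and the two-octet rule come from the note; the snippet itself is hypothetical):

```shell
# Derive the default Omnia/Cobbler hostname from a host IP address:
# the "compute" prefix plus the last two octets joined by a hyphen.
host_ip="172.17.0.11"
oct3=$(echo "$host_ip" | cut -d. -f3)
oct4=$(echo "$host_ip" | cut -d. -f4)
hostname="compute${oct3}-${oct4}"
echo "$hostname"   # prints compute0-11
```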
+
+>>__Note__: If you want to add more nodes, append the new nodes in the existing mapping file. However, do not modify the previous nodes in the mapping file as it may impact the existing cluster.
+
+>> __Note__: With the addition of multiple profiles, the Cobbler container dynamically updates the mount point based on the value of `provision_os` in `base_vars.yml`.
+
+### DHCP routing using Cobbler
+Omnia now supports DHCP routing via Cobbler. To enable routing, update `primary_dns` and `secondary_dns` in `base_vars` with the appropriate IPs (hostnames are currently not supported). For compute nodes that are not directly connected to the internet (i.e., only the host network is configured), this configuration provides internet connectivity.
+
+
+## Security enhancements  
+Omnia provides the following options to enhance security on the provisioned PowerEdge servers:
+* **System lockdown mode**: To enable the system lockdown mode on iDRAC, set the *system_lockdown* variable to "enabled" in the `idrac_vars.yml` file.
+* **Secure boot mode**: To enable the secure boot mode on iDRAC, set the *uefi_secure_boot* variable to "enabled" in the `idrac_vars.yml` file.
+* **2-factor authentication (2FA)**: To enable the 2FA on iDRAC, set the *two_factor_authentication* variable to "enabled" in the `idrac_vars.yml` file.  
+	
+	**WARNING**: If 2FA is enabled on iDRAC, you must manually disable 2FA on iDRAC by setting the *Easy 2FA State* to "Disabled" for the user specified in the `login_vars.yml` file to run other iDRAC playbooks. 
+	
+* Before executing the **idrac_2fa.yml**, you must edit the `idrac_tools_vars.yml` by running the following command: `ansible-vault edit idrac_tools_vars.yml --vault-password-file .idrac_vault_key`.   
+* Provide the relevant details in the **idrac_2fa.yml** file (the required information is listed in the Input Parameter Guide). 
+>> **Note**: 2FA will be enabled on the iDRAC only if SMTP server details are valid and a test email notification is working using SMTP.  
+* **LDAP Directory Services**: To enable or disable the LDAP directory services, set the *ldap_directory_services* variable to "enabled" in the `idrac_vars.yml` file.  
+* Before executing the **idrac_ldap.yml** file, you must edit `idrac_tools_vars.yml` by running the following command: `ansible-vault edit idrac_tools_vars.yml --vault-password-file .idrac_vault_key`.  
+		* Provide the following values in the **idrac_ldap.yml** file.  
+		* To view the `idrac_tools_vars.yml` file, run the following command: `ansible-vault view idrac_tools_vars.yml --vault-password-file .idrac_vault_key`  
+	
+>>**Note**: It is suggested that you use the ansible-vault view or edit commands and that you do not use the ansible-vault decrypt or encrypt commands. If you have used the ansible-vault decrypt or encrypt commands, provide 644 permission to `idrac_tools_vars.yml`.  
+
+On the AWX Dashboard, select the respective security requirement playbook and launch the iDRAC template by performing the following steps.
+1. On the AWX Dashboard, under __RESOURCES__ -> __Templates__, select the **idrac_template**.
+2. Under the **Details** tab, click **Edit**.
+3. In the **Edit Details** page, click the **Playbook** drop-down menu and select **tools/idrac_system_lockdown.yml**, **tools/idrac_secure_boot.yml**, **tools/idrac_2fa.yml**, or **tools/idrac_ldap.yml**.
+4. Click **Save**.
+5. To launch the iDRAC template with the respective playbook selected, click **Launch**.  
+

+ 2 - 4
docs/EXAMPLE_SYSTEM_DESIGNS.md

@@ -6,8 +6,8 @@ Omnia can configure systems which use Ethernet or Infiniband-based fabric to con
 ![Example system configuration with Infiniband fabric](images/example-system-infiniband.png)
 
 ## Network Setup
-With Omnia 1.2, only the management station requires internet access. In such a situation, the network topology would follow the below diagram:
-![Network Connections when only the Management Station is connected to Internet](images/Omnia_NetworkConfig_NoInet.png)
+With Omnia 1.2, only the control plane requires internet access. In such a situation, the network topology would follow the below diagram:
+![Network Connections when only the Control Plane is connected to Internet](images/Omnia_NetworkConfig_NoInet.png)
 
 If the user would like to have all compute nodes connect to the internet, the following network diagram can be followed.
 ![Network Connections when all servers are connected to the internet](images/Omnia_NetworkConfig_Inet.png)
@@ -16,5 +16,3 @@ If the user would like to have all compute nodes connect to the internet, the fo
 Possible network configurations include:
 * A flat topology where all nodes are connected to a switch which includes an uplink to the internet. This requires multiple externally-facing IP addresses
 * A hierarchical topology where compute nodes are connected to a common switch, but the manager node contains a second network connection which is connected to the internet. All outbound/inbound traffic would be routed through the manager node. This requires setting up firewall rules for IP masquerade, see [here](https://www.server-world.info/en/note?os=CentOS_7&p=firewalld&f=2) for an example.
-### IP and Hostname Assignment
-The recommended setup is to assign IP addresses to individual servers. This can be done manually by logging onto each node, or via DHCP.

+ 0 - 228
docs/INSTALL_OMNIA.md

@@ -1,228 +0,0 @@
-# Install Omnia using CLI
-
-The following sections provide details on installing Omnia using CLI.  
-
-To install the Omnia control plane and manage workloads on your cluster using the Omnia control plane, see [Install the Omnia Control Plane](INSTALL_OMNIA_CONTROL_PLANE.md) and [Monitor Kubernetes and Slurm](MONITOR_CLUSTERS.md) for more information.
-
-## Prerequisites
-* The login, manager, and compute nodes must be running CentOS 7.9 2009 OS/ Rocky 8.x/ LeapOS 15.3.
->> __Note:__ If you are using LeapOS, the following repositories will be enabled when running `omnia.yml`:
->> * OSS ([Repository](http://download.opensuse.org/distribution/leap/15.3/repo/oss/) + [Update](http://download.opensuse.org/update/leap/15.3/oss/))
->> * Non-OSS ([Repository](http://download.opensuse.org/distribution/leap/15.3/repo/non-oss/) + [Update](http://download.opensuse.org/update/leap/15.3/non-oss/))
-* If you have configured the `omnia_config.yml` file to enable the login node, the login node must be part of the cluster. 
-* All nodes must be connected to the network and must have access to the Internet.
-* Set the hostnames of all the nodes in the cluster.
-	* If the login node is enabled, then set the hostnames in the format: __hostname.domainname__. For example, "manager.omnia.test" is a valid hostname. **Do not** use underscores ( _ ) in the host names.
-	* Include the hostnames under /etc/hosts in the format: </br>*ipaddress hostname.domainname*. For example, "192.168.12.1 manager.example.com" is a valid entry.
-* SSH Keys for root are installed on all nodes to allow for password-less SSH.
-* The user should have root privileges to perform installations and configurations.
-* On the management station, ensure that you install Python 3.6 and Ansible.  
-	* Run the following commands to install Python 3.6:  
-		```
-		dnf install epel-release -y
-		dnf install python3 -y
-		```
-	* Run the following commands to install Ansible:
-		 ```
-		 pip3.6 install --upgrade pip
-		 python3.6 -m pip install ansible
-		 ```
-	After the installation is complete, run `ansible --version` to verify if the installation is successful. In the output, ensure that the executable location path is present in the PATH variable by running `echo $PATH`.
-	If the executable location path is not available, update the path by running `export PATH=$PATH:<executable location>\`.  
-	
-	For example,  
-	```
-	ansible -- version
-    ansible 2.10.9
-    config file = None
-    configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
-    ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
-    executable location = /usr/local/bin/ansible
-    python version = 3.6.8 (default, Aug 24 2020, 17:57:11) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]
-    ```
-	The executable location is `/usr/local/bin/ansible`. Update the path by running the following command:
-    ```
-	export PATH=$PATH:/usr/local/bin
-	```  
-	
->> **Note**: To deploy Omnia, Python 3.6 provides bindings to system tools such as RPM, DNF, and SELinux. As versions greater than 3.6 do not provide these bindings to system tools, ensure that you install Python 3.6 with dnf.  
-
->> **Note**: If Ansible version 2.9 or later is installed, ensure it is uninstalled before installing a newer version of Ansible. Run the following commands to uninstall Ansible before upgrading to a newer version.  
->> 1. `pip uninstall ansible`
->> 2. `pip uninstall ansible-base (if ansible 2.9 is installed)`
->> 3. `pip uninstall ansible-core (if ansible 2.10  > version is installed)`
-
->> __Note:__ If you are using LeapOS, zypper may need to be updated before installing Omnia using the command: `zypper update -y`
-
-
-* On the management station, run the following commands to install Git:
-	```
-	dnf install epel-release -y
-	dnf install git -y
-	```
-
->> **Note**: If there are errors while executing the Ansible playbook commands, then re-run the commands.  
-
-## Steps to install Omnia using CLI
-
-1. Clone the Omnia repository:
-``` 
-git clone https://github.com/dellhpc/omnia.git 
-```  
-
-<!---
-From release branch: 
-``` 
-git clone -b release https://github.com/dellhpc/omnia.git 
-```-->  
-
->> __Note:__ After the Omnia repository is cloned, a folder named __omnia__ is created. Ensure that you do not rename this folder.
-
-2. Change the directory to __omnia__: `cd omnia`
-
-3. In the `omnia_config.yml` file, provide the following details:  
-
-| Parameter Name             | Default Value | Additional Information                                                                                                                                                                                                                               |
-|----------------------------|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| mariadb_password           | password      | Password used to access the Slurm database. <br> Required Length: 8   characters <br> The password must not contain -,\, ',"                                                                                                                         |
-| k8s_version                | 1.16.7        | Kuberenetes Version <br> Accepted Values: "1.16.7" or   "1.19.3"                                                                                                                                                                                     |
-| k8s_cni                    | calico        | CNI type used by Kuberenetes. <br> Accepted values: calico, flannel                                                                                                                                                                                  |
-| k8s_pod_network_cidr       | 10.244.0.0/16 | Kubernetes pod network CIDR                                                                                                                                                                                                                          |
-| docker_username            |               | Username to login to Docker. A kubernetes secret will be created and   patched to the service account in default namespace. <br> This value is   optional but suggested to avoid docker pull limit issues                                            |
-| docker_password            |               | Password to login to Docker <br> This value is mandatory if a   docker_username is provided                                                                                                                                                          |
-| ansible_config_file_path   | /etc/ansible  | Path where the ansible.cfg file can be found. <br> If `dnf` is   used, the default value is valid. If `pip` is used, the variable must be set   manually                                                                                             |
-| login_node_required        | TRUE          | Boolean indicating whether the login node is required or not                                                                                                                                                                                         |
-| domain_name                | omnia.test    | Sets the intended domain name                                                                                                                                                                                                                        |
-| realm_name                 | OMNIA.TEST    | Sets the intended realm name                                                                                                                                                                                                                         |
-| directory_manager_password |               | Password authenticating admin level access to the Directory for system   management tasks. It will be added to the instance of directory server   created for IPA. <br> Required Length: 8 characters. <br> The   password must not contain -,\, '," |
-| kerberos_admin_password    |               | "admin" user password for the IPA server on RockyOS. If LeapOS is in use, it is used as the "kerberos admin" user password for 389-ds <br> This field is not relevant to Management Stations running `LeapOS`                                                                                                                                                                                                                            |
-| enable_secure_login_node   |  **false**, true             | Boolean value deciding whether security features are enabled on the Login Node. For more information, see [here](docs/Security/Enable_Security_LoginNode.md).                                                                                                                                                                                                                           |
-	
-	
->> __NOTE:__  Without the login node, Slurm jobs can be scheduled only through the manager node.
-
-4. Create an inventory file in the *omnia* folder. Add login node IP address under the *[login_node]* group, manager node IP address under the *[manager]* group, compute node IP addresses under the *[compute]* group, and NFS node IP address under the *[nfs_node]* group. A template file named INVENTORY is provided in the *omnia\docs* folder.  
->>	**NOTE**: Ensure that all the four groups (login_node, manager, compute, nfs_node) are present in the template, even if the IP addresses are not updated under login_node and nfs_node groups. 
-
-5. To install Omnia:
-
-| Leap OS                     	| CentOS, Rocky                                             	|
-|-----------------------------	|-----------------------------------------------------------	|
-| `ansible-playbook omnia.yml -i inventory -e 'ansible_python_interpreter=/usr/bin/python3'`   	| `ansible-playbook omnia.yml -i inventory`	|
-		
-
-
-6. By default, no skip tags are selected, and both Kubernetes and Slurm will be deployed.  
-
-	To skip the installation of Kubernetes, enter:  
-	`ansible-playbook omnia.yml -i inventory --skip-tags "kubernetes"` 
-	
-	To skip the installation of Slurm, enter:  
-	`ansible-playbook omnia.yml -i inventory --skip-tags "slurm"`  
-
-	To skip the NFS client setup, enter the following command to skip the k8s_nfs_client_setup role of Kubernetes:  
-	`ansible-playbook omnia.yml -i inventory --skip-tags "nfs_client"`
-
-	The default path of the Ansible configuration file is `/etc/ansible/`. If the file is not present in the default path, then edit the `ansible_config_file_path` variable to update the configuration path.
-
-7. To provide passwords for mariaDB Database (for Slurm accounting), Kubernetes Pod Network CIDR, and Kubernetes CNI, edit the `omnia_config.yml` file.  
->> __Note:__ 
-* Supported values for Kubernetes CNI are calico and flannel. The default value of CNI considered by Omnia is calico. 
-* The default value of Kubernetes Pod Network CIDR is 10.244.0.0/16. If 10.244.0.0/16 is already in use within your network, select a different Pod Network CIDR. For more information, see __https://docs.projectcalico.org/getting-started/kubernetes/quickstart__.
-
->> **NOTE**: If you want to view or edit the `omnia_config.yml` file, run the following command:  
-- `ansible-vault view omnia_config.yml --vault-password-file .omnia_vault_key` -- To view the file. 
-- `ansible-vault edit omnia_config.yml --vault-password-file .omnia_vault_key` -- To edit the file.
-
->> **NOTE**: It is suggested that you use the ansible-vault view or edit commands and that you do not use the ansible-vault decrypt or encrypt commands. If you have used the ansible-vault decrypt or encrypt commands, provide 644 permission to `omnia_config.yml`.  
-
-Omnia considers `slurm` as the default username for MariaDB.  
-
-## Kubernetes roles
-
-The following __kubernetes__ roles are provided by Omnia when __omnia.yml__ file is run:
-- __common__ role:
-	- Installs common packages on the manager and compute nodes
-	- Installs Docker
-	- Deploys NTP/chrony for time synchronization
-	- Installs NVIDIA drivers and software components
-- **k8s_common** role: 
-	- Required Kubernetes packages are installed
-	- Starts the docker and Kubernetes services.
-- **k8s_manager** role: 
-	- __helm__ package for Kubernetes is installed.
-- **k8s_firewalld** role: This role is used to enable the required ports to be used by Kubernetes. 
-	- For __head-node-ports__: 6443,2379-2380,10251,10250,10252
-	- For __compute-node-ports__: 10250,30000-32767
-	- For __calico-udp-ports__: 4789
-	- For __calico-tcp-ports__: 5473,179
-	- For __flannel-udp-ports__: 8285,8472
-- **k8s_nfs_server_setup** role: 
-	- A __nfs-share__ directory, `/home/k8snfs`, is created. Using this directory, compute nodes share the common files.
-- **k8s_nfs_client_setup** role
-- **k8s_start_manager** role: 
-	- Runs the __/bin/kubeadm init__ command to initialize the Kubernetes services on the manager node
-	- Creates a service account for the Kubernetes Dashboard
-- **k8s_start_workers** role: 
-	- The compute nodes are initialized and joined to the Kubernetes cluster with the manager node. 
-- **k8s_start_services** role
-	- Kubernetes services are deployed such as Kubernetes Dashboard, Prometheus, MetalLB and NFS client provisioner
-
-
-* Whenever k8s_version, k8s_cni or k8s_pod_network_cidr needs to be modified after the HPC cluster is set up, the OS on the manager and compute nodes in the cluster must be re-flashed before executing omnia.yml again.
-* After Kubernetes is installed and configured, few Kubernetes and calico/flannel related ports are opened in the manager and compute nodes. This is required for Kubernetes Pod-to-Pod and Pod-to-Service communications. Calico/flannel provides a full networking stack for Kubernetes pods.
-* If Kubernetes Pods are unable to communicate with the servers (i.e., unable to access the Internet) when the DNS servers are not responding, the Kubernetes Pod Network CIDR may be overlapping with the host network, causing a DNS issue. To resolve this issue:
-	1. Disable firewalld.service.
-	2. If the issue persists, then perform the following actions:  
-		a. Format the OS on manager and compute nodes.  
-		b. In the management station, edit the *omnia_config.yml* file to change the Kubernetes Pod Network CIDR or CNI value. Suggested IP range is 192.168.0.0/16 and ensure you provide an IP which is not in use in your host network.  
-		c. Execute `omnia.yml` and skip slurm using `--skip-tags slurm`.
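The overlap condition described in the note above can be checked before editing *omnia_config.yml*; a minimal sketch using Python's standard `ipaddress` module (the host network shown is hypothetical):

```python
import ipaddress

def cidr_overlaps(pod_cidr: str, host_cidr: str) -> bool:
    """Return True when the Kubernetes Pod Network CIDR overlaps the host network."""
    return ipaddress.ip_network(pod_cidr).overlaps(ipaddress.ip_network(host_cidr))

# The default pod CIDR clashes with a host network in the same 10.244.0.0/16 range:
print(cidr_overlaps("10.244.0.0/16", "10.244.12.0/24"))   # True  -> choose another CIDR
# The suggested 192.168.0.0/16 range is safe for this example host network:
print(cidr_overlaps("192.168.0.0/16", "10.244.12.0/24"))  # False
```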
-
-## Slurm roles
-
-The following __Slurm__ roles are provided by Omnia when the __omnia.yml__ file is run:
-- **slurm_common** role:
-	- Installs the common packages on manager node and compute node.
-- **slurm_manager** role:
-	- Installs the packages only related to manager node
-	- This role also enables the required ports to be used by Slurm.  
-	    **tcp_ports**: 6817,6818,6819  
-		**udp_ports**: 6817,6818,6819
-	- Creates and updates the Slurm configuration files based on the manager node requirements.
-- **slurm_workers** role:
-	- Installs the Slurm packages into all compute nodes as per the compute node requirements.
-- **slurm_start_services** role: 
-	- Starts the Slurm services so that the compute nodes can communicate with the manager node.
-- **slurm_exporter** role: 
-	- Slurm exporter is a package for exporting metrics collected from Slurm resource scheduling system to Prometheus.
-	- Slurm exporter is installed on the host like Slurm, and Slurm exporter will be successfully installed only if Slurm is installed.  
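The port numbers listed above match the standard Slurm defaults; as an illustrative sketch (not the configuration Omnia generates), they map onto the Slurm configuration files like this:

```ini
# slurm.conf (excerpt, illustrative only)
SlurmctldPort=6817   ; slurmctld on the manager node
SlurmdPort=6818      ; slurmd on each compute node

# slurmdbd.conf (excerpt, illustrative only)
DbdPort=6819         ; slurmdbd, backing Slurm accounting (MariaDB)
```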
-
-## Login node roles
-To enable the login node, the *login_node_required* variable must be set to "true" in the *omnia_config.yml* file.  
-- **login_common** role: The firewall ports are opened on the manager and login nodes.  
-- **login_server** role: FreeIPA server is installed and configured on the manager node to provide authentication using LDAP and Kerberos principles.  
-- **login_node** role: For Rocky, FreeIPA client is installed and configured on the login node and is integrated with the server running on the manager node. For LeapOS, 389ds will be installed instead.
-
->>__Note:__ If LeapOS is being deployed, login_common and login_server roles will be skipped.  
-
->> **NOTE**: To skip the installation of:
->> * The login node-In the `omnia_config.yml` file, set the *login_node_required* variable to "false".  
->> * The FreeIPA server and client: Use `--skip-tags freeipa` while executing the *omnia.yml* file. 
-
-### Installing JupyterHub and Kubeflow playbooks  
-To install JupyterHub and Kubeflow, run the JupyterHub playbook first and then the Kubeflow playbook.
-
-Commands to install JupyterHub and Kubeflow:
-* `ansible-playbook platforms/jupyterhub.yml -i inventory`
-* `ansible-playbook platforms/kubeflow.yml -i inventory`
-
->> __Note:__ When the Internet connectivity is unstable or slow, it may take more time to pull the images to create the Kubeflow containers. If the time limit is exceeded, the **Apply Kubeflow configurations** task may fail. To resolve this issue, you must redeploy Kubernetes cluster and reinstall Kubeflow by completing the following steps:
-* Format the OS on manager and compute nodes.
-* In the `omnia_config.yml` file, change the k8s_cni variable value from calico to flannel.
-* Run the Kubernetes and Kubeflow playbooks. 
-
-
-## Add a new compute node to the cluster
-
-Update the INVENTORY file present in the `omnia` directory with the new node IP address under the compute group. Ensure that the other nodes which are already a part of the cluster are also present in the compute group along with the new node. Then, run `omnia.yml` to add the new node to the cluster and update the configurations of the manager node.
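For example, if the cluster already has two compute nodes and a third is being added, the compute group in the INVENTORY file might look like this (all IP addresses are placeholders):

```ini
[compute]
10.5.0.102
10.5.0.103
# newly added compute node:
10.5.0.104
```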
-

+ 34 - 0
docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/TOR_Interface_Keys.md

@@ -0,0 +1,34 @@
+# Accepted Interface Keys on Top Of the Rack Switches
+
+Interface key name	|	Type	|	Description
+---------	|   ----	|	-----------
+desc	|	string	|	Configures a single line interface description
+portmode	|	string	|	Configures port mode according to the device type
+switchport	|	boolean: true, false*	|	Configures an interface in L2 mode
+admin	|	string: up, down*	|	Configures the administrative state for the interface; configuring the value as administratively "up" enables the interface; configuring the value as administratively "down" disables the interface
+mtu	|	integer	|	Configures the MTU size for L2 and L3 interfaces (1280 to 65535)
+speed	|	string: auto, 1000, 10000, 25000, ...	|	Configures the speed of the interface
+fanout	|	string: dual, single; string:10g-4x, 40g-1x, 25g-4x, 100g-1x, 50g-2x (os10)	|	Configures fanout to the appropriate value
+suppress_ra	|	string: present, absent	|	Configures IPv6 router advertisements if set to present
+ip_type_dynamic	|	boolean: true, false	|	Configures IP address DHCP if set to true (ip_and_mask is ignored if set to true)
+ipv6_type_dynamic	|	boolean: true, false	|	Configures an IPv6 address for DHCP if set to true (ipv6_and_mask is ignored if set to true)
+ipv6_autoconfig	|	boolean: true, false	|	Configures stateless configuration of IPv6 addresses if set to true (ipv6_and_mask is ignored if set to true)
+vrf	|	string	|	Configures the specified VRF to be associated to the interface
+min_ra	|	string	|	Configures RA minimum interval time period
+max_ra	|	string	|	Configures RA maximum interval time period
+ip_and_mask	|	string	|	Configures the specified IP address to the interface
+ipv6_and_mask	|	string	|	Configures a specified IPv6 address to the interface
+virtual_gateway_ip	|	string	|	Configures an anycast gateway IP address for a VXLAN virtual network as well as VLAN interfaces
+virtual_gateway_ipv6	|	string	|	Configures an anycast gateway IPv6 address for VLAN interfaces
+state_ipv6	|	string: absent, present*	|	Deletes the IPV6 address if set to absent
+ip_helper	|	list	|	Configures DHCP server address objects (see ip_helper.*)
+ip_helper.ip	|	string (required)	|	Configures the IPv4 address of the DHCP server (A.B.C.D format)
+ip_helper.state	|	string: absent, present*	|	Deletes the IP helper address if set to absent
+flowcontrol	|	dictionary	|	Configures the flowcontrol attribute (see flowcontrol.*)
+flowcontrol.mode	|	string: receive, transmit	|	Configures the flowcontrol mode
+flowcontrol.enable	|	string: on, off	|	Configures the flowcontrol mode on
+flowcontrol.state	|	string: absent, present	|	Deletes the flowcontrol if set to absent
+ipv6_bgp_unnum	|	dictionary	|	Configures the IPv6 BGP unnum attributes (see ipv6_bgp_unnum.*) below
+ipv6_bgp_unnum.state	|	string: absent, present*	|	Disables auto discovery of BGP unnumbered peer if set to absent
+ipv6_bgp_unnum.peergroup_type	|	string: ebgp, ibgp	|	Specifies the type of template to inherit from
+stp_rpvst_default_behaviour	|	boolean: false, true	|	Configures RPVST default behavior of BPDU's when set to True, which is default
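As a hypothetical illustration, several of these keys combine in the `os10_interface` role variables as follows (the interface name, description, and addresses are examples only, not values from this repository):

```yaml
os10_interface:
  ethernet 1/1/1:
    desc: "uplink to compute rack"   # single-line interface description
    switchport: false                # L3 mode, so an IP can be assigned
    admin: up                        # administratively enable the interface
    mtu: 9216                        # within the allowed 1280-65535 range
    speed: auto
    ip_and_mask: 10.10.1.1/24        # static IPv4 address for the interface
    suppress_ra: present             # configure IPv6 router advertisements
```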

File diff is too large to display
+ 8 - 0
docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/ethernet_tor_vars.md


File diff is too large to display
+ 10 - 0
docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/ethernet_vars.md


File diff is too large to display
+ 10 - 0
docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/ib_vars.md


+ 12 - 0
docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/idrac_2fa.md

@@ -0,0 +1,12 @@
+# Parameters in `idrac_2fa.yml`
+This file is located in [/control_plane/tools](../../../control_plane/tools/idrac2fa_vars.yml)
+
+|	Variables</br> [Required if two_factor_authentication is enabled/ Optional]	|	Default, choices	|	Description
+----------------	|	-----------------	|	-----------------
+dns_domain_name</br> [Required]	|		|	DNS domain name to be set for iDRAC. 
+ipv4_static_dns1, ipv4_static_dns2</br> [Required] 	|		|	DNS1 and DNS2 static IPv4 addresses.
+smtp_server_ip</br> [Required]	|		|	Server IP address used for SMTP.
+use_email_address_2fa</br> [Required]	|		|	Email address used for enabling 2FA. After 2FA is enabled, an authentication code is sent to the provided email address. 
+smtp_authentication [Required]	| <ul> <li>__Disabled__</li> <li>Enabled </li> </ul> | Enable SMTP authentication 
+smtp_username</br> [Optional]	|		|	Username for SMTP.
+smtp_password</br> [Optional]	|		|	Password for SMTP.

+ 18 - 0
docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/idrac_ldap.md

@@ -0,0 +1,18 @@
+# Parameters in `idrac_ldap.yml`
+This file is located in [/control_plane/tools](../../../control_plane/tools/idrac_ldap.yml)
+
+|	Variables</br> [Required if ldap_directory_services is enabled/ Optional]	|	Default, choices	|	Description
+----------------	|	-----------------	|	-----------------
+cert_validation_enable</br> [Required]	|	<ul><li>**disabled**</li></ul>	|	This option will be disabled by default. If required, you must manually upload the CA certificate.
+ldap_server_address</br> [Required] 	|		|	Server address used for LDAP.
+ldap_port</br> [Required]	|	<ul><li>636</li></ul>	|	TCP port at which the LDAP server is listening for connections.
+bind_dn</br> [Optional]	|		|	Distinguished Name of the node in your directory tree from which records are searched.
+bind_password</br> [Optional]	|		|	Password used for "bind_dn".
+base_dn</br> [Required]	|		|	Distinguished Name of the search base.
+user_attribute</br> [Optional]	|		|	User attribute used for searching in LDAP server.
+group_attribute</br> [Optional]	|		|	Group attribute used for searching in LDAP server.
+group_attribute_is_dn</br> [Required]	|	<ul><li>**enabled**</li> <li>disabled</li></ul>	|	Specify whether the group attribute type is DN or not.
+search_filter</br> [Optional]	|		|	Search scope is related to the Base DN. 
+role_group1_dn</br> [Required]	|		|	DN of LDAP group to be added.
+role_group1_privilege</br> [Required]	|	<ul><li>**Administrator**</li><li>Operator</li><li>ReadOnly</li></ul>	|	Privilege to LDAP role group 1.  
+	

+ 25 - 0
docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/idrac_tools_vars.md

@@ -0,0 +1,25 @@
+# Parameters in `idrac_tools_vars.yml`
+This file is located in [/control_plane/input_params](../../../control_plane/input_params/idrac_tools_vars.yml)
+
+| Parameter                        	| Default/Accepted Values               | Additional Information                                                                                                                           	|
+|----------------------------------	|---------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------	|
+| dns_domain_name   [Required]     	|                                       | DNS domain name to be set for iDRAC                                                                                                             	|
+| ipv4_static_dns1 [Required]      	|                                       | IPV4 static DNS1                                                                                                                                 	|
+| ipv4_static_dns2 [Required]      	|                                       | IPV4 static DNS2                                                                                                                                 	|
+| smtp_server_ip [Required]        	|                                       | Server IP used for SMTP                                                                                                                          	|
+| use_email_address_2fa [Required] 	|                                       | Email address used for enabling 2FA                                                                                                              	|
+| smtp_authentication [Required]   	| **disabled**, enabled                 |  SMTP authentication disabled by default <br> When enabled, ensure that `smtp_username` and `smtp_password` are filled.                       	|
+| smtp_username                    	|                                       | Username used for SMTP                                                                                                                           	|
+| smtp_password                    	|                                       | Password   used for SMTP                                                                                                                         	|
+| cert_validation_enable           	| **disabled**, enabled                 | CA certification validation value <br> If required user has to   manually upload CA certificate after `idrac_ldap.yml` execution.                	|
+| ldap_server_address [Required]   	|                                       | Server address used for LDAP                                                                                                                      	|
+| ldap_port                        	| 636                                   | TCP port at which the LDAP server is listening for connections   <br> Default Port for LDAP: 389 <br> Default Port for LDAP over SSL: 636 	|
+| bind_dn                          	|                                       | Distinguished Name of the node in your directory tree from which to start   searching for records                                                	|
+| bind_password                    	|                                       | Password used for bind_dn                                                                                                                        	|
+| base_dn [Required]               	|                                       | The distinguished name of the search base.                                                                                                       	|
+| user_attribute                   	|                                       | User attribute used for search in LDAP server                                                                                                    	|
+| group_attribute                  	|                                       | Group   attribute used for search in LDAP server                                                                                                 	|
+| group_attribute_is_dn            	| **disabled**, enabled                 |  Specify whether the group   attribute type is DN or not                                                                                         	|
+| search_filter                    	|                                       | The search scope defines how LDAP will search for your objects.                                                                                  	|
+| role_group1_dn [Required]        	|                                       | DN of LDAP group to be added                                                                                                                     	|
+| role_group1_privilege [Required] 	| **Administrator**, Operator, ReadOnly |  Privilege to LDAP role group 1                                                                                                                  	|

+ 12 - 0
docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/idrac_vars.md

@@ -0,0 +1,12 @@
+# Parameters in `idrac_vars.yml`
+This file is located in [/control_plane/input_params](../../../control_plane/input_params/idrac_vars.yml)
+
+|	Variables</br> [Required/ Optional]	|	Default, choices	|	Description
+----------------	|	-----------------	|	-----------------
+idrac_system_profile</br> [Required]	|	<ul><li>**Performance**</li> <li>PerformancePerWatt(DAPC)</li> <li>PerformancePerWatt(OS)</li> <li>WorkstationPerformance</li></ul>	|	The system profile used for BIOS configuration. 
+firmware_update_required</br> [Required]	|	<ul><li>**false**</li> <li>true</li></ul>	|	Indicates whether Omnia updates the firmware on the servers. To enable the firmware update, set the variable to "true".
+poweredge_model</br> [Required if "firmware_update_required" is set to "true"]	|	<ul><li>**C6420**</li> <li>R640</li><li>R740</li><li>C4140</li> <li>And other supported PowerEdge servers</li></ul>	|	Enter the required PowerEdge server models to update the firmware. For example, enter `R640,R740,C4140` to update firmware on these models of PowerEdge servers. For a complete list of supported PowerEdge servers, see the *Hardware managed by Omnia* section in the Readme file.
+uefi_secure_boot</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable the secure boot mode.
+system_lockdown</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable system lockdown.
+two_factor_authentication</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable the 2FA on iDRAC.</br> If enabled, update the required variables in the `idrac_tools_vars.yml` file.</br> **[WARNING]**: For the other iDRAC playbooks to run, you must manually disable 2FA by setting the *Easy 2FA State* to "Disabled" in the iDRAC settings.
+ldap_directory_services</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable the LDAP directory services on iDRAC.</br> If enabled, update the required variables in the `idrac_tools_vars.yml` file.
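Put together, a filled-in `idrac_vars.yml` could look like the following sketch (the server model list is an example only):

```yaml
idrac_system_profile: "Performance"
firmware_update_required: true
poweredge_model: "R640,R740,C6420"   # comma-separated supported PowerEdge models
uefi_secure_boot: "disabled"
system_lockdown: "disabled"
two_factor_authentication: "disabled"
ldap_directory_services: "disabled"
```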

File diff is too large to display
+ 19 - 0
docs/Input_Parameter_Guide/Control_Plane_Parameters/Device_Parameters/powervault_me4_vars.md


File diff is too large to display
+ 10 - 355
docs/INSTALL_OMNIA_CONTROL_PLANE.md


+ 21 - 0
docs/Input_Parameter_Guide/Control_Plane_Parameters/login_vars.md

@@ -0,0 +1,21 @@
+# Parameters in `login_vars.yml`
+`login_vars.yml` contains the credentials of the supported devices. This file is encrypted using Ansible Vault.
+This file is located in [/control_plane/input_params](../../../control_plane/input_params/login_vars.yml)
+
+| Parameter                       | Default, Accepted values | Additional Information                                                                                                                                                                                                                        |
+|---------------------------------|--------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| provision_password   [Required] |                          | Password used when deploying the OS on bare metal servers. <br>   Minimum Length: 8 characters <br> Forbidden Characters:  -,\, ',"                                                                                                           |
+| cobbler_password   [Required]   |                          | Password used to authenticate cobbler <br> Minimum Length: 8   characters <br> Forbidden Characters:    -,\, ',"                                                                                                                              |
+| idrac_username   [Optional]     |                          | Username used to authenticate iDRAC    <br> Minimum Length: 8 characters <br> Forbidden   Characters:  -,\, ',"                                                                                                                               |
+| idrac_password   [Optional]     |                          | Password used to authenticate iDRAC <br> Forbidden Characters:  -,\, '," <br> This parameter is   required if `idrac_support` is true.                                                                                                        |
+| awx_password                    |                          | Password used to authenticate AWX    <br> Minimum Length: 8 characters <br> Forbidden   Characters:  -,\, ',"                                                                                                                                 |
+| grafana_username                |                          | Username used to authenticate grafana    <br> Minimum Length: 5 characters <br> Forbidden   Characters:  -,\, ',"                                                                                                                             |
+| grafana_password                |                          | Password used to authenticate grafana    <br> Minimum Length: 5 characters <br> Forbidden   Characters:  -,\, ',"  <br> Do not set this parameter to   'admin'                                                                                |
+| ethernet_switch_username        |                          | Username used to login to the Ethernet Switch  <br> Forbidden Characters:  -,\, ',"                                                                                                                                                           |
+| ethernet_switch_password        |                          | Password used to login to the Ethernet Switch  <br> Forbidden Characters:  -,\, ',"                                                                                                                                                           |
+| ib_username                     |                          | Username used to login to the Infiniband Switch  <br> Forbidden Characters:  -,\, ',"                                                                                                                                                         |
+| ib_password                     |                          | Password used to login to the Infiniband Switch  <br> Forbidden Characters:  -,\, ',"                                                                                                                                                         |
+| powervault_me4_username         |                          | Username used to login to the PowerVault    <br> Forbidden Characters:    -,\, ',"                                                                                                                                                            |
+| powervault_me4_password         |                          | Password used to login to the PowerVault    <br> Forbidden Characters:    -,\, ',"                                                                                                                                                            |
+| ms_directory_manager_password   |                          | Password to authenticate Admin level access to the directory for system   management tasks and will be added to the instance of directory server   created for IPA. <br> Minimum Length: 8 characters <br> Forbidden   Characters:  -,\, ',"  |
+| ms_kerberos_admin_password      |                          | Password authenticating the 'admin' account on the IPA server. If 389ds   is in use, this field authenticates the Kerberos Admin.                                                                                                             |
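A partially filled `login_vars.yml` could look like the sketch below (all values are placeholders that respect the length and forbidden-character rules above; the file is encrypted with Ansible Vault after editing):

```yaml
provision_password: "Pr0vision!pass"
cobbler_password: "C0bbler!pass"
idrac_username: "idrac_admin1"
idrac_password: "iDRAC!pass123"
grafana_username: "grafana"
grafana_password: "Graf!pass"
```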

File diff is too large to display
+ 159 - 0
docs/Input_Parameter_Guide/Control_Plane_Parameters/opensm_conf.md


File diff is too large to display
+ 5 - 115
docs/Security/ENABLE_SECURITY_MANAGEMENT_STATION.md


+ 11 - 0
docs/Input_Parameter_Guide/Telemetry_Visualization_Parameters/telemetry_base_vars.md

@@ -0,0 +1,11 @@
+# Parameters in `telemetry_base_vars.yml`
+
+Before running `telemetry.yml`, ensure that the files in `/telemetry/input_params/` are filled in.
+
+
+| Parameter Name          | Default Value     | Information |
+|-------------------------|-------------------|-------------|
+| idrac_telemetry_support | true              | This variable is used to enable iDRAC telemetry support and visualizations. Accepted Values: true/false            |
+| slurm_telemetry_support | true              | This variable is used to enable slurm telemetry support and visualizations. Slurm telemetry support can only be activated when idrac_telemetry_support is set to true. Accepted Values: true/false.        |
+| timescaledb_name        | telemetry_metrics | Postgres DB with timescale extension is used for storing iDRAC and slurm telemetry metrics.            |
+| mysqldb_name			  | idrac_telemetrysource_services_db | MySQL DB is used to store IPs and credentials of iDRACs having datacenter license           |
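With the defaults above, `telemetry_base_vars.yml` reads roughly as:

```yaml
idrac_telemetry_support: true
slurm_telemetry_support: true          # requires idrac_telemetry_support: true
timescaledb_name: "telemetry_metrics"
mysqldb_name: "idrac_telemetrysource_services_db"
```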

+ 11 - 0
docs/Input_Parameter_Guide/Telemetry_Visualization_Parameters/telemetry_login_vars.md

@@ -0,0 +1,11 @@
+# Parameters in `telemetry_login_vars.yml`
+Before running `telemetry.yml`, ensure that the files in `/telemetry/input_params/` are filled in.
+
+
+| Parameter Name        | Default Value | Information |
+|-----------------------|---------------|-------------|
+| timescaledb_user      | 		        |  Username used for connecting to timescale db. Minimum Length: 2 characters. <br> The username must not contain -,\, ',"         |
+| timescaledb_password  | 		        |  Password used for connecting to timescale db. Minimum Length: 2 characters. <br> The password must not contain -,\, ',",@          |
+| mysqldb_user          | 		        |  Username used for connecting to mysql db. Minimum Length: 2 characters. <br>  The username must not contain -,\, ',"       |
+| mysqldb_password      | 		        |  Password used for connecting to mysql db. Minimum Length: 2 characters. <br> The password must not contain -,\, ',"          |
+| mysqldb_root_password | 		        |  Password used for connecting to mysql db for root user. Minimum Length: 2 characters. <br> The password must not contain -,\, ',"        |

+ 19 - 0
docs/Input_Parameter_Guide/omnia_config.md

@@ -0,0 +1,19 @@
+# Parameters in `omnia_config.yml`
+`omnia_config.yml` contains multiple configuration parameters.
+
+| Parameter Name             | Default Value | Additional Information                                                                                                                                                                                                                               |
+|----------------------------|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| mariadb_password           | password      | Password used to access the Slurm database. <br> Required Length: 8   characters <br> The password must not contain -,\, ',"                                                                                                                         |
+| k8s_version                | 1.16.7        | Kubernetes version <br> Accepted Values: "1.16.7" or   "1.19.3"                                                                                                                                                                                     |
+| k8s_cni                    | calico        | CNI type used by Kubernetes. <br> Accepted values: calico, flannel                                                                                                                                                                                  |
+| k8s_pod_network_cidr       | 10.244.0.0/16 | Kubernetes pod network CIDR                                                                                                                                                                                                                          |
+| docker_username            |               | Username to login to Docker. A kubernetes secret will be created and   patched to the service account in default namespace. <br> This value is   optional but suggested to avoid docker pull limit issues                                            |
+| docker_password            |               | Password to login to Docker <br> This value is mandatory if a   docker_username is provided                                                                                                                                                          |
+| ansible_config_file_path   | /etc/ansible  | Path where the ansible.cfg file can be found. <br> If `dnf` is   used, the default value is valid. If `pip` is used, the variable must be set   manually                                                                                             |
+| login_node_required        | TRUE          | Boolean indicating whether the login node is required or not                                                                                                                                                                                         |
+| domain_name                | omnia.test    | Sets the intended domain name                                                                                                                                                                                                                        |
+| realm_name                 | OMNIA.TEST    | Sets the intended realm name                                                                                                                                                                                                                         |
+| directory_manager_password |               | Password authenticating admin level access to the Directory for system   management tasks. It will be added to the instance of directory server   created for IPA. <br> Required Length: 8 characters. <br> The   password must not contain -,\, '," |
+| kerberos_admin_password    |               | "admin" user password for the IPA server on RockyOS. If LeapOS is in use, it is used as the "kerberos admin" user password for 389-ds <br> This field is not relevant to Control Planes running `LeapOS`                                                                                                                                                                                                                            |
+| enable_secure_login_node   |  **false**, true             | Boolean value deciding whether security features are enabled on the Login Node. For more information, see [here](docs/Security/Enable_Security_LoginNode.md).                                                                                                                                                                                                                           |
+	

+ 151 - 0
docs/Installation_Guides/INSTALL_OMNIA_CLI.md

@@ -0,0 +1,151 @@
+# Install Omnia
+
+The following sections provide details on running `omnia.yml` to install Omnia using the CLI.  
+
+To install the Omnia control plane and manage workloads on your cluster using the Omnia control plane, see [Install the Omnia Control Plane](INSTALL_OMNIA_CONTROL_PLANE.md) and [Monitor Kubernetes and Slurm](MONITOR_CLUSTERS.md) for more information.
+
+## Steps to install Omnia using CLI
+
+1. Clone the Omnia repository:
+``` 
+git clone https://github.com/dellhpc/omnia.git 
+```  
+
+<!---
+From release branch: 
+``` 
+git clone -b release https://github.com/dellhpc/omnia.git 
+```-->  
+
+>> __Note:__ After the Omnia repository is cloned, a folder named __omnia__ is created. Ensure that you do not rename this folder.
+
+2. Change the directory to __omnia__: `cd omnia`
+
+3. In the `omnia_config.yml` file, provide the required details (Check the [parameter guide](../Input_Parameter_Guide/omnia_config.md) for more information).
+>> __Note:__  Without the login node, Slurm jobs can be scheduled only through the manager node.
+
+4. Create an inventory file in the *omnia* folder. Add the login node IP address under the *[login_node]* group, the manager node IP address under the *[manager]* group, compute node IP addresses under the *[compute]* group, and the NFS node IP address under the *[nfs_node]* group. A template file named INVENTORY is provided in the *omnia/docs* folder.  
+>>	**Note**: Ensure that all four groups (login_node, manager, compute, nfs_node) are present in the template, even if the IP addresses are not updated under the login_node and nfs_node groups. 
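For illustration, an inventory matching the template might look like the following. The IP addresses shown are hypothetical; the login_node and nfs_node group headers are retained even though they are left empty here:

```
[login_node]

[manager]
100.96.22.199

[compute]
100.96.20.66
100.96.20.67

[nfs_node]
```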
+
+5. To install Omnia:
+
+| Leap OS                     	| CentOS, Rocky                                             	|
+|-----------------------------	|-----------------------------------------------------------	|
+| `ansible-playbook omnia.yml -i inventory -e 'ansible_python_interpreter=/usr/bin/python3'`   	| `ansible-playbook omnia.yml -i inventory`	|
+		
+
+
+6. By default, no skip tags are selected, and both Kubernetes and Slurm will be deployed.  
+
+	To skip the installation of Kubernetes, enter:  
+	`ansible-playbook omnia.yml -i inventory --skip-tags "kubernetes"` 
+	
+	To skip the installation of Slurm, enter:  
+	`ansible-playbook omnia.yml -i inventory --skip-tags "slurm"`  
+
+	To skip the NFS client setup, enter the following command to skip the k8s_nfs_client_setup role of Kubernetes:  
+	`ansible-playbook omnia.yml -i inventory --skip-tags "nfs_client"`
+
+	The default path of the Ansible configuration file is `/etc/ansible/`. If the file is not present in the default path, then edit the `ansible_config_file_path` variable to update the configuration path.
+
+7. To provide passwords for the MariaDB database (used for Slurm accounting), the Kubernetes Pod Network CIDR, and the Kubernetes CNI, edit the `omnia_config.yml` file.  
+>> __Note:__ 
+* Supported values for the Kubernetes CNI are calico and flannel. The default CNI used by Omnia is calico. 
+* The default value of Kubernetes Pod Network CIDR is 10.244.0.0/16. If 10.244.0.0/16 is already in use within your network, select a different Pod Network CIDR. For more information, see __https://docs.projectcalico.org/getting-started/kubernetes/quickstart__.
+
+>> **Note**: If you want to view or edit the `omnia_config.yml` file, run the following command:  
+- `ansible-vault view omnia_config.yml --vault-password-file .omnia_vault_key` -- To view the file. 
+- `ansible-vault edit omnia_config.yml --vault-password-file .omnia_vault_key` -- To edit the file.
+
+>> **Note**: Use the `ansible-vault view` or `ansible-vault edit` commands rather than `ansible-vault decrypt` or `ansible-vault encrypt`. If you have used the decrypt or encrypt commands, provide 644 permissions to `omnia_config.yml`.  
+
+Omnia considers `slurm` as the default username for MariaDB.  
+
+## Kubernetes roles
+
+The following __kubernetes__ roles are provided by Omnia when __omnia.yml__ file is run:
+- __common__ role:
+	- Installs common packages on the manager and compute nodes
+	- Installs Docker
+	- Deploys NTP/chrony for time synchronization
+	- Installs NVIDIA drivers and software components
+- **k8s_common** role: 
+	- Required Kubernetes packages are installed
+	- Starts the docker and Kubernetes services.
+- **k8s_manager** role: 
+	- __helm__ package for Kubernetes is installed.
+- **k8s_firewalld** role: This role is used to enable the required ports to be used by Kubernetes. 
+	- For __head-node-ports__: 6443,2379-2380,10251,10250,10252
+	- For __compute-node-ports__: 10250,30000-32767
+	- For __calico-udp-ports__: 4789
+	- For __calico-tcp-ports__: 5473,179
+	- For __flannel-udp-ports__: 8285,8472
+- **k8s_nfs_server_setup** role: 
+	- A __nfs-share__ directory, `/home/k8snfs`, is created. Using this directory, compute nodes share the common files.
+- **k8s_nfs_client_setup** role
+- **k8s_start_manager** role: 
+	- Runs the __/bin/kubeadm init__ command to initialize the Kubernetes services on the manager node and creates a service account for the Kubernetes Dashboard
+- **k8s_start_workers** role: 
+	- The compute nodes are initialized and joined to the Kubernetes cluster with the manager node. 
+- **k8s_start_services** role: 
+	- Deploys Kubernetes services such as the Kubernetes Dashboard, Prometheus, MetalLB, and the NFS client provisioner
+
+
+* Whenever k8s_version, k8s_cni or k8s_pod_network_cidr needs to be modified after the HPC cluster is setup, the OS in the manager and compute nodes in the cluster must be re-flashed before executing omnia.yml again.
+* After Kubernetes is installed and configured, a few Kubernetes and calico/flannel-related ports are opened in the manager and compute nodes. This is required for Kubernetes Pod-to-Pod and Pod-to-Service communications. Calico/flannel provides a full networking stack for Kubernetes pods.
+* If Kubernetes Pods are unable to communicate with the servers (i.e., unable to access the Internet) when the DNS servers are not responding, the Kubernetes Pod Network CIDR may be overlapping with the host network, causing a DNS issue. To resolve this issue:
+	1. Disable firewalld.service.
+	2. If the issue persists, then perform the following actions:  
+		a. Format the OS on manager and compute nodes.  
+		b. In the control plane, edit the *omnia_config.yml* file to change the Kubernetes Pod Network CIDR or CNI value. The suggested IP range is 192.168.0.0/16; ensure that you provide a range that is not in use in your host network.  
+		c. Execute `omnia.yml` and skip slurm using `--skip-tags slurm`.
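The overlap check described above can be sketched with Python's standard `ipaddress` module; the CIDR values below are examples only:

```python
import ipaddress

def cidr_overlaps(pod_cidr: str, host_cidr: str) -> bool:
    """Return True if the Kubernetes Pod Network CIDR overlaps the host network."""
    return ipaddress.ip_network(pod_cidr).overlaps(ipaddress.ip_network(host_cidr))

# The default Pod Network CIDR collides with a host network inside 10.244.0.0/16:
print(cidr_overlaps("10.244.0.0/16", "10.244.8.0/24"))   # True -> choose another CIDR
# The suggested alternative range does not overlap a 10.x.x.x host network:
print(cidr_overlaps("192.168.0.0/16", "10.0.0.0/8"))     # False
```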
+
+## Slurm roles
+
+The following __Slurm__ roles are provided by Omnia when __omnia.yml__ file is run:
+- **slurm_common** role:
+	- Installs the common packages on manager node and compute node.
+- **slurm_manager** role:
+	- Installs the packages only related to manager node
+	- This role also enables the required ports to be used by Slurm.  
+	    **tcp_ports**: 6817,6818,6819  
+		**udp_ports**: 6817,6818,6819
+	- Creates and updates the Slurm configuration files based on the manager node requirements.
+- **slurm_workers** role:
+	- Installs the Slurm packages into all compute nodes as per the compute node requirements.
+- **slurm_start_services** role: 
+	- Starts the Slurm services so that the compute nodes communicate with the manager node.
+- **slurm_exporter** role: 
+	- Slurm exporter is a package for exporting metrics collected from the Slurm resource scheduling system to Prometheus.
+	- Slurm exporter is installed on the same host as Slurm, and will install successfully only if Slurm is installed.  
+
+## Login node roles
+To enable the login node, the *login_node_required* variable must be set to "true" in the *omnia_config.yml* file.  
+- **login_common** role: The firewall ports are opened on the manager and login nodes.  
+- **login_server** role: FreeIPA server is installed and configured on the manager node to provide authentication using LDAP and Kerberos principles.  
+- **login_node** role: For Rocky, the FreeIPA client is installed and configured on the login node and is integrated with the server running on the manager node. For LeapOS, 389-ds will be installed instead.
+
+>>__Note:__ If LeapOS is being deployed, login_common and login_server roles will be skipped.  
+
+>> **Note**: To skip the installation of:
+>> * The login node: In the `omnia_config.yml` file, set the *login_node_required* variable to "false".  
+>> * The FreeIPA server and client: Use `--skip-tags freeipa` while executing the *omnia.yml* file. 
+
+### Installing JupyterHub and Kubeflow playbooks  
+To install both JupyterHub and Kubeflow, install the JupyterHub playbook first, and then the Kubeflow playbook.
+
+Commands to install JupyterHub and Kubeflow:
+* `ansible-playbook platforms/jupyterhub.yml -i inventory`
+* `ansible-playbook platforms/kubeflow.yml -i inventory`
+
+>> __Note:__ When Internet connectivity is unstable or slow, pulling the images to create the Kubeflow containers may take more time. If the time limit is exceeded, the **Apply Kubeflow configurations** task may fail. To resolve this issue, you must redeploy the Kubernetes cluster and reinstall Kubeflow by completing the following steps:
+* Format the OS on manager and compute nodes.
+* In the `omnia_config.yml` file, change the k8s_cni variable value from calico to flannel.
+* Run the Kubernetes and Kubeflow playbooks. 
+
+
+## Add a new compute node to the cluster
+
+Update the INVENTORY file present in the `omnia` directory with the new node IP address under the compute group. Ensure the other nodes that are already part of the cluster are also present in the compute group along with the new node. Then, run `omnia.yml` to add the new node to the cluster and update the configurations of the manager node.
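As a quick sanity check before re-running `omnia.yml`, the inventory can be validated with a short Python sketch (this is illustrative and not part of Omnia; the file contents and IPs below are hypothetical). It confirms that all four groups are present and that the new node's IP appears under [compute]:

```python
import configparser

REQUIRED_GROUPS = ("login_node", "manager", "compute", "nfs_node")

def check_inventory(text: str, new_node_ip: str) -> bool:
    """Return True if all four required groups exist and new_node_ip is under [compute]."""
    # An INI-style Ansible inventory parses as bare keys with allow_no_value=True.
    parser = configparser.ConfigParser(allow_no_value=True, delimiters=("=",))
    parser.read_string(text)
    if any(group not in parser for group in REQUIRED_GROUPS):
        return False
    return new_node_ip in parser["compute"]

inventory = """\
[login_node]

[manager]
100.96.22.199

[compute]
100.96.20.66
100.96.20.67

[nfs_node]
"""
print(check_inventory(inventory, "100.96.20.67"))  # True
```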
+

File diff content is too large to display
+ 257 - 0
docs/Installation_Guides/INSTALL_OMNIA_CONTROL_PLANE.md


+ 13 - 0
docs/Installation_Guides/INSTALL_TELEMETRY.md

@@ -0,0 +1,13 @@
+# Installing Telemetry
+1. Ensure that all required [pre-requisites](../PreRequisites/Telemetry_Visualization_PreReqs.md) and [input parameters](../Input_Parameter_Guide/Telemetry_Visualization_Parameters) are entered for telemetry.
+
+2. Once `control_plane.yml` and `omnia.yml` are executed, run the following command from `omnia/telemetry`:
+
+`ansible-playbook telemetry.yml`
+
+>> __Note:__ Telemetry collection is only initiated on iDRACs in the AWX node inventory that have a datacenter license and are running firmware version 4 or higher.
+
+## Adding a New Node to Telemetry
+After initiation, new nodes can be added to telemetry by running the following command from `omnia/telemetry`:
+		
+`ansible-playbook add_idrac_node.yml`

+ 19 - 0
docs/LIMITATIONS.md

@@ -0,0 +1,19 @@
+# Limitations
+* Once `control_plane.yml` is used to configure devices, it is recommended to avoid rebooting the control plane.
+* If the control plane reboots, DHCP services restart. Devices that have had their IP assigned dynamically via DHCP may get assigned new IPs. This in turn can cause duplicate entries for the same device on AWX. Clusters may also show inconsistency and ambiguity.
+* Removal of Slurm and Kubernetes component roles are not supported. However, skip tags can be provided at the start of installation to select the component roles.
+* After installing the Omnia control plane, changing the manager node is not supported. If you need to change the manager node, you must redeploy the entire cluster.
+* Dell Technologies provides support to the Dell-developed modules of Omnia. All the other third-party tools deployed by Omnia are outside the support scope.
+* To change the Kubernetes single node cluster to a multi-node cluster or change a multi-node cluster to a single node cluster, you must either redeploy the entire cluster or run `kubeadm reset -f` on all the nodes of the cluster. You then need to run the `omnia.yml` file and skip the installation of Slurm using the skip tags.
+* In a single node cluster, the login node and Slurm functionalities are not applicable. However, Omnia installs FreeIPA Server and Slurm on the single node.
+* To change the Kubernetes version from 1.16 to 1.19 or 1.19 to 1.16, you must redeploy the entire cluster.
+* The Kubernetes pods will not be able to access the Internet or start when firewalld is enabled on the node. This is a limitation in Kubernetes. So, the firewalld daemon will be disabled on all the nodes as part of omnia.yml execution.
+* Only one storage instance (PowerVault) is currently supported in the HPC cluster.
+* Cobbler web support has been discontinued from Omnia 1.2 onwards.
+* Configuration of storage devices with boss cards is not supported.
+* Shared LOM (LAN on Motherboard) architecture is not supported.
+* Omnia supports only basic telemetry configurations. Changing data fetching time intervals for telemetry is not supported.
+* Slurm cluster metrics will only be fetched from clusters configured by Omnia via AWX.
+* All iDRACs must have the same username and password.
+* OpenSUSE Leap 15.3 is not supported on the Control Plane.
+* Slurm Telemetry is supported only on a single cluster.

+ 30 - 30
docs/MONITOR_CLUSTERS.md

@@ -1,10 +1,10 @@
 # Monitor Kubernetes and Slurm
-Omnia provides playbooks to configure additional software components for Kubernetes such as JupyterHub and Kubeflow. For workload management (submitting, conrolling, and managing jobs) of HPC, AI, and Data Analytics clusters, you can access Kubernetes and Slurm dashboards and other supported applications. 
+Omnia provides playbooks to configure additional software components for Kubernetes such as JupyterHub and Kubeflow. For workload management (submitting, controlling, and managing jobs) of HPC, AI, and Data Analytics clusters, you can access Kubernetes and Slurm dashboards and other supported applications. 
 
 ## Before accessing the dashboards
-To access any of the dashboards, ensure that a compatible web browser is installed. If you are connecting remotely to your Linux server by using MobaXterm version later than 8 or other X11 Clients though *ssh*, follow the below mentioned steps to launch the Firefox Browser:  
-* On the management station:
-	1. Connect using *ssh*. Run `ssh <user>@<IP-address>`, where *IP-address* is the private IP of the management station.
+To access any of the dashboards, ensure that a compatible web browser is installed. If you are connecting remotely to your Linux server by using MobaXterm version later than 8 or other X11 clients through *ssh*, follow the below-mentioned steps to launch the Firefox Browser:  
+* On the control plane:
+	1. Connect using *ssh*. Run `ssh <user>@<IP-address>`, where *IP-address* is the private IP of the control plane.
 	2. `dnf install mesa-libGL-devel -y`
 	3. `dnf install firefox -y`
 	4. `dnf install xorg-x11-xauth`
@@ -20,10 +20,10 @@ To access any of the dashboards, ensure that a compatible web browser is install
 	5. `logout and login back`
 	6. To launch Firefox from terminal, run `firefox&`
 
->> **NOTE**: When the PuTTY or MobaXterm session ends, you must run **export DISPLAY=:10.0** command each time, else Firefox cannot be launched again.  
+>> **Note**: When the PuTTY or MobaXterm session ends, you must run **export DISPLAY=:10.0** command each time, else Firefox cannot be launched again.  
 
 ## Access FreeIPA Dashboard  
-The FreeIPA Dashboard can be accessed from the management station, manager, and login nodes. To access the dashboard:
+The FreeIPA Dashboard can be accessed from the control plane, manager, and login nodes. To access the dashboard:
 1.	Install the Firefox Browser.
 2.	Open the Firefox Browser and enter the url: `https://<hostname>`. For example, enter `https://manager.example.com`.
 3.	Enter the username and password. If the admin or user has obtained a Kerberos ticket, then the credentials need not be provided.  
@@ -34,14 +34,14 @@ The FreeIPA Dashboard can be accessed from the management station, manager, and
 
 An administrator can create users on the login node using FreeIPA. The users will be prompted to change the passwords upon first login.
 
-## Access Kuberentes Dashboard
+## Access Kubernetes Dashboard
 1. To verify if the **Kubernetes-dashboard** service is in the Running state, run `kubectl get pods --namespace kubernetes-dashboard`.
 2. To start the Kubernetes dashboard, run `kubectl proxy`.
 3. To retrieve the encrypted token, run `kubectl get secret -n kubernetes-dashboard $(kubectl get serviceaccount admin-user -n kubernetes-dashboard -o jsonpath="{.secrets[0].name}") -o jsonpath="{.data.token}" | base64 --decode`.
 4. Copy the encrypted token value.
-5. On a web browser on the management station (for control_plane.yml) or manager node (for omnia.yml) enter http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/.
+5. On a web browser on the control plane (for control_plane.yml) or manager node (for omnia.yml) enter http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/.
 6. Select the authentication method as __Token__.
-7. On the Kuberenetes Dashboard, paste the copied encrypted token and click **Sign in** to access the Kubernetes Dashboard.
+7. On the Kubernetes Dashboard, paste the copied encrypted token and click **Sign in** to access the Kubernetes Dashboard.
 
 ## Access Kubeflow Dashboard
 1. Before accessing the Kubeflow Dashboard, run `kubectl -n kubeflow get applications -o yaml profiles`. Wait till **profiles-deployment** enters the Ready state.
@@ -97,9 +97,9 @@ __Note:__
 
 ## Accessing Cluster metrics (fetched by Prometheus) on Grafana 
 
-* Once `control_plane.yml` is run, Prometheus is added to Grafana as a datasource (hpc-prometheus). This allows Grafana to display statistics from the Compute Nodes that have been polled using Prometheus on the Management Station.
+* Once `control_plane.yml` is run, Prometheus is added to Grafana as a datasource (hpc-prometheus). This allows Grafana to display statistics from the Compute Nodes that have been polled using Prometheus on the Control Plane.
 
-* Select the dashboard (![Dashboard Icon](Telemetry_Visualization/Images/DashBoardIcon.PNG)) tab to view the list of Prometheus based dashboards. Some default dashboards include CoreDNS, Prometheus Overview, Kuberenetes Networking etc.
+* Select the dashboard (![Dashboard Icon](../Telemetry_Visualization/Images/DashBoardIcon.PNG)) tab to view the list of Prometheus based dashboards. Some default dashboards include CoreDNS, Prometheus Overview, Kubernetes Networking etc.
 
 >> __Note:__ Both the control plane and HPC clusters can be monitored on these dashboards by toggling the datasource at the top of each dashboard. 
 
@@ -107,19 +107,19 @@ __Note:__
 
 * Once `control_plane.yml` is run, Prometheus is added to Grafana as a datasource. This allows Grafana to display statistics from the Control Plane that have been polled using Prometheus.
 
-![Prometheus DataSource](Telemetry_Visualization/Images/Prometheus_DataSource.jpg)
+![Prometheus DataSource](../Telemetry_Visualization/Images/Prometheus_DataSource.jpg)
 
-* Select the dashboard (![Dashboard Icon](Telemetry_Visualization/Images/DashBoardIcon.PNG)) tab to view the list of Prometheus based dashboards. Some default dashboards include CoreDNS, Prometheus Overview, Kuberenetes Networking etc.
+* Select the dashboard (![Dashboard Icon](../Telemetry_Visualization/Images/DashBoardIcon.PNG)) tab to view the list of Prometheus based dashboards. Some default dashboards include CoreDNS, Prometheus Overview, Kubernetes Networking etc.
 
 >> __Note:__ Both the control plane and HPC clusters can be monitored on these dashboards by toggling the datasource at the top of each dashboard:
 
 | Data Source | Description | Source |
 |-------------|-------------|--------|
-|  hpc-prometheus-manager-nodeIP            | Manages the Kuberenetes and Slurm Cluster on the Manager and Compute nodes.            |  This datasource is set up when `Omnia.yml` is run.      |
-| control_plane_prometheus            | Monitors the Single Node cluster running on the Management Station            | This datasource is set up when `control_plane.yml` is run.        |
+|  hpc-prometheus-manager-nodeIP            | Manages the Kubernetes and Slurm Cluster on the Manager and Compute nodes.            |  This datasource is set up when `Omnia.yml` is run.      |
+| control_plane_prometheus            | Monitors the Single Node cluster running on the Control Plane            | This datasource is set up when `control_plane.yml` is run.        |
 
 
-![Prometheus DataSource](Telemetry_Visualization/Images/Prometheus_Dashboard.jpg)
+![Prometheus DataSource](../Telemetry_Visualization/Images/Prometheus_Dashboard.jpg)
 
 
 
@@ -127,20 +127,20 @@ __Note:__
 | Type        | Subtype           | Dashboard Name                    | Available DataSources                               |
 |-------------|-------------------|-----------------------------------|-----------------------------------------------------|
 |             |                   | CoreDNS                           | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
-| Kuberenetes |                   | API Types                         | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
-| Kuberenetes | Compute Resources | Cluster                           | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
-| Kuberenetes | Compute Resources | Namespace (Pods)                  | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
-| Kuberenetes | Compute Resources | Node (Pods)                       | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
-| Kuberenetes | Compute Resources | Pod                               | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
-| Kuberenetes | Compute Resources | Workload                          | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
-| Kuberenetes |                   | Kubelet                           | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
-| Kuberenetes | Networking        | Cluster                           | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
-| Kuberenetes | Networking        | Namespace (Pods)                  | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
-| Kuberenetes | Networking        | Namespace (Workload)              | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
-| Kuberenetes | Networking        | Pod                               | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
-| Kuberenetes | Networking        | Workload                          | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
-| Kuberenetes |                   | Scheduler                         | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
-| Kuberenetes |                   | Stateful Sets                     | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
+| Kubernetes |                   | API Types                         | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
+| Kubernetes | Compute Resources | Cluster                           | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
+| Kubernetes | Compute Resources | Namespace (Pods)                  | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
+| Kubernetes | Compute Resources | Node (Pods)                       | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
+| Kubernetes | Compute Resources | Pod                               | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
+| Kubernetes | Compute Resources | Workload                          | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
+| Kubernetes |                   | Kubelet                           | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
+| Kubernetes | Networking        | Cluster                           | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
+| Kubernetes | Networking        | Namespace (Pods)                  | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
+| Kubernetes | Networking        | Namespace (Workload)              | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
+| Kubernetes | Networking        | Pod                               | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
+| Kubernetes | Networking        | Workload                          | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
+| Kubernetes |                   | Scheduler                         | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
+| Kubernetes |                   | Stateful Sets                     | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
 |             |                   | Prometheus Overview               | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
 | Slurm       |                   | CPUs/GPUs, Jobs, Nodes, Scheduler | hpc-prometheus-manager-nodeIP                           |
 | Slurm       |                   | Node Exporter Server Metrics      | hpc-prometheus-manager-nodeIP                           |

+ 17 - 0
docs/PreRequisites/Control_Plane_Security_PreReqs.md

@@ -0,0 +1,17 @@
+# Pre-requisites Before Enabling Security: Control Plane
+
+* Set the hostname of the control plane to the hostname.domainname format using the following command:
+`hostnamectl set-hostname <hostname>.<domainname>`
+>>Eg: `hostnamectl set-hostname validationms.omnia.test`
+>> __Note:__ 
+>>	* The Hostname should not contain the following characters: , (comma), \. (period) or _ (underscore). However, the **domain name** may contain periods. 
+>>	* The Hostname cannot start or end with a hyphen (-).
+>>	* No upper case characters are allowed in the hostname.
+>>	* The hostname cannot start with a number.
+
+* Add the hostname you set to `/etc/hosts` using the vi editor.
+
+`vi /etc/hosts`
+
+* On the last line of the file, add the IP of the control plane followed by the hostname set with `hostnamectl` above.
+>> Eg: xx.xx.xx.xx <hostname>
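The hostname rules above can be expressed as a small, illustrative Python check (a sketch for verifying a name before running `hostnamectl`; it is not part of Omnia):

```python
import re

def valid_omnia_hostname(fqdn: str) -> bool:
    """Check a hostname.domainname string against the rules above: the hostname
    part must be lowercase, must not contain commas, periods, or underscores,
    must not start with a number, and must not start or end with a hyphen."""
    hostname, _, domain = fqdn.partition(".")
    if not hostname or not domain:
        return False
    # First character: lowercase letter; last character: letter or digit.
    return re.fullmatch(r"[a-z](?:[a-z0-9-]*[a-z0-9])?", hostname) is not None

print(valid_omnia_hostname("validationms.omnia.test"))  # True
print(valid_omnia_hostname("9node.omnia.test"))         # False (starts with a number)
print(valid_omnia_hostname("my_node.omnia.test"))       # False (contains an underscore)
```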

+ 19 - 0
docs/PreRequisites/Login_Node_Security_PreReqs.md

@@ -0,0 +1,19 @@
+# Pre-requisites Before Enabling Security: Login Node
+
+* Verify that the login node hostname has been set. If not, use the following steps to set it.
+	* Set the hostname of the login node to the hostname.domainname format using the following command:
+	`hostnamectl set-hostname <hostname>.<domainname>`
+	>>Eg: `hostnamectl set-hostname login-node.omnia.test`
+	* Add the hostname you set to `/etc/hosts` using the vi editor.
+
+	`vi /etc/hosts`
+
+    * On the last line of the file, add the IP of the login node followed by the hostname set with `hostnamectl` above.
+  
+	__Eg:__  xx.xx.xx.xx <hostname>
+	
+>> __Note:__ 
+>>	* The Hostname should not contain the following characters: , (comma), \. (period) or _ (underscore). However, the **domain name** may contain periods. 
+>>	* The Hostname cannot start or end with a hyphen (-).
+>>	* No upper case characters are allowed in the hostname.
+>>	* The hostname cannot start with a number.
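For illustration, after these steps the last line of `/etc/hosts` on the login node might look like the following (the IP address and name shown are hypothetical):

```
127.0.0.1        localhost
100.96.20.5      login-node.omnia.test
```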

+ 56 - 0
docs/PreRequisites/Omnia_Control_Plane_PreReqs.md

@@ -0,0 +1,56 @@
+# Pre-requisites Before Running Control Plane
+* Ensure that a stable Internet connection is available on the control plane.
+* Rocky 8 is installed on the control plane. 		 
+* To provision the bare metal servers, download one of the following ISOs for deployment:
+    1. [Leap 15.3](https://get.opensuse.org/leap/)
+    2. [Rocky 8](https://rockylinux.org/)
+* As a best practice, ensure that PowerCap policy is disabled and the BIOS system profile is set to Performance on the Control Plane.
+* For DHCP configuration, you can provide a host mapping file (Example available [here](../../examples/host_mapping_file_os_provisioning.csv)). If the mapping file is not provided and the variable is left blank, a default mapping file will be created. The provided details must be in the format: MAC address, Hostname, IP address, Component_role. For example, `10:11:12:13,server1,100.96.20.66,compute` and  `14:15:16:17,server2,100.96.22.199,manager` are valid entries.  
+>> __Note:__  
+>>	* In the `omnia/examples` folder, a **mapping_host_file.csv** template is provided which can be used for DHCP configuration. The header in the template file must not be deleted before saving the file. It is recommended to provide this optional file as it allows IP assignments provided by Omnia to be persistent across control plane reboots.  
+>>	* The Hostname should not contain the following characters: , (comma), \. (period) or _ (underscore). However, the **domain name** may contain periods. 
+>>	* The Hostname cannot start or end with a hyphen (-).
+>>	* No upper case characters are allowed in the hostname.
+>>	* The hostname cannot start with a number.
+* Connect one of the Ethernet cards on the control plane to the HPC switch. The other Ethernet card must be connected to the internet network. 
+* Ensure that all connection names under the network manager match their corresponding device names. This can be verified using the command `nmcli connection`. In the event of a mismatch, edit the file `/etc/sysconfig/network-scripts/ifcfg-<nic name>` using vi editor. 
+* You must have root privileges to perform installations and configurations using the Omnia control plane.
+* On the control plane, ensure that Python 3.6 and Ansible are installed (the following commands are compatible with all three supported operating systems unless marked otherwise).  
+    * Run the following commands to install Python 3.6:  
+      `dnf install epel-release -y` <br><br> `dnf install python3 -y`
+    * Run the following commands to install Ansible:
+       ```
+       pip3.6 install --upgrade pip
+       python3.6 -m pip install ansible
+       ```
+    After the installation is complete, run `ansible --version` to verify if the installation is successful. In the output, ensure that the executable location path is present in the PATH variable by running `echo $PATH`.
+    If the executable location path is not present, update the path by running `export PATH=$PATH:<executable location>`.  
+	
+    For example,  
+    ```
+    ansible --version
+    ansible 2.10.9
+    config file = None
+    configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
+    ansible python module location = /usr/local/lib/python3.6/site-packages/ansible
+    executable location = /usr/local/bin/ansible
+    python version = 3.6.8 (default, Aug 24 2020, 17:57:11) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]
+    ```
+    The executable location is `/usr/local/bin/ansible`. Update the path by running the following command:
+    ```
+    export PATH=$PATH:/usr/local/bin
+    ```  
+	
+    >>__Note__: Omnia requires Python 3.6 because it provides bindings to system tools such as RPM, DNF, and SELinux. As versions later than 3.6 do not provide these bindings, ensure that you install Python 3.6 with dnf.  
+    >> __Note__: If SELinux is not disabled on the control plane, disable it from `/etc/sysconfig/selinux` and restart the control plane.
+    >>__Note__: If Ansible version 2.9 or later is installed, ensure it is uninstalled before installing a newer version of Ansible. Run the following commands to uninstall Ansible before upgrading to a newer version.
+    1. `pip uninstall ansible`
+    2. `pip uninstall ansible-base` (if Ansible 2.9 is installed)
+    3. `pip uninstall ansible-core` (if Ansible 2.10 or a later version is installed)
+* On the control plane, run the following commands to install Git:
+  `dnf install epel-release -y` <br><br> `dnf install git -y`
+>> **Note**:
+>> * After the installation of the Omnia appliance, changing the control plane is not supported. If you need to change the control plane, you must redeploy the entire cluster.
+>> * If there are errors while executing any of the Ansible playbook commands, then re-run the commands.  
+
+* Fill in all required parameters under `/control_plane/input_parameters`, and security parameters under `omnia_security_config.yml`/`security_vars.yml`, based on the provided [Input Parameter Guide](../Input_Parameter_Guide).
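
The Ansible PATH check described above can be scripted. The sketch below is a minimal illustration, assuming bash on the control plane; the helper names `ansible_bin_dir` and `path_contains` are ours, not part of Omnia:

```shell
#!/bin/bash
# Sketch only: ansible_bin_dir and path_contains are illustrative helpers,
# not Omnia tooling.

# Extract the directory from the "executable location = ..." line of
# `ansible --version` output (read from stdin).
ansible_bin_dir() {
  sed -n 's|^[[:space:]]*executable location = \(.*\)/[^/]*$|\1|p'
}

# Return 0 if a directory is already an entry in a PATH-style string
# (defaults to the current $PATH).
path_contains() {
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}
```

Typical use: `bindir=$(ansible --version | ansible_bin_dir)` followed by `path_contains "$bindir" || export PATH="$PATH:$bindir"`.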

+ 15 - 0
docs/PreRequisites/Telemetry_Visualization_PreReqs.md

@@ -0,0 +1,15 @@
+# Pre-Requisites Before Running `telemetry.yml`
+
+## Prerequisites to Enabling iDRAC Telemetry
+* All target devices should run iDRAC firmware version > 4.
+* All target devices should have an iDRAC Datacenter license.
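
As a quick pre-check, the firmware version an iDRAC reports can be read from its standard Redfish manager resource. A hedged sketch follows; the IP and credentials are placeholders, and the helper name `fw_version` is ours, not an Omnia or iDRAC tool:

```shell
#!/bin/bash
# Sketch only: fw_version is an illustrative helper.

# Extract the "FirmwareVersion" field from Redfish Manager JSON on stdin.
fw_version() {
  sed -n 's/.*"FirmwareVersion"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Typical use against an iDRAC (placeholders for IP and credentials):
#   curl -sk -u "user:password" \
#     "https://<idrac-ip>/redfish/v1/Managers/iDRAC.Embedded.1" | fw_version
# Per the requirement above, the reported version should be greater than 4.
```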
+
+## Prerequisites to Enabling Slurm Telemetry
+* Slurm telemetry cannot be enabled without iDRAC support.
+* The Omnia control plane should have been executed, and node_inventory should be present in AWX.
+* The Slurm manager and compute nodes are fetched from node_inventory at run time.
+* Slurm should be installed on the nodes; without it, Slurm telemetry has nothing to collect.
+* A minimum of one cluster is required for Slurm telemetry to work.
+* If a cluster change is intended after telemetry is running, first delete the telemetry pods and images on the control plane.
+
+Once all pre-requisites are met, enter the required input parameters based on the [provided guides](../Input_Parameter_Guide/Telemetry_Visualization_Parameters).

+ 23 - 168
docs/README.md

@@ -14,7 +14,7 @@
 - [Solution Brief: Omnia Software](https://infohub.delltechnologies.com/section-assets/omnia-solution-brief)
 
 ## What Omnia does
-Omnia can build clusters that use Slurm or Kubernetes (or both!) for workload management. Omnia will install software from a variety of sources, including:
+Omnia can deploy and configure devices, and build clusters that use Slurm or Kubernetes (or both!) for workload management. Omnia will install software from a variety of sources, including:
 - Helm repositories
 - Source code compilation
 - [OperatorHub](https://operatorhub.io)
@@ -28,173 +28,28 @@ Omnia can deploy firmware, install Kubernetes or Slurm (or both), along with add
 ![Omnia Slurm Stack](images/omnia-slurm.png)  
 
 ## What's new in this release
-- Support for Rocky 8.x with latest python/ansible on the Management Station
-- Support for Leap 15.3 on the cluster
-- Support for Rocky 8.x on the cluster
-- Added Grafana integration for better monitoring capability
-- Added Loki Log aggregation of Var Logs
-- Added Slurm/K8s Monitoring capability
-- Added security features to comply with NIST 800-53 Revision 5 and 800-171 Revision 5
-- Added the ability to collect telemetry information from SLURM and iDRAC
-- Added Grafana plugins to view real time graphs of cluster/node statistics
-
-## Deploying clusters using the Omnia control plane
-The Omnia Control Plane will automate the entire cluster deployment process, starting with provisioning the operating system on the supported devices and updating the firmware versions of PowerEdge Servers. 
-For detailed instructions, see [Install the Omnia Control Plane](INSTALL_OMNIA_CONTROL_PLANE.md).  
-
-## Installing Omnia to servers with a pre-provisioned OS
-Omnia can be deployed on clusters that already have an RPM-based Linux OS running on them and are all connected to the Internet. Currently, all Omnia testing is done using the software versions mentioned [here](README.md#System-requirements ). Please see [Example system designs](EXAMPLE_SYSTEM_DESIGNS.md) for instructions on the network setup.
-
-Once servers have functioning OS and networking, you can use Omnia to install and start Slurm and/or Kubernetes. For detailed instructions, see [Install Omnia using CLI](INSTALL_OMNIA.md). 
-
-# System requirements  
-The following table lists the software and operating system requirements on the management station, manager, and compute nodes. To avoid any impact on the proper functioning of Omnia, other versions than those listed are not supported.  
-
-Requirements  |   Version
-----------------------------------  |   -------
-OS pre-installed on the management station  |  Rocky 8.x
-OS deployed by Omnia on bare-metal Dell EMC PowerEdge Servers | Rocky 8.x Minimal Edition/ Leap 15.x
-Ansible  |  2.9.21
-Python  |  3.6.15
-
-## Hardware managed by Omnia
-The following table lists the supported devices managed by Omnia. Other devices than those listed in the following table will be discovered by Omnia, but features offered by Omnia will not be applicable.
-
-Device type	|	Supported models	
------------	|	-------	
-Dell EMC PowerEdge Servers	|	PowerEdge C4140, C6420, C6520, R240, R340, R440, R540, R640, R650, R740, R740xd, R740xd2, R750, R750xa, R840, R940, R940xa
-Dell EMC PowerVault Storage	|	PowerVault ME4084, ME4024, and ME4012 Storage Arrays
-Dell EMC Networking Switches	|	PowerSwitch S3048-ON and PowerSwitch S5232F-ON
-Mellanox InfiniBand Switches	|	NVIDIA MQM8700-HS2F Quantum HDR InfiniBand Switch 40 QSFP56
-
-
-## Software deployed by Omnia
-The following table lists the software and its compatible version managed by Omnia. To avoid any impact on the proper functioning of Omnia, other versions than those listed are not supported.
-
-| Software	                                  	| 	License	                                                                    | 	Compatible Version	                            | 	Description                                                                                                                                                 |
-|-------------------------------------------	|-----------------------------------------------------------------------------	|-------------------------------------------------	|--------------------------------------------------------------------------------------------------------------------------------------------------------------	|
-| LeapOS 15.3	                               	| 	-	                                                                        | 	15.x                                            | 	Operating system on entire cluster                                                                                                                          |
-| CentOS Linux release 7.9.2009 (Core)	      	| 	-	                                                                        | 	7.9	                                            | 	Operating system on entire cluster except for management station                                                                                            |
-| Rocky 8.x	                                 	| 	-	                                                                        | 	8.x	                                            | 	Operating system on entire cluster except for management station                                                                                            |
-| Rocky 8.x	                                 	| 	-	                                                                        | 	8.x	                                            | 	Operating system on the management station                                                                                                                  |
-| MariaDB	                                   	| 	GPL 2.0	                                                                    | 	5.5.68	                                        | 	Relational database used by Slurm                                                                                                                           |
-| Slurm	                                     	| 	GNU General Public	                                                        | 	20.11.7	                                        | 	HPC Workload Manager                                                                                                                                        |
-| Docker CE	                                 	| 	Apache-2.0	                                                                | 	20.10.2	                                        | 	Docker Service                                                                                                                                              |
-| FreeIPA	                                   	| 	GNU General Public License v3	                                            | 	4.6.8	                                        | 	Authentication system used in the login node                                                                                                                |
-| OpenSM	                                    | 	GNU General Public License 2	                                            | 	3.3.24	                                        | 	-                                                                                                                                                           |
-| NVIDIA container runtime	                  	| 	Apache-2.0	                                                                | 	3.4.2	                                        | 	Nvidia container runtime library                                                                                                                            |
-| Python PIP	                                | 	MIT License	                                                                | 	21.1.2	                                        | 	Python Package                                                                                                                                              |
-| Python3	                                   	| 	-	                                                                        | 	3.6.8 (3.6.15 if LeapOS is being used)	        | 	-                                                                                                                                                           |
-| Kubelet	                                   	| 	Apache-2.0	                                                                | 	1.16.7,1.19, 1.21  	                            | 	Provides external, versioned ComponentConfig API types for configuring   the kubelet                                                                        |
-| Kubeadm	                                   	| 	Apache-2.0	                                                                | 	1.16.7,1.19, 1.21 	                            | 	"fast paths" for creating Kubernetes clusters                                                                                                               |
-| Kubectl	                                   	| 	Apache-2.0	                                                                | 	1.16.7,1.19, 1.21 	                            | 	Command line tool for Kubernetes                                                                                                                            |
-| kubernetes.core	                           	| 	GPL 3.0	                                                                    | 	2.2.3 	                                        | 	Performs CRUD operations on K8s onjects                                                                                                                     |
-| JupyterHub	                                | 	Modified BSD License	                                                    | 	1.1.0	                                        | 	Multi-user hub                                                                                                                                              |
-| kubernetes Controllers	                    | 	Apache-2.0	                                                                | 	1.16.7,1.19 (1.21 if LeapOS is being used)	    | 	Orchestration tool	                                                                                                                                        |
-| Kfctl	                                     	| 	Apache-2.0	                                                                | 	1.0.2	                                        | 	CLI for deploying and managing Kubeflow                                                                                                                     |
-| Kubeflow	                                  	| 	Apache-2.0	                                                                | 	1	                                            | 	Cloud Native platform for machine learning                                                                                                                  |
-| Helm	                                      	| 	Apache-2.0	                                                                | 	3.5.0	                                        | 	Kubernetes Package Manager                                                                                                                                  |
-| Helm Chart	                                | 	-	                                                                        | 	0.9.0	                                        | 	-                                                                                                                                                           |
-| TensorFlow	                                | 	Apache-2.0	                                                                | 	2.1.0	                                        | 	Machine Learning framework                                                                                                                                  |
-| Horovod	                                   	| 	Apache-2.0	                                                                | 	0.21.1	                                        | 	Distributed deep learning training framework for Tensorflow                                                                                                 |
-| MPI	                                       	| 	Copyright (c) 2018-2019 Triad National Security,LLC. All rights   reserved.	| 	0.3.0	                                        | 	HPC library                                                                                                                                                 |
-| CoreDNS	                                   	| 	Apache-2.0	                                                                | 	1.6.2	                                        | 	DNS server that chains plugins                                                                                                                              |
-| CNI	                                       	| 	Apache-2.0	                                                                | 	0.3.1	                                        | 	Networking for Linux containers                                                                                                                             |
-| AWX	                                       	| 	Apache-2.0	                                                                | 	20.0.0	                                        | 	Web-based User Interface                                                                                                                                    |
-| AWX.AWX	                                   	| 	Apache-2.0	                                                                | 	19.4.0	                                        | 	Galaxy collection to perform awx configuration                                                                                                              |
-| AWXkit	                                    | 	Apache-2.0	                                                                | 	18.0.0	                                        | 	To perform configuration through CLI commands                                                                                                               |
-| CRI-O	                                     	| 	Apache-2.0	                                                                | 	1.21, 1.22.0  									| 	Container Service                                                                                                                                           |
-| Buildah	                                   	| 	Apache-2.0	                                                                | 	1.22.4	                                        | 	Tool to build and run containers                                                                                                                            |
-| PostgreSQL	                                | 	Copyright (c) 1996-2020, PostgreSQL Global Development Group	            | 	10.15	                                        | 	Database Management System                                                                                                                                  |
-| Redis	                                     	| 	BSD-3-Clause License	                                                    | 	6.0.10	                                        | 	In-memory database                                                                                                                                          |
-| NGINX	                                     	| 	BSD-2-Clause License	                                                    | 	1.14	                                        | 	-                                                                                                                                                           |
-| dellemc.os10	                              	| 	GNU-General Public License v3.1	                                            | 	1.1.1	                                        | 	It provides networking hardware abstraction through a common set of APIs                                                                                    |
-| grafana	                                   	| 	Apache-2.0	                                                                | 	8.3.2	                                        | 	Grafana is the open source analytics & monitoring solution for every   database.                                                                            |
-| community.grafana	                         	| 	GPL 3.0	                                                                    | 	1.3.0	                                        | 	Technical Support for open source grafana                                                                                                                   |
-| OMSDK	                                     	| 	Apache-2.0	                                                                | 	1.2.488	                                        | 	Dell EMC OpenManage Python SDK (OMSDK) is a python library that helps   developers and customers to automate the lifecycle management of PowerEdge   Servers|
-| activemq	                                  	| 	Apache-2.0	                                                                | 	5.10.0	                                        | 	Most popular multi protocol, message broker                                                                                                                 |
-|  Loki                                     	|  Apache License 2.0                                                         	|  2.4.1                                          	|  Loki is a log aggregation   system   designed to store and query   logs from all your applications and     infrastructure                                   	|
-|  Promtail                                 	|  Apache License 2.1                                                         	|  2.4.1                                          	|  Promtail is an agent which ships   the contents of local logs to   a   private Grafana Loki instance or Grafana Cloud.                                      	|
-|  kube-prometheus-stack                    	|  Apache License 2.2                                                         	|  25.0.0                                         	|  Kube Prometheus Stack is a   collection of Kubernetes manifests,     Grafana dashboards, and Prometheus rules.                                              	|
-|  mailx                                    	|  MIT License                                                                	|  12.5                                           	|  mailx is a Unix utility program   for sending and receiving   mail.                                                                                         	|
-|  postfix                                  	|  IBM Public License                                                         	|  3.5.8                                          	|  Mail Transfer Agent (MTA) designed   to determine routes and   send   emails                                                                                	|
-|  xorriso                                  	|  GPL version 3                                                              	|  1.4.8                                          	|  xorriso copies file objects from   POSIX compliant filesystems   into Rock   Ridge enhanced ISO 9660 filesystems.                                           	|
-|  Dell EMC     OpenManage Ansible Modules  	|  GNU- General Public License   v3.0                                         	|  5.0.0                                          	|  OpenManage Ansible Modules   simplifies and automates     provisioning, deployment, and updates of PowerEdge servers and   modular   infrastructure.        	|
-|  389-ds                                   	|  GPL version 3                                                              	|  1.4.4                                          	|   LDAP server used for   authentication, access control.                                                                                                     	|
-|  sssd                                     	|  GPL version 3                                                              	|  1.16.1                                         	|  A set of daemons used to manage   access to remote directory services and authentication mechanisms.                                                        	|
-|  krb5                                     	|  MIT License                                                                	|  1.19.2                                         	|  Authentication protocol providing   strong authentication for client/server applications by using secret-key   cryptography                                 	|
-|  openshift                                	|  Apache 2.0                                                                 	|  0.12.1                                         	|  an on-premises  platform as a   service built around Linux containers orchestrated and managed   by Kubernetes                                              	|
-| golang                                    	| BSD-3-Clause License                                                        	| 1.17                                            	| Go is a statically typed, compiled programming language designed at   Google                                                                                 	|
-| mysql                                     	| GPL 2.0                                                                     	| 8                                               	| MySQL is an open-source relational database management system.                                                                                               	|
-| postgresSQL                               	| PostgresSQL License                                                         	| 12                                              	| PostgreSQL, also known as Postgres, is a free and open-source relational   database management system emphasizing extensibility and SQL compliance.          	|
-| idrac-telemetry-reference tools           	| Apache-2.0                                                                  	| 0.1                                             	| Reference toolset for PowerEdge telemetry metric collection and   integration with analytics and visualization solutions.                                    	|
-| jansson                                   	| MIT License                                                                 	| 2.14                                            	| C library for encoding, decoding and manipulating JSON data                                                                                                  	|
-| libjwt                                    	| MPL-2.0 License                                                             	| 1.13.0                                          	| JWT C Library                                                                                                                                                	|
-| apparmor                                  	| GNU General Public License                                                  	| 3.0.3                                           	| Controls access based on paths of the program files                                                                                                          	|
-| nsfcac/grafana-plugin                     	| Apache-2.0                                                                  	| 2.1.0                                           	| Machine Learning Framework                                                                                                                                   	|
-| apparmor                                  	| GNU General Public License                                                  	| 3.0.3                                           	| Controls access based on paths of the program files                                                                                                          	|
-| snoopy                                    	| GPL 2.0                                                                     	| 2.4.15                                          	| Snoopy is a small library that logs all program executions on your   Linux/BSD system                                                                        	|
-
-
-# Known issues  
-* **Issue**: Hosts are not displayed on the AWX UI.  
-	**Resolution**:  
-	* Verify if the provisioned_hosts.yml file is present in the omnia/control_plane/roles/collect_node_info/files/ folder.
-	* Verify whether the hosts are listed in the provisioned_hosts.yml file.
-	* If hosts are not listed, then servers are not PXE booted yet.
-If hosts are listed, then an IP address has been assigned to them by DHCP. However, hosts are not displayed on the AWX UI as the PXE boot is still in process or is not initiated.
-	* Check for the reachable and unreachable hosts using the provision_report.yml tool present in the omnia/control_plane/tools folder. To run provision_report.yml, in the omnia/control_plane/ directory, run playbook -i roles/collect_node_info/files/provisioned_hosts.yml tools/provision_report.yml.
-
-* **Issue**: There are **ImagePullBack** or **ErrPullImage** errors in the status of Kubernetes pods.  
-	**Cause**: The errors occur when the Docker pull limit is exceeded.  
-	**Resolution**:
-	* For **omnia.yml** and **control_plane.yml**: Provide the docker username and password for the Docker Hub account in the *omnia_config.yml* file and execute the playbook. 
-	* For HPC cluster, during omnia.yml execution, a kubernetes secret 'dockerregcred' will be created in default namespace and patched to service account. User needs to patch this secret in their respective namespace while deploying custom applications and use the secret as imagePullSecrets in yaml file to avoid ErrImagePull. [Click here for more info](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
-	* **Note**: If the playbook is already executed and the pods are in __ImagePullBack__ error, then run `kubeadm reset -f` in all the nodes before re-executing the playbook with the docker credentials.
-
-* **Issue**: The `kubectl` command stops working after a reboot and displays the following error message: *The connection to the server head_node_ip:port was refused - did you specify the right host or port?*  
-	**Resolution**:
-	On the management station or the manager node, run the following commands:  
-	* `swapoff -a`
-	* `systemctl restart kubelet`  
-	
-* **Issue**: If control_plane.yml fails at the webui_awx role, then the previous IP address and password are not cleared when control_plane.yml is re-run.   
-	**Resolution**: In the *webui_awx/files* directory, delete the *.tower_cli.cfg* and *.tower_vault_key* files, and then re-run `control_plane.yml`.
-
-* **Issue**: The FreeIPA server and client installation fails.  
-	**Cause**: The hostnames of the manager and login nodes are not set in the correct format.  
-	**Resolution**: If you have enabled the option to install the login node in the cluster, set the hostnames of the nodes in the format: *hostname.domainname*. For example, *manager.omnia.test* is a valid hostname for the login node. **Note**: To find the cause for the failure of the FreeIPA server and client installation, see *ipaserver-install.log* in the manager node or */var/log/ipaclient-install.log* in the login node.  
-	
-* **Issue**: The inventory details are not updated in AWX when device or host credentials are invalid.  
-	**Resolution**: Provide valid credentials of the devices and hosts in the cluster. 
-
-* **Issue**: The Host list is empty after executing the control_plane playbook.  
-	**Resolution**: Ensure that all devices used are in DHCP enabled mode.
-	
-* **Issue**: The task 'Install Packages' fails on the NFS node with the message: `Failure in talking to yum: Cannot find a valid baseurl for repo: base/7/x86_64.`  
-	**Cause**: There are connections missing on the NFS node.  
-	**Resolution**: Ensure that there are 3 nics being used on the NFS node:
-	1. For provisioning the OS
-	2. For connecting to the internet (Management purposes)
-	3. For connecting to PowerVault (Data Connection)  
-	
-	
-* **Issue**: Hosts are not automatically deleted from awx UI when redeploying the cluster.  
-	**Resolution**: Before re-deploying the cluster, ensure that the user manually deletes all hosts from the awx UI.
-	
-
-# [Frequently asked questions](FAQ.md)
-
-# Limitations
-* Removal of Slurm and Kubernetes component roles are not supported. However, skip tags can be provided at the start of installation to select the component roles.​  
-* After installing the Omnia control plane, changing the manager node is not supported. If you need to change the manager node, you must redeploy the entire cluster.  
-* Dell Technologies provides support to the Dell-developed modules of Omnia. All the other third-party tools deployed by Omnia are outside the support scope.​
-* To change the Kubernetes single node cluster to a multi-node cluster or change a multi-node cluster to a single node cluster, you must either redeploy the entire cluster or run `kubeadm reset -f` on all the nodes of the cluster. You then need to run the *omnia.yml* file and skip the installation of Slurm using the skip tags.  
-* In a single node cluster, the login node and Slurm functionalities are not applicable. However, Omnia installs FreeIPA Server and Slurm on the single node.  
-* To change the Kubernetes version from 1.16 to 1.19 or 1.19 to 1.16, you must redeploy the entire cluster.  
-* The Kubernetes pods will not be able to access the Internet or start when firewalld is enabled on the node. This is a limitation in Kubernetes. So, the firewalld daemon will be disabled on all the nodes as part of omnia.yml execution.
-* Only one storage instance (PowerVault) is currently supported in the HPC cluster.
-* Cobbler web support has been discontinued from Omnia 1.2 onwards.
+- Support for Rocky 8.5 with the latest python/ansible on the Control Plane.
+- Support for Leap 15.3 on the cluster.
+- Support for Rocky 8.5 on the cluster.
+- Added Grafana integration for better monitoring capability.
+- Added Loki log aggregation for logs under `/var/log`.
+- Added Slurm/K8s Monitoring capability.
+- Added security features to comply with NIST 800-53 Revision 5 and 800-171 Revision 5.
+- Added the ability to collect telemetry information from SLURM and iDRAC.
+- Added Grafana plugins to view real-time graphs of cluster/node statistics.
+
+# Using Omnia
+1. Verify that your system meets Omnia's [hardware](Support_Matrix/Hardware) and [software requirements](Support_Matrix/Software/Operating_Systems).
+2. Ensure that all [pre-requisites](PreRequisites) are met.
+3. Fill out all the required [input parameters](Input_Parameter_Guide).
+4. [Run Control_Plane](Installation_Guides/INSTALL_OMNIA_CONTROL_PLANE.md) to provision OSs, [configure devices](Device_Configuration), and set up [security measures](Security).
+5. [Run Omnia](Installation_Guides/INSTALL_OMNIA_CLI.md) to set up Kubernetes and Slurm.
+6. Run the telemetry playbook to [set up](Installation_Guides/INSTALL_TELEMETRY.md) and use [Telemetry and Visualization Services](Telemetry_Visualization).
+![Omnia Flow](images/Omnia_Flow.png)
+
+## Troubleshooting Omnia
+* For a list of commonly encountered issues, check out our [FAQs](Troubleshooting/FAQ.md).
+* To troubleshoot Omnia, use our [Troubleshooting Guide](Troubleshooting/Troubleshooting_Guide.md).
 
 
 # Contributing to Omnia

+ 82 - 0
docs/Security/ENABLE_SECURITY_CONTROL_PLANE.md

@@ -0,0 +1,82 @@
+# Enabling Security on the Control Plane
+
+Omnia uses [FreeIPA (on Rocky OS)](https://www.freeipa.org/page/Documentation) and [389ds (on Leap)](https://doc.opensuse.org/documentation/leap/security/html/book-security/cha-security-ldap.html) to enable security features like authorization and access control.
+
+
+## Enabling Authentication on the Control Plane:
+
+Once all [pre-requisites](../PreRequisites/Control_Plane_Security_PreReqs.md) are met, set the parameter `enable_security_support` to true in `base_vars.yml`.
+
+>> __Note:__ 
+>> * If `control_plane.yml` fails after executing the control plane security tasks, the `sshd` service will have to be restarted manually by the user.
+>> * Once security features are enabled on the control plane, `/etc/resolv.conf` becomes immutable. To edit the file, run `chattr -i /etc/resolv.conf`. To make the file immutable again after editing, run `chattr +i /etc/resolv.conf`. Changes made using this method may not persist across reboots.
+## Limiting User Authentication over sshd
+
+Users logging into this host can be __optionally__ allowed or denied using an access control list. All users to be allowed or denied must be listed in the variable `user` in `security_vars.yml`.
+
+>> __Note:__ All users on the server will have to be defined manually. Omnia does not create any users by default.
+
+## Session Timeout
+
+To enhance security, users who have been idle for over 3 minutes will be logged out automatically. To adjust this value, update the `session_timeout` variable in `security_vars.yml`. This variable is mandatory.
+
+## Restricting Program Support
+
+Optionally, different communication protocols can be disabled on the control plane using the `restrict_program_support` and `restrict_softwares` variables. These protocols include telnet, lpd, bluetooth, rlogin, and rexec. Features that cannot be disabled include ftp, smbd, nmbd, automount, and portmap.
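
The sshd access list, session timeout, and program restrictions described above are all driven by variables in `security_vars.yml`. The sketch below illustrates what such a file might look like; all values are examples, `allow_deny` is a hypothetical toggle, and the exact keys should be confirmed against the `security_vars.yml` shipped with your release:

```yaml
# Illustrative sketch only -- confirm exact variable names in your
# release's security_vars.yml before use.
user: "user1 user2"               # users subject to the sshd access list
allow_deny: "Allow"               # hypothetical: treat `user` as an allow list or a deny list
session_timeout: 180              # idle timeout; 180 seconds matches the 3-minute default
restrict_program_support: true    # disable the restrictable protocols below
restrict_softwares: "telnet,lpd,bluetooth,rlogin,rexec"
```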
+
+## Logging Program Executions using Snoopy
+
+Omnia installs Snoopy to log all program executions on Linux/BSD systems. For more information on Snoopy, click [here](https://github.com/a2o/snoopy).
+
+## Logging User activity using PSACCT/ACCT
+
+Using PSACCT on Rocky and Acct on LeapOS, admins can monitor user activity. For more information, click [here](https://www.redhat.com/sysadmin/linux-system-monitoring-acct).
+
+## Configuring Email Alerts for Authentication Failures
+
+If the `alert_email_address` variable in `security_config.yml` is populated with a single, valid email ID, all authentication failures will trigger an email notification. A cron job checks for authentication failures and sends alert emails every hour.
+
+>> __Note:__ The `alert_email_address` variable is __optional__. If it is not populated, authentication failure email alerts will be disabled.
+
+## Log Aggregation via Grafana
+
+[Loki](https://grafana.com/docs/loki/latest/fundamentals/overview/) is a datastore used to efficiently hold log data for security purposes. Using the `promtail` agent, logs are collated and streamed via an HTTP API.
+
+>> __Note:__ When `control_plane.yml` is run, Loki is automatically set up as a data source on the Grafana UI.
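
As a concrete illustration, a minimal promtail scrape configuration along the following lines would ship everything under `/var/log` to a local Loki instance. This is a sketch, not Omnia's actual configuration; the paths, labels, and ports are example values:

```yaml
# Minimal promtail config sketch (illustrative values only).
server:
  http_listen_port: 9080          # promtail's own HTTP port
positions:
  filename: /tmp/positions.yaml   # where promtail tracks read offsets
clients:
  - url: http://localhost:3100/loki/api/v1/push   # assumes Loki's default port
scrape_configs:
  - job_name: varlogs
    static_configs:
      - targets: [localhost]
        labels:
          job: Omnia              # label to filter on in LogQL queries
          __path__: /var/log/*log # glob of files to tail
```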
+
+
+
+### Querying Loki 
+
+Loki uses a basic regex-based syntax to filter for specific jobs, dates, or timestamps.
+
+* Select the Explore tab ![Explore Icon](../Telemetry_Visualization/Images/ExploreIcon.PNG) and choose control-plane-loki from the drop-down.
+* Using [LogQL queries](https://grafana.com/docs/loki/latest/logql/log_queries/), all logs in `/var/log` can be accessed using filters (e.g., `{job="Omnia"}`).
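
Loki's HTTP API can also be queried outside the Grafana UI. The Python sketch below builds a `query_range` request URL for a LogQL filter; the base URL assumes Loki's default port 3100, and the helper name is illustrative rather than part of Omnia:

```python
from urllib.parse import urlencode

def loki_query_url(query, base="http://localhost:3100", limit=100):
    """Build a URL for Loki's query_range endpoint.

    `loki_query_url` is an illustrative helper, not part of Omnia;
    the default base assumes Loki's standard port 3100.
    """
    params = {"query": query, "limit": limit}
    return f"{base}/loki/api/v1/query_range?{urlencode(params)}"

# Filter logs from the Omnia job that contain the word "error"
url = loki_query_url('{job="Omnia"} |= "error"')
print(url)
```

The resulting URL can be fetched with any HTTP client (e.g., `curl "$URL"`) against a running Loki instance.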
+
+## Viewing Logs on the Dashboard
+
+All log files can be viewed via the Dashboard tab (![Dashboard Icon](../Telemetry_Visualization/Images/DashBoardIcon.PNG)). The Default Dashboard displays `omnia.log` and `syslog`. Custom dashboards can be created per user requirements.
+
+Below is a list of all logs available to Loki that can be accessed on the dashboard:
+
+| Name               | Location                                  | Purpose                      | Additional Information                                                                             |
+|--------------------|-------------------------------------------|------------------------------|----------------------------------------------------------------------------------------------------|
+| Omnia Logs         | /var/log/omnia.log                        | Omnia Log                    | This log is configured by Default                                                                  |
+| syslogs            | /var/log/messages                         | System Logging               | This log is configured by Default                                                                  |
+| Audit Logs         | /var/log/audit/audit.log                  | All Login Attempts           | This log is configured by Default                                                                  |
+| CRON logs          | /var/log/cron                             | CRON Job Logging             | This log is configured by Default                                                                  |
+| Pods logs          | `/var/log/pods/*/*/*.log`                 | k8s pods                     | This log is configured by Default                                                                  |
+| Access Logs        | /var/log/dirsrv/slapd-<Realm Name>/access | Directory Server Utilization | This log is available when FreeIPA or 389ds is set up (i.e. when `enable_security_support` is set to 'true') |
+| Error Log          | /var/log/dirsrv/slapd-<Realm Name>/errors | Directory Server Errors      | This log is available when FreeIPA or 389ds is set up (i.e. when `enable_security_support` is set to 'true') |
+| CA Transaction Log | /var/log/pki/pki-tomcat/ca/transactions   | FreeIPA PKI Transactions     | This log is available when FreeIPA or 389ds is set up (i.e. when `enable_security_support` is set to 'true') |
+| KRB5KDC            | /var/log/krb5kdc.log                      | KDC Utilization              | This log is available when FreeIPA or 389ds is set up (i.e. when `enable_security_support` is set to 'true') |
+| Secure logs        | /var/log/secure                           | Login Error Codes            | This log is available when FreeIPA or 389ds is set up (i.e. when `enable_security_support` is set to 'true') |
+| HTTPD logs         | /var/log/httpd/*                          | FreeIPA API Call             | This log is available when FreeIPA or 389ds is set up (i.e. when `enable_security_support` is set to 'true') |
+| DNF logs           | /var/log/dnf.log                          | Installation Logs            | This log is configured on Rocky OS                                                                 |
+| Zypper Logs        | /var/log/zypper.log                       | Installation Logs            | This log is configured on Leap OS                                                                  |
+
+
+
+

File diff too large to display
+ 2 - 38
docs/Security/ENABLE_SECURITY_LOGIN_NODE.md


docs/Security/LOGIN_USER_CREATION.md → docs/Security/FreeIPA_User_Creation.md


+ 11 - 0
docs/Support_Matrix/Hardware/Servers.md

@@ -0,0 +1,11 @@
+# Servers Supported By Omnia
+For information on how to configure your servers, [click here.](../../Device_Configuration/Servers.md)
+
+## PowerEdge Servers
+
+| Server Type 	| Server Model                                                                	|
+|-------------	|-----------------------------------------------------------------------------	|
+| 14g         	| C4140 C6420 R240 R340 R440 R540 R640  R740 R740xd R740xd2  R840 R940 R940xa 	|
+| 15g         	| C6520  R650    R750 R750xa                                                  	|
+>> __Note:__ Support for 15g servers began after Omnia 1.2.
+

+ 7 - 0
docs/Support_Matrix/Hardware/Storage.md

@@ -0,0 +1,7 @@
+# Storage Supported by Omnia
+For information on how to configure your storage systems, [click here.](../../Device_Configuration/PowerVault.md)
+## PowerVault Storage
+
+| Storage Type 	| Storage Model                     	|
+|--------------	|-----------------------------------	|
+| ME4          	|  ME4084       ME4024       ME4012 	|

+ 13 - 0
docs/Support_Matrix/Hardware/Switches.md

@@ -0,0 +1,13 @@
+# Switches Supported by Omnia
+
+## Networking Switches
+For information on how to configure your Ethernet switches, [click here.](../../Device_Configuration/Ethernet_Switches.md)
+
+| Switch Type                    	| Switch Model                               	|
+|--------------------------------	|--------------------------------------------	|
+| Dell EMC   Networking Switches 	| PowerSwitch S3048-ON PowerSwitch S5232F-ON 	|
+
+## Infiniband Switches
+For information on how to configure your Infiniband switches, [click here.](../../Device_Configuration/Infiniband_Switches.md)
+
+| Switch Type                    	| Switch Model                                               	|
+|--------------------------------	|-------------------------------------------------------------	|
+| Mellanox   InfiniBand Switches 	| NVIDIA MQM8700-HS2F Quantum HDR InfiniBand Switch 40 QSFP56 	|

+ 65 - 0
docs/Support_Matrix/Software/Additional_Software.md

@@ -0,0 +1,65 @@
+# Software Deployed by Omnia
+
+| Software	                                  	| 	License	                                                                    | 	Compatible Version	                            | 	Description                                                                                                                                                 |
+|-------------------------------------------	|-----------------------------------------------------------------------------	|-------------------------------------------------	|--------------------------------------------------------------------------------------------------------------------------------------------------------------	|
+| LeapOS 15.3	                               	| 	-	                                                                        | 	15.x                                            | 	Operating system on entire cluster                                                                                                                          |
+| CentOS Linux release 7.9.2009 (Core)	      	| 	-	                                                                        | 	7.9	                                            | 	Operating system on entire cluster except for control plane                                                                                            |
+| Rocky 8.x	                                 	| 	-	                                                                        | 	8.x	                                            | 	Operating system on entire cluster except for control plane                                                                                            |
+| Rocky 8.x	                                 	| 	-	                                                                        | 	8.x	                                            | 	Operating system on the control plane                                                                                                                  |
+| MariaDB	                                   	| 	GPL 2.0	                                                                    | 	5.5.68	                                        | 	Relational database used by Slurm                                                                                                                           |
+| Slurm	                                     	| 	GNU General Public	                                                        | 	20.11.7	                                        | 	HPC Workload Manager                                                                                                                                        |
+| Docker CE	                                 	| 	Apache-2.0	                                                                | 	20.10.2	                                        | 	Docker Service                                                                                                                                              |
+| FreeIPA	                                   	| 	GNU General Public License v3	                                            | 	4.6.8	                                        | 	Authentication system used in the login node                                                                                                                |
+| OpenSM	                                    | 	GNU General Public License 2	                                            | 	3.3.24	                                        | 	-                                                                                                                                                           |
+| NVIDIA container runtime	                  	| 	Apache-2.0	                                                                | 	3.4.2	                                        | 	Nvidia container runtime library                                                                                                                            |
+| Python PIP	                                | 	MIT License	                                                                | 	21.1.2	                                        | 	Python Package                                                                                                                                              |
+| Python3	                                   	| 	-	                                                                        | 	3.6.8 (3.6.15 if LeapOS is being used)	        | 	-                                                                                                                                                           |
+| Kubelet	                                   	| 	Apache-2.0	                                                                | 	1.16.7,1.19, 1.21  	                            | 	Provides external, versioned ComponentConfig API types for configuring   the kubelet                                                                        |
+| Kubeadm	                                   	| 	Apache-2.0	                                                                | 	1.16.7,1.19, 1.21 	                            | 	"fast paths" for creating Kubernetes clusters                                                                                                               |
+| Kubectl	                                   	| 	Apache-2.0	                                                                | 	1.16.7,1.19, 1.21 	                            | 	Command line tool for Kubernetes                                                                                                                            |
+| kubernetes.core	                           	| 	GPL 3.0	                                                                    | 	2.2.3 	                                        | 	Performs CRUD operations on K8s objects                                                                                                                     |
+| JupyterHub	                                | 	Modified BSD License	                                                    | 	1.1.0	                                        | 	Multi-user hub                                                                                                                                              |
+| kubernetes Controllers	                    | 	Apache-2.0	                                                                | 	1.16.7,1.19 (1.21 if LeapOS is being used)	    | 	Orchestration tool	                                                                                                                                        |
+| Kfctl	                                     	| 	Apache-2.0	                                                                | 	1.0.2	                                        | 	CLI for deploying and managing Kubeflow                                                                                                                     |
+| Kubeflow	                                  	| 	Apache-2.0	                                                                | 	1	                                            | 	Cloud Native platform for machine learning                                                                                                                  |
+| Helm	                                      	| 	Apache-2.0	                                                                | 	3.5.0	                                        | 	Kubernetes Package Manager                                                                                                                                  |
+| Helm Chart	                                | 	-	                                                                        | 	0.9.0	                                        | 	-                                                                                                                                                           |
+| TensorFlow	                                | 	Apache-2.0	                                                                | 	2.1.0	                                        | 	Machine Learning framework                                                                                                                                  |
+| Horovod	                                   	| 	Apache-2.0	                                                                | 	0.21.1	                                        | 	Distributed deep learning training framework for Tensorflow                                                                                                 |
+| MPI	                                       	| 	Copyright (c) 2018-2019 Triad National Security,LLC. All rights   reserved.	| 	0.3.0	                                        | 	HPC library                                                                                                                                                 |
+| CoreDNS	                                   	| 	Apache-2.0	                                                                | 	1.6.2	                                        | 	DNS server that chains plugins                                                                                                                              |
+| CNI	                                       	| 	Apache-2.0	                                                                | 	0.3.1	                                        | 	Networking for Linux containers                                                                                                                             |
+| AWX	                                       	| 	Apache-2.0	                                                                | 	20.0.0	                                        | 	Web-based User Interface                                                                                                                                    |
+| AWX.AWX	                                   	| 	Apache-2.0	                                                                | 	19.4.0	                                        | 	Galaxy collection to perform awx configuration                                                                                                              |
+| AWXkit	                                    | 	Apache-2.0	                                                                | 	18.0.0	                                        | 	To perform configuration through CLI commands                                                                                                               |
+| CRI-O	                                     	| 	Apache-2.0	                                                                | 	1.21, 1.22.0  									| 	Container Service                                                                                                                                           |
+| Buildah	                                   	| 	Apache-2.0	                                                                | 	1.22.4	                                        | 	Tool to build and run containers                                                                                                                            |
+| PostgreSQL	                                | 	Copyright (c) 1996-2020, PostgreSQL Global Development Group	            | 	10.15	                                        | 	Database Management System                                                                                                                                  |
+| Redis	                                     	| 	BSD-3-Clause License	                                                    | 	6.0.10	                                        | 	In-memory database                                                                                                                                          |
+| NGINX	                                     	| 	BSD-2-Clause License	                                                    | 	1.14	                                        | 	-                                                                                                                                                           |
+| dellemc.os10	                              	| 	GNU-General Public License v3.1	                                            | 	1.1.1	                                        | 	It provides networking hardware abstraction through a common set of APIs                                                                                    |
+| grafana	                                   	| 	Apache-2.0	                                                                | 	8.3.2	                                        | 	Grafana is the open source analytics & monitoring solution for every   database.                                                                            |
+| community.grafana	                         	| 	GPL 3.0	                                                                    | 	1.3.0	                                        | 	Technical Support for open source grafana                                                                                                                   |
+| OMSDK	                                     	| 	Apache-2.0	                                                                | 	1.2.488	                                        | 	Dell EMC OpenManage Python SDK (OMSDK) is a python library that helps   developers and customers to automate the lifecycle management of PowerEdge   Servers|
+| activemq	                                  	| 	Apache-2.0	                                                                | 	5.10.0	                                        | 	Most popular multi protocol, message broker                                                                                                                 |
+|  Loki                                     	|  Apache License 2.0                                                         	|  2.4.1                                          	|  Loki is a log aggregation system designed to store and query logs from all your applications and infrastructure                                             	|
+|  Promtail                                 	|  Apache License 2.0                                                         	|  2.4.1                                          	|  Promtail is an agent which ships the contents of local logs to a private Grafana Loki instance or Grafana Cloud.                                            	|
+|  kube-prometheus-stack                    	|  Apache License 2.0                                                         	|  25.0.0                                         	|  Kube Prometheus Stack is a collection of Kubernetes manifests, Grafana dashboards, and Prometheus rules.                                                    	|
+|  mailx                                    	|  MIT License                                                                	|  12.5                                           	|  mailx is a Unix utility program   for sending and receiving   mail.                                                                                         	|
+|  postfix                                  	|  IBM Public License                                                         	|  3.5.8                                          	|  Mail Transfer Agent (MTA) designed   to determine routes and   send   emails                                                                                	|
+|  xorriso                                  	|  GPL version 3                                                              	|  1.4.8                                          	|  xorriso copies file objects from   POSIX compliant filesystems   into Rock   Ridge enhanced ISO 9660 filesystems.                                           	|
+|  Dell EMC OpenManage Ansible Modules      	|  GNU General Public License v3.0                                            	|  5.0.0                                          	|  OpenManage Ansible Modules simplifies and automates provisioning, deployment, and updates of PowerEdge servers and modular infrastructure.                  	|
+|  389-ds                                   	|  GPL version 3                                                              	|  1.4.4                                          	|   LDAP server used for   authentication, access control.                                                                                                     	|
+|  sssd                                     	|  GPL version 3                                                              	|  1.16.1                                         	|  A set of daemons used to manage   access to remote directory services and authentication mechanisms.                                                        	|
+|  krb5                                     	|  MIT License                                                                	|  1.19.2                                         	|  Authentication protocol providing   strong authentication for client/server applications by using secret-key   cryptography                                 	|
+|  openshift                                	|  Apache 2.0                                                                 	|  0.12.1                                         	|  An on-premises platform as a service built around Linux containers orchestrated and managed by Kubernetes                                                   	|
+| golang                                    	| BSD-3-Clause License                                                        	| 1.17                                            	| Go is a statically typed, compiled programming language designed at   Google                                                                                 	|
+| mysql                                     	| GPL 2.0                                                                     	| 8                                               	| MySQL is an open-source relational database management system.                                                                                               	|
+| PostgreSQL                                	| PostgreSQL License                                                          	| 12                                              	| PostgreSQL, also known as Postgres, is a free and open-source relational   database management system emphasizing extensibility and SQL compliance.          	|
+| idrac-telemetry-reference tools           	| Apache-2.0                                                                  	| 0.1                                             	| Reference toolset for PowerEdge telemetry metric collection and   integration with analytics and visualization solutions.                                    	|
+| jansson                                   	| MIT License                                                                 	| 2.14                                            	| C library for encoding, decoding and manipulating JSON data                                                                                                  	|
+| libjwt                                    	| MPL-2.0 License                                                             	| 1.13.0                                          	| JWT C Library                                                                                                                                                	|
+| apparmor                                  	| GNU General Public License                                                  	| 3.0.3                                           	| Controls access based on paths of the program files                                                                                                          	|
+| nsfcac/grafana-plugin                     	| Apache-2.0                                                                  	| 2.1.0                                           	| Machine Learning Framework                                                                                                                                   	|
+| snoopy                                    	| GPL 2.0                                                                     	| 2.4.15                                          	| Snoopy is a small library that logs all program executions on your   Linux/BSD system                                                                        	|

+ 5 - 0
docs/Support_Matrix/Software/Operating_Systems/CentOS.md

@@ -0,0 +1,5 @@
+# CentOS
+
+| OS Version     	| Control Plane 	| Compute Nodes 	|
+|----------------	|--------------------	|---------------	|
+| 7.9 Minimal OS 	| No                 	| Yes           	|

+ 6 - 0
docs/Support_Matrix/Software/Operating_Systems/LeapOS.md

@@ -0,0 +1,6 @@
+# LeapOS 
+| OS Version     	| Control Plane 	| Compute Nodes 	|
+|----------------	|--------------------	|---------------	|
+| 15.3             	| No                 	| Yes           	|
+
+We also support SUSE Linux Enterprise (the licensed version of Leap) version 15.3 and above.

+ 6 - 0
docs/Support_Matrix/Software/Operating_Systems/RHEL.md

@@ -0,0 +1,6 @@
+# Red Hat Enterprise Linux
+
+| OS Version 	| Control Plane 	| Compute Nodes 	|
+|------------	|--------------------	|---------------	|
+| 8.4        	| Yes           	| Yes           	|
+| 7          	| No            	| No            	|

+ 5 - 0
docs/Support_Matrix/Software/Operating_Systems/Rocky.md

@@ -0,0 +1,5 @@
+# Rocky
+| OS Version     	| Control Plane 	| Compute Nodes 	|
+|----------------	|--------------------	|---------------	|
+| 8.4            	| Yes                 	| Yes           	|
+| 8.5            	| Yes                 	| Yes           	|

+ 0 - 11
docs/Telemetry_Visualization/VISUALIZATION.md

@@ -16,14 +16,3 @@ Using the following graphs, data can be visualized to gather correlational infor
 
 >> __Note:__ The timestamps used for the time metric are based on the `timezone` set in `control_plane/input_params/base_vars.yml`.  In the event of a mismatch between the timezone on the browser being used to access Grafana UI and the timezone in `base_vars.yml`, the time range being used to filter information on the Grafana UI will have to be adjusted per the timezone in `base_vars.yml`.
 
-### The Multi-factor Visualization Dashboard
-The Multi-factor Visualization Dashboard has 4 interactive visualization panels that allow you to see all the graphs mentioned above in a single view.
-![Multi Factor Visualization Dashboard](Images/MultiFactorVisualizationDashboard.png)
-
-Using the Node and User dropdowns on the left, nodes and users can be filtered to collect data within a given time-frame (Select the time frame on the top-right of the view).
-![Multi Factor Visualization ](Images/MultiFactorVisualizationDashboard_Filter.png)
-
-To interact with a specific panel, click on the __Panel Name__ and then select the __View__ option from the dropdown menu.
-![img.png](Images/MultiFactorVisualizationDashboard_Interact.png)
-
-

+ 84 - 4
docs/FAQ.md

@@ -1,5 +1,73 @@
 # Frequently Asked Questions
 
+## What to do when hosts do not show on the AWX UI?  
+Resolution: 
+* Verify that the `provisioned_hosts.yml` file is present in the `omnia/control_plane/roles/collect_node_info/files/` folder.
+* Verify whether the hosts are listed in the `provisioned_hosts.yml` file.
+* If hosts are not listed, the servers have not been PXE booted yet.
+* If hosts are listed, an IP address has been assigned to them by DHCP. However, hosts are not displayed on the AWX UI because the PXE boot is still in progress or has not been initiated.
+* Check for reachable and unreachable hosts using the `provision_report.yml` tool present in the `omnia/control_plane/tools` folder. To run `provision_report.yml`, from the `omnia/control_plane/` directory, run `ansible-playbook -i roles/collect_node_info/files/provisioned_hosts.yml tools/provision_report.yml`.
+
+## Why do Kubernetes Pods show `ImagePullBackOff` or `ErrImagePull` errors in their status?
+Potential Cause:
+    * The errors occur when the Docker pull limit is exceeded.
+Resolution:
+    * For `omnia.yml` and `control_plane.yml`: Provide the username and password for the Docker Hub account in the `omnia_config.yml` file and execute the playbook.
+    * For HPC clusters, during `omnia.yml` execution, a Kubernetes secret named 'dockerregcred' is created in the default namespace and patched to the service account. Users need to patch this secret into their respective namespace while deploying custom applications, and use it as `imagePullSecrets` in the YAML file to avoid `ErrImagePull`. [Click here for more info](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/)
+> __Note:__ If the playbook has already been executed and the pods are in the __ImagePullBackOff__ state, run `kubeadm reset -f` on all the nodes before re-executing the playbook with the Docker credentials.
+
+## What to do after a reboot if kubectl commands return: `The connection to the server head_node_ip:port was refused - did you specify the right host or port?`
+  On the control plane or the manager node, run the following commands:
+    * `swapoff -a`
+    * `systemctl restart kubelet`
+
+## How to clear up the configuration if `control_plane.yml` fails at the webui_awx stage?
+  In the `webui_awx/files` directory, delete the `.tower_cli.cfg` and `.tower_vault_key` files, and then re-run `control_plane.yml`.
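A minimal sketch of that cleanup, assuming the files live in `omnia/control_plane/roles/webui_awx/files` (the helper name is ours, not part of Omnia):

```shell
# Hypothetical helper: remove stale AWX CLI config files before re-running
# control_plane.yml. Pass the webui_awx/files directory as the argument.
cleanup_awx_config() {
  local config_dir="${1:?usage: cleanup_awx_config <webui_awx/files dir>}"
  local f
  for f in .tower_cli.cfg .tower_vault_key; do
    if [ -f "$config_dir/$f" ]; then
      rm -f "$config_dir/$f"
      echo "removed $config_dir/$f"
    fi
  done
}
```

Usage: `cleanup_awx_config omnia/control_plane/roles/webui_awx/files`, then re-run `control_plane.yml`.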
+
+## Why would FreeIPA server/client installation fail?  
+Potential Cause:
+    The hostnames of the manager and login nodes are not set in the correct format.  
+Resolution:
+    If you have enabled the option to install the login node in the cluster, set the hostnames of the nodes in the format: *hostname.domainname*. For example, *manager.omnia.test* is a valid hostname for the login node. **Note**: To find the cause of a FreeIPA server or client installation failure, see */var/log/ipaserver-install.log* on the manager node or */var/log/ipaclient-install.log* on the login node.
+
+## Why are inventory details not updated in AWX?
+Potential Cause:
+    The provided device credentials may be invalid.
+Resolution:
+    Manually validate/update the relevant login information on the AWX settings screen and re-run `device_inventory_job`. Optionally, wait 24 hours for the scheduled inventory job to run.
+
+## Why is the host list empty when executing `control_plane.yml`?
+Hosts that are not in DHCP mode do not get populated in the host list when `control_plane.yml` is run.
+
+## Why does the task 'Install Packages' fail on the NFS node with the message: `Failure in talking to yum: Cannot find a valid baseurl for repo: base/7/x86_64.`?  
+Potential Cause:
+    There are connections missing on the NFS node.  
+Resolution:
+    Ensure that three NICs are being used on the NFS node:
+        1. For provisioning the OS
+        2. For connecting to the internet (management purposes)
+        3. For connecting to PowerVault (data connection)
+
+## What to do if AWX jobs fail with `Error creating pod: container failed to start, ImagePullBackOff`?
+Potential Cause:<br>
+ The AWX execution environment image was deleted after running `control_plane.yml`.<br>
+Resolution:<br>
+    Run the following commands:<br>
+    1. `cd omnia/control_plane/roles/webui_awx/files`
+    2. `buildah bud -t custom-awx-ee awx_ee.yml`
+
+## Why does the task 'control_plane_common: Setting Metric' fail?
+Potential Cause:
+    The device name and connection name listed by the network manager in `/etc/sysconfig/network-scripts/ifcfg-<nic name>` do not match.
+
+Resolution:
+1. Use `nmcli connection` to list all available connections and their attributes.<br>
+    _Expected Output:_<br>
+    ![NMCLI Expected Output](../images/nmcli_output.jpg)
+2. For any connections that have mismatched connection and device names, edit the file `/etc/sysconfig/network-scripts/ifcfg-<nic name>` using the vi editor.
+
+## Are hosts automatically cleaned up from the AWX UI when re-deploying the cluster?
+No. Before re-deploying the cluster, users have to manually delete all hosts from the AWX UI.
 
 ## Why is the error "Wait for AWX UI to be up" displayed when `control_plane.yml` fails?  
 Potential Causes: 
@@ -115,7 +183,7 @@ Resolution:
 
 ## What to do if jobs hang in 'pending' state on the AWX UI:
 
-Run `kubectl rollout restart deployment awx -n awx` from the management station and try to re-run the job.
+Run `kubectl rollout restart deployment awx -n awx` from the control plane and try to re-run the job.
 
 If the above solution **doesn't work**,
 1. Delete all the inventories, groups and organization from AWX UI.
@@ -123,6 +191,18 @@ If the above solution **doesn't work**,
 3. Delete the file: `omnia/control_plane/roles/webui_awx/files/.tower_cli.cfg`.
 4. Re-run *control_plane.yml*.
   
+## What to do after a control plane reboot?
+1. Once the control plane reboots, wait for 10-15 minutes to allow all k8s pods and services to come up. This can be verified using:
+* `kubectl get pods --all-namespaces`
+2. If the pods do not come up, check `/var/log/omnia.log` for more information.
+3. Cobbler profiles are not persistent across reboots: only the latest profile, based on the values of `provision_os` and `iso_file_path` in `base_vars.yml`, will be available post-reboot. Re-run `control_plane.yml` with the other values of `provision_os` and `iso_file_path` to restore the remaining profiles.
+4. Devices that have had their IP assigned dynamically via DHCP may be assigned new IPs. This in turn can cause duplicate entries for the same device on AWX and inconsistent cluster inventories.
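The wait in step 1 above can be scripted as a simple poll. This is a sketch only: the helper name, timeout, and interval are our own choices, not part of Omnia; it assumes `kubectl` is configured on the control plane.

```shell
# Sketch (hypothetical helper): poll until every pod is Running or Completed,
# or give up after a timeout. Not part of Omnia.
wait_for_pods() {
  local timeout="${1:-900}" interval=15 elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # STATUS is column 4 of `kubectl get pods -A --no-headers` output.
    local pending
    pending=$(kubectl get pods --all-namespaces --no-headers 2>/dev/null \
      | awk '$4 != "Running" && $4 != "Completed"' | wc -l)
    if [ "$pending" -eq 0 ]; then
      echo "all pods up"
      return 0
    fi
    sleep "$interval"; elapsed=$((elapsed + interval))
  done
  echo "timed out; check /var/log/omnia.log" >&2
  return 1
}
```

If the poll times out, fall back to step 2 and inspect `/var/log/omnia.log`.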
+
+## How to clear existing DHCP leases after a management NIC IP change?
+If `device_config_support` is set to TRUE,
+1. Reboot the ethernet TOR (Top of the Rack) switches in your environment.
+2. If the leases aren't cleared, reboot the devices that have not registered the new IP.
+If `device_config_support` is set to FALSE, no reboots are required.
 
 ## Why is permission denied when executing the `idrac.yml` file or other .yml files from AWX?
 Potential Cause: The "PermissionError: [Errno 13] Permission denied" error is displayed if you have used the ansible-vault decrypt or encrypt commands.  
@@ -137,7 +217,7 @@ It is recommended that the ansible-vault view or edit commands are used and not
 * Launch iDRAC template.
 
 ## What to do if the network CIDR entry of iDRAC IP in /etc/exports file is missing:
-* Add an additional network CIDR range of idrac IPs in the */etc/exports* file if the iDRAC IP is not in the management network range provided in base_vars.yml.
+* Add an additional network CIDR range of iDRAC IPs in the */etc/exports* file if the iDRAC IP is not in the management network range provided in base_vars.yml.
 
 ## What to do if a custom ISO file is not present on the device:
 * Re-run the *control_plane.yml* file.
@@ -199,8 +279,8 @@ Potential Cause: Your Docker pull limit has been exceeded. For more information,
 ## Can Cobbler deploy both Rocky and CentOS at the same time?
 No. During Cobbler based deployment, only one OS is supported at a time. If the user would like to deploy both, please deploy one first, **unmount `/mnt/iso`** and then re-run cobbler for the second OS.
 
-## Why do Firmware Updates fail for some components with Omnia 1.1.1?
-Due to the latest `catalog.xml` file, Firmware updates fail for some components on server models R640 and R740. Omnia execution doesn't get interrupted but an error gets logged. For now, please download those individual updates manually.
+## Why do Firmware Updates fail for some components with Omnia?
+Due to the latest `catalog.xml` file, Firmware updates may fail for certain components. Omnia execution doesn't get interrupted but an error gets logged on AWX. For now, please download those individual updates manually.
 
 ## Why does the Task [network_ib : Authentication failure response] fail with the message 'Status code was -1 and not [302]: Request failed: <urlopen error [Errno 111] Connection refused>' on Infiniband Switches when running `infiniband.yml`?
 To configure a new Infiniband Switch, it is required that HTTP and JSON gateway be enabled. To verify that they are enabled, run:

+ 59 - 0
docs/Troubleshooting/Troubleshooting_Guide.md

@@ -0,0 +1,59 @@
+# Logs Used for Troubleshooting
+
+1. /var/log (Control Plane)
+
+All log files can be viewed via the Dashboard tab (![Dashboard Icon](../Telemetry_Visualization/Images/DashBoardIcon.PNG)). The Default Dashboard displays `omnia.log` and `syslog`. Custom dashboards can be created per user requirements.
+
+Below is a list of all logs available to Loki that can be accessed on the dashboard:
+
+| Name               | Location                                  | Purpose                      | Additional Information                                                                                       |
+|--------------------|-------------------------------------------|------------------------------|--------------------------------------------------------------------------------------------------------------|
+| Omnia Logs         | /var/log/omnia.log                        | Omnia Log                    | This log is configured by default. It can be used to track all changes made by Omnia.                        |
+| syslogs            | /var/log/messages                         | System Logging               | This log is configured by default.                                                                           |
+| Audit Logs         | /var/log/audit/audit.log                  | All Login Attempts           | This log is configured by default.                                                                           |
+| CRON logs          | /var/log/cron                             | CRON Job Logging             | This log is configured by default.                                                                           |
+| Pods logs          | `/var/log/pods/*/*/*.log`                 | k8s pods                     | This log is configured by default.                                                                           |
+| Access Logs        | /var/log/dirsrv/slapd-<Realm Name>/access | Directory Server Utilization | This log is available when FreeIPA or 389ds is set up (i.e., when `enable_security_support` is set to 'true'). |
+| Error Log          | /var/log/dirsrv/slapd-<Realm Name>/errors | Directory Server Errors      | This log is available when FreeIPA or 389ds is set up (i.e., when `enable_security_support` is set to 'true'). |
+| CA Transaction Log | /var/log/pki/pki-tomcat/ca/transactions   | FreeIPA PKI Transactions     | This log is available when FreeIPA or 389ds is set up (i.e., when `enable_security_support` is set to 'true'). |
+| KRB5KDC            | /var/log/krb5kdc.log                      | KDC Utilization              | This log is available when FreeIPA or 389ds is set up (i.e., when `enable_security_support` is set to 'true'). |
+| Secure logs        | /var/log/secure                           | Login Error Codes            | This log is available when FreeIPA or 389ds is set up (i.e., when `enable_security_support` is set to 'true'). |
+| HTTPD logs         | `/var/log/httpd/*`                        | FreeIPA API Calls            | This log is available when FreeIPA or 389ds is set up (i.e., when `enable_security_support` is set to 'true'). |
+| DNF logs           | /var/log/dnf.log                          | Installation Logs            | This log is configured on Rocky OS.                                                                          |
+| Zypper Logs        | /var/log/zypper.log                       | Installation Logs            | This log is configured on Leap OS.                                                                           |
+
+
+2. Checking logs of individual containers:
+   1. A list of namespaces and their corresponding pods can be obtained using:
+      `kubectl get pods -A`
+   2. Get a list of containers for the pod in question using:
+      `kubectl get pod <pod_name> -n <namespace> -o jsonpath='{.spec.containers[*].name}'`
+   3. Once you have the namespace, pod and container names, run the below command to get the required logs:
+      `kubectl logs <pod_name> -n <namespace> -c <container_name>`
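The three steps above can be folded into one helper. This is a sketch: the function name is ours, and it assumes `kubectl` is already configured for the target cluster.

```shell
# Hypothetical helper: print the logs of every container in a pod.
dump_pod_logs() {
  local pod="$1" ns="${2:-default}"
  local containers c
  # Container names come from the pod spec, space-separated.
  containers=$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[*].name}')
  for c in $containers; do
    echo "----- logs: $pod/$c ($ns) -----"
    kubectl logs "$pod" -n "$ns" -c "$c"
  done
}
```

Usage: `dump_pod_logs timescaledb-0 telemetry-and-visualizations`.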
+
+
+3. Connecting to internal databases:
+* TimescaleDB
+	* Go inside the pod: `kubectl exec -it pod/timescaledb-0 -n telemetry-and-visualizations -- /bin/bash`
+	* Connect to psql: `psql -U <postgres_username>`
+	* Connect to the database: `\c <timescaledb_name>`
+* MySQL DB
+	* Go inside the pod: `kubectl exec -it pod/mysqldb-0 -n telemetry-and-visualizations -- /bin/bash`
+	* Connect to MySQL: `mysql -u <mysqldb_username> -p` (enter `<mysqldb_password>` when prompted)
+	* Connect to the database: `USE <mysqldb_name>;`
+
+4. Checking and updating encrypted parameters:
+   1. Move to the directory where the parameters are saved (as an example, we will use `login_vars.yml`):
+      `cd control_plane/input_params`
+   2. To view the encrypted parameters:
+      `ansible-vault view login_vars.yml --vault-password-file .login_vault_key`
+   3. To edit the encrypted parameters:
+      `ansible-vault edit login_vars.yml --vault-password-file .login_vault_key`
+5. Checking pod status on the control plane
+    * Select the pod you need to troubleshoot from the output of `kubectl get pods -A`
+    * Check the status of the pod by running `kubectl describe pod <pod name> -n <namespace name>`
+
+
+
+
+

File diff suppressed because it is too large
+ 0 - 70
docs/control_plane/device_templates/CONFIGURE_INFINIBAND_SWITCHES.md


File diff suppressed because it is too large
+ 0 - 43
docs/control_plane/device_templates/CONFIGURE_POWERSWITCHES.md


File diff suppressed because it is too large
+ 0 - 42
docs/control_plane/device_templates/CONFIGURE_POWERVAULT_STORAGE.md


+ 0 - 146
docs/control_plane/device_templates/PROVISION_SERVERS.md

@@ -1,146 +0,0 @@
-# Custom ISO provisioning on Dell EMC PowerEdge Servers
-
-## Update the input parameters
-
-Edit the following files under the `control_plane/input_params` directory to provide the required input parameters.
-1. Edit the `login_vars.yml` file to enter the following details:  
-	a. `provision_password`- password used while provisioning OS on bare metal servers.  
-	b. `cobbler_password`- password for Cobbler.    
-	c. `idrac_username` and `idrac_password`- iDRAC username and password.   
-	**NOTE**: Minimum length of the password must be at least eight characters and a maximum of 30 characters. Do not use these characters while entering a password: -, \\, "", and \'
-2. Edit the following variables in the `idrac_vars.yml` file.  
-
-	File name	|	Variables</br> [Required/ Optional]	|	Default, choices	|	Description
-	-------	|	----------------	|	-----------------	|	-----------------
-	idrac_vars.yml	|	idrac_system_profile</br> [Required]	|	<ul><li>**Performance**</li> <li>PerformancePerWatt(DAPC)</li> <li>PerformancePerWatt(OS)</li> <li>WorkstationPerformance</li></ul>	|	The system profile used for BIOS configuration. 
-	<br>	|	firmware_update_required</br> [Required]	|	<ul><li>**false**</li> <li>true</li></ul>	|	By default, Omnia updates the firmware on the servers. To disable the firmware update, set the variable to "false".
-	<br>	|	poweredge_model</br> [Required if "firmware_update_required" is set to "true"]	|	<ul><li>**C6420**</li> <li>R640</li><li>R740</li><li>C4140</li> <li>And other supported PowerEdge servers</li></ul>	|	Enter the required PowerEdge server models to update the firmware. For example, enter `R640,R740,C4140` to update firmware on these models of PowerEdge servers. For a complete list of supported PowerEdge servers, see the *Hardware managed by Omnia* section in the Readme file.
-	<br>	|	uefi_secure_boot</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable the secure boot mode.
-	<br>	|	system_lockdown</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable system lockdown.
-	<br>	|	two_factor_authentication</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable the 2FA on iDRAC.</br> If enabled, update the required variables in the `idrac_tools_vars.yml` file.</br> **[WARNING]**: For the other iDRAC playbooks to run, you must manually disable 2FA by setting the *Easy 2FA State* to "Disabled" in the iDRAC settings.
-	<br>	|	ldap_directory_services</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable the LDAP directory services on iDRAC.</br> If enabled, update the required variables in the `idrac_tools_vars.yml` file.
-
-## Custom ISO file creation for Out-of-band server management
-Omnia role used to create the custom ISO: *control_plane_customiso*  
-Based on the inputs provided in the `login_vars.yml` and `base_vars.yml` files, the Kickstart file is configured and added to the custom ISO file. The *unattended_centos7.iso*, *unattended_rocky8.iso* or *unattended_leap15.iso* file is copied to an NFS share on the management station to provision the PowerEdge servers using iDRAC. 
-
-## Provisioning of PowerEdge Servers using iDRAC (Out-of-band server management)
-
-### Run idrac_template on the AWX UI.
-1. Run `kubectl get svc -n awx`.
-2. Copy the Cluster-IP address of the awx-ui. 
-3. To retrieve the AWX UI password, run `kubectl get secret awx-admin-password -n awx -o jsonpath="{.data.password}" | base64 --decode`.
-4. Open the default web browser on the management station and enter `http://<IP>:8052`, where IP is the awx-ui IP address and 8052 is the awx-ui port number. Log in to the AWX UI using the username as `admin` and the retrieved password.  
-5. Under __RESOURCES__ -> __Templates__, launch the **idrac_template**.
-
-Omnia role used to provision custom ISO on PowerEdge Servers using iDRAC: *provision_idrac*  
-
-For the `idrac.yml` file to successfully provision the custom ISO on the PowerEdge Servers, ensure that the following prerequisites are met:
-* The **idrac_inventory** file is updated with the iDRAC IP addresses.
-* Required input parameters are updated in the **idrac_vars.yml** file under **omnia/control_plane/input_params** directory.
-* An *unattended_centos7.iso*, *unattended_rocky8.iso* or *unattended_leap15.iso* file is available in an NFS path.
-* The Lifecycle Controller Remote Services of PowerEdge Servers is in the 'ready' state.
-* The Redfish services are enabled in the iDRAC settings under **Services**.
-* The PowerEdge Servers have the iDRAC Enterprise or Datacenter license. If the license is not found, servers will be PXE booted and provisioned using Cobbler.  
-* If `provision_method` is set to PXE in `base_vars.yml`, ensure that all PXE devices have a configured, active NIC. To verify/ configure NIC availability: On the server, go to `BIOS Setup -> Network Settings -> PXE Device`. For each listed device (typically 4), configure/ check for an active NIC under `PXE device settings`
-* iDRAC 9 based Dell EMC PowerEdge Servers with firmware versions 5.00.10.20 and above. (With the latest BIOS available)
-
-The **provision_idrac** file configures and validates the following:
-* Required input parameters and prerequisites.
-* BIOS and SNMP settings.
-* The latest available version of the iDRAC firmware is updated.
-* If bare metal servers have a RAID controller installed, Virtual disks are created for RAID configuration.
-* Availability of iDRAC Enterprise or Datacenter License on iDRAC.  
-
-After the configurations are validated, the **provision_idrac** file provisions the custom ISO on the PowerEdge Servers. After the OS is provisioned successfully, iDRAC IP addresses are updated in the *provisioned_idrac_inventory* in AWX.
-
->>**NOTE**: The `idrac.yml` file initiates the provisioning of custom ISO on the PowerEdge servers. Wait for some time for the node inventory to be updated on the AWX UI. 
-
-### Provisioning newly added PowerEdge servers in the cluster
-To provision newly added servers, wait till the iDRAC IP addresses are automatically added to the *idrac_inventory*. After the iDRAC IP addresses are added, launch the iDRAC template on the AWX UI to provision CentOS custom OS on the servers.  
-
-If you want to reprovision all the servers in the cluster or any of the faulty servers, you must remove the respective iDRAC IP addresses from *provisioned_idrac_inventory* on AWX UI and then launch the iDRAC template. If required, you can delete the *provisioned_idrac_inventory* from the AWX UI to remove the IP addresses of provisioned servers. After the servers are provisioned, *provisioned_idrac_inventory* is created and updated on the AWX UI.
-
-## OS provisioning on PowerEdge Servers using Cobbler on the host network  
-
-Omnia role used: *provision_cobbler*  
-Ports used by Cobbler:  
-* TCP ports: 69,8000, 8008
-* UDP ports: 69,4011
-
-To create the Cobbler image, Omnia configures the following:
-* Firewall settings.
-* The kickstart file of Cobbler to enable the UEFI PXE boot.
-
-To access the Cobbler dashboard, enter `https://<IP>/cobbler_web` where `<IP>` is the Global IP address of the management station. For example, enter
-`https://100.98.24.225/cobbler_web` to access the Cobbler dashboard.
-
->>__Note__: After the Cobbler Server provisions the operating system on the servers, IP addresses and hostnames are assigned by the DHCP service.  
->>* If a mapping file is not provided, the hostname to the server is provided based on the following format: **computexxx-xxx** where "xxx-xxx" is the last two octets of the Host IP address. For example, if the Host IP address is 172.17.0.11 then the assigned hostname by Omnia is compute0-11.  
->>* If a mapping file is provided, the hostnames follow the format provided in the mapping file.  
-
->>__Note__: If you want to add more nodes, append the new nodes in the existing mapping file. However, do not modify the previous nodes in the mapping file as it may impact the existing cluster.
-
->> __Note__: With the addition of Multiple profiles, the cobbler container dynamically updates the mount point based on the value of `provision_os` in `base_vars.yml`.
-
-### DHCP routing using Cobbler
-Omnia now supports DHCP routing via Cobbler. To enable routing, update the `primary_dns` and `secondary_dns` in `base_vars` with the appropriate IPs (hostnames are currently not supported). For compute nodes that are not directly connected to the internet (ie only host network is configured), this configuration allows for internet connectivity.
-
-## Security enhancements  
-Omnia provides the following options to enhance security on the provisioned PowerEdge servers:
-* **System lockdown mode**: To enable the system lockdown mode on iDRAC, set the *system_lockdown* variable to "enabled" in the `idrac_vars.yml` file.
-* **Secure boot mode**: To enable the secure boot mode on iDRAC, set the *uefi_secure_boot* variable to "enabled" in the `idrac_vars.yml` file.
-* **2-factor authentication (2FA)**: To enable the 2FA on iDRAC, set the *two_factor_authentication* variable to "enabled" in the `idrac_vars.yml` file.  
-	
-	**WARNING**: If 2FA is enabled on iDRAC, you must manually disable 2FA on iDRAC by setting the *Easy 2FA State* to "Disabled" for the user specified in the `login_vars.yml` file to run other iDRAC playbooks. 
-	
-	Before executing the **idrac_2fa.yml**, you must edit the `idrac_tools_vars.yml` by running the following command: `ansible-vault edit idrac_tools_vars.yml --vault-password-file .idrac_vault_key`.   
-	
-	Provide the following details in the **idrac_2fa.yml** file.  
-	
-	File name	|	Variables</br> [Required if two_factor_authentication is enabled/ Optional]	|	Default, choices	|	Description
-	-------	|	----------------	|	-----------------	|	-----------------
-	idrac_2fa.yml	|	dns_domain_name</br> [Required]	|		|	DNS domain name to be set for iDRAC. 
-	<br>	|	ipv4_static_dns1, ipv4_static_dns2</br> [Required] 	|		|	DNS1 and DNS2 static IPv4 addresses.
-	<br>	|	smtp_server_ip</br> [Required]	|		|	Server IP address used for SMTP.
-	<br>	|	use_email_address_2fa</br> [Required]	|		|	Email address used for enabling 2FA. After 2FA is enabled, an authentication code is sent to the provided email address. 
-	<br>	| smtp_authentication [Required]	| <ul> <li>__Disabled__</li> <li>Enabled </li> </ul> | Enable SMTP authentication 
-	<br>	|	smtp_username</br> [Optional]	|		|	Username for SMTP.
-	<br>	|	smtp_password</br> [Optional]	|		|	Password for SMTP.
-
-	>>**NOTE**: 2FA will be enabled on the iDRAC only if SMTP server details are valid and a test email notification is working using SMTP.  
-* **LDAP Directory Services**: To enable or disable the LDAP directory services, set the *ldap_directory_services* variable to "enabled" in the `idrac_vars.yml` file.  
-
-	Before executing the **idrac_ldap.yml** file, you must edit `idrac_tools_vars.yml` by running the following command: `ansible-vault edit idrac_tools_vars.yml --vault-password-file .idrac_vault_key`.  
-	
-	Provide the following values in the **idrac_ldap.yml** file.  
-
-	File name	|	Variables</br> [Required if ldap_directory_services is enabled/ Optional]	|	Default, choices	|	Description
-	-------	|	----------------	|	-----------------	|	-----------------
-	idrac_ldap.yml	|	cert_validation_enable</br> [Required]	|	<ul><li>**disabled**</li></ul>	|	This option will be disabled by default. If required, you must manually upload the CA certificate.
-	<br>	|	ldap_server_address</br> [Required] 	|		|	Server address used for LDAP.
-	<br>	|	ldap_port</br> [Required]	|	<ul><li>636</li></ul>	|	TCP port at which the LDAP server is listening for connections.
-	<br>	|	bind_dn</br> [Optional]	|		|	Distinguished Name of the node in your directory tree from which records are searched.
-	<br>	|	bind_password</br> [Optional]	|		|	Password used for "bind_dn".
-	<br>	|	base_dn</br> [Required]	|		|	Distinguished Name of the search base.
-	<br>	|	user_attribute</br> [Optional]	|		|	User attribute used for searching in LDAP server.
-	<br>	|	group_attribute</br> [Optional]	|		|	Group attribute used for searching in LDAP server.
-	<br>	|	group_attribute_is_dn</br> [Required]	|	<ul><li>**enabled**</li> <li>disabled</li></ul>	|	Specify whether the group attribute type is DN or not.
-	<br>	|	search_filter</br> [Optional]	|		|	Search scope is related to the Base DN. 
-	<br>	|	role_group1_dn</br> [Required]	|		|	DN of LDAP group to be added.
-	<br>	|	role_group1_privilege</br> [Required]	|	<ul><li>**Administrator**</li><li>Operator</li><li>ReadOnly</li></ul>	|	Privilege to LDAP role group 1.  
-	
-	To view the `idrac_tools_vars.yml` file, run the following command: `ansible-vault view idrac_tools_vars.yml --vault-password-file .idrac_vault_key`  
-	
-	>>**NOTE**: It is suggested that you use the ansible-vault view or edit commands and that you do not use the ansible-vault decrypt or encrypt commands. If you have used the ansible-vault decrypt or encrypt commands, provide 644 permission to `idrac_tools_vars.yml`.  
-
-On the AWX Dashboard, select the respective security requirement playbook and launch the iDRAC template by performing the following steps.
-1. On the AWX Dashboard, under __RESOURCES__ -> __Templates__, select the **idrac_template**.
-2. Under the **Details** tab, click **Edit**.
-3. In the **Edit Details** page, click the **Playbook** drop-down menu and select **tools/idrac_system_lockdown.yml**, **tools/idrac_secure_boot.yml**, **tools/idrac_2fa.yml**, or **tools/idrac_ldap.yml**.
-4. Click **Save**.
-5. To launch the iDRAC template with the respective playbook selected, click **Launch**.  
-
- 
-
-

File diff suppressed because it is too large
+ 0 - 37
docs/control_plane/input_parameters/INFINIBAND_SWITCHES.md


File diff suppressed because it is too large
+ 0 - 76
docs/control_plane/input_parameters/POWERSWITCHES.md


File diff suppressed because it is too large
+ 0 - 37
docs/control_plane/input_parameters/POWERVAULT_STORAGE.md


+ 0 - 25
docs/control_plane/input_parameters/PROVISION_SERVERS.md

@@ -1,25 +0,0 @@
-# Dell EMC PowerEdge Servers
-
-## Update the input parameters
-
-Edit the following files under the `control_plane/input_params` directory to provide the required input parameters.
-1. Edit the `login_vars.yml` file to enter the following details:  
-	a. `provision_password`: password used while provisioning the OS on bare metal servers.  
-	b. `cobbler_password`: password for Cobbler.    
-	c. `idrac_username` and `idrac_password`: iDRAC username and password.   
->>	**NOTE**: Passwords must be 8 to 30 characters long. Do not use the following characters in a password: -, \\, ", and '
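The password constraints in the note above can be sketched as a small validation helper. This is a hypothetical illustration of the documented rules, not code from the Omnia playbooks, which perform their own validation:

```python
# Hypothetical helper illustrating the documented password rules for
# login_vars.yml: 8 to 30 characters, and none of the characters -, \, ", or '.
FORBIDDEN_CHARS = set('-\\"\'')

def is_valid_omnia_password(password: str) -> bool:
    """Return True if the password meets the documented constraints."""
    return 8 <= len(password) <= 30 and not (set(password) & FORBIDDEN_CHARS)
```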
-2. Edit the following variables in the `idrac_vars.yml` file.  
-
-	File name	|	Variables</br> [Required/ Optional]	|	Default, choices	|	Description
-	-------	|	----------------	|	-----------------	|	-----------------
-	idrac_vars.yml	|	idrac_system_profile</br> [Required]	|	<ul><li>**Performance**</li> <li>PerformancePerWatt(DAPC)</li> <li>PerformancePerWatt(OS)</li> <li>WorkstationPerformance</li></ul>	|	The system profile used for BIOS configuration. 
-	<br>	|	firmware_update_required</br> [Required]	|	<ul><li>**false**</li> <li>true</li></ul>	|	Specifies whether Omnia updates the firmware on the servers. Firmware is not updated by default; to enable the firmware update, set the variable to "true".
-	<br>	|	poweredge_model</br> [Required if "firmware_update_required" is set to "true"]	|	<ul><li>**C6420**</li> <li>R640</li><li>R740</li><li>C4140</li> <li>And other supported PowerEdge servers</li></ul>	|	Enter the required PowerEdge server models to update the firmware. For example, enter `R640,R740,C4140` to update firmware on these models of PowerEdge servers. For a complete list of supported PowerEdge servers, see the *Hardware managed by Omnia* section in the Readme file.
-	<br>	|	uefi_secure_boot</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable the secure boot mode.
-	<br>	|	system_lockdown</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable system lockdown.
-	<br>	|	two_factor_authentication</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable the 2FA on iDRAC.</br> If enabled, update the required variables in the `idrac_tools_vars.yml` file.</br> **[WARNING]**: For the other iDRAC playbooks to run, you must manually disable 2FA by setting the *Easy 2FA State* to "Disabled" in the iDRAC settings.
-	<br>	|	ldap_directory_services</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable the LDAP directory services on iDRAC.</br> If enabled, update the required variables in the `idrac_tools_vars.yml` file.
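Putting the table together, a populated `idrac_vars.yml` might look like the sketch below. The values shown are the documented defaults, except the firmware-update pair, which is illustrative:

```yaml
# Sketch of idrac_vars.yml based on the variables described above.
idrac_system_profile: "Performance"
firmware_update_required: true        # illustrative; set to true only if firmware should be updated
poweredge_model: "R640,R740,C4140"    # required only when firmware_update_required is true
uefi_secure_boot: "disabled"
system_lockdown: "disabled"
two_factor_authentication: "disabled" # if enabled, also update idrac_tools_vars.yml
ldap_directory_services: "disabled"   # if enabled, also update idrac_tools_vars.yml
```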
-
-## Deploy Omnia Control Plane
-Before you provision the Dell EMC PowerEdge servers, you must complete the deployment of the Omnia control plane. Go to Step 8 in [Steps to deploy the Omnia Control Plane](../../INSTALL_OMNIA_CONTROL_PLANE.md#steps-to-deploy-the-omnia-control-plane) to run the `ansible-playbook control_plane.yml` command.
-

Binary
docs/images/nmcli_output.jpg