
Merge pull request #474 from cgoveas/devel

#473 : Documentation out of date
Lucas A. Wilson 3 years ago
parent
commit 72bff7b781

+ 5 - 4
docs/INSTALL_OMNIA.md

@@ -9,7 +9,7 @@ To install the Omnia control plane and manage workloads on your cluster using th
 * If you have configured the `omnia_config.yml` file to enable the login node, the login node must be part of the cluster. 
 * All nodes must be connected to the network and must have access to the Internet.
 * Set the hostnames of all the nodes in the cluster.
-	* If the login node is enabled, then set the hostnames in the format: __hostname.domainname__. For example, "manager.example.com" is a valid hostname.
+	* If the login node is enabled, then set the hostnames in the format: __hostname.domainname__. For example, "manager.omnia.test" is a valid hostname.
 	* Include the hostnames under /etc/hosts in the format: </br>*ipaddress hostname.domainname*. For example, "192.168.12.1 manager.example.com" is a valid entry.
 * SSH Keys for root are installed on all nodes to allow for password-less SSH.
 * The user should have root privileges to perform installations and configurations.
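
A minimal sketch of the `/etc/hosts` entries described in the bullets above, assuming a small cluster; the IP addresses and hostnames are placeholders to replace with your own values:

```bash
# Append cluster entries to /etc/hosts on every node (run as root).
cat >> /etc/hosts << 'EOF'
192.168.12.1 manager.omnia.test
192.168.12.2 login.omnia.test
192.168.12.3 compute01.omnia.test
EOF
```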
@@ -111,9 +111,10 @@ __Note:__
 * Supported values for Kubernetes CNI are calico and flannel. The default value of CNI considered by Omnia is calico. 
 * The default value of Kubernetes Pod Network CIDR is 10.244.0.0/16. If 10.244.0.0/16 is already in use within your network, select a different Pod Network CIDR. For more information, see __https://docs.projectcalico.org/getting-started/kubernetes/quickstart__.
 
-**NOTE**: If you want to view or edit the `omnia_config.yml` file, run the following commands:
-1. `cd input_params`
-2. `ansible-vault view omnia_config.yml --vault-password-file .vault_key` or `ansible-vault edit omnia_config.yml --vault-password-file .vault_key`.  
+**NOTE**: If you want to view or edit the `omnia_config.yml` file, run one of the following commands:  
+
+- `ansible-vault view omnia_config.yml --vault-password-file .omnia_vault_key` -- To view the file.  
+- `ansible-vault edit omnia_config.yml --vault-password-file .omnia_vault_key` -- To edit the file.  
 
 **NOTE**: It is suggested that you use the ansible-vault view or edit commands and that you do not use the ansible-vault decrypt or encrypt commands. If you have used the ansible-vault decrypt or encrypt commands, provide 644 permission to `omnia_config.yml`.  
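
A minimal sketch of the commands above, assuming they are run from the directory that contains `omnia_config.yml` and `.omnia_vault_key`:

```bash
# View the vaulted file without changing it.
ansible-vault view omnia_config.yml --vault-password-file .omnia_vault_key

# Edit the vaulted file in place (opens your default editor).
ansible-vault edit omnia_config.yml --vault-password-file .omnia_vault_key

# Only if ansible-vault decrypt/encrypt was used instead, restore the permissions noted above.
chmod 644 omnia_config.yml
```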
 

+ 9 - 5
docs/INSTALL_OMNIA_CONTROL_PLANE.md

@@ -24,7 +24,8 @@ Depending on the pass-through switch configured in your HPC environment, the num
 
 ## Prerequisites to install the Omnia Control Plane version 1.1
 * Ensure that a stable Internet connection is available on management station, manager node, login node, and compute nodes. 
-* CentOS 8.4 is installed on the management station.
+* CentOS 8.4 is installed on the management station.  
+* If the login node is enabled, then set the hostnames in the format: __hostname.domainname__. For example, "manager.omnia.test" is a valid hostname.		 
 * To provision the bare metal servers, go to http://isoredirect.centos.org/centos/7/isos/x86_64/ and download the **CentOS-7-x86_64-Minimal-2009** ISO file.
 * For DHCP configuration, you can provide a host mapping file. If the mapping file is not provided and the variable is left blank, a default mapping file will be created. The provided details must be in the format: MAC address, Hostname, IP address, Component_role. For example, `10:11:12:13,server1,100.96.20.66,compute` and  `14:15:16:17,server2,100.96.22.199,manager` are valid entries.  
 __Note:__  
@@ -146,7 +147,7 @@ Omnia creates a log file which is available at: `/var/log/omnia.log`.
 
 **NOTE**: If you want to view or edit the *login_vars.yml* file, run the following commands:
 1. `cd input_params`
-2. `ansible-vault view login_vars.yml --vault-password-file .vault_key` or `ansible-vault edit login_vars.yml --vault-password-file .vault_key`.  
+2. `ansible-vault view login_vars.yml --vault-password-file .login_vault_key` or `ansible-vault edit login_vars.yml --vault-password-file .login_vault_key`.   
 
 **NOTE**: It is suggested that you use the ansible-vault view or edit commands and that you do not use the ansible-vault decrypt or encrypt commands. If you have used the ansible-vault decrypt or encrypt commands, provide 644 permission to *login_vars.yml*.
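
A similar sketch for `login_vars.yml`, using the `.login_vault_key` password file from the `input_params` directory:

```bash
cd input_params
ansible-vault view login_vars.yml --vault-password-file .login_vault_key   # view only
ansible-vault edit login_vars.yml --vault-password-file .login_vault_key   # edit in place
chmod 644 login_vars.yml   # only needed if the file was decrypted/encrypted manually
```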
 
@@ -179,9 +180,9 @@ For Omnia to configure the devices and to provision the bare metal servers which
 
 # Assign component roles using AWX UI
 1. Run `kubectl get svc -n awx`.
-2. Copy the Cluster-IP address of the awx-service. 
+2. Copy the Cluster-IP address of the awx-ui.  
 3. To retrieve the AWX UI password, run `kubectl get secret awx-admin-password -n awx -o jsonpath="{.data.password}" | base64 --decode`.
-4. Open the default web browser on the management station and enter the awx-service IP address. Log in to the AWX UI using the username as `admin` and the retrieved password.
+4. Open the default web browser on the management station and enter `http://<IP>:8052`, where IP is the awx-ui IP address and 8052 is the awx-ui port number. Log in to the AWX UI using the username as `admin` and the retrieved password.  
 5. On the AWX dashboard, under __RESOURCES__ __->__ __Inventories__, select **node_inventory**.
 6. Select the **Hosts** tab.
 7. To add hosts to the groups, click **+**. 
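
A sketch of how the values used in steps 1 through 4 above can be collected from the command line; the Cluster-IP varies per deployment:

```bash
# Note the CLUSTER-IP reported for the awx-ui service.
kubectl get svc -n awx

# Retrieve the AWX admin password.
kubectl get secret awx-admin-password -n awx -o jsonpath="{.data.password}" | base64 --decode; echo

# Then browse to http://<awx-ui Cluster-IP>:8052 and log in as admin.
```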
@@ -219,8 +220,11 @@ To install __JupyterHub__ and __Kubeflow__ playbooks:
 __Note:__ When the Internet connectivity is unstable or slow, it may take more time to pull the images to create the Kubeflow containers. If the time limit is exceeded, the **Apply Kubeflow configurations** task may fail. To resolve this issue, you must redeploy Kubernetes cluster and reinstall Kubeflow by completing the following steps:
 1. Complete the PXE booting of the head and compute nodes.
 2. In the `omnia_config.yml` file, change the k8s_cni variable value from calico to flannel.
-3. Run the Kubernetes and Kubeflow playbooks.
+3. Run the Kubernetes and Kubeflow playbooks.  
 
+**NOTE**: If you want to view or edit the `omnia_config.yml` file, run one of the following commands:  
+- `ansible-vault view omnia_config.yml --vault-password-file .omnia_vault_key` -- To view the file.  
+- `ansible-vault edit omnia_config.yml --vault-password-file .omnia_vault_key` -- To edit the file.  
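
A minimal sketch of step 2 above, changing the CNI through the vaulted file (the exact quoting of the value may differ in your copy of `omnia_config.yml`):

```bash
ansible-vault edit omnia_config.yml --vault-password-file .omnia_vault_key
# In the editor, change:
#   k8s_cni: calico
# to:
#   k8s_cni: flannel
```
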
 ## Roles assigned to the compute and manager groups
 After **DeployOmnia** template is run from the AWX UI, the **omnia.yml** file installs Kubernetes and Slurm, or either Kubernetes or Slurm, as per the selection in the template on the management station. Additionally, appropriate roles are assigned to the compute and manager groups.
 

+ 8 - 1
docs/README.md

@@ -172,8 +172,15 @@ stp_rpvst_default_behaviour	|	boolean: false, true	|	Configures RPVST default be
 	* `systemctl restart kubelet`  
 	
 * **Issue**: If control_plane.yml fails at the webui_awx role, then the previous IP address and password are not cleared when control_plane.yml is re-run.   
-	**Resolution**: In the *webui_awx/files* directory, delete the *.tower_cli.cfg* and *.tower_vault_key* files, and then re-run `control_plane.yml`.
+	**Resolution**: In the *webui_awx/files* directory, delete the *.tower_cli.cfg* and *.tower_vault_key* files, and then re-run `control_plane.yml`, as shown in the sketch below.  
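	A sketch of the cleanup, assuming the *webui_awx* role lives under the control plane roles directory of your Omnia checkout (the path is an assumption; adjust it to match your tree):
	```bash
	# Locate the webui_awx/files directory in your checkout first.
	cd /path/to/omnia/control_plane/roles/webui_awx/files
	rm -f .tower_cli.cfg .tower_vault_key
	cd ../../..                      # back to the control_plane directory
	ansible-playbook control_plane.yml
	```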
 
+* **Issue**: The FreeIPA server and client installation fails.  
+	**Cause**: The hostnames of the manager and login nodes are not set in the correct format.  
+	**Resolution**: If you have enabled the option to install the login node in the cluster, set the hostnames of the manager and login nodes in the format: *hostname.domainname*. For example, *manager.omnia.test* is a valid hostname. **Note**: To find the cause of the FreeIPA server or client installation failure, see */var/log/ipaserver-install.log* on the manager node or */var/log/ipaclient-install.log* on the login node. A sketch of setting a compliant hostname follows below.  
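	A minimal sketch of fixing a non-compliant hostname; the hostname and IP address are placeholders:
	```bash
	# On the manager node, for example:
	hostnamectl set-hostname manager.omnia.test
	echo "192.168.12.1 manager.omnia.test" >> /etc/hosts
	```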
+	
+* **Issue**: The inventory details are not updated in AWX when device or host credentials are invalid.  
+	**Resolution**: Provide valid credentials of the devices and hosts in the cluster.  
+	
 # [Frequently asked questions](FAQ.md)
 
 # Limitations

File diff suppressed because it is too large
+ 4 - 3
docs/control_plane/device_templates/CONFIGURE_INFINIBAND_SWITCHES.md


File diff suppressed because it is too large
+ 16 - 5
docs/control_plane/device_templates/CONFIGURE_POWERSWITCHES.md


+ 6 - 6
docs/control_plane/device_templates/CONFIGURE_POWERVAULT_STORAGE.md

@@ -26,16 +26,16 @@ Under the `control_plane/input_params` directory, edit the following files:
 	powervault_me4_k8s_volume_name [Required] |	<ul><li>**k8s_volume**</li><li>User-defined name</li></ul> |	Enter the Kubernetes volume name.	
 	powervault_me4_slurm_volume_name [Required] |	<ul><li>**slurm_volume**</li><li>User-defined name</li></ul> |	Enter the Slurm volume name.
 	powervault_me4_disk_group_name |	<ul><li>**omnia**</li><li>User-defined name</li></ul> |	Enter the group name of the disk.
-	powervault_me4_disk_partition_size [Required] |	<ul><li>**5**</li><li>Any value between 0-99</li></ul> |	Enter the partition size which would be used as an NFS share.  
-	powervault_me4_volume_size [Required] |	<ul><li>**100GB**</li><li>Custom value</li></ul> |	Enter the volume size in the format *SizeGB*.  
-	powervault_me4_pool [Required] |	<ul><li>**a**</li><li>b (or B)</li></ul> |	Enter the pool for the volume.  
-	powervault_me4_server_nic [Required] |	<ul><li>**eno1**</li></ul> |	Enter the NIC of the server to which the PowerVault Storage is connected.  
+	powervault_me4_disk_partition_size [Required] |	<ul><li>**5**</li><li>Any value between 5-99</li></ul> |	Enter the partition size which would be used as an NFS share.  
+	powervault_me4_volume_size [Required] |	<ul><li>**100GB**</li><li>Custom value</li></ul> |	Enter the volume size in the format: *SizeTB*, *SizeGB*, *SizeMB*, or *SizeB*.  
+	powervault_me4_pool [Required] |	<ul><li>**a** (or A)</li><li>b (or B)</li></ul> |	Enter the pool for the volume.  
+	powervault_me4_server_nic [Required] |	<ul><li>**em1**</li></ul> |	Enter the NIC of the server to which the PowerVault Storage is connected.    
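
A hypothetical snippet showing how the values above might look in the PowerVault input file under `control_plane/input_params`; the file name used here (`powervault_me4_vars.yml`) is an assumption and may differ in your release:

```yaml
# control_plane/input_params/powervault_me4_vars.yml (assumed file name)
powervault_me4_k8s_volume_name: "k8s_volume"
powervault_me4_slurm_volume_name: "slurm_volume"
powervault_me4_disk_group_name: "omnia"
powervault_me4_disk_partition_size: "5"      # any value between 5 and 99
powervault_me4_volume_size: "100GB"          # SizeTB, SizeGB, SizeMB, or SizeB
powervault_me4_pool: "a"                     # a/A or b/B
powervault_me4_server_nic: "em1"
```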
 	
 ## Configuring PowerVault Storage
 
 ### Run ME4_template on the AWX UI.
 1. Run `kubectl get svc -n awx`.
-2. Copy the Cluster-IP address of the awx-service. 
+2. Copy the Cluster-IP address of the awx-ui. 
 3. To retrieve the AWX UI password, run `kubectl get secret awx-admin-password -n awx -o jsonpath="{.data.password}" | base64 --decode`.
-4. Open the default web browser on the management station and enter the awx-service IP address. Log in to the AWX UI using the username as `admin` and the retrieved password.
+4. Open the default web browser on the management station and enter `http://<IP>:8052`, where IP is the awx-ui IP address and 8052 is the awx-ui port number. Log in to the AWX UI using the username as `admin` and the retrieved password.  
 5. Under __RESOURCES__ -> __Templates__, launch the **powervault_me4_template**.

+ 3 - 3
docs/control_plane/device_templates/PROVISION_SERVERS.md

@@ -28,10 +28,10 @@ Based on the inputs provided in the `login_vars.yml` and `base_vars.yml` files,
 
 ### Run idrac_template on the AWX UI.
 1. Run `kubectl get svc -n awx`.
-2. Copy the Cluster-IP address of the awx-service. 
+2. Copy the Cluster-IP address of the awx-ui. 
 3. To retrieve the AWX UI password, run `kubectl get secret awx-admin-password -n awx -o jsonpath="{.data.password}" | base64 --decode`.
-4. Open the default web browser on the management station and enter the awx-service IP address. Log in to the AWX UI using the username as `admin` and the retrieved password.
-5. Under __RESOURCES__ -> __Templates__, launch the **idrac_template**.
+4. Open the default web browser on the management station and enter `http://<IP>:8052`, where IP is the awx-ui IP address and 8052 is the awx-ui port number. Log in to the AWX UI using the username as `admin` and the retrieved password.  
+5. Under __RESOURCES__ -> __Templates__, launch the **idrac_template**.  
 
 Omnia role used to provision custom ISO on PowerEdge Servers using iDRAC: *provision_idrac*  
 

File diff suppressed because it is too large
+ 3 - 1
docs/control_plane/input_parameters/INFINIBAND_SWITCHES.md


File diff suppressed because it is too large
+ 14 - 3
docs/control_plane/input_parameters/POWERSWITCHES.md


+ 4 - 4
docs/control_plane/input_parameters/POWERVAULT_STORAGE.md

@@ -26,10 +26,10 @@ Under the `control_plane/input_params` directory, edit the following files:
 	powervault_me4_k8s_volume_name [Required] |	<ul><li>**k8s_volume**</li><li>User-defined name</li></ul> |	Enter the Kubernetes volume name.	
 	powervault_me4_slurm_volume_name [Required] |	<ul><li>**slurm_volume**</li><li>User-defined name</li></ul> |	Enter the Slurm volume name.
 	powervault_me4_disk_group_name |	<ul><li>**omnia**</li><li>User-defined name</li></ul> |	Enter the group name of the disk.
-	powervault_me4_disk_partition_size [Required] |	<ul><li>**5**</li><li>Any value between 0-99</li></ul> |	Enter the partition size which would be used as an NFS share.  
-	powervault_me4_volume_size [Required] |	<ul><li>**100GB**</li><li>Custom value</li></ul> |	Enter the volume size in the format *SizeGB*.  
-	powervault_me4_pool [Required] |	<ul><li>**a**</li><li>b (or B)</li></ul> |	Enter the pool for the volume.  
-	powervault_me4_server_nic [Required] |	<ul><li>**eno1**</li></ul> |	Enter the NIC of the server to which the PowerVault Storage is connected.  
+	powervault_me4_disk_partition_size [Required] |	<ul><li>**5**</li><li>Any value between 5-99</li></ul> |	Enter the partition size which would be used as an NFS share.  
+	powervault_me4_volume_size [Required] |	<ul><li>**100GB**</li><li>Custom value</li></ul> |	Enter the volume size in the format: *SizeTB*, *SizeGB*, *SizeMB*, or *SizeB*.  
+	powervault_me4_pool [Required] |	<ul><li>**a** (or A)</li><li>b (or B)</li></ul> |	Enter the pool for the volume.  
+	powervault_me4_server_nic [Required] |	<ul><li>**em1**</li></ul> |	Enter the NIC of the server to which the PowerVault Storage is connected.   
 	
 ## Deploy Omnia Control Plane
 Before you configure the PowerVault Storage devices, you must complete the deployment of Omnia control plane. Go to Step 8 in the [Steps to install the Omnia Control Plane](../../INSTALL_OMNIA_CONTROL_PLANE.md#steps-to-deploy-the-omnia-control-plane) file to run the `ansible-playbook control_plane.yml` file.