
Updating Docs

Signed-off-by: cgoveas <cassandra.goveas@dell.com>
cgoveas, 3 years ago
parent commit fb71425abe

+ 2 - 1
docs/PreRequisites/Omnia_Control_Plane_PreReqs.md

@@ -1,6 +1,7 @@
 # Pre-requisites Before Running Control Plane
 * Ensure that a stable Internet connection is available on control plane.
-* Rocky 8 is installed on the control plane. 		 
+* Rocky 8 is installed on the control plane.
+* Ensure that the root partition (/) has a minimum of 50% (~35 GB) free space.
 * To provision the bare metal servers, download one of the following ISOs for deployment:
     1. [Leap 15.3](https://get.opensuse.org/leap/)
     2. [Rocky 8](https://rockylinux.org/)
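The new free-space pre-requisite can be checked up front. Below is a minimal sketch, not part of Omnia itself; the 50% threshold mirrors the docs, and GNU `df --output` is assumed to be available:

```shell
#!/bin/sh
# Minimal sketch: warn if the root partition (/) is more than 50% used,
# mirroring the pre-requisite above. Assumes GNU coreutils `df`.
USED=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
if [ "$USED" -ge 50 ]; then
  echo "WARNING: / is ${USED}% used; free up space before running control_plane.yml"
else
  echo "OK: / is ${USED}% used"
fi
```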

+ 10 - 2
docs/Troubleshooting/FAQ.md

@@ -50,12 +50,20 @@ Resolution:
 
 ## What to do if AWX jobs fail with `Error creating pod: container failed to start, ImagePullBackOff`?
 Potential Cause:<br>
- After running `control_plane.yml`, the AWX image got deleted.<br>
+ After running `control_plane.yml`, the AWX image was deleted due to lack of disk space (use `df -h` to diagnose the issue).<br>
 Resolution:<br>
-    Run the following commands:<br>
+    Delete unnecessary files to free up space on the root partition, and then run the following commands:<br>
     1. `cd omnia/control_plane/roles/webui_awx/files`
     2. `buildah bud -t custom-awx-ee awx_ee.yml`
 
+## Why do pods and images appear to get deleted automatically?
+Potential Cause: <br>
+Lack of space in the root partition (/) causes images and pods to be garbage-collected automatically (use `df -h` to diagnose the issue).<br>
+Resolution:
+* Delete large, unused files to free up the root partition. Before running Omnia Control Plane, a minimum of 50% free space in the root partition is recommended.
+* Once the partition is cleared, run `kubeadm reset -f`.
+* Re-run `control_plane.yml`.
+
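The resolution steps above can be strung together into one script. This is a hedged sketch: the `DRY_RUN` guard and `run` helper are illustrative, not part of Omnia, and the playbook path assumes you run from `omnia/control_plane`:

```shell
#!/bin/sh
# Hedged sketch of the recovery flow above. With DRY_RUN=1 (the default)
# the commands are only printed; unset it to execute on a real control plane.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}
run kubeadm reset -f                     # clear the partially deployed cluster
run ansible-playbook control_plane.yml   # re-run from omnia/control_plane
```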
 ## Why does the task 'control_plane_common: Setting Metric' fail?
 Potential Cause:
     The device name and connection name listed by the network manager in `/etc/sysconfig/network-scripts/ifcfg-<nic name>` do not match.

+ 5 - 5
docs/Troubleshooting/Troubleshooting_Guide.md

@@ -1,6 +1,6 @@
 # Logs Used for Troubleshooting
 
-1. /var/log (Control Plane)
+## 1. /var/log (Control Plane)
 
 All log files can be viewed via the Dashboard tab (![Dashboard Icon](../Telemetry_Visualization/Images/DashBoardIcon.PNG)). The Default Dashboard displays `omnia.log` and `syslog`. Custom dashboards can be created per user requirements.
 
@@ -23,7 +23,7 @@ Below is a list of all logs available to Loki and can be accessed on the dashboa
 | Zypper Logs        | /var/log/zypper.log                       | Installation Logs            | This log is configured on Leap OS                                                                  |
 
 
-2. Checking logs of individual containers:
+## 2. Checking logs of individual containers
    1. A list of namespaces and their corresponding pods can be obtained using:
       `kubectl get pods -A`
    2. Get a list of containers for the pod in question using:
@@ -32,7 +32,7 @@ Below is a list of all logs available to Loki and can be accessed on the dashboa
      `kubectl logs <pod_name> -n <namespace> -c <container_name>`
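The per-container log lookup can be wrapped in a small helper. A sketch only: the `container_logs_cmd` function and the example pod/container names are assumptions, and `kubectl` is invoked only when installed:

```shell
#!/bin/sh
# Hypothetical helper that builds the log-fetch command from the steps above.
container_logs_cmd() {
  # $1 = pod name, $2 = namespace, $3 = container name
  echo "kubectl logs $1 -n $2 -c $3"
}

# List pods only when kubectl is actually installed (safe no-op otherwise).
if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods -A
fi

# Example invocation; pod/namespace/container names are placeholders.
container_logs_cmd timescaledb-0 telemetry-and-visualizations timescaledb
```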
 
 
-3. Connecting to internal databases:
+## 3. Connecting to internal databases
 * TimescaleDB
 	* Go inside the pod: `kubectl exec -it pod/timescaledb-0 -n telemetry-and-visualizations -- /bin/bash`
 	* Connect to psql: `psql -U <postgres_username>`
@@ -42,14 +42,14 @@ Below is a list of all logs available to Loki and can be accessed on the dashboa
 	* Connect to MySQL: `mysql -u <mysqldb_username> -p<mysqldb_password>`
 	* Connect to database: `USE <mysqldb_name>`
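The TimescaleDB steps can be collapsed into a single non-interactive query. A sketch under assumptions: the pod and namespace names come from the docs, the user name is a placeholder, and the script falls back to printing the command when `kubectl` is unavailable:

```shell
#!/bin/sh
# Sketch: run one query inside the TimescaleDB pod without an interactive
# session. Pod and namespace come from the docs; the user is a placeholder.
NS=telemetry-and-visualizations
POD=pod/timescaledb-0
PGUSER=${PGUSER:-postgres}   # placeholder for <postgres_username>
if command -v kubectl >/dev/null 2>&1; then
  kubectl exec -i "$POD" -n "$NS" -- psql -U "$PGUSER" -c 'SELECT version();'
else
  echo "kubectl not found; would run: kubectl exec -i $POD -n $NS -- psql -U $PGUSER -c 'SELECT version();'"
fi
```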
 
-4. Checking and updating encrypted parameters:
+## 4. Checking and updating encrypted parameters
   1. Change to the directory where the parameters are saved (we will use `login_vars.yml` as an example):
       `cd control_plane/input_params`
    2. To view the encrypted parameters:
    `ansible-vault view login_vars.yml --vault-password-file .login_vault_key`
    3. To edit the encrypted parameters:
     `ansible-vault edit login_vars.yml --vault-password-file .login_vault_key`
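Steps 2 and 3 can be wrapped in one helper. The wrapper itself is hypothetical; the file and key names come from the docs, and the command only executes when `ansible-vault` and the vars file are present:

```shell
#!/bin/sh
# Hypothetical wrapper around the ansible-vault view/edit steps above.
# File and key names come from the docs; the default mode is an assumption.
VARS_FILE=login_vars.yml
KEY_FILE=.login_vault_key
MODE=${1:-view}   # pass "edit" to modify the parameters
CMD="ansible-vault $MODE $VARS_FILE --vault-password-file $KEY_FILE"
if command -v ansible-vault >/dev/null 2>&1 && [ -f "$VARS_FILE" ]; then
  $CMD
else
  echo "would run: $CMD"
fi
```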
-5. Checking pod status on the control plane
+## 5. Checking pod status on the control plane
     * Select the pod you need to troubleshoot from the output of `kubectl get pods -A`
     * Check the status of the pod by running `kubectl describe pod <pod name> -n <namespace name>`