Omnia provides playbooks to configure additional software components for Kubernetes such as JupyterHub and Kubeflow. For workload management (submitting, controlling, and managing jobs) of HPC, AI, and Data Analytics clusters, you can access Kubernetes and Slurm dashboards and other supported applications.
To access any of the dashboards, ensure that a compatible web browser is installed. If you are connecting remotely to your Linux server through SSH by using MobaXterm (a version later than 8) or another X11 client, follow the steps below to launch the Firefox browser:
On the control plane:
1. Connect over SSH: `ssh <user>@<IP-address>`, where IP-address is the private IP of the control plane.
2. `dnf install mesa-libGL-devel -y`
3. `dnf install firefox -y`
4. `dnf install xorg-x11-xauth`
5. `export DISPLAY=:10.0`
6. Log out and log back in.
7. Launch Firefox: `firefox&`

On the manager node:
1. Connect over SSH: `ssh <user>@<IP-address>`, where IP-address is the private IP of the manager node.
2. `yum install firefox -y`
3. `yum install xorg-x11-xauth`
4. `export DISPLAY=:10.0`
5. Log out and log back in.
6. Launch Firefox: `firefox&`

Note: Each time a PuTTY or MobaXterm session ends, you must run `export DISPLAY=:10.0` again in the new session; otherwise, Firefox cannot be launched.
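
The DISPLAY step can be wrapped so that it is safe to re-run in every new session. The following is a minimal sketch, assuming a POSIX shell; the function name `ensure_display` is hypothetical, and `:10.0` is simply the value used in the steps above. When X11 forwarding is enabled, sshd usually sets DISPLAY itself, so the function leaves an existing value untouched.

```shell
# Hypothetical helper: set DISPLAY to the value used in this guide,
# but only when the session has not already set one.
ensure_display() {
  if [ -z "${DISPLAY:-}" ]; then
    export DISPLAY=":10.0"
  fi
}

ensure_display
echo "DISPLAY=$DISPLAY"
```

Adding the function and the call to a shell startup file such as `~/.bashrc` is one way to avoid retyping the export after each reconnect.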
The FreeIPA Dashboard can be accessed from the control plane, manager, and login nodes. To access the dashboard, open a web browser and enter https://<hostname>. For example, enter https://manager.example.com.

Note: To obtain a Kerberos ticket, run `kinit <username>` and enter the password when prompted.
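
A ticket only needs to be requested when no valid one exists. The sketch below illustrates that check, assuming the standard MIT Kerberos client tools (`klist` and `kinit`) are installed; the function name `ensure_ticket` is hypothetical.

```shell
# Hypothetical helper: request a Kerberos ticket only when no valid one exists.
# `klist -s` prints nothing and exits non-zero if the credentials cache
# is missing or expired.
ensure_ticket() {
  user="$1"
  if klist -s; then
    echo "valid Kerberos ticket found"
  else
    kinit "$user"   # prompts for the password
  fi
}

# Example (replace <username> with a real FreeIPA user):
# ensure_ticket <username>
```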
An administrator can create users on the login node using FreeIPA. Users are prompted to change their password at first login.
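To access the Kubernetes Dashboard:
1. Verify that the kubernetes-dashboard pods are in the Running state: `kubectl get pods --namespace kubernetes-dashboard`
2. Start the proxy to the Kubernetes API server: `kubectl proxy`
3. Retrieve the login token of the admin-user service account: `kubectl get secret -n kubernetes-dashboard $(kubectl get serviceaccount admin-user -n kubernetes-dashboard -o jsonpath="{.secrets[0].name}") -o jsonpath="{.data.token}" | base64 --decode`
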
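
With `kubectl proxy` running, the dashboard is normally reached through the API server proxy path. The sketch below builds that URL, assuming the default `kubectl proxy` port (8001) and the standard upstream service name `kubernetes-dashboard` served over HTTPS. Neither value is stated in this guide, so verify both against your deployment. The function name `dashboard_url` is hypothetical.

```shell
# Hypothetical helper: print the dashboard URL served through `kubectl proxy`.
# Assumes the proxy listens on localhost; the port defaults to 8001.
dashboard_url() {
  port="${1:-8001}"
  printf 'http://localhost:%s/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/\n' "$port"
}

dashboard_url
```

On Kubernetes 1.24 and later, service accounts no longer receive a token Secret automatically, so the `.secrets[0].name` lookup may return nothing; in that case `kubectl -n kubernetes-dashboard create token admin-user` is the usual alternative.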
To access the Kubeflow Dashboard:
1. Run `kubectl -n kubeflow get applications -o yaml profiles` and wait until profiles-deployment enters the Ready state.
2. Retrieve the IP address of the ingress gateway: `kubectl get services istio-ingressgateway --namespace istio-system`

For more information about the Kubeflow Central Dashboard, see https://www.kubeflow.org/docs/components/central-dash/overview/.

To access the JupyterHub Dashboard:
1. Check the state of the JupyterHub pods: `kubectl get pods --namespace jupyterhub`
2. Retrieve the IP address of the proxy-public service: `kubectl get services proxy-public --namespace jupyterhub`

For more information about configuring username and password, and to access the JupyterHub Dashboard, see https://zero-to-jupyterhub.readthedocs.io/en/stable/jupyterhub/customization.html.

Prometheus is installed in one of two ways: as a Kubernetes role (A) or on the host (B).
A. When Prometheus is installed as a Kubernetes role.
Access Prometheus with local host:
1. Run the following commands:
   `export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")`
   `echo $POD_NAME`
   `kubectl --namespace default port-forward $POD_NAME 9090`
2. In a web browser, enter http://localhost:9090.

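
If no pod matches the label selector, `POD_NAME` is empty and the port-forward command fails with a less obvious error. The sketch below wraps the same commands with an explicit check. The function name `forward_prometheus` is hypothetical; the namespace, labels, and port are the ones used above.

```shell
# Hypothetical wrapper around the commands above: look up the Prometheus
# server pod, stop with an error if none is found, otherwise port-forward.
forward_prometheus() {
  pod_name=$(kubectl get pods --namespace default \
    -l "app=prometheus,component=server" \
    -o jsonpath="{.items[0].metadata.name}") || return 1
  if [ -z "$pod_name" ]; then
    echo "no Prometheus server pod found in namespace default" >&2
    return 1
  fi
  kubectl --namespace default port-forward "$pod_name" 9090
}

# Example: forward_prometheus   (blocks until interrupted with Ctrl+C)
```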
Access Prometheus with a private IP address:
1. Run `kubectl get services --all-namespaces`.
2. From the list of services, find the prometheus-xxxx-server service under the NAME column, and copy the EXTERNAL-IP address.

For example, in the list of services below, 192.168.2.150 is the external IP address for the service prometheus-1619158141-server.
NAMESPACE | NAME | TYPE | CLUSTER-IP | EXTERNAL-IP | PORT(S) | AGE |
---|---|---|---|---|---|---|
default | kubernetes | ClusterIP | 10.96.0.1 | none | 443/TCP | 107m |
default | prometheus-1619158141-server | LoadBalancer | 10.97.40.140 | 192.168.2.150 | 80:31687/TCP | 106m |
To open Firefox, run `firefox&`. Enter the copied external IP address to access Prometheus. For example, enter 192.168.2.150 to access the Prometheus UI.
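
Finding the address by eye can be replaced with a small filter. The sketch below assumes the default whitespace-separated column layout of `kubectl get services --all-namespaces` (NAME in column 2, EXTERNAL-IP in column 5); the function name `prometheus_external_ip` is hypothetical.

```shell
# Hypothetical helper: read `kubectl get services --all-namespaces` output on
# stdin and print the EXTERNAL-IP (5th column) of the first service whose
# NAME (2nd column) looks like prometheus-<id>-server.
prometheus_external_ip() {
  awk 'NR > 1 && $2 ~ /^prometheus-.*-server$/ { print $5; exit }'
}

# Demonstration with the sample rows from the table above.
prometheus_external_ip <<'EOF'
NAMESPACE   NAME                           TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
default     kubernetes                     ClusterIP      10.96.0.1      <none>          443/TCP        107m
default     prometheus-1619158141-server   LoadBalancer   10.97.40.140   192.168.2.150   80:31687/TCP   106m
EOF
```

On a live cluster the same function would be fed directly: `kubectl get services --all-namespaces | prometheus_external_ip`.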
B. When Prometheus is installed on the host.
1. Go to the Prometheus directory: /var/lib/prometheus-2.23.0.linux-amd64/
2. Start Prometheus: `./prometheus`
3. In a web browser, enter http://localhost:9090.

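
Before opening the browser, it can help to confirm that the server is actually answering. The sketch below queries the `/-/ready` management endpoint that Prometheus 2.x exposes, assuming `curl` is installed and the server listens on the default address used above; the function name `prometheus_is_ready` is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) when Prometheus reports ready.
# The base URL defaults to the address used in this guide.
prometheus_is_ready() {
  base_url="${1:-http://localhost:9090}"
  # --fail makes curl exit non-zero on HTTP errors such as 503.
  curl --silent --fail --output /dev/null "${base_url}/-/ready"
}

# Example:
# prometheus_is_ready && echo "Prometheus is ready"
```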
Note: Once control_plane.yml is run, Prometheus is added to Grafana as a datasource (hpc-prometheus). This allows Grafana to display statistics from the compute nodes that have been polled using Prometheus on the control plane.
In Grafana, select the dashboard tab to view the list of Prometheus-based dashboards. Default dashboards include CoreDNS, Prometheus Overview, and Kubernetes Networking.
Note: Both the control plane and HPC clusters can be monitored on these dashboards by toggling the datasource at the top of each dashboard.
Once control_plane.yml is run, Prometheus is added to Grafana as a datasource. This allows Grafana to display statistics from the control plane that have been polled using Prometheus.
Note: Both the control plane and HPC clusters can be monitored on these dashboards by toggling the datasource at the top of each dashboard:
Data Source | Description | Source |
---|---|---|
hpc-prometheus-manager-nodeIP | Monitors the Kubernetes and Slurm clusters on the manager and compute nodes. | This datasource is set up when omnia.yml is run. |
control_plane_prometheus | Monitors the Single Node cluster running on the Control Plane | This datasource is set up when control_plane.yml is run. |
| Type | Subtype | Dashboard Name | Available DataSources |
|---|---|---|---|
| | | CoreDNS | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
| Kubernetes | | API Types | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
| Kubernetes | Compute Resources | Cluster | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
| Kubernetes | Compute Resources | Namespace (Pods) | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
| Kubernetes | Compute Resources | Node (Pods) | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
| Kubernetes | Compute Resources | Pod | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
| Kubernetes | Compute Resources | Workload | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
| Kubernetes | | Kubelet | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
| Kubernetes | Networking | Cluster | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
| Kubernetes | Networking | Namespace (Pods) | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
| Kubernetes | Networking | Namespace (Workload) | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
| Kubernetes | Networking | Pod | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
| Kubernetes | Networking | Workload | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
| Kubernetes | | Scheduler | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
| Kubernetes | | Stateful Sets | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
| | | Prometheus Overview | control-plane-prometheus, hpc-prometheus-manager-nodeIP |
| Slurm | | CPUs/GPUs, Jobs, Nodes, Scheduler | hpc-prometheus-manager-nodeIP |
| Slurm | | Node Exporter Server Metrics | hpc-prometheus-manager-nodeIP |