
Monitor Kubernetes and Slurm

Omnia provides playbooks to configure additional software components for Kubernetes, such as JupyterHub and Kubeflow. For workload management (submitting, controlling, and managing jobs) on HPC, AI, and Data Analytics clusters, you can access the Kubernetes and Slurm dashboards and other supported applications.

Before accessing the dashboards

To access any of the dashboards, ensure that a compatible web browser is installed. If you are connecting remotely to your Linux server using MobaXterm (version later than 8) or another X11 client through ssh, follow the steps below to launch the Firefox browser (a consolidated sketch follows the note at the end of this list):

  • On the management station:

    1. Connect using ssh. Run ssh <user>@<IP-address>, where IP-address is the private IP of the management station.
    2. dnf install mesa-libGL-devel -y
    3. dnf install firefox -y
    4. dnf install xorg-x11-xauth
    5. export DISPLAY=:10.0
    6. Log out and log in again.
    7. To launch Firefox from the terminal, run firefox &.

  • On the manager node:

    1. Connect using ssh. Run ssh <user>@<IP-address>, where IP-address is the private IP of the manager node.
    2. yum install firefox -y
    3. yum install xorg-x11-xauth
    4. export DISPLAY=:10.0
    5. Log out and log in again.
    6. To launch Firefox from the terminal, run firefox &.

NOTE: Each time a PuTTY or MobaXterm session ends, you must run the export DISPLAY=:10.0 command again, otherwise Firefox cannot be launched.
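
The following is a consolidated sketch of the management-station steps above, assuming a RHEL/Rocky-based system and an X11-capable ssh client; the display number :10.0 may differ depending on your client:

    ssh -X <user>@<management-station-IP>       # -X requests X11 forwarding
    dnf install -y mesa-libGL-devel firefox xorg-x11-xauth
    export DISPLAY=:10.0                        # only needed if the forwarding client did not already set DISPLAY
    firefox &                                   # opens Firefox on your local X server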

Access FreeIPA Dashboard

The FreeIPA Dashboard can be accessed from the management station, manager, and login nodes. To access the dashboard:

  1. Install the Firefox Browser.
  2. Open the Firefox browser and enter the URL https://<hostname>. For example, enter https://manager.example.com.
  3. Enter the username and password. If the administrator or user has already obtained a Kerberos ticket, credentials are not required.

Note: To obtain a Kerberos ticket, perform the following actions:

  1. Enter kinit <username>
  2. When prompted, enter the password.

An administrator can create users on the login node using FreeIPA. Users will be prompted to change their passwords upon first login.
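
For example, an administrator session might look like the following sketch; the user name and name fields are placeholders, and the --password flag sets an initial password that the user must reset at first login:

    kinit admin                                                 # obtain a Kerberos ticket as the FreeIPA admin
    klist                                                       # confirm the ticket was granted
    ipa user-add user01 --first=First --last=Last --password    # create a user with an initial (expired) password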

Access Kubernetes Dashboard

  1. To verify if the Kubernetes-dashboard service is in the Running state, run kubectl get pods --namespace kubernetes-dashboard.
  2. To start the Kubernetes dashboard, run kubectl proxy.
  3. To retrieve the encrypted token, run kubectl get secret -n kubernetes-dashboard $(kubectl get serviceaccount admin-user -n kubernetes-dashboard -o jsonpath="{.secrets[0].name}") -o jsonpath="{.data.token}" | base64 --decode.
  4. Copy the encrypted token value.
  5. In a web browser on the management station (for control_plane.yml) or the manager node (for omnia.yml), enter http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/.
  6. Select the authentication method as Token.
  7. On the Kubernetes Dashboard, paste the copied encrypted token and click Sign in to access the dashboard.
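
Note that on Kubernetes 1.24 and later, long-lived token secrets are no longer created automatically for service accounts, so the command in step 3 may return an empty value. In that case, assuming the admin-user service account exists in the kubernetes-dashboard namespace, a short-lived token can be requested directly:

    kubectl -n kubernetes-dashboard create token admin-user    # prints a token valid for a limited time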

Access Kubeflow Dashboard

  1. Before accessing the Kubeflow Dashboard, run kubectl -n kubeflow get applications -o yaml profiles. Wait until profiles-deployment enters the Ready state.
  2. To retrieve the External IP or CLUSTER IP, run kubectl get services istio-ingressgateway --namespace istio-system.
  3. In a web browser on the manager node, enter the External IP or Cluster IP to open the Kubeflow Central Dashboard.

For more information about the Kubeflow Central Dashboard, see https://www.kubeflow.org/docs/components/central-dash/overview/.
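
As a shortcut for step 2, the external IP can be printed directly with a jsonpath query, assuming the istio-ingressgateway service is of type LoadBalancer and has been assigned an address; if the EXTERNAL-IP column shows <pending>, use the CLUSTER-IP value instead:

    kubectl get services istio-ingressgateway --namespace istio-system \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}'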

Access JupyterHub Dashboard

  1. To verify if the JupyterHub services are running, run kubectl get pods --namespace jupyterhub.
  2. Ensure that the pod names starting with hub and proxy are in the Running state.
  3. To retrieve the External IP or CLUSTER IP, run kubectl get services proxy-public --namespace jupyterhub.
  4. In a web browser on the manager node, enter the External IP or Cluster IP to open the JupyterHub Dashboard.
  5. JupyterHub is running with a default dummy authenticator. Enter any username and password combination to access the dashboard.

For more information about configuring username and password, and to access the JupyterHub Dashboard, see https://zero-to-jupyterhub.readthedocs.io/en/stable/jupyterhub/customization.html.
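
For example, the zero-to-jupyterhub Helm chart exposes the dummy authenticator password as a chart value. The sketch below assumes the dashboard was deployed with that chart from the jupyterhub Helm repository; the release name and password are placeholders, and the exact value paths should be checked against the chart version you deployed:

    helm upgrade <release-name> jupyterhub/jupyterhub --namespace jupyterhub --reuse-values \
      --set hub.config.JupyterHub.authenticator_class=dummy \
      --set hub.config.DummyAuthenticator.password=<shared-password>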

Access Prometheus UI

Prometheus is installed in one of two ways:

  • As a Kubernetes role, when both Slurm and Kubernetes are installed (option A below).
  • On the host, when only Slurm is installed (option B below).

A. When Prometheus is installed as a Kubernetes role.

  • Access Prometheus through localhost:

    1. Run the following commands:
      export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
      echo $POD_NAME
      kubectl --namespace default port-forward $POD_NAME 9090
    2. To launch the Prometheus UI, in the web browser, enter http://localhost:9090.
  • Access Prometheus with a private IP address:

    1. Run kubectl get services --all-namespaces.
    2. From the list of services, find the prometheus-xxxx-server service under the Name column, and copy the EXTERNAL-IP address.
      For example, in the following list of services, 192.168.2.150 is the external IP address of the prometheus-1619158141-server service.
      NAMESPACE | NAME | TYPE | CLUSTER-IP | EXTERNAL-IP | PORT(S) | AGE
      --------- | ---- | ---- | ---------- | ----------- | ------- | ----
      default | kubernetes | ClusterIP | 10.96.0.1 | none | 443/TCP | 107m
      default | prometheus-1619158141-server | LoadBalancer | 10.97.40.140 | 192.168.2.150 | 80:31687/TCP | 106m
    3. To open Firefox, run firefox&.
    4. Enter the copied External IP address to access Prometheus. For example, enter 192.168.2.150 to access the Prometheus UI.
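
Alternatively, assuming the Helm-installed Prometheus server service carries the same app=prometheus,component=server labels used for the pod above, the external IP can be looked up in a single command:

    kubectl get services --all-namespaces -l "app=prometheus,component=server" \
      -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}'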

B. When Prometheus is installed on the host.

  1. Navigate to the Prometheus folder. The default path is /var/lib/prometheus-2.23.0.linux-amd64/.
  2. Start the web server: ./prometheus.
  3. To launch the Prometheus UI, in the web browser, enter http://localhost:9090.
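
For example, a quick check that the host-based server came up; the path below is the default from step 1, and /-/ready is part of the standard Prometheus HTTP API:

    cd /var/lib/prometheus-2.23.0.linux-amd64/
    ./prometheus &                          # start the web server in the background
    curl http://localhost:9090/-/ready      # returns HTTP 200 with a readiness message once the server is up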

Note:

  • If Prometheus is installed through Slurm without Kubernetes, it will be removed when Kubernetes is installed later, because Prometheus then runs as a Kubernetes pod.
  • Only a single instance of Prometheus is installed when both Kubernetes and Slurm are installed.