MONITOR_CLUSTERS.md 4.7 KB

Monitor Kuberentes and Slurm

Omnia provides playbooks to configure additional software components for Kubernetes such as JupyterHub and Kubeflow. For workload management (submitting, conrolling, and managing jobs) of HPC, AI, and Data Analytics clusters, you can access Kubernetes and Slurm dashboards and other supported applications.

To access any of the dashboards login to the manager node and open the installed web browser.

If you are connecting remotely ensure your putty or any X11 based clients and you are using mobaxterm version 8 and above, follow the below mentioned steps:

  1. To provide ssh to the manager node. ssh -x root@<ip> (where IP is the private IP of manager node)
  2. yum install firefox -y
  3. yum install xorg-x11-xauth
  4. export DISPLAY=:10.0
  5. logout and login back
  6. To launch firefox from terminal use the following command: firefox&

Note: When the putty/mobaxterm session ends, you must run export DISPLAY=:10.0 command each time, else Firefox cannot be launched again.

Setup user account in manager node

  1. Login to head node as root user and run adduser __<username>__.
  2. Run passwd __<username>__ to set password.
  3. Run usermod -a -G wheel __<username>__ to give sudo permission.

Note: Kuberenetes and Slurm job can be scheduled only for users with sudo privileges.

Access Kuberentes Dashboard

  1. To verify if the Kubernetes-dashboard service is running, run kubectl get pods --namespace kubernetes-dashboard.
  2. To start the Kubernetes dashboard, run kubectl proxy.
  3. From the CLI, run kubectl get secrets to see the generated tokens.
  4. Copy the token with the name prometheus--kube-state-metrics__ of the type kubernetes.io/service-account-token.
  5. Run kubectl describe secret __<copied token name>__
  6. Copy the encrypted token value.
  7. On a web browser(installed on the manager node), enter http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/ to access the Kubernetes Dashboard.
  8. Select the authentication method as Token.
  9. On the Kuberenetes Dashboard, paste the copied encrypted token and click Sign in.

Access Kubeflow Dashboard

It is recommended that you use port numbers between 8000-8999 and the suggested port number is 8085.

  1. To view the ports which are in use, run the following command: netstat -an
  2. Select a port number between 8000-8999 which is not in use.
  3. To run the Kubeflow Dashboard at selected port number, run one of the following commands:
    kubectl port-forward -n kubeflow service/centraldashboard __selected_port_number__:80
    (Or)
    kubectl port-forward -n istio-system svc/istio-ingressgateway __selected_port_number__:80
  4. On a web browser installed on the manager node, go to http://localhost:selected-port-number/ to launch the Kubeflow Central Dashboard.

For more information about the Kubeflow Central Dashboard, see https://www.kubeflow.org/docs/components/central-dash/overview/.

Access JupyterHub Dashboard

  1. To verify if the JupyterHub services are running, run kubectl get pods --namespace jupyterhub.
  2. Ensure that the pod names starting with hub and proxy are in Running status.
  3. Run kubectl get services --namespace jupyterhub.
  4. Copy the External IP of proxy-public service.
  5. On a web browser installed on the manager node, use the External IP address to access the JupyterHub Dashboard.
  6. Enter any username and password combination to enter the Jupyterhub. The username and password can be later configured from the JupyterHub dashboard.

Prometheus

Prometheus is installed in two different ways:

  • It is installed on the host when Slurm is installed without installing Kubernetes.
  • It is installed as a Kubernetes role, if you install both Slurm and Kubernetes.

If Prometheus is installed as part of kubernetes role, run the following commands before starting the Prometheus UI:

  1. export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
  2. echo $POD_NAME
  3. kubectl --namespace default port-forward $POD_NAME 9090

If Prometheus is installed on the host, start the Prometheus web server by run the following command:

  1. Navigate to Prometheus folder. The default path is /var/lib/prometheus-2.23.0.linux-amd64/.
  2. Start the web server, ./prometheus

Go to http://localhost:9090 to launch the Prometheus UI in the browser.

Note:

  • If Prometheus was installed through slurm without Kubernetes then it will be removed when Kubernetes is installed as Prometheus would be running as a pod.
  • You can use a single instance of Prometheus when both kubernetes and slurm are installed.