Monitor Kubernetes and Slurm
Omnia provides playbooks to configure additional software components for Kubernetes, such as JupyterHub and Kubeflow. For workload management (submitting, controlling, and managing jobs) of HPC, AI, and Data Analytics clusters, you can access Kubernetes and Slurm dashboards and other supported applications.
To access any of the dashboards, log in to the manager node and open the installed web browser.
If you are connecting remotely, use PuTTY or any X11-based client (if you use MobaXterm, version 8 or later) and follow these steps:
- Connect to the manager node over SSH with X11 forwarding enabled:
ssh -X root@<ip>
(where <ip> is the private IP of the manager node)
yum install firefox -y
yum install xorg-x11-xauth
export DISPLAY=:10.0
Log out and log back in.
- Launch Firefox from the terminal.
Note: Each time the PuTTY or MobaXterm session ends, you must run export DISPLAY=:10.0 again in the new session; otherwise, Firefox cannot be launched.
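Because the DISPLAY setting is lost when the session ends, one option is a small helper that sets it only when it is missing. This is a minimal sketch: the function name ensure_display is hypothetical, and :10.0 is simply the value used in the steps above.

```shell
# Hypothetical helper: set DISPLAY to the value used in the steps above
# if it is not already set in this session, then report the value in use.
ensure_display() {
    if [ -z "${DISPLAY:-}" ]; then
        export DISPLAY=:10.0
    fi
    echo "DISPLAY=$DISPLAY"
}
```

Call ensure_display at the start of each new session before launching Firefox.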
Set up a user account on the manager node
- Log in to the head node as the root user and run
adduser <username>
- Run
passwd <username>
to set the password.
- Run
usermod -a -G wheel <username>
to give sudo permission.
Note: Kubernetes and Slurm jobs can be scheduled only for users with sudo privileges.
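The three account commands can be wrapped in one function. The following is a minimal sketch, not part of Omnia: the names create_sudo_user, run_or_print, and the DRY_RUN variable are hypothetical. With DRY_RUN=1 the commands are only printed, which lets you review them before running them as root.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run_or_print() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the three commands above.
# Creates the user, sets the password interactively, and adds the user to wheel.
create_sudo_user() {
    user="${1:-}"
    if [ -z "$user" ]; then
        echo "usage: create_sudo_user <username>" >&2
        return 1
    fi
    run_or_print adduser "$user" &&
    run_or_print passwd "$user" &&
    run_or_print usermod -a -G wheel "$user"
}
```

For example, DRY_RUN=1 create_sudo_user alice prints the three commands for a user named alice without changing the system.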
Access Kubernetes Dashboard
- To verify if the Kubernetes-dashboard service is running, run
kubectl get pods --namespace kubernetes-dashboard
- To start the Kubernetes dashboard, run
kubectl proxy
- From the CLI, run
kubectl get secrets
to see the generated tokens.
- Copy the name of the token named prometheus--kube-state-metrics.
- Run
kubectl describe secret <copied token name>
- Copy the encrypted token value.
- On a web browser (installed on the manager node), enter http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/ to access the Kubernetes Dashboard.
- Select the authentication method as Token.
- On the Kubernetes Dashboard, paste the copied encrypted token and click Sign in.
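To avoid copying the token by hand, the token line can be filtered out of the kubectl describe secret output. This is a minimal sketch: the function name extract_token is hypothetical, and it assumes the output contains a line of the form "token: <value>", as kubectl prints for service account token secrets.

```shell
# Hypothetical helper: read `kubectl describe secret <name>` output on stdin
# and print only the value that follows the "token:" label.
extract_token() {
    awk '$1 == "token:" { print $2 }'
}
```

A possible use is kubectl describe secret <copied token name> | extract_token, after which the printed value can be pasted into the Token field.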
Access Kubeflow Dashboard
Use a port number between 8000 and 8999; the suggested port number is 8085.
- To see which ports are in use, run the following command:
netstat -an
- Choose a port number between 8000-8999 which is not in use.
- To run the Kubeflow dashboard on the selected port number, run the following command:
kubectl port-forward -n kubeflow service/centraldashboard <selected_port_number>:80
- On a web browser installed on the manager node, go to http://localhost:<selected_port_number>/ to launch the Kubeflow central navigation dashboard.
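The port selection above can be sketched as a filter over the netstat -an output. This is an illustrative sketch only: the function name pick_free_port is hypothetical, and it assumes the Linux netstat layout in which the fourth column is the local address and the port follows the last colon. It prefers 8085 and otherwise returns the first port in 8000-8999 that does not appear as a local TCP port.

```shell
# Hypothetical helper: read `netstat -an` output on stdin and print a port
# in the 8000-8999 range that no local TCP socket is using.
pick_free_port() {
    awk '
        # Record the local port of every TCP line (tcp or tcp6).
        $1 ~ /^tcp/ {
            n = split($4, parts, ":")
            used[parts[n]] = 1
        }
        END {
            # Prefer the suggested port, then scan the recommended range.
            if (!(8085 in used)) { print 8085; exit }
            for (p = 8000; p <= 8999; p++) {
                if (!(p in used)) { print p; exit }
            }
            exit 1
        }
    '
}
```

For example, port=$(netstat -an | pick_free_port) stores a candidate port that can then be passed to the kubectl port-forward command.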
Access JupyterHub Dashboard
- To verify if the JupyterHub services are running, run
kubectl get pods --namespace jupyterhub
- Ensure that the pod names starting with hub and proxy are in Running status.
- Run
kubectl get services --namespace jupyterhub
- Copy the External IP of proxy-public service.
- On a web browser installed on the manager node, use the External IP address to access the JupyterHub Dashboard.
- Enter any username and password combination to log in to JupyterHub. The username and password can be configured later from the JupyterHub dashboard.
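The External IP can also be read from the service listing with a short filter. This is a minimal sketch: the function name proxy_public_ip is hypothetical, and it assumes the default kubectl get services table, in which EXTERNAL-IP is the fourth column.

```shell
# Hypothetical helper: read `kubectl get services --namespace jupyterhub`
# output on stdin and print the EXTERNAL-IP column of the proxy-public service.
proxy_public_ip() {
    awk '$1 == "proxy-public" { print $4 }'
}
```

A possible use is kubectl get services --namespace jupyterhub | proxy_public_ip, which prints the address to enter in the browser.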
Prometheus is installed in two different ways:
- It is installed on the host when Slurm is installed without installing Kubernetes.
- It is installed as a Kubernetes role, if you install both Slurm and Kubernetes.
If Prometheus is installed as part of the Kubernetes role, run the following commands before starting the Prometheus UI:
export POD_NAME=$(kubectl get pods --namespace default -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
echo $POD_NAME
kubectl --namespace default port-forward $POD_NAME 9090
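If no pod matches the label selector, POD_NAME ends up empty and the port-forward command fails with a less obvious error. A small guard can make that case explicit. This is a sketch only; the function name forward_prometheus is hypothetical.

```shell
# Hypothetical wrapper: refuse to port-forward when no pod name was found,
# otherwise forward local port 9090 to the Prometheus server pod.
forward_prometheus() {
    pod_name="${1:-}"
    if [ -z "$pod_name" ]; then
        echo "No Prometheus server pod found in the default namespace" >&2
        return 1
    fi
    kubectl --namespace default port-forward "$pod_name" 9090
}
```

It can be called as forward_prometheus "$POD_NAME" in place of the last command above.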
If Prometheus is installed on the host, start the Prometheus web server as follows:
- Navigate to the Prometheus folder. The default path is /var/lib/prometheus-2.23.0.linux-amd64/.
- Start the web server.
Go to http://localhost:9090 to launch the Prometheus UI in the browser.
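The host startup steps can be sketched as a single function. This is an assumption-laden sketch, not the documented Omnia procedure: the function name start_host_prometheus is hypothetical, and it assumes the folder is an unpacked upstream Prometheus release that contains the prometheus binary and a prometheus.yml configuration file.

```shell
# Hypothetical helper: start Prometheus from an unpacked release folder.
# Assumes the folder contains the upstream "prometheus" binary.
start_host_prometheus() {
    prom_dir="${1:-/var/lib/prometheus-2.23.0.linux-amd64}"
    if [ ! -x "$prom_dir/prometheus" ]; then
        echo "prometheus binary not found in $prom_dir" >&2
        return 1
    fi
    # Run from inside the folder so the prometheus.yml there is picked up.
    (cd "$prom_dir" && ./prometheus)
}
```

Calling start_host_prometheus with no argument uses the default path given above; the server runs in the foreground until it is stopped.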
- If Prometheus was installed on the host through Slurm without Kubernetes, it is removed when Kubernetes is installed, because Prometheus then runs as a pod.
- You can use a single instance of Prometheus when both Kubernetes and Slurm are installed.