
Issue #891 Updating Docs

Signed-off-by: cgoveas <cassandra.goveas@dell.com>
cgoveas 3 years ago
parent
commit
6a3eedeca5

+ 2 - 0
control_plane/roles/webui_awx/files/requirements.yml

@@ -10,3 +10,5 @@ collections:
     version: 2.2.3
   - name: community.grafana
     version: 1.3.0
+  - name: ansible.utils
+    version: 2.5.2

+ 3 - 0
docs/FAQ.md

@@ -9,6 +9,9 @@ Potential Causes:
 Resolution:  
 Wait for AWX UI to be accessible at http://\<management-station-IP>:8081, and then run the `control_plane.yml` file again, where __management-station-IP__ is the IP address of the management node.
 
+## Why does Omnia Control Plane fail at Task: `control_plane_common: Assert Value of idrac_support if mngmt_network container needed`?
+When `device_config_support` is set to true, `idrac_support` also needs to be set to true. 
+
 ## What to do if the nodes in a Kubernetes cluster reboot:
 Wait for 15 minutes after the Kubernetes cluster reboots. Next, verify the status of the cluster using the following commands:
 * `kubectl get nodes` on the manager node to get the real-time k8s cluster status.  
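The new FAQ entry describes a dependency between two input variables. The throwaway play below is a minimal sketch of a configuration that satisfies it, with an equivalent check. It assumes the variables are set in `control_plane/input_params/base_vars.yml`, which this diff does not state. The `assert` task is illustrative and is not the actual `control_plane_common` task.

```yaml
# Hypothetical standalone play illustrating the dependency; the real
# control_plane_common task may be implemented differently.
- name: Check device_config_support / idrac_support pairing
  hosts: localhost
  gather_facts: false
  vars:
    # Values as they would appear in the input parameters file
    # (assumed location: control_plane/input_params/base_vars.yml)
    device_config_support: true
    idrac_support: true
  tasks:
    - name: Assert idrac_support when device_config_support is enabled
      assert:
        that:
          - idrac_support | bool
        fail_msg: "idrac_support must be true when device_config_support is true"
      # Only evaluated when the management network container is requested
      when: device_config_support | bool
```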

+ 6 - 4
docs/INSTALL_OMNIA.md

@@ -194,11 +194,13 @@ The following __Slurm__ roles are provided by Omnia when __omnia.yml__ file is r
 To enable the login node, the *login_node_required* variable must be set to "true" in the *omnia_config.yml* file.  
 - **login_common** role: The firewall ports are opened on the manager and login nodes.  
 - **login_server** role: FreeIPA server is installed and configured on the manager node to provide authentication using LDAP and Kerberos principles.  
-- **login_node** role: FreeIPA client is installed and configured on the login node and is integrated with the server running on the manager node.  
+- **login_node** role: For Rocky, FreeIPA client is installed and configured on the login node and is integrated with the server running on the manager node. For LeapOS, 389ds will be installed instead.
 
-**NOTE**: To skip the installation of:
-* The login node-In the `omnia_config.yml` file, set the *login_node_required* variable to "false".  
-* The FreeIPA server and client: Use `--skip-tags freeipa` while executing the *omnia.yml* file. 
+>>__Note:__ If LeapOS is being deployed, login_common and login_server roles will be skipped.  
+
+>> **NOTE**: To skip the installation of:
+>> * The login node-In the `omnia_config.yml` file, set the *login_node_required* variable to "false".  
+>> * The FreeIPA server and client: Use `--skip-tags freeipa` while executing the *omnia.yml* file. 
 
 ### Installing JupyterHub and Kubeflow playbooks  
 If you want to install JupyterHub and Kubeflow playbooks, you have to first install the JupyterHub playbook and then install the Kubeflow playbook.
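The skip options in the note above correspond to one variable and one tag. The sketch below assumes that `omnia.yml` is run from the repository root against an inventory file named `inventory`; both the path and the inventory name are assumptions. The `--skip-tags freeipa` form is the one given in the note.

```yaml
# omnia_config.yml (excerpt)
# false: the login node is not set up.
# true:  the login node is configured; on Rocky the FreeIPA client is installed
#        on it, while on LeapOS 389ds is installed instead.
login_node_required: true

# To keep the login node but skip only the FreeIPA server and client, run:
#   ansible-playbook omnia.yml -i inventory --skip-tags freeipa
```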

File diff suppressed because it is too large
+ 78 - 38
docs/INSTALL_OMNIA_CONTROL_PLANE.md


+ 10 - 7
docs/README.md

@@ -54,7 +54,7 @@ Requirements  |   Version
 OS pre-installed on the management station  |  Rocky 8.x/ Leap 15.x
 OS deployed by Omnia on bare-metal Dell EMC PowerEdge Servers | Rocky 8.x Minimal Edition/ Leap 15.x
 Cobbler  |  3.2.2
-Ansible AWX  |  19.4.0
+Ansible AWX  |  20.0.0
 Slurm Workload Manager  |  20.11.2
 Kubernetes on the management station  |  1.21.0
 Kubernetes on the manager and compute nodes	|	1.16.7 or 1.19.3
@@ -92,9 +92,9 @@ OpenSM	|	GNU General Public License 2	|	3.3.24	|	-
 NVIDIA container runtime	|	Apache-2.0	|	3.4.2	|	Nvidia container runtime library
 Python PIP	|	MIT License	|	21.1.2	|	Python Package
 Python3	|	-	|	3.6.8 (3.6.15 if LeapOS is being used)	|	-
-Kubelet	|	Apache-2.0	|	1.16.7,1.19, 1.21 (LeapOS only supports 1.21) 	|	Provides external, versioned ComponentConfig API types for configuring the kubelet
-Kubeadm	|	Apache-2.0	|	1.16.7,1.19, 1.21 (LeapOS only supports 1.21)	|	"fast paths" for creating Kubernetes clusters
-Kubectl	|	Apache-2.0	|	1.16.7,1.19, 1.21 (LeapOS only supports 1.21)	|	Command line tool for Kubernetes
+Kubelet	|	Apache-2.0	|	1.16.7,1.19, 1.21  	|	Provides external, versioned ComponentConfig API types for configuring the kubelet
+Kubeadm	|	Apache-2.0	|	1.16.7,1.19, 1.21 	|	"fast paths" for creating Kubernetes clusters
+Kubectl	|	Apache-2.0	|	1.16.7,1.19, 1.21 	|	Command line tool for Kubernetes
 JupyterHub	|	Modified BSD License	|	1.1.0	|	Multi-user hub
 kubernetes Controllers	|	Apache-2.0	|	1.16.7,1.19 (1.21 if LeapOS is being used)	|	Orchestration tool	
 Kfctl	|	Apache-2.0	|	1.0.2	|	CLI for deploying and managing Kubeflow
@@ -106,10 +106,10 @@ Horovod	|	Apache-2.0	|	0.21.1	|	Distributed deep learning training framework for
 MPI	|	Copyright (c) 2018-2019 Triad National Security,LLC. All rights reserved.	|	0.3.0	|	HPC library
 CoreDNS	|	Apache-2.0	|	1.6.2	|	DNS server that chains plugins
 CNI	|	Apache-2.0	|	0.3.1	|	Networking for Linux containers
-AWX	|	Apache-2.0	|	19.4.0	|	Web-based User Interface
+AWX	|	Apache-2.0	|	20.0.0	|	Web-based User Interface
 AWX.AWX	|	Apache-2.0	|	19.4.0	|	Galaxy collection to perform awx configuration
-AWXkit	|	Apache-2.0	|	to be updated	|	To perform configuration through CLI commands
-Cri-o	|	Apache-2.0	|	1.21	|	Container Service
+AWXkit	|	Apache-2.0	|	18.0.0	|	To perform configuration through CLI commands
+Cri-o	|	Apache-2.0	|	1.21, 1.17.3  (LeapOS only supports  1.17.3) |	Container Service
 Buildah	|	Apache-2.0	|	1.22.4	|	Tool to build and run containers
 PostgreSQL	|	Copyright (c) 1996-2020, PostgreSQL Global Development Group	|	10.15	|	Database Management System
 Redis	|	BSD-3-Clause License	|	6.0.10	|	In-memory database
@@ -123,6 +123,9 @@ OMSDK	|	Apache-2.0	|	1.2.488	|	Dell EMC OpenManage Python SDK (OMSDK) is a pytho
 | postfix                               | IBM Public License               | 3.5.8  | Mail Transfer Agent (MTA) designed to determine routes and   send emails                                                                       |
 | xorriso                               | GPL version 3                    | 1.4.8  | xorriso copies file objects from POSIX compliant filesystems   into Rock Ridge enhanced ISO 9660 filesystems.                                  |
 | Dell EMC   OpenManage Ansible Modules | GNU- General Public License v3.0 | 5.0.0  | OpenManage Ansible Modules simplifies and automates   provisioning, deployment, and updates of PowerEdge servers and modular   infrastructure. |
+| 389-ds                               | GPL version 3               | 1.4.4  |  LDAP server used for authentication, access control.                                                                       |
+| sssd                               | GPL version 3                    | 1.16.1  | A set of daemons used to manage access to remote directory services and authentication mechanisms.                                   |
+| krb5 | MIT License | 1.19.2  | Authentication protocol providing strong authentication for client/server applications by using secret-key cryptography |
 
 # Known issues  
 * **Issue**: Hosts are not displayed on the AWX UI.  

File diff suppressed because it is too large
+ 25 - 0
docs/Security/ENABLE_SECURITY_LOGIN_NODE.md


File diff suppressed because it is too large
+ 85 - 0
docs/Security/ENABLE_SECURITY_MANAGEMENT_STATION.md


+ 8 - 7
docs/Telemetry_Visualization/Visualization.md

@@ -11,17 +11,17 @@ A lot of these metrics are collected using iDRAC telemetry. iDRAC telemetry allo
 
 | Parameter Name        | Default Value | Information |
 |-----------------------|---------------|-------------|
-| timescaledb_user      | 		        |  Username used for connecting to timescale db. Minimum Legth: 2 characters.          |
-| timescaledb_password  | 		        |  Password used for connecting to timescale db. Minimum Legth: 2 characters.           |
-| mysqldb_user          | 		        |  Username used for connecting to mysql db. Minimum Legth: 2 characters.         |
-| mysqldb_password      | 		        |  Password used for connecting to mysql db. Minimum Legth: 2 characters.            |
+| timescaledb_user      | 		        |  Username used for connecting to timescale db. Minimum Length: 2 characters.          |
+| timescaledb_password  | 		        |  Password used for connecting to timescale db. Minimum Length: 2 characters.           |
+| mysqldb_user          | 		        |  Username used for connecting to mysql db. Minimum Length: 2 characters.         |
+| mysqldb_password      | 		        |  Password used for connecting to mysql db. Minimum Length: 2 characters.            |
 | mysqldb_root_password | 		        |  Password used for connecting to mysql db for root user. Minimum Legth: 2 characters.         |
 
 3. All parameters in `telemetry/input_params/base_vars.yml` need to be filled in:
 
 | Parameter Name          | Default Value     | Information |
 |-------------------------|-------------------|-------------|
-| mount_location          | idrac_telemetrysource_services_db | Sets the location all telemetry related files will be stored and both timescale and mysql databases will be mounted.            |
+| mount_location          | /opt/omnia| Sets the location all telemetry related files will be stored and both timescale and mysql databases will be mounted.            |
 | idrac_telemetry_support | true              | This variable is used to enable iDRAC telemetry support and visualizations. Accepted Values: true/false            |
 | slurm_telemetry_support | true              | This variable is used to enable slurm telemetry support and visualizations. Slurm Telemetry support can only be activated when idrac_telemetry_support is set to true. Accepted Values: True/False.        |
 | timescaledb_name        | telemetry_metrics | Postgres DB with timescale extension is used for storing iDRAC and slurm telemetry metrics.            |
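The throwaway play below loads the defaults from the `base_vars.yml` table and checks two documented constraints. The database credentials must be at least 2 characters long, and Slurm telemetry can only be enabled together with iDRAC telemetry. This diff does not show where the credentials from the first table are stored, so the play defines them inline as placeholders. The play is a hypothetical sketch, not Omnia's own validation.

```yaml
# Hypothetical validation play; non-credential values mirror the base_vars.yml defaults.
# Credential values are placeholders (the table leaves them blank).
- name: Validate telemetry input parameters
  hosts: localhost
  gather_facts: false
  vars:
    mount_location: /opt/omnia
    idrac_telemetry_support: true
    slurm_telemetry_support: true
    timescaledb_name: telemetry_metrics
    timescaledb_user: omnia_ts          # placeholder
    timescaledb_password: changeme      # placeholder
    mysqldb_user: omnia_sql             # placeholder
    mysqldb_password: changeme          # placeholder
    mysqldb_root_password: changeme     # placeholder
  tasks:
    - name: Credentials must be at least 2 characters long
      assert:
        that:
          - timescaledb_user | length >= 2
          - timescaledb_password | length >= 2
          - mysqldb_user | length >= 2
          - mysqldb_password | length >= 2
          - mysqldb_root_password | length >= 2
        fail_msg: "Telemetry DB credentials must be at least 2 characters long"

    - name: Slurm telemetry requires iDRAC telemetry
      assert:
        that:
          # (not slurm) or idrac: passes unless slurm is on while idrac is off
          - not (slurm_telemetry_support | bool) or (idrac_telemetry_support | bool)
        fail_msg: "slurm_telemetry_support can only be true when idrac_telemetry_support is true"
```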
@@ -50,7 +50,7 @@ Use any one of the following browsers to access the Grafana UI (https://< Grafan
 
 ## Initiating Telemetry
 
-1. Once `control_plane.yml` and `telemetry.yml` are executed, run the following commands from `omnia/telemetry`:
+1. Once `control_plane.yml` and `omnia.yml` are executed, run the following commands from `omnia/telemetry`:
 
 `ansible-playbook telemetry.yml`
 
@@ -60,7 +60,8 @@ Use any one of the following browsers to access the Grafana UI (https://< Grafan
 After initiation, new nodes can be added to telemetry by running the following commands from `omnia/telemetry`:
 		
 ` ansible-playbook add_idrac_node.yml `
-		
+
+	
 
 
 

+ 1 - 1
docs/control_plane/device_templates/PROVISION_SERVERS.md

@@ -13,7 +13,7 @@ Edit the following files under the `control_plane/input_params` directory to pro
 	File name	|	Variables</br> [Required/ Optional]	|	Default, choices	|	Description
 	-------	|	----------------	|	-----------------	|	-----------------
 	idrac_vars.yml	|	idrac_system_profile</br> [Required]	|	<ul><li>**Performance**</li> <li>PerformancePerWatt(DAPC)</li> <li>PerformancePerWatt(OS)</li> <li>WorkstationPerformance</li></ul>	|	The system profile used for BIOS configuration. 
-	<br>	|	firmware_update_required</br> [Required]	|	<ul><li>**true**</li> <li>false</li></ul>	|	By default, Omnia updates the firmware on the servers. To disable the firmware update, set the variable to "false".
+	<br>	|	firmware_update_required</br> [Required]	|	<ul><li>**false**</li> <li>true</li></ul>	|	By default, Omnia updates the firmware on the servers. To disable the firmware update, set the variable to "false".
 	<br>	|	poweredge_model</br> [Required if "firmware_update_required" is set to "true"]	|	<ul><li>**C6420**</li> <li>R640</li><li>R740</li><li>C4140</li> <li>And other supported PowerEdge servers</li></ul>	|	Enter the required PowerEdge server models to update the firmware. For example, enter `R640,R740,C4140` to update firmware on these models of PowerEdge servers. For a complete list of supported PowerEdge servers, see the *Hardware managed by Omnia* section in the Readme file.
 	<br>	|	uefi_secure_boot</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable the secure boot mode.
 	<br>	|	system_lockdown</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable system lockdown.
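This change makes `false` the listed default for `firmware_update_required`, but the description in the same row still reads "By default, Omnia updates the firmware". Under the new default, firmware is updated only when the variable is explicitly set to `true`.

The excerpt below shows `control_plane/input_params/idrac_vars.yml` with firmware updates enabled. It uses the choices listed in the table. The quoting style of the shipped file is not shown in this diff and is assumed here.

```yaml
# control_plane/input_params/idrac_vars.yml (excerpt, values illustrative)
idrac_system_profile: "Performance"
# Default is now false; set to true to let Omnia update server firmware
firmware_update_required: true
# Required only when firmware_update_required is true: comma-separated PowerEdge models
poweredge_model: "R640,R740,C4140"
# Optional settings, shown with their defaults
uefi_secure_boot: "disabled"
system_lockdown: "disabled"
```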

+ 1 - 1
docs/control_plane/input_parameters/PROVISION_SERVERS.md

@@ -13,7 +13,7 @@ Edit the following files under the `control_plane/input_params` directory to pro
 	File name	|	Variables</br> [Required/ Optional]	|	Default, choices	|	Description
 	-------	|	----------------	|	-----------------	|	-----------------
 	idrac_vars.yml	|	idrac_system_profile</br> [Required]	|	<ul><li>**Performance**</li> <li>PerformancePerWatt(DAPC)</li> <li>PerformancePerWatt(OS)</li> <li>WorkstationPerformance</li></ul>	|	The system profile used for BIOS configuration. 
-	<br>	|	firmware_update_required</br> [Required]	|	<ul><li>**true**</li> <li>false</li></ul>	|	By default, Omnia updates the firmware on the servers. To disable the firmware update, set the variable to "false".
+	<br>	|	firmware_update_required</br> [Required]	|	<ul><li>**false**</li> <li>true</li></ul>	|	By default, Omnia updates the firmware on the servers. To disable the firmware update, set the variable to "false".
 	<br>	|	poweredge_model</br> [Required if "firmware_update_required" is set to "true"]	|	<ul><li>**C6420**</li> <li>R640</li><li>R740</li><li>C4140</li> <li>And other supported PowerEdge servers</li></ul>	|	Enter the required PowerEdge server models to update the firmware. For example, enter `R640,R740,C4140` to update firmware on these models of PowerEdge servers. For a complete list of supported PowerEdge servers, see the *Hardware managed by Omnia* section in the Readme file.
 	<br>	|	uefi_secure_boot</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable the secure boot mode.
 	<br>	|	system_lockdown</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable system lockdown.

+ 30 - 0
roles/cluster_validation/tasks/install_packages.yml

@@ -0,0 +1,30 @@
+#  Copyright 2022 Dell Inc. or its subsidiaries. All Rights Reserved.
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+---
+
+- name: Set fact for ansible version
+  set_fact:
+    ansible_collection_used: true
+  when: "ansible_version.full is version_compare(ansible_base_version, '>')"
+
+- name: Install netaddr
+  pip:
+    name: netaddr
+    state: present
+    executable: pip3
+
+- name: Install ansible galaxy collection ansible.utils
+  command: ansible-galaxy collection install "{{ ipaddr_collection }}"
+  changed_when: false
+  when: ansible_collection_used
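The `Set fact for ansible version` task decides whether the collection-qualified `ansible.utils.ipaddr` filter is used. `version_compare` is the older name of Ansible's `version` test, which uses loose versioning by default. Under loose versioning a three-part version such as `2.9.0` compares greater than `2.9`, so every 2.9.x release sets `ansible_collection_used` to true, not only 2.10 and later.

The throwaway play below is not part of Omnia. It prints the outcome of that comparison for the installed Ansible, next to a hypothetical `>= 2.10` variant for the case where only 2.10 and later should match.

```yaml
# Throwaway play (not part of Omnia) to inspect the version comparison locally.
- name: Inspect the ansible_collection_used decision
  hosts: localhost
  gather_facts: false
  vars:
    ansible_base_version: '2.9'   # same value as roles/cluster_validation/vars/main.yml
  tasks:
    - name: Show the comparison used by install_packages.yml and a stricter variant
      debug:
        msg:
          - "running Ansible {{ ansible_version.full }}"
          # Loose comparison used by install_packages.yml: 2.9.x compares greater than 2.9
          - "greater than {{ ansible_base_version }}: {{ ansible_version.full is version(ansible_base_version, '>') }}"
          # Hypothetical alternative that is true only for 2.10 and later
          - "at least 2.10: {{ ansible_version.full is version('2.10', '>=') }}"
```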

+ 6 - 1
roles/cluster_validation/tasks/main.yml

@@ -27,6 +27,7 @@
     control_plane_status: false
     powervault_status: false
     nfs_node_status: false
+    ansible_collection_used: false
 
 - name: Check AWX instance
   command: awx --version
@@ -46,6 +47,10 @@
     - not awx_version_check.failed
     - awx_search_key in awx_hostname.stdout
 
+- name: Install Packages
+  include_tasks: install_packages.yml
+  when: not control_plane_status
+
 - name: Set NFS node status
   set_fact:
     nfs_node_status: true
@@ -90,4 +95,4 @@
         regexp: '#log_path = /var/log/ansible.log'
         replace: 'log_path = /var/log/omnia.log'
       when: ansible_conf_exists.stat.exists
-  when: not control_plane_status
+  when: not control_plane_status

+ 5 - 1
roles/cluster_validation/vars/main.yml

@@ -99,4 +99,8 @@ allow_deny_fail_msg: "Failed. Incorrect Access format in security_vars.yml"
 restrict_program_support_success_msg: "restrict_program_support successfully validated"
 restrict_program_support_failure_msg: "Failed. Accepted values are true or false."
 restrict_softwares_success_msg: "restrict_softwares successfully validated"
-restrict_softwares_failure_msg: "Warning. Values should be comma separated. The supported services are telnet, lpd, bluetooth, rlogin, rexec. Please check restrict_softwares variable"
+restrict_softwares_failure_msg: "Warning. Values should be comma separated. The supported services are telnet, lpd, bluetooth, rlogin, rexec. Please check restrict_softwares variable"
+
+# Usage: install_packages.yml
+ansible_base_version: '2.9'
+ipaddr_collection: ansible.utils:2.5.2

+ 6 - 0
roles/slurm_manager/tasks/main.yml

@@ -120,6 +120,12 @@
 - name: Get network address/subnet mask
   set_fact:
     network_address: "{{ (ansible_default_ipv4.network + '/' + ansible_default_ipv4.netmask) | ipaddr('network/prefix') }}"
+  when: not hostvars['127.0.0.1']['ansible_collection_used']
+
+- name: Get network address/subnet mask
+  set_fact:
+    network_address: "{{ (ansible_default_ipv4.network + '/' + ansible_default_ipv4.netmask) | ansible.utils.ipaddr('network/prefix') }}"
+  when: hostvars['127.0.0.1']['ansible_collection_used']
 
 - name: Firewall rule slurm - allow all incoming traffic on internal network
   firewalld:
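The two `Get network address/subnet mask` tasks compute the same value. They differ only in whether the filter is called by its short name `ipaddr` or by the collection-qualified name `ansible.utils.ipaddr`. Both forms need the `netaddr` Python package, which `install_packages.yml` installs.

The illustrative play below shows what the `network/prefix` query returns. Its network and netmask values are made up and stand in for `ansible_default_ipv4.network` and `ansible_default_ipv4.netmask`.

```yaml
# Illustrative play; requires the ansible.utils collection and the netaddr package.
- name: Show ipaddr('network/prefix') output
  hosts: localhost
  gather_facts: false
  vars:
    # Hypothetical stand-ins for ansible_default_ipv4.network / .netmask
    sample_network: 10.5.0.0
    sample_netmask: 255.255.0.0
  tasks:
    - name: Compute CIDR notation from network and netmask
      debug:
        msg: "{{ (sample_network + '/' + sample_netmask) | ansible.utils.ipaddr('network/prefix') }}"
```

With these sample values the task should print `10.5.0.0/16`, which is the form used for `network_address` in the firewall rule that follows.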

+ 1 - 1
telemetry/roles/slurm_telemetry/tasks/get_node_inventory.yml

@@ -39,7 +39,7 @@
     register: awx_svc_ip
 
   - name: AWX needs to be installed
-   fail:
+    fail:
       msg: "{{ awx_fail_msg }}"
     when: not awx_svc_ip.stdout
 

+ 75 - 73
telemetry/roles/slurm_telemetry/tasks/update_service_tags.yml

@@ -1,4 +1,4 @@
- Copyright 2022 Dell Inc. or its subsidiaries. All Rights Reserved.
+# Copyright 2022 Dell Inc. or its subsidiaries. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -34,83 +34,85 @@
   - name: Assert slurmctld status
     fail:
       msg: "{{ slurmctld_status_fail_msg }}"
-    when: not slurm_service
+    when: not hostvars[groups['manager'][0]]['slurm_service']
 
   - name: Prepare input config file
     block:
     - name: Get service tag
-        shell: >
+      shell: >
           set -o pipefail && \
           dmidecode -t 1 | grep Serial
-        changed_when: false
-        register: service_tag_details
-
-      - name: Set fact service tag
-        set_fact:
-          service_tag: "{{ service_tag_details.stdout.split(':')[1].strip() }}"
-
-      - name: Get the hostname
-        command: hostname
-        register: machine_hostname
-        changed_when: false
-
-      - name: Update Head Node IP
-        replace:
-          path: "{{ role_path }}{{ monster_input_file_path }}"
-          regexp: '  ip:.*'
-          replace: "  ip: {{ groups['manager'][0] }}"
-        delegate_to: localhost
-
-      - name: Update Head Node hostname
-        replace:
-          path: "{{ role_path }}{{ monster_input_file_path }}"
-          regexp: '  headnode:.*'
-          replace: "  headnode: {{ hostvars[groups['manager'][0]]['machine_hostname'].stdout }}"
-        delegate_to: localhost
-
-      - name: Update nodes hostnames
-        lineinfile:
-          path: "{{ role_path }}{{ monster_input_file_path }}"
-          line: "  {{ machine_hostname.stdout }}: {{ inventory_hostname }}"
-          insertafter: "hostnames:"
-        delegate_to: localhost
-
-      - name: Update service tag info
-        lineinfile:
-          path: "{{ role_path }}{{ monster_input_file_path }}"
-          line: "  - Servicetag: {{ service_tag }}\n    Os_Ip_Addr: {{ inventory_hostname }}"
-          insertafter: "clusternodes:"
-        delegate_to: localhost
-
-      - name: Copy initialization file
-        copy:
-          src: "{{ role_path }}/files/init_k8s_pod_local.sh"
-          dest: "{{ role_path }}/files/init_k8s_pod.sh"
-          mode: "{{ monster_config_file_mode }}"
-
-      - name: Update manager node details in init_k8s_pod.sh
-        replace:
-          path: "{{ role_path }}/files/init_k8s_pod.sh"
-          regexp: echo 'manager_node_ip manager_node_hostname' >> /etc/hosts
-          replace: echo '{{ inventory_hostname }} {{ machine_hostname.stdout }}' >> /etc/hosts
-        delegate_to: localhost
-        when: manager_group in group_names
-
-      - name: Update manager node IP in init_k8s_pod.sh
-        replace:
-          path: "{{ role_path }}/files/init_k8s_pod.sh"
-          regexp: ssh-keyscan -H manager_node_hostname >> /root/.ssh/known_hosts
-          replace: ssh-keyscan -H {{ machine_hostname.stdout }} >> /root/.ssh/known_hosts
-        delegate_to: localhost
-        when: manager_group in group_names
-
-      - name: Update manager node IP in init_k8s_pod.sh
-        replace:
-          path: "{{ role_path }}/files/init_k8s_pod.sh"
-          regexp: sshpass -p 'os_passwd' ssh-copy-id 'root@manager_node_ip'
-          replace: sshpass -p "{{ hostvars['127.0.0.1']['provision_password'] }}" ssh-copy-id 'root@{{ inventory_hostname }}'
-        delegate_to: localhost
-        when: manager_group in group_names
+      changed_when: false
+      register: service_tag_details
+
+    - name: Set fact service tag
+      set_fact:
+        service_tag: "{{ service_tag_details.stdout.split(':')[1].strip() }}"
+
+    - name: Get the hostname
+      command: hostname
+      register: machine_hostname
+      changed_when: false
+
+    - name: Update Head Node IP
+      replace:
+        path: "{{ role_path }}{{ monster_input_file_path }}"
+        regexp: '  ip:.*'
+        replace: "  ip: {{ groups['manager'][0] }}"
+      delegate_to: localhost
+
+    - name: Update Head Node hostname
+      replace:
+        path: "{{ role_path }}{{ monster_input_file_path }}"
+        regexp: '  headnode:.*'
+        replace: "  headnode: {{ hostvars[groups['manager'][0]]['machine_hostname'].stdout }}"
+      delegate_to: localhost
+
+    - name: Update nodes hostnames
+      lineinfile:
+        path: "{{ role_path }}{{ monster_input_file_path }}"
+        line: "  {{ machine_hostname.stdout }}: {{ inventory_hostname }}"
+        insertafter: "hostnames:"
+      delegate_to: localhost
+
+    - name: Update service tag info
+      lineinfile:
+        path: "{{ role_path }}{{ monster_input_file_path }}"
+        line: "  - Servicetag: {{ service_tag }}\n    Os_Ip_Addr: {{ inventory_hostname }}"
+        insertafter: "clusternodes:"
+      delegate_to: localhost
+
+    - name: Copy initialization file
+      copy:
+        src: "{{ role_path }}/files/init_k8s_pod_local.sh"
+        dest: "{{ role_path }}/files/init_k8s_pod.sh"
+        mode: "{{ monster_config_file_mode }}"
+      delegate_to: localhost
+      when: manager_group in group_names
+
+    - name: Update manager node details in init_k8s_pod.sh
+      replace:
+        path: "{{ role_path }}/files/init_k8s_pod.sh"
+        regexp: echo 'manager_node_ip manager_node_hostname' >> /etc/hosts
+        replace: echo '{{ inventory_hostname }} {{ machine_hostname.stdout }}' >> /etc/hosts
+      delegate_to: localhost
+      when: manager_group in group_names
+
+    - name: Update manager node IP in init_k8s_pod.sh
+      replace:
+        path: "{{ role_path }}/files/init_k8s_pod.sh"
+        regexp: ssh-keyscan -H manager_node_hostname >> /root/.ssh/known_hosts
+        replace: ssh-keyscan -H {{ machine_hostname.stdout }} >> /root/.ssh/known_hosts
+      delegate_to: localhost
+      when: manager_group in group_names
+
+    - name: Update manager node IP in init_k8s_pod.sh
+      replace:
+        path: "{{ role_path }}/files/init_k8s_pod.sh"
+        regexp: sshpass -p 'os_passwd' ssh-copy-id 'root@manager_node_ip'
+        replace: sshpass -p "{{ hostvars['127.0.0.1']['provision_password'] }}" ssh-copy-id 'root@{{ inventory_hostname }}'
+      delegate_to: localhost
+      when: manager_group in group_names
 
     when: hostvars[groups['manager'][0]]['slurm_service']
-  when: slurm_telemetry_support
+  when: hostvars['127.0.0.1']['slurm_telemetry_support']
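The `Get service tag` and `Set fact service tag` tasks parse the `Serial Number:` line printed by `dmidecode -t 1`. On Dell PowerEdge servers, the SMBIOS system serial number is the service tag.

One hedged alternative, which is not what this commit does, is to read the `ansible_product_serial` fact. Ansible's Linux hardware fact gathering fills that fact from the same SMBIOS data. On most systems it is readable only when facts are gathered as root.

```yaml
# Hypothetical alternative to the dmidecode + split approach.
# Requires fact gathering (gather_facts: true or the setup module) with root privileges;
# otherwise ansible_product_serial may be reported as "NA".
- name: Set fact service tag from gathered hardware facts
  set_fact:
    service_tag: "{{ ansible_product_serial }}"
  when: ansible_product_serial is defined and ansible_product_serial != 'NA'

# For comparison, the original parsing applied to a sample dmidecode line:
#   "Serial Number: ABC1234" | split(':')[1] | strip  ->  "ABC1234"
```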