
Issue #891 Updating Docs

Signed-off-by: cgoveas <cassandra.goveas@dell.com>
cgoveas 3 years ago
parent
commit
6a3eedeca5

+ 2 - 0
control_plane/roles/webui_awx/files/requirements.yml

@@ -10,3 +10,5 @@ collections:
     version: 2.2.3
   - name: community.grafana
     version: 1.3.0
+  - name: ansible.utils
+    version: 2.5.2

+ 3 - 0
docs/FAQ.md

@@ -9,6 +9,9 @@ Potential Causes:
 Resolution:  
 Wait for AWX UI to be accessible at http://\<management-station-IP>:8081, and then run the `control_plane.yml` file again, where __management-station-IP__ is the IP address of the management node.
 
+## Why does Omnia Control Plane fail at Task: `control_plane_common: Assert Value of idrac_support if mngmt_network container needed`?
+When `device_config_support` is set to true, `idrac_support` also needs to be set to true. 
+
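The dependency described in this FAQ entry can be sketched as a small check. This is only an illustration of the rule, not the playbook's actual assertion; the function name `check_idrac_support` and the error text are hypothetical.

```python
def check_idrac_support(device_config_support: bool, idrac_support: bool) -> None:
    """Raise if device configuration is requested without iDRAC support."""
    # Rule from the FAQ entry: device_config_support=true requires idrac_support=true.
    if device_config_support and not idrac_support:
        raise ValueError(
            "idrac_support must be true when device_config_support is true"
        )


# Valid combinations pass silently.
check_idrac_support(device_config_support=True, idrac_support=True)
check_idrac_support(device_config_support=False, idrac_support=False)
```

Calling it with `device_config_support=True` and `idrac_support=False` raises `ValueError`, which corresponds to the failing task named in the heading.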
 ## What to do if the nodes in a Kubernetes cluster reboot:
 Wait for 15 minutes after the Kubernetes cluster reboots. Next, verify the status of the cluster using the following commands:
 * `kubectl get nodes` on the manager node to get the real-time k8s cluster status.  

+ 6 - 4
docs/INSTALL_OMNIA.md

@@ -194,11 +194,13 @@ The following __Slurm__ roles are provided by Omnia when __omnia.yml__ file is r
 To enable the login node, the *login_node_required* variable must be set to "true" in the *omnia_config.yml* file.  
 - **login_common** role: The firewall ports are opened on the manager and login nodes.  
 - **login_server** role: FreeIPA server is installed and configured on the manager node to provide authentication using LDAP and Kerberos principles.  
-- **login_node** role: FreeIPA client is installed and configured on the login node and is integrated with the server running on the manager node.  
+- **login_node** role: For Rocky, FreeIPA client is installed and configured on the login node and is integrated with the server running on the manager node. For LeapOS, 389ds will be installed instead.
 
-**NOTE**: To skip the installation of:
-* The login node-In the `omnia_config.yml` file, set the *login_node_required* variable to "false".  
-* The FreeIPA server and client: Use `--skip-tags freeipa` while executing the *omnia.yml* file. 
+>>__Note:__ If LeapOS is being deployed, login_common and login_server roles will be skipped.  
+
+>> **NOTE**: To skip the installation of:
+>> * The login node-In the `omnia_config.yml` file, set the *login_node_required* variable to "false".  
+>> * The FreeIPA server and client: Use `--skip-tags freeipa` while executing the *omnia.yml* file. 
 
 ### Installing JupyterHub and Kubeflow playbooks  
 If you want to install JupyterHub and Kubeflow playbooks, you have to first install the JupyterHub playbook and then install the Kubeflow playbook.

File diff suppressed because it is too large
+ 78 - 38
docs/INSTALL_OMNIA_CONTROL_PLANE.md


+ 10 - 7
docs/README.md

@@ -54,7 +54,7 @@ Requirements  |   Version
 OS pre-installed on the management station  |  Rocky 8.x/ Leap 15.x
 OS deployed by Omnia on bare-metal Dell EMC PowerEdge Servers | Rocky 8.x Minimal Edition/ Leap 15.x
 Cobbler  |  3.2.2
-Ansible AWX  |  19.4.0
+Ansible AWX  |  20.0.0
 Slurm Workload Manager  |  20.11.2
 Kubernetes on the management station  |  1.21.0
 Kubernetes on the manager and compute nodes	|	1.16.7 or 1.19.3
@@ -92,9 +92,9 @@ OpenSM	|	GNU General Public License 2	|	3.3.24	|	-
 NVIDIA container runtime	|	Apache-2.0	|	3.4.2	|	Nvidia container runtime library
 Python PIP	|	MIT License	|	21.1.2	|	Python Package
 Python3	|	-	|	3.6.8 (3.6.15 if LeapOS is being used)	|	-
-Kubelet	|	Apache-2.0	|	1.16.7,1.19, 1.21 (LeapOS only supports 1.21) 	|	Provides external, versioned ComponentConfig API types for configuring the kubelet
-Kubeadm	|	Apache-2.0	|	1.16.7,1.19, 1.21 (LeapOS only supports 1.21)	|	"fast paths" for creating Kubernetes clusters
-Kubectl	|	Apache-2.0	|	1.16.7,1.19, 1.21 (LeapOS only supports 1.21)	|	Command line tool for Kubernetes
+Kubelet	|	Apache-2.0	|	1.16.7,1.19, 1.21  	|	Provides external, versioned ComponentConfig API types for configuring the kubelet
+Kubeadm	|	Apache-2.0	|	1.16.7,1.19, 1.21 	|	"fast paths" for creating Kubernetes clusters
+Kubectl	|	Apache-2.0	|	1.16.7,1.19, 1.21 	|	Command line tool for Kubernetes
 JupyterHub	|	Modified BSD License	|	1.1.0	|	Multi-user hub
 kubernetes Controllers	|	Apache-2.0	|	1.16.7,1.19 (1.21 if LeapOS is being used)	|	Orchestration tool	
 Kfctl	|	Apache-2.0	|	1.0.2	|	CLI for deploying and managing Kubeflow
@@ -106,10 +106,10 @@ Horovod	|	Apache-2.0	|	0.21.1	|	Distributed deep learning training framework for
 MPI	|	Copyright (c) 2018-2019 Triad National Security,LLC. All rights reserved.	|	0.3.0	|	HPC library
 CoreDNS	|	Apache-2.0	|	1.6.2	|	DNS server that chains plugins
 CNI	|	Apache-2.0	|	0.3.1	|	Networking for Linux containers
-AWX	|	Apache-2.0	|	19.4.0	|	Web-based User Interface
+AWX	|	Apache-2.0	|	20.0.0	|	Web-based User Interface
 AWX.AWX	|	Apache-2.0	|	19.4.0	|	Galaxy collection to perform awx configuration
-AWXkit	|	Apache-2.0	|	to be updated	|	To perform configuration through CLI commands
-Cri-o	|	Apache-2.0	|	1.21	|	Container Service
+AWXkit	|	Apache-2.0	|	18.0.0	|	To perform configuration through CLI commands
+Cri-o	|	Apache-2.0	|	1.21, 1.17.3  (LeapOS only supports  1.17.3) |	Container Service
 Buildah	|	Apache-2.0	|	1.22.4	|	Tool to build and run containers
 PostgreSQL	|	Copyright (c) 1996-2020, PostgreSQL Global Development Group	|	10.15	|	Database Management System
 Redis	|	BSD-3-Clause License	|	6.0.10	|	In-memory database
@@ -123,6 +123,9 @@ OMSDK	|	Apache-2.0	|	1.2.488	|	Dell EMC OpenManage Python SDK (OMSDK) is a pytho
 | postfix                               | IBM Public License               | 3.5.8  | Mail Transfer Agent (MTA) designed to determine routes and   send emails                                                                       |
 | xorriso                               | GPL version 3                    | 1.4.8  | xorriso copies file objects from POSIX compliant filesystems   into Rock Ridge enhanced ISO 9660 filesystems.                                  |
 | Dell EMC   OpenManage Ansible Modules | GNU- General Public License v3.0 | 5.0.0  | OpenManage Ansible Modules simplifies and automates   provisioning, deployment, and updates of PowerEdge servers and modular   infrastructure. |
+| 389-ds                               | GPL version 3               | 1.4.4  |  LDAP server used for authentication, access control.                                                                       |
+| sssd                               | GPL version 3                    | 1.16.1  | A set of daemons used to manage access to remote directory services and authentication mechanisms.                                   |
+| krb5 | MIT License | 1.19.2  | Authentication protocol providing strong authentication for client/server applications by using secret-key cryptography |
 
 # Known issues  
 * **Issue**: Hosts are not displayed on the AWX UI.  

File diff suppressed because it is too large
+ 25 - 0
docs/Security/ENABLE_SECURITY_LOGIN_NODE.md


File diff suppressed because it is too large
+ 85 - 0
docs/Security/ENABLE_SECURITY_MANAGEMENT_STATION.md


+ 8 - 7
docs/Telemetry_Visualization/Visualization.md

@@ -11,17 +11,17 @@ A lot of these metrics are collected using iDRAC telemetry. iDRAC telemetry allo
 
 | Parameter Name        | Default Value | Information |
 |-----------------------|---------------|-------------|
-| timescaledb_user      | 		        |  Username used for connecting to timescale db. Minimum Legth: 2 characters.          |
-| timescaledb_password  | 		        |  Password used for connecting to timescale db. Minimum Legth: 2 characters.           |
-| mysqldb_user          | 		        |  Username used for connecting to mysql db. Minimum Legth: 2 characters.         |
-| mysqldb_password      | 		        |  Password used for connecting to mysql db. Minimum Legth: 2 characters.            |
+| timescaledb_user      | 		        |  Username used for connecting to timescale db. Minimum Length: 2 characters.          |
+| timescaledb_password  | 		        |  Password used for connecting to timescale db. Minimum Length: 2 characters.           |
+| mysqldb_user          | 		        |  Username used for connecting to mysql db. Minimum Length: 2 characters.         |
+| mysqldb_password      | 		        |  Password used for connecting to mysql db. Minimum Length: 2 characters.            |
 | mysqldb_root_password | 		        |  Password used for connecting to mysql db for root user. Minimum Legth: 2 characters.         |
 
 3. All parameters in `telemetry/input_params/base_vars.yml` need to be filled in:
 
 | Parameter Name          | Default Value     | Information |
 |-------------------------|-------------------|-------------|
-| mount_location          | idrac_telemetrysource_services_db | Sets the location all telemetry related files will be stored and both timescale and mysql databases will be mounted.            |
+| mount_location          | /opt/omnia| Sets the location all telemetry related files will be stored and both timescale and mysql databases will be mounted.            |
 | idrac_telemetry_support | true              | This variable is used to enable iDRAC telemetry support and visualizations. Accepted Values: true/false            |
 | slurm_telemetry_support | true              | This variable is used to enable slurm telemetry support and visualizations. Slurm Telemetry support can only be activated when idrac_telemetry_support is set to true. Accepted Values: True/False.        |
 | timescaledb_name        | telemetry_metrics | Postgres DB with timescale extension is used for storing iDRAC and slurm telemetry metrics.            |
@@ -50,7 +50,7 @@ Use any one of the following browsers to access the Grafana UI (https://< Grafan
 
 ## Initiating Telemetry
 
-1. Once `control_plane.yml` and `telemetry.yml` are executed, run the following commands from `omnia/telemetry`:
+1. Once `control_plane.yml` and `omnia.yml` are executed, run the following commands from `omnia/telemetry`:
 
 `ansible-playbook telemetry.yml`
 
@@ -60,7 +60,8 @@ Use any one of the following browsers to access the Grafana UI (https://< Grafan
 After initiation, new nodes can be added to telemetry by running the following commands from `omnia/telemetry`:
 		
 ` ansible-playbook add_idrac_node.yml `
-		
+
+	
 
 
 

+ 1 - 1
docs/control_plane/device_templates/PROVISION_SERVERS.md

@@ -13,7 +13,7 @@ Edit the following files under the `control_plane/input_params` directory to pro
 	File name	|	Variables</br> [Required/ Optional]	|	Default, choices	|	Description
 	-------	|	----------------	|	-----------------	|	-----------------
 	idrac_vars.yml	|	idrac_system_profile</br> [Required]	|	<ul><li>**Performance**</li> <li>PerformancePerWatt(DAPC)</li> <li>PerformancePerWatt(OS)</li> <li>WorkstationPerformance</li></ul>	|	The system profile used for BIOS configuration. 
-	<br>	|	firmware_update_required</br> [Required]	|	<ul><li>**true**</li> <li>false</li></ul>	|	By default, Omnia updates the firmware on the servers. To disable the firmware update, set the variable to "false".
+	<br>	|	firmware_update_required</br> [Required]	|	<ul><li>**false**</li> <li>true</li></ul>	|	By default, Omnia updates the firmware on the servers. To disable the firmware update, set the variable to "false".
 	<br>	|	poweredge_model</br> [Required if "firmware_update_required" is set to "true"]	|	<ul><li>**C6420**</li> <li>R640</li><li>R740</li><li>C4140</li> <li>And other supported PowerEdge servers</li></ul>	|	Enter the required PowerEdge server models to update the firmware. For example, enter `R640,R740,C4140` to update firmware on these models of PowerEdge servers. For a complete list of supported PowerEdge servers, see the *Hardware managed by Omnia* section in the Readme file.
 	<br>	|	uefi_secure_boot</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable the secure boot mode.
 	<br>	|	system_lockdown</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable system lockdown.

+ 1 - 1
docs/control_plane/input_parameters/PROVISION_SERVERS.md

@@ -13,7 +13,7 @@ Edit the following files under the `control_plane/input_params` directory to pro
 	File name	|	Variables</br> [Required/ Optional]	|	Default, choices	|	Description
 	-------	|	----------------	|	-----------------	|	-----------------
 	idrac_vars.yml	|	idrac_system_profile</br> [Required]	|	<ul><li>**Performance**</li> <li>PerformancePerWatt(DAPC)</li> <li>PerformancePerWatt(OS)</li> <li>WorkstationPerformance</li></ul>	|	The system profile used for BIOS configuration. 
-	<br>	|	firmware_update_required</br> [Required]	|	<ul><li>**true**</li> <li>false</li></ul>	|	By default, Omnia updates the firmware on the servers. To disable the firmware update, set the variable to "false".
+	<br>	|	firmware_update_required</br> [Required]	|	<ul><li>**false**</li> <li>true</li></ul>	|	By default, Omnia updates the firmware on the servers. To disable the firmware update, set the variable to "false".
 	<br>	|	poweredge_model</br> [Required if "firmware_update_required" is set to "true"]	|	<ul><li>**C6420**</li> <li>R640</li><li>R740</li><li>C4140</li> <li>And other supported PowerEdge servers</li></ul>	|	Enter the required PowerEdge server models to update the firmware. For example, enter `R640,R740,C4140` to update firmware on these models of PowerEdge servers. For a complete list of supported PowerEdge servers, see the *Hardware managed by Omnia* section in the Readme file.
 	<br>	|	uefi_secure_boot</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable the secure boot mode.
 	<br>	|	system_lockdown</br> [Optional]	|	<ul><li>**disabled**</li> <li>enabled</li></ul>	|	Option to enable or disable system lockdown.

+ 30 - 0
roles/cluster_validation/tasks/install_packages.yml

@@ -0,0 +1,30 @@
+#  Copyright 2022 Dell Inc. or its subsidiaries. All Rights Reserved.
+#
+#  Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+#  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+---
+
+- name: Set fact for ansible version
+  set_fact:
+    ansible_collection_used: true
+  when: "ansible_version.full is version_compare(ansible_base_version, '>')"
+
+- name: Install netaddr
+  pip:
+    name: netaddr
+    state: present
+    executable: pip3
+
+- name: Install ansible galaxy collection ansible.utils
+  command: ansible-galaxy collection install "{{ ipaddr_collection }}"
+  changed_when: false
+  when: ansible_collection_used
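The version gate in the first task can be approximated in plain Python. This is a rough sketch that assumes a component-wise comparison of the numeric parts of each version, which is how the default (non-strict) Ansible `version` test is understood to behave. `parse_version` and `collection_needed` are hypothetical helper names.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a string such as '2.9.27' into (2, 9, 27), keeping only numeric parts."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def collection_needed(ansible_version: str, base_version: str = "2.9") -> bool:
    """Mirror the `when:` condition: true if the running version is newer than the base."""
    return parse_version(ansible_version) > parse_version(base_version)


print(collection_needed("2.12.1"))  # newer minor release than the base
print(collection_needed("2.9"))     # identical to the base, so not newer
```

Under this comparison a patch release such as `2.9.27` also counts as newer than `'2.9'`, because `(2, 9, 27)` sorts after `(2, 9)`. If the intent is to install the collection only for releases after the 2.9 series, the base value would need to reflect that.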

+ 6 - 1
roles/cluster_validation/tasks/main.yml

@@ -27,6 +27,7 @@
     control_plane_status: false
     powervault_status: false
     nfs_node_status: false
+    ansible_collection_used: false
 
 - name: Check AWX instance
   command: awx --version
@@ -46,6 +47,10 @@
     - not awx_version_check.failed
     - awx_search_key in awx_hostname.stdout
 
+- name: Install Packages
+  include_tasks: install_packages.yml
+  when: not control_plane_status
+
 - name: Set NFS node status
   set_fact:
     nfs_node_status: true
@@ -90,4 +95,4 @@
         regexp: '#log_path = /var/log/ansible.log'
         replace: 'log_path = /var/log/omnia.log'
       when: ansible_conf_exists.stat.exists
-  when: not control_plane_status
+  when: not control_plane_status

+ 5 - 1
roles/cluster_validation/vars/main.yml

@@ -99,4 +99,8 @@ allow_deny_fail_msg: "Failed. Incorrect Access format in security_vars.yml"
 restrict_program_support_success_msg: "restrict_program_support successfully validated"
 restrict_program_support_failure_msg: "Failed. Accepted values are true or false."
 restrict_softwares_success_msg: "restrict_softwares successfully validated"
-restrict_softwares_failure_msg: "Warning. Values should be comma separated. The supported services are telnet, lpd, bluetooth, rlogin, rexec. Please check restrict_softwares variable"
+restrict_softwares_failure_msg: "Warning. Values should be comma separated. The supported services are telnet, lpd, bluetooth, rlogin, rexec. Please check restrict_softwares variable"
+
+# Usage: install_packages.yml
+ansible_base_version: '2.9'
+ipaddr_collection: ansible.utils:2.5.2

+ 6 - 0
roles/slurm_manager/tasks/main.yml

@@ -120,6 +120,12 @@
 - name: Get network address/subnet mask
   set_fact:
     network_address: "{{ (ansible_default_ipv4.network + '/' + ansible_default_ipv4.netmask) | ipaddr('network/prefix') }}"
+  when: not hostvars['127.0.0.1']['ansible_collection_used']
+
+- name: Get network address/subnet mask
+  set_fact:
+    network_address: "{{ (ansible_default_ipv4.network + '/' + ansible_default_ipv4.netmask) | ansible.utils.ipaddr('network/prefix') }}"
+  when: hostvars['127.0.0.1']['ansible_collection_used']
 
 - name: Firewall rule slurm - allow all incoming traffic on internal network
   firewalld:
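Both variants of the task above compute the same value: the default network in `network/prefix` form. As a minimal sketch of that conversion, the standard-library `ipaddress` module can stand in for the `ipaddr` filter. The function name `network_with_prefix` and the sample addresses are illustrative only.

```python
import ipaddress


def network_with_prefix(network: str, netmask: str) -> str:
    """Convert a network address and dotted netmask into CIDR notation."""
    # ip_network accepts the 'address/netmask' form and normalises the
    # netmask to a prefix length when converted back to a string.
    return str(ipaddress.ip_network(f"{network}/{netmask}"))


print(network_with_prefix("192.168.1.0", "255.255.255.0"))
```

The only difference between the two tasks is which filter name resolves: the bare `ipaddr` filter on older Ansible, or the fully qualified `ansible.utils.ipaddr` when `ansible_collection_used` is true.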

+ 1 - 1
telemetry/roles/slurm_telemetry/tasks/get_node_inventory.yml

@@ -39,7 +39,7 @@
     register: awx_svc_ip
 
   - name: AWX needs to be installed
-   fail:
+    fail:
       msg: "{{ awx_fail_msg }}"
     when: not awx_svc_ip.stdout
 

+ 75 - 73
telemetry/roles/slurm_telemetry/tasks/update_service_tags.yml

@@ -1,4 +1,4 @@
- Copyright 2022 Dell Inc. or its subsidiaries. All Rights Reserved.
+# Copyright 2022 Dell Inc. or its subsidiaries. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -34,83 +34,85 @@
   - name: Assert slurmctld status
     fail:
       msg: "{{ slurmctld_status_fail_msg }}"
-    when: not slurm_service
+    when: not hostvars[groups['manager'][0]]['slurm_service']
 
   - name: Prepare input config file
     block:
     - name: Get service tag
-        shell: >
+      shell: >
           set -o pipefail && \
           dmidecode -t 1 | grep Serial
-        changed_when: false
-        register: service_tag_details
-
-      - name: Set fact service tag
-        set_fact:
-          service_tag: "{{ service_tag_details.stdout.split(':')[1].strip() }}"
-
-      - name: Get the hostname
-        command: hostname
-        register: machine_hostname
-        changed_when: false
-
-      - name: Update Head Node IP
-        replace:
-          path: "{{ role_path }}{{ monster_input_file_path }}"
-          regexp: '  ip:.*'
-          replace: "  ip: {{ groups['manager'][0] }}"
-        delegate_to: localhost
-
-      - name: Update Head Node hostname
-        replace:
-          path: "{{ role_path }}{{ monster_input_file_path }}"
-          regexp: '  headnode:.*'
-          replace: "  headnode: {{ hostvars[groups['manager'][0]]['machine_hostname'].stdout }}"
-        delegate_to: localhost
-
-      - name: Update nodes hostnames
-        lineinfile:
-          path: "{{ role_path }}{{ monster_input_file_path }}"
-          line: "  {{ machine_hostname.stdout }}: {{ inventory_hostname }}"
-          insertafter: "hostnames:"
-        delegate_to: localhost
-
-      - name: Update service tag info
-        lineinfile:
-          path: "{{ role_path }}{{ monster_input_file_path }}"
-          line: "  - Servicetag: {{ service_tag }}\n    Os_Ip_Addr: {{ inventory_hostname }}"
-          insertafter: "clusternodes:"
-        delegate_to: localhost
-
-      - name: Copy initialization file
-        copy:
-          src: "{{ role_path }}/files/init_k8s_pod_local.sh"
-          dest: "{{ role_path }}/files/init_k8s_pod.sh"
-          mode: "{{ monster_config_file_mode }}"
-
-      - name: Update manager node details in init_k8s_pod.sh
-        replace:
-          path: "{{ role_path }}/files/init_k8s_pod.sh"
-          regexp: echo 'manager_node_ip manager_node_hostname' >> /etc/hosts
-          replace: echo '{{ inventory_hostname }} {{ machine_hostname.stdout }}' >> /etc/hosts
-        delegate_to: localhost
-        when: manager_group in group_names
-
-      - name: Update manager node IP in init_k8s_pod.sh
-        replace:
-          path: "{{ role_path }}/files/init_k8s_pod.sh"
-          regexp: ssh-keyscan -H manager_node_hostname >> /root/.ssh/known_hosts
-          replace: ssh-keyscan -H {{ machine_hostname.stdout }} >> /root/.ssh/known_hosts
-        delegate_to: localhost
-        when: manager_group in group_names
-
-      - name: Update manager node IP in init_k8s_pod.sh
-        replace:
-          path: "{{ role_path }}/files/init_k8s_pod.sh"
-          regexp: sshpass -p 'os_passwd' ssh-copy-id 'root@manager_node_ip'
-          replace: sshpass -p "{{ hostvars['127.0.0.1']['provision_password'] }}" ssh-copy-id 'root@{{ inventory_hostname }}'
-        delegate_to: localhost
-        when: manager_group in group_names
+      changed_when: false
+      register: service_tag_details
+
+    - name: Set fact service tag
+      set_fact:
+        service_tag: "{{ service_tag_details.stdout.split(':')[1].strip() }}"
+
+    - name: Get the hostname
+      command: hostname
+      register: machine_hostname
+      changed_when: false
+
+    - name: Update Head Node IP
+      replace:
+        path: "{{ role_path }}{{ monster_input_file_path }}"
+        regexp: '  ip:.*'
+        replace: "  ip: {{ groups['manager'][0] }}"
+      delegate_to: localhost
+
+    - name: Update Head Node hostname
+      replace:
+        path: "{{ role_path }}{{ monster_input_file_path }}"
+        regexp: '  headnode:.*'
+        replace: "  headnode: {{ hostvars[groups['manager'][0]]['machine_hostname'].stdout }}"
+      delegate_to: localhost
+
+    - name: Update nodes hostnames
+      lineinfile:
+        path: "{{ role_path }}{{ monster_input_file_path }}"
+        line: "  {{ machine_hostname.stdout }}: {{ inventory_hostname }}"
+        insertafter: "hostnames:"
+      delegate_to: localhost
+
+    - name: Update service tag info
+      lineinfile:
+        path: "{{ role_path }}{{ monster_input_file_path }}"
+        line: "  - Servicetag: {{ service_tag }}\n    Os_Ip_Addr: {{ inventory_hostname }}"
+        insertafter: "clusternodes:"
+      delegate_to: localhost
+
+    - name: Copy initialization file
+      copy:
+        src: "{{ role_path }}/files/init_k8s_pod_local.sh"
+        dest: "{{ role_path }}/files/init_k8s_pod.sh"
+        mode: "{{ monster_config_file_mode }}"
+      delegate_to: localhost
+      when: manager_group in group_names
+
+    - name: Update manager node details in init_k8s_pod.sh
+      replace:
+        path: "{{ role_path }}/files/init_k8s_pod.sh"
+        regexp: echo 'manager_node_ip manager_node_hostname' >> /etc/hosts
+        replace: echo '{{ inventory_hostname }} {{ machine_hostname.stdout }}' >> /etc/hosts
+      delegate_to: localhost
+      when: manager_group in group_names
+
+    - name: Update manager node IP in init_k8s_pod.sh
+      replace:
+        path: "{{ role_path }}/files/init_k8s_pod.sh"
+        regexp: ssh-keyscan -H manager_node_hostname >> /root/.ssh/known_hosts
+        replace: ssh-keyscan -H {{ machine_hostname.stdout }} >> /root/.ssh/known_hosts
+      delegate_to: localhost
+      when: manager_group in group_names
+
+    - name: Update manager node IP in init_k8s_pod.sh
+      replace:
+        path: "{{ role_path }}/files/init_k8s_pod.sh"
+        regexp: sshpass -p 'os_passwd' ssh-copy-id 'root@manager_node_ip'
+        replace: sshpass -p "{{ hostvars['127.0.0.1']['provision_password'] }}" ssh-copy-id 'root@{{ inventory_hostname }}'
+      delegate_to: localhost
+      when: manager_group in group_names
 
     when: hostvars[groups['manager'][0]]['slurm_service']
-  when: slurm_telemetry_support
+  when: hostvars['127.0.0.1']['slurm_telemetry_support']
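The `Set fact service tag` task relies on a plain string split, which can be reproduced outside Ansible. The sketch below assumes the `dmidecode -t 1 | grep Serial` output is a single `Serial Number: <value>` line. The sample value is made up and `parse_service_tag` is a hypothetical name.

```python
def parse_service_tag(grep_output: str) -> str:
    """Return the text after the first ':' with surrounding whitespace removed."""
    # Same expression as the Jinja template: stdout.split(':')[1].strip()
    return grep_output.split(":")[1].strip()


sample = "\tSerial Number: ABC1234"  # hypothetical dmidecode line
print(parse_service_tag(sample))
```

If the command output contains no colon, the index lookup raises `IndexError`. The equivalent Jinja expression would fail in the same situation.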