test 09 First version tested and working #149
Open
amemon-redhat wants to merge 10 commits into sap-linuxlab:main from amemon-redhat:test09
081d381 test 09 First version tested and working (amemon-redhat)
daa8c7e removed repeated restart_program_check verifications (amemon-redhat)
0fb8fe4 added sapstart and INSTANCENAME in cspell-words (amemon-redhat)
1c09231 Merge branch 'main' into test09 (amemon-redhat)
5463aa7 changed to stop using internal variables, started using public variables (amemon-redhat)
3915121 Merge branch 'main' into test09 (amemon-redhat)
727a3dc lint fixes and removal of repeated tasks (amemon-redhat)
b8edbc8 new line char in playbook (amemon-redhat)
4a2fd8a consolidated many repeated tasks for fact setting, full round of test… (amemon-redhat)
2263ba2 trailing spaces aligned with lint (amemon-redhat)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| --- | ||
| - name: Running TEST09 test role on the S/4HANA Cluster | ||
| hosts: all | ||
| gather_facts: false | ||
| become: true | ||
| become_user: root | ||
| any_errors_fatal: false | ||
| tasks: | ||
| - name: Running TEST09 test role on the S/4HANA Cluster | ||
| ansible.builtin.include_role: | ||
| name: sap.cluster_qa.test09 |
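The play above runs the role with its defaults. As a minimal sketch, the same play can be written with its indentation spelled out and a lower kill limit. This assumes that `max_kill_attempts` (documented in the role README with a default of 6) is an ordinary role variable that can be overridden through `vars` on `include_role`; the value 3 is only an illustration.

```yaml
---
- name: Running TEST09 test role on the S/4HANA Cluster
  hosts: all
  gather_facts: false
  become: true
  become_user: root
  any_errors_fatal: false
  tasks:
    - name: Run TEST09 with a lower kill limit
      ansible.builtin.include_role:
        name: sap.cluster_qa.test09
      vars:
        # Assumption: the role reads max_kill_attempts as a plain role variable.
        max_kill_attempts: 3
```

Passing the value on the task keeps the override local to this one role invocation rather than setting it for the whole play.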
100 changes: 100 additions & 0 deletions
ansible_collections/sap/cluster_qa/roles/test09/README.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,100 @@ | ||
| test09 | ||
| ========= | ||
|
|
||
| This role tests the SAP Message Server automatic restart mechanism and its interaction with the HA solution. It verifies that recoverable Message Server outages are handled correctly by the SAP Start Service and that unrecoverable failures trigger appropriate HA responses. | ||
|
|
||
| **Test Purpose:** | ||
| - **Verify Restart_Program parameter configuration** for Message Server (auto-configure if missing) | ||
| - Verify Message Server automatic restart functionality via SAP Start Service | ||
| - Test interaction between SAP automatic restart and HA solution | ||
| - Ensure HA solution responds appropriately when automatic restart fails | ||
| - Validate that ASCS failover respects ERS location constraints | ||
|
|
||
| **Test Procedure:** | ||
| 1. **Validate Restart_Program parameter** is configured in ASCS profile (auto-insert if missing) | ||
| 2. Kill Message Server process repeatedly (up to 6 times by default) | ||
| 3. Monitor SAP Start Service automatic restart behavior | ||
| 4. Verify HA solution response when automatic restart threshold is exceeded | ||
| 5. Ensure ASCS and ERS remain on different nodes throughout | ||
|
|
||
| Requirements | ||
| ------------ | ||
|
|
||
| A Pacemaker cluster with three or more nodes managing S/4HANA ASCS and ERS instances using the `SAPInstance` resource agent with the SAP HA interface for SAP ABAP application server instances, as described in https://access.redhat.com/solutions/3606101. | ||
|
|
||
| **Prerequisites:** | ||
| - **SAP Profile Parameter "Restart_Program" must be configured for Message Server** (auto-configured by test if missing) | ||
| - SAP system running in stable mode with HA solution activated | ||
| - 3+ node cluster setup required | ||
|
|
||
| **Reference:** [SAP Support Content: Message Server Restart](https://help.sap.com/docs/SUPPORT_CONTENT/si/3362959619.html?locale=en-US) | ||
|
|
||
| **Restart_Program Configuration Example:** | ||
| ``` | ||
| Restart_Program_01 = local $(DIR_EXECUTABLE)/msg_server pf=$(DIR_PROFILE)/$(SAPSYSTEMNAME)_$(INSTANCE_NAME)_$(HOSTNAME) | ||
| ``` | ||
|
|
||
| Role Variables | ||
| -------------- | ||
|
|
||
| This role uses the following variables. Except for `max_kill_attempts`, they are provided by the `sap.cluster_qa.pcs_find_ascs` and `sap.cluster_qa.pcs_find_ers` roles: | ||
| - `sap_ascs_node_name` - The node where ASCS is currently running | ||
| - `sap_ers_node_name` - The node where ERS is currently running | ||
| - `sap_ascs_resource_name` - The name of the ASCS resource in the cluster | ||
| - `sap_ascs_instance_number` - The ASCS instance number | ||
| - `max_kill_attempts` - Maximum Message Server kill attempts (default: 6) | ||
| - `__pcs_find_ascs_sap_ascs_start_profile.stdout` - Path to ASCS profile file (used for Restart_Program validation) | ||
|
|
||
| **Expected Outcomes:** | ||
| - **Restart_Program parameter validation passes** (auto-configured if missing) | ||
| - Message Server restarts automatically via SAP Start Service (recoverable errors) | ||
| - Process ID changes with each restart | ||
| - Restart events logged in sapstartsrv.log/sapstart.log | ||
| - HA solution triggers ASCS restart/failover after restart threshold exceeded | ||
| - ASCS never moves to ERS node | ||
|
|
||
| **Auto-Configuration Feature:** | ||
| If the `Restart_Program` parameter is not found, the test will automatically: | ||
| - Check for existing `Start_Program` parameter for Message Server | ||
| - **Replace `Start_Program` with `Restart_Program`** if found (to avoid conflicts) | ||
| - Insert `_MS = ms.sap$(SAPSYSTEMNAME)_$(INSTANCE_NAME)` variable definition if needed | ||
| - Add `Restart_Program_00 = local $(_MS) pf=$(_PF)` parameter if no existing Start_Program | ||
| - Create backup of original profile before modification | ||
| - **Restart sapstartsrv service** to apply the new configuration | ||
| - **Wait for cluster to detect ASCS resource failures** after service restart | ||
| - **Wait for ASCS resource to be fully started** by the cluster | ||
| - **Re-discover ASCS location** after cluster recovery (may cause failover) | ||
| - Verify successful configuration before proceeding | ||
|
|
||
| **Important Note:** When the `Restart_Program` parameter is automatically configured, the sapstartsrv service is restarted. The cluster then detects resource failures and may fail ASCS over to another node. The test waits for complete cluster recovery before proceeding. | ||
|
|
||
| **Configuration Logic:** | ||
| - If `Start_Program_XX = local $(_MS) pf=$(_PF)` exists → Replace with `Restart_Program_00 = local $(_MS) pf=$(_PF)` | ||
| - If no Start_Program exists → Add both `_MS` variable and `Restart_Program_00` parameter | ||
|
|
||
| Dependencies | ||
| ------------ | ||
|
|
||
| - `sap.cluster_qa.pcs_find_ascs` - Required to locate the ASCS node and resource information | ||
| - `sap.cluster_qa.pcs_find_ers` - Required to locate the ERS node and resource information | ||
| - `sap.sap_operations` - Required for host_info and pcs_status_info modules | ||
|
|
||
| Example Playbook | ||
| ---------------- | ||
|
|
||
| A minimal playbook that runs this role: | ||
|
|
||
| - hosts: servers | ||
| roles: | ||
| - sap.cluster_qa.test09 | ||
|
|
||
| License | ||
| ------- | ||
|
|
||
| GPLv3 | ||
|
|
||
| Author Information | ||
| ------------------ | ||
|
|
||
| Amir Memon (@amemon-redhat) | ||
| Kirill Satarin (@kksat) |
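The "Configuration Logic" described in the README can be sketched as two Ansible tasks. This is only an illustration of the first branch, where an existing `Start_Program` line for the Message Server is rewritten. It is not the role's actual implementation, and it does not cover inserting the `_MS` variable or restarting sapstartsrv. `ascs_profile_path` is a hypothetical variable standing in for the ASCS profile path, and the regular expressions assume the profile uses the `local $(_MS) pf=$(_PF)` form quoted in the README.

```yaml
- name: Look for a Restart_Program entry that starts the Message Server
  ansible.builtin.command:
    argv:
      - grep
      - "-E"
      - '^Restart_Program_[0-9]+ *= *local \$\(_MS\)'
      - "{{ ascs_profile_path }}"  # hypothetical variable, not defined by the role
  register: restart_program_grep
  changed_when: false
  # grep exits with 0 on a match and 1 on no match; anything else is a real error.
  failed_when: restart_program_grep.rc not in [0, 1]

- name: Replace Start_Program with Restart_Program for the Message Server
  ansible.builtin.replace:
    path: "{{ ascs_profile_path }}"
    regexp: '^Start_Program_\d+(\s*=\s*local \$\(_MS\) pf=\$\(_PF\))'
    replace: 'Restart_Program_00\1'
    backup: true  # keep a copy of the original profile before modifying it
  when: restart_program_grep.rc == 1
```

The `backup: true` option corresponds to the README's "Create backup of original profile before modification" step.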
14 changes: 14 additions & 0 deletions
ansible_collections/sap/cluster_qa/roles/test09/meta/main.yml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| --- | ||
| galaxy_info: | ||
| author: Amir Memon (@amemon-redhat) | ||
| description: Run test09 - Message Server automatic restart and HA interaction test | ||
| license: GPL-3.0-only | ||
| min_ansible_version: "2.15" | ||
| platforms: | ||
| - name: EL | ||
| versions: | ||
| - "8" | ||
| - "9" | ||
| - "10" | ||
| galaxy_tags: [] | ||
| dependencies: [] |
139 changes: 139 additions & 0 deletions
ansible_collections/sap/cluster_qa/roles/test09/tasks/kill_message_server.yml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,139 @@ | ||
| --- | ||
| - name: Check if ASCS node is responsive before attempting kill | ||
| ansible.builtin.ping: | ||
| register: ascs_node_ping | ||
| failed_when: false | ||
| when: ansible_hostname == sap_ascs_node_name_initial | ||
|
|
||
| - name: Skip kill attempt if ASCS node is unresponsive | ||
| ansible.builtin.set_fact: | ||
| msg_server_restarted: false | ||
| when: | ||
| - ansible_hostname == sap_ascs_node_name_initial | ||
| - ascs_node_ping is defined | ||
| - ascs_node_ping.failed | default(false) | ||
|
|
||
| - name: Display node responsiveness status | ||
| ansible.builtin.debug: | ||
| msg: "Kill attempt {{ kill_attempt }}: ASCS node {{ sap_ascs_node_name_initial }} responsiveness = {{ 'RESPONSIVE' if (ascs_node_ping.ping is defined and ascs_node_ping.ping == 'pong') else 'UNRESPONSIVE' }}" | ||
| when: ansible_hostname == sap_ascs_node_name_initial | ||
|
|
||
| - name: Get current Message Server process info | ||
| sap.sap_operations.host_info: | ||
| register: current_ascs_host_info | ||
| when: | ||
| - ansible_hostname == sap_ascs_node_name_initial | ||
| - ascs_node_ping.ping is defined | ||
| - ascs_node_ping.ping == 'pong' | ||
|
|
||
| - name: Store current Message Server PID | ||
| ansible.builtin.set_fact: | ||
| current_msg_server_pid: "{{ (msg_server_process_list | first)['pid'] if msg_server_process_list | length > 0 else 'NO_INSTANCE' }}" | ||
| previous_msg_server_pid: "{{ previous_msg_server_pid | default('none') }}" | ||
| ascs_instance_found: "{{ ascs_instance_list | length > 0 }}" | ||
| vars: | ||
| ascs_instance_list: "{{ current_ascs_host_info.instances | selectattr('mSystemNumber', 'equalto', sap_ascs_instance_number) | list }}" | ||
| msg_server_process_list: "{{ (ascs_instance_list | first)['ProcessList'] | selectattr('name', 'equalto', 'msg_server') | list if ascs_instance_list | length > 0 else [] }}" | ||
| when: | ||
| - ansible_hostname == sap_ascs_node_name_initial | ||
| - current_ascs_host_info is defined | ||
| - current_ascs_host_info.instances is defined | ||
|
|
||
| - name: Handle case when ASCS instance not found | ||
| ansible.builtin.set_fact: | ||
| current_msg_server_pid: "NO_INSTANCE" | ||
| ascs_instance_found: false | ||
| msg_server_restarted: false | ||
| when: | ||
| - ansible_hostname == sap_ascs_node_name_initial | ||
| - current_ascs_host_info is defined | ||
| - (current_ascs_host_info.instances | default([]) | selectattr('mSystemNumber', 'equalto', sap_ascs_instance_number) | list | length == 0) | ||
|
|
||
| - name: Display Message Server PID info | ||
| ansible.builtin.debug: | ||
| msg: | | ||
| Kill attempt {{ kill_attempt }}: | ||
| - ASCS instance found: {{ ascs_instance_found | default(false) }} | ||
| - Current PID: {{ current_msg_server_pid | default('N/A') }} | ||
| - Previous PID: {{ previous_msg_server_pid | default('none') }} | ||
| {% if not (ascs_instance_found | default(false)) %} | ||
| - WARNING: ASCS instance {{ sap_ascs_instance_number }} not found in process list | ||
| {% endif %} | ||
| when: ansible_hostname == sap_ascs_node_name_initial | ||
|
|
||
| - name: Killing the Message Server process | ||
| ansible.builtin.command: "kill -9 {{ current_msg_server_pid }}" | ||
| changed_when: true | ||
| when: | ||
| - ansible_hostname == sap_ascs_node_name_initial | ||
| - current_msg_server_pid is defined | ||
| - current_msg_server_pid != "NO_INSTANCE" | ||
| - ascs_instance_found | default(false) | ||
| - ascs_node_ping.ping is defined | ||
| - ascs_node_ping.ping == 'pong' | ||
|
|
||
| - name: Update kill counter | ||
| ansible.builtin.set_fact: | ||
| message_server_kill_count: "{{ kill_attempt }}" | ||
| previous_msg_server_pid: "{{ current_msg_server_pid | default('unknown') }}" | ||
|
|
||
| - name: Wait for SAP automatic restart or HA intervention | ||
| ansible.builtin.pause: | ||
| seconds: 30 | ||
| prompt: "Waiting for SAP automatic restart or HA intervention after kill attempt {{ kill_attempt }}" | ||
|
|
||
| - name: Check if ASCS resource is still running on original node | ||
| sap.sap_operations.pcs_status_info: | ||
| register: ascs_status_check | ||
| run_once: true | ||
|
|
||
| - name: Verify ASCS resource status | ||
| ansible.builtin.set_fact: | ||
| ascs_still_on_original_node: "{{ ascs_status_check | sap.sap_operations.pcs_resources_from_status(role='Started', id=sap_ascs_resource_name) | length > 0 }}" | ||
| run_once: true | ||
|
|
||
| - name: Check if Message Server process is running again | ||
| sap.sap_operations.host_info: | ||
| register: restart_check_host_info | ||
| failed_when: false | ||
| when: | ||
| - ansible_hostname == sap_ascs_node_name_initial | ||
| - ascs_still_on_original_node | bool | ||
|
|
||
| - name: Determine if Message Server restarted automatically | ||
| ansible.builtin.set_fact: | ||
| msg_server_restarted: "{{ (restart_msg_server_list | length > 0) }}" | ||
| restart_ascs_instance_found: "{{ restart_ascs_instance_list | length > 0 }}" | ||
| vars: | ||
| restart_ascs_instance_list: "{{ restart_check_host_info.instances | default([]) | selectattr('mSystemNumber', 'equalto', sap_ascs_instance_number) | list }}" | ||
| restart_msg_server_list: "{{ (restart_ascs_instance_list | first)['ProcessList'] | selectattr('name', 'equalto', 'msg_server') | list if restart_ascs_instance_list | length > 0 else [] }}" | ||
| when: | ||
| - ansible_hostname == sap_ascs_node_name_initial | ||
| - ascs_still_on_original_node | bool | ||
| - restart_check_host_info is defined | ||
| - not (restart_check_host_info.failed | default(false)) | ||
|
|
||
| - name: Set restart status to false if ASCS moved or node unresponsive | ||
| ansible.builtin.set_fact: | ||
| msg_server_restarted: false | ||
| when: | ||
| - not (ascs_still_on_original_node | bool) or | ||
| (ansible_hostname == sap_ascs_node_name_initial and (ascs_node_ping.failed | default(false))) | ||
|
|
||
| - name: Display restart status | ||
| ansible.builtin.debug: | ||
| msg: | | ||
| After kill {{ kill_attempt }}: | ||
| - Message Server restarted: {{ msg_server_restarted | default(false) }} | ||
| - ASCS on original node: {{ ascs_still_on_original_node }} | ||
| - ASCS instance found during kill: {{ ascs_instance_found | default(false) }} | ||
| - ASCS instance found during restart check: {{ restart_ascs_instance_found | default(false) }} | ||
| when: ansible_hostname == sap_ascs_node_name_initial | ||
|
|
||
| - name: Set global fact to stop further iterations if Message Server stopped restarting | ||
| ansible.builtin.set_fact: | ||
| msg_server_restarted: false | ||
| when: | ||
| - (not (msg_server_restarted | default(false) | bool)) or | ||
| (not (ascs_instance_found | default(true) | bool)) | ||
| - kill_attempt | int >= 2 | ||
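The tasks in this file read a `kill_attempt` variable that is not set anywhere in the file, so it presumably comes from a loop in the calling task file. That caller is not shown in this diff. A minimal sketch of how it might look, assuming the file is included once per attempt and that `max_kill_attempts` defaults to 6 as stated in the README:

```yaml
- name: Kill the Message Server up to max_kill_attempts times
  ansible.builtin.include_tasks:
    file: kill_message_server.yml
  # Produces 1, 2, ..., max_kill_attempts; each value is exposed as kill_attempt.
  loop: "{{ range(1, (max_kill_attempts | default(6) | int) + 1) | list }}"
  loop_control:
    loop_var: kill_attempt
```

The last task in the file sets `msg_server_restarted` to false once the Message Server stops coming back. How the role uses that fact to skip the remaining iterations is not visible here, so the sketch leaves that guard out.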