Add logs to get time taken by assessment job and install updates job #161

GAURAVRAMRAKHYANI · 2022-12-15T13:55:49Z

Background: Current methods to query core execution time don’t scale well in Kusto. A single-line log can be queried better at scale across machines.

Added logs to get time taken by assessment and install updates jobs. Added relevant details in logs like patch service used (apt, yum or zypper), machine ram, machine cpu, OS details, etc.

Following are the examples of the added log for assessment and install updates job:

Assessment:

{'job': 'Assessment', 'startTime': '2022-12-15 13:40:34.314455', 'endTime': '2022-12-15 13:40:41.887585', 'patchService': 'AptitudePackageManager', 'machineDetails': {'platform_name': 'Ubuntu', 'platform_version': '20.04', 'machine_cpu': 'Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz', 'machine_arch': 'x86_64', 'disk_type': 'Hard drive'}} [TC=105]

Install Updates:

{'job': 'Install Updates', 'startTime': '2022-12-15 12:36:18.153067', 'endTime': '2022-12-15 12:56:34.743747', 'installed_update_count': '13', 'update_run_successful': 'True', 'maintenance_window': '3:55:00', 'maintenance_window_exceeded': 'False', 'patchService': 'AptitudePackageManager', 'machineDetails': {'platform_name': 'Ubuntu', 'platform_version': '20.04', 'machine_cpu': 'Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz', 'machine_arch': 'x86_64', 'disk_type': 'Hard drive'}} [TC=1216]

Manual testings performed:
Performed Check for updates and One-time update and checked that the new logs added are present in the logs file.

… Added relevant details in logs like patch service used (apt, yum or zypper), machine ram, machine cpu, OS details, etc.

codecov · 2022-12-15T14:00:51Z

Codecov Report

Merging #161 (6ca8ce0) into master (d192d35) will increase coverage by 0.22%.
The diff coverage is 98.95%.

@@            Coverage Diff             @@
##           master     #161      +/-   ##
==========================================
+ Coverage   88.52%   88.75%   +0.22%     
==========================================
  Files          86       88       +2     
  Lines       12598    12881     +283     
==========================================
+ Hits        11152    11432     +280     
- Misses       1446     1449       +3

Flag	Coverage Δ
python27	`87.23% <98.95%> (+0.38%)`	⬆️
python39	`88.72% <98.95%> (+0.22%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
src/core/src/core_logic/PatchInstaller.py	`85.39% <87.50%> (+0.13%)`	⬆️
src/core/tests/Test_Stopwatch.py	`99.03% <99.03%> (ø)`
src/core/src/bootstrap/Constants.py	`98.52% <100.00%> (+0.15%)`	⬆️
src/core/src/core_logic/MaintenanceWindow.py	`87.09% <100.00%> (+5.70%)`	⬆️
src/core/src/core_logic/PatchAssessor.py	`99.35% <100.00%> (+0.04%)`	⬆️
src/core/src/core_logic/Stopwatch.py	`100.00% <100.00%> (ø)`
src/core/src/service_interfaces/TelemetryWriter.py	`74.09% <100.00%> (-1.17%)`	⬇️
src/core/tests/Test_MaintenanceWindow.py	`98.66% <100.00%> (+0.51%)`	⬆️
src/core/tests/Test_PatchAssessor.py	`99.20% <100.00%> (+0.16%)`	⬆️
src/core/tests/Test_PatchInstaller.py	`99.55% <100.00%> (+0.07%)`	⬆️
... and 2 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

rane-rajasi · 2022-12-30T01:59:45Z

A few questions and my thoughts here:

Adding new log statements to determine assessment and install jobs duration, seems a straightforward method. However, we already have the issue of the overall operation being slow because of the agent's restrictions in consuming telemetry events. Even though these are just 2 logs statements, it still contributes to the reduced pace. Most of the information added in these new logs is redundant (machine info is already logged, start and endtime are added in the status file)

Question:

Are we only looking at extension telemetry to determine assessment or installation duration? Since this info is already available in status file, can that be used instead? We won't need any code changes in that case

src/core/src/core_logic/PatchAssessor.py

GAURAVRAMRAKHYANI · 2023-01-03T07:44:46Z

A few questions and my thoughts here:

Adding new log statements to determine assessment and install jobs duration, seems a straightforward method. However, we already have the issue of the overall operation being slow because of the agent's restrictions in consuming telemetry events. Even though these are just 2 logs statements, it still contributes to the reduced pace. Most of the information added in these new logs is redundant (machine info is already logged, start and endtime are added in the status file)

Question:

Are we only looking at extension telemetry to determine assessment or installation duration? Since this info is already available in status file, can that be used instead? We won't need any code changes in that case

Yes, the information which we are logging here are already there in different places but to get all the related information for a job, currently we need join in kusto queries which is not scalable. We are logging all the required information in one log so that we don't need join in kusto queries. We need efficient kusto queries because we are going to use kusto queries in future to calculate improvements that we are going to do in the script.

src/core/src/core_logic/PatchAssessor.py

src/core/src/core_logic/PatchInstaller.py

src/core/src/core_logic/PatchAssessor.py

src/core/src/service_interfaces/TelemetryWriter.py

src/core/src/core_logic/PatchInstaller.py

kjohn-msft · 2023-01-04T00:20:14Z

Please include test code to mock and validate changed functionality - you can do this after nailing down comments in the prod code.

kjohn-msft · 2023-01-04T00:30:40Z

A few questions and my thoughts here:

Adding new log statements to determine assessment and install jobs duration, seems a straightforward method. However, we already have the issue of the overall operation being slow because of the agent's restrictions in consuming telemetry events. Even though these are just 2 logs statements, it still contributes to the reduced pace. Most of the information added in these new logs is redundant (machine info is already logged, start and endtime are added in the status file)

Question:

Are we only looking at extension telemetry to determine assessment or installation duration? Since this info is already available in status file, can that be used instead? We won't need any code changes in that case

To @rane-rajasi's note here (which I only saw after I wrote other comments), there's no need to write the existing telemetry event for machine_info if what's being written here happens reliably. Rajasi please review if that mitigates some of the valid concern raised.

kjohn-msft

Comments are inline in PR.

(a) Creating separate class Stopwatch (b) Renaming variable names to be consistent (c) Renaming write_machine_config_info() to set_machine_config_info() as we are not logging machine_info in this function. (d) Creating constant for strings which are used in logs as they are referred multiple places. (e) Pre calculating time taken by task and percentage of maintenance window used instead of calculating them in kusto query. (f) Adding number_of_tries (for assessment) and number_of_rounds (for patch installation) (g) Removing import TelemetryWriter as it is already available in PatchAssessor and PatchInstaller

…iter is already available

src/core/src/core_logic/PatchAssessor.py

src/core/src/core_logic/PatchInstaller.py

src/core/tests/Test_PatchAssessor.py

rane-rajasi

Comments inline

(a) Using numbered index placeholder (b) Do not explicitly throw exception if total_time_in_minutes == 0 in get_percentage_maintenance_window_used

…on't want script to stop abruptly just due to some exception in writing performance logs.

(b) Add test cases

Done code changes for Aptitude package manager and did manual testing. Code changes for Yum and Zipper is pending should be completed by next week. Adding unit tests is pending. Also, due to signature change of the method install_update_and_dependencies, there are some tests failing. I will work on those tests once we confirm the signature of the method. Using Stopwatch implementation in this PR. The implementation of Stopwatch is in the PR #161 which is going to be merged soon.

kjohn-msft · 2023-02-13T17:25:17Z

@GAURAVRAMRAKHYANI - can you please merge the latest from master please? There shouldn't be any conflict but it helps with the PR. @rane-rajasi please consider you're weekly planning and provide an ETA for final review. The ETA from me will be EOD business hours PST. @SathishMSFT, please review and sign off if you're satisfied with the changes.

GAURAVRAMRAKHYANI · 2023-02-13T18:11:08Z

@GAURAVRAMRAKHYANI - can you please merge the latest from master please? There shouldn't be any conflict but it helps with the PR. @rane-rajasi please consider you're weekly planning and provide an ETA for final review. The ETA from me will be EOD business hours PST. @SathishMSFT, please review and sign off if you're satisfied with the changes.

Merged latest master branch

src/core/src/core_logic/PatchAssessor.py

rane-rajasi

Left one comment on write_assessment_perf_logs()

src/core/src/core_logic/PatchInstaller.py

…opwatch functions are called as now there are no exceptions thrown in stopwatch functions

…thod

SathishMSFT

Looks good to me.

src/core/src/bootstrap/Constants.py

src/core/src/core_logic/PatchAssessor.py

src/core/src/core_logic/PatchInstaller.py

* Implement patching in batches to improve performance. Done code changes for Aptitude package manager and did manual testing. Code changes for Yum and Zipper is pending should be completed by next week. Adding unit tests is pending. Also, due to signature change of the method install_update_and_dependencies, there are some tests failing. I will work on those tests once we confirm the signature of the method. Using Stopwatch implementation in this PR. The implementation of Stopwatch is in the PR #161 which is going to be merged soon. * Fixed an issue in is_packages_install_time_available i * Code changes for yum and zypper package manager for parallel patching. Did manual testing and checked that packages are getting installed and the installation status of the packages are shown correctly in the portal. * For yum, add packages with different architecture in list of dependencies. * Correct function call in Yum for extract_dependencies * Following changes are done in this iteration. All the existing unit tests are succeeding now with these changes. (a) updated function name is_packages_install_time_available to is_package_install_time_available (b) Made reboot_manager mandatory in the function is_packages_install_time_available (c) Created a function install_update_and_dependencies_and_get_status * (a) Handled the scenario that if there is not enough remaining time to install packages in batch then do not stop the complete process of installing patches. The reamaining packages can be installed sequentially and there is already check for maintenance window cutoff time when installing packages sequentially so that the installation is stopped when there is not enough remaining time in maintenance window for installing package. Tested this scenario. (b) After batch patching, there is call get_remaining_packages_to_install. The time to execute get_remaining_packages_to_install should not be counted in time taken by nistalling packages in batches. Hence, stopping the stopwatch before calling get_remaining_packages_to_install and writing batch installation details in the telemetry after the call get_remaining_packages_to_install so that we also get the remaining packages to install in the telemetry. Tested this scenario. * Added test test_is_package_install_time_available * Addressed the comments. Some of the changes done are: (a) using self.execution_config.reboot_setting instead of reboot_manager.reboot_setting (b) Reordering import of modules (c) Moving add_arch_dependencies_yum funciton to package_manager and updating its name to add_arch_dependencies * (a) Added unit tests for code coverage (b) refactored the method get_dependent_list. Removed that method include_dependencies as it is not required * include_dependencies method from apt and zypper package managers Using list() instead of copy() as copy is not supported in python2.7 * updated UA_ESM_REQUIRED to UA_ESM_Required in the test test_skip_package_version_UA_ESM_REQUIRED * Added two unit tests: (a) test_dependent_package_excluded (b) test_arch_dependency_install_success * Add three new test cases: test_dependent_package_excluded_and_not_enough_time_for_batch_patching test_no_updates_to_install test_get_remaining_packages_to_install * Added 3 new test cases: test_include_dependency_apt test_include_dependency_yum test_include_dependency_zypper * Incorporated PR comments. Mostly renaming of variables and edit logs. Using package in new code instead of patch and update except for already present variables and functions. * Incorporated PR comments. Some of the changes are: (a) Updated <>_parent_update_count to <>_parent_package_install_count (b) Removed the log ("\n\nInstalling patches in sequence...") (c) Added a prefix string in the performance logs for install package, install batch and the process of install packages in batches. This will help in kusto queries. * Incorporating PR comments * Call self.package_manager.is_reboot_pending() instead of self.is_reboot_pending() after merge from master because is_reboot_pending() is moved to package_manager * Incorporate PR comments * Incorporated PR comments * The changes contains following: (a) Incorporated PR comments (b) Using all_package_versions instead of package_versions while finding version of the dependent package in the function include_dependencies in the file PatchInstaller.py. The list all_package_versions contains versions of all the packages available to install but the list package_versions contains only those packages which are selected by user. There might be dependency on the package which is not selected by user and hence all_package_versions should be used. (c) Making a list containing not attempted packages and failed packages in batch patching process. In fallback to sequential patching, this list is used as the input packages to install. The method get_remaining_packages_to_install is no longer required. But due to this change, the newly available packages in brief time while batch patching was done are not handled. This was the case before this PR also, the newly added packages while installing the packages were not handled. To make it simple, it is better to not handle that case in this big PR. It is difficult to test this scenario. * Changes in this commit: (a) Addressing PR comments (b) Adding some comments in the code.

Add logs to get time taken by assessment job and install updates job.…

ac90666

… Added relevant details in logs like patch service used (apt, yum or zypper), machine ram, machine cpu, OS details, etc.

GAURAVRAMRAKHYANI requested review from kjohn-msft, najams and rane-rajasi as code owners December 15, 2022 13:55

GAURAVRAMRAKHYANI changed the title ~~Add logs to get time taken by assessment job and install updates job.…~~ Add logs to get time taken by assessment job and install updates job Dec 15, 2022

GAURAVRAMRAKHYANI requested review from SathishMSFT and nikhim-um December 15, 2022 14:01

rane-rajasi reviewed Dec 30, 2022

View reviewed changes

src/core/src/core_logic/PatchAssessor.py Outdated Show resolved Hide resolved

kjohn-msft reviewed Jan 3, 2023

View reviewed changes

src/core/src/core_logic/PatchAssessor.py Outdated Show resolved Hide resolved

kjohn-msft reviewed Jan 3, 2023

View reviewed changes

src/core/src/core_logic/PatchInstaller.py Outdated Show resolved Hide resolved

kjohn-msft reviewed Jan 3, 2023

View reviewed changes

src/core/src/core_logic/PatchAssessor.py Outdated Show resolved Hide resolved

kjohn-msft reviewed Jan 3, 2023

View reviewed changes

src/core/src/service_interfaces/TelemetryWriter.py Outdated Show resolved Hide resolved

kjohn-msft reviewed Jan 3, 2023

View reviewed changes

src/core/src/service_interfaces/TelemetryWriter.py Outdated Show resolved Hide resolved

kjohn-msft reviewed Jan 3, 2023

View reviewed changes

src/core/src/service_interfaces/TelemetryWriter.py Outdated Show resolved Hide resolved

kjohn-msft reviewed Jan 4, 2023

View reviewed changes

src/core/src/core_logic/PatchInstaller.py Outdated Show resolved Hide resolved

kjohn-msft reviewed Jan 4, 2023

View reviewed changes

src/core/src/core_logic/PatchInstaller.py Show resolved Hide resolved

kjohn-msft reviewed Jan 4, 2023

View reviewed changes

src/core/src/core_logic/PatchInstaller.py Outdated Show resolved Hide resolved

kjohn-msft reviewed Jan 4, 2023

View reviewed changes

src/core/src/core_logic/PatchInstaller.py Outdated Show resolved Hide resolved

kjohn-msft requested changes Jan 4, 2023

View reviewed changes

GAURAVRAMRAKHYANI added 3 commits January 10, 2023 20:18

Removing import TelemetryWriter from PatchInstaller.py as telemetryWr…

a247894

…iter is already available

Removing extra spaces

9e4b43d

GAURAVRAMRAKHYANI requested review from kjohn-msft and rane-rajasi January 10, 2023 15:00

rane-rajasi reviewed Jan 11, 2023

View reviewed changes

src/core/src/core_logic/PatchAssessor.py Outdated Show resolved Hide resolved

rane-rajasi reviewed Feb 3, 2023

View reviewed changes

src/core/src/core_logic/PatchInstaller.py Outdated Show resolved Hide resolved

rane-rajasi reviewed Feb 3, 2023

View reviewed changes

src/core/tests/Test_PatchAssessor.py Outdated Show resolved Hide resolved

rane-rajasi requested changes Feb 3, 2023

View reviewed changes

Incorporated PR comments.

e404b8e

(a) Using numbered index placeholder (b) Do not explicitly throw exception if total_time_in_minutes == 0 in get_percentage_maintenance_window_used

GAURAVRAMRAKHYANI requested a review from rane-rajasi February 6, 2023 11:57

GAURAVRAMRAKHYANI added 2 commits February 7, 2023 10:22

Swallow exception if any raised during writing performance logs. We d…

a285b43

…on't want script to stop abruptly just due to some exception in writing performance logs.

(a) Do not throw exception in stopwatch. Add logs.

342afd1

(b) Add test cases

GAURAVRAMRAKHYANI mentioned this pull request Feb 10, 2023

Implement patching in batches to improve performance. #171

Merged

Merge branch 'master' into garamrak-addTimeTakenLogs

bbab8d5

rane-rajasi reviewed Feb 14, 2023

View reviewed changes

src/core/src/core_logic/PatchAssessor.py Outdated Show resolved Hide resolved

rane-rajasi requested changes Feb 14, 2023

View reviewed changes

SathishMSFT reviewed Feb 14, 2023

View reviewed changes

src/core/src/core_logic/PatchInstaller.py Show resolved Hide resolved

GAURAVRAMRAKHYANI added 2 commits February 14, 2023 15:14

Removing catch exception in PatchAssessor and PatchInstaller where st…

99298c0

…opwatch functions are called as now there are no exceptions thrown in stopwatch functions

Covering test for catch exception in the write_installer_perf_logs me…

6ca8ce0

…thod

GAURAVRAMRAKHYANI requested review from SathishMSFT and rane-rajasi February 14, 2023 10:11

SathishMSFT approved these changes Feb 14, 2023

View reviewed changes

kjohn-msft reviewed Feb 14, 2023

View reviewed changes

src/core/src/bootstrap/Constants.py Show resolved Hide resolved

kjohn-msft reviewed Feb 14, 2023

View reviewed changes

src/core/src/core_logic/PatchAssessor.py Show resolved Hide resolved

kjohn-msft reviewed Feb 14, 2023

View reviewed changes

src/core/src/core_logic/PatchInstaller.py Show resolved Hide resolved

kjohn-msft approved these changes Feb 14, 2023

View reviewed changes

rane-rajasi approved these changes Feb 15, 2023

View reviewed changes

GAURAVRAMRAKHYANI merged commit 006d225 into master Feb 15, 2023

GAURAVRAMRAKHYANI deleted the garamrak-addTimeTakenLogs branch February 15, 2023 05:31

GAURAVRAMRAKHYANI mentioned this pull request Feb 24, 2023

Update variable names and call different function to get package manager name #173

Merged

kjohn-msft mentioned this pull request Mar 14, 2023

Updating extension version to 1.6.45 #178

Merged

Add logs to get time taken by assessment job and install updates job #161

Add logs to get time taken by assessment job and install updates job #161

Uh oh!

Conversation

GAURAVRAMRAKHYANI commented Dec 15, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Dec 15, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

rane-rajasi commented Dec 30, 2022

Uh oh!

Uh oh!

GAURAVRAMRAKHYANI commented Jan 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kjohn-msft commented Jan 4, 2023

Uh oh!

kjohn-msft commented Jan 4, 2023

Uh oh!

kjohn-msft left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rane-rajasi left a comment

Choose a reason for hiding this comment

Uh oh!

kjohn-msft commented Feb 13, 2023

Uh oh!

GAURAVRAMRAKHYANI commented Feb 13, 2023

Uh oh!

Uh oh!

rane-rajasi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

SathishMSFT left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

GAURAVRAMRAKHYANI commented Dec 15, 2022 •

edited

Loading

codecov bot commented Dec 15, 2022 •

edited

Loading

GAURAVRAMRAKHYANI commented Jan 3, 2023 •

edited

Loading