-
Notifications
You must be signed in to change notification settings - Fork 11
Add logs to get time taken by assessment job and install updates job #161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… Added relevant details in logs like patch service used (apt, yum or zypper), machine ram, machine cpu, OS details, etc.
Codecov Report
@@ Coverage Diff @@
## master #161 +/- ##
==========================================
+ Coverage 88.52% 88.75% +0.22%
==========================================
Files 86 88 +2
Lines 12598 12881 +283
==========================================
+ Hits 11152 11432 +280
- Misses 1446 1449 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
|
A few questions and my thoughts here: Adding new log statements to determine assessment and install jobs duration, seems a straightforward method. However, we already have the issue of the overall operation being slow because of the agent's restrictions in consuming telemetry events. Even though these are just 2 logs statements, it still contributes to the reduced pace. Most of the information added in these new logs is redundant (machine info is already logged, start and endtime are added in the status file) Question:
|
Yes, the information which we are logging here are already there in different places but to get all the related information for a job, currently we need join in kusto queries which is not scalable. We are logging all the required information in one log so that we don't need join in kusto queries. We need efficient kusto queries because we are going to use kusto queries in future to calculate improvements that we are going to do in the script. |
|
Please include test code to mock and validate changed functionality - you can do this after nailing down comments in the prod code. |
To @rane-rajasi's note here (which I only saw after I wrote other comments), there's no need to write the existing telemetry event for machine_info if what's being written here happens reliably. Rajasi please review if that mitigates some of the valid concern raised. |
kjohn-msft
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments are inline in PR.
(a) Creating separate class Stopwatch (b) Renaming variable names to be consistent (c) Renaming write_machine_config_info() to set_machine_config_info() as we are not logging machine_info in this function. (d) Creating constant for strings which are used in logs as they are referred multiple places. (e) Pre calculating time taken by task and percentage of maintenance window used instead of calculating them in kusto query. (f) Adding number_of_tries (for assessment) and number_of_rounds (for patch installation) (g) Removing import TelemetryWriter as it is already available in PatchAssessor and PatchInstaller
…iter is already available
rane-rajasi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments inline
(a) Using numbered index placeholder (b) Do not explicitly throw exception if total_time_in_minutes == 0 in get_percentage_maintenance_window_used
…on't want script to stop abruptly just due to some exception in writing performance logs.
(b) Add test cases
Done code changes for Aptitude package manager and did manual testing. Code changes for Yum and Zipper is pending should be completed by next week. Adding unit tests is pending. Also, due to signature change of the method install_update_and_dependencies, there are some tests failing. I will work on those tests once we confirm the signature of the method. Using Stopwatch implementation in this PR. The implementation of Stopwatch is in the PR #161 which is going to be merged soon.
|
@GAURAVRAMRAKHYANI - can you please merge the latest from master please? There shouldn't be any conflict but it helps with the PR. @rane-rajasi please consider you're weekly planning and provide an ETA for final review. The ETA from me will be EOD business hours PST. @SathishMSFT, please review and sign off if you're satisfied with the changes. |
Merged latest master branch |
rane-rajasi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left one comment on write_assessment_perf_logs()
…opwatch functions are called as now there are no exceptions thrown in stopwatch functions
SathishMSFT
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
* Implement patching in batches to improve performance. Done code changes for Aptitude package manager and did manual testing. Code changes for Yum and Zipper is pending should be completed by next week. Adding unit tests is pending. Also, due to signature change of the method install_update_and_dependencies, there are some tests failing. I will work on those tests once we confirm the signature of the method. Using Stopwatch implementation in this PR. The implementation of Stopwatch is in the PR #161 which is going to be merged soon. * Fixed an issue in is_packages_install_time_available i * Code changes for yum and zypper package manager for parallel patching. Did manual testing and checked that packages are getting installed and the installation status of the packages are shown correctly in the portal. * For yum, add packages with different architecture in list of dependencies. * Correct function call in Yum for extract_dependencies * Following changes are done in this iteration. All the existing unit tests are succeeding now with these changes. (a) updated function name is_packages_install_time_available to is_package_install_time_available (b) Made reboot_manager mandatory in the function is_packages_install_time_available (c) Created a function install_update_and_dependencies_and_get_status * (a) Handled the scenario that if there is not enough remaining time to install packages in batch then do not stop the complete process of installing patches. The reamaining packages can be installed sequentially and there is already check for maintenance window cutoff time when installing packages sequentially so that the installation is stopped when there is not enough remaining time in maintenance window for installing package. Tested this scenario. (b) After batch patching, there is call get_remaining_packages_to_install. The time to execute get_remaining_packages_to_install should not be counted in time taken by nistalling packages in batches. Hence, stopping the stopwatch before calling get_remaining_packages_to_install and writing batch installation details in the telemetry after the call get_remaining_packages_to_install so that we also get the remaining packages to install in the telemetry. Tested this scenario. * Added test test_is_package_install_time_available * Addressed the comments. Some of the changes done are: (a) using self.execution_config.reboot_setting instead of reboot_manager.reboot_setting (b) Reordering import of modules (c) Moving add_arch_dependencies_yum funciton to package_manager and updating its name to add_arch_dependencies * (a) Added unit tests for code coverage (b) refactored the method get_dependent_list. Removed that method include_dependencies as it is not required * include_dependencies method from apt and zypper package managers Using list() instead of copy() as copy is not supported in python2.7 * updated UA_ESM_REQUIRED to UA_ESM_Required in the test test_skip_package_version_UA_ESM_REQUIRED * Added two unit tests: (a) test_dependent_package_excluded (b) test_arch_dependency_install_success * Add three new test cases: test_dependent_package_excluded_and_not_enough_time_for_batch_patching test_no_updates_to_install test_get_remaining_packages_to_install * Added 3 new test cases: test_include_dependency_apt test_include_dependency_yum test_include_dependency_zypper * Incorporated PR comments. Mostly renaming of variables and edit logs. Using package in new code instead of patch and update except for already present variables and functions. * Incorporated PR comments. Some of the changes are: (a) Updated <>_parent_update_count to <>_parent_package_install_count (b) Removed the log ("\n\nInstalling patches in sequence...") (c) Added a prefix string in the performance logs for install package, install batch and the process of install packages in batches. This will help in kusto queries. * Incorporating PR comments * Call self.package_manager.is_reboot_pending() instead of self.is_reboot_pending() after merge from master because is_reboot_pending() is moved to package_manager * Incorporate PR comments * Incorporated PR comments * The changes contains following: (a) Incorporated PR comments (b) Using all_package_versions instead of package_versions while finding version of the dependent package in the function include_dependencies in the file PatchInstaller.py. The list all_package_versions contains versions of all the packages available to install but the list package_versions contains only those packages which are selected by user. There might be dependency on the package which is not selected by user and hence all_package_versions should be used. (c) Making a list containing not attempted packages and failed packages in batch patching process. In fallback to sequential patching, this list is used as the input packages to install. The method get_remaining_packages_to_install is no longer required. But due to this change, the newly available packages in brief time while batch patching was done are not handled. This was the case before this PR also, the newly added packages while installing the packages were not handled. To make it simple, it is better to not handle that case in this big PR. It is difficult to test this scenario. * Changes in this commit: (a) Addressing PR comments (b) Adding some comments in the code.
Background: Current methods to query core execution time don’t scale well in Kusto. A single-line log can be queried better at scale across machines.
Added logs to get time taken by assessment and install updates jobs. Added relevant details in logs like patch service used (apt, yum or zypper), machine ram, machine cpu, OS details, etc.
Following are the examples of the added log for assessment and install updates job:
Assessment:
{'job': 'Assessment', 'startTime': '2022-12-15 13:40:34.314455', 'endTime': '2022-12-15 13:40:41.887585', 'patchService': 'AptitudePackageManager', 'machineDetails': {'platform_name': 'Ubuntu', 'platform_version': '20.04', 'machine_cpu': 'Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz', 'machine_arch': 'x86_64', 'disk_type': 'Hard drive'}} [TC=105]
Install Updates:
{'job': 'Install Updates', 'startTime': '2022-12-15 12:36:18.153067', 'endTime': '2022-12-15 12:56:34.743747', 'installed_update_count': '13', 'update_run_successful': 'True', 'maintenance_window': '3:55:00', 'maintenance_window_exceeded': 'False', 'patchService': 'AptitudePackageManager', 'machineDetails': {'platform_name': 'Ubuntu', 'platform_version': '20.04', 'machine_cpu': 'Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz', 'machine_arch': 'x86_64', 'disk_type': 'Hard drive'}} [TC=1216]
Manual testings performed:
Performed Check for updates and One-time update and checked that the new logs added are present in the logs file.