Add retries to helix installer commands #29842
FYI @dougbu @wtgodbe: although I haven't seen any more helix arm queue failures due to the install scripts hanging, I can't point to this PR as the fix, since those failures appear to have been somewhat rare since the end of last week. However, I still think the changes here are valuable and don't incur any penalties in terms of risk or new failure modes.
I disagree this is helpful because it focuses on the timeout and does not:
- Eliminate tools.ps1 from the stack of bits involved in the failure
- Handle the case where the install-dotnet.ps1 script is successful, i.e. spits out "Installation finished" before something hangs
@dougbu I'm getting some mixed messages here.
My understanding of the suggestion in the email was to run the script in a separate process, which was done before and is still done in this PR. The improvement here is to kill the process if it runs too long (more than 120 seconds in this case). I read your email response and now I'm confused: do we want to eliminate the script completely by duplicating the functionality of
That's what this PR does. It doesn't need to read command line output because if the invocation times out for any reason at all, it will be stopped and retried.
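For readers following along, the timeout-and-retry idea described above can be sketched roughly as follows. This is an illustration only, written in Python rather than the PR's actual code; the function name `run_with_retries`, the `max_attempts` parameter, and its default of 3 are assumptions made for the example. Only the 120-second limit comes from the discussion.

```python
import subprocess
import sys


def run_with_retries(cmd, timeout_seconds=120, max_attempts=3):
    """Run cmd in a separate process, stopping and retrying it on timeout.

    Hypothetical helper: the name and max_attempts are illustrative
    assumptions, not taken from the PR.
    Returns the 1-based number of the attempt that succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child process and raises
            # TimeoutExpired if it runs longer than timeout_seconds.
            result = subprocess.run(cmd, timeout=timeout_seconds)
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt}: timed out after {timeout_seconds}s",
                  file=sys.stderr)
            continue
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt}: exit code {result.returncode}",
              file=sys.stderr)
    raise RuntimeError(f"command failed after {max_attempts} attempts")
```

Note that this sketch never inspects the command's output: a hang at any point, including after the script has printed its success message, simply trips the timeout and triggers another attempt.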
Discussed offline. This does help immediately…
Follow-up issue filed: #29888
Applies recommendations from offline discussions to improve the helix matrix ARM queue success rate.