JunTaoLuo
Contributor

Applying recommendations from offline discussions to improve the success rate of the Helix matrix ARM queues.

@JunTaoLuo JunTaoLuo added the area-infrastructure Includes: MSBuild projects/targets, build scripts, CI, Installers and shared framework label Feb 2, 2021
@JunTaoLuo JunTaoLuo requested a review from a team as a code owner February 2, 2021 00:32
@JunTaoLuo JunTaoLuo marked this pull request as draft February 2, 2021 00:33
@JunTaoLuo JunTaoLuo marked this pull request as ready for review February 2, 2021 18:41
@JunTaoLuo
Contributor Author

FYI @dougbu @wtgodbe: although I haven't seen any more Helix ARM queue failures caused by the install scripts hanging, I can't point to this PR as the fix, since those failures appear to have been fairly rare since the end of last week. However, I still think the changes here are valuable and don't incur any penalties in terms of risk or new failure modes.

Contributor

@dougbu dougbu left a comment

I disagree that this is helpful, because it focuses on the timeout and does not:

  1. Eliminate tools.ps1 from the stack of bits involved in the failure
  2. Handle the case where the install-dotnet.ps1 script is successful i.e. spits out Installation finished before something hangs

@JunTaoLuo
Contributor Author

@dougbu I'm getting some mixed messages here.

> Eliminate tools.ps1 from the stack of bits involved in the failure

My understanding of the suggestion in the email was to run the script in a separate process, which was already done before this PR and is still done in it. The improvement here is to kill the process if it runs for too long (more than 120 seconds in this case). Having read your email response, I'm now confused: do we want to eliminate the script completely by duplicating the functionality of InstallDotNet in our own script (runtests.ps1)?

> Handle the case where the install-dotnet.ps1 script is successful i.e. spits out Installation finished before something hangs

That's what this PR does. It doesn't need to read command line output because if the invocation times out for any reason at all, it will be stopped and retried.
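
For illustration only, the kill-and-retry behavior described above can be sketched as follows. This is a Python sketch of the general technique, not the PowerShell change in this PR. The function name `run_with_timeout_and_retry` and the `max_attempts` default are hypothetical; only the 120-second timeout comes from the discussion.

```python
import subprocess
import sys


def run_with_timeout_and_retry(cmd, timeout_seconds=120, max_attempts=3):
    """Run cmd in a separate process, killing and retrying it on timeout.

    Returns the exit code of the first attempt that finishes in time.
    Raises TimeoutError if every attempt exceeds the timeout.
    max_attempts=3 is an assumed value, not taken from the PR.
    """
    for attempt in range(1, max_attempts + 1):
        proc = subprocess.Popen(cmd)
        try:
            # The command's output is never inspected: any hang, for any
            # reason, shows up as a timeout here.
            return proc.wait(timeout=timeout_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()  # reap the killed process before retrying
            print(
                f"attempt {attempt} timed out after {timeout_seconds}s",
                file=sys.stderr,
            )
    raise TimeoutError(f"{cmd!r} timed out on all {max_attempts} attempts")
```

Note that `proc.kill()` stops only the direct child process. If the child has spawned further processes that are the ones hanging, they would need separate handling.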

Contributor

@dougbu dougbu left a comment

Discussed offline. This does help immediately…

@JunTaoLuo
Contributor Author

Follow-up issue filed: #29888
