This repository was archived by the owner on Jan 23, 2023. It is now read-only.

RussKeldorph

No description provided.

Member

@MattGal MattGal left a comment

LGTM, but I think we may also want to disable machine 002, as it does seem noticeably slower and is the source of many of these timeouts.

@RussKeldorph
Author

Not taking any chances. Other legs seem to have the same timeout as this.

@RussKeldorph RussKeldorph merged commit 3d0aa77 into dotnet:master Nov 13, 2018
@wfurt
Member

wfurt commented Nov 13, 2018

Note that the delay is caused by runaway processes from a previous build.
This change should not be needed on normal runs.

@RussKeldorph
Author

@wfurt Is there something special about FreeBSD that makes these runaway processes cause a problem? Can we do something about them? I'd like to think that our build/test runs should clean up after themselves as much as possible.

@wfurt
Member

wfurt commented Nov 14, 2018

There may be multiple reasons.
-1 There was a bug in cleanup in the VSTS agent we use to run builds. Since there is no official version for FreeBSD, our agent is still on an older version.
-2 We had more build breakages as the code is in flux. That may introduce more error conditions that are not properly handled.
-3 FreeBSD is a bit slower without crossgen. That again may make it more susceptible to race conditions in error handling.

As far as I know, we have had stray build processes for a while. However, on Windows you get an access denied error if there is another copy running. Such a failure is somewhat more visible and better understood.

Does that make sense?

@RussKeldorph
Author

@wfurt Those reasons are plausible but lack details that make them actionable. I'm not familiar at all with #1, but if we aren't production quality on FreeBSD we need to remove it from the official build or find some way to make it not kill the rest of the product when it fails. For #2, what code is in flux? Are you referring to the code in dotnet/coreclr or somewhere else? I might believe #3 is related to the overall timeout, but I'm skeptical race conditions are being exposed. If they are, we have a major problem that we need to fix.

Please point out additional details about "stray build processes." I'm not aware of these. They need to be fixed immediately.

@wfurt
Member

wfurt commented Nov 14, 2018

I looked at one of the machines while it was in an "idle" state between builds. top shows:

CPU: 93.5% user,  0.0% nice,  6.2% system,  0.3% interrupt,  0.0% idle
Mem: 7337M Active, 4198M Inact, 1080M Laundry, 2408M Wired, 1562M Buf, 844M Free
Swap: 1536M Total, 472M Used, 1064M Free, 30% Inuse
 
  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 8427 dditadministr   28  77    0 19744M   910M RUN     1 432:16  52.04% dotnet
81351 dditadministr   23  78    0 19745M   947M RUN     1 126.0H  45.76% dotnet
39227 dditadministr   24  78    0 19748M   919M RUN     0 263.0H  45.35% dotnet
83132 dditadministr   23  77    0 19747M   892M RUN     0 125.5H  45.21% dotnet
 6642 dditadministr   31  78    0 19748M   900M RUN     3 442:54  44.65% dotnet
17497 dditadministr   29  77    0 19749M   948M RUN     1  44.4H  44.63% dotnet
75999 dditadministr   29  78    0 19746M   897M RUN     2  78.9H  41.53% dotnet
43333 dditadministr   35  77    0 19748M   936M RUN     0  79.5H  40.72% dotnet
 7052 dditadministr   23  77    0 19744M   974M RUN     2 262.0H  38.62% dotnet

So we have dotnet processes building up over time and not dying.
As for the VSTS agent, I built and posted a FreeBSD version last week. DDIT needs to put it on the machines. As for the second item, we had quite a few builds that did not finish while we were working on the pipeline definition. We had build breaks in publishing symbols, as that is normally not done by local builds.

I strongly agree that FreeBSD should not impact other platforms. I would be happy to watch and investigate if needed, or come up with more sanity scripts. Unfortunately, I don't have access.
We could have prevented all this if we had monitored OS health more closely.
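
As a rough illustration of the kind of sanity script mentioned above, here is a minimal Python sketch that scans top-style process lines and reports dotnet processes whose accumulated CPU time exceeds a threshold. It is not taken from the actual build infrastructure: the names `parse_cpu_time` and `find_stale`, the 8-hour threshold, and the reading of the TIME column (`432:16` as minutes:seconds, `126.0H` as hours) are all assumptions made for the example. It only reports candidates and does not kill anything.

```python
from typing import Iterable, List, NamedTuple


class ProcInfo(NamedTuple):
    pid: int
    cpu_seconds: float
    command: str


def parse_cpu_time(field: str) -> float:
    """Convert a top TIME field to seconds.

    Assumption: 'MMM:SS' is minutes:seconds and 'N.NH' is hours.
    """
    if field.endswith("H"):
        return float(field[:-1]) * 3600.0
    minutes, seconds = field.split(":")
    return int(minutes) * 60.0 + int(seconds)


def find_stale(top_lines: Iterable[str],
               command: str = "dotnet",
               max_cpu_seconds: float = 8 * 3600) -> List[ProcInfo]:
    """Return processes named `command` whose CPU time exceeds the limit.

    Expects the 12-column layout shown in the top output above:
    PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
    """
    stale = []
    for line in top_lines:
        cols = line.split()
        # Skip headers, summary lines, and anything that is not a process row.
        if len(cols) < 12 or not cols[0].isdigit():
            continue
        if cols[11] != command:
            continue
        cpu_seconds = parse_cpu_time(cols[9])
        if cpu_seconds > max_cpu_seconds:
            stale.append(ProcInfo(int(cols[0]), cpu_seconds, cols[11]))
    return stale


# Two rows copied from the top output above, plus the header line.
SAMPLE = [
    "  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND",
    " 8427 dditadministr   28  77    0 19744M   910M RUN     1 432:16  52.04% dotnet",
    "81351 dditadministr   23  78    0 19745M   947M RUN     1 126.0H  45.76% dotnet",
]

if __name__ == "__main__":
    for proc in find_stale(SAMPLE):
        print(f"stale: pid={proc.pid} cpu={proc.cpu_seconds / 3600:.1f}h")
```

A check like this could run between builds and fail loudly when leftovers are found, which would make the problem as visible on FreeBSD as the access denied error makes it on Windows. CPU time is only a proxy here; elapsed time since process start would be a reasonable alternative criterion.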

@RussKeldorph RussKeldorph deleted the bsdtime branch May 1, 2019 20:51