pub run test on 1.14.0-dev.1 throws too many files open errors #373
|
Do you have an estimate of how many files are transitively reachable from each test? |
Do you mean dart files? i.e. based on what code is pulled in to run the tests? |
Yes. |
I have no idea how to estimate that. One thing I did notice is that I was running in one shell (inside IntelliJ) both reported the same value with Is it possible some files are not being closed so if I run repeatedly it
|
BTW I can confirm that a simple workaround is just to run So something is definitely leaving files open. Pretty sure that is not on my side as I didn't get these problems till sdk 1.14.0 |
How many Dart files are in your
Ping @whesse, since as I recall he was looking at similar issues on the test bots. Could there have been some change in the VM or @Andersmholmgren: What OS is this? |
Mac - OS X 10.11.1 My tests only import what they need but for the higher level integration
|
The default number of open files per process is limited to 256 on Mac OS X, and the way of raising it changed in OS X 10.10. The limit is also set differently for programs started from the command line (ulimit) than for programs started from the GUI. This discussion covers the system limit (which is probably not the problem) and the shell limit, and using ulimit to fix the shell limit: http://superuser.com/questions/433746/is-there-a-fix-for-the-too-many-open-files-in-system-error-on-os-x-10-7-1 This is the best article about controlling the open file limit on processes started from the launcher, on Yosemite: http://docs.basho.com/riak/latest/ops/tuning/open-files-limit/#Mac-OS-X But we should check if there was a change between versions 13 and 14 that unintentionally increased the file usage somehow. |
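For readers who want to check the per-process limit described above from inside a program rather than with `ulimit -n`, here is a minimal Python sketch. It is not part of the thread; the helper names `open_file_limits` and `describe` are made up for illustration, and it assumes a POSIX system where the standard `resource` module is available.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, treating RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = open_file_limits()
# The soft limit is the value that `ulimit -n` reports in a shell.
print(f"soft limit: {describe(soft)}, hard limit: {describe(hard)}")
```

The soft limit is the one that triggers "too many open files"; a process may raise it up to the hard limit without extra privileges.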
There were a fair number of pub changes between those versions, but nothing that I'd expect to produce more long-lasting file descriptors. |
Thanks for the pointers on how open file handling has changed on the Mac. Not sure it explains why they are held on to by the shell though (i.e. why
|
@whesse Is it possible that there was a change to the isolate infrastructure that somehow causes it to leak file descriptors even after the process is closed? |
I just got this error as well. It is not consistent, though. I got it when I ran a large number of tests repeatedly with only a second or two in between runs. |
@stevenroose What platform? |
@nex3 VM on Linux |
Now I get it rather consistently, without making significant changes. It's in pointycastle, but since I did not get it last week, it must be something system-specific. (I did move to Arch Linux with an updated SDK version..) |
I have
My tests consist of a large number of files, though. I have 106 occurrences of Apart from |
Just got this same issue as well.
I also noticed that changing concurrency to 1 (
|
I just ran into this for the dart-protobuf tests, which I hadn't run on my Mac in a while. They consistently work in 1.13.2 and fail using 1.14 and 1.14.2. I tried an older version of the test package and it didn't help. For me, "ulimit -n" returns 256. It should be easy to reproduce:
First error is:
|
I can repro this consistently on my mac when running tests for most packages. The only workaround I've found is to run pub run test with |
@munificent if you get a chance, can you see if this reproduces with the 1.13 SDK? |
We've run into this issue as well when updating from Dart 1.13 to 1.15. What I noticed is that the default number of concurrently running tests is different for different install methods. On OSX installed via brew On Ubuntu installed from the Dart Repos via apt-get gives me: On a Codeship instance installed via Download of the ZIP-Package gives me: This led to tests running fine locally and triggering |
@queltos it's not the install method. It's the hardware. The default maps to the number of cores / 2. 😄 |
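To make the "cores / 2" rule concrete, here is a small Python sketch of the arithmetic. It is an illustration only and not the test runner's code: the function name `default_concurrency`, the rounding down, and the floor of 1 are assumptions.

```python
import os


def default_concurrency(cores=None):
    """Approximate the default described in the thread: number of cores / 2.

    Rounding down and never going below 1 are assumptions for this sketch.
    """
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, cores // 2)


print(default_concurrency())   # depends on the machine running this
print(default_concurrency(8))  # a hypothetical 8-core machine
```

This would explain why the same test suite can pass on a small CI instance and fail on a developer machine with more cores: more tests load at once, so more files are open at the same moment.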
I'm still seeing this on Mac OSX Sierra with 'Dart VM version: 1.19.1 (Wed Sep 7 08:59:17 2016) on "macos_x64"'. Is there a fix planned? |
PS - I have an Intel i5, 2 cores |
Our CPTO (Chief Package-Test Officer) – @nex3 – has been on vacation. I'm sure she'll look at it when she's back. |
There's not a lot I can do here. We're not doing anything out of the ordinary; my best guess is that there's a bug (possibly related to dart-lang/sdk#12617) in Dart's isolate infrastructure. As a workaround, you can pass Since there's not really any action we can take to resolve this in |
Sorry, but I'm re-opening. As far as I know, this affects all Mac users. If there are underlying VM or Asking all Mac users to work around it doesn't seem like a usable solution to me. Many aren't familiar with configuring their If I run pub's tests on my Mac, I get the errors almost immediately, not later in the run. So my hunch is that it's not leaking descriptors, but that it's opening a very large number of them in one fast burst right at startup. Can we add a resource pool to the loading process to see if that helps? |
A resource pool isn't going to help much because the bulk of the resources are being consumed by the VM, not by the test runner directly—you can tell because the issue goes away when you use I'll leave this open, but mark it blocked. |
In case it's useful, I am pretty sure I just ran into this in my for funsies project: https://github.com/eseidel/lolsim/tree/c9f0beea004a11e1657a2fe38aa6fdef1637a027 Running
I'm using Maybe errors in isolates cause imported files to leak until the whole |
Sorry, my previous comment seems to be incorrect. Although having the errors did seem to increase the frequency, I am now getting these too many open files errors even having fixed the NPEs in my package. I think they may relate to having recently added a 4th test file to my package? They seem intermittent. |
I was able to repro on my work machine by lowering the file limit:
Dart VM version: 1.21.0 (Wed Dec 7 06:44:15 2016) on "macos_x64" |
It may be possible to repro this on Linux using ulimit. I'm not sure how to set per-process resource limits on Mac OS X, hence the kernel-level limit in my instructions above. On stock Mac OS X (not Google's pre-configured variant) I hit this w/o needing any special configuration instructions. |
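The idea of reproducing the failure by lowering the limit can also be tried within a single process. The Python sketch below is not from the thread: it lowers the soft limit with `resource.setrlimit`, opens `os.devnull` until the OS refuses, and then restores the old limit. The function name `open_until_failure` and the limit of 64 are arbitrary choices for illustration, and behaviour on platforms other than Linux and Mac OS X is untested.

```python
import errno
import os
import resource


def open_until_failure(soft_limit=64):
    """Lower the soft fd limit, open files until the OS refuses, then restore.

    Returns (number of files opened, errno of the failure or None).
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if old_hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, old_hard)  # cannot exceed the hard limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
    handles = []
    failure = None
    try:
        # At most soft_limit descriptors can exist, so this loop must fail.
        for _ in range(soft_limit + 1):
            handles.append(open(os.devnull, "rb"))
    except OSError as e:
        failure = e.errno
    finally:
        for h in handles:
            h.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))
    return len(handles), failure


count, failure = open_until_failure()
print(count, errno.errorcode.get(failure))
```

The count that comes back is lower than the limit because descriptors already open in the process (standard streams and so on) count against it too.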
I should also note, I was actually running the quiver in |
On my Linux box following @eseidel's instructions for quiver, this gets up to ~1800 open files even before hitting the failure from dart-lang/sdk#28287. (The number goes back down as loading completes and the files are properly closed. There is no evidence of a leak of file descriptors.) My preliminary thought is that the VM's loader needs to respond to back-pressure that comes from the OS in the form of this error. I'll continue to investigate on bleeding-edge once dart-lang/sdk#28287 is addressed. |
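One way to picture "responding to back-pressure" is to treat the too-many-open-files error as a signal to wait and retry rather than as a fatal failure. The Python sketch below shows that general idea only; it is not how the Dart VM's loader works. The function `open_with_backpressure`, its parameters, and the exponential delay are all assumptions made for illustration.

```python
import errno
import time


def open_with_backpressure(opener, retries=5, delay=0.01):
    """Call opener(); on EMFILE/ENFILE wait and retry, doubling the delay.

    `opener` is any zero-argument callable that opens a resource.
    Other errors, or running out of retries, re-raise the exception.
    """
    for attempt in range(retries + 1):
        try:
            return opener()
        except OSError as e:
            out_of_fds = e.errno in (errno.EMFILE, errno.ENFILE)
            if not out_of_fds or attempt == retries:
                raise
            # Give other in-flight work time to close its files.
            time.sleep(delay * (2 ** attempt))


# Usage: wrap the actual open call in a lambda.
with open_with_backpressure(lambda: open(__file__ if "__file__" in globals() else "/dev/null", "rb")) as f:
    f.read(1)
```

Retrying only helps when other work is about to release descriptors, which matches the observation above that the count drops again once loading completes.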
The simplest solution would be to rate limit the loader to only have ~16 outstanding file i/o requests at any given time. The loader is shared across all isolates so the rate limiter will work for any number of isolates. |
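The rate-limiting idea above, capping outstanding file I/O at about 16 requests no matter how many callers there are, can be sketched with a semaphore. This Python version is an illustration of the technique, not the VM change itself; `LimitedReader`, `read_all`, and the worker count are hypothetical names and values.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_OUTSTANDING = 16  # the figure suggested in the thread


class LimitedReader:
    """Reads files while allowing at most `limit` reads in flight at once."""

    def __init__(self, limit=MAX_OUTSTANDING):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest number of simultaneous reads observed

    def read(self, path):
        with self._slots:  # blocks while `limit` reads are already running
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                with open(path, "rb") as f:
                    return f.read()
            finally:
                with self._lock:
                    self._active -= 1


def read_all(paths, reader, workers=64):
    """Read every path using many workers that share one limiter."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(reader.read, paths))


reader = LimitedReader()
data = read_all([os.devnull] * 200, reader)
print(len(data), reader.peak)
```

Because every worker shares the same `LimitedReader`, the cap holds regardless of how many workers exist, which mirrors the point that a limiter in a shared loader covers any number of isolates.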
The correct long term solution would be to rewrite the dart:io implementation on MacOS to have some form of rate limiting / back pressure built in so that Dart programs aren't exposed to weird platform specific issues like this. |
The loader doesn't manually do much file IO. Mostly it just creates isolates, which don't give it information or control over how many files they touch and when. |
@nex3 I think @johnmccutchan was talking about the VM's loader rather than pub's or some other loader. |
Oh, that makes sense. |
Let's just track it here rather than filing a new bug against the VM. I'll take a look. |
After the above change, I'm only seeing 300-400 files open at a time, down from 1800. With -j4 it's now ~100. This sounds like a lot to me, but maybe it's expected for pub. In any case, the loader is no longer exploding the number as it can now have no more than 16 requests going at a time. |
@eseidelGoogle Can you confirm whether that fixed your issue? @zanderso @johnmccutchan Thanks for tracking this down! I wouldn't expect even 100 files to be open at once from the |
I hope we're going to have the cycles to work on improving the diagnostic information that we can report from dart:io about file handles, etc. As things stand now, I don't think we can spend cycles tracking these down by hand without some additional information (like the error messages above). If this solves @eseidel's problem, I'd suggest closing this issue. For right now, I'm going to remove my assignment, and mark as needs-info. |
Very nice of you to fix @zanderso, thank you! Was there any diagnostic-info patch worth saving for future investigations from your debugging this afternoon? I'll be sure to verify that this fixes the occurrences I saw as soon as the next release is tagged with this fix. |
https://github.com/dart-lang/sdk/releases/tag/1.22.0-dev.7.0 includes this fix, just waiting for the release to appear in homebrew. |
The quick hacks that I did yesterday were pretty posix-specific, and wouldn't be useful for general debugging. What you'd really like to know is what lines of your Dart code are responsible for the open files, sockets, handles, etc. That's something I think we should surface in Observatory, but there's no quick way to add it. |
I can confirm that I'm able to run |
huzzah, |
From @Andersmholmgren on December 6, 2015 6:3
If I switch back to 1.13.0-dev.7.12 the problem goes away
Copied from dart-lang/pub#1366, which was copied from dart-lang/sdk#25123