Skip to content

pub run test on 1.14.0-dev.1 throws too many files open errors #373

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
nex3 opened this issue Dec 10, 2015 · 49 comments
Closed

pub run test on 1.14.0-dev.1 throws too many files open errors #373

nex3 opened this issue Dec 10, 2015 · 49 comments
Labels
needs-info Additional information needed from the issue author status-blocked Blocked from making progress by another (referenced) issue type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

@nex3
Copy link
Member

nex3 commented Dec 10, 2015

From @Andersmholmgren on December 6, 2015 6:3

00:00 +0 ~7 -7: loading test/client/entity/take2/user_entity_test.dart
  Failed to load "test/client/entity/take2/user_entity_test.dart":
  Unable to spawn isolate: "package:collection/equality.dart": Error loading file:///Users/blah/dart/backlogio_acdart/gitbacklog_root/gissue_root/gissue_client/packages/collection/equality.dart:
    Cannot open file, path = '/Users/blah/dart/backlogio_acdart/gitbacklog_root/gissue_root/gissue_client/packages/collection/equality.dart' (OS Error: Too many open files, errno = 24)
  #0      _asyncLoadErrorCallback (dart:_builtin:147)
  #1      _asyncLoadError (dart:_builtin:559)
  #2      _handleLoaderReply (dart:_builtin:362)
  #3      _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:148)
00:00 +0 ~7 -7: loading test/client/entity/take2/backlog_entity_test.dart

If I switch back to 1.13.0-dev.7.12 the problem goes away

Copied from dart-lang/pub#1366, which was copied from dart-lang/sdk#25123

@nex3 nex3 added the type-bug Incorrect behavior (everything from a crash to more subtle misbehavior) label Dec 10, 2015
@nex3 nex3 added the needs-info Additional information needed from the issue author label Dec 10, 2015
@Andersmholmgren
Copy link

ulimit -n
2560

@nex3
Copy link
Member Author

nex3 commented Dec 11, 2015

Do you have an estimate of how many files are transitively reachable from each test?

@Andersmholmgren
Copy link

Do you mean dart files? i.e. based on what code is pulled in to run the tests?

@nex3
Copy link
Member Author

nex3 commented Jan 8, 2016

Yes.

@Andersmholmgren
Copy link

I have no idea how to estimate that.

One thing I did notice is that I was running in one shell (inside intellij)
and was getting this problem. I switched to an external terminal and it
didn't get the problem.

both reported the same value with ulimit -n and both were running the
same version of dart.

Is it possible some files are not being closed so if I run repeatedly it
eventually exhausts the file handles?

On Sat, 9 Jan 2016 at 10:14 Natalie Weizenbaum [email protected]
wrote:

Yes.


Reply to this email directly or view it on GitHub
#373 (comment).

@Andersmholmgren
Copy link

BTW I can confirm that a simple workaround is just to run exec bash then it goes away for a while.

So something is definitely leaving files open. Pretty sure that is not on my side as I didn't get these problems till sdk 1.14.0

@nex3
Copy link
Member Author

nex3 commented Jan 11, 2016

I have no idea how to estimate that.

How many Dart files are in your lib/ directory? How many are in your packages/ directory? Do your tests import your entire app, or only a subset?

So something is definitely leaving files open. Pretty sure that is not on my side as I didn't get these problems till sdk 1.14.0

Ping @whesse, since as I recall he was looking at similar issues on the test bots. Could there have been some change in the VM or dart:io in 1.14 that causes processes to leave dangling file descriptor?

@Andersmholmgren: What OS is this?

@Andersmholmgren
Copy link

Mac - OS X 10.11.1

My tests only import what they need but for the higher level integration
style tests that will be a fair bit of the code base

On Tue, 12 Jan 2016 at 07:18 Natalie Weizenbaum [email protected]
wrote:

I have no idea how to estimate that.

How many Dart files are in your lib/ directory? How many are in your
packages/ directory? Do your tests import your entire app, or only a
subset?

So something is definitely leaving files open. Pretty sure that is not on
my side as I didn't get these problems till sdk 1.14.0

Ping @whesse https://github.com/whesse, since as I recall he was
looking at similar issues on the test bots. Could there have been some
change in the VM or dart:io in 1.14 that causes processes to leave
dangling file descriptor?

@Andersmholmgren https://github.com/Andersmholmgren: What OS is this?


Reply to this email directly or view it on GitHub
#373 (comment).

@whesse
Copy link
Contributor

whesse commented Jan 12, 2016

The default number of open files per process is limited to 256 on mac OS X, and the way of changing it to a higher number has changed in OS 10.10. The limit is also set in a different way, in programs started from the command line (ulimit) to programs started from the GUI.

This discussion talks about the system limit (which is probably not the problem) and the shell limit, and using ulimit to fix the shell limit: http://superuser.com/questions/433746/is-there-a-fix-for-the-too-many-open-files-in-system-error-on-os-x-10-7-1

This is the best article about controlling the open file limit on processes started from the launcher, on Yosemite: http://docs.basho.com/riak/latest/ops/tuning/open-files-limit/#Mac-OS-X

But we should check if there was a change between versions 13 and 14 that unintentionally increased the file usage somehow.

@nex3
Copy link
Member Author

nex3 commented Jan 12, 2016

There were a fair number of pub changes between those versions, but nothing that I'd expect to produce more long-lasting file descriptors.

@Andersmholmgren
Copy link

Thanks for the pointers on how open file handling has changed on the mac.
I'll take a look into it.

Not sure it explains why they are held on to by the shell though (i.e. why
typing exec bash makes it work for a while)

On Wed, 13 Jan 2016 at 06:42 Natalie Weizenbaum [email protected]
wrote:

There were a fair number of pub changes between those versions, but
nothing that I'd expect to produce more long-lasting file descriptors.


Reply to this email directly or view it on GitHub
#373 (comment).

@nex3
Copy link
Member Author

nex3 commented Jan 12, 2016

@whesse Is it possible that there was a change to the isolate infrastructure that somehow causes it to leak file descriptors even after the process is closed?

@stevenroose
Copy link

I just got this error as well. It is not consistent, though. I got it when I ran a large number of tests repeatedly with only a second or two in between runs.

@nex3
Copy link
Member Author

nex3 commented Feb 1, 2016

@stevenroose What platform?

@stevenroose
Copy link

@nex3 VM on Linux

@stevenroose
Copy link

Now I get it rather consistently, without making significant changes. It's in pointycastle, but since I did not get it last week, it must be something system-specific.

(I did move to Arch Linux with an updated SDK version..)
Dart VM version: 1.14.0 (Thu Jan 28 09:21:50 2016) on "linux_x64"

@stevenroose
Copy link

I have

ulimit -n
1024

But my tests consist of a large number of files, though. I have 106 occurrences of library [a-z0-9_.]+; in my lib/ folder and 28 occurrences of part of [a-z0-9_.]+;. Which makes an estimate of 134 files.

Apart from reflectable, the dependencies are rather minimal.

@pulyaevskiy
Copy link

Just got this same issue as well.

ulimit -n on my OS X 10.11 was set to 256, when I changed it to 1024 the error disappeared.

I also noticed that changing concurrency to 1 (pub run test -j 1) helped with previous ulimit. The exception I was getting:

Unable to spawn isolate: "package:collection/src/utils.dart": Error loading file:///Users/anatoly/Projects/myproject/packages/collection/src/utils.dart:
    Cannot open file, path = '/Users/anatoly/Projects/myproject/packages/collection/src/utils.dart' (OS Error: Too many open files, errno = 24)

@skybrian
Copy link

I just ran into this for the dart-protobuf tests, which I hadn't run on my Mac in a while. They consistently work in 1.13.2 and fail using 1.14 and 1.14.2. I tried an older version of the test package and it didn't help. For me, "ulimit -n" returns 256.

It should be easy to reproduce:

git clone [email protected]:dart-lang/dart-protobuf.git
cd dart-protobuf
pub get
pub run test

First error is:

Failed to load "test/json_test.dart":
  Unable to spawn isolate: "package:stack_trace/src/frame.dart": Error loading file:///Users/skybrian/tmp/dart-protobuf/packages/stack_trace/src/frame.dart:
    Cannot open file, path = '/Users/skybrian/tmp/dart-protobuf/packages/stack_trace/src/frame.dart' (OS Error: Too many open files, errno = 24)
  #0      _asyncLoadErrorCallback (dart:_builtin:155)
  #1      _asyncLoadError (dart:_builtin:566)
  #2      _handleLoaderReply (dart:_builtin:383)
  #3      _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:148)

@munificent
Copy link
Member

I can repro this consistently on my mac when running tests for most packages. The only workaround I've found is to run pub run test with -j 1, which makes things painfully slow.

@nex3
Copy link
Member Author

nex3 commented Mar 12, 2016

@munificent if you get a chance, can you see if this reproduces with the 1.13 SDK?

@queltos
Copy link

queltos commented Apr 8, 2016

We've run into this issue as well when updating from Dart 1.13 to 1.15. What I noticed is that the default of concurrent running tests is different for different install methods.

On OSX installed via brew pub run test --help gives me
-j, --concurrency=<threads> The number of concurrent test suites run. (defaults to "4")

On Ubuntu installed from the Dart Repos via apt-get gives me:
-j, --concurrency=<threads> The number of concurrent test suites run. (defaults to "2")

On a Codeship instance installed via Download of the ZIP-Package gives me:
-j, --concurrency=<threads> The number of concurrent test suites run. (defaults to "16")

This lead to tests running fine locally and triggering Too many open files on the Codeship instance.

@kevmoo
Copy link
Member

kevmoo commented Apr 8, 2016

@queltos it's not the install method. It's the hardware. The default maps to the number of cores / 2. 😄

@sanfordredlich
Copy link

I'm still seeing this on Mac OSX Sierra with 'Dart VM version: 1.19.1 (Wed Sep 7 08:59:17 2016) on "macos_x64"'. Is there a fix planned?

@sanfordredlich
Copy link

PS - I have an Intel i5, 2 cores

@kevmoo
Copy link
Member

kevmoo commented Sep 26, 2016

Our CPTO (Chief Package-Test Officer) – @nex3 – has been on vacation. I'm sure she'll look at it when she's back.

@nex3
Copy link
Member Author

nex3 commented Sep 28, 2016

There's not a lot I can do here. We're not doing anything out of the ordinary; my best guess is that there's a bug (possibly related to dart-lang/sdk#12617) in Dart's isolate infrastructure. As a workaround, you can pass -j 1 to avoid any concurrency, and possibly increase your file descriptior maximum.

Since there's not really any action we can take to resolve this in test, I'm going to close out the issue. I'll also try to bring it to the core library team's attention.

@munificent munificent reopened this Sep 28, 2016
@munificent
Copy link
Member

Sorry, but I'm re-opening.

As far as I know, this affects all Mac users. If there are underlying VM or dart:io issues that need to get fixed, we should escalate them. If that doesn't happen, we should look into working around it in the test package.

Asking all Mac users to workaround it doesn't seem like a usable solution to me. Many aren't familiar with configuring their ulimit and it's particularly difficult to tweak that correctly now in the latest OS X.

If I run pub's tests on my Mac, I get the errors almost immediately, not later in the run. So my hunch is that it's not leaking descriptors, but that's opening a very large number of them in one fast burst right at startup. Can we add a resource pool to the loading process to see if that helps?

@nex3
Copy link
Member Author

nex3 commented Sep 28, 2016

A resource pool isn't going to help much because the bulk of the resources are being consumed by the VM, not by the test runner directly—you can tell because the issue goes away when you use -j 1, which only affects isolate loading and running. I do intend to escalate this, as I mentioned in my last comment.

I'll leave this open, but mark it blocked.

@nex3 nex3 added status-blocked Blocked from making progress by another (referenced) issue and removed needs-info Additional information needed from the issue author labels Sep 28, 2016
@eseidel
Copy link

eseidel commented Jan 11, 2017

In case it's useful, I am pretty sure I just ran into this in my for funsies project: https://github.com/eseidel/lolsim/tree/c9f0beea004a11e1657a2fe38aa6fdef1637a027 Running pub run test -j1 I can see that I have an error in my code. But running without the -j1 I (at least most of the time) get meaningless too many open files errors:

00:00 +0 -1: loading test/masteries_test.dart                                                      
  Failed to load "test/masteries_test.dart":
  Unable to spawn isolate: Unhandled exception:
  Could not import "package:path/src/context.dart" from "package:path/path.dart": Cannot open file, path = '/Users/eseidel/.pub-cache/hosted/pub.dartlang.org/path-1.4.1/lib/src/context.dart' (OS Error: Too many open files, errno = 24)
  null

I'm using Dart VM version: 1.21.0 (Wed Dec 7 06:44:15 2016) on "macos_x64" and test-0.12.18+1. I'm on an old macbook air (MacBookAir4,2) with 2 cores, one processor, running 10.12.2.

Maybe errors in isolates cause imported files to leak until the whole dart process shuts down?

@eseidel
Copy link

eseidel commented Jan 11, 2017

Sorry, my previous comment seems to be incorrect. Although having the errors did seem to increase the frequency, I am now getting these too many open files errors even having fixed the NPEs in my package. I think they may relate to having recently added a 4th test file to my package? They seem intermittent.

@eseidelGoogle
Copy link

eseidelGoogle commented Jan 11, 2017

I was able to repro on my work machine by lowering the file limit:
Testing with quiver.

git clone https://github.com/google/quiver-dart
cd quiver-dart
pub get # Do this first, may hang forever with the lower limit
sudo sysctl kern.maxfiles=3000 # I'm not sure what the default limit is, my work machine seems to have it artificially inflated.
pub run test
00:00 +0 -1: loading test/cache/map_cache_test.dart                                                
  Failed to load "test/cache/map_cache_test.dart":
  Unable to spawn isolate: Unhandled exception:
  Could not import "package:quiver/collection.dart" from "package:quiver/cache.dart": Cannot open file, path = '/src/dart_sdk/sdk/third_party/pkg/quiver/lib/collection.dart' (OS Error: Too many open files in system, errno = 23)
  null
00:00 +0 -2: loading test/async/create_timer_test.dart                                             
  Failed to load "test/async/create_timer_test.dart":
  Unable to spawn isolate: Unhandled exception:
  Could not import "package:async/src/async_memoizer.dart" from "package:async/async.dart": Cannot open file, path = '/Users/eseidel/.pub-cache/hosted/pub.dartlang.org/async-1.11.3/lib/src/async_memoizer.dart' (OS Error: Too many open files in system, errno = 23)
  null

Dart VM version: 1.21.0 (Wed Dec 7 06:44:15 2016) on "macos_x64"

@eseidelGoogle
Copy link

It may be possible to repro this on linux using ulimit. I'm not sure how to set per-process resource limits on Mac OS X, hence the kernel-level limit in my instructions above. On stock mac os x (not Google's pre-configured variant) I hit this w/o needing any special configuration instructions.

@eseidelGoogle
Copy link

I should also note, I was actually running the quiver in dart_sdk/sdk/third_party/pkg/quiver since that was already on my machine, and I have re-confirmed (after restarting) that with a maxfiles limit of 16k, I don't hit the issue.

@zanderso
Copy link
Member

On my Linux box following @eseidel 's instruction for quiver, this gets up to ~1800 open files even before hitting the failure from dart-lang/sdk#28287. (The number goes back down as loading completes and the files are properly closed. There is no evidence of a leak of file descriptors.)

My preliminary thought is that the VM's loader needs to respond to back-pressure that comes from the OS in the form of this error. I'll continue to investigate on bleeding-edge once dart-lang/sdk#28287 is addressed.

@johnmccutchan @turnidge

@johnmccutchan
Copy link

The simplest solution would be to rate limit the loader to only have ~16 outstanding file i/o requests at any given time. The loader is shared across all isolates so the rate limiter will work for any number of isolates.

@johnmccutchan
Copy link

The correct long term solution would be to rewrite the dart:io implementation on MacOS to have some form of rate limiting / back pressure built in so that Dart programs aren't exposed to weird platform specific issues like this.

@nex3
Copy link
Member Author

nex3 commented Jan 11, 2017

The loader doesn't manually do much file IO. Mostly it just creates isolates, which it don't give it information or control over how many files they touch and when.

@zanderso
Copy link
Member

@nex3 I think @johnmccutchan was talking about the VM's loader rather than pub's or some other loader.

@nex3
Copy link
Member Author

nex3 commented Jan 11, 2017

Oh, that makes sense.

@zanderso
Copy link
Member

Let's just track it here rather than filing a new bug against the VM. I'll take a look.

@zanderso
Copy link
Member

After the above change, I'm only seeing 300-400 files open at a time, down from 1800. With -j4 it's now ~100. This sounds like a lot to me, but maybe it's expected for pub. In any case, the loader is no longer exploding the number as it can now have no more than 16 requests going at a time.

@nex3
Copy link
Member Author

nex3 commented Jan 11, 2017

@eseidelGoogle Can you confirm whether that fixed your issue?

@zanderso @johnmccutchan Thanks for tracking this down! I wouldn't expect even 100 files to be open at once from the test code itself, though. We only ever open the configuration file (if one exists) and the *_test.dart files, although the latter are loaded synchronously so they shouldn't use more than one descriptor at a time. Everything else would be coming from the VM and/or the Isolate API.

@zanderso zanderso added the needs-info Additional information needed from the issue author label Jan 12, 2017
@zanderso zanderso removed their assignment Jan 12, 2017
@zanderso
Copy link
Member

I hope we're going to have the cycles to work on improving the diagnostic information that we can report from dart:io about file handles, etc.. As things stand now, I don't think we can spend cycles tracking these down by hand without some additional information (like the error messages above). If this solves @eseidel's problem, I'd suggest closing this issue. For right now, I'm going to remove my assignment, and mark as needs-info.

@eseidelGoogle
Copy link

Very nice of you to fix @zanderso, thank you! Was there any diagnostic-info patch worth saving for future investigations from your debugging this afternoon?

I'll be sure to verify that this fixes the occurrences I saw as soon as the next release is tagged with this fix.

@eseidelGoogle
Copy link

https://github.com/dart-lang/sdk/releases/tag/1.22.0-dev.7.0 includes this fix, just waiting for the release to appear in homebrew.

@zanderso
Copy link
Member

The quick hacks that I did yesterday were pretty posix-specific, and wouldn't be useful for general debugging--what you'd really like to know is what lines of your Dart code are responsible for the open files, sockets, handles, etc.. That's something I think we should surface in Observatory, but there's no quick way to add it.

@munificent
Copy link
Member

I can confirm that I'm able to run pub run test on pub's own tests on my Mac without running out of file descriptors like I was before. (I'm getting lots of failures because of dart-lang/sdk#25342, though.)

@eseidel
Copy link

eseidel commented Jan 19, 2017

huzzah, put run test works again with 1.22.0-dev.9.0. Verified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-info Additional information needed from the issue author status-blocked Blocked from making progress by another (referenced) issue type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)
Projects
None yet
Development

No branches or pull requests