Skip to content

dartdoc runs too long - possible patterns #2615

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
isoos opened this issue Apr 14, 2021 · 14 comments
Closed

dartdoc runs too long - possible patterns #2615

isoos opened this issue Apr 14, 2021 · 14 comments
Labels
P2 A bug or feature request we're likely to work on type-enhancement A request for a change that isn't a bug type-performance Issues related to slow dartdoc generation.

Comments

@isoos
Copy link
Contributor

isoos commented Apr 14, 2021

We've started to track dartdoc run elapsed times on pub.dev, and on the recent runtimes, the following patterns emerged:

  • The top-1000 slowest runs were using Flutter SDK 94.2% of the time (89% in top-100).

  • The longest to generate is package:colorz which only has a single (but long) Dart file and a simple class reference pattern. It seems that there may be some opportunities for improvements there by caching some calculations wrt. the Color reference? Random sampling of slow packages shows that similar patterns (e.g. single file with lots of int const values).

  • Another slow pattern is having a large number of public libraries like in package:ews.

  • A more surprising pattern seems to be in package:niku, where both the number of libraries and the source code is relatively small, but uses extension methods a lot.

I've attached the list of slow packages to this issue, in the order of decreasing run times: slow.txt

The above patterns were observed with "dart": "2.12.2", "flutter": "2.0.4", "dartdoc": "0.42.0".

@jcollins-g
Copy link
Contributor

Dartdoc's reference structure currently requires us to load the world to resolve references, and there is currently no way to cache dartdoc data structures for a package between runs. Therefore, packages with complex dependencies (including, unfortunately, packages that use Flutter) can take a long time to process.

Neither issue is simple to correct because a single package could have different data structures associated with it based on dependencies resolved elsewhere.

@jcollins-g jcollins-g added P3 A lower priority bug or feature request type-enhancement A request for a change that isn't a bug P2 A bug or feature request we're likely to work on and removed P3 A lower priority bug or feature request labels Apr 19, 2021
@srawlins
Copy link
Member

I've had really good results with profiling Dartdoc in Observatory before, and these packages would be great to use; they may indicate some pathological cases that Dartdoc doesn't handle well. For example, having one library with lots of int const values should not be a problem, as far as I can think. There might be something fishy.

@isoos
Copy link
Contributor Author

isoos commented Apr 26, 2021

there is currently no way to cache dartdoc data structures for a package between runs.

@jcollins-g: Would it be feasible to do so? On pub.dev we could have a good control over the cached file, and the 10-20 seconds that we could potentially gain on a single run may accumulate to days really fast...

I've had really good results with profiling Dartdoc in Observatory before, and these packages would be great to use; they may indicate some pathological cases that Dartdoc doesn't handle well. For example, having one library with lots of int const values should not be a problem, as far as I can think. There might be something fishy.

@srawlins: I've tried to profile the generation of package:colorz with dartdoc, and I've found the following:

  • contrary to my initial assessment, it is not a single-file dartdoc content, it has 9000+ fields and for each it has a separate file
  • generating the dartdoc content on my machine was about 150 seconds (mostly in DartdocGeneratorBackend.generateProperty), out of which 79 seconds was writing the files (DartdocFileWriter.write), ~44 seconds for generating the HTML (DartdocGeneratorBackend.renderProperty), and ~27 seconds was preparing the PropertyTemplateData.

I could imagine two parts where there is a chance for significant performance gain for a (maybe) low effort:

  • Make the content generation and the content write to a file concurrent with async IO. Right now dartdoc is blocking on the file writes, and DartdocFileWriter could use a concurrency=1 pool to write the generated contents to the filesystem, while at the same time render the next file's content.

  • Caching the SDK properties between dartdoc runs (with a flag that writes them to a file / reads them if exists).

@parlough
Copy link
Member

parlough commented Apr 26, 2021

As a reference, talk of not generating these individual pages(at least for properties) at all has been discussed before:

#1272
#1983

While it doesn't fix any core problem, it reduces the amount of work by not doing it all :P

@isoos
Copy link
Contributor Author

isoos commented Apr 27, 2021

As for DartdocFileWriter, maybe what pub.dev could benefit from is an option to work with in-memory filesystem. Instead of writing to files, the process in pub.dev could just collect these contents in memory and then store them in a way that is more efficient for the pub.dev flow.

I'm a bit at loss here: what would be the easiest hook into dartdoc to achieve that?

Our current entrypoint is relatively simple: https://github.com/dart-lang/pub-dev/blob/master/pkg/pub_dartdoc/bin/pub_dartdoc.dart

@jcollins-g
Copy link
Contributor

Converting file writing to async will be a bit challenging because of the infectious nature of asynchronous code. I'm also not sure it will actually result in any performance improvement though because the writing has to occur somewhere.

Probably the most direct way to reduce operating system overhead might be to write dartdoc data into a tar file with package:archive or similar. Although I've not used the package, it looks like it could be suitable both for storing the data in memory and preventing excessive I/O overhead.

@isoos
Copy link
Contributor Author

isoos commented Apr 28, 2021

@jcollins-g: we are looking into using in-memory fs for now, as it allows us to use additional sanity checks with early abort capability (e.g. number of files and sizes), and also efficient archive creation. Can you please comment on what would be the easiest way to configure the mem fs into our dartdoc entry point?

@jcollins-g
Copy link
Contributor

@isoos I am not certain what the best way is, it will depend on what you want to do with in-memory filesystem (for example, do you want to have all the original source code and the SDK stored there as well that Dartdoc will be reading, or just for output? Using it for all input is probably a much larger change.)

If it is just for output, adding a way to override the ResourceProvider passed to DartdocFileWriter or possibly DartdocFileWriter itself in generateDocsBase() would probably be what I would try first. That would give you full control over the files written. validateLinks would need to be updated to allow for this alternative output as well.

@isoos
Copy link
Contributor Author

isoos commented Apr 30, 2021

@jcollins-g: We would intercept only the output.

I'm not entirely sure how I could hijack the DartdocFileWriter.write, as there seems to be no way to get to it. I could imagine that the ResourceProvider.getFile could be hijacked, by checking the output directory and providing a File implementation with that (otherwise the content reads will be also hijacked).

Would it be possible to add a dartdoc hook to override the DartdocFileWriter, or should we rather go in the other direction?

@jcollins-g
Copy link
Contributor

DartdocFileWriter override is probably the best approach; dartdoc will have to be patched to allow it.

@isoos
Copy link
Contributor Author

isoos commented Apr 30, 2021

DartdocFileWriter override is probably the best approach; dartdoc will have to be patched to allow it.

Any estimate when this could happen and released?

@jcollins-g
Copy link
Contributor

It would depend. We accept PRs if you want to add this feature; I can commit to spinning a release soon after it lands if need be.

@sigurdm
Copy link
Contributor

sigurdm commented Apr 8, 2025

I wonder if this issue is still useful. Package colorz is too old to be rendered with modern dartdoc.
I vaguely remember something was done to limit quadratic behavior...

@isoos
Copy link
Contributor Author

isoos commented Apr 8, 2025

Agreed, let's close this. The template system is already different since we filed this issue.

@isoos isoos closed this as completed Apr 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 A bug or feature request we're likely to work on type-enhancement A request for a change that isn't a bug type-performance Issues related to slow dartdoc generation.
Projects
None yet
Development

No branches or pull requests

5 participants