Skip to content

Add new code for resolving deps graphs in a phased build. #3953

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

davidmorgan
Copy link
Contributor

@davidmorgan davidmorgan commented Mar 28, 2025

The current code computes transitive deps for each build step using some cached intermediate results; this new code uses information about what phase generated files were or will be created in to compute everything once.

Two further steps after this PR to get the performance gains:

  1. InputTracker and so AssetGraph currently flatten the computed LibraryCycleGraphs to a new set of IDs for each build step; instead they should reuse the instances created.
  2. serialization should store multiple references to the same LibraryCycleGraph once, i.e. dedupe by identity

I've started looking at those changes in #3955 and indeed the performance looks promising.

Copy link

github-actions bot commented Mar 28, 2025

PR Health

Changelog Entry ✔️
Package Changed Files

Changes to files need to be accounted for in their respective changelogs.

@davidmorgan davidmorgan force-pushed the deps-graph-with-phases-resolver-rebase branch 2 times, most recently from c970f30 to f8439e3 Compare March 31, 2025 11:26
@davidmorgan davidmorgan changed the title Deps graph with phases resolver rebase Add new code for resolving deps graphs in a phased build. Mar 31, 2025
@davidmorgan davidmorgan force-pushed the deps-graph-with-phases-resolver-rebase branch 4 times, most recently from 2e08485 to b44dd81 Compare April 2, 2025 15:27
@davidmorgan davidmorgan force-pushed the deps-graph-with-phases-resolver-rebase branch from b44dd81 to 4ecc64f Compare April 2, 2025 15:30
@davidmorgan davidmorgan mentioned this pull request Apr 3, 2025
@davidmorgan davidmorgan marked this pull request as ready for review April 3, 2025 12:15
@davidmorgan davidmorgan requested a review from jensjoha April 3, 2025 12:15
LibraryCycleGraph._();

/// All subgraphs in the graph, including the root.
Iterable<LibraryCycleGraph> get transitiveGraphs sync* {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason this (and the method below) is sync*?
While I haven't checked I'll assume there's quite a bit of performance overhead.
(Also I personally don't like them).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure yet if it needs to be fast, and I find sync* easier to write / less likely to have bugs.

The TODO below is about writing away from sync* in case it does need to be fast :)

If there is a good pattern I can follow for writing it without sync*, happy to learn + do that now :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this instance I think returning seenGraphs would result in the same thing. It's not using more ram (you already have the Set), I'm guessing you're losing some overhead and that your profile will be better at assigning stuff correctly. And then you can get rid of the sync* :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's much better. Thanks :)

final idToLoad = idsToLoad.removeLast();

// Nothing to do if deps were already loaded, unless they were
// loaded and expire and [assetDepsLoader] can see a newer value.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the part about seeing a newer value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to:

   // Nothing to do if deps were already loaded, unless they expire and
  // [assetDepsLoader] is at a late enough phase to see the updated value.

// Work through phases <= `upToPhase` at which graphs expire,
// so there are new values to compute.
if (_graphsToComputeByPhase.isEmpty) break;
phase = (_graphsToComputeByPhase.keys.toList()..sort()).first;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how many phases do we have?
Maybe we should have a helper that fetches the smallest value without creating a new list and sorting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A moderate number: one per builder per package it runs on.

There is definitely one value in keys because of the check one line above, so actually this can be .reduce(min) which is much better :)


// Build graphs from cycles.
final existingCycles =
_cycles.values
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer doing a normal loop instead of doing .where.map.toList (e.g. see dart-lang/sdk#53987)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Also, moved into _buildGraphs, since it's only needed there.

// The graph expires at the earliest of its root cycle deps expirey phases
// and all its child graph expirey phases, compute the initial value for
// the root to be updated as childen are added.
var expiresAt = root.ids
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is completely unreadable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took another shot at it :)

      // The graph expires at first phase in which any asset in the graph
      // expires. Start this calculation by finding the earliest expirey phase
      // of all assets in the graph root cycle. It will be updated for each
      // child graph below.

and extracted an earliestPhase which I can use with reduce, and in updating expiresAt below, I think that's clearer.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's super annoying that I'm not able to put the comment on a specific range of code as I can in gerrit :/

What was - and still is - unreadable (to me) is the code:

      var expiresAt = root.ids
          .map((id) => _assetDeps[id]!.expiringValueAt(phase: phase).expiresAt)
          .fold<int?>(
            null,
            (a, b) =>
                a == null
                    ? b
                    : b == null
                    ? null
                    : min(a, b),
          );

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And now Github is showing me different code :S

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed elsewhere :)

/// For example, if `foo.dart` is a generated file generated in phase 3:
///
/// - the value of `foo.dart` at phase 0 is empty/missing;
/// - that value expires at phase 3
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good explanation, but it seems to me that it doesn't so much "expire at" as "expire after"? (which is also why it's available at expire+1).

E.g. also below --- wouldn't

  bool isExpiredAt({required int phase}) {
    return expiresAt != null && expiresAt! < phase;
  }

be easier to undestand as

  bool isExpiredAt({required int phase}) {
    return expiresAfter != null && expiresAfter! < phase;
  }

? (e.g. With "at" it sounds - at least to me - as if <= should be used instead of <).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a nice improvement, thanks :) done.

Copy link
Contributor Author

@davidmorgan davidmorgan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, please see if my changes adequately address things :)

LibraryCycleGraph._();

/// All subgraphs in the graph, including the root.
Iterable<LibraryCycleGraph> get transitiveGraphs sync* {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure yet if it needs to be fast, and I find sync* easier to write / less likely to have bugs.

The TODO below is about writing away from sync* in case it does need to be fast :)

If there is a good pattern I can follow for writing it without sync*, happy to learn + do that now :)

final idToLoad = idsToLoad.removeLast();

// Nothing to do if deps were already loaded, unless they were
// loaded and expire and [assetDepsLoader] can see a newer value.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to:

   // Nothing to do if deps were already loaded, unless they expire and
  // [assetDepsLoader] is at a late enough phase to see the updated value.

// Work through phases <= `upToPhase` at which graphs expire,
// so there are new values to compute.
if (_graphsToComputeByPhase.isEmpty) break;
phase = (_graphsToComputeByPhase.keys.toList()..sort()).first;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A moderate number: one per builder per package it runs on.

There is definitely one value in keys because of the check one line above, so actually this can be .reduce(min) which is much better :)


// Build graphs from cycles.
final existingCycles =
_cycles.values
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Also, moved into _buildGraphs, since it's only needed there.

// The graph expires at the earliest of its root cycle deps expirey phases
// and all its child graph expirey phases, compute the initial value for
// the root to be updated as childen are added.
var expiresAt = root.ids
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took another shot at it :)

      // The graph expires at first phase in which any asset in the graph
      // expires. Start this calculation by finding the earliest expirey phase
      // of all assets in the graph root cycle. It will be updated for each
      // child graph below.

and extracted an earliestPhase which I can use with reduce, and in updating expiresAt below, I think that's clearer.

/// For example, if `foo.dart` is a generated file generated in phase 3:
///
/// - the value of `foo.dart` at phase 0 is empty/missing;
/// - that value expires at phase 3
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a nice improvement, thanks :) done.

LibraryCycleGraph._();

/// All subgraphs in the graph, including the root.
Iterable<LibraryCycleGraph> get transitiveGraphs sync* {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this instance I think returning seenGraphs would result in the same thing. It's not using more ram (you already have the Set), I'm guessing you're losing some overhead and that your profile will be better at assigning stuff correctly. And then you can get rid of the sync* :)

@davidmorgan davidmorgan merged commit ec6d434 into dart-lang:master Apr 4, 2025
75 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants