Skip to content

Refactor scripts/create_repository.py, improve efficiency and correctness, and bump Scalatest, protobuf-java, kind-projector #1642

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Nov 12, 2024

Conversation

mbland
Copy link
Contributor

@mbland mbland commented Nov 7, 2024

Description

Refactors scripts/create_repository.py to make more modular, robust, and efficient, and update Scalatest, protobuf-java, and kind-projector jars. Part of #1482.

There are a lot of changes from individual commits in this one large PR, since @simuons said it wasn't necessary to break up #1639, another large PR to refactor scripts/create_repository.py, and all tests continue to pass. That said, I'm always more than happy to break this one apart if it really is too much.

The highlights are:

  • Bumps to Scalatest 3.2.19, protobuf-java 4.28.3, and kind-projector 0.13.3
  • Each of the output dictionaries are now sorted by key.
    • This makes it much easier to see when and how dependencies change from this point forward, since the ordering of artifacts will remain stable.
  • There are now several distinct objects, making the relationships between groups of functions and any shared data much clearer.
  • The object based design also has made it easier to tweak details and add features without fear of violating scoping assumptions.
  • If the label naming algorithm changes, or any artifacts are added to EXCLUDED_ARTIFACTS, the all script will update existing dictionary data appropriately.
    • Even if an artifact isn't upgraded by the script, the script will update its dictionary key and all deps labels referring to it, or remove the artifact and its references altogether.
    • The script does not yet, however, handle updating macros that instantiate artifact repositories from this data. I have ideas for how to do that, but it's out of scope for this change.

Perhaps most significantly, this cuts the number of Coursier commands and artifact downloads significantly, leading to a very large performance improvement. Here are the times for creating a fresh output dir and then updating it before this change (all with a warm Coursier download cache):

$ /usr/bin/time ./scripts/create_repository.py --output_dir before

[ ...snip... ]
       66.01 real        13.25 user         9.99 sys

$ /usr/bin/time ./scripts/create_repository.py --output_dir before

[ ...snip... ]
        1.76 real         1.41 user         0.49 sys

Here are the times after:

$ /usr/bin/time ./scripts/create_repository.py --output_dir after

[ ...snip... ]
        1.16 real         0.84 user         0.39 sys

$ /usr/bin/time ./scripts/create_repository.py --output_dir after

[ ...snip... ]
        0.96 real         0.72 user         0.27 sys

Motivation

Overall, this script is making my further work to update dependencies to ease the Bzlmod migration in #1482 so much faster, easier, and safer. But I found ways to improve it further, after #1639, to make the script itself more maintainable, more correct and robust, and much faster.

I'm now using it to bump ScalaPB, protoc-bridge, gRPC, guava, etc. dependencies, and resolving any breakages, in an effort to make rules_scala able to use newer versions of the Protobuf library. Though independent from (and not strictly required by) the Bzlmod migration work, this may be especially important to future Bzlmod users that include later Protobuf versions in their dependencies.

The script is now in a state where I really trust it to do the right thing as I continue this work. This makes this complex upgrade work far more tractable than before.

mbland added 20 commits November 6, 2024 11:41
No code changes other than the version bumps.
Improves readability by associating several functions directly with the
`MavenCoordinates` class. Doesn't produce any changes to the
`third_party/repositories/scala_*.bzl` files.
Improves readability and maintainability. No change in
`third_party/repositories/scala_*.bzl`.
Uses a list comprehension in `get_json_dependencies` as well, instead of
the `list(map(...))` composition. No changes in
`third_party/repositories/scala_*.bzl`.
This will hopefully help future contributors discover how to properly
update these files via `scripts/create_repository.py`.
Continuation of refactoring to objects to improve maintainability.
I noticed that `org.typelevel:kind-projector_2.{11.12,12.12,13.15}` was
always updated after putting `print()` statements in
`_map_to_resolved_artifacts`. Arguably if it's a root artifact (which
`kind-projector` is), or dependency of one, it shouldn't be 'testonly'.
It seems version 0.13.3 is available for Scala 2.11.12, so no need to
restrain its version number.
Also hoisted select_root_artifacts to the top of the file, closer to the
actual root artifact version constant declarations.
Also added docstrings to all public methods.
It turns out it's in all the existing
`third_party/repositories/scala_*.bzl` files anyway.

Also removed `EXCLUDED_ARTIFACTS` from create_repository.py.
Slight efficiency improvement to avoid having to recompute the same
labels over and over.
Avoids having `_get_json_dependencies` read the file multiple times, and
allows us to clean it up as soon as it's read.
Starting to update repo names pulled on a thread that led to adding
several new root artifacts and updating others. However, these are all
good changes that pass all tests.
I noticed while running the command directly that we could use the JSON
output file to run through all resolved artifacts and their dependencies
directly. There was no need for separate a separate `cs resolve` step
and `cs fetch` steps for each resolved artifact. Nor was there a need
for a separate `_get_json_dependencies` function.

Most significantly, there was no need to download each artifact for
checksumming. `cs fetch` already downloaded them and listed their paths
in the JSON file. The original code was parsing existing file paths to
generate URLs to download them again.

This change vastly improves performance. Here are the times for creating
a fresh output dir and then updating it before this change:

```txt
$ /usr/bin/time ./scripts/create_repository.py --output_dir before

[ ...snip... ]
       66.01 real        13.25 user         9.99 sys

$ /usr/bin/time ./scripts/create_repository.py --output_dir before

[ ...snip... ]
        1.76 real         1.41 user         0.49 sys
```

Here are the times after:

```txt
$ /usr/bin/time ./scripts/create_repository.py --output_dir after

[ ...snip... ]
        1.16 real         0.84 user         0.39 sys

$ /usr/bin/time ./scripts/create_repository.py --output_dir after

[ ...snip... ]
        0.96 real         0.72 user         0.27 sys
```

Comparing the two output directories via `diff -uNr before after` shows
the later having reordered some dependencies, otherwise the output is
identical.

This change also introduces a metadata cache. It might not help much
overall, but it feels right not to recompute data if possible.

Also decided to emit the Maven coordinates for each artifact actually
updated by the script.
It's only the right thing to do for our fellow humans.
Also:

- Renames `SCALAPB_RUNTIME_VERSION` to `SCALAPB_VERSION`, since it
  affects other artifacts in the ScalaPB suite.

- Updates `ArtifactLabelMaker.get_label` so that it doesn't call
  `self._get_label_impl` unless a label isn't already present. The
  previous `self._cache.setdefault` call took the result of
  `self._get_label_impl` as an argument, which defeated the purpose.
The new mechanism is more thorough and robust than the version from
"Make scala-parser-combinators a root artifact". It will not only
exclude artifacts from the newly generated dict, but will remove `deps`
labels for newly excluded artifacts from the existing dict.

There are no artifacts in the `EXCLUDED_ARTIFACTS` set yet, but we'll
add `com.google.guava:listenablefuture` to it once we get to updating
gRPC and Guava. I wanted this mechanism to stand on its own for clarity.
Replaces `SPECIAL_CASE_GROUP_LABELS`. Even though keying on the group
label worked for now, keying on the artifact seems more sound.
Adds this new private helper function to `ArtifactLabelMaker` to make
stripping Scala version suffixes from artifact names more robust.
Copy link
Collaborator

@simuons simuons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@liucijus liucijus merged commit 3ccaeae into bazel-contrib:master Nov 12, 2024
2 checks passed
@mbland mbland deleted the bzlmod-update-create_repository branch November 12, 2024 20:39
mbland added a commit to mbland/rules_scala that referenced this pull request Nov 15, 2024
Moves the Scalatest, JUnit, and Specs2 toolchains into
`@io_bazel_rules_scala_toolchains//testing`. Part of bazel-contrib#1482.

Updates all `WORKSPACE` files to set the appropriate `scala_toolchains`
parameters and to remove the unnecessary repository import and toolchain
registration macros.

Adds a `fetch_sources_by_id` parameter to `repositories` from
`third_party/repositories/repositories.bzl`. This enables
`scala_toolchains` to build the `artifact_ids_to_fetch_sources` mapping
from artifact ID lists returned by new macros extracted from `WORKSPACE`
macros. The values assigned to each id match the original
`fetch_sources` settings in the corresponding original `WORKSPACE`
macros.

Updates `scala/scala_maven_import_external.bzl` to generate a `load`
line for `//scala:scala_import.bzl` based on the repo's canonical name,
not `@io_bazel_rules_scala`.

As usual, includes several other opportunistic removals of the
`@io_bazel_rules_scala` repo name prefix to avoid an internal dependency
on that name. This means Bzlmod users won't necessarily have to set the
`repo_name` parameter of `bazel_dep` when using `rules_scala`.

---

Introduces more macros to return a framework's Maven artifact
dependencies, rather than inlining them in a `repositories` call. These
inlined lists are replaced by macro invocations, and now the
`scala_toolchains` macro can invoke these macros to collect artifact IDs
to pass to `repositories`. This also allows for future changes to
introduce a `scala_version` parameter if necessary, similar to how
`scalafmt_artifact_ids` already works.

This is important to avoid collisions when creating repositories for
artifacts upon which more than one framework depends under Bzlmod.
`WORKSPACE` doesn't seem affected by these collisions, but Bzlmod will
produce errors like the following, where both `scala_proto` and
`twitter_scrooge` depend upon `io_bazel_rules_scala_guava`:

```txt
$ bazel build //src/...

ERROR: .../scala/scala_maven_import_external.bzl:299:24:
  Traceback (most recent call last):
    File ".../scala/extensions/deps.bzl",
      line 140, column 21, in _scala_deps_impl
        scala_toolchains(
    File ".../scala/private/macros/toolchains.bzl",
      line 140, column 17, in scala_toolchains
        _scrooge(
    File ".../twitter_scrooge/twitter_scrooge.bzl",
      line 96, column 17, in twitter_scrooge
        repositories(
    File ".../third_party/repositories/repositories.bzl",
      line 113, column 37, in repositories
        _scala_maven_import_external(
    File ".../scala/scala_maven_import_external.bzl",
      line 263, column 30, in scala_maven_import_external
        jvm_maven_import_external(
    File ".../scala/scala_maven_import_external.bzl",
      line 299, column 24, in jvm_maven_import_external
        jvm_import_external(jar_urls = jar_urls, srcjar_urls = srcjar_urls, coordinates = artifact, **kwargs)

Error in repository_rule: A repo named
  io_bazel_rules_scala_guava_2_13_15
  is already generated by this module extension at
  .../scala/scala_maven_import_external.bzl:299:24

ERROR: Analysis of target
  '//src/java/io/bazel/rulesscala/worker:worker_test'
  failed; build aborted:
  error evaluating module extension scala_deps in
  //scala/extensions:deps.bzl
```

Recent updates to `scripts/create_repository.py` (bazel-contrib#1639, bazel-contrib#1642) make it
easy to emit full direct dependency lists for artifacts included in
`third_party/repositories/scala_*.bzl`. This increases the likelihood of
collisions, since this expanded metadata forces the macros that
instantiate artifact repos to instantiate even more repos.

By fetching list of artifact IDs from these macros, `scala_toolchains`
can now consolidate them into dictionary keys. Then it passes these
unique keys to `repositories` directly, avoiding the problem of
instantiating the same repo multiple times in the same module extension.

This, in turn, also avoids the need to add parameters to the original
`WORKSPACE` macros that instantiate dependencies to avoid collisions
under Bzlmod. The `scala_toolchains` macro never needs to call these
original macros, under either `WORKSPACE` or Bzlmod.

Finally, it also reduces duplication between these artifact ID lists and
the `_*_DEPS` symbols originally from `testing/BUILD` (and now in
`testing/deps.bzl`). The dependency labels are now generated
programatically.

(Aside: As I mentioned, we may eventually need to pass a Scala version
argument to these macros. It will be possible to cross that bridge
without too much trouble if and when that day comes. Or I can try to
future proof it in a follow up pull request.)
mbland added a commit to mbland/rules_scala that referenced this pull request Nov 27, 2024
Moves the Scalatest, JUnit, and Specs2 toolchains into
`@io_bazel_rules_scala_toolchains//testing`. Part of bazel-contrib#1482.

Updates all `WORKSPACE` files to set the appropriate `scala_toolchains`
parameters and to remove the unnecessary repository import and toolchain
registration macros.

Adds a `fetch_sources_by_id` parameter to `repositories` from
`third_party/repositories/repositories.bzl`. This enables
`scala_toolchains` to build the `artifact_ids_to_fetch_sources` mapping
from artifact ID lists returned by new macros extracted from `WORKSPACE`
macros. The values assigned to each id match the original
`fetch_sources` settings in the corresponding original `WORKSPACE`
macros.

Updates `scala/scala_maven_import_external.bzl` to generate a `load`
line for `//scala:scala_import.bzl` based on the repo's canonical name,
not `@io_bazel_rules_scala`.

As usual, includes several other opportunistic removals of the
`@io_bazel_rules_scala` repo name prefix to avoid an internal dependency
on that name. This means Bzlmod users won't necessarily have to set the
`repo_name` parameter of `bazel_dep` when using `rules_scala`.

---

Introduces more macros to return a framework's Maven artifact
dependencies, rather than inlining them in a `repositories` call. These
inlined lists are replaced by macro invocations, and now the
`scala_toolchains` macro can invoke these macros to collect artifact IDs
to pass to `repositories`. This also allows for future changes to
introduce a `scala_version` parameter if necessary, similar to how
`scalafmt_artifact_ids` already works.

This is important to avoid collisions when creating repositories for
artifacts upon which more than one framework depends under Bzlmod.
`WORKSPACE` doesn't seem affected by these collisions, but Bzlmod will
produce errors like the following, where both `scala_proto` and
`twitter_scrooge` depend upon `io_bazel_rules_scala_guava`:

```txt
$ bazel build //src/...

ERROR: .../scala/scala_maven_import_external.bzl:299:24:
  Traceback (most recent call last):
    File ".../scala/extensions/deps.bzl",
      line 140, column 21, in _scala_deps_impl
        scala_toolchains(
    File ".../scala/private/macros/toolchains.bzl",
      line 140, column 17, in scala_toolchains
        _scrooge(
    File ".../twitter_scrooge/twitter_scrooge.bzl",
      line 96, column 17, in twitter_scrooge
        repositories(
    File ".../third_party/repositories/repositories.bzl",
      line 113, column 37, in repositories
        _scala_maven_import_external(
    File ".../scala/scala_maven_import_external.bzl",
      line 263, column 30, in scala_maven_import_external
        jvm_maven_import_external(
    File ".../scala/scala_maven_import_external.bzl",
      line 299, column 24, in jvm_maven_import_external
        jvm_import_external(jar_urls = jar_urls, srcjar_urls = srcjar_urls, coordinates = artifact, **kwargs)

Error in repository_rule: A repo named
  io_bazel_rules_scala_guava_2_13_15
  is already generated by this module extension at
  .../scala/scala_maven_import_external.bzl:299:24

ERROR: Analysis of target
  '//src/java/io/bazel/rulesscala/worker:worker_test'
  failed; build aborted:
  error evaluating module extension scala_deps in
  //scala/extensions:deps.bzl
```

Recent updates to `scripts/create_repository.py` (bazel-contrib#1639, bazel-contrib#1642) make it
easy to emit full direct dependency lists for artifacts included in
`third_party/repositories/scala_*.bzl`. This increases the likelihood of
collisions, since this expanded metadata forces the macros that
instantiate artifact repos to instantiate even more repos.

By fetching list of artifact IDs from these macros, `scala_toolchains`
can now consolidate them into dictionary keys. Then it passes these
unique keys to `repositories` directly, avoiding the problem of
instantiating the same repo multiple times in the same module extension.

This, in turn, also avoids the need to add parameters to the original
`WORKSPACE` macros that instantiate dependencies to avoid collisions
under Bzlmod. The `scala_toolchains` macro never needs to call these
original macros, under either `WORKSPACE` or Bzlmod.

Finally, it also reduces duplication between these artifact ID lists and
the `_*_DEPS` symbols originally from `testing/BUILD` (and now in
`testing/deps.bzl`). The dependency labels are now generated
programatically.

(Aside: As I mentioned, we may eventually need to pass a Scala version
argument to these macros. It will be possible to cross that bridge
without too much trouble if and when that day comes. Or I can try to
future proof it in a follow up pull request.)
mbland added a commit to mbland/rules_scala that referenced this pull request Nov 27, 2024
Moves the Scalatest, JUnit, and Specs2 toolchains into
`@io_bazel_rules_scala_toolchains//testing`. Part of bazel-contrib#1482.

Updates all `WORKSPACE` files to set the appropriate `scala_toolchains`
parameters and to remove the unnecessary repository import and toolchain
registration macros.

Adds a `fetch_sources_by_id` parameter to `repositories` from
`third_party/repositories/repositories.bzl`. This enables
`scala_toolchains` to build the `artifact_ids_to_fetch_sources` mapping
from artifact ID lists returned by new macros extracted from `WORKSPACE`
macros. The values assigned to each id match the original
`fetch_sources` settings in the corresponding original `WORKSPACE`
macros.

Updates `scala/scala_maven_import_external.bzl` to generate a `load`
line for `//scala:scala_import.bzl` based on the repo's canonical name,
not `@io_bazel_rules_scala`.

As usual, includes several other opportunistic removals of the
`@io_bazel_rules_scala` repo name prefix to avoid an internal dependency
on that name. This means Bzlmod users won't necessarily have to set the
`repo_name` parameter of `bazel_dep` when using `rules_scala`.

---

Introduces more macros to return a framework's Maven artifact
dependencies, rather than inlining them in a `repositories` call. These
inlined lists are replaced by macro invocations, and now the
`scala_toolchains` macro can invoke these macros to collect artifact IDs
to pass to `repositories`. This also allows for future changes to
introduce a `scala_version` parameter if necessary, similar to how
`scalafmt_artifact_ids` already works.

This is important to avoid collisions when creating repositories for
artifacts upon which more than one framework depends under Bzlmod.
`WORKSPACE` doesn't seem affected by these collisions, but Bzlmod will
produce errors like the following, where both `scala_proto` and
`twitter_scrooge` depend upon `io_bazel_rules_scala_guava`:

```txt
$ bazel build //src/...

ERROR: .../scala/scala_maven_import_external.bzl:299:24:
  Traceback (most recent call last):
    File ".../scala/extensions/deps.bzl",
      line 140, column 21, in _scala_deps_impl
        scala_toolchains(
    File ".../scala/private/macros/toolchains.bzl",
      line 140, column 17, in scala_toolchains
        _scrooge(
    File ".../twitter_scrooge/twitter_scrooge.bzl",
      line 96, column 17, in twitter_scrooge
        repositories(
    File ".../third_party/repositories/repositories.bzl",
      line 113, column 37, in repositories
        _scala_maven_import_external(
    File ".../scala/scala_maven_import_external.bzl",
      line 263, column 30, in scala_maven_import_external
        jvm_maven_import_external(
    File ".../scala/scala_maven_import_external.bzl",
      line 299, column 24, in jvm_maven_import_external
        jvm_import_external(jar_urls = jar_urls, srcjar_urls = srcjar_urls, coordinates = artifact, **kwargs)

Error in repository_rule: A repo named
  io_bazel_rules_scala_guava_2_13_15
  is already generated by this module extension at
  .../scala/scala_maven_import_external.bzl:299:24

ERROR: Analysis of target
  '//src/java/io/bazel/rulesscala/worker:worker_test'
  failed; build aborted:
  error evaluating module extension scala_deps in
  //scala/extensions:deps.bzl
```

Recent updates to `scripts/create_repository.py` (bazel-contrib#1639, bazel-contrib#1642) make it
easy to emit full direct dependency lists for artifacts included in
`third_party/repositories/scala_*.bzl`. This increases the likelihood of
collisions, since this expanded metadata forces the macros that
instantiate artifact repos to instantiate even more repos.

By fetching list of artifact IDs from these macros, `scala_toolchains`
can now consolidate them into dictionary keys. Then it passes these
unique keys to `repositories` directly, avoiding the problem of
instantiating the same repo multiple times in the same module extension.

This, in turn, also avoids the need to add parameters to the original
`WORKSPACE` macros that instantiate dependencies to avoid collisions
under Bzlmod. The `scala_toolchains` macro never needs to call these
original macros, under either `WORKSPACE` or Bzlmod.

Finally, it also reduces duplication between these artifact ID lists and
the `_*_DEPS` symbols originally from `testing/BUILD` (and now in
`testing/deps.bzl`). The dependency labels are now generated
programatically.

(Aside: As I mentioned, we may eventually need to pass a Scala version
argument to these macros. It will be possible to cross that bridge
without too much trouble if and when that day comes. Or I can try to
future proof it in a follow up pull request.)
mbland added a commit to mbland/rules_scala that referenced this pull request Dec 8, 2024
Moves the Scalatest, JUnit, and Specs2 toolchains into
`@io_bazel_rules_scala_toolchains//testing`. Part of bazel-contrib#1482.

Updates all `WORKSPACE` files to set the appropriate `scala_toolchains`
parameters and to remove the unnecessary repository import and toolchain
registration macros.

Adds a `fetch_sources_by_id` parameter to `repositories` from
`third_party/repositories/repositories.bzl`. This enables
`scala_toolchains` to build the `artifact_ids_to_fetch_sources` mapping
from artifact ID lists returned by new macros extracted from `WORKSPACE`
macros. The values assigned to each id match the original
`fetch_sources` settings in the corresponding original `WORKSPACE`
macros.

Updates `scala/scala_maven_import_external.bzl` to generate a `load`
line for `//scala:scala_import.bzl` based on the repo's canonical name,
not `@io_bazel_rules_scala`.

As usual, includes several other opportunistic removals of the
`@io_bazel_rules_scala` repo name prefix to avoid an internal dependency
on that name. This means Bzlmod users won't necessarily have to set the
`repo_name` parameter of `bazel_dep` when using `rules_scala`.

---

Introduces more macros to return a framework's Maven artifact
dependencies, rather than inlining them in a `repositories` call. These
inlined lists are replaced by macro invocations, and now the
`scala_toolchains` macro can invoke these macros to collect artifact IDs
to pass to `repositories`. This also allows for future changes to
introduce a `scala_version` parameter if necessary, similar to how
`scalafmt_artifact_ids` already works.

This is important to avoid collisions when creating repositories for
artifacts upon which more than one framework depends under Bzlmod.
`WORKSPACE` doesn't seem affected by these collisions, but Bzlmod will
produce errors like the following, where both `scala_proto` and
`twitter_scrooge` depend upon `io_bazel_rules_scala_guava`:

```txt
$ bazel build //src/...

ERROR: .../scala/scala_maven_import_external.bzl:299:24:
  Traceback (most recent call last):
    File ".../scala/extensions/deps.bzl",
      line 140, column 21, in _scala_deps_impl
        scala_toolchains(
    File ".../scala/private/macros/toolchains.bzl",
      line 140, column 17, in scala_toolchains
        _scrooge(
    File ".../twitter_scrooge/twitter_scrooge.bzl",
      line 96, column 17, in twitter_scrooge
        repositories(
    File ".../third_party/repositories/repositories.bzl",
      line 113, column 37, in repositories
        _scala_maven_import_external(
    File ".../scala/scala_maven_import_external.bzl",
      line 263, column 30, in scala_maven_import_external
        jvm_maven_import_external(
    File ".../scala/scala_maven_import_external.bzl",
      line 299, column 24, in jvm_maven_import_external
        jvm_import_external(jar_urls = jar_urls, srcjar_urls = srcjar_urls, coordinates = artifact, **kwargs)

Error in repository_rule: A repo named
  io_bazel_rules_scala_guava_2_13_15
  is already generated by this module extension at
  .../scala/scala_maven_import_external.bzl:299:24

ERROR: Analysis of target
  '//src/java/io/bazel/rulesscala/worker:worker_test'
  failed; build aborted:
  error evaluating module extension scala_deps in
  //scala/extensions:deps.bzl
```

Recent updates to `scripts/create_repository.py` (bazel-contrib#1639, bazel-contrib#1642) make it
easy to emit full direct dependency lists for artifacts included in
`third_party/repositories/scala_*.bzl`. This increases the likelihood of
collisions, since this expanded metadata forces the macros that
instantiate artifact repos to instantiate even more repos.

By fetching list of artifact IDs from these macros, `scala_toolchains`
can now consolidate them into dictionary keys. Then it passes these
unique keys to `repositories` directly, avoiding the problem of
instantiating the same repo multiple times in the same module extension.

This, in turn, also avoids the need to add parameters to the original
`WORKSPACE` macros that instantiate dependencies to avoid collisions
under Bzlmod. The `scala_toolchains` macro never needs to call these
original macros, under either `WORKSPACE` or Bzlmod.

Finally, it also reduces duplication between these artifact ID lists and
the `_*_DEPS` symbols originally from `testing/BUILD` (and now in
`testing/deps.bzl`). The dependency labels are now generated
programatically.

(Aside: As I mentioned, we may eventually need to pass a Scala version
argument to these macros. It will be possible to cross that bridge
without too much trouble if and when that day comes. Or I can try to
future proof it in a follow up pull request.)
mbland added a commit to mbland/rules_scala that referenced this pull request Dec 10, 2024
Moves the Scalatest, JUnit, and Specs2 toolchains into
`@io_bazel_rules_scala_toolchains//testing`. Part of bazel-contrib#1482.

Updates all `WORKSPACE` files to set the appropriate `scala_toolchains`
parameters and to remove the unnecessary repository import and toolchain
registration macros.

Adds a `fetch_sources_by_id` parameter to `repositories` from
`third_party/repositories/repositories.bzl`. This enables
`scala_toolchains` to build the `artifact_ids_to_fetch_sources` mapping
from artifact ID lists returned by new macros extracted from `WORKSPACE`
macros. The values assigned to each id match the original
`fetch_sources` settings in the corresponding original `WORKSPACE`
macros.

Updates `scala/scala_maven_import_external.bzl` to generate a `load`
line for `//scala:scala_import.bzl` based on the repo's canonical name,
not `@io_bazel_rules_scala`.

As usual, includes several other opportunistic removals of the
`@io_bazel_rules_scala` repo name prefix to avoid an internal dependency
on that name. This means Bzlmod users won't necessarily have to set the
`repo_name` parameter of `bazel_dep` when using `rules_scala`.

---

Introduces more macros to return a framework's Maven artifact
dependencies, rather than inlining them in a `repositories` call. These
inlined lists are replaced by macro invocations, and now the
`scala_toolchains` macro can invoke these macros to collect artifact IDs
to pass to `repositories`. This also allows for future changes to
introduce a `scala_version` parameter if necessary, similar to how
`scalafmt_artifact_ids` already works.

This is important to avoid collisions when creating repositories for
artifacts upon which more than one framework depends under Bzlmod.
`WORKSPACE` doesn't seem affected by these collisions, but Bzlmod will
produce errors like the following, where both `scala_proto` and
`twitter_scrooge` depend upon `io_bazel_rules_scala_guava`:

```txt
$ bazel build //src/...

ERROR: .../scala/scala_maven_import_external.bzl:299:24:
  Traceback (most recent call last):
    File ".../scala/extensions/deps.bzl",
      line 140, column 21, in _scala_deps_impl
        scala_toolchains(
    File ".../scala/private/macros/toolchains.bzl",
      line 140, column 17, in scala_toolchains
        _scrooge(
    File ".../twitter_scrooge/twitter_scrooge.bzl",
      line 96, column 17, in twitter_scrooge
        repositories(
    File ".../third_party/repositories/repositories.bzl",
      line 113, column 37, in repositories
        _scala_maven_import_external(
    File ".../scala/scala_maven_import_external.bzl",
      line 263, column 30, in scala_maven_import_external
        jvm_maven_import_external(
    File ".../scala/scala_maven_import_external.bzl",
      line 299, column 24, in jvm_maven_import_external
        jvm_import_external(jar_urls = jar_urls, srcjar_urls = srcjar_urls, coordinates = artifact, **kwargs)

Error in repository_rule: A repo named
  io_bazel_rules_scala_guava_2_13_15
  is already generated by this module extension at
  .../scala/scala_maven_import_external.bzl:299:24

ERROR: Analysis of target
  '//src/java/io/bazel/rulesscala/worker:worker_test'
  failed; build aborted:
  error evaluating module extension scala_deps in
  //scala/extensions:deps.bzl
```

Recent updates to `scripts/create_repository.py` (bazel-contrib#1639, bazel-contrib#1642) make it
easy to emit full direct dependency lists for artifacts included in
`third_party/repositories/scala_*.bzl`. This increases the likelihood of
collisions, since this expanded metadata forces the macros that
instantiate artifact repos to instantiate even more repos.

By fetching list of artifact IDs from these macros, `scala_toolchains`
can now consolidate them into dictionary keys. Then it passes these
unique keys to `repositories` directly, avoiding the problem of
instantiating the same repo multiple times in the same module extension.

This, in turn, also avoids the need to add parameters to the original
`WORKSPACE` macros that instantiate dependencies to avoid collisions
under Bzlmod. The `scala_toolchains` macro never needs to call these
original macros, under either `WORKSPACE` or Bzlmod.

Finally, it also reduces duplication between these artifact ID lists and
the `_*_DEPS` symbols originally from `testing/BUILD` (and now in
`testing/deps.bzl`). The dependency labels are now generated
programatically.

(Aside: As I mentioned, we may eventually need to pass a Scala version
argument to these macros. It will be possible to cross that bridge
without too much trouble if and when that day comes. Or I can try to
future proof it in a follow up pull request.)
mbland added a commit to mbland/rules_scala that referenced this pull request Dec 16, 2024
Moves the Scalatest, JUnit, and Specs2 toolchains into
`@io_bazel_rules_scala_toolchains//testing`. Part of bazel-contrib#1482.

Updates all `WORKSPACE` files to set the appropriate `scala_toolchains`
parameters and to remove the unnecessary repository import and toolchain
registration macros.

Adds a `fetch_sources_by_id` parameter to `repositories` from
`third_party/repositories/repositories.bzl`. This enables
`scala_toolchains` to build the `artifact_ids_to_fetch_sources` mapping
from artifact ID lists returned by new macros extracted from `WORKSPACE`
macros. The values assigned to each id match the original
`fetch_sources` settings in the corresponding original `WORKSPACE`
macros.

Updates `scala/scala_maven_import_external.bzl` to generate a `load`
line for `//scala:scala_import.bzl` based on the repo's canonical name,
not `@io_bazel_rules_scala`.

As usual, includes several other opportunistic removals of the
`@io_bazel_rules_scala` repo name prefix to avoid an internal dependency
on that name. This means Bzlmod users won't necessarily have to set the
`repo_name` parameter of `bazel_dep` when using `rules_scala`.

---

Introduces more macros to return a framework's Maven artifact
dependencies, rather than inlining them in a `repositories` call. These
inlined lists are replaced by macro invocations, and now the
`scala_toolchains` macro can invoke these macros to collect artifact IDs
to pass to `repositories`. This also allows for future changes to
introduce a `scala_version` parameter if necessary, similar to how
`scalafmt_artifact_ids` already works.

This is important to avoid collisions when creating repositories for
artifacts upon which more than one framework depends under Bzlmod.
`WORKSPACE` doesn't seem affected by these collisions, but Bzlmod will
produce errors like the following, where both `scala_proto` and
`twitter_scrooge` depend upon `io_bazel_rules_scala_guava`:

```txt
$ bazel build //src/...

ERROR: .../scala/scala_maven_import_external.bzl:299:24:
  Traceback (most recent call last):
    File ".../scala/extensions/deps.bzl",
      line 140, column 21, in _scala_deps_impl
        scala_toolchains(
    File ".../scala/private/macros/toolchains.bzl",
      line 140, column 17, in scala_toolchains
        _scrooge(
    File ".../twitter_scrooge/twitter_scrooge.bzl",
      line 96, column 17, in twitter_scrooge
        repositories(
    File ".../third_party/repositories/repositories.bzl",
      line 113, column 37, in repositories
        _scala_maven_import_external(
    File ".../scala/scala_maven_import_external.bzl",
      line 263, column 30, in scala_maven_import_external
        jvm_maven_import_external(
    File ".../scala/scala_maven_import_external.bzl",
      line 299, column 24, in jvm_maven_import_external
        jvm_import_external(jar_urls = jar_urls, srcjar_urls = srcjar_urls, coordinates = artifact, **kwargs)

Error in repository_rule: A repo named
  io_bazel_rules_scala_guava_2_13_15
  is already generated by this module extension at
  .../scala/scala_maven_import_external.bzl:299:24

ERROR: Analysis of target
  '//src/java/io/bazel/rulesscala/worker:worker_test'
  failed; build aborted:
  error evaluating module extension scala_deps in
  //scala/extensions:deps.bzl
```

Recent updates to `scripts/create_repository.py` (bazel-contrib#1639, bazel-contrib#1642) make it
easy to emit full direct dependency lists for artifacts included in
`third_party/repositories/scala_*.bzl`. This increases the likelihood of
collisions, since this expanded metadata forces the macros that
instantiate artifact repos to instantiate even more repos.

By fetching list of artifact IDs from these macros, `scala_toolchains`
can now consolidate them into dictionary keys. Then it passes these
unique keys to `repositories` directly, avoiding the problem of
instantiating the same repo multiple times in the same module extension.

This, in turn, also avoids the need to add parameters to the original
`WORKSPACE` macros that instantiate dependencies to avoid collisions
under Bzlmod. The `scala_toolchains` macro never needs to call these
original macros, under either `WORKSPACE` or Bzlmod.

Finally, it also reduces duplication between these artifact ID lists and
the `_*_DEPS` symbols originally from `testing/BUILD` (and now in
`testing/deps.bzl`). The dependency labels are now generated
programatically.

(Aside: As I mentioned, we may eventually need to pass a Scala version
argument to these macros. It will be possible to cross that bridge
without too much trouble if and when that day comes. Or I can try to
future proof it in a follow up pull request.)
liucijus pushed a commit that referenced this pull request Jan 14, 2025
* Toolchainize all testing toolchains

Moves the Scalatest, JUnit, and Specs2 toolchains into
`@io_bazel_rules_scala_toolchains//testing`. Part of #1482.

Updates all `WORKSPACE` files to set the appropriate `scala_toolchains`
parameters and to remove the unnecessary repository import and toolchain
registration macros.

Adds a `fetch_sources_by_id` parameter to `repositories` from
`third_party/repositories/repositories.bzl`. This enables
`scala_toolchains` to build the `artifact_ids_to_fetch_sources` mapping
from artifact ID lists returned by new macros extracted from `WORKSPACE`
macros. The values assigned to each id match the original
`fetch_sources` settings in the corresponding original `WORKSPACE`
macros.

Updates `scala/scala_maven_import_external.bzl` to generate a `load`
line for `//scala:scala_import.bzl` based on the repo's canonical name,
not `@io_bazel_rules_scala`.

As usual, includes several other opportunistic removals of the
`@io_bazel_rules_scala` repo name prefix to avoid an internal dependency
on that name. This means Bzlmod users won't necessarily have to set the
`repo_name` parameter of `bazel_dep` when using `rules_scala`.

---

Introduces more macros to return a framework's Maven artifact
dependencies, rather than inlining them in a `repositories` call. These
inlined lists are replaced by macro invocations, and now the
`scala_toolchains` macro can invoke these macros to collect artifact IDs
to pass to `repositories`. This also allows for future changes to
introduce a `scala_version` parameter if necessary, similar to how
`scalafmt_artifact_ids` already works.

This is important to avoid collisions when creating repositories for
artifacts upon which more than one framework depends under Bzlmod.
`WORKSPACE` doesn't seem affected by these collisions, but Bzlmod will
produce errors like the following, where both `scala_proto` and
`twitter_scrooge` depend upon `io_bazel_rules_scala_guava`:

```txt
$ bazel build //src/...

ERROR: .../scala/scala_maven_import_external.bzl:299:24:
  Traceback (most recent call last):
    File ".../scala/extensions/deps.bzl",
      line 140, column 21, in _scala_deps_impl
        scala_toolchains(
    File ".../scala/private/macros/toolchains.bzl",
      line 140, column 17, in scala_toolchains
        _scrooge(
    File ".../twitter_scrooge/twitter_scrooge.bzl",
      line 96, column 17, in twitter_scrooge
        repositories(
    File ".../third_party/repositories/repositories.bzl",
      line 113, column 37, in repositories
        _scala_maven_import_external(
    File ".../scala/scala_maven_import_external.bzl",
      line 263, column 30, in scala_maven_import_external
        jvm_maven_import_external(
    File ".../scala/scala_maven_import_external.bzl",
      line 299, column 24, in jvm_maven_import_external
        jvm_import_external(jar_urls = jar_urls, srcjar_urls = srcjar_urls, coordinates = artifact, **kwargs)

Error in repository_rule: A repo named
  io_bazel_rules_scala_guava_2_13_15
  is already generated by this module extension at
  .../scala/scala_maven_import_external.bzl:299:24

ERROR: Analysis of target
  '//src/java/io/bazel/rulesscala/worker:worker_test'
  failed; build aborted:
  error evaluating module extension scala_deps in
  //scala/extensions:deps.bzl
```

Recent updates to `scripts/create_repository.py` (#1639, #1642) make it
easy to emit full direct dependency lists for artifacts included in
`third_party/repositories/scala_*.bzl`. This increases the likelihood of
collisions, since this expanded metadata forces the macros that
instantiate artifact repos to instantiate even more repos.

By fetching list of artifact IDs from these macros, `scala_toolchains`
can now consolidate them into dictionary keys. Then it passes these
unique keys to `repositories` directly, avoiding the problem of
instantiating the same repo multiple times in the same module extension.

This, in turn, also avoids the need to add parameters to the original
`WORKSPACE` macros that instantiate dependencies to avoid collisions
under Bzlmod. The `scala_toolchains` macro never needs to call these
original macros, under either `WORKSPACE` or Bzlmod.

Finally, it also reduces duplication between these artifact ID lists and
the `_*_DEPS` symbols originally from `testing/BUILD` (and now in
`testing/deps.bzl`). The dependency labels are now generated
programatically.

(Aside: As I mentioned, we may eventually need to pass a Scala version
argument to these macros. It will be possible to cross that bridge
without too much trouble if and when that day comes. Or I can try to
future proof it in a follow up pull request.)

* Move `scala_toolchains` to `scala/toolchains.bzl`

Removes this symbol from `scala/scala.bzl` as well as
`setup_scala_testing_toolchain`, and deletes
`scala/private/macros/toolchains.bzl`. Part of #1482 and #1652.

This is required for Bazel 8 and `rules_java` 8 compatibility, but is
also compatible with Bazel 6 and 7. In #1652, @hvadehra suggested
partitioning the `.bzl` files such that `WORKSPACE` doesn't `load` a
file that tries to `load` symbols from `rules_java`. I successfully did
so in a separate branch, and along with other minor changes, got
`rules_scala` to build with `rules_java` 8.5.1.

The other changes will come in separate pull requests, but it makes
sense to land this change now before adding any other toolchains to
`scala_toolchains`.

---

Arguably, we should remove all macros exported from `scala/scala.bzl`
that only instantiate toolchain dependencies and define toolchains. That
may be a breaking change for some users, but will ultimately be
necessary for these macros to remain compatible with Bazel 8.

* Extract versioned `_JUNIT_DEPS` in `test/BUILD`

Eliminates reliance on the default `@io_bazel_rules_scala_junit_junit`
artifact repository.

* Update `{junit,specs2_junit}_toolchain()`

These macros now mirror the implementation of `scalatest_toolchain()`.

However, I realized that the old pattern of calling
`scalatest_repositories()` followed by `scalatest_toolchain()` will no
longer work without first calling `scala_toolchains(scalatest = True)`.
This is because the `alias` targets in `testing/BUILD` that replace the
previous implementations all point to
`@io_bazel_rules_scala_toolchains`.

So if we want to keep these macros, it seems like we should maybe
restore the original toolchain targets in `testing/BUILD`. If we don't,
we can remove these macros, but we can document these as breaking
changes, and update other documentation accordingly.

* Move `scala/{private/macros/,}toolchains_repo.bzl`

Like "Move `scala_toolchains` to `scala/toolchains.bzl`",
removes the `scala_toolchains_repo` symbol from `scala/scala.bzl` and
makes it available from `scala/toolchains_repo.bzl`.

This avoids a future `test_scala_version 2.12.20` failure during Bazel 8
builds after adding `twitter_scrooge` toolchain support in the new
`test_version/version_specific_tests_dir/scrooge_repositories.bzl` file.
Otherwise, this new file would load `toolchains_repo` from
`scala/scala.bzl`. The `test_version/test_scala_version_.../WORKSPACE`
file generated from `test_version/WORKSPACE.template` would then
transitively load `.bzl` files with `rules_java` symbols, breaking the
test.

```txt
$ RULES_SCALA_TEST_ONLY="test_scala_version 2.12.20" ./test_version.sh

ERROR: Traceback (most recent call last):
  File ".../external/rules_scala/scala/private/common_attributes.bzl",
  line 18, column 28, in <toplevel>
    "deps": attr.label_list(

Error in label_list:
  Illegal argument: element in 'providers' is of unexpected type.
  Either all elements should be providers,
  or all elements should be lists of providers,
  but got list with an element of type NoneType.

ERROR: Error computing the main repository mapping:
  at test_version/test_scala_version_.../scrooge_repositories.bzl:1:6:
  at .../scala/scala.bzl:30:5:
  at .../scala/private/rules/scala_junit_test.bzl:5:5:
initialization of module 'scala/private/common_attributes.bzl' failed
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants