From 351c4bcf87a637bd6cf99bd96cba5373579be0a0 Mon Sep 17 00:00:00 2001
From: Tobias Bieniek
Date: Tue, 24 Oct 2023 15:17:31 +0200
Subject: [PATCH 01/10] Add "A tale of broken badges and 23,000 features" blog post

---
 ...23-10-24-broken-badges-and-23k-keywords.md | 40 +++++++++++++++++++
 1 file changed, 40 insertions(+)
 create mode 100644 posts/2023-10-24-broken-badges-and-23k-keywords.md

diff --git a/posts/2023-10-24-broken-badges-and-23k-keywords.md b/posts/2023-10-24-broken-badges-and-23k-keywords.md
new file mode 100644
index 000000000..b6c6fca55
--- /dev/null
+++ b/posts/2023-10-24-broken-badges-and-23k-keywords.md
@@ -0,0 +1,40 @@
+---
+layout: post
+title: A tale of broken badges and 23,000 features
+author: Tobias Bieniek
+team: the crates.io team
+---
+
+Around mid-October of 2023 we, the crates.io team, were [notified](https://github.com/rust-lang/crates.io/issues/7269) by one of our users that a [shields.io](https://shields.io) badge for their crate stopped working. The issue reporter was kind enough to already debug the problem and figured out that the API request that shields.io sends to crates.io was most likely the problem. Here is a quote from the original issue:
+
+> This crate makes heavy use of feature flags which bloats the response payload of the API.
+
+Apparently the API response for this specific crate had broken the 20 MB mark and shields.io wasn't particularly happen with this. Interestingly, this crate only had 9 versions published at this point in time. But how do you get to 20 MB with only 9 published versions?
+
+As the quote above already mentions, this crate is using feature… a lot of features… almost 23,000 features to be precise! 😱
+
+What crate needs that many features? Well, this crate provides SVG icons for Rust-based web applications… and it uses one feature per icon so that the payload size of the final WebAssembly bundle stays small.
+
+At first glance there should be nothing wrong with this. This seems like a reasonable thing to do from a crate author perspective and neither cargo, nor crates.io were showing any warnings about this. Unfortunately, some of the implementation details are not too happy about such high numbers of features though…
+
+The first problem that was already identified by the crate author: the API responses from crates.io are getting veeeery large. Adding to the problem is the fact that the crates.io API currently does not paginate the list of published versions. Changing this is obviously a breaking change, so our team had been a bit reluctant to changing the behavior of the API in that regard, though this situation has shown that we will likely have to tackle this problem in the near future.
+
+The next problem is that the [index file](https://index.crates.io/ic/on/icondata) for this crate is also getting large. With 9 published versions it was also already containing 11 MB of data. And just like the crates.io API, there is currently no pagination built into the package index file format.
+
+Now you may ask, why do the package index and `cargo` need to know about features? Well, the easy answer is: for dependency resolution. Features can enable optional dependencies, so when a dependency feature is used it might influence the dependency resolution. Our initial thought was that we could at least drop all the empty feature declarations from the index file (e.g. `foo = []`), but the cargo team informed us that `cargo` relies on them being available there too, and so for backwards-compatibility reasons this is not an option either.
+
+On the bright side, most Rust users are on cargo versions these days that use the sparse package index by default, which only downloads index files for packages that are actually being used. In other words: only users of this icon crate need to pay the price for downloading all the metadata. On the flipside, this means that users who are still using the git-based index are all paying for this one crate using 23,000 features.
+
+So, where do we go from here? 🤔
+
+While we believe that supporting such high numbers of features is conceptually a valid request, with the current implementation details in crates.io and cargo we can not support this. After analyzing all of these downstream effects from a single crate having that many features, we realized that we need some form of restriction on crates.io to keep the system from falling apart.
+
+Now comes the important part: **on 2023-10-16 the crates.io team deployed a change that limits the number of features a crate can have to 300.**
+
+… for now, or at least until we have found solutions for the above problems.
+
+We are aware of a couple of crates that also have legitimate reasons for having more than 300 features, and we have granted them appropriate exceptions to this rule, but we would like to ask everyone to be mindful of these limitations of our current systems.
+
+We also invite everyone to participate in finding solutions to the above problems. The best place to discuss ideas is the [crates.io Zulip stream](https://rust-lang.zulipchat.com/#narrow/stream/318791-t-crates-io/), and once an idea is a bit more fleshed out it will then be transformed into an [RFC](https://github.com/rust-lang/rfcs/).
+
+Finally, we would like to thank [Charles Edward Gagnon](https://github.com/Carlosted) for making us aware of this problem. We also want to reiterate that the author and their crate are not to blame for this. It is hard to know of these crates.io implementation details when developing crates, so if anything, the blame would be on us, the crates.io team, for not having limits on this earlier. Anyway, we have them now, and now you all know why! 👋

From 1bf9b22f2caf7d60d4a6697204d5e3ef5e993158 Mon Sep 17 00:00:00 2001
From: Tobias Bieniek
Date: Tue, 24 Oct 2023 18:54:00 +0200
Subject: [PATCH 02/10] Update posts/2023-10-24-broken-badges-and-23k-keywords.md

Co-authored-by: Eric Huss
---
 posts/2023-10-24-broken-badges-and-23k-keywords.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/posts/2023-10-24-broken-badges-and-23k-keywords.md b/posts/2023-10-24-broken-badges-and-23k-keywords.md
index b6c6fca55..55b7b7d64 100644
--- a/posts/2023-10-24-broken-badges-and-23k-keywords.md
+++ b/posts/2023-10-24-broken-badges-and-23k-keywords.md
@@ -9,7 +9,7 @@ Around mid-October of 2023 we, the crates.io team, were [notified](https://githu
 
 > This crate makes heavy use of feature flags which bloats the response payload of the API.
 
-Apparently the API response for this specific crate had broken the 20 MB mark and shields.io wasn't particularly happen with this. Interestingly, this crate only had 9 versions published at this point in time. But how do you get to 20 MB with only 9 published versions?
+Apparently the API response for this specific crate had broken the 20 MB mark and shields.io wasn't particularly happy with this. Interestingly, this crate only had 9 versions published at this point in time. But how do you get to 20 MB with only 9 published versions?
 
 As the quote above already mentions, this crate is using feature… a lot of features… almost 23,000 features to be precise! 😱
 

From 4820ab99498ae5a7845acc3173d07ade8092c5b7 Mon Sep 17 00:00:00 2001
From: Tobias Bieniek
Date: Tue, 24 Oct 2023 19:29:07 +0200
Subject: [PATCH 03/10] Apply suggestions from code review

Co-authored-by: Matthew
---
 posts/2023-10-24-broken-badges-and-23k-keywords.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/posts/2023-10-24-broken-badges-and-23k-keywords.md b/posts/2023-10-24-broken-badges-and-23k-keywords.md
index 55b7b7d64..82519c7a1 100644
--- a/posts/2023-10-24-broken-badges-and-23k-keywords.md
+++ b/posts/2023-10-24-broken-badges-and-23k-keywords.md
@@ -11,7 +11,7 @@ Around mid-October of 2023 we, the crates.io team, were [notified](https://githu
 
 Apparently the API response for this specific crate had broken the 20 MB mark and shields.io wasn't particularly happy with this. Interestingly, this crate only had 9 versions published at this point in time. But how do you get to 20 MB with only 9 published versions?
 
-As the quote above already mentions, this crate is using feature… a lot of features… almost 23,000 features to be precise! 😱
+As the quote above already mentions, this crate is using features… a lot of features… almost 23,000 features to be precise! 😱
 
 What crate needs that many features? Well, this crate provides SVG icons for Rust-based web applications… and it uses one feature per icon so that the payload size of the final WebAssembly bundle stays small.
 
@@ -21,15 +21,15 @@ The first problem that was already identified by the crate author: the API respo
 
 The next problem is that the [index file](https://index.crates.io/ic/on/icondata) for this crate is also getting large. With 9 published versions it was also already containing 11 MB of data. And just like the crates.io API, there is currently no pagination built into the package index file format.
 
-Now you may ask, why do the package index and `cargo` need to know about features? Well, the easy answer is: for dependency resolution. Features can enable optional dependencies, so when a dependency feature is used it might influence the dependency resolution. Our initial thought was that we could at least drop all the empty feature declarations from the index file (e.g. `foo = []`), but the cargo team informed us that `cargo` relies on them being available there too, and so for backwards-compatibility reasons this is not an option either.
+Now you may ask, why does the package index and `cargo` need to know about features? Well, the easy answer is: for dependency resolution. Features can enable optional dependencies, so when a dependency feature is used it might influence the dependency resolution. Our initial thought was that we could at least drop all the empty feature declarations from the index file (e.g. `foo = []`), but the cargo team informed us that `cargo` relies on them being available there too, and so for backwards-compatibility reasons this is not an option.
 
-On the bright side, most Rust users are on cargo versions these days that use the sparse package index by default, which only downloads index files for packages that are actually being used. In other words: only users of this icon crate need to pay the price for downloading all the metadata. On the flipside, this means that users who are still using the git-based index are all paying for this one crate using 23,000 features.
+On the bright side, most Rust users are on cargo versions these days that use the sparse package index by default, which only downloads index files for packages actually being used. In other words: only users of this icon crate need to pay the price for downloading all the metadata. On the flipside, this means users who are still using the git-based index are all paying for this one crate using 23,000 features.
 
 So, where do we go from here? 🤔
 
-While we believe that supporting such high numbers of features is conceptually a valid request, with the current implementation details in crates.io and cargo we can not support this. After analyzing all of these downstream effects from a single crate having that many features, we realized that we need some form of restriction on crates.io to keep the system from falling apart.
+While we believe that supporting such high numbers of features is conceptually a valid request, with the current implementation details in crates.io and cargo we can not support this. After analyzing all of these downstream effects from a single crate having that many features, we realized we need some form of restriction on crates.io to keep the system from falling apart.
 
-Now comes the important part: **on 2023-10-16 the crates.io team deployed a change that limits the number of features a crate can have to 300.**
+Now comes the important part: **on 2023-10-16 the crates.io team deployed a change limiting the number of features a crate can have to 300.**
 
 … for now, or at least until we have found solutions for the above problems.
 

From 390a14a1e95a276fcf34076b502d9bdfa13aeb80 Mon Sep 17 00:00:00 2001
From: Tobias Bieniek
Date: Thu, 26 Oct 2023 10:33:48 +0200
Subject: [PATCH 04/10] Apply suggestions from code review

Co-authored-by: Jubilee <46493976+workingjubilee@users.noreply.github.com>
---
 posts/2023-10-24-broken-badges-and-23k-keywords.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/posts/2023-10-24-broken-badges-and-23k-keywords.md b/posts/2023-10-24-broken-badges-and-23k-keywords.md
index 82519c7a1..425891fe3 100644
--- a/posts/2023-10-24-broken-badges-and-23k-keywords.md
+++ b/posts/2023-10-24-broken-badges-and-23k-keywords.md
@@ -11,15 +11,15 @@ Around mid-October of 2023 we, the crates.io team, were [notified](https://githu
 
 Apparently the API response for this specific crate had broken the 20 MB mark and shields.io wasn't particularly happy with this. Interestingly, this crate only had 9 versions published at this point in time. But how do you get to 20 MB with only 9 published versions?
 
-As the quote above already mentions, this crate is using features… a lot of features… almost 23,000 features to be precise! 😱
+As the quote above already mentions, this crate is using features… a lot of features… almost 23,000! 😱
 
 What crate needs that many features? Well, this crate provides SVG icons for Rust-based web applications… and it uses one feature per icon so that the payload size of the final WebAssembly bundle stays small.
 
-At first glance there should be nothing wrong with this. This seems like a reasonable thing to do from a crate author perspective and neither cargo, nor crates.io were showing any warnings about this. Unfortunately, some of the implementation details are not too happy about such high numbers of features though…
+At first glance there should be nothing wrong with this. This seems like a reasonable thing to do from a crate author perspective and neither cargo, nor crates.io, were showing any warnings about this. Unfortunately, some of the implementation details are not too happy about such high numbers of features…
 
-The first problem that was already identified by the crate author: the API responses from crates.io are getting veeeery large. Adding to the problem is the fact that the crates.io API currently does not paginate the list of published versions. Changing this is obviously a breaking change, so our team had been a bit reluctant to changing the behavior of the API in that regard, though this situation has shown that we will likely have to tackle this problem in the near future.
+The first problem that was already identified by the crate author: the API responses from crates.io are getting veeeery large. Adding to the problem is the fact that the crates.io API currently does not paginate the list of published versions. Changing this is obviously a breaking change, so our team had been a bit reluctant to change the behavior of the API in that regard, though this situation has shown that we will likely have to tackle this problem in the near future.
 
-The next problem is that the [index file](https://index.crates.io/ic/on/icondata) for this crate is also getting large. With 9 published versions it was also already containing 11 MB of data. And just like the crates.io API, there is currently no pagination built into the package index file format.
+The next problem is that the [index file](https://index.crates.io/ic/on/icondata) for this crate is also getting large. With 9 published versions it already contains 11 MB of data. And just like the crates.io API, there is currently no pagination built into the package index file format.
 
 Now you may ask, why does the package index and `cargo` need to know about features? Well, the easy answer is: for dependency resolution. Features can enable optional dependencies, so when a dependency feature is used it might influence the dependency resolution. Our initial thought was that we could at least drop all the empty feature declarations from the index file (e.g. `foo = []`), but the cargo team informed us that `cargo` relies on them being available there too, and so for backwards-compatibility reasons this is not an option.
 

From f871af8c124bc3c7f416c3c6e06b66c3eee22faa Mon Sep 17 00:00:00 2001
From: Tobias Bieniek
Date: Thu, 26 Oct 2023 12:21:58 +0200
Subject: [PATCH 05/10] Apply suggestions from code review

Co-authored-by: Luca Palmieri <20745048+LukeMathWalker@users.noreply.github.com>
---
 posts/2023-10-24-broken-badges-and-23k-keywords.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/posts/2023-10-24-broken-badges-and-23k-keywords.md b/posts/2023-10-24-broken-badges-and-23k-keywords.md
index 425891fe3..cd59ba680 100644
--- a/posts/2023-10-24-broken-badges-and-23k-keywords.md
+++ b/posts/2023-10-24-broken-badges-and-23k-keywords.md
@@ -15,19 +15,19 @@ As the quote above already mentions, this crate is using features… a lot of fe
 
 What crate needs that many features? Well, this crate provides SVG icons for Rust-based web applications… and it uses one feature per icon so that the payload size of the final WebAssembly bundle stays small.
 
-At first glance there should be nothing wrong with this. This seems like a reasonable thing to do from a crate author perspective and neither cargo, nor crates.io, were showing any warnings about this. Unfortunately, some of the implementation details are not too happy about such high numbers of features…
+At first glance there should be nothing wrong with this. This seems like a reasonable thing to do from a crate author perspective and neither cargo, nor crates.io, were showing any warnings about this. Unfortunately, some of the internals are not too happy about such a high number of features…
 
 The first problem that was already identified by the crate author: the API responses from crates.io are getting veeeery large. Adding to the problem is the fact that the crates.io API currently does not paginate the list of published versions. Changing this is obviously a breaking change, so our team had been a bit reluctant to change the behavior of the API in that regard, though this situation has shown that we will likely have to tackle this problem in the near future.
 
 The next problem is that the [index file](https://index.crates.io/ic/on/icondata) for this crate is also getting large. With 9 published versions it already contains 11 MB of data. And just like the crates.io API, there is currently no pagination built into the package index file format.
 
-Now you may ask, why does the package index and `cargo` need to know about features? Well, the easy answer is: for dependency resolution. Features can enable optional dependencies, so when a dependency feature is used it might influence the dependency resolution. Our initial thought was that we could at least drop all the empty feature declarations from the index file (e.g. `foo = []`), but the cargo team informed us that `cargo` relies on them being available there too, and so for backwards-compatibility reasons this is not an option.
+Now you may ask, why do the package index and `cargo` need to know about features? Well, the easy answer is: for dependency resolution. Features can enable optional dependencies, so when a dependency feature is used it might influence the dependency resolution. Our initial thought was that we could at least drop all empty feature declarations from the index file (e.g. `foo = []`), but the cargo team informed us that `cargo` relies on them being available there too, and so for backwards-compatibility reasons this is not an option.
 
 On the bright side, most Rust users are on cargo versions these days that use the sparse package index by default, which only downloads index files for packages actually being used. In other words: only users of this icon crate need to pay the price for downloading all the metadata. On the flipside, this means users who are still using the git-based index are all paying for this one crate using 23,000 features.
 
 So, where do we go from here? 🤔
 
-While we believe that supporting such high numbers of features is conceptually a valid request, with the current implementation details in crates.io and cargo we can not support this. After analyzing all of these downstream effects from a single crate having that many features, we realized we need some form of restriction on crates.io to keep the system from falling apart.
+While we believe that supporting such a high number of features is conceptually a valid request, with the current implementation details in crates.io and cargo we cannot support this. After analyzing all of these downstream effects from a single crate having that many features, we realized we need some form of restriction on crates.io to keep the system from falling apart.
 
 Now comes the important part: **on 2023-10-16 the crates.io team deployed a change limiting the number of features a crate can have to 300.**
 

From e8c44bdda0922523512fa79852f195e99c81d19a Mon Sep 17 00:00:00 2001
From: Tobias Bieniek
Date: Thu, 26 Oct 2023 12:25:02 +0200
Subject: [PATCH 06/10] Apply suggestions from code review

Co-authored-by: Eric Huss
---
 posts/2023-10-24-broken-badges-and-23k-keywords.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/posts/2023-10-24-broken-badges-and-23k-keywords.md b/posts/2023-10-24-broken-badges-and-23k-keywords.md
index cd59ba680..8e888b255 100644
--- a/posts/2023-10-24-broken-badges-and-23k-keywords.md
+++ b/posts/2023-10-24-broken-badges-and-23k-keywords.md
@@ -5,7 +5,7 @@ author: Tobias Bieniek
 team: the crates.io team
 ---
 
-Around mid-October of 2023 we, the crates.io team, were [notified](https://github.com/rust-lang/crates.io/issues/7269) by one of our users that a [shields.io](https://shields.io) badge for their crate stopped working. The issue reporter was kind enough to already debug the problem and figured out that the API request that shields.io sends to crates.io was most likely the problem. Here is a quote from the original issue:
+Around mid-October of 2023 the crates.io team was [notified](https://github.com/rust-lang/crates.io/issues/7269) by one of our users that a [shields.io](https://shields.io) badge for their crate stopped working. The issue reporter was kind enough to already debug the problem and figured out that the API request that shields.io sends to crates.io was most likely the problem. Here is a quote from the original issue:
 
 > This crate makes heavy use of feature flags which bloats the response payload of the API.
 

From b82b1d3e57917e1655f4e0db2bff08582c932124 Mon Sep 17 00:00:00 2001
From: Tobias Bieniek
Date: Thu, 26 Oct 2023 12:26:25 +0200
Subject: [PATCH 07/10] Update posts/2023-10-24-broken-badges-and-23k-keywords.md

---
 posts/2023-10-24-broken-badges-and-23k-keywords.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/posts/2023-10-24-broken-badges-and-23k-keywords.md b/posts/2023-10-24-broken-badges-and-23k-keywords.md
index 8e888b255..02bc8d243 100644
--- a/posts/2023-10-24-broken-badges-and-23k-keywords.md
+++ b/posts/2023-10-24-broken-badges-and-23k-keywords.md
@@ -21,7 +21,7 @@ The first problem that was already identified by the crate author: the API respo
 
 The next problem is that the [index file](https://index.crates.io/ic/on/icondata) for this crate is also getting large. With 9 published versions it already contains 11 MB of data. And just like the crates.io API, there is currently no pagination built into the package index file format.
 
-Now you may ask, why do the package index and `cargo` need to know about features? Well, the easy answer is: for dependency resolution. Features can enable optional dependencies, so when a dependency feature is used it might influence the dependency resolution. Our initial thought was that we could at least drop all empty feature declarations from the index file (e.g. `foo = []`), but the cargo team informed us that `cargo` relies on them being available there too, and so for backwards-compatibility reasons this is not an option.
+Now you may ask, why do the package index and cargo need to know about features? Well, the easy answer is: for dependency resolution. Features can enable optional dependencies, so when a dependency feature is used it might influence the dependency resolution. Our initial thought was that we could at least drop all empty feature declarations from the index file (e.g. `foo = []`), but the cargo team informed us that cargo relies on them being available there too, and so for backwards-compatibility reasons this is not an option.
 
 On the bright side, most Rust users are on cargo versions these days that use the sparse package index by default, which only downloads index files for packages actually being used. In other words: only users of this icon crate need to pay the price for downloading all the metadata. On the flipside, this means users who are still using the git-based index are all paying for this one crate using 23,000 features.
 

From d1b2a99b794f89dc6705a408689fbfd02bc434cf Mon Sep 17 00:00:00 2001
From: Tobias Bieniek
Date: Thu, 26 Oct 2023 12:27:41 +0200
Subject: [PATCH 08/10] Update date

---
 ...k-keywords.md => 2023-10-26-broken-badges-and-23k-keywords.md} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename posts/{2023-10-24-broken-badges-and-23k-keywords.md => 2023-10-26-broken-badges-and-23k-keywords.md} (100%)

diff --git a/posts/2023-10-24-broken-badges-and-23k-keywords.md b/posts/2023-10-26-broken-badges-and-23k-keywords.md
similarity index 100%
rename from posts/2023-10-24-broken-badges-and-23k-keywords.md
rename to posts/2023-10-26-broken-badges-and-23k-keywords.md

From 1ab17dd680aacb9537677fd073299eafa0006aa6 Mon Sep 17 00:00:00 2001
From: Tobias Bieniek
Date: Thu, 26 Oct 2023 12:51:33 +0200
Subject: [PATCH 09/10] Update posts/2023-10-26-broken-badges-and-23k-keywords.md

---
 posts/2023-10-26-broken-badges-and-23k-keywords.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/posts/2023-10-26-broken-badges-and-23k-keywords.md b/posts/2023-10-26-broken-badges-and-23k-keywords.md
index 02bc8d243..ca4136dc5 100644
--- a/posts/2023-10-26-broken-badges-and-23k-keywords.md
+++ b/posts/2023-10-26-broken-badges-and-23k-keywords.md
@@ -29,7 +29,7 @@ So, where do we go from here? 🤔
 
 While we believe that supporting such a high number of features is conceptually a valid request, with the current implementation details in crates.io and cargo we cannot support this. After analyzing all of these downstream effects from a single crate having that many features, we realized we need some form of restriction on crates.io to keep the system from falling apart.
 
-Now comes the important part: **on 2023-10-16 the crates.io team deployed a change limiting the number of features a crate can have to 300.**
+Now comes the important part: **on 2023-10-16 the crates.io team deployed a change limiting the number of features a crate can have to 300 for any new crates/versions being published.**
 
 … for now, or at least until we have found solutions for the above problems.
 

From ce93106e18c8eb69667f780d2c099a15397c3021 Mon Sep 17 00:00:00 2001
From: Tobias Bieniek
Date: Thu, 26 Oct 2023 13:26:59 +0200
Subject: [PATCH 10/10] Update posts/2023-10-26-broken-badges-and-23k-keywords.md

---
 posts/2023-10-26-broken-badges-and-23k-keywords.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/posts/2023-10-26-broken-badges-and-23k-keywords.md b/posts/2023-10-26-broken-badges-and-23k-keywords.md
index ca4136dc5..0e21d2ac9 100644
--- a/posts/2023-10-26-broken-badges-and-23k-keywords.md
+++ b/posts/2023-10-26-broken-badges-and-23k-keywords.md
@@ -7,7 +7,7 @@ team: the crates.io team
 
 Around mid-October of 2023 the crates.io team was [notified](https://github.com/rust-lang/crates.io/issues/7269) by one of our users that a [shields.io](https://shields.io) badge for their crate stopped working. The issue reporter was kind enough to already debug the problem and figured out that the API request that shields.io sends to crates.io was most likely the problem. Here is a quote from the original issue:
 
-> This crate makes heavy use of feature flags which bloats the response payload of the API.
+> This crate makes heavy use of feature flags which bloat the response payload of the API.
 
 Apparently the API response for this specific crate had broken the 20 MB mark and shields.io wasn't particularly happy with this. Interestingly, this crate only had 9 versions published at this point in time. But how do you get to 20 MB with only 9 published versions?
 