-
Notifications
You must be signed in to change notification settings - Fork 13.3k
rustc can produce non-deterministic crates that can't be mixed together #89904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Aha! Of course after posting an issue, I get further. I found that I can shove some code into pub fn create_resolver(
sess: Lrc<Session>,
metadata_loader: Box<MetadataLoaderDyn>,
krate: &ast::Crate,
crate_name: &str,
) -> BoxedResolver {
tracing::trace!("create_resolver for {}: {:?}", crate_name, krate);
if crate_name == "cloudsim" {
let dbg = |path: &'static str| {
let data = rustc_metadata::locator::get_metadata_section(&sess.target, rustc_metadata::locator::CrateFlavor::Rlib, &std::path::Path::new(path), &*metadata_loader).unwrap();
let mut buf: Vec<u8> = format!("XXX META FOR {}:\n", path).as_bytes().to_vec();
data.list_crate_metadata(&mut buf).unwrap();
let buf = String::from_utf8(buf).unwrap();
tracing::trace!("{}", buf);
};
dbg("/nix/store/pfy5c0x8jbx44xfnsrqb5yp5sz5ahpxx-rust_parse-display-0.5.2-lib/lib/libparse_display-38cc57346a.rlib");
dbg("/tmp/foo/libparse_display-38cc57346a.rlib");
} This gives us:
The line 24 is different: so the problem is the same just one level down. I will now check rlib for that and keep going until I find the one package that has a difference somewhere else. Though, macro package looks promising honestly and it might come up right there. |
parse_display_derive are just shared libraries so we can use objdump and similar tools as usual! They do seem a bit different. The metadata seems to be a byte different but I have yet to look closer. I will attach the files in case someone wants to look directly.
|
I think I understand now that this is created at create build time and we only read the information back afterwards. So we have to dump out the metadata itself and inspect closely. |
So far I was able to confirm all but the commented out fields (and the hash, obvs) are identical:
I'll continue on Monday or whenever I'm on next. |
Rust doesn't have stable abi, so you can't directly cache builded crates, but you can try to use sccache. |
More precisely: You cannot assume crate caches are stable between versions, at the moment, thus
This is on purpose, yes. You mention it is rare: Does it happen more when trying to reuse the cache between different dev envs, such as different directory layouts? This is a detail that comes up with
A-reproducibility
|
Given exact same set of sources and all dependencies (system and rust ones), I don't expect the compiler to roll a dice and give different ABI sometimes. We rely on reproducible builds and also reproducible runtime behaviour for our use-case so sccache isn't really an option.
Right. That's fine. This ticket is only about "I'm giving same inputs to the compiler but get two incompatible outputs sometimes" and presumably this is why the reproducability label even exists.
I think that when nix performs the build, all the sources are in some chroot unpacked in some I will keep going with dumping the metadata that rustc sees about the resulting differing |
Debug printing more and more contents of the so file didn't get me far and interesting sections would usually panic the compiler so I decided to do different approach. I simply cloned the Subsequent builds of parse_display_derive like so, (command lifted out of cargo) given different crate hash each time.
I see that in Once I had the crate_hash output (and adding extra info), I found a line that gave differing result: @@ -5309,7 +5309,7 @@ kind:
Lit(Spanned
{
node:
-Str("dump",
+Str("default",
Cooked),
span:
parse-display-derive/src/lib.rs:678:10:
@@ -5337,7 +5337,7 @@ kind:
Lit(Spanned
{
node:
-Str("default",
+Str("dump",
Cooked),
span:
parse-display-derive/src/lib.rs:678:10:
@@ -5430,7 +5430,7 @@ kind:
Lit(Spanned
{
node:
-Str("regex",
+Str("new",
Cooked),
span:
parse-display-derive/src/lib.rs:678:10:
@@ -5458,7 +5458,7 @@ kind:
Lit(Spanned
{
node:
-Str("new",
+Str("regex",
Cooked),
span:
parse-display-derive/src/lib.rs:678:10:
@@ -6432,8 +6432,8 @@ local_id:
178
},
span: If we look in parse-display-derive source, we see this https://github.com/frozenlib/parse-display/blob/e105b49e715549c8c40e73a980587deb12e49983/parse-display-derive/src/lib.rs#L678 Possibly this If yes, this would explain a lot:
I'm not sure if there's much Maybe EDIT: It may still be |
I updated my rust-analyzer and was able to expand the derive macro. Definitely suspicious, especially look at those diff --git a/tmp/one_lines b/tmp/two_lines
index 2e1be23..6041042 100644
--- a/tmp/one_lines
+++ b/tmp/two_lines
@@ -6,8 +6,8 @@ parse_display_derive[c310]::{impl#21}::parse),
Some(OwnerNodes
{
hash:
-Fingerprint(11937788502066574632,
-3118914414890280417),
+Fingerprint(1511079356857906158,
+15621007578138819097),
nodes:
[Some(ParentedNode
{
@@ -633,7 +633,7 @@ parse_display_derive[c310]::{impl#21}::parse),
local_id:
6
},
-_value_0#1612,
+_value_4#1612,
None),
span:
parse-display-derive/src/lib.rs:678:10:
@@ -745,7 +745,7 @@ parse_display_derive[c310]::{impl#21}::parse),
local_id:
6
},
-_value_0#1612,
+_value_4#1612,
None),
span:
parse-display-derive/src/lib.rs:678:10:
<snip> // Recursive expansion of StructMeta! macro
// =========================================
#[automatically_derived]
impl::syn::parse::Parse for DisplayArgs {
fn parse(input: ::syn::parse::ParseStream< '_>) -> ::syn::Result<Self>{
let mut _value_0:Option<LitStr> = None;
let mut _value_1 = None;
let mut _value_2 = None;
let mut _value_3 = None;
let mut is_next = false;
let mut unnamed_index = 0;
let mut named_used = false;
while!input.is_empty(){
if is_next {
input.parse:: < ::syn::Token![,]>()? ;
if input.is_empty(){
break;
}
}is_next = true;
if let Some((index,span)) = ::structmeta::helpers::try_parse_name(input, &["dump",],false, &["style",],false, &["bound",],false,false)?{
named_used = true;
match index {
::structmeta::helpers::NameIndex::Flag(Ok(0usize)) => {
if _value_3.is_some(){
return Err(::syn::Error::new(span,"parameter `dump` speficied more than once"));
}_value_3 = Some(span);
}
::structmeta::helpers::NameIndex::NameValue(Ok(0usize)) => {
if _value_1.is_some(){
return Err(::syn::Error::new(span,"parameter `style` speficied more than once"));
}_value_1 = Some(input.parse:: <LitStr>()?);
}
::structmeta::helpers::NameIndex::NameArgs(Ok(0usize)) => {
if _value_2.is_some(){
return Err(::syn::Error::new(span,"parameter `bound` speficied more than once"));
}_value_2 = Some({
let content;
::syn::parenthesized!(content in input);
::syn::punctuated::Punctuated:: <Quotable<Bound> , ::syn::Token![,]> ::parse_terminated(&content)? .into_iter().collect()
});
}
_ => unreachable!()
}
}else {
if named_used {
return Err(input.error("cannot use unnamed parameter after named parameter"));
}match unnamed_index {
0usize => {
_value_0 = Some(input.parse:: <LitStr>()?);
}
_ => {
return Err(input.error("too many unnamed parameter"));
}
}unnamed_index+=1;
}
}Ok(Self {
format:_value_0,style:_value_1,bound:_value_2,dump:_value_3.is_some(),
})
}
} |
OK, I found it: named: HashMap<String, NamedParam<'a>>, fn named_ps(&self, f: impl Fn(&NamedParamType<'a>) -> bool) -> (Vec<&NamedParam<'a>>, bool) {
(
self.named.values().filter(|p| f(&p.ty)).collect(),
if let Some(p) = &self.rest {
f(&p.ty)
} else {
false
},
)
} this then feeds into let (flag_ps, flag_rest) = self.named_ps(|p| p.is_flag());
let (name_value_ps, name_value_rest) = self.named_ps(|p| p.is_name_value());
let (name_args_ps, name_args_rest) = self.named_ps(|p| p.is_name_args());
let mut arms_named = Vec::new();
for (index, p) in flag_ps.iter().enumerate() {
arms_named.push(p.build_arm_parse(index, ArgKind::Flag));
} and
So this crate is producing unstable output and stomping all over the crate hash! Thank you for indulging me. I'll close the ticket; I'll re-open if necessary though I doubt it. |
Currently, when user specifies `derive(StructMeta)`, this crate generates a branch of code for each of the attributes. These attributes however have so far been stored inside a `HashMap` which means the ordering of the branches is more or less random. This has no real impact on the runtime: the order of the branches doesn't change the semantics of the code. It does matter however for compile time: `rustc` will calculate a different crate hash (effectively a sort of ABI signature) if code is re-ordered. This means that any time we recompile code with `derive(StructMeta)` somewhere, we're breaking the ABI even if there were _no_ changes at all to any code. Basically, we're throwing a dice at compile time as to what order the code will be laid out in. This is very bad for reproducability and plays very poorly with any sort of binary caching. I initially thought that the reproducability issue was coming from `rustc` itself but the more I was debugging, I made my way back to this crate. You can see the original rustc issue for context [here](rust-lang/rust#89904). I think this change should be basically invisible to anyone that's been using the crate so far.
Upstream fix at frozenlib/structmeta#1 I even get the same binary now whereas before it'd change hash every time. Thanks for your time. |
We're using
nix
to build rust crates and benefit from binary caches which are usually populated by our CI. I'll assume a slightly familiarity but I can elaborate when necessary. The details are not too important but it means that the following scenario can happen:Consider a crate
C
that depends on crateB
that depends on crateA
(that depends onstd
). Now, the following sequence of events happens.A
locally. Let's call this build resultA-local
. This goes into the/nix/store
andA
is never rebuilt again (it is in immutable location) and is always reused when all the inputs (sources, dependencies) are identical.A
,B
andC
. CI is also using nix and binary cache is currently empty.A
: this has exactly the same inputs and so is given exactly same hash. Let's call itA-cache
. It pushes pushesA-cache
to some binary cache (in S3 for example).B
using theA-cache
it just built, let's say it's calledB-cache
. It pushes it to binary cache too.C
asC-cache
and pushes to binary cache. Everything is great, CI is green.C
on local machine, perhaps it is a binary crate . However, the developer first makes some changes toC
source code, maybe added some debug info. So what do we have to build?A
doesn't need building: it's already stored locally asA-local
in local/nix/store
B
is needed to buildC
: luckilyB
exists in a binary cache asB-cache
so we just download it.C
source code changed so we have to build it, we haveA-local
andB-cache
in the dependency tree now.Does the build succeed? Most of the time, yes. But sometimes, it fails if rustc produced two incompatible results for
A
: that is, ifA-local
andA-cache
differ, we're in trouble. The developer can't useA-local
withB-cache
: onlyA-cache
is usable.An error message might look like:
In my real case,
A
isparse_display
create,B
is our local crate calledtrader
andC
was another crate in the workspace that depended ontrader
. AsB
(trader) came from binary cache, it is binary-identical and we only have to focus onA
.Investigation
First, let's see what rustc says:
Looks like the hash it expects is not what it gets. Back to this later. Let's try to look for something obvious in the
.rlib
.If I compare binaries side-by-side, it looks like the two files are 1 byte misaligned in first section and have minor differences later on. I will include both files for your own inspection too.
Of course, it's very difficult to tell anything by just looking at raw binary. I tried to add extra information to
rustc
itself so that I could see during the hashing what values it's getting: if hash inputs were different, we'd see which ones and be able to go backwards from there. Looking around, I thinkcrate_hash
function is the point to change# Includes one of the default files in src/bootstrap/defaults profile = "compiler" changelog-seen = 2 [rust] channel = "stable"
Sadly, this doesn't work for ironically similar reason: building this compiler locally gives a new
std
hash which means I can't re-invoke originalrustc
command with the original.rlibs
as those depend onstd
which has the wrong hash...At this point I decided to open this ticket: how can I debug this further? At this point I am more or less getting ready to try to gut
rustc
to the point that I can callcrate_hash
myself or the rlibs directly but it seems like a lot of work and I'm not confident it's even going to work. I didn't find any tool to readrlib
files either.If we can find what the difference between the two files it, presumably we can work backwards and find why
rustc
produced two different results.Here are the files: different_rlibs.tar.gz.
local
was one I built locally andcache
is from the binary cache, built by CI machine.Meta
rustc --version --verbose
:I originally suspected that codegen-units may be the culprit: some concurrent resource getting non-deterministically modified, such as to hand out some Ids or something that made it to the code. But no, the issue occurred even with codegen-units=1.
It is difficult to verify or replicate the bug because it happens fairly rarely so it seems like looking at the rlibs I attached above is probably the fastest way to figure something out.
The text was updated successfully, but these errors were encountered: