Define unspecified behavior #214

Closed
wants to merge 5 commits into from
9 changes: 9 additions & 0 deletions reference/src/glossary.md
@@ -160,6 +160,15 @@ For unsafe code, however, the burden is still on the programmer.

Also see: [Soundness][soundness].

#### Unspecified behavior
[unspecified]: #unspecified-behavior

*Unspecified behavior* is not an error condition in the abstract machine, but beyond that, the Rust language provides no other guarantees about what behavior these programs have.

Member

This is an odd introduction... the first thing you say is what this is not.

Also, we are moving towards a more general "assumptions made by the compiler" definition for UB.

Proposal:

> Unspecified behavior is behavior of the Rust Abstract Machine that the Rust language provides no guarantee for. Unspecified behavior always comes with a set of behaviors that the implementation can pick from.

The latter part is important. I don't think "anything but the error state" is a useful spec. And for your example of field offsets, there is such a set: in https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/layout/structs-and-tuples.md we define what the degrees of freedom are here for the compiler.

Contributor Author

> This is an odd introduction... the first thing you say is what this is not.

The only thing we guarantee about unspecified behavior is that it is not an error in the abstract machine. I'm open to different ways of wording this guarantee.

> Unspecified behavior is behavior of the Rust Abstract Machine that the Rust language provides no guarantee for.

That's incorrect: "the behavior for which the Rust Abstract Machine provides no guarantees" is undefined behavior. For unspecified behavior we do provide some guarantees, the most important one being that unspecified behavior is not undefined.

> I don't think "anything but the error state" is a useful spec. And for your example of field offsets, there is such a set: in https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/layout/structs-and-tuples.md we define what the degrees of freedom are here for the compiler.

In that document, we define that field offset is a degree of freedom that the compiler has when determining struct layout, but that the compiler is "free to re-order field layout as it wishes".

Member

@RalfJung, Oct 21, 2019

> "the behavior for which the Rust Abstract Machine provides no guarantees" is undefined behavior

Not really -- there's no behavior in UB for the R-AM, it is an error state. But the wording is still not great; I agree with that part.

> The only thing we guarantee about unspecified behavior is that it is not an error in the abstract machine.

That's useless. Then the behavior could still be "replace all memory contents by 0x00", making it impossible to program.

We always need to give a bound on what "unspecified behavior" can do, or we might as well declare it UB.

Member

@RalfJung, Oct 21, 2019

Next attempt:

> Unspecified behavior occurs in the Rust language when the implementation is free to pick any one of a given set of possible behaviors of the R-AM. The implementation does not have to document that choice nor commit to it, and the choice it makes can vary even within the execution of a single program.

Contributor Author

> That's useless. Then the behavior could still be "replace all memory contents by 0x00", making it impossible to program.
>
> We always need to give a bound on what "unspecified behavior" can do, or we might as well declare it UB.

This definition isn't useless since it provides a guarantee over undefined behavior. Text that uses it might provide extra bounds, but I don't think this definition needs to try to provide such bounds nor require them to exist.

Member

To elaborate on "useless", imagine we had a function `fn unspec();` and said calling it is unspecified behavior. "Anything except for UB can happen". Well, one possible choice for "anything" is "oops your memory is empty now, we deallocated all of it", so the following program could have UB:

    let x = 4;
    unspec();
    assert!(x == 4); // UB! x might not be allocated any more

We could carefully try to restrict what unspecified behavior can do in general, but that's going to be super painful. So unless there is a strong motivation for having "(almost) unbounded unspecified behavior", we had better avoid it.

Contributor Author

Unless you allow "bounded by a potentially unspecified bound" I'm not sure how you can describe FFI.

Contributor Author

@gnzlbg, Oct 21, 2019

But at that point, I'd rather leave the bound to whatever feature decides to be "unspecified behavior". If that feature decides to provide no bounds, and in your opinion that makes the feature useless, then just make the case against adding such a feature to the language? If a feature provides absolutely no bounds, an RFC would really need to make a good case for landing it.

Member

FFI isn't "unspecified behavior"... that would be rather catastrophic as one couldn't program against it.^^ Specifying FFI is hard, but we shouldn't pretend that we can properly handle it by saying "unspecified". We need to define cross-language linking to specify FFI. Without xLTO we could do it on the target/assembly level; with xLTO... TBH at that point we probably have to work on the LLVM level as I doubt we can make C programs run on the R-AM.^^

OTOH, struct field offsets are a good example for unspecified behavior precisely because we can bound the choices but do not want to commit. That's what we should use it for; not as an excuse for "sorry it's hard we don't know what to say".
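
To make "we can bound the choices" concrete, here is a minimal sketch of what checking a candidate layout against such bounds could look like. The `FieldLayout` type and `within_bounds` function are hypothetical names, and the three bounds checked (alignment, containment, no overlap) are a simplified assumption, not a restatement of the linked layout document; zero-sized fields are ignored.

```rust
/// One field of a candidate layout, all values in bytes (hypothetical model type).
#[derive(Clone, Copy)]
struct FieldLayout {
    offset: usize,
    size: usize,
    align: usize, // assumed to be >= 1
}

/// Returns true if every field is aligned, lies inside the struct,
/// and does not overlap any other field. Simplified: assumes non-zero sizes.
fn within_bounds(fields: &[FieldLayout], total_size: usize) -> bool {
    for (i, f) in fields.iter().enumerate() {
        if f.offset % f.align != 0 || f.offset + f.size > total_size {
            return false;
        }
        for g in &fields[i + 1..] {
            let disjoint =
                f.offset + f.size <= g.offset || g.offset + g.size <= f.offset;
            if !disjoint {
                return false;
            }
        }
    }
    true
}

fn main() {
    // Two candidate layouts for a hypothetical `struct { a: u8, b: u32 }` of size 8.
    let declared_order = [
        FieldLayout { offset: 0, size: 1, align: 1 },
        FieldLayout { offset: 4, size: 4, align: 4 },
    ];
    let reordered = [
        FieldLayout { offset: 4, size: 1, align: 1 },
        FieldLayout { offset: 0, size: 4, align: 4 },
    ];
    // A layout outside the bounds: both fields start at offset 0.
    let overlapping = [
        FieldLayout { offset: 0, size: 1, align: 1 },
        FieldLayout { offset: 0, size: 4, align: 4 },
    ];
    println!("declared order allowed: {}", within_bounds(&declared_order, 8));
    println!("reordered allowed:      {}", within_bounds(&reordered, 8));
    println!("overlapping allowed:    {}", within_bounds(&overlapping, 8));
}
```

Under this model both orderings are permitted choices, so a program may rely only on what holds for every layout in the set.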

Contributor Author

@gnzlbg, Oct 21, 2019

> FFI isn't "unspecified behavior"... that would be rather catastrophic as one couldn't program against it.^^

Really? What is it then? (Notice that for many platforms, the C ABI is unspecified; also notice that some platforms don't have a C ABI at all.)

For example, the field offsets of `repr(Rust)` types are *unspecified*, so a Rust program that prints the offset of a field exhibits *unspecified behavior*: it prints something, but we do not make any guarantees about what it prints. What it prints can therefore change across compiler versions, depending on compiler flags, or even across compiler invocations.
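
A program of the kind described here might look like the following sketch. The `Sample` type and `sample_offsets` function are hypothetical names chosen for illustration; the offsets are computed from field addresses with `std::ptr::addr_of!`, and no particular output is promised.

```rust
use std::mem::size_of;
use std::ptr::addr_of;

// Hypothetical example type; with no `repr` attribute it uses repr(Rust).
struct Sample {
    a: u8,
    b: u32,
    c: u16,
}

/// Returns the byte offsets of `a`, `b` and `c` inside `Sample`,
/// as chosen by whichever compiler built this program.
fn sample_offsets() -> [usize; 3] {
    let s = Sample { a: 0, b: 0, c: 0 };
    let base = addr_of!(s) as usize;
    [
        addr_of!(s.a) as usize - base,
        addr_of!(s.b) as usize - base,
        addr_of!(s.c) as usize - base,
    ]
}

fn main() {
    let [a, b, c] = sample_offsets();
    // The printed numbers are unspecified: they may differ between builds.
    println!("a: {}, b: {}, c: {}, size: {}", a, b, c, size_of::<Sample>());
}
```

The program is well-defined whichever layout is chosen; only the numbers it prints are left open.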

Programs that make assumptions about what a particular source of *unspecified behavior* does often end up exhibiting *undefined behavior* when those assumptions are incorrect. For example, making the assumption that a field of a `repr(Rust)` struct is at a particular offset might lead a program to exhibit *undefined behavior* if that assumption is incorrect.
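
The difference between relying on an assumed offset and using the offset the compiler actually chose can be sketched as follows. `Packet` and `read_len` are hypothetical names, and the hard-coded offset `4` in the comment is only an illustrative wrong guess.

```rust
use std::ptr::addr_of;

// Hypothetical type with the default repr(Rust) layout.
struct Packet {
    tag: u8,
    len: u32,
}

/// Reads `len` through a raw byte pointer, using the offset the compiler
/// actually picked instead of an assumed one.
fn read_len(p: &Packet) -> u32 {
    let base = p as *const Packet as *const u8;
    let offset = addr_of!(p.len) as usize - base as usize;
    // SAFETY: `offset` is derived from the real address of `p.len`, so the
    // resulting pointer is in bounds of `*p`, aligned for `u32`, and points
    // at an initialized value.
    unsafe { base.add(offset).cast::<u32>().read() }
}

// By contrast, a hard-coded offset bakes in an assumption about the layout:
//
//     unsafe { base.add(4).cast::<u32>().read() }
//
// If the compiler placed `len` at offset 0, this would read `tag` together
// with padding bytes as a `u32`, which is undefined behavior.

fn main() {
    let p = Packet { tag: 7, len: 1234 };
    println!("tag = {}, len = {}", p.tag, read_len(&p));
}
```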

#### Soundness (of code / of a library)
[soundness]: #soundness-of-code--of-a-library
