Diving Deep: implied bounds and variance #25860
while Rust is a great language and tends to be sound in most cases, its complex trait system together with its fairly organic growth results in some unsoundness and unintended behavior. fixing or even just understanding these can often be non-trivial, which is why I am looking through them to summarize the issues for myself and discuss some potential fixes. any unsoundness discussed here is pretty much impossible to trigger unintentionally and were mostly found by explicitly searching for themThe first issue I am taking a look at is probably the most well known unsound issue of Rust.
EDIT: After publishing this post and talking to people about it, a potentially better
introduction to this bug would be something like this code example. My understanding of the issue has significantly changed since writing this post and I now strongly believe that we should add where
-clauses to binders.
Let’s start with a slight modification of the example by @arielb1:
// because of `&'out &'input ()` in the signature, `foo` has an
// implied bound `'input: 'out`
//
// In other words, in `foo` we assume 'out to be shorter than 'input
fn foo<'out, 'input, T>(_dummy: &'out &'input (), value: &'input T) -> &'out T {
value
}
fn bad<'short, T>(x: &'short T) -> &'static T {
// instantiate `foo`
let foo1: for<'out, 'input> fn(&'out &'input (), &'input T) -> &'out T = foo;
// subtyping: instantiate 'input as 'short
let foo2: for<'out> fn(&'out &'short (), &'short T) -> &'out T = foo1;
// subtyping: function arguments are contravariant,
// go from 'short to 'static, but only for one of the inputs
let foo3: for<'out> fn(&'out &'static (), &'short T) -> &'out T = foo2;
// subtyping: instantiate 'out as 'static
let foo4: fn(&'static &'static (), &'short T) -> &'static T = foo3;
// boom!
foo4(&&(), x)
}
The function bad
now allows us to arbitrarily transmute lifetimes.
To show that this is unsound, we can use the following safe code which
results in a segmentation fault:
fn main() {
let mut outer: Option<&'static u32> = Some(&3);
// `reference` only borrows from `outer` inside of the `match`,
// but we extend its lifetime to 'static here
let static_ref: &'static &'static u32 = match outer {
Some(ref reference) => bad(reference),
None => unreachable!(),
};
// the inner reference of `static_ref` is now `0`
outer = None;
// dereference the inner reference of `static_ref`
println!("{}", static_ref);
}
In foo
, we use the implied 'input: 'out
bound to shrink the lifetime of value
.
Inside of bad
this implied bound only remains for the &'out &'input ()
input of the
function item. This means that foo1
to foo4
have weaker implied bounds than foo
.
It still doesn’t seem too clear to me what is at fault here, so let’s go through a bunch of possible culprits here.
Removing contravariance
Contravariance - the ability to use an fn(&'a ())
as an fn(&'static ())
- is sound by
itself, but rarely useful and was assumed to close this soundness issue. See this comment for example.
This is not true however. Even without contravariance, covariance in the return type is enough to cause unsoundness:
// we again have an implied `'input: 'out` bound, this time because of the return type
fn foo<'out, 'input, T>(_dummy: &'out (), value: &'input T) -> (&'out &'input (), &'out T) {
(&&(), value)
}
fn bad<'short, T>(x: &'short T) -> &'static T {
// instantiate `foo`
let foo1: for<'out, 'input> fn(&'out (), &'input T) -> (&'out &'input (), &'out T) = foo;
// instantiate 'out as 'static
let foo2: for<'input> fn(&'static (), &'input T) -> (&'static &'input (), &'static T) = foo1;
// function return types are covariant,
// go from 'static to 'input
let foo3: for<'input> fn(&'static (), &'input T) -> (&'input &'input (), &'static T) = foo2;
// instantiate 'input as 'short
let foo4: fn(&'static (), &'short T) -> (&'short &'short (), &'static T) = foo3;
// boom!
foo4(&(), x).1
}
Add where
bounds to binders
Explored by @nikomatsakis in the same comment as above, another solution would be
to add where
-bounds to binders, so instead of being of the type
for<'out, 'input> fn(&'out (), &'input T) -> (&'out &'input (), &'out T)
, the function
foo
has the type for<'out, 'input> where<'input: 'out> fn(&'out (), &'input T) -> (&'out &'input (), &'out T)
1.
Let’s now attempt the above example except that the function item has a where<'input: 'out>
bound
which we can’t discard:
fn bad<'short, T>(x: &'short T) -> &'static T {
// instantiate `foo` with the relevant `where` bound
let foo1: for<'out, 'input> where<'input: 'out> fn(&'out &'input (), &'input T) -> &'out T = foo;
// instantiate 'input as 'short, also substituting 'input in the `where` bound
let foo2: for<'out> where<'short: 'out> fn(&'out &'short (), &'short T) -> &'out T = foo1;
// function arguments are contravariant,
// go from 'short to 'static, but only for one of the inputs
//
// depending on our implementation, this would either duplicate the `where` bound
// or actually just error here, because we can't split 'short into 'short and 'static
//
// while I assume that "splitting" lifetimes in `where` bounds will error
// let's assume that we can split 'short into two separate lifetimes
let foo3: for<'out> where<'short: 'out, 'static: 'out> fn(&'out &'static (), &'short T) -> &'out T = foo2;
// not necessary, but discard the `'static: 'out` bound, as it is trivially true and makes
// this more difficult to read
let foo4: for<'out> where<'short: 'out> fn(&'out &'static (), &'short T) -> &'out T = foo3;
// try to instantiate 'out as 'static, this means that we have to prove
// 'short: 'static which does not hold
let foo5: fn(&'static &'static (), &'short T) -> &'static T = foo4;
// all good, `foo5` is not a subtype of `foo4`
foo5(&&(), x)
}
This shows that adding where
bounds to binders would solve this soundness issue.
It does have a fairly high cost, as reading types like for<'out, 'input> where<'input: 'out, T: 'input> fn(&'out &'input (), &'input T) -> &'out T
is pretty difficult and it also isn’t obvious how this interacts with lifetime elision. It might be possible to even keep these where
bounds internal to the compiler, as mentioned here. I don’t personally like that idea,
as I don’t see how this approach won’t break at function boundaries.
Implementing this without a noticeable performance impact or without missing some suble edge cases is definitely also a challenge.
Make lifetimes mentioned in implied bounds early-bound
You might have tried to explicitly write the 'input: 'out
bound of foo
while looking at this issue and have already
noticed that it then doesn’t compile, therefore “fixing” this soundness issue.
// note the explicit `'input: 'out` bound
fn foo<'out, 'input, T>(_dummy: &'out &'input (), value: &'input T) -> &'out T
where
'input: 'out
{
value
}
fn bad<'short, T>(x: &'short T) -> &'static T {
// try to instantiate `foo`
//
// this now errors with "mismatched types"
//
// expected fn pointer `for<'out, 'input> fn(&'out &'input (), &'input T) -> &'out T`
// found fn pointer `fn(&&(), &T) -> &T`
let foo1: for<'out, 'input> fn(&'out &'input (), &'input T) -> &'out T = foo;
// ...
todo!()
}
By adding an explicit bound mentioning both 'out
and 'input
, these two lifetimes are now
considered early-bound2. This means that you have to instantiate both of these lifetimes
at the same time you also instantiate type and const parameters.
When instantiating 'out
and 'input
at the same time we need to instantiate them to two
new lifetimes 'short
and 'long
with 'long: 'short
, ending up with the type
fn(&'short &'long (), &'long T) -> &'short T
.
With this, we can’t ever go to a type fn(_, &'a T) -> &'b T
without 'a: 'b
, as subtyping can only
extend the lifetimes for the inputs or shrink the lifetimes of the output, which both maintain
this bound.
This means that by changing lifetimes mentioned in implied bounds to be early-bound we fix this bug. While I assume the breakage of this to be acceptable, we won’t know that for sure until we try.
Conclusion
With my current understanding, I personally would like to see some experimentation with forcing
lifetimes mentioned in implied bounds to be early-bound and only consider adding where
bounds to binders
if the breakage from this change is too great.
-
it would also have
T: 'input
andT: 'out
bounds. we don’t care about these for now ↩ -
for more about early-bound lifetimes, the best source is probably the rustc-dev-guide for now ↩