adding implicit auto-trait bounds is hard
Waddup gamers, there are a lot of desirable extensions for Rust which boil down to adding a new implicit/default trait bound.
- some sort of
Leak
bound for improved linear type support/to avoid dropping stuff that isAsyncDrop
- something something
Moveable
to supercedePin
- the merged experimental
feature(sized_hierarchy)
which addsMetaSized
as a super trait ofSized
The performance cost of adding an implicit trait bound which does not stop at indirection
Adding an implicit auto trait bound whose impls do not stop at pointer-indirection has a prohibitively high performance cost.
Computing Sized
is generally quite cheap, even for very complex types as even the most gnarly types tend to be fairly shallow. Their fields end up quite quickly relying on Vec
, Box
or other kinds of references. Checking whether the following type is Sized
is trivial. Checking whether GlobalCtxt
is Sized
is still comparatively fast, even though it’s an outlier. Even with its large number of fields the amount of types hidden behind indirection is significantly greater.
Proving a comparatively rarely checked auto trait for TyCtxt
added a 7 second performance regression to our bootstrap builds, causing us to add explicit non-recursive impls.
As a more informative datapoint, @Bryanskiy actually went ahead and added support for additional default auto-traits to rustc. This is now disabled by default. Initial testing by them caused building the compiler itself to take 33 minutes instead of 7.
We’ve recently also landed support for MetaSized
. This is one of the “good traits” as it stops at indirection and has the same behavior as Sized
. It resulted in an up-to 10% performance impact, even though it mostly piggy-packed on the fact that it’s implied by stuff being Sized
and the fact that checking Sized
-ness is comparatively easy: comment.
Sized
is able to avoid looking at most fields:
- we don’t care about indirection, so we don’t need to recurse into e.g.
Vec
andBox
- we only need to look at the last fields, as all other fields are required to be
Sized
for well-formedness. Checking whether a struct implements an auto trait needs to check all fields.
How to test the performance impact yourself
We’ve actually merged support for additional implicit trait bounds. To test the performance impact, we need to add the following to core
and enable -Zexperimental-default-bounds
.
#![feature(lang_items, auto_traits)]
#[lang = "default_trait1"]
auto trait Leak {}
Adding this to a leaf crate is insufficient as we need this trait to be a where-bound of all methods and items of the standard library to get the full perf impact.
We will not be able to avoid these performance regressions by improving the implementation
Even right now this is largely a linear performance hit. We are simply doing more work to prove the additional trait bounds. It is not avoidable by e.g. waiting for -Znext-solver
. Enabling MetaSized
resulted in the same kind of performance regressions there as with the old solver.
Annoyingly, we are only able to cache the “head” of cycles, so if we’ve got List<T>: Send
, which depends on Option<Box<List<T>>: Send
, which depends on List<T>: Send
again, we are not able to cache the result of Option<Box<List<T>>: Send
as it’s not a cycle head. We only cache List<T>: Send
and then have to separately compute the cycle if we start by proving Option<Box<List<T>>: Send
. This is likely a significant part of the performance cost of proving auto trait bounds for large types. Caching cycle participants is not possible for two reasons which may both be solveable, but require a lot of effort:
- incremental query dependency tracking zulip
- the result of evaluating cycles depends on their root, but caching must not be observable: UI test
There are other issues
There are a few other hard/blocking issues we need to resolve if we want to add additional implicit trait bounds.
If the implicit traits don’t stop at indirection, some crates are forced to increase their #[recursion_limit]
to avoid errors.
Having to prove additional trait bounds triggers incorrect coroutine lifetime errors in far more cases, which is a breaking change.
These don’t seem as fundamental as the performance hit however. I expect that we can avoid these given enough time and effort.
if you find any typos or errors in this post, please pm me on zulip, discord or cohost