adding implicit auto-trait bounds is hard

Waddup gamers, there are a lot of desirable extensions for Rust which boil down to adding a new implicit/default trait bound.

The performance cost of adding an implicit trait bound which does not stop at indirection

Adding an implicit auto trait bound whose impls do not stop at pointer-indirection has a prohibitively high performance cost.

Computing Sized is generally quite cheap, even for very complex types, as even the most gnarly types tend to be fairly shallow: their fields quickly end up relying on Vec, Box, or other kinds of references. Checking whether such a type is Sized is trivial. Checking whether GlobalCtxt is Sized is still comparatively fast, even though it's an outlier: despite its large number of fields, the number of types hidden behind indirection is significantly greater.
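As an illustration, consider a hypothetical AST-like type (Expr and ExprKind are made up for this sketch): even though the type is recursive and arbitrarily deep at runtime, every recursive field sits behind a Box or Vec, so proving Sized never has to recurse into the pointees.

```rust
// Hypothetical AST-like type: recursive, but every recursive field is
// behind `Box` or `Vec`, both of which are `Sized` regardless of pointee.
#[allow(dead_code)]
struct Expr {
    kind: Box<ExprKind>,
    args: Vec<Expr>,
}

#[allow(dead_code)]
enum ExprKind {
    Call(Box<Expr>, Vec<Expr>),
    Literal(u64),
}

// Proving `Expr: Sized` only needs the shallow layout of `Box` and
// `Vec`, not the types behind them.
fn assert_sized<T: Sized>() {}

fn main() {
    assert_sized::<Expr>();
    assert_sized::<ExprKind>();
}
```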

Proving a comparatively rarely checked auto trait for TyCtxt added a 7-second performance regression to our bootstrap builds, causing us to add explicit non-recursive impls.

As a more informative datapoint, @Bryanskiy actually went ahead and added support for additional default auto traits to rustc, now disabled by default. In their initial testing, building the compiler itself took 33 minutes instead of 7.

We’ve recently also landed support for MetaSized. This is one of the “good traits”: it stops at indirection, just like Sized. It still resulted in an up to 10% performance impact, even though it mostly piggy-backs on the fact that MetaSized is implied by things being Sized and that checking Sized-ness is comparatively easy.

Sized is also able to avoid looking at most fields.
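A sketch of why (Node is made up here; this is not rustc's actual representation): for a struct, fields whose types are trivially Sized — scalars, references, standard containers — can be skipped outright, so the obligation reduces to the few fields that could actually be unsized.

```rust
// `Node<T>: Sized` reduces to just `T: Sized`: all the other fields
// are trivially `Sized` and never need to be inspected.
#[allow(dead_code)]
struct Node<T> {
    id: u64,                // scalar: trivially Sized
    name: &'static str,     // reference: trivially Sized
    children: Vec<Node<T>>, // `Vec` is Sized regardless of its element
    value: T,               // the only field that can be unsized
}

fn main() {
    // Compiles because `u32: Sized`; no other field was ever relevant.
    assert!(std::mem::size_of::<Node<u32>>() > 0);
}
```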

How to test the performance impact yourself

We’ve actually merged support for additional implicit trait bounds. To test the performance impact, we need to add the following to core and enable -Zexperimental-default-bounds.

#![feature(lang_items, auto_traits)]

#[lang = "default_trait1"]
auto trait Leak {}

Adding this to a leaf crate is insufficient as we need this trait to be a where-bound of all methods and items of the standard library to get the full perf impact.

We will not be able to avoid these performance regressions by improving the implementation

Even right now this is largely a linear performance hit. We are simply doing more work to prove the additional trait bounds. It is not avoidable by e.g. waiting for -Znext-solver. Enabling MetaSized resulted in the same kind of performance regressions there as with the old solver.

Annoyingly, we are only able to cache the “head” of cycles: if we’ve got List<T>: Send, which depends on Option<Box<List<T>>>: Send, which depends on List<T>: Send again, we are not able to cache the result of Option<Box<List<T>>>: Send as it’s not a cycle head. We only cache List<T>: Send and then have to separately recompute the cycle if we start by proving Option<Box<List<T>>>: Send. This is likely a significant part of the performance cost of proving auto trait bounds for large types. Caching cycle participants is not possible for two reasons, which may both be solvable but require a lot of effort.
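The List cycle from above, spelled out (assert_send is just a helper for this sketch):

```rust
// Proving `List<T>: Send` requires `Option<Box<List<T>>>: Send`, which
// requires `List<T>: Send` again: a cycle. Only the goal we *start*
// from — the cycle head — gets cached.
#[allow(dead_code)]
struct List<T> {
    value: T,
    next: Option<Box<List<T>>>,
}

fn assert_send<T: Send>() {}

fn main() {
    // Starting from the head: proven and cached.
    assert_send::<List<u32>>();
    // Starting from a former cycle participant: this goal was never
    // cached, so the compiler has to run through the cycle again.
    assert_send::<Option<Box<List<u32>>>>();
}
```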

There are other issues

There are a few other hard/blocking issues we need to resolve if we want to add additional implicit trait bounds.

If the implicit traits don’t stop at indirection, some crates are forced to increase their #![recursion_limit] to avoid errors.
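For reference, the limit is raised with a crate-level attribute; the value 256 below is arbitrary (the default is 128), and this tiny crate is only there to demonstrate the syntax.

```rust
// Crate-level attribute raising the recursion limit. This crate doesn't
// actually need it; a real crate would add it once proving the implicit
// bound for deeply nested types overflows the default of 128.
#![recursion_limit = "256"]

type Nested = Box<Box<Box<Box<u32>>>>;

fn main() {
    let x: Nested = Box::new(Box::new(Box::new(Box::new(7u32))));
    assert_eq!(****x, 7);
}
```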

Having to prove additional trait bounds triggers incorrect coroutine lifetime errors in far more cases, which is a breaking change.

These don’t seem as fundamental as the performance hit however. I expect that we can avoid these given enough time and effort.

if you find any typos or errors in this post, please pm me on zulip, discord or cohost
