adding implicit auto-trait bounds is hard :3

Waddup gamers, there are a lot of desirable extensions for Rust which boil down to adding a new implicit/default trait bound.

some sort of Leak bound for improved linear type support/to avoid dropping stuff that is AsyncDrop
something something Moveable to supersede Pin
the merged experimental feature(sized_hierarchy), which adds MetaSized as a supertrait of Sized
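
As a quick refresher on what such a default bound is: every generic parameter in Rust already has an implicit Sized bound, which you opt out of with ?Sized. The following stable-Rust sketch (the function names are made up for illustration) shows the existing mechanism. A new default trait like Leak or Move would presumably be added to every generic parameter in the same way, with some kind of opt-out similar to ?Sized.

```rust
use std::fmt::Display;

/// `T` has an implicit `T: Sized` bound even though we never wrote it.
fn show<T: Display>(value: &T) -> String {
    format!("<{value}>")
}

/// `?Sized` opts out of the default bound, so `T` may also be
/// `str`, `[u8]` or a `dyn Trait`.
fn show_unsized<T: Display + ?Sized>(value: &T) -> String {
    format!("<{value}>")
}
```

Calling show("hi") would fail to compile, as str does not satisfy the implicit str: Sized bound, while show_unsized("hi") is fine.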

The performance cost of adding an implicit trait bound which does not stop at indirection

Adding an implicit auto trait bound whose impls do not stop at pointer-indirection has a prohibitively high performance cost.

Computing Sized is generally quite cheap, even for very complex types, as even the most gnarly types tend to be fairly shallow: their fields quickly end up relying on Vec, Box or other kinds of references. Checking whether such a type is Sized is trivial. Checking whether GlobalCtxt is Sized is still comparatively fast, even though it's an outlier: even with its large number of fields, the number of types hidden behind indirection is significantly greater than the number of its fields.

Proving a comparatively rarely checked auto trait for TyCtxt added a 7-second performance regression to our bootstrap builds, causing us to add explicit non-recursive impls.
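
Explicit impls help because an auto trait impl is only derived structurally from the fields when there is no explicit impl for the type. The post doesn't name the trait here, so the following sketch just uses Send as a stand-in. Ctxt is a hypothetical placeholder for TyCtxt, and count_on_other_thread is made up for illustration.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a huge context type such as `TyCtxt`.
pub struct Ctxt {
    pub names: Vec<String>,
    pub shared: Arc<Vec<u64>>,
    // ...imagine hundreds more fields with deeply nested types here.
}

// SAFETY: all fields of `Ctxt` are already `Send`, so this impl does not
// allow anything new. Because an explicit impl exists, the trait solver
// uses it directly instead of deriving `Ctxt: Send` from every field.
unsafe impl Send for Ctxt {}

fn assert_send<T: Send>() {}

/// Moves a `Ctxt` to another thread and reads from it there.
pub fn count_on_other_thread(ctxt: Ctxt) -> usize {
    assert_send::<Ctxt>();
    thread::spawn(move || ctxt.names.len() + ctxt.shared.len())
        .join()
        .unwrap()
}
```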

As a more informative data point, @Bryanskiy actually went ahead and added support for additional default auto traits to rustc. This support is now disabled by default. In their initial testing, building the compiler itself took 33 minutes instead of 7.

We’ve also recently landed support for MetaSized. This is one of the “good traits”, as it stops at indirection and behaves the same way as Sized. It still resulted in a performance impact of up to 10%, even though it mostly piggybacked on the fact that it’s implied by types being Sized and that checking Sized-ness is comparatively easy (see comment).

Sized is able to avoid looking at most fields:

we don’t care about indirection, so we don’t need to recurse into e.g. Vec and Box
we only need to look at the last field, as all other fields are required to be Sized for the struct to be well-formed. In contrast, checking whether a struct implements an auto trait needs to check all fields.
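
Both points can be observed on stable Rust. A minimal sketch, where the helpers assert_sized and assert_send and the Tail struct are made up for illustration: Sized holds for any pointer no matter what it points to, only the last field of a struct may be unsized, and Send has to look through the Box at its contents.

```rust
/// Compiles only if `T: Sized` (the implicit default bound).
fn assert_sized<T>() {}
/// Compiles only if `T: Send`.
fn assert_send<T: Send>() {}

/// Only the last field may be dynamically sized, so proving
/// `Tail<T>: Sized` only needs `T: Sized`; `header` must be `Sized` anyway.
struct Tail<T: ?Sized> {
    header: u32,
    tail: T,
}

fn checks() {
    // `Sized` stops at indirection: these pointers are `Sized`
    // no matter what they point to.
    assert_sized::<Box<[u8]>>();
    assert_sized::<&str>();
    assert_sized::<Box<dyn Fn()>>();
    assert_sized::<Tail<[u8; 3]>>();

    // Auto traits look through the pointer into its contents.
    assert_send::<Box<Vec<u32>>>();
    // assert_send::<Box<std::rc::Rc<u32>>>(); // error: `Rc<u32>` cannot be sent between threads safely
}

fn tail_len() -> usize {
    let sized: Tail<[u8; 3]> = Tail { header: 7, tail: [1, 2, 3] };
    // Unsizing coercion: only the type of the last field changes.
    let unsized_ref: &Tail<[u8]> = &sized;
    unsized_ref.header as usize + unsized_ref.tail.len()
}
```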

How to test the performance impact yourself

We’ve actually merged support for additional implicit trait bounds. To test the performance impact, we need to add the following to core and enable -Zexperimental-default-bounds.

#![feature(lang_items, auto_traits)]

#[lang = "default_trait1"]
auto trait Leak {}

Adding this to a leaf crate is insufficient, as we need this trait to be a where-bound on all methods and items of the standard library to get the full perf impact.

We will not be able to avoid these performance regressions by improving the implementation

Even right now, this is largely a linear performance hit: we are simply doing more work to prove the additional trait bounds. It is not avoidable by, e.g., waiting for -Znext-solver. Enabling MetaSized resulted in the same kind of performance regressions with the new solver as with the old one.

Annoyingly, we are only able to cache the “head” of cycles. If proving List<T>: Send depends on Option<Box<List<T>>>: Send, which in turn depends on List<T>: Send again, we are not able to cache the result of Option<Box<List<T>>>: Send, as it’s not a cycle head. We only cache List<T>: Send and then have to separately compute the cycle if we start by proving Option<Box<List<T>>>: Send. This is likely a significant part of the performance cost of proving auto trait bounds for large types. Caching cycle participants is not possible for two reasons, both of which may be solvable but require a lot of effort:

incremental query dependency tracking (zulip)
the result of evaluating cycles depends on their root, but caching must not be observable (UI test)
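
To make the head-only caching concrete, here is a heavily simplified model of proving an auto trait with coinductive cycles. It is a hypothetical sketch (the Solver type and its fields are made up), not how rustc’s solver is implemented: there is no fixpoint iteration and no incremental tracking. There is just a stack, a cache, and the rule that a goal whose result depends on something further down the stack must not be cached.

```rust
use std::collections::{HashMap, HashSet};

/// Toy model of proving an auto trait like `Send` (hypothetical, not rustc's design).
pub struct Solver<'a> {
    /// For each "type", the component types which also have to be `Send`.
    deps: &'a HashMap<&'a str, Vec<&'a str>>,
    /// Types known not to implement the trait.
    not_send: HashSet<&'a str>,
    stack: Vec<&'a str>,
    pub cache: HashMap<&'a str, bool>,
    pub evaluations: usize,
}

impl<'a> Solver<'a> {
    pub fn new(deps: &'a HashMap<&'a str, Vec<&'a str>>, not_send: &[&'a str]) -> Self {
        Solver {
            deps,
            not_send: not_send.iter().copied().collect(),
            stack: Vec::new(),
            cache: HashMap::new(),
            evaluations: 0,
        }
    }

    /// Returns whether `goal` holds and the lowest stack depth its result
    /// depends on (`usize::MAX` if it does not depend on the stack).
    fn prove_inner(&mut self, goal: &'a str) -> (bool, usize) {
        if let Some(&result) = self.cache.get(goal) {
            return (result, usize::MAX);
        }
        if let Some(depth) = self.stack.iter().position(|&g| g == goal) {
            // Coinductive cycle: assume the goal holds and record that
            // our result depends on the stack entry at `depth`.
            return (true, depth);
        }
        self.evaluations += 1;
        let depth = self.stack.len();
        self.stack.push(goal);
        let mut result = !self.not_send.contains(goal);
        let mut lowest = usize::MAX;
        let deps = self.deps.get(goal).cloned().unwrap_or_default();
        for dep in deps {
            let (r, d) = self.prove_inner(dep);
            result &= r;
            lowest = lowest.min(d);
        }
        self.stack.pop();
        if lowest >= depth {
            // Not part of a cycle, or the head of one: safe to cache.
            self.cache.insert(goal, result);
            (result, usize::MAX)
        } else {
            // Cycle participant: depends on a goal further down the stack,
            // so caching it would be incorrect.
            (result, lowest)
        }
    }

    pub fn prove(&mut self, goal: &'a str) -> bool {
        self.prove_inner(goal).0
    }
}
```

Proving List<T> first caches only List<T>, while starting from Option<Box<List<T>>> in a fresh solver caches only that goal. In this toy model, which entries end up cached depends on where we entered the cycle.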

There are other issues

There are a few other hard/blocking issues we need to resolve if we want to add additional implicit trait bounds.

If the implicit traits don’t stop at indirection, some crates are forced to increase their #[recursion_limit] to avoid errors.

Having to prove additional trait bounds triggers incorrect coroutine lifetime errors in far more cases, which is a breaking change.

These don’t seem as fundamental as the performance hit however. I expect that we can avoid these given enough time and effort.

edit 2025-09-19: actually adding `Move`

I’ve been chatting with yoshuawuyts, and we’ve tested the performance impact of a Move trait in #146201. Some notes from the experiment:

Move is quite nice, as it is similar to Sized and also does not recurse through pointers or references. The performance impact ended up being kinda acceptable, with a max of 50%, and there are still some avenues to significantly reduce it. I have no idea how far we can get, but I expect it’s going to end up at 5-10%.

Adding new default traits requires adding them as implied supertrait bounds. This runs into candidate preference issues which we need to hack around by preferring impl and alias-bound candidates over where-bounds in most cases. Properly supporting this is blocked on -Znext-solver, which should land soon-ish (less than a year). We’ve got the same issue for MetaSized, but there it’s a bit easier as there are no user-written impls of MetaSized. cc #144064.

Updating all the blanket impls in std is really annoying. This is not a technical problem, just a general gripe with default trait bounds.

if you find any typos or errors in this post, please pm me on zulip, discord or cohost
