Next-generation trait solver update
For general info about the new trait solver, see this separate blog post. This blog post is not targeted at a general audience. It’s targeted at nobody to be honest. I wrote it for myself to figure out what I need to do.
The new trait solver is somewhat close to stabilization, let’s look at an exhaustive list of everything that’s necessary. The potentially biggest - but hopefully fairly small - item is anything not mentioned below; unknown unknowns. That sure is an easy way to make this document exhaustive.
nyaaa
There are still a few behavior changes and improvements that should happen before we stabilize the trait solver. We have to figure out how to make overflow errors into a hard error during monomorphization #155. @adwinwhite implemented a partial fix in #146096. This fix is insufficient, see #147770. I personally don’t really know how I feel about this approach.
My intended approach would be for certain users of the trait solver to make it clear that ambiguity due to overflow should be treated as as a hard error here. I’ve discussed this with @compiler-errors a long time ago, and doing so is apparently non-trivial and somewhat annoying. That seems to also match my perspective from reading through the more recent discussions here.
nyau
While the way we handle opaque types - e.g. return position impl Trait - is a lot cleaner and stronger than the behavior of the old solver and now mostly subsumes the behavior supported on stable, we don’t yet support calling methods on unconstrained associated types of not-yet-defined opaque types #248. Dealing with this is annoying, but so is dealing with not-yet-defined opaque types in their defining scopes in general. I think the way forward here is to extend the opaque type storage to also track these associated types and then extend the method selection hacks to these as well. More on opaque types later on.
myuw
We also just fail to compile ml-kem due to overflow errors and hangs #246. No idea what’s causing that failure yet and not too excited to find out. @jdonszelmann is currently looking into one of the hangs here and hopefully there aren’t too many distinct issues here.
rawr
One of the hardest problems in Rust is how to deal with aliases referencing bound regions, e.g. for<'a> fn(<T as Trait>::Assoc<'a>). I want to look into normalizing higher-ranked aliases during generalization. This would also fix an issue with the way we eagerly deduce closure signatures #191.
rarw
We also have performance issues. The wg-grammar benchmark is multiple times slower than with the old solver #228. That’s not good and shouldn’t be the case. A bunch of other crates on crater also have bad perf #254. There have also been a bunch of spurious regressions in the last crater run. I expect that a few of them are also caused by hangs in the trait solver.
@lqd opened crater#812 asking whether we could track the duration of each job during crater runs to uncover some significant slowdowns caused by the new solver. This would be incredibly helpful to detect more cases in which we still need to improve the performance of the new solver.
I’ve got 2 large performance improvements in mind I would like to try before stabilizing the new solver. I want to look into further search graph improvements and filter unnecessary information from query input to enable significantly more reuse there.
miuw
A performance optimization in the search graph I’ve added before unfortunately breaks otherwise desirable code #257. @ShoyuVanilla is currently looking into removing that performance optimization and hopefully other improvements mean that this hack is not actually necessary.
nyeww
We haven’t triaged all crater regressions yet. @lqd recently went through a lot of them, but there are still a few we haven’t looked at yet. We have fixed a lot of issues since the last run so I intend to rebase #133502 once most of the currently in-flight fixes have been merged. I expect that we’re then able to triage all regressions in that crater run.
myau
There are also issues which don’t necessarily require changes to the implementation itself. I want to look into the blanket impl handling of the not-yet-defined opaque type inference hack again #229. There are currently some undesirable edge cases there and while we might go ahead and stabilize the behavior as is, I want to spend time trying to come up with a better alternative.
myow
The way we handle non-defining uses of opaque types can introduce closures with non-identity generic arguments during analysis. This is not an issue in theory but may break code which did not consider that possibility when written. I should look through all uses of ty::Closure before stabilizing anything here #243.
more cat noises
Hitting the #[recursion_limit] is not a fatal error in the new trait solver. This means different crates may be able to observe different behavior due to that limit, potentially resulting in unsoundness. We should take some time to look into the way the trait solver is used to check whether any of them may be problematic #258.
Klirrr!
Finally, we need to actually document the behavior of the trait solver, writing multiple FCP proposals and one RFC. @BoxyUwU and I have spent some time a few months ago writing down everything we want to document as part of the stabilization procedure in a HackMD file. Let’s quickly summarize the most involved documents.
We need to write an RFC explaining why the new way we handle trait solver cycles is sound and desirable. I would like to do so in collaboration with @nikomatsakis. This still requires some effort to figure out how to explain the behavior and ideally we also get a testable model of my underlying reasoning here.
I also have to explain the way we handle opaque types in the new solver. Their behavior is significantly different from how it works on stable right now. It’s significantly simpler and more consistent in some ways. It also has some fairly odd hacks to maintain backwards compatability. We need to explain all of this in a way that’s approachable to the rest of T-types.