> I propose a new kind of type inference algorithm that always prioritises the type unifications the end-user is most likely to care about, independent of the order of types in the source code.
> If we could instead unify types in the order the end-user deems most important
I think the problem with this is that the desirable priority is context- and programmer-dependent, and no real reason is given for why ordering expressions is unacceptable. This approach also apparently still isn't independent of the order of expressions; it just imposes additional structure on top of that with multiple inference passes, making the whole affair more rigid and more complicated. I can only guess this makes reasoning about higher-order polymorphism much harder.
Ranking on order alone is simple, easy to internalize, easy to reason about, and gives the utmost control to the programmer. For instance, with the example given, this is the intuitive way I'd have constructed that function without even really thinking about it:
fn example(x) -> Point[int] {
    let p = Point(x, x);
    log(["origin: ", x]);
    return p;
}
netting the "ideal" error.
I think the problem here stems from an expectation that type inference is a kind of magic wand for getting something like dynamic typing. So when no thought is put into the actual structure of the program with respect to type inference, what you get instead is an icky system that isn't Just Doing The Thing. Vibes-based typing? If that's the goal, I wonder if it might be better served by fuzzy inference based on multiple passes generating potential conclusions with various confidence scores.
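For what it's worth, here's roughly what I mean, as a minimal Rust sketch (the names and the scoring weights are all made up, nothing here comes from the article): each pass proposes candidate types with a confidence score, and the checker commits to the best-scoring one.

// Hypothetical sketch of confidence-scored inference: each pass
// proposes candidate types for a variable, weighted by how direct
// the evidence was, and the checker commits to the best one.

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Str,
}

struct Candidate {
    ty: Ty,
    confidence: f64, // e.g. 0.9 for an annotation, 0.4 for a distant call site
}

// Pick the most confident conclusion, if any evidence was gathered.
fn resolve(mut candidates: Vec<Candidate>) -> Option<Candidate> {
    candidates.sort_by(|a, b| b.confidence.total_cmp(&a.confidence));
    candidates.into_iter().next()
}

fn main() {
    let picked = resolve(vec![
        Candidate { ty: Ty::Str, confidence: 0.4 }, // weak: inferred from a call site
        Candidate { ty: Ty::Int, confidence: 0.9 }, // strong: an explicit annotation
    ]);
    println!("{:?}", picked.map(|c| c.ty)); // Some(Int)
}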
edmundgoodman 3 days ago
This is really cool!! It looks interesting for making errors in complex type systems easier to debug, but the quadratic performance in the title sounds a bit worrying for production compiler use, and imo the benchmarks don't really mean anything without a point of reference against a traditional unification implementation.
If this system only provides benefits on the type-error path of the compiler, I wonder if a traditional single-pass unification could be used for speed on the common path where code compiles without type errors; then, when unification fails, the slower multi-pass approach could be run on demand to give better error reporting. This would lazily avoid the cost of the approach in most cases, and the cases where it does run are less latency-critical anyway.
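Something like this driver, maybe. A quick Rust sketch where Module, infer_fast, and infer_slow_with_good_errors are all names I've made up, not anything from the article:

// Two-tier checking: cheap unification on the happy path, the
// expensive multi-pass algorithm only to render diagnostics.
// All types and functions here are placeholder stubs.

struct Module;               // the AST, stubbed out
struct TypedModule;          // fast-path output, stubbed out
struct Diagnostic(String);   // a rendered type error

// Cheap single-pass unification: succeeds or fails quickly.
fn infer_fast(_m: &Module) -> Result<TypedModule, ()> {
    Err(()) // pretend the program has a type error
}

// Slower multi-pass algorithm, run only to produce better diagnostics.
fn infer_slow_with_good_errors(_m: &Module) -> Vec<Diagnostic> {
    vec![Diagnostic("expected int, found string".into())]
}

fn check(module: &Module) -> Result<TypedModule, Vec<Diagnostic>> {
    match infer_fast(module) {
        Ok(typed) => Ok(typed),
        // Only pay the quadratic cost on the error path, where
        // compile latency matters least.
        Err(()) => Err(infer_slow_with_good_errors(module)),
    }
}

fn main() {
    if let Err(diags) = check(&Module) {
        for d in diags {
            eprintln!("error: {}", d.0);
        }
    }
}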
Also, I think there is a typo in one of the code blocks: '2 should be unified into (string, string), not just string, afaict
simvux 2 days ago
Working around the performance concerns by only running this algorithm after a faster algorithm has found an error is an interesting idea. I think it could work, since in a real compiler anything that produces an error would be marked as "poisoned" and ignored in further analysis and IR lowering, so the two algorithms disagreeing on a type wouldn't cause a noticeable difference in the end result.
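To sketch what I mean by poisoning (the representation below is just illustrative, not my compiler's actual one): once a type has produced an error it unifies with everything, so downstream analyses stay quiet even where the two algorithms disagreed.

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Str,
    Poisoned, // stands in for a type that already produced an error
}

fn unify(a: &Ty, b: &Ty) -> Result<Ty, String> {
    match (a, b) {
        // Poison absorbs everything, so one error doesn't cascade.
        (Ty::Poisoned, _) | (_, Ty::Poisoned) => Ok(Ty::Poisoned),
        _ if a == b => Ok(a.clone()),
        _ => Err(format!("cannot unify {a:?} with {b:?}")),
    }
}

fn main() {
    assert_eq!(unify(&Ty::Poisoned, &Ty::Int), Ok(Ty::Poisoned));
    assert!(unify(&Ty::Int, &Ty::Str).is_err());
}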
Comparative benchmarks are tricky. I considered making a simpler single-pass inference branch based around the same data structures to create a more one-to-one comparison, but this algorithm is rather different, so porting it wasn't very straightforward. I'm currently integrating this into my real compiler, so from there it'll be easier to estimate the real-world performance impact of this system.
Typo has been fixed, thanks!
edmundgoodman 2 days ago
Couldn't the comparison (and also the fast path, I guess) just be putting all the inference passes into a single big pass, which avoids the quadratic number of re-applications of passes? It looks like unification is otherwise the same?
munificent 2 hours ago
This is what I was thinking too. Just do a single unification pass, but track the provenance of each type assignment. If an error is needed, use the provenances of the colliding unifications to decide which context locations to prioritize.
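Roughly like this, as a Rust sketch (the Provenance ranking and every name here are invented for illustration): record where each assignment came from, and on a collision phrase the error around the higher-priority provenance, regardless of the order the constraints arrived in.

use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Str,
}

// Where a type fact came from, ordered from most to least trustworthy.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Provenance {
    ReturnAnnotation, // highest priority in an error message
    ParamAnnotation,
    CallSite,
    Literal,
}

#[derive(Debug, Clone)]
struct Binding {
    ty: Ty,
    from: Provenance,
}

struct Infer {
    assignments: HashMap<u32, Binding>, // type variable -> binding
}

impl Infer {
    fn unify_var(&mut self, var: u32, ty: Ty, from: Provenance) -> Result<(), String> {
        match self.assignments.get(&var).cloned() {
            None => {
                self.assignments.insert(var, Binding { ty, from });
                Ok(())
            }
            Some(prev) if prev.ty == ty => Ok(()),
            Some(prev) => {
                let new = Binding { ty, from };
                // Anchor the error at the higher-priority provenance,
                // whichever constraint arrived first.
                let (primary, secondary) =
                    if prev.from <= new.from { (prev, new) } else { (new, prev) };
                Err(format!(
                    "expected {:?} (from {:?}), found {:?} (from {:?})",
                    primary.ty, primary.from, secondary.ty, secondary.from
                ))
            }
        }
    }
}

fn main() {
    let mut infer = Infer { assignments: HashMap::new() };
    // Constraints can arrive in source order...
    infer.unify_var(0, Ty::Str, Provenance::CallSite).unwrap();
    // ...but the error is still phrased around the annotation.
    let err = infer
        .unify_var(0, Ty::Int, Provenance::ReturnAnnotation)
        .unwrap_err();
    eprintln!("error: {err}");
    // error: expected Int (from ReturnAnnotation), found Str (from CallSite)
}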