A real interesting read as someone who spends a bit of time with Elixir. Wasn't aware of the atomic and counter Erlang features that break isolation.
Though they do say that race conditions are purely mitigated by discipline at design time, but then mention race conditions found via static analysis:
> Maria Christakis and Konstantinos Sagonas built a static race detector for Erlang and integrated it into Dialyzer, Erlang’s standard static analysis tool. They ran it against OTP’s own libraries, which are heavily tested and widely deployed.
> They found previously unknown race conditions. Not in obscure corners of the codebase. Not in exotic edge cases. In the kind of code that every Erlang application depends on, code that had been running in production for years.
I imagine that the 4th issue of protocol violation could possibly be mitigated by a typesafe abstracted language like Gleam (or Elixir when types are fully implemented)
WJW 1 hours ago [-]
> They found previously unknown race conditions. Not in obscure corners of the codebase. Not in exotic edge cases. In the kind of code that every Erlang application depends on, code that had been running in production for years.
If these race conditions are in code that has been in production for years and yet the race conditions are "previously unknown", that does suggest to me that it is in practice quite hard to trigger these race conditions. Bugs that happen regularly in prod (and maybe I'm biased, but especially bugs that happen to erlang systems in prod) tend to get fixed.
aeonfox 51 minutes ago [-]
True. And that the subtle bugs were then picked up by static analysis makes the safety proposition of Erlang even better.
> Bugs that happen regularly in prod
It depends on how regular and reproducible they are. Timing bugs are notoriously difficult to pin down. Pair that with let-it-crash philosophy, and it's maybe not worth tracking down. OTOH, Erlang has been used for critical systems for a very long time – plenty long enough for such bugs to be tracked down if they posed real problems in practice.
kamma4434 24 minutes ago [-]
The 4th issue is a feature- it’s what allows zero downtime hot updates.
lukeasrodgers 25 minutes ago [-]
I don’t have much experience with pony but it seems like it addresses the core concerns in this article by design https://www.ponylang.io/discover/why-pony/. I wish it were more popular.
cyberpunk 2 hours ago [-]
Eh maybe. I work on a big, mature, production erlang system which has millions of processes per cluster and while the author is right in theory, these are quite extreme edge cases and i’ve never tripped over them.
Sure, if you design a shit system that depends on ETS for shares state there are dangers, so maybe don’t do that?
I’d still rather be writing this system in erlang than in another language, where the footguns are bigger.
anonymous_user9 1 hours ago [-]
This seems interesting, but the sheer density of LLM-isms make it hard to get through.
boxed 4 minutes ago [-]
I think at this point comments like this are equivalent to saying "I didn't like this article, because it's written in too good English".
Rendered at 11:37:55 GMT+0000 (Coordinated Universal Time) with Vercel.
Though they do say that race conditions are purely mitigated by discipline at design time, but then mention race conditions found via static analysis:
> Maria Christakis and Konstantinos Sagonas built a static race detector for Erlang and integrated it into Dialyzer, Erlang’s standard static analysis tool. They ran it against OTP’s own libraries, which are heavily tested and widely deployed.
> They found previously unknown race conditions. Not in obscure corners of the codebase. Not in exotic edge cases. In the kind of code that every Erlang application depends on, code that had been running in production for years.
I imagine that the 4th issue of protocol violation could possibly be mitigated by a typesafe abstracted language like Gleam (or Elixir when types are fully implemented)
If these race conditions are in code that has been in production for years and yet the race conditions are "previously unknown", that does suggest to me that it is in practice quite hard to trigger these race conditions. Bugs that happen regularly in prod (and maybe I'm biased, but especially bugs that happen to erlang systems in prod) tend to get fixed.
> Bugs that happen regularly in prod
It depends on how regular and reproducible they are. Timing bugs are notoriously difficult to pin down. Pair that with let-it-crash philosophy, and it's maybe not worth tracking down. OTOH, Erlang has been used for critical systems for a very long time – plenty long enough for such bugs to be tracked down if they posed real problems in practice.
Sure, if you design a shit system that depends on ETS for shares state there are dangers, so maybe don’t do that?
I’d still rather be writing this system in erlang than in another language, where the footguns are bigger.