Fast-Servers (geocar.sdf1.org)
epicprogrammer 51 minutes ago [-]
It’s an interesting throwback to SEDA, but physically passing file descriptors between different cores as a connection changes state is usually a performance killer on modern hardware. While it sounds elegant on a whiteboard to have a dedicated 'accept' core and a 'read' core, you end up trading a slightly simpler state machine for massive L1/L2 cache thrashing. Every time you hand off that connection, you immediately invalidate the buffers and TCP state you just built up. There’s a reason the industry largely settled on shared-nothing architectures: NGINX-style designs have a single pinned thread handle the entire lifecycle of a request, which keeps all that data strictly local to the CPU cache. When you're trying to scale, respecting data locality almost always beats pipeline cleanliness.
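
To make that concrete: the handoff the article describes is essentially one epoll_ctl call against the other thread's epoll fd. A rough sketch of my own (not the article's code, names made up):

    #define _GNU_SOURCE
    #include <sys/epoll.h>
    #include <sys/socket.h>

    /* Sketch only: the acceptor thread hands a fresh connection to the
     * reader thread by registering its fd with that thread's epoll. */
    static void accept_and_hand_off(int listen_fd, int reader_epfd)
    {
        int client = accept4(listen_fd, NULL, NULL, SOCK_NONBLOCK);
        if (client < 0)
            return;

        struct epoll_event ev = { .events = EPOLLIN, .data.fd = client };

        /* the syscall itself is cheap; the cost is that the TCP state and
         * buffers warmed up on this core are next touched on another core */
        epoll_ctl(reader_epfd, EPOLL_CTL_ADD, client, &ev);
    }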
toast0 53 seconds ago [-]
You could presumably have an acceptor thread per core, which passes the fds to the core-aligned next thread, etc.

That would get you the code simplicity benefits the article suggests, while keeping the socket bound to a single core, which is definitely needed.

Depending on whether you actually need to share anything, you could do process per core, thread per loop, and there's no inter-processor communication at all.
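
Rough sketch of that last variant, assuming Linux, SO_REUSEPORT for the per-core listeners, and threads rather than processes to keep it short (error handling omitted, names and port made up):

    #define _GNU_SOURCE
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <pthread.h>
    #include <sched.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>

    /* Each worker pins itself to one CPU and opens its own SO_REUSEPORT
     * listener, so accept/read/write for a connection never leave that core. */
    static void *worker(void *arg)
    {
        int cpu = (int)(long)arg;

        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

        int lfd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
        int one = 1;
        setsockopt(lfd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

        struct sockaddr_in addr = { 0 };
        addr.sin_family = AF_INET;
        addr.sin_port = htons(8080);
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
        listen(lfd, 128);

        int epfd = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, lfd, &ev);

        struct epoll_event events[64];
        for (;;) {
            int n = epoll_wait(epfd, events, 64, -1);
            /* accept new clients from lfd and service existing ones here;
             * nothing is ever handed to another thread or process */
            (void)n;
        }
        return NULL;
    }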

bee_rider 1 hour ago [-]
> One thread per core, pinned (affinity) to separate CPUs, each with their own epoll/kqueue fd

> Each major state transition (accept, reader) is handled by a separate thread, and transitioning one client from one state to another involves passing the file descriptor to the epoll/kqueue fd of the other thread.

So this seems like a little pipeline that all of the requests go through, right? For somebody who doesn’t do server stuff, is there a general idea of how many stages a typical server might be able to implement? And does it create a load-balancing problem? I’d expect some stages to be quite cheap…

marcosdumay 40 minutes ago [-]
> For somebody who doesn’t do server stuff, is there a general idea of how many stages a typical server might be able to implement?

On the HTTP server from the article, my understanding is that the two you are seeing are the only ones you get. Or maybe three, if disposing of things is slow.

I'm not sure which I prefer. On one hand, there's some expensive coordination for passing those file descriptors around. On the other hand, having some separate code bother with creating and closing the connections makes it easier to focus on the actual performance issues where they appear, and creates an opportunity to dispatch work smartly.

Of course, you can go all the way and make a green-threads server where every bit of IO puts the work back on the queue. But then you would use a single queue and dispatch the code that works on it. So you get more branching, but less coordination.

luizfelberti 1 hour ago [-]
A bit dated in the sense that for Linux you'd probably use io_uring nowadays, but otherwise it's a timeless design
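
For reference, a stripped-down accept path on a single io_uring loop, as a sketch of my own (not the article's code; the client read path and error handling are elided):

    #include <liburing.h>

    static int serve(int listen_fd)
    {
        struct io_uring ring;
        if (io_uring_queue_init(256, &ring, 0) < 0)
            return -1;

        /* queue the first accept; NULL user_data marks it as an accept */
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_accept(sqe, listen_fd, NULL, NULL, 0);
        io_uring_sqe_set_data(sqe, NULL);
        io_uring_submit(&ring);

        for (;;) {
            struct io_uring_cqe *cqe;
            if (io_uring_wait_cqe(&ring, &cqe) < 0)
                break;

            if (io_uring_cqe_get_data(cqe) == NULL && cqe->res >= 0) {
                int client = cqe->res;
                /* ...queue a recv for `client` here, tagged with its own
                 * user_data so its completion can be told apart... */
                (void)client;

                /* re-arm the accept */
                sqe = io_uring_get_sqe(&ring);
                io_uring_prep_accept(sqe, listen_fd, NULL, NULL, 0);
                io_uring_sqe_set_data(sqe, NULL);
                io_uring_submit(&ring);
            }
            io_uring_cqe_seen(&ring, cqe);
        }

        io_uring_queue_exit(&ring);
        return 0;
    }

Accept and read completions come back through the same ring, so the whole per-thread loop is one wait-and-dispatch cycle rather than separate stages.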

Still, I'm conflicted on whether separating stages per thread (accept on one thread and the client loop on another) is a good idea. It sounds like the gains would be minimal or non-existent even in ideal circumstances, and on some workloads where there aren't a lot of clients or much connection churn it would waste an entire core on handling a low-volume event.

I'm open to contrarian opinions on this though, maybe I'm not seeing something...

raggi 56 minutes ago [-]
It’s not a good idea, and that’s where I’d really start with the dated commentary here, rather than focusing on the polling mechanism. It depends on the application, but if the buffers are large (>= 64 KB), as in a common TCP workload, then uring won’t necessarily help that much. You’ll gain a lot of scalability regardless of polling mechanism by making sure you can utilize rss and xss optimizations.
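
One concrete knob on the rss side, purely as an illustration (SO_INCOMING_CPU isn't mentioned above, it's just one way to do this on Linux): tag each worker's SO_REUSEPORT listener with the CPU it runs on, so new connections land on the worker whose CPU already processed the packets:

    #include <sys/socket.h>

    /* sketch: call on the SO_REUSEPORT listener owned by the worker
     * pinned to `cpu`, so connection steering lines up with RSS queues */
    static void align_listener_with_rss(int listen_fd, int cpu)
    {
        setsockopt(listen_fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, sizeof(cpu));
    }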
jfindley 49 minutes ago [-]
io_uring is in a curious place. Yes, it does offer significant performance advantages, but it continues to be such a consistent source of bugs - many with serious security implications - that it's questionable whether it's really worth using.

I do agree that it's a bit dated and that today you'd do other things (notably SO_REUSEPORT); I just feel that io_uring is a questionable example.

ciconia 41 minutes ago [-]
> continues to be such a consistent source of bugs - many with serious security implications... just feel that io_uring is a questionable example.

Are you saying this as someone with experience, or is it just a feeling? Please give examples of recent bugs in io_uring that have security implications.

dspillett 26 minutes ago [-]
Not OP, and I'm no expert in the area at all, but I _do_ have a feeling that I've read about quite a few such issues posted here and elsewhere in the last year.

https://www.cve.org/CVERecord/SearchResults?query=io_uring seems to back that up. Only one relevant CVE listed there for 2026 so far, versus more than two per month on average in 2025. Caveat: I've not looked into the severity and ease of exploit for any of the issues listed.

eklavya 1 hour ago [-]
It is not a good idea, especially with the new chiplet/CCX processors.
kogus 2 hours ago [-]
Slightly tangential, but why is the first diagram duplicated at .1 opacity?
tecleandor 1 hour ago [-]
That plus the ellipsis makes me think it represents the additional threads that would be opened for subsequent connections...
kogus 1 hour ago [-]
Ah, that makes sense.
ratrocket 2 hours ago [-]
discussed in 2016: https://news.ycombinator.com/item?id=10872209 (53 comments)
rot13maxi 23 minutes ago [-]
I haven't seen an sdf1.org URL in a looooong time. Lovely to see it's still around.
lmz 2 hours ago [-]
fao_ 55 minutes ago [-]
This is more or less, in some ways, what Erlang does, and part of why Erlang is so easy to scale.