NHacker Next
Pipelining in psql (PostgreSQL 18) (postgresql.verite.pro)
vlovich123 41 minutes ago [-]
I’m pretty sure the reasoning and conclusion are way off in explaining the speed-up:

> The network is better utilized because successive queries can be grouped in the same network packets, resulting in less packets overall.

> the network packets are like 50 seater buses that ride with only one passenger.

The performance improvement is not likely to come from sending larger packets, since most queries transfer very little data, and the benchmark the conclusion is drawn from transfers almost none. The speed-up comes from not waiting on a round-trip acknowledgement of one batch before executing subsequent queries; the number of network packets is irrelevant.
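To make that point concrete, here is a rough latency model (illustrative numbers, not a benchmark) comparing sequential execution, where every query waits out a full round trip, with pipelined execution, where one round trip covers the whole batch:

```python
# Rough latency model (hypothetical numbers, not a benchmark).
# Sequential: every query pays a full network round trip.
# Pipelined: queries are sent back-to-back; one round trip covers them all.

RTT_MS = 1.0        # assumed network round-trip time
EXEC_MS = 0.05      # assumed per-query server execution time
N_QUERIES = 1000

sequential_ms = N_QUERIES * (RTT_MS + EXEC_MS)
pipelined_ms = RTT_MS + N_QUERIES * EXEC_MS

print(f"sequential: {sequential_ms:.0f} ms")   # 1050 ms
print(f"pipelined:  {pipelined_ms:.0f} ms")    # 51 ms
```

The ~20x gap comes entirely from the round-trip term; packet sizes never enter the model.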

WilcoKruijer 3 hours ago [-]
I feel pipelines (or batches) are slept on. So many applications use interactive transactions to ‘batch’ multiple queries, waiting for the result of each individual query. Network round trips are the biggest contributor to latency in most applications, and this pattern makes it so much worse. Most Postgres drivers don’t even support batching, at least in the JavaScript world.

In many cases it would be good to forgo interactive transactions and instead execute all read-only queries at once, then issue a write batch after processing the obtained data. That way, the number of round trips is bounded. There are complications, of course: dealing with concurrency becomes harder. I’m currently prototyping a library exploring these ideas.
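A minimal sketch of that idea, with a made-up API and a fake executor standing in for a real driver: queries accumulate locally, one flush covers all reads, and a second flush covers the writes, so round trips stay bounded at two regardless of query count.

```python
# Toy sketch of a bounded-round-trip batch API (hypothetical; not a real driver).
# Queries accumulate locally; each flush() models a single network round trip.

class Batch:
    def __init__(self, executor):
        self.executor = executor      # callable: list[str] -> list[result]
        self.queue = []
        self.round_trips = 0

    def add(self, sql):
        self.queue.append(sql)

    def flush(self):
        self.round_trips += 1         # one round trip per flush, not per query
        results, self.queue = self.executor(self.queue), []
        return results

# Fake executor standing in for a database connection.
fake_db = lambda queries: [f"result of {q}" for q in queries]

batch = Batch(fake_db)
batch.add("SELECT * FROM users WHERE id = 1")
batch.add("SELECT * FROM orders WHERE user_id = 1")
reads = batch.flush()                 # round trip 1: all reads at once

batch.add("UPDATE users SET last_seen = now() WHERE id = 1")
batch.flush()                         # round trip 2: writes after processing

print(batch.round_trips)              # 2, regardless of query count
```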

tmountain 3 hours ago [-]
Batching in general is slept on. So many queue systems support batch injection, and I have seen countless cases where a poorly performing system was “fixed” simply by moving away from incremental injection. This stuff is usually on page two of the docs, which explains why it’s so overlooked…
mpweiher 47 minutes ago [-]
My guess is that this is because our default way of expressing code execution is the procedure call, meaning the default unit of code we compose with is the procedure, which executes synchronously. That's what our programming languages support directly, and that's just how "things are done".

Everything else both feels weird and truly is awkward to express, because our programming languages don't really allow us to express it well. And usually, by the time we figure out that we need a more reified, batch-oriented mechanism (the one on page two), it is too late: the procedural assumptions have been deeply baked into the code we've written so far.

See "Can programmers escape the gentle tyranny of call/return?" by yours truly.

https://www.hpi.uni-potsdam.de/hirschfeld/publications/media...

See also: https://news.ycombinator.com/item?id=45367519

WilcoKruijer 19 minutes ago [-]
This analysis makes sense to me, but at the same time: we’re already switching between procedural and declarative when switching from [mainstream language] to SQL. This impedance mismatch (or awkwardness) is already there, might as well embrace it.
ako 2 hours ago [-]
I would expect most drivers to support (anonymous) stored procedures, so you can batch/pipeline multiple queries into one statement to be executed by the database. Probably more a problem of developers not knowing how to use databases properly than a limitation of the technology.
mxey 1 hour ago [-]
You don’t even need driver support, you can use https://www.postgresql.org/docs/current/sql-do.html
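For example, several statements can be wrapped in a single DO block and sent with any driver's plain execute() as one statement, hence one round trip. A sketch with a hypothetical helper (table and column names are made up):

```python
# Wrap several SQL statements in one DO block so any driver can send them
# as a single statement / single round trip. Caveats: DO blocks return no
# rows and accept no bind parameters, so this only suits write-only batches
# with inlined values. (Hypothetical helper; table/column names made up.)

def make_do_block(statements):
    body = "\n    ".join(s.rstrip(";") + ";" for s in statements)
    return f"DO $$\nBEGIN\n    {body}\nEND\n$$;"

sql = make_do_block([
    "INSERT INTO audit_log (event) VALUES ('login')",
    "UPDATE users SET last_seen = now() WHERE id = 42",
])
print(sql)
```

The resulting string is a valid DO command (the body defaults to PL/pgSQL), e.g. runnable via psycopg's `cur.execute(sql)`.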
mxey 1 hour ago [-]
I have started to use batching with the Go pgx driver for simple transactions of multiple inserts. Since a batch is automatically a transaction, it’s actually fewer lines of code.
theThree 1 hour ago [-]
I developed a JS pg client that uses pipeline mode by default: https://github.com/stanNthe5/pgline
tlocke 2 hours ago [-]
I must confess, the Python driver pg8000, which I maintain, doesn't support pipeline mode. I didn't realise it existed until now, and nobody has ever asked for it. I've created an issue for it: https://codeberg.org/tlocke/pg8000/issues/174
codesnik 2 hours ago [-]
ActiveRecord in Rails has an async mode, which allows you to queue several requests and read the results later. But those go through the connection pool and are executed in separate connections, separate transactions, and separate PostgreSQL server processes. I wonder if using pipelining instead, at the driver level (app code would be the same), would be a better approach in general, or at least easier on the db instance.
codesnik 2 hours ago [-]
ah, of course it has been discussed already: https://discuss.rubyonrails.org/t/proposal-adding-postgres-p...
byroot 1 hour ago [-]
Yes, the need isn't exactly the same. The `load_async` use case is for known slow-ish queries, for which you want actual parallelization on the server.

Since that discussion on the forum, I talked more about pipelining with some other core devs, and that may happen in some form or another in the future.

The main limiting factor is that most of the big Rails contributors work with MySQL, not Postgres, and MySQL doesn't really have proper pipelining support.
