CVE-2026-31431: Copy Fail vs. rootless containers (dragonsreach.it)
netheril96 1 hours ago [-]
If the goal is just preventing full root privileges, a CapabilityBoundingSet in a systemd unit will do.

However, Copy Fail can be used in many other ways that neither containers nor the above settings contain. For example, it can modify files under /etc/ssl/certs to prepare for MitM attacks. If you have multiple containers based on the same image, then one compromised CA set affects the others.

est 18 minutes ago [-]
I added these

    AmbientCapabilities=CAP_NET_BIND_SERVICE
    CapabilityBoundingSet=CAP_NET_BIND_SERVICE
    NoNewPrivileges=yes
to my .service. Is it good enough?
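
One way to check what a unit like this actually ends up with is to decode the CapBnd line of /proc/<pid>/status for the running service. Below is a minimal sketch, assuming a deliberately partial capability table (bit numbers from linux/capability.h) and hypothetical helper names. It only reports the bounding set; as noted above, the page cache write is not a capability check, so a tight bounding set does not address it.

```python
# Partial capability table (bit numbers from linux/capability.h).
# Assumption: only a handful of entries are listed for illustration.
CAP_NAMES = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    6: "CAP_SETGID",
    7: "CAP_SETUID",
    10: "CAP_NET_BIND_SERVICE",
    12: "CAP_NET_ADMIN",
    21: "CAP_SYS_ADMIN",
}


def decode_caps(hex_mask):
    """Turn a hex capability mask into a set of capability names."""
    mask = int(hex_mask, 16)
    names = set()
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            # Bits missing from the partial table are reported by number.
            names.add(CAP_NAMES.get(bit, "cap_%d" % bit))
        bit += 1
    return names


def bounding_set_from_status(status_text):
    """Extract and decode the CapBnd line from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapBnd:"):
            return decode_caps(line.split()[1])
    return None


if __name__ == "__main__":
    # Hypothetical usage: inspect the current process (Linux only).
    with open("/proc/self/status") as fh:
        print(sorted(bounding_set_from_status(fh.read())))
```

With the three settings quoted above, the decoded set for the service should contain only CAP_NET_BIND_SERVICE.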
amluto 5 hours ago [-]
Sigh.

1. I would hope the default seccomp policy blocks AF_ALG in these containers. I bet it doesn’t. Oh well.

2. The write-to-RO-page-cache primitive STILL WORKED! It’s just that the particular exploit used had no meaningful effect in the already-root-in-a-container context. If you think you are safe, you’re probably wrong. All you need to make a new exploit is an fd representing something that you aren’t supposed to be able to write. This likely includes CoW things where you are supposed to be able to write after CoW but you aren’t supposed to be able to write to the source.

So:

- Are you using these containers with a common image, or even a common layer in an image, to isolate dangerous workloads from each other? Oops, they can modify the image layers and corrupt each other. There goes any sort of cross-tenant isolation.

- What if you get an fd backed by the zero page and write to it? This can’t result in anything that the administrator would approve of.

- What if you ro-bind-mount something in? It’s not ro any more.
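
For point 1, a quick probe run inside the container shows whether AF_ALG sockets can be created at all. This is a minimal sketch with a hypothetical function name. It assumes a seccomp denial surfaces as EPERM or EACCES, which depends on the profile in use, and it says nothing about whether the kernel is patched.

```python
import errno
import socket


def probe_af_alg():
    """Report whether this process can create an AF_ALG socket."""
    family = getattr(socket, "AF_ALG", None)
    if family is None:
        # This Python build or platform does not expose AF_ALG.
        return "unsupported"
    try:
        sock = socket.socket(family, socket.SOCK_SEQPACKET, 0)
    except OSError as exc:
        if exc.errno in (errno.EPERM, errno.EACCES):
            # Assumption: this is how a seccomp or LSM denial shows up here.
            return "blocked"
        # For example, the kernel was built without the crypto user API.
        return "unavailable"
    sock.close()
    return "allowed"


if __name__ == "__main__":
    print(probe_af_alg())
```

An "allowed" result only means this particular path is reachable; a "blocked" result does not rule out other routes to the same primitive.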

jeroenhd 50 minutes ago [-]
> I would hope the default seccomp policy blocks AF_ALG in these containers. I bet it doesn’t. Oh well.

I see a lot of projects blocking those sockets in containers as a response to this exploit, but it seems rather strange to me. We're disabling a cryptographic performance enhancement feature entirely because there was a security bug in it that one time? It's a rather weird default to use. It's not like we're mass-disabling kernel modules everywhere every time someone discovers an EoP bug, are we? Did we blacklist OpenSSL's binaries after Heartbleed?

I suppose it makes sense as a default on vulnerable kernels (though people running vulnerable kernels should put effort into patching rather than workarounds in my opinion), but these defaults are going to be around ten years from now when copy.fail is a distant memory.

fguerraz 2 hours ago [-]
I just contributed this [1] which does what you want for seccomp. Well, not by default, but profiling is now effective against this attack.

Oh, and this [2] just happened

[1] https://github.com/containers/oci-seccomp-bpf-hook/pull/209
[2] https://github.com/moby/moby/pull/52501
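
For readers who want to see what such a rule looks like, here is a minimal sketch that generates a deny-list seccomp profile in the OCI/Docker JSON format. It is not how either linked PR implements the change. The function name is hypothetical, and the defaultAction of SCMP_ACT_ALLOW is an assumption chosen to keep the example short; it is far more permissive than a real default profile. AF_ALG is address family 38 in the Linux headers.

```python
import json

AF_ALG = 38  # address family number from linux/socket.h


def deny_af_alg_profile():
    """Build a seccomp profile that rejects socket(AF_ALG, ...)."""
    return {
        # Assumption: allow everything else, for brevity only.
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [
            {
                "names": ["socket"],
                "action": "SCMP_ACT_ERRNO",
                # Match only when argument 0 (the family) equals AF_ALG.
                "args": [
                    {"index": 0, "value": AF_ALG, "op": "SCMP_CMP_EQ"}
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_af_alg_profile(), indent=2))
```

The output could be saved to a file and passed via `--security-opt seccomp=<file>`. Note that this only matches the direct socket syscall; architectures that multiplex through socketcall would need separate handling.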

hlieberman 4 hours ago [-]
In fact, the authors specifically say on the very first line of their website that the copy/fail primitive can be used as a container escape. The entire premise of this article is flawed and irresponsible.
dwroberts 3 hours ago [-]
There is an addendum at the bottom where they admit the page corruption is still problematic even with rootless podman.

Using this to justify their migration to micro-VMs seems very strange to me, though. Sure, for this CVE it would have been better, but a future attack could just as well hit a component shared across VMs but not containers. Are people really choosing technology based on the CVE of the week?

anygivnthursday 2 hours ago [-]
Containers were never a security boundary. VMs have better isolation, which is why people choose them for security. Containers are convenience and usually have better performance.
dwroberts 59 minutes ago [-]
I see the ‘not a security boundary’ thing repeated constantly, and while it makes sense (e.g. they’re sharing the underlying kernel or at least some access to it), if you think about it a little more, VMs are not magically different: they are better isolated, but VMs on the same host still share the host. A CVE next week that allows corruption of host state affecting, e.g., every VM under a particular hypervisor will be no less damaging than this CVE is to containers.
necovek 37 minutes ago [-]
You are obviously right that these are similar in principle: a VM isolation exploit would lead to the same exposure as a container isolation exploit.

VMs are considered vastly better because the surface area where exploits can happen is smaller and/or better isolated within the kernel.

If you are arguing the latter is not true (and we are all collectively hand-waving away a big chunk of the surface area, so that may be the case), it would help to be explicit about why you believe an exploit in that area is similarly likely.

ButlerianJihad 2 hours ago [-]
Containers are a convenience boundary and they increase complexity of your risk assessments.

It is easy for security scanners to scan a Linux system, but will they inspect your containers, and snaps, and flatpaks, and VMs? It is easy for DevOps to ssh into your Linux server, but can they also get logged in to each container, and do useful things? Your patches and all dependencies are up-to-date on your server, but those containers are still dragging around legacy dependencies, by design. Is your backup system aware of containers and capable of creating backup images or files, that are suitable for restoring back to service?

necovek 36 minutes ago [-]
Security scanners already support most container and VM image formats in widespread use.

Does this increase complexity? Yes, it does. Is it worth the cost? Depends on each individual case IMO.

firesteelrain 10 minutes ago [-]
You need a tool like Anchore or PrismaCloud to scan the container images, then monitor them at runtime with PrismaCloud. Trellix can “scan”, but most people turn off or exclude container directories on the host because scanning can interfere with the running container.
raesene9 3 hours ago [-]
I haven't checked Podman, but I believe moby/docker now blocks this: https://github.com/moby/profiles/commit/7158007a83005b14a24f...
Titan2189 4 hours ago [-]
> [...] that root was just my unprivileged podman user on the host

Couldn't you then simply re-run the exploit as the unprivileged podman user and gain root on the host?

averi 4 hours ago [-]
@titan2189, in a rootless Podman environment, the user namespace acts as a constant translator between the container and the underlying host. Say you re-run the exploit: setuid(0) runs and the kernel returns 0 (success), but you are still "trapped" inside the user namespace, which maps your (root) user to an unprivileged user.
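
The translation described here is visible in /proc/<pid>/uid_map, where each line reads "inside-start outside-start count". Below is a minimal sketch with hypothetical helper names. The sample map assumes host uid 1000 for the podman user and a subordinate range starting at 100000, which is a common but not universal /etc/subuid layout.

```python
def parse_uid_map(text):
    """Parse /proc/<pid>/uid_map text into (inside, outside, count) tuples."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        inside, outside, count = (int(p) for p in parts)
        entries.append((inside, outside, count))
    return entries


def to_host_uid(entries, uid):
    """Translate a uid seen inside the namespace to the uid on the host."""
    for inside, outside, count in entries:
        if inside <= uid < inside + count:
            return outside + (uid - inside)
    return None  # unmapped inside this namespace


# Assumed example map for a rootless container owned by host uid 1000.
SAMPLE_MAP = "0 1000 1\n1 100000 65536\n"

if __name__ == "__main__":
    entries = parse_uid_map(SAMPLE_MAP)
    print(to_host_uid(entries, 0))
```

Under this sample map, uid 0 inside the container translates to uid 1000 on the host, so a successful setuid(0) in the namespace yields only the podman user's privileges outside it.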
averi 4 hours ago [-]
@amluto, fair points. I still consider this vulnerability extremely severe. Yes, with rootless containers the attacker can't get root on the host and use that to inject additional malware, but for containers that re-use the same image layers, it still gives the attacker the ability to poison those binaries in memory and, to some extent, break container-to-container isolation.

If we weren't looking into moving away from containers completely in favor of ephemeral microVMs, one area I'd invest in would be replicating in GitLab CI what CargoWall does for GitHub Actions. At that point, even if the attacker gained access to a container and modified a binary with some specific instructions (like reading env vars and sending them to an external server), it would not be able to send credentials or fetch malware remotely at all, because the DNS queries are intercepted by eBPF and sent to a CoreDNS proxy.

I still think rootless containers raise the complexity of the attack from a one-liner to a scenario that also requires understanding details of the underlying host: as you correctly pointed out, which container images (and thus shared image layers) are present, and whether those images use setuid binaries that specific CI jobs explicitly call during the build process. (It is unusual to see anyone run a setuid binary in a CI pipeline anyway, as that would generally result in a permission denied under normal conditions.)

M_bara 4 hours ago [-]
> (like reading env vars and sending them to an external server) it'd not be able to send credentials or fetch a malware remotely at all due to the DNS queries being intercepted by eBPF and being sent to a CoreDNS proxy.

Wouldn’t the exploit then just use IP addresses directly?

averi 4 hours ago [-]
@hlieberman, the researchers imply container escape == root access on the target host, which is why they used a setuid binary to demonstrate the whole exploit. What this article says is that while the container escape (as in the ability to modify a binary in memory that may be shared across multiple containers) is still present, gaining root on the underlying host doesn't happen.

@isityettime, the vulnerability happens not because file contents are modified on disk (think of a base image that is shared across multiple CI builds), but rather because a binary in one base image shares the same inode (and thus, as an optimization, the same address space in memory) as the same binary in another container. That means container B will execute a poisoned binary, and that's where the "container escape" happens.

ezequiel-garzon 3 hours ago [-]
Please reply instead of (or in addition to) tagging the user you're replying to.
pjmlp 3 hours ago [-]
Tagging isn't a feature in HN.
ramon156 3 hours ago [-]
Thanks for the bikeshedding, they meant mentioning.
pjmlp 3 hours ago [-]
It is also not supported, beyond people seeing their nick by sheer luck.
zenoprax 56 minutes ago [-]
If I see my points shoot up a bit I check my comment history to see what caused it.
anygivnthursday 2 hours ago [-]
Or they run their Claw to periodically scrape HN comments for mentions.
hlieberman 3 hours ago [-]
That's true... for the exploit demo that they released. The primitive that underlies the exploit, however -- a page cache write -- can easily bypass the container boundary. One only needs to hook an executable which is also present in the host.
2bitencryption 5 hours ago [-]
tl;dr - within the container, the exploit works, and elevates to root (uid 0) within the container - BUT because that namespace actually maps to uid 1000 (the user) outside the container, the escalation does not flow up to the host.

But… does this escape the container? If not (the author seems to indicate it does not), then does it matter whether you are in Docker or rootless Podman, since the end result is always that you have elevated to root within the container? If the rest of the container filesystem isolation does its job, the end result is the same? Though I guess another chained exploit to escape the container would be worse in Docker? Do I have that right?

firesteelrain 7 minutes ago [-]
This is a problem and most people hadn’t considered it before because the caching is done to speed up build pipeline performance:

“ While rootless containers prevent the attacker from escalating to host root, the page cache is still shared across the host. Containers that re-use the same base image layers share the same cached pages for those layers — if a malicious CI job corrupts a binary in the page cache, other containers launched from that same image could end up executing the poisoned version.”

eqvinox 5 hours ago [-]
Running sstrip on an ELF binary is called ELF "golfing"? TIL…
walletdrainer 3 hours ago [-]
This feels LLM generated, lots of emdashes and even more text around a completely false premise.
washbasin 5 hours ago [-]
Please post a tl;dr at the top or even in the subject. Many of us are scrambling to patch/reboot our **.
donaldjbiden 5 hours ago [-]
This isn't a new CVE. It's just documenting what happened when this person ran the exploit inside a certain type of container.
isityettime 4 hours ago [-]
It already has a table of contents. The heading titled "why rootless containers stopped the escalation" is your tl;dr.