NHacker Next
  • new
  • past
  • show
  • ask
  • show
  • jobs
  • submit
The iPad was on Tailscale: a WebRTC debugging story (p2claw.com)
Sean-Der 44 minutes ago [-]
Amazing debugging, I loved reading that. HN doesn't get enough good posts like this anymore :)

If https://github.com/pion/sctp/issues/12 had happened (not just in Pion but across all implementations) this could have been fixed years ago. The hardcoding we all settle for is tragic.

syllogistic 29 minutes ago [-]
Author here, thank you, that means a lot coming from you. Pion was the prior art I pointed the webrtc-rs maintainers at. And pion/sctp#12 is super relevant. A known, proposed fix years before we hit it.

"The hardcoding we all settle for" might be the epigraph for the whole incident. webrtc-rs invited a PR for the configurable-MTU + better default half [webrtc-rs/webrtc#806] to unblock folks today. Whether PMTUD gets implemented will be interesting to see.

inigyou 1 hours ago [-]
I don't understand how a product as popular as Tailscale can get this far while dropping certain ordinary types of packets.

It is impossible to parse the UDP or TCP port number out of a fragment. This is surely the reason the ACL module entirely rejects them. TCP will adjust it's segment size based on PMTUD so as to not require fragmentation. This is why it hasn't been noticed so far. But fragmented UDP packets are a corner case of normal behavior and it boggles the mind that someone could just decide to completely drop them.

UDP fragment filtering could be implemented by a global fragments on/off setting (works for "allow everything" = fragments on, cautious = fragments off) or by blocking the first fragment which includes the port number (and blocking it if the port number is split across fragments which I think is technically allowed but completely abnormal).

syllogistic 51 minutes ago [-]
Author here,

Agreed. The port-number point is the most plausible rationale I've heard, more convincing than the RFC line in their source comment. The historical fix for "can't classify fragments" was virtual reassembly or flow tracking [conntrack on linux, scrub in pf], so dropping them outright punts past known prior approaches. Even your lighter idea would have saved us: a first-fragment match would have let our pair through.

We've reported upstream to both projects, tailscale/tailscale#20083 and webrtc-rs/webrtc#806, and webrtc-rs already invited a PR.

inigyou 40 minutes ago [-]
You are shadowbanned.
hylaride 51 minutes ago [-]
I'm having flashbacks to 1990s-era PPPoE, where the slightly smaller MTU had issues with some server OS's that had TCP/IP stacks that didn't support or ignored MTUs smaller than 1500 bytes and bulk data transfers would get messed up. I don't remember which ones, but it was some commercial UNIX.
katericksonnow 1 hours ago [-]
MTU black holes are the worst because every health check is small enough to survive.
1 hours ago [-]
syllogistic 2 hours ago [-]
Author here.

This started as a blank page on one device and ended two weeks later at the intersection of two bugs: webrtc-rs hardcodes INITIAL_MTU=1228 [never updated, no path probing, retransmits at the same size forever], and Tailscale's packet filter classifies any IPv6 packet with a Fragment header as unknown protocol, so the default deny fires. On every platform, counted under reason="acl". Neither is unreasonable alone. Together: silent wedge, every health check green, because everything that tests the path is small and only the payload fragments. Two-command repro on any tailnet: ping -s 100 works, ping -s 1400 over the Tailscale IPv6 address is 100% loss. Full WebRTC repro and captures: https://github.com/phact/mtu-webrtc-bug. We've reported upstream to both projects https://github.com/tailscale/tailscale/issues/20083 and https://github.com/webrtc-rs/webrtc/issues/806. Happy to answer questions. Especially interested if anyone knows the history behind the IPv6 fragment decision in Tailscale's filter.

2 hours ago [-]
Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact
Rendered at 16:56:48 GMT+0000 (Coordinated Universal Time) with Vercel.