Excellent writeup. Kudos for explaining a surprising number of kernel bugs in a way that is both (a) simple and (b) interesting.
TL;DR: the main issue arises because context-switch events and sampling events both need to be written to the same `ringBuffer` eBPF map. The sampling handler has to take the ring buffer lock inside an NMI, which is by definition non-maskable. As the article explains, this leads to lock contention, recursive locking, etc. when the context-switch handler tries to take the same lock.
Why not have context switches write to ringBuffer1 and sampling events write to ringBuffer2 (i.e. use different ring buffers)? Wouldn't buggy kernels then work properly too?
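To make the idea concrete, here is a toy Python model of what I mean. It is only a sketch under loud assumptions: `ToyRingBuffer`, `write`, and `run` are hypothetical names, the "lock" is a plain flag that refuses re-entry, and the "NMI" is just a callback fired while the lock is held. It does not model the real kernel ring buffer, and it does not show that the bugs from the article would actually be avoided.

```python
class ToyRingBuffer:
    """Hypothetical stand-in for a ring buffer guarded by a non-reentrant lock."""

    def __init__(self, name):
        self.name = name
        self.records = []
        self.dropped = 0
        self.locked = False

    def write(self, record, while_locked=None):
        # Assumption: a writer that finds the lock already held gives up and
        # drops the record instead of spinning forever on the same CPU.
        if self.locked:
            self.dropped += 1
            return False
        self.locked = True
        try:
            if while_locked is not None:
                # Simulate an NMI arriving while this writer holds the lock.
                while_locked()
            self.records.append(record)
            return True
        finally:
            self.locked = False


def run(shared):
    """Write one context-switch record and let a sampling 'NMI' interrupt it."""
    switch_buf = ToyRingBuffer("switches")
    # shared=True: both handlers use one buffer; shared=False: one buffer each.
    sample_buf = switch_buf if shared else ToyRingBuffer("samples")

    def nmi_sampling_handler():
        sample_buf.write(("sample", 1))

    switch_buf.write(("switch", 1), while_locked=nmi_sampling_handler)
    return switch_buf, sample_buf


if __name__ == "__main__":
    one_buf, _ = run(shared=True)
    print("shared:", one_buf.records, "dropped:", one_buf.dropped)

    switch_buf, sample_buf = run(shared=False)
    print("split:", switch_buf.records, sample_buf.records,
          "dropped:", sample_buf.dropped)
```

In this toy, the shared buffer loses the sample because the interrupted writer still holds the lock, while the split setup keeps both records. One trade-off of splitting: events in two buffers no longer have a single ordering, so the consumer would probably have to merge them by timestamp.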
legedemon 4 hours ago [-]
Thanks for the great write-up with links to many more interesting articles and code! I stopped working on the Linux kernel long ago, but deep dives like these make for very exciting reading.