Fixing a kubelet memory leak in Kubernetes 1.36 (heyoncall.com)
compumike 16 hours ago
Author here! If you're running a Kubernetes cluster, I recommend you check `kubectl version` and see if you're running "Server Version: v1.36.[0,1,2]". If so, you may want to use the one-liner at the end of the article to check your "process_resident_memory_bytes" on each node, and consider restarting kubelet as a temporary workaround to tame the memory leak until v1.36.3 is released.
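The article's one-liner is not reproduced here. As a rough sketch of the same check, assuming you have already fetched a node's kubelet metrics as Prometheus text (for example with `kubectl get --raw "/api/v1/nodes/<node>/proxy/metrics"`), the following pulls out `process_resident_memory_bytes` and compares it to a threshold. The `residentBytes` helper, the sample text, and the 1 GiB limit are all hypothetical and only for illustration:

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// residentBytes scans Prometheus text-format metrics and returns the value of
// process_resident_memory_bytes, or an error if the metric is missing.
// (Hypothetical helper, not part of any Kubernetes tooling.)
func residentBytes(metrics string) (float64, error) {
	const name = "process_resident_memory_bytes"
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skip "# HELP" / "# TYPE" comment lines and unrelated metrics.
		if len(fields) < 2 || fields[0] != name {
			continue
		}
		return strconv.ParseFloat(fields[1], 64)
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("%s not found", name)
}

func main() {
	// Made-up sample; real input would come from the kubelet metrics endpoint.
	sample := "# TYPE process_resident_memory_bytes gauge\n" +
		"process_resident_memory_bytes 2.147483648e+09\n"

	rss, err := residentBytes(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	const limit = 1 << 30 // assumed 1 GiB threshold; tune per node
	fmt.Printf("kubelet RSS: %.0f MiB, over limit: %t\n", rss/(1<<20), rss > limit)
}
```

A single reading says little about a leak; comparing readings over time, or against the node's other kubelets, is probably more useful than any fixed limit.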
__turbobrew__ 23 minutes ago
A good reason to health check the kubelet process and restart it when the checks fail.
compumike 8 minutes ago
What kind of health checks? In my case, the kubelet process was staying alive and responsive to queries, I believe due to:

  # cat /proc/$(pgrep kubelet)/oom_score_adj
  -999

(from OOMScoreAdjust=-999 in /etc/systemd/system/kubelet.service)
With this score, the Linux OOM killer wouldn't touch it, but any of my Pods were fair game.
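For anyone who wants to check this from a program rather than a shell, here is a small sketch that reads and interprets an `oom_score_adj` value. The valid range is -1000 to 1000, and -1000 is the value that fully exempts a process from the OOM killer, so -999 makes kubelet a very unlikely victim rather than a strictly impossible one. The function names and the wording of the categories are my own, not from any library:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseOOMScoreAdj parses the contents of a /proc/<pid>/oom_score_adj file,
// which is a single integer followed by a newline.
func parseOOMScoreAdj(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// killPreference gives a rough, informal reading of an oom_score_adj value.
// (Hypothetical helper; the category wording is illustrative.)
func killPreference(adj int) string {
	switch {
	case adj <= -1000:
		return "exempt from the OOM killer"
	case adj < 0:
		return "less likely to be killed than a default process"
	case adj == 0:
		return "default"
	default:
		return "more likely to be killed than a default process"
	}
}

func main() {
	// Reads this process's own value; substitute the kubelet PID to inspect kubelet.
	raw, err := os.ReadFile("/proc/self/oom_score_adj")
	if err != nil {
		fmt.Println("could not read oom_score_adj (Linux only):", err)
		return
	}
	adj, err := parseOOMScoreAdj(string(raw))
	if err != nil {
		fmt.Println("unexpected file contents:", err)
		return
	}
	fmt.Printf("oom_score_adj=%d: %s\n", adj, killPreference(adj))
}
```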
rirze 1 hour ago
Very cool. It's often daunting to contribute to such a well-established and recognizable project, but this is exactly how it should work.
CamouflagedKiwi 1 hour ago
Nice find.

Can't help but feel this is one of the subtle traps hidden beneath the advice that contexts aren't supposed to be stored. I know it's not always that easy, of course.

compumike 33 minutes ago
Thanks. I know there's a `go vet` tool that's run as part of Kubernetes CI, and one of its checks is:

  lostcancel: check cancel func returned by context.WithCancel is called
I'm not 100% sure why `go vet` didn't catch this issue, but storing the cancelFn in the struct is probably part of the reason. Any Go experts know if that's the case?
fsuts 39 minutes ago
Not all heroes wear capes! Well done