Linux Unable to Fork Processes
We’ve been busy at work upgrading a couple hundred hosts around the world to Ubuntu 24.04 LTS. While doing so, we had a subset of our hosts freak out and become almost entirely uncommunicative.
Some examples of how bad things were (and here’s hoping the SEO gods pick this up and lead you to victory while you’re trying to get out of your own misery):
maloche ~ % ssh admin@region-2
Last login: Mon Oct 13 09:50:04 2025 from
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
/usr/bin/lesspipe: 1: Cannot fork
admin@region-2:~$ df -h
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
Filesystem Size Used Avail Use% Mounted on
tmpfs 197M 19M 179M 10% /run
/dev/mapper/ubuntu--vg-ubuntu--lv 15G 7.8G 6.2G 56% /
tmpfs 982M 0 982M 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/sda2 2.0G 229M 1.6G 13% /boot
tmpfs 197M 4.0K 197M 1% /run/user/1000
admin@region-2:~$ htop
admin@region-2:~$ cat /proc/sys/kernel/pid_max
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
4194304
admin@region-2:~$ sudo apt update
FATAL -> Failed to fork.
The FATAL -> Failed to fork. error message finally led me down a path that started to make sense. Clearly this was a resource issue, but both RAM and CPU usage were fine and the disk was not out of space. That leaves file descriptors and process IDs.
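When a host is this wedged the usual tools may not even start, but the relevant counters are all readable straight from /proc. A minimal sketch of the checks, assuming nothing beyond a stock Ubuntu install; the last two lines use only bash builtins, which still work when fork itself is failing:
cat /proc/sys/kernel/pid_max        # PID ceiling
cat /proc/sys/kernel/threads-max    # system-wide task limit
ls /proc | grep -c '^[0-9]'         # rough count of PIDs currently in use
cat /proc/sys/fs/file-nr            # allocated file handles vs. the maximum
read n < /proc/sys/kernel/threads-max && echo "$n"      # builtin-only read, no fork needed
set -- /proc/[0-9]*/ && echo "$# process directories"   # builtin-only count, no fork needed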
As it turns out, for some strange reason, in our case if WireGuard was operational on the host when we started the upgrade process it would set off a process bomb that persisted across reboots.
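Given that, the obvious precaution for the remaining hosts is to take WireGuard down before kicking off the release upgrade and bring it back up afterwards. A rough sketch, assuming the interface is managed by a wg-quick@wg-0 systemd unit (adjust to however WireGuard is run on your hosts):
sudo systemctl stop wg-quick@wg-0    # or: sudo wg-quick down wg-0
sudo do-release-upgrade
sudo systemctl start wg-quick@wg-0   # bring WireGuard back once the upgrade is done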
Before the fix
maloche ~ % ssh admin@region-2
Welcome to Ubuntu 22.04.5 LTS (GNU/Linux 5.15.0-157-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/pro
System information as of Mon Oct 13 10:43:49 PM UTC 2025
System load: 0.42 Processes: 30296
Usage of /: 53.2% of 14.66GB Users logged in: 0
Memory usage: 35% IPv4 address for ens18:
Swap usage: 0%
* Ubuntu 20.04 LTS Focal Fossa has reached its end of standard support on 31 Ma
For more details see:
https://ubuntu.com/20-04
Expanded Security Maintenance for Applications is not enabled.
0 updates can be applied immediately.
and after
maloche ~ % ssh admin@region-2
Welcome to Ubuntu 22.04.5 LTS (GNU/Linux 5.15.0-157-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/pro
System information as of Mon Oct 13 10:52:54 PM UTC 2025
System load: 0.15 Processes: 118
Usage of /: 53.1% of 14.66GB Users logged in: 0
Memory usage: 8% IPv4 address for ens18:
Swap usage: 0%
* Ubuntu 20.04 LTS Focal Fossa has reached its end of standard support on 31 Ma
For more details see:
https://ubuntu.com/20-04
Expanded Security Maintenance for Applications is not enabled.
0 updates can be applied immediately.
6 additional security updates can be applied with ESM Apps.
Learn more about enabling ESM Apps service at https://ubuntu.com/esm
As you can imagine, 30296 active processes vs. 118 makes a difference!
I initially stumbled over this by checking ps -ef and seeing a wall of duplicate processes, each with its own PID:
root 31120 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31121 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31122 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31123 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31124 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31125 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31126 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31127 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31128 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31129 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31130 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31131 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31132 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31133 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31134 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31135 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31136 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31137 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31138 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31139 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31140 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31141 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31142 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31143 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31144 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31145 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31146 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31147 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31148 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31149 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31150 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31151 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31152 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31153 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31154 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31155 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31156 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31157 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31158 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31160 2 0 15:53 ? 00:00:00 [napi/wg-0]
root 31161 2 0 15:53 ? 00:00:00 [napi/wg-0]
If you see a lot of the same thing in there that you did not anticipate, start by shutting that down individually. In our case we stopped it by stopping WireGuard altogether on the host, which took a lot of attempts to complete successfully since the host only ever had just enough free cycles for us to get in there, so be patient. Once that was done the host recovered instantly and we could continue our work. Once fully upgraded, we re-enabled the WireGuard service without any further issues.
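For reference, a rough sketch of how to confirm and clear the pile-up; the wg-quick@wg-0 unit name is an assumption, so point it at whatever your own ps output shows:
ps -eo comm= | sort | uniq -c | sort -rn | head   # group identical command names, worst offender first
ps -ef | grep -c '\[napi/wg-0\]'                  # count just the WireGuard NAPI kernel threads
sudo systemctl stop wg-quick@wg-0                 # expect retries while forks are still failing
ps -eo comm= | sort | uniq -c | sort -rn | head   # the count should collapse almost immediately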
I have yet to find an explanation for what happened here. This might be a bug in WireGuard itself or in the Linux kernel, but I couldn’t explain it or figure out where to get started, so I figured I’d document what we observed and how we got ourselves out of the situation.
15.10.2025