Automatically Restart dhcpcd for resiliency

kjkoster
Posts: 87
Joined: 12 Feb 2022, 10:42

Post by kjkoster »

Dear All,

A while ago we had a strange outage: the Revolution Pi lost network connectivity while the control systems kept working properly. Only the HMI and data logging were inaccessible, and we could not VPN into the system.

After some digging (and a bit of luck) we traced it to the DHCP daemon no longer running. The Linux out-of-memory (OOM) killer had chosen to kill it. As it turned out, this was a freak combination of a DHCP lease expiring and memory pressure. We had been hammering the system, but I would not expect Linux to kill a critical component rather than something more expendable.

For more information see https://www.baeldung.com/linux/memory-o ... oom-killer by Baeldung.
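
For anyone who wants to check whether they hit the same thing, here is a minimal sketch of how the kernel log can be searched for OOM kills. The helper name `oom_kills` is made up for this post, and it assumes the kernel logs the kill as an "Out of memory: Killed process <pid> (<name>)" line; the exact wording differs between kernel versions, so adjust the pattern if nothing shows up.

```shell
#!/bin/sh
# Hypothetical helper: filter kernel log lines on stdin for OOM kills.
# Optional first argument: process name to narrow the match (e.g. dhcpcd).
# Assumes the kernel prints "Out of memory: Kill(ed) process <pid> (<name>)".
oom_kills() {
    if [ -n "${1:-}" ]; then
        # Keep only kill lines that name the given process in parentheses
        grep -i 'out of memory' | grep -F -- "($1)"
    else
        grep -i 'out of memory'
    fi
}

# Typical use on the device (needs read access to the kernel log):
#   journalctl -k | oom_kills dhcpcd
#   dmesg | oom_kills dhcpcd
```

If the journal is not persistent, the relevant lines may already be gone after a reboot, so an empty result does not rule out an OOM kill.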

A band-aid for this problem is to instruct systemd to restart dhcpcd automatically if it is killed. We followed https://ma.ttias.be/auto-restart-crashe ... e-systemd/ by Mattias Geniar.

Run the following command:

 $ sudo systemctl edit --full dhcpcd.service

Add the StartLimitIntervalSec and StartLimitBurst lines under [Unit], and the Restart and RestartSec lines under [Service].

[Unit]
Description=dhcpcd on all interfaces
Wants=network.target
Before=network.target

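# Comments must be on their own line; systemd does not support trailing comments.
# Count start attempts within a 5-minute window
StartLimitIntervalSec=300
# Give up after 5 start attempts within that window
StartLimitBurst=5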

[Service]
Type=forking
PIDFile=/run/dhcpcd.pid
ExecStart=/usr/lib/dhcpcd5/dhcpcd -q -b
ExecStop=/sbin/dhcpcd -x

# Restart only on failure (e.g. killed by a signal), not after a clean stop
Restart=on-failure
# Wait 60 seconds before restarting
RestartSec=60s

[Install]
WantedBy=multi-user.target
Alias=dhcpcd5.service
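
As an alternative to editing the full unit, the same four settings could go into a systemd drop-in, so the packaged dhcpcd.service stays untouched. This is only a sketch: the helper name `dhcpcd_restart_dropin` and the file name `restart.conf` are made up for this post, and the values are simply the ones from the unit above.

```shell
#!/bin/sh
# Hypothetical helper: print a drop-in that adds only the restart settings.
# Everything else stays as shipped in the packaged dhcpcd.service.
# No trailing comments inside the unit text; systemd does not support them.
dhcpcd_restart_dropin() {
    cat <<'EOF'
[Unit]
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
Restart=on-failure
RestartSec=60s
EOF
}

# Install on the device (run as root); restart.conf is an arbitrary name:
#   mkdir -p /etc/systemd/system/dhcpcd.service.d
#   dhcpcd_restart_dropin > /etc/systemd/system/dhcpcd.service.d/restart.conf
#   systemctl daemon-reload
#   systemctl cat dhcpcd.service    # shows the unit followed by the drop-in
```

To try it out, `sudo systemctl kill --signal=SIGKILL dhcpcd.service` simulates the kill, and systemd should start dhcpcd again after the RestartSec delay. Do this from a local console rather than over the network, since connectivity may drop in the meantime.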

@Kunbus: I am happy to provide a patch if you point me to the repository where you keep this file.
dirk
KUNBUS
Posts: 2099
Joined: 15 Dec 2016, 13:19

Re: Automatically Restart dhcpcd for resiliency

Post by dirk »

Hello kjkoster,
thank you for your analysis and your thoughtfulness in submitting this patch here. With Bullseye we are switching to NetworkManager, so this error, which occurs only with dhcpcd, will no longer appear. It therefore makes sense to report this directly to Raspberry Pi Ltd. The pi-gen repo is probably the most appropriate place: https://github.com/RPi-Distro/pi-gen