kjkoster

Automatically Restart dhcpcd for resiliency

05 Feb 2023, 09:37

Dear All,

A while ago we had a strange outage: the Revolution Pi lost network connectivity, but the control systems kept working properly. Only the HMI and data logging became inaccessible, and we could not VPN into the system.

After some digging (and a bit of luck), we traced it to the DHCP client daemon, dhcpcd, no longer running: the Linux out-of-memory (OOM) killer had killed it. As it turned out, this was a freak combination of a DHCP lease expiring and memory pressure. We had been hammering the system, but I would not have expected Linux to kill a critical component instead of something more expendable.

For more information on the OOM killer, see https://www.baeldung.com/linux/memory-o ... oom-killer by Baeldung.
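Confirming that the OOM killer was the culprit comes down to the kernel log, which records a line naming each process it kills. The sketch below assumes a POSIX shell and the message formats of common Linux kernels ("Killed process 1234 (dhcpcd) ..." on newer kernels, "Kill process ..." on older ones); `oom_kills_of` is a hypothetical helper name, not an existing tool.

```shell
# oom_kills_of is a hypothetical helper, not part of any package.
# It reads kernel log text on stdin (e.g. from `journalctl -k` or `dmesg`)
# and prints only the OOM-killer lines that name the given process.
# Assumes the process name contains no regex metacharacters.
oom_kills_of() {
    # Newer kernels log "Out of memory: Killed process 1234 (dhcpcd) ...";
    # older ones log "Kill process 1234 (dhcpcd) score ..." first.
    # \( and \) match literal parentheses in an extended regex.
    # Exit status is 0 if at least one line matched, 1 otherwise.
    grep -E "Kill(ed)? process [0-9]+ \($1\)"
}
```

For example, `sudo journalctl -k | oom_kills_of dhcpcd` or `sudo dmesg | oom_kills_of dhcpcd`. If the journal is not stored persistently, kernel messages from before the last reboot will not be there.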

A band-aid for this problem is to have systemd restart dhcpcd automatically if it is killed. We followed https://ma.ttias.be/auto-restart-crashe ... e-systemd/ by Mattias Geniar.

Run the following command:
 $ sudo systemctl edit --full dhcpcd.service
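Add the StartLimitIntervalSec and StartLimitBurst lines under [Unit] and the Restart and RestartSec lines under [Service]. Put each comment on a line of its own: systemd only treats lines that start with # or ; as comments, so a comment after a setting becomes part of its value and the setting is rejected.
[Unit]
Description=dhcpcd on all interfaces
Wants=network.target
Before=network.target

# give up if dhcpcd has to be started more than 5 times within 300 seconds (5 minutes)
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
Type=forking
PIDFile=/run/dhcpcd.pid
ExecStart=/usr/lib/dhcpcd5/dhcpcd -q -b
ExecStop=/sbin/dhcpcd -x

# restart after an unclean exit or a kill signal (such as the OOM killer's SIGKILL),
# but not after a clean stop
Restart=on-failure
# wait 60 seconds before each restart attempt
RestartSec=60s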

[Install]
WantedBy=multi-user.target
Alias=dhcpcd5.service
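Because systemd only logs a warning for a setting it cannot parse and otherwise keeps the default (`Restart=no`), it is worth confirming that the edited values were actually loaded. A minimal sketch in POSIX shell; `restart_policy_ok` is a hypothetical helper, and it assumes `systemctl show` prints `RestartUSec=1min` for `RestartSec=60s`, which may differ between systemd versions.

```shell
# restart_policy_ok is a hypothetical helper, not part of systemd.
# Feed it the output of:
#   systemctl show dhcpcd.service -p Restart -p RestartUSec
# It succeeds only if both restart settings from the edited unit are active.
restart_policy_ok() {
    settings=$(cat)
    # -x: the whole line must match; -q: no output, exit status only
    printf '%s\n' "$settings" | grep -qx 'Restart=on-failure' &&
        printf '%s\n' "$settings" | grep -qx 'RestartUSec=1min'
}
```

For example: `systemctl show dhcpcd.service -p Restart -p RestartUSec | restart_policy_ok && echo "restart policy active"`. To watch it work, `sudo systemctl kill -s SIGKILL dhcpcd.service` simulates the OOM kill, and `systemctl status dhcpcd.service` should show dhcpcd starting again roughly 60 seconds later. Try this from a local console rather than over the network.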
@Kunbus: I am happy to provide a patch if you point me to the repository where you keep this file.
 
dirk
KUNBUS

Re: Automatically Restart dhcpcd for resiliency

07 Feb 2023, 16:35

Hello kjkoster,
Thank you for your analysis and for taking the time to submit this patch here. With Bullseye we are switching to NetworkManager, so this error will no longer occur (it only affects dhcpcd). It therefore makes sense to report this directly to Raspberry Pi Ltd; the pi-gen repository is probably the right place: https://github.com/RPi-Distro/pi-gen
