A while ago we had a strange outage: the Revolution Pi lost network connectivity while the control systems kept working properly. Only the HMI and data logging became inaccessible, and we could not VPN into the system.
After some digging (and a bit of luck) we traced it to the DHCP daemon (dhcpcd) no longer running. The Linux out-of-memory (OOM) killer had chosen to kill it. As it turned out, this was a freak combination of a DHCP lease expiring and memory pressure. We had been hammering the system, but I would not have expected Linux to kill a critical component instead of something more expendable.
For more information see https://www.baeldung.com/linux/memory-o ... oom-killer by Baeldung.
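If you want to check whether the OOM killer was responsible on your own system, the kernel log is the place to look. Below is a minimal sketch, assuming the kernel logs its kills with a line containing "Killed process <pid> (<name>)"; the exact wording can differ between kernel versions, and the function name oom_victims is made up for this example.

```shell
# Hypothetical helper: read kernel log text on stdin and print the name of
# every process the OOM killer reports as killed.
# Assumes log lines of the form "... Killed process <pid> (<name>) ...".
oom_victims() {
    sed -n 's/.*Killed process [0-9]* (\([^)]*\)).*/\1/p'
}

# Typical use on the Revolution Pi (kernel messages from the journal):
#   journalctl -k | oom_victims
# or, if the journal is not persistent:
#   dmesg | oom_victims
```

If dhcpcd shows up in the output, you are most likely looking at the same problem.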
A bandaid for this problem is to instruct systemd to automatically restart dhcpcd if it is killed. We've followed https://ma.ttias.be/auto-restart-crashe ... e-systemd/ by Mattias Geniar.
Run the following command:
Code:
$ sudo systemctl edit --full dhcpcd.service
Add the StartLimitIntervalSec and StartLimitBurst lines under [Unit] and the Restart and RestartSec lines under [Service]. Note that systemd only treats a line as a comment if it starts with # or ;, so the comments go on their own lines:
Code:
[Unit]
Description=dhcpcd on all interfaces
Wants=network.target
Before=network.target
# count start attempts over a 5 minute window
StartLimitIntervalSec=300
# give up after 5 start attempts within that window
StartLimitBurst=5

[Service]
Type=forking
PIDFile=/run/dhcpcd.pid
ExecStart=/usr/lib/dhcpcd5/dhcpcd -q -b
ExecStop=/sbin/dhcpcd -x
# restart only on failure, not on user command
Restart=on-failure
# wait 60 seconds before restarting
RestartSec=60s

[Install]
WantedBy=multi-user.target
Alias=dhcpcd5.service
@Kunbus: happy to provide a patch, if you point me to the repository where you keep this file.
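To double-check the edit, something like the following could be used. It is only a sketch: the function name missing_restart_keys is made up for this example, and the path in the usage comment assumes that systemctl edit --full saved its copy in the usual place, /etc/systemd/system/.

```shell
# Hypothetical helper: print each of the four restart-related keys that is
# NOT set in the given unit file. No output means all four are present.
missing_restart_keys() {
    unit_file=$1
    for key in StartLimitIntervalSec StartLimitBurst Restart RestartSec; do
        grep -q "^${key}=" "$unit_file" || echo "$key"
    done
}

# Typical use (path is an assumption, see above):
#   missing_restart_keys /etc/systemd/system/dhcpcd.service
#
# Manual test on a system you can afford to disconnect for a minute:
#   sudo systemctl kill --signal=SIGKILL dhcpcd.service
#   sleep 70; systemctl status dhcpcd.service
```

Killing the daemon with SIGKILL counts as a failure for Restart=on-failure, so after the RestartSec delay the service should show as active again. Do this from a local console rather than over the network, since the connection may drop while dhcpcd is down.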