RevPi Core3 - OS hanging randomly

Topics about the Software of Revolution Pi
Post Reply
Projektraum
Posts: 17
Joined: 26 Oct 2018, 11:48
Answers: 0

RevPi Core3 - OS hanging randomly

Post by Projektraum »

Hi,
we received a RevPi Core3 + DIO back from a costumer, which claims that the system randomly hangs.
After doing some in-house tests, we got the error once. Our test consists in a TCP server running on the Core3,
which controls the DIO module via TCP messages, and another device that only keeps the TCP socket connection alive,
sending after different (incremental) waiting time, a keep-alive message to check if the socket is still there.

It is not possible to reproduce the error easily, but I have a copy of the kern.log which contains all error, I just want to
know if I should upload it here on the forum directly (and how can I do that, since it did not accepted the .log files) or if
I have to send it per private to someone at Kunbus.

Many thanks in advance!
Nathan

Here is the beggining from the error logs:
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.623934] ------------[ cut here ]------------
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.623959] WARNING: CPU: 1 PID: 22 at kernel/sched/core.c:7333 migrate_enable+0x320/0x4d4
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.623964] Modules linked in: bcm2835_codec(C) raspberrypi_hwmon bcm2835_v4l2(C) v4l2_mem2mem hwmon bcm2835_mmal_vchiq(C) v4l2_common snd_bcm2835(C) videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops snd_pcm videobuf2_v4l2 snd_timer videobuf2_common snd videodev media vc_sm_cma(C) uio_pdrv_genirq uio piControl(O) ti_dac082s085 mcp320x iio_mux mux_gpio mux_core fixed gpio_74x164 spi_bcm2835aux spi_bcm2835 ks8851 eeprom_93cx6 gpio_max3191x industrialio crc8 i2c_dev ip_tables x_tables ipv6
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624041] CPU: 1 PID: 22 Comm: rcuc/1 Tainted: G C O 4.19.95-rt38-v7+ #1
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624043] Hardware name: BCM2835
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624060] [<80112584>] (unwind_backtrace) from [<8010d2f0>] (show_stack+0x20/0x24)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624069] [<8010d2f0>] (show_stack) from [<808900f4>] (dump_stack+0xd8/0x11c)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624079] [<808900f4>] (dump_stack) from [<801216a8>] (__warn.part.1+0xcc/0xe8)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624087] [<801216a8>] (__warn.part.1) from [<80121844>] (warn_slowpath_null+0x54/0x5c)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624095] [<80121844>] (warn_slowpath_null) from [<80150ce4>] (migrate_enable+0x320/0x4d4)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624103] [<80150ce4>] (migrate_enable) from [<80129014>] (__local_bh_enable+0x84/0xb0)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624111] [<80129014>] (__local_bh_enable) from [<8019d2f0>] (rcu_cpu_kthread+0x47c/0x984)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624118] [<8019d2f0>] (rcu_cpu_kthread) from [<8014a60c>] (smpboot_thread_fn+0x2dc/0x32c)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624126] [<8014a60c>] (smpboot_thread_fn) from [<80145b98>] (kthread+0x16c/0x174)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624134] [<80145b98>] (kthread) from [<801010b4>] (ret_from_fork+0x14/0x20)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624137] Exception stack(0xb9d7ffb0 to 0xb9d7fff8)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624142] ffa0: 00000000 00000000 00000000 00000000
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624147] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624149] ------------[ cut here ]------------
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624152] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624159] WARNING: CPU: 2 PID: 81 at kernel/cpu.c:330 unpin_current_cpu+0x64/0x8c
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624161] ---[ end trace 0000000000000002 ]---
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624162] Modules linked in:
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624163] ------------[ cut here ]------------
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624165] ------------[ cut here ]------------
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624166] bcm2835_codec(C)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624173] WARNING: CPU: 3 PID: 82 at kernel/cpu.c:330 unpin_current_cpu+0x64/0x8c
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624180] WARNING: CPU: 0 PID: 80 at kernel/cpu.c:330 unpin_current_cpu+0x64/0x8c
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624181] raspberrypi_hwmon
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624182] Modules linked in:
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624183] Modules linked in:
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624184] bcm2835_v4l2(C)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624186] bcm2835_codec(C)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624187] bcm2835_codec(C)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624188] v4l2_mem2mem
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624189] raspberrypi_hwmon
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624190] raspberrypi_hwmon
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624192] hwmon
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624193] bcm2835_v4l2(C)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624194] bcm2835_v4l2(C)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624195] bcm2835_mmal_vchiq(C)
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624196] v4l2_mem2mem
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624197] v4l2_mem2mem
Sep 7 11:58:14 Shaker61541 kernel: [ 2611.624198] v4l2_common
User avatar
nicolaiB
KUNBUS
Posts: 869
Joined: 21 Jun 2018, 10:33
Answers: 7
Location: Berlin
Contact:

Re: RevPi Core3 - OS hanging randomly

Post by nicolaiB »

Hi Nathan,

in order to investigate this issue, we would need the whole kernel log. Could you please send it to support@kunbus.com with the subject "SUP-5769"? Thanks.

Nicolai
Projektraum
Posts: 17
Joined: 26 Oct 2018, 11:48
Answers: 0

Re: RevPi Core3 - OS hanging randomly

Post by Projektraum »

Hello Nicolai,
thanks for the feedback. I have just sent the logs as requested.
Looking forward for Kunbus feedback.
Nathan
User avatar
nicolaiB
KUNBUS
Posts: 869
Joined: 21 Jun 2018, 10:33
Answers: 7
Location: Berlin
Contact:

Re: RevPi Core3 - OS hanging randomly

Post by nicolaiB »

Hi Nathan,

We have received the log file and will have a look. Is there a way how we can reproduce this error? I've noticed that you're running and older Kernel (4.19): Does this error also happen with the current 5.10.152?

Nicolai
Projektraum
Posts: 17
Joined: 26 Oct 2018, 11:48
Answers: 0

Re: RevPi Core3 - OS hanging randomly

Post by Projektraum »

Hi Nicolai,

I've benn running another test meanwhile, and again the system became unaccesible via LAN. (Also the Power LED on RevPi Core3 was ON RED).
I did a reset on the device, by removing the power supply and re-attach it, in order to access the system again and get the new log files.
Below you can find the error part at the kern.log, regarding this last test.

My current test consist in run our main software, which crates one thread for a RPC server and another thread for a TCP Server, and a simple
python TCP client, running in another RevPi that is connected via LAN cable, which connects to the TCP server and hold the connection alive, sending a keep-alive message
every 2s, 4s, 16s, 256s, 65536s, 4294967296s interval, and then from 2s over again.

I will try now something similar, but only with the TCP Server, to exclude the rest of the code.
Is it somehow possible for you to track the reason of the crash just by the logs, or at least point me to a direction ?

Regarding your question:
Currently this is a production image, that's why we kept the old kernel version.
Since the Core3 is not available anymore, the idea is to replace it with Core S 8GB, and then
a new image will be prepared with up-to-date packages and etc.. BUT we want/need to find out what is happening
with our Core3 devices, since more costumers also contacted us with the same problem.


Sep 8 14:55:41 Shaker61541 kernel: [99658.514251] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Sep 8 14:55:41 Shaker61541 kernel: [99658.514266] rcu: 0-...!: (0 ticks this GP) idle=0f0/0/0x0 softirq=0/0 fqs=1
Sep 8 14:55:41 Shaker61541 kernel: [99658.514268] rcu: (detected by 3, t=2102 jiffies, g=10416885, q=13148)
Sep 8 14:55:41 Shaker61541 kernel: [99658.514278] Sending NMI from CPU 3 to CPUs 0:
Sep 8 14:55:41 Shaker61541 kernel: [99658.514471] NMI backtrace for cpu 0
Sep 8 14:55:41 Shaker61541 kernel: [99658.514474] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C O 4.19.95-rt38-v7+ #1
Sep 8 14:55:41 Shaker61541 kernel: [99658.514476] Hardware name: BCM2835
Sep 8 14:55:41 Shaker61541 kernel: [99658.514478] PC is at arch_cpu_idle+0x34/0x4c
Sep 8 14:55:41 Shaker61541 kernel: [99658.514479] LR is at (null)
Sep 8 14:55:41 Shaker61541 kernel: [99658.514481] pc : [<80109978>] lr : [<00000000>] psr: 60000013
Sep 8 14:55:41 Shaker61541 kernel: [99658.514483] sp : 80d01f28 ip : 80d00000 fp : 80d01f34
Sep 8 14:55:41 Shaker61541 kernel: [99658.514484] r10: 80a80720 r9 : 80d95a40 r8 : 00000001
Sep 8 14:55:41 Shaker61541 kernel: [99658.514487] r7 : 00000001 r6 : 80d075f8 r5 : 80d075b0 r4 : ffffe000
Sep 8 14:55:41 Shaker61541 kernel: [99658.514489] r3 : 60000093 r2 : 60000093 r1 : 00000000 r0 : 80109974
Sep 8 14:55:41 Shaker61541 kernel: [99658.514491] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Sep 8 14:55:41 Shaker61541 kernel: [99658.514493] Control: 10c5383d Table: 370c006a DAC: 00000055
Sep 8 14:55:41 Shaker61541 kernel: [99658.514495] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C O 4.19.95-rt38-v7+ #1
Sep 8 14:55:41 Shaker61541 kernel: [99658.514496] Hardware name: BCM2835
Sep 8 14:55:41 Shaker61541 kernel: [99658.514499] [<80112584>] (unwind_backtrace) from [<8010d2f0>] (show_stack+0x20/0x24)
Sep 8 14:55:41 Shaker61541 kernel: [99658.514501] [<8010d2f0>] (show_stack) from [<808900f4>] (dump_stack+0xd8/0x11c)
Sep 8 14:55:41 Shaker61541 kernel: [99658.514503] [<808900f4>] (dump_stack) from [<80109c24>] (show_regs+0x1c/0x20)
Sep 8 14:55:41 Shaker61541 kernel: [99658.514505] [<80109c24>] (show_regs) from [<80896c78>] (nmi_cpu_backtrace+0xc0/0xc4)
Sep 8 14:55:41 Shaker61541 kernel: [99658.514508] [<80896c78>] (nmi_cpu_backtrace) from [<80110700>] (handle_IPI+0xd4/0x47c)
Sep 8 14:55:41 Shaker61541 kernel: [99658.514510] [<80110700>] (handle_IPI) from [<80102224>] (bcm2836_arm_irqchip_handle_irq+0xa4/0xa8)
Sep 8 14:55:41 Shaker61541 kernel: [99658.514512] [<80102224>] (bcm2836_arm_irqchip_handle_irq) from [<801019bc>] (__irq_svc+0x5c/0xa8)
Sep 8 14:55:41 Shaker61541 kernel: [99658.514514] Exception stack(0x80d01ed8 to 0x80d01f20)
Sep 8 14:55:41 Shaker61541 kernel: [99658.514517] 1ec0: 80109974 00000000
Sep 8 14:55:41 Shaker61541 kernel: [99658.514519] 1ee0: 60000093 60000093 ffffe000 80d075b0 80d075f8 00000001 00000001 80d95a40
Sep 8 14:55:41 Shaker61541 kernel: [99658.514521] 1f00: 80a80720 80d01f34 80d00000 80d01f28 00000000 80109978 60000013 ffffffff
Sep 8 14:55:41 Shaker61541 kernel: [99658.514523] [<801019bc>] (__irq_svc) from [<80109978>] (arch_cpu_idle+0x34/0x4c)
Sep 8 14:55:41 Shaker61541 kernel: [99658.514526] [<80109978>] (arch_cpu_idle) from [<808ab1a0>] (default_idle_call+0x40/0x54)
Sep 8 14:55:41 Shaker61541 kernel: [99658.514528] [<808ab1a0>] (default_idle_call) from [<80158368>] (do_idle+0x11c/0x18c)
Sep 8 14:55:41 Shaker61541 kernel: [99658.514530] [<80158368>] (do_idle) from [<801586dc>] (cpu_startup_entry+0x28/0x2c)
Sep 8 14:55:41 Shaker61541 kernel: [99658.514532] [<801586dc>] (cpu_startup_entry) from [<808a57dc>] (rest_init+0xd8/0xdc)
Sep 8 14:55:41 Shaker61541 kernel: [99658.514535] [<808a57dc>] (rest_init) from [<80c01004>] (start_kernel+0x4b4/0x4e4)
Sep 8 14:56:02 Shaker61541 kernel: [99679.764257] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Sep 8 14:56:02 Shaker61541 kernel: [99679.764272] rcu: 0-...!: (1 GPs behind) idle=edc/0/0x0 softirq=0/0 fqs=1
Sep 8 14:56:02 Shaker61541 kernel: [99679.764273] rcu: (detected by 1, t=2102 jiffies, g=10416893, q=26438)
Sep 8 14:56:02 Shaker61541 kernel: [99679.764284] Sending NMI from CPU 1 to CPUs 0:
Sep 8 14:56:02 Shaker61541 kernel: [99679.764466] NMI backtrace for cpu 0
Sep 8 14:56:02 Shaker61541 kernel: [99679.764469] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C O 4.19.95-rt38-v7+ #1
Sep 8 14:56:02 Shaker61541 kernel: [99679.764471] Hardware name: BCM2835
Sep 8 14:56:02 Shaker61541 kernel: [99679.764472] PC is at arch_cpu_idle+0x34/0x4c
Sep 8 14:56:02 Shaker61541 kernel: [99679.764474] LR is at (null)
Sep 8 14:56:02 Shaker61541 kernel: [99679.764476] pc : [<80109978>] lr : [<00000000>] psr: 60000013
Sep 8 14:56:02 Shaker61541 kernel: [99679.764477] sp : 80d01f28 ip : 80d00000 fp : 80d01f34
Sep 8 14:56:02 Shaker61541 kernel: [99679.764479] r10: 80a80720 r9 : 80d95a40 r8 : 00000001
Sep 8 14:56:02 Shaker61541 kernel: [99679.764481] r7 : 00000001 r6 : 80d075f8 r5 : 80d075b0 r4 : ffffe000
Sep 8 14:56:02 Shaker61541 kernel: [99679.764483] r3 : 60000093 r2 : 60000093 r1 : 00000000 r0 : 80109974
Sep 8 14:56:02 Shaker61541 kernel: [99679.764485] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Sep 8 14:56:02 Shaker61541 kernel: [99679.764487] Control: 10c5383d Table: 370c006a DAC: 00000055
Sep 8 14:56:02 Shaker61541 kernel: [99679.764490] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G C O 4.19.95-rt38-v7+ #1
Sep 8 14:56:02 Shaker61541 kernel: [99679.764491] Hardware name: BCM2835
Sep 8 14:56:02 Shaker61541 kernel: [99679.764493] [<80112584>] (unwind_backtrace) from [<8010d2f0>] (show_stack+0x20/0x24)
Sep 8 14:56:02 Shaker61541 kernel: [99679.764496] [<8010d2f0>] (show_stack) from [<808900f4>] (dump_stack+0xd8/0x11c)
Sep 8 14:56:02 Shaker61541 kernel: [99679.764498] [<808900f4>] (dump_stack) from [<80109c24>] (show_regs+0x1c/0x20)
Sep 8 14:56:02 Shaker61541 kernel: [99679.764500] [<80109c24>] (show_regs) from [<80896c78>] (nmi_cpu_backtrace+0xc0/0xc4)
Sep 8 14:56:02 Shaker61541 kernel: [99679.764502] [<80896c78>] (nmi_cpu_backtrace) from [<80110700>] (handle_IPI+0xd4/0x47c)
Sep 8 14:56:02 Shaker61541 kernel: [99679.764505] [<80110700>] (handle_IPI) from [<80102224>] (bcm2836_arm_irqchip_handle_irq+0xa4/0xa8)
Sep 8 14:56:02 Shaker61541 kernel: [99679.764507] [<80102224>] (bcm2836_arm_irqchip_handle_irq) from [<801019bc>] (__irq_svc+0x5c/0xa8)
Sep 8 14:56:02 Shaker61541 kernel: [99679.764509] Exception stack(0x80d01ed8 to 0x80d01f20)
Sep 8 14:56:02 Shaker61541 kernel: [99679.764511] 1ec0: 80109974 00000000
Sep 8 14:56:02 Shaker61541 kernel: [99679.764513] 1ee0: 60000093 60000093 ffffe000 80d075b0 80d075f8 00000001 00000001 80d95a40
Sep 8 14:56:02 Shaker61541 kernel: [99679.764516] 1f00: 80a80720 80d01f34 80d00000 80d01f28 00000000 80109978 60000013 ffffffff
Sep 8 14:56:02 Shaker61541 kernel: [99679.764518] [<801019bc>] (__irq_svc) from [<80109978>] (arch_cpu_idle+0x34/0x4c)
Sep 8 14:56:02 Shaker61541 kernel: [99679.764520] [<80109978>] (arch_cpu_idle) from [<808ab1a0>] (default_idle_call+0x40/0x54)
Sep 8 14:56:02 Shaker61541 kernel: [99679.764522] [<808ab1a0>] (default_idle_call) from [<80158368>] (do_idle+0x11c/0x18c)
Sep 8 14:56:02 Shaker61541 kernel: [99679.764525] [<80158368>] (do_idle) from [<801586dc>] (cpu_startup_entry+0x28/0x2c)
Sep 8 14:56:02 Shaker61541 kernel: [99679.764527] [<801586dc>] (cpu_startup_entry) from [<808a57dc>] (rest_init+0xd8/0xdc)
Sep 8 14:56:02 Shaker61541 kernel: [99679.764529] [<808a57dc>] (rest_init) from [<80c01004>] (start_kernel+0x4b4/0x4e4)
Sep 8 14:56:02 Shaker61541 kernel: [99679.765296] rcu: rcu_preempt kthread starved for 2100 jiffies! g10416893 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
Sep 8 14:56:02 Shaker61541 kernel: [99679.765297] rcu: RCU grace-period kthread stack dump:
Sep 8 14:56:02 Shaker61541 kernel: [99679.765300] rcu_preempt R running task 0 11 2 0x00000000
Sep 8 14:56:02 Shaker61541 kernel: [99679.765319] [<808a6b74>] (__schedule) from [<808a7078>] (schedule+0x70/0x134)
Sep 8 14:56:02 Shaker61541 kernel: [99679.765328] [<808a7078>] (schedule) from [<808a9e34>] (schedule_timeout+0x218/0x4f8)
Sep 8 14:56:02 Shaker61541 kernel: [99679.765337] [<808a9e34>] (schedule_timeout) from [<8019e374>] (rcu_gp_kthread+0x5a4/0xac8)
Sep 8 14:56:02 Shaker61541 kernel: [99679.765345] [<8019e374>] (rcu_gp_kthread) from [<80145b98>] (kthread+0x16c/0x174)
Sep 8 14:56:02 Shaker61541 kernel: [99679.765353] [<80145b98>] (kthread) from [<801010b4>] (ret_from_fork+0x14/0x20)
Sep 8 14:56:02 Shaker61541 kernel: [99679.765356] Exception stack(0xb9d4ffb0 to 0xb9d4fff8)
Sep 8 14:56:02 Shaker61541 kernel: [99679.765360] ffa0: 00000000 00000000 00000000 00000000
Sep 8 14:56:02 Shaker61541 kernel: [99679.765366] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Sep 8 14:56:02 Shaker61541 kernel: [99679.765370] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
Post Reply