thank you very much for your quick responses, very much appreciated.
I'm delighted to get some hardcore questions here.
The kernel module parses the configuration JSON file from the filesystem directly - and uses the set_fs(DS_KERNEL) trick to use the VFS API to read the file. Wouldn't it be better for the kernel module to provide a configfs-based system to configure the driver (see e.g. the LIO iSCSI target or the USB Gadget support for examples of the use of configfs for kernel configuration) and use a userspace helper to parse the current JSON configuration and map that to the configfs backend? (Note that set_fs is likely to be removed from the kernel at some point in the future, see https://lwn.net/Articles/722267/
One advantage of the JSON file is that it can be copied to other RevPi Cores, allowing for easy duplication of a configuration to multiple hosts. I'm not sure if the same could be achieved as easily with a configfs-based approach.
II just think that parsing JSON in the kernel is probably something to avoid. For example, I was a bit impatient and upgraded my Pi to Jessie manually (currently we have one at our company for development), but kept the kernel, since Debian Jessie officially comes with 3.16, and your Wheezy image already had 4.1, so I assumed that userland shouldn't be a problem. The JSON files generated by the updated Pictory 1.2 caused a NULL pointer dereference in the old piControl.ko module though, until I removed 3 fields ("GUID" : "...", "Connections":  and "extend" : ) from the JSON file. (Then it worked.) I'm confident that this specific issue won't occur with the Jessie version of the piControl.ko module, but this example illustrates the possible pitfalls of doing parsing in the kernel.
That said, I didn't suggest to get rid of the JSON file - I'm pretty sure you don't want to completely rewrite your Pictory configuration tool.
I merely suggested that a program is run at boot that parses the JSON file and then creates the corresponding configfs entries in order to configure the kernel module. This tool could also be run to reload the configuration. This offers a much cleaner interface from the perspective of the kernel module (while at the same time allowing for a better customization for people who want to do their own thing), and at the same time being compatible with the existing stack.
Obviously this will involve a certain amount of work, and I don't expect anything to happen on this front in the near future, my question was geared at pointing out how a long-term perspective could look like. If you do decide to work on this at some point, I'd be happy to discuss implementation details and provide feedback.
Why use a global lock piDev_g.lockPI for the process image for everything? Typically values that are read/written from/to that region are 4 bytes or smaller, where atomic operations can be used to ensure that these writes are consistent. Changing individual values could be sped up quite a bit by relying on atomic operations on the kernel <-> userland side. Sure, for large read/write calls one would still need the lock, and the gate thread would have to reimplement the copy operation in terms of a loop of atomic loads / saves (in addition to taking the lock) - and while I haven't measured that, I suspect that this could improve performance for the cases where the user only wants to set a single value.
Also a great suggestion. I guess the global lock is motivated by simplicity (and/or laziness
) and we could add fast paths if the values read/written are <= 4 bytes.
For the record, we would gladly accept pull requests on GitHub for contributions like this.
I can't guarantee that I'll find time for this, but I may take a jab at the issue.
In lieu of this: wouldn't it also be much faster to be able to mmap() the process image into a program's address space, and use direct memory operations from the program, instead of having to execute system calls? Sure, if an application wants to have consistency for areas larger than 4 bytes (the largest atomic op on 32bit ARM), it would still have to resort to read()/write() system calls, but for just switching a single input / output, for example, (or reading/writing single values of 4 bytes or less from/to field busses) this could speed up applications even further, because you'd save one or even two context switches to the kernel. Furthermore, the bounds checks could all happen on the initial mmap() system call (+ page faults by the CPU), further increasing performance.
As Volker has already said, this architecture is largely driven by the traditional EN61131 model where access to the process image is controlled by a single, central entity to avoid synchronization issues.
Well, but even with the current design you have synchronization issues in some cases: say you want to atomically updated two variables that are not
directly next to each other. Then you have three possibilities:
- Don't care about anything in between and do a single large write that encompasses both variables. This will overwrite anything in between, which might not be a good idea.
- Do two consecutive writes. However, that will not be atomic anymore.
- Do a read of the area containing both values, change both in the local copy, and write it back. However, this will again overwrite anything in between, this time maybe not with garbage (as in the first suggestion), but possibly with stale values.
Only for consecutive or single values will this provide an effective synchronization mechanism. And for single values you can easily use mmap() + atomic operations with much better performance characteristics.
On top of that, this architecture allows for fine-grained access control (though this is not implemented in piControl currently), i.e. granting read or write access for specific regions in the process image only to certain clients.
Ok, yes, that's not possible with mmap - because access control there is only possible with page granularity, and the entire process image fits into a single page. (And there's no reason to increase the size, as far as I can tell.)
However, with that in mind: why not create two char devices? One that supports mmap and one that doesn't? This way it is possible to implement access control later (on the char device that doesn't support mmap), and adjust the permissions on other char device in such a way that it's not accessible to those who are supposed to have restricted access. (Maybe default permissions for the mmap-able chardev to root:root:0600, so that users that want mmap need to adjust permissions themselves?)
Why implement waiting for events via a ioctl, KB_WAIT_FOR_EVENT? Why not make the file descriptor pollable via select/poll/epoll? This would allow easy integration into existing event loops.
I think the event support we currently have is not the last word and will be the subject of further discussions internally. We welcome comments like this to come up with an architecture that works for everyone.
If you want to react to events dynamically, then having a poll()able file descriptor is probably the easiest way to do so, because this will automatically integrate into any event loop. There's a reason why the kernel implements file descriptors for eventfd(), signalfd(), timerfd(), etc. That said: it might be useful to have a ioctl() to generate a new file descriptor specifically for polling events, and not reuse the file descriptor of the char device itself. I have to admit that this is only a minor detail I noticed while skimming the code, I don't think I'll be using events myself at the moment.
The Core 3 has an ARMv8 chip - but Raspbian only supports the 32bit architecture (and the chip in an effective ARMv7 mode). I would expect that the Revolution Pi would also only work in 32bit mode? Since there's only 1 GiB of RAM anyway, from that perspective this isn't a limitation - however, the ARMv8 architecture supports additional instructions (especially related to floating point arithmetics) in ARMv8 mode. Do you know if userspace applications are able to make use of the processor improvements even in 32 bit mode?
Unless I'm missing something, the 32-bit kernel is unable to execute 64-bit ELF binaries.
No, it isn't. But that wasn't my question. For example, ARMv8 has twice as many VFP registers as compared to ARMv7 - I wondered whether it is possible to use these even with ARM32 code (it's been quite a while since I've actually written ARM assembly). But from your reaction my guess is the answer is "no".
So the upcoming Jessie image is based on Raspbian, which means everything is compiled for 32-bit ARMv6. Not ARMv7, which isn't supported by the BCM2835 on the CM1. The userland is identical on CM1 and CM3, only the kernel is special for each model because the CPU is initialized differently.
Well, but the userland is not the main issue here - the performance-critical stuff is going to be in my own code, which I can compile for ARMv7.
(Btw.: it would totally be sufficient for Rasbpian to just create an aarch64 kernel, the 32bit userland will still work with that, but then one could have individual programs make use of advanced ARMv8 features.)
Alternatively, Debian itself appears to run natively (without any Raspbian patches) on the Raspberry Pi 3 in ARMv8 mode (with aarch64 as architecture), at least if I read the wiki correctly - haven't tried it myself. Are there any plans to support the Revolution Core 3 under aarch64 based directly off Debian (+ some firmware packages)? Especially since Debian Stretch (and by extension jessie-backports) now has kernel 4.9 + RT patch already included on some architectures, albeit not ARM?
The prospect of utilizing the 64-bit capabilities is tantalizing, but doing this properly and creating a fully-fledged, well-tested product, in addition to our Raspbian-based image, would require double the existing manpower. Of course if a customer shows up tomorrow and orders thousands of devices on the condition of 64-bit support, it would be a different story.
Sure, completely understandable. And sorry, while it's likely we'll order more devices in the future, it won't be on the order of 1000s.
(Note thought that 64bit itself is not really that interesting on the RevPI, as even the CM3's RAM and Flash fit into 32 bits each, the interesting part is the new instruction set, more capabilities by the CPU such as more registers, and the (k)ASLR possibilities when having a larger address space.)
I haven't tested Debian aarch64 either but cannot imagine that it will support the chips on the RevPi Core logic board out of the box.
i only got my information from https://wiki.debian.org/RaspberryPi3
, which doesn't mention USB at all, just that GPU, WiFi and Bluetooth doesn't work. WiFi and Bluetooth isn't available anyway in the RevPi, so only the GPU could be an isssue. (But at least not a huge deal for us, as we're not using it even with the current RevPi CM1 we have here.) Of course, people might not have tested the other parts at all and they might also not work.
Anyway, thanks again for your quick reply and thanks for the great product!