Symmetric multiprocessing in your keyboard
While my daughter sleeps during my parental leave, I manage to get up to more than I thought I would. This time, it's a deep dive into QMK.
Overview
This writeup is about how I enabled multicore processing on my keyboard. The structure is as follows:
- A short intro to QMK.
- A dive into keyboards, briefly how they function.
- Microcontrollers and how they interface with the keyboard.
- Threading on Chibios.
- Multithread vs multicore, concurrency vs parallelism.
- Tying it together.
QMK and custom keyboards
QMK is open source firmware for keyboards. It provides implementations for most custom keyboard functionality, like key presses (that one's obvious), rotary encoders, and oled screens.
It can be thought of as an OS for your keyboard, and it can be configured with plain json, with online tools, and with other simple tools that you don't need to be able to program to use.
But you can also get right into the code if you want, which is where it gets interesting.
QMK structure
Saying that QMK is like an OS for your keyboard might drive some pedants mad, since QMK actually packages an OS and installs it on your keyboard, configured and with your additions.
Most features are toggled by defining constants in various make or header files, like:
#pragma once
// Millis
#define OLED_UPDATE_INTERVAL 50
#define OLED_SCROLL_TIMEOUT 0
#define ENCODER_RESOLUTION 2
// Need to propagate oled data to right side
#define SPLIT_TRANSACTION_IDS_USER OLED_DATA_SYNC
It also exposes some APIs that provide curated functionality. Here's an example from the oled driver:
// Writes a string to the buffer at current cursor position
// Advances the cursor while writing, inverts the pixels if true
void oled_write(const char *data, bool invert);
This API lets you write text to an oled screen, which is very convenient.
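To make "write at the cursor, advance it, optionally invert" concrete, here is a tiny desktop mock. It is a hypothetical stand-in, not QMK's oled driver: mock_oled_t, mock_oled_write, the newline handling and the 21×4 character grid are all made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical desktop mock, NOT QMK's driver: a character grid standing in
// for the oled buffer, to illustrate "write at cursor, advance, invert".
#define MOCK_COLS 21
#define MOCK_ROWS 4

typedef struct {
    char chars[MOCK_ROWS][MOCK_COLS];
    bool inverted[MOCK_ROWS][MOCK_COLS];
    size_t cursor; // linear cell index: row * MOCK_COLS + col
} mock_oled_t;

void mock_oled_clear(mock_oled_t *oled) {
    for (size_t r = 0; r < MOCK_ROWS; r++) {
        for (size_t c = 0; c < MOCK_COLS; c++) {
            oled->chars[r][c] = ' ';
            oled->inverted[r][c] = false;
        }
    }
    oled->cursor = 0;
}

// Writes a string at the cursor and advances the cursor past it.
// '\n' jumps to the start of the next line; the cursor wraps back to the
// top-left cell once the last cell has been passed.
void mock_oled_write(mock_oled_t *oled, const char *data, bool invert) {
    for (; *data != '\0'; data++) {
        if (*data == '\n') {
            oled->cursor = (oled->cursor / MOCK_COLS + 1) * MOCK_COLS;
        } else {
            size_t row = oled->cursor / MOCK_COLS;
            size_t col = oled->cursor % MOCK_COLS;
            oled->chars[row][col] = *data;
            oled->inverted[row][col] = invert;
            oled->cursor++;
        }
        oled->cursor %= MOCK_ROWS * MOCK_COLS;
    }
}
```

Writing "Layer" leaves the cursor at cell 5, so the next call simply continues where the last one stopped, which is what makes this style of API pleasant for status screens.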
Crucially, QMK does actually ship an OS, in my case Chibios, a full-featured RTOS. That OS contains the drivers for my microcontrollers, and from my custom code I can interface with the operating system directly.
Keyboards keyboards keyboards
I have been building keyboards since I started working as a programmer. There is much that can be said about them, but not a lot of it is particularly interesting. I'll give a brief explanation of how they work.
Keyboard internals
A keyboard is like a tiny computer that tells the OS (the other one, the one not in the keyboard) which keys are being pressed.
Here are three arbitrarily chosen important components of a keyboard:
- The Printed Circuit Board (PCB): a board that connects all the keyboard components. If you're thinking, "Hey, that's a motherboard!", then you aren't far off. Split keyboards (usually) have two PCBs working in tandem, connected by (usually) an aux cable.
- The microcontroller: the actual computer part that you program. It can be integrated directly into the PCB or soldered onto it.
- The switches: when pressed, they connect circuits on the PCB, which the microcontroller can see and interpret as a key being pressed.
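Keyboards like these typically wire the switches in a grid of rows and columns (the "matrix"), and the microcontroller finds pressed keys by driving one row at a time and reading the columns. Here is a minimal sketch of a single scan with the hardware simulated by an array. ROWS, COLS, switch_closed and scan_matrix are hypothetical names with arbitrary sizes, not QMK's matrix code, and real firmware also debounces the readings.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical sketch, NOT QMK's matrix code: the physical matrix is
// simulated by an array saying which switches are currently closed.
#define ROWS 4
#define COLS 6

static bool switch_closed[ROWS][COLS]; // simulated hardware state

// On real hardware this would be "drive row, read column pin";
// here it just consults the simulated switch state.
static bool read_column_while_row_driven(int row, int col) {
    return switch_closed[row][col];
}

// One matrix scan: drive each row in turn and record which columns see a
// closed circuit. Bit c of rows_out[r] is set if key (r, c) is held down.
void scan_matrix(uint8_t rows_out[ROWS]) {
    for (int r = 0; r < ROWS; r++) {
        uint8_t bits = 0;
        for (int c = 0; c < COLS; c++) {
            if (read_column_while_row_driven(r, c)) {
                bits |= (uint8_t)(1u << c);
            }
        }
        rows_out[r] = bits;
    }
}
```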
Back to the story
I used an Iris for years and loved it, but some pretty impressive ARM-based (rather than AVR) microcontrollers came out, surpassing the AVR ones in cost-efficiency, memory, and speed while remaining compatible, so I felt I needed an upgrade.
A colleague tipped me off about the Lily58, which takes any pro-micro-compatible microcontroller, so I bought one, alongside a couple of RP2040-based microcontrollers.
RP2040 and custom microcontrollers
Another slight derailment: the RP2040 is a microcontroller chip with Arm Cortex-M0+ CPU cores. Keyboard makers take this kind of chip and build custom controller boards around it to fit keyboards. Since pro-micro controllers have influenced a lot of keyboard PCBs, many new controller designs fit onto a PCB the same way a pro-micro does. This means you can often use many combinations of microcontrollers with many combinations of PCBs.
The RP2040 is pretty fast, cheap, and has two Cortex-M0+ cores, TWO CORES, why would someone even need that? But if there are two cores on there, then they should both definitely be used.
Back to the story, pt2
I was finishing up my keyboard and realized that oled rendering is by default set to an update interval of 50ms, so as not to impact the matrix scan rate. (The matrix scan rate is how often the microcontroller checks the PCB for which keys are being held down. If scans get delayed for too long, key presses and releases may not be registered correctly, which breaks the keyboard's core functionality.)
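Why the interval matters on a single core can be shown with a toy simulation of the main loop: every iteration scans the matrix, and every so often it also renders the oled, which delays the next scan by the full render time. The timings are made-up inputs, not measurements, and simulate_loop and SIM_OLED_UPDATE_INTERVAL_MS are hypothetical names mirroring OLED_UPDATE_INTERVAL from the config above.

```c
#include <stdint.h>

// Hypothetical toy model of a single-core main loop (not QMK's code):
// scan the matrix every iteration, render the oled only once the update
// interval has passed. All timings are made-up inputs, in milliseconds.
#define SIM_OLED_UPDATE_INTERVAL_MS 50 // mirrors OLED_UPDATE_INTERVAL

typedef struct {
    uint32_t now_ms;         // simulated clock
    uint32_t last_render_ms; // when the last render finished
    uint32_t scans;          // matrix scans performed
    uint32_t renders;        // oled renders performed
    uint32_t longest_gap_ms; // longest time from one scan's start to the next
} loop_stats_t;

// Runs until duration_ms of simulated time has passed. scan_ms must be > 0.
loop_stats_t simulate_loop(uint32_t duration_ms, uint32_t scan_ms, uint32_t render_ms) {
    loop_stats_t s = {0};
    while (s.now_ms < duration_ms) {
        uint32_t iteration_start = s.now_ms;

        s.now_ms += scan_ms; // matrix scan
        s.scans++;

        if (s.now_ms - s.last_render_ms >= SIM_OLED_UPDATE_INTERVAL_MS) {
            s.now_ms += render_ms; // rendering blocks the next scan
            s.last_render_ms = s.now_ms;
            s.renders++;
        }

        uint32_t gap = s.now_ms - iteration_start;
        if (gap > s.longest_gap_ms) {
            s.longest_gap_ms = gap;
        }
    }
    return s;
}
```

With a 1ms scan and a 10ms render, simulate_loop(100, 1, 10) performs 90 scans and 1 render, and the longest gap between scans is 11ms. A longer interval makes those stalls rarer, but it doesn't make them shorter.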
Now I had found the purpose of multicore: if rendering to the oled takes time, then that job could (and therefore should) be shoveled onto a different thread. My keyboard has two cores, so I should parallelize this by using a thread!
Chibios and threading
Chibios is very well documented; it has a section on threading and even a convenience function for spawning a static thread.
It can be used like this:
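#include "ch.h"

// Statically allocated stack and thread structure for the worker
static THD_WORKING_AREA(my_thread_area, 512);

static THD_FUNCTION(my_thread_fn, arg) {
    (void)arg; // unused
    // Cool function body
}

void start_worker(void) {
    // Pass sizeof(my_thread_area): THD_WORKING_AREA adds overhead on top of the 512 bytes
    thread_t *thread_ptr = chThdCreateStatic(my_thread_area, sizeof(my_thread_area), NORMALPRIO, my_thread_fn, NULL);
    (void)thread_ptr; // keep the handle if you need to signal or wait on the thread later
}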
Since my CPU has two cores, spawning a thread would parallelize the work, or so I thought, so I went for it (this is foreshadowing).
After wrangling some mutex locks and messing with the firmware to remove race conditions, I had a multithreaded implementation that could offload rendering to the oled display onto a separate thread. Great! Now, why is performance so bad?
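The mutex wrangling boils down to a pattern like the one below. This is a sketch that uses POSIX threads as a desktop stand-in for Chibios's own mutexes, not the code from my fork: the key-processing side updates a small shared state under the lock, and the render thread copies a snapshot out and does the slow drawing without holding the lock. shared_oled_state_t, set_oled_text and snapshot_oled_text are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical sketch, POSIX threads standing in for Chibios mutexes.
typedef struct {
    pthread_mutex_t lock;
    char text[32];    // what the oled should show
    unsigned version; // bumped on every update so the renderer can skip redraws
} shared_oled_state_t;

static shared_oled_state_t g_state = { PTHREAD_MUTEX_INITIALIZER, "", 0 };

// Called from key processing: cheap, holds the lock only briefly.
void set_oled_text(const char *text) {
    pthread_mutex_lock(&g_state.lock);
    snprintf(g_state.text, sizeof g_state.text, "%s", text);
    g_state.version++;
    pthread_mutex_unlock(&g_state.lock);
}

// Called from the render thread: copies the state out, so the slow drawing
// never happens while the lock is held. Returns the version copied.
unsigned snapshot_oled_text(char *out, size_t out_len) {
    pthread_mutex_lock(&g_state.lock);
    snprintf(out, out_len, "%s", g_state.text);
    unsigned version = g_state.version;
    pthread_mutex_unlock(&g_state.lock);
    return version;
}
```

Keeping both critical sections this short is what stops the render thread from holding up key processing, and comparing version numbers lets the renderer skip a redraw when nothing changed.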
Multithread != Multicore, an RTOS is not the same as a desktop OS
When I printed the core id of the thread rendering to the oled display, it was 0. I wasn't actually using the extra core, which would have core id 1.
The assumption that "if I have two cores and two threads, both threads should be running, or at least be available to accept tasks, almost 100% of the time" does not hold here.
It would hold up better on a regular OS like Linux, but on Chibios it's a bit more explicit.
Note: I'm disregarding that Chibios spawns both a main thread and an idle thread (on the same core) by default, so there's more than one thread to begin with, although that's not particularly important to performance.
On concurrency vs parallelism
Threading without multiprocessing can produce concurrency, like in Python with the GIL enabled. A programmer can run multiple tasks at once, and if those tasks don't need CPU time (for example, when they are waiting for some io), they can all make progress at the same time, which is why Python with the GIL can run web servers pretty well. However, tasks that need CPU time to make progress will not benefit from having more threads in the single-core case.
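The CPU-bound case can be checked with a toy scheduler model: each task needs some ticks of CPU time, and on every tick each core advances one unfinished task, round-robin. This is a simplified hypothetical model (ticks_to_finish is made up, and real schedulers use time slices, priorities and preemption), but it shows the arithmetic.

```c
#include <stddef.h>

// Hypothetical toy scheduler model (not Chibios): each task needs work[i]
// ticks of CPU time. On every tick, each core advances one unfinished task,
// picked round-robin; a task can't run on two cores in the same tick.
// Returns the number of ticks until every task is done (0 on bad input).
#define MAX_TASKS 16

unsigned ticks_to_finish(const unsigned *work, size_t n_tasks, unsigned n_cores) {
    unsigned remaining[MAX_TASKS];
    size_t unfinished = 0;

    if (n_tasks > MAX_TASKS || n_cores == 0) {
        return 0;
    }
    for (size_t i = 0; i < n_tasks; i++) {
        remaining[i] = work[i];
        if (work[i] > 0) {
            unfinished++;
        }
    }

    unsigned ticks = 0;
    size_t next = 0; // where the round-robin scan starts this tick
    while (unfinished > 0) {
        unsigned busy = 0; // cores used this tick
        size_t last = next;
        for (size_t k = 0; k < n_tasks && busy < n_cores; k++) {
            size_t i = (next + k) % n_tasks;
            if (remaining[i] == 0) {
                continue;
            }
            remaining[i]--; // task i gets one tick on some core
            if (remaining[i] == 0) {
                unfinished--;
            }
            busy++;
            last = i;
        }
        next = (last + 1) % n_tasks;
        ticks++;
    }
    return ticks;
}
```

For two tasks needing 4 ticks each, this gives 8 ticks on one core, exactly the same as running them back to back, and 4 ticks on two cores. More threads on one core only interleave the work; more cores shorten it.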
One more caveat is blocking tasks that do not park the thread; here it comes down to how the OS decides to schedule things. Say that, in a single-core scenario, the main thread offloads some io work to a separate thread, and the OS schedules (simplified) 1 millisecond for the io thread. If that thread just sits there waiting for the io to complete, the application makes no progress for that millisecond. One way to mitigate this is to park the waiting thread inside the io API and wake it up on some condition; that way, the blocking io won't hang the application.
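Parking usually means waiting on a synchronization primitive instead of spinning. Here is a minimal sketch with a POSIX condition variable as a desktop stand-in (Chibios has its own semaphores and condition variables); io_request_t, io_wait and io_complete are hypothetical names. While io_wait is blocked in pthread_cond_wait, the thread is not runnable, so the scheduler can give its time to other threads.

```c
#include <pthread.h>
#include <stdbool.h>

// Hypothetical sketch: a waiter that parks on a condition variable instead
// of busy-waiting, so it uses no CPU time until the io completes.
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t done_cv;
    bool done;
    int result;
} io_request_t;

void io_request_init(io_request_t *req) {
    pthread_mutex_init(&req->lock, NULL);
    pthread_cond_init(&req->done_cv, NULL);
    req->done = false;
    req->result = 0;
}

// Parks the calling thread until io_complete is called.
int io_wait(io_request_t *req) {
    pthread_mutex_lock(&req->lock);
    while (!req->done) { // loop guards against spurious wakeups
        pthread_cond_wait(&req->done_cv, &req->lock);
    }
    int result = req->result;
    pthread_mutex_unlock(&req->lock);
    return result;
}

// Called by the thread that finishes the io; wakes the parked waiter.
void io_complete(io_request_t *req, int result) {
    pthread_mutex_lock(&req->lock);
    req->result = result;
    req->done = true;
    pthread_cond_signal(&req->done_cv);
    pthread_mutex_unlock(&req->lock);
}
```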
In my case, SMP not being enabled meant that the oled drawing thread simply got starved of CPU time, which made drawing to the oled painfully slow. Even if it hadn't been starved, there could have been a performance hit, because it would have competed with regular key processing for the same core.
Parallelism
I know I have two cores, so parallelism should be possible; I'll just have to enable symmetric multiprocessing (SMP). With SMP, both cores run the OS and share the same memory, so the processor can actually do things in parallel. It's not enabled by default, but Chibios has some documentation on this.
Enabling SMP turns out not to be trivial: it needs a config flag for Chibios, a make flag when building for the platform (RP2040), and some other fixes.
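For reference, the Chibios side of the switch might look roughly like this, assuming the option is the CH_CFG_SMP_MODE setting from Chibios's chconf.h and using QMK's usual pattern of a keyboard-level chconf.h that includes the default one. This is a config fragment sketch, not the exact change in my fork, and the RP2040 make flag and the other fixes aren't shown.

```c
// Keyboard-level chconf.h (sketch): pull in the default config, then override.
#pragma once

#include_next <chconf.h>

// Assumption: CH_CFG_SMP_MODE is the Chibios switch for SMP support.
#undef CH_CFG_SMP_MODE
#define CH_CFG_SMP_MODE TRUE
```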
So I had to mess with the firmware once more, but by checking some flags in the code and some internal structures, I can see that Chibios is now compiled ready to use SMP. It even has a reference to my other core's context that I can use, &ch1 (&ch0 is core 0).
On Linux, multicore and multithreading are opaque: you spawn a thread, and it runs on some core (again assuming that SMP is enabled, which it generally is on servers and desktops). On Chibios, a thread you spawn runs on the core that spawned it by default.
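On Linux you normally let the scheduler pick, but you can still choose a core explicitly. Below is a Linux-only sketch using the GNU extensions sched_getaffinity, pthread_setaffinity_np and sched_getcpu: it pins the calling thread to the first CPU it is allowed to run on and returns that CPU's id. pin_self_to_first_allowed_cpu is a hypothetical name, and this is a desktop analog for comparison, not Chibios code.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

// Linux-only sketch: pin the calling thread to the first CPU in its current
// affinity mask. Returns that CPU's id, or -1 on failure.
int pin_self_to_first_allowed_cpu(void) {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // pid 0 means "the calling thread"
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0) {
        return -1;
    }
    int target = -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            target = cpu;
            break;
        }
    }
    if (target < 0) {
        return -1;
    }
    cpu_set_t only;
    CPU_ZERO(&only);
    CPU_SET(target, &only);
    if (pthread_setaffinity_np(pthread_self(), sizeof only, &only) != 0) {
        return -1;
    }
    return target;
}
```

Once it returns, sched_getcpu() reports that same CPU for the thread, which is the desktop equivalent of printing the core id from the oled thread.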
Back to the docs: I see that I can instead create a thread from a thread descriptor, which takes a reference to the instance context, &ch1. Perfect, now I'll spawn a thread on the other core and live happily ever after.
WRONG!
The oled is still being drawn from core 0.
Checking the Chibios source code, I see that it falls back to &ch0 if &ch1 is null. Now, why is it null?
Main 2, a single main function is for suckers
Browsing through the Chibios repo, I find the next piece of the puzzle: a demo someone made of SMP on the RP2040. It needs a separate main function where the instance context (&ch1) for the new core is initialized. I write some shim code, struggle with some more configuration, and finally, core 1 is doing the oled work.
Performance is magical, and it's all worth it in the end.
Conclusion
My keyboard now runs multicore. I've offloaded all non-trivial work to core 1 so that core 0 can do the time-sensitive matrix scanning, and I can draw to the oled display as much and as often as I want.
I had to mess a bit with the firmware to specify that there is an extra core on the RP2040, and to keep QMK's hands off the oled state, since that code isn't thread-safe.
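An alternative to locking for this kind of hand-off is a single-producer/single-consumer ring buffer: core 0 pushes small draw requests, and core 1 pops them and is the only code that ever touches oled state. This is a hypothetical sketch, not the approach described above (which uses mutex locks), and draw_request_t and the spsc_* functions are made-up names. It sticks to plain atomic loads and stores with no read-modify-write operations, since ARMv6-M cores like the Cortex-M0+ lack exclusive-access instructions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical single-producer/single-consumer queue: core 0 produces draw
// requests, core 1 consumes them. Neither side ever blocks.
#define QUEUE_SIZE 8 // must be a power of two

typedef struct {
    uint8_t layer;  // hypothetical payload: what to draw
    bool caps_lock;
} draw_request_t;

typedef struct {
    draw_request_t items[QUEUE_SIZE];
    atomic_uint head; // next slot to write, only advanced by the producer
    atomic_uint tail; // next slot to read, only advanced by the consumer
} spsc_queue_t;

void spsc_init(spsc_queue_t *q) {
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

// Producer side. Returns false (drops the request) if the queue is full.
bool spsc_push(spsc_queue_t *q, draw_request_t req) {
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head - tail == QUEUE_SIZE) {
        return false; // full
    }
    q->items[head % QUEUE_SIZE] = req;
    // release: the item written above is visible before the new head is
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

// Consumer side. Returns false if there is nothing to draw.
bool spsc_pop(spsc_queue_t *q, draw_request_t *out) {
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (head == tail) {
        return false; // empty
    }
    *out = q->items[tail % QUEUE_SIZE];
    // release: the slot is only handed back after it has been read
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}
```

If the queue is full, spsc_push drops the request instead of waiting, so core 0's matrix scanning never stalls on the renderer.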
In reality, this kind of optimization probably isn't necessary for most users. But if the keyboard does work that's triggered by key processing, such as rgb animations, oled animations, and similar, offloading that work to a separate core could improve performance and allow more of it on a given keyboard.
The code is in my fork here, with the commits labeled [FIRMWARE] being the ones that mess with the firmware.
The keyboard-specific code is contained here, on the same branch.
I hope this was interesting to someone!