author | Vincent Palatin <vpalatin@chromium.org> | 2017-02-13 16:49:24 +0100 |
---|---|---|
committer | chrome-bot <chrome-bot@chromium.org> | 2019-02-19 22:15:32 -0800 |
commit | ebcfc5d45b349e0b5e6705e9813e4d7240264f49 (patch) | |
tree | 6a6c3592b9c7f3f196fe5336d3f5c95b853da7b4 | |
parent | 869f0477a16c7a41fb3e6128cf8c852cee0dd59a (diff) | |
download | chrome-ec-ebcfc5d45b349e0b5e6705e9813e4d7240264f49.tar.gz |
docs: add core runtime design documentation
Start documenting some of the Chromium EC runtime principles and
choices.
W.I.P: still a bunch of TODOs
BRANCH=none
BUG=none
TEST=~/trunk/chromium/src/tools/md_browser/md_browser.py -d docs
Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
Change-Id: I9f3d27ab752a714626cc4ec312771367ff67fcea
Reviewed-on: https://chromium-review.googlesource.com/445941
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Aaron Durbin <adurbin@chromium.org>
-rw-r--r-- | docs/core_runtime.md | 345 |
1 files changed, 345 insertions, 0 deletions
diff --git a/docs/core_runtime.md b/docs/core_runtime.md
new file mode 100644
index 0000000000..9d81b807be
--- /dev/null
+++ b/docs/core_runtime.md
@@ -0,0 +1,345 @@
+Chromium OS Embedded Controller runtime
+=======================================
+
+Design principles
+-----------------
+
+ 1. Never do at runtime what you can do at compile time
+The goal is saving flash space and computations.
+Compile-time configuration until you really need to switch at runtime.
+
+ 2. Real-time: guarantee low latency (e.g. < 20 us)
+no interrupt disabling ...
+bounded code in interrupt handlers.
+
+ 3. Keep it simple: design for the subset of microcontrollers we use
+targeted at 32-bit single core CPU
+for small systems: 4kB to 64kB data RAM, possibly execute-in-place from flash.
+
+Execution contexts
+------------------
+
+This is a pre-emptible runtime with static tasks.
+It has only 2 possible execution contexts:
+
+- the regular [tasks](#tasks)
+- the [interrupt handlers](#interrupts)
+
+The initial startup is an exception as described in the
+[dedicated paragraph](#Startup).
+
+### tasks
+
+The tasks are statically defined at compile-time.
+They are described for each *board* in the
+[board/$board/ec.tasklist](../board/host/ec.tasklist) file.
+
+They also have a static fixed priority implicitly defined at compile-time by
+their order in the [ec.tasklist](../board/host/ec.tasklist) file (the top-most
+one being the lowest priority aka *task* *1*).
+As a consequence, two different tasks cannot have the same priority.
+
+In order to store its context, each task has its own stack whose (*small*) size
+is defined at compile-time in the [ec.tasklist](../board/host/ec.tasklist) file.
+
+A task can normally be preempted at any time by either interrupts or higher
+priority tasks, see the [preemption section](#scheduling-and-preemption) for
+details and the [locking section](#locking-and-atomicity) for the few cases
+where you need to avoid it.
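As an illustration of the priority-by-order rule described in the tasks section, the C sketch below derives task IDs from the position of each entry in a list. It is only a sketch under assumptions: the `EXAMPLE_TASK_LIST` macro, the task names other than HOOKS, the `task_has_priority_over()` helper, and the idle task at ID 0 are hypothetical and do not reproduce the real `ec.tasklist` syntax.

```c
/*
 * Hypothetical task list: the top-most entry gets the lowest priority.
 * This is NOT the real ec.tasklist format, only an X-macro illustration.
 */
#define EXAMPLE_TASK_LIST \
	TASK(HOOKS) \
	TASK(CHARGER) \
	TASK(HOSTCMD) \
	TASK(CONSOLE)

/* Expand each entry into an enumerator, so the ID follows the list order. */
#define TASK(name) TASK_ID_##name,
enum task_id {
	TASK_ID_IDLE, /* assumption: an idle task sits below all listed tasks */
	EXAMPLE_TASK_LIST
	TASK_ID_COUNT
};
#undef TASK

/*
 * Since the ID is the priority, two tasks can never share a priority and
 * comparing priorities is a plain integer comparison.
 */
static inline int task_has_priority_over(enum task_id a, enum task_id b)
{
	return a > b;
}
```

With this layout `TASK_ID_HOOKS` evaluates to 1, matching the "task 1" wording in the text, and reordering the list is the only way to change a priority.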
+
+### interrupts
+
+The hardware interrupt requests are connected to the interrupt handling
+*C* routines declared by the `DECLARE_IRQ` macros, through some chip/core
+specific mechanisms (e.g. depending on whether we have a vectored interrupt
+controller, slave interrupt controllers...)
+
+The interrupts can be nested (i.e. interrupted by a higher priority interrupt).
+All the interrupt vectors are assigned a priority as defined in their
+`DECLARE_IRQ` macro. The number of available priority levels is
+architecture-specific (e.g. 4 on Cortex-M0, 8 on Cortex-M3/M4) and several
+interrupt handlers can have the same priority. An interrupt handler can only be
+interrupted by a handler having a priority **strictly** **greater** than
+its own.
+
+In most cases, the exceptions (e.g. data/prefetch aborts, software interrupt) can
+be seen as interrupts with a priority strictly greater than all IRQ vectors.
+So they can interrupt any IRQ handler using the same nesting mechanism.
+All fatal exceptions should ultimately lead to a reboot.
+
+### Events
+
+Each task has a *pending* events bitmap[1] implemented as a 32-bit word.
+Several events are pre-defined for all tasks, the most significant bits of the
+32-bit bitmap are reserved for them: the timer pending event on bit 31
+([see the corresponding section](#Timers)), the requested task wake (bit 29),
+the event to kick the waiters on a mutex (bit 30), along with a few hardware
+specific events.
+The 19 least significant bits are available for task-specific meanings.
+
+Those event bits are used in the inter-task communication and scheduling
+mechanisms: other tasks **and** interrupt handlers can atomically set them to request
+specific actions from the task. Therefore, the presence of pending events in a
+task bitmap has an impact on its scheduling as described in the [scheduling
+section](#scheduling-and-preemption).
+These requests are done using the `task_set_event()` and `task_wake()`
+primitives.
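The bit layout of the per-task event bitmap described in the Events section can be written down as a few C constants. This is a minimal sketch: the bit positions and the 19-bit custom range come from the text, while the names `TASK_EVENT_MUTEX`, `TASK_EVENT_WAKE`, `TASK_EVENT_CUSTOM_MASK` and both helper functions are assumptions chosen for the example.

```c
#include <stdint.h>

/* Pre-defined events on the most significant bits, as listed in the text. */
#define TASK_EVENT_TIMER (UINT32_C(1) << 31) /* timer pending */
#define TASK_EVENT_MUTEX (UINT32_C(1) << 30) /* kick the mutex waiters */
#define TASK_EVENT_WAKE  (UINT32_C(1) << 29) /* requested task wake */

/* The 19 least significant bits carry task-specific meanings. */
#define TASK_EVENT_CUSTOM_MASK UINT32_C(0x0007ffff)

/* Hypothetical helper: keep only the task-specific part of a bitmap. */
static inline uint32_t custom_events(uint32_t pending)
{
	return pending & TASK_EVENT_CUSTOM_MASK;
}

/* Hypothetical helper: non-zero if the timer event is pending. */
static inline int timer_event_pending(uint32_t pending)
{
	return (pending & TASK_EVENT_TIMER) != 0;
}
```

A task would typically test the reserved bits first and then dispatch on the remaining custom bits, so the two ranges never collide.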
+
+The two typical use-cases are:
+
+- a task sends a message to another task (simply using some common memory
+  structures, [see explanation](#single-address-space)) and wants it to process
+  it now.
+- a hardware IRQ occurred and we need to do some long processing to respond to
+  it (e.g. an I2C transaction). The associated interrupt handler cannot do it
+  (for latency reasons), so it will raise an event to ask a task to do it.
+
+The task code chooses to consume them (or a subset of them) when it's running
+through the `task_wait_event()` and `task_wait_event_mask()` primitives.
+
+### Scheduling and preemption
+
+The system has a global bitmap[1] called `tasks_ready` containing one bit
+per task and indicating whether or not it is *ready* *to* *run*
+(i.e. wants/needs to be scheduled).
+A task's ready bit can only be cleared when the task itself calls one of the
+functions explicitly triggering a re-scheduling (e.g. `task_wait_event()`
+or `task_set_event()`) **and** it has no pending event.
+The task ready bit is set by any task or interrupt handler setting an event
+bit for the task (i.e. `task_set_event()`).
+
+The scheduling is based on (and *only* on) the `tasks_ready` bitmap
+(which is derived from all the events bitmap of the tasks as explained above).
+
+Then, the scheduling policy to find which task should run is just finding the
+most significant bit set in the `tasks_ready` bitmap and scheduling the corresponding task.
+
+Important note: the re-scheduling happens **only** when we are exiting the interrupt context.
+It is done in a non-preemptible context (likely with the highest priority).
+Indeed, a re-scheduling is actually needed only when the highest priority task ready has changed.
+There are 3 distinct cases where this can happen:
+
+- an interrupt handler sets a new event for a task.
+  In this case, `task_set_event` will detect that it is executed in interrupt
+  context and record in the `need_resched_or_profiling` variable that it might
+  need to re-schedule at interrupt return. When the current interrupt is going
+  to return, it will see this bit and decide to take the slow path making a new
+  scheduling decision and possibly a context switch instead of the fast path
+  returning to the interrupted task.
+- a task sets an event on another task.
+  The runtime will trigger a software interrupt to force a re-scheduling at its
+  exit.
+- the running task voluntarily relinquishes its current execution rights by
+  calling `task_wait_event()` or a similar function.
+  This will call the software interrupt similarly to the previous case.
+
+On the re-scheduling path, if the highest-priority ready task does not match
+the currently running one, it will perform a context-switch by saving all the
+processor registers on the current task stack, switching the stack pointer to the
+newly scheduled task, and restoring the registers from the previously saved
+context from there.
+
+### hooks and deferred function
+
+The lowest priority task (i.e. Task 1, aka TASK_ID_HOOKS) is reserved to execute
+repetitive actions and future actions deferred in time without blocking the
+current task or creating a dedicated task (whose stack memory allocation would
+be wasting precious RAM).
+
+The HOOKS task has a list of deferred functions and their next deadline.
+Every time it is woken up, it runs through the list and calls the ones whose
+deadline is expired. Before going back to sleep, it arms a timer to the closest
+deadline.
+The deferred functions can be created using the `DECLARE_DEFERRED()` macro.
+Similarly the HOOK_SECOND and HOOK_TICK hooks are called periodically by the
+HOOKS task loop (the *tick* duration is platform-defined and shorter than
+the second).
+
+Note: be especially careful about priority inversions when accessing resources
+protected by a mutex (e.g. a shared I2C controller) in a deferred function.
+Indeed, being the lowest priority task, it might be de-scheduled for a long time
+and starve higher priority tasks trying to access the resource given there is
+no priority boosting implemented for this case.
+Also be careful about long delays (> x 100us) in hook or deferred function
+handlers, since those will starve other hooks of execution time. It is better
+to implement a state machine where you set up a subsequent call to a deferred
+function than have a long delay in your handler.
+
+### watchdog
+
+The system is always protected against misbehaving tasks and interrupt handlers
+by a hardware watchdog rebooting the CPU when it is not attended.
+
+The watchdog is petted in the HOOKS task, typically by declaring a HOOK_TICK
+doing it at regular intervals. Given this is the lowest priority task,
+this guarantees that all tasks are getting some run time during the watchdog
+period.
+
+Note: that's also why one should not sprinkle one's code with `watchdog_reload()`
+to paper over long-running routine issues.
+
+To help debugging bad sequences triggering watchdog reboots, most platforms
+implement a warning mechanism defined under `CONFIG_WATCHDOG_HELP`.
+It's a timer firing at the middle of the watchdog period if it hasn't been
+petted by then, and dumping on the console the current state of the execution
+mainly to help find a stuck task or handler. The normal execution is resumed
+though after this alert.
+
+### Startup
+
+The startup sequence goes through the following steps:
+
+- the assembly entry routine clears the .bss (uninitialized data),
+  copies the initialized data (and optionally the code if we are not executing
+  from flash), sets a stack pointer.
+- we can jump to the `main()` C routine at this point.
+- then we go through the hardware pre-init (before we have all the clocks to
+  run the peripherals normally) and init routines, in this rough order:
+  memory protection if any, gpios in their default state,
+  prepare the interrupt controller, set the clocks, then timers,
+  enable interrupts, init the debug UART and the watchdog.
+- finally start tasks.
+
+For the tasks startup, initially only the HOOKS task is marked as ready,
+so it is the first to start and can call all the HOOK_INIT handlers performing
+initializations before actually executing any real task code.
+Then all tasks are marked as ready, and the highest priority one is given
+the control.
+
+During all the startup sequence until the control is given to the first task,
+we are using a special stack called 'system stack' which will be later re-used
+as the interrupts and exception stack.
+
+To prepare the first context switch, the code in `task_pre_init()` is stuffing
+all the tasks stacks with a *fake* saved context whose program counter is
+containing the task start address and the stack pointer is pointing to its
+reserved stack space.
+
+### locking and atomicity
+
+The two main concurrency primitives are lightweight atomic variables and
+heavier mutexes.
+
+The atomic variables are 32-bit integers (which can usually be loaded/stored
+atomically on the architecture we are supporting). The `atomic.h` headers
+include primitives to do atomically various bit and arithmetic operations
+using either load-linked/load-exclusive, store-conditional/store-exclusive
+or simple depending on what is available.
+
+The mutexes are actually statically allocated binary semaphores.
+In case of contention, they will make the waiting task sleep
+(removing its ready bit) and use the [event mechanism](#Events) to wake-up
+the other waiters on unlocking.
+
+Note: the mutexes are NOT triggering any priority boosting to avoid the
+priority inversion phenomenon.
+
+Given the runtime is running on a single core CPU, spinlocks would be equivalent
+to masking interrupts with `interrupt_disable()`, but doing so is
+strongly discouraged to avoid harming the real-time characteristics of the runtime.
+
+Time
+----
+
+### time keeping
+
+In the runtime, the time is accounted everywhere using a
+**64-bit** **microsecond** count since the microcontroller **cold** **boot**.
+
+Note: The runtime has no notion of wall-time/date, even though a few platforms have
+an RTC inside the microcontroller.
+
+These microsecond timestamps are implemented in the code using the `timestamp_t`
+type and the current timestamp is returned by the `get_time()` function.
+
+The time-keeping is preferably implemented using a 32-bit hardware
+free running counter at 1 MHz plus a 32-bit word in memory keeping track of
+the high word of the 64-bit absolute time. This word is incremented by the
+32-bit timer rollover interrupt.
+
+Note: as a consequence of this implementation, when the 64-bit timestamp is read
+in interrupt context in a handler having a higher priority than the timer IRQ
+(which is somewhat rare), the high 32-bit word might be incoherent (off by one).
+
+### timer event
+
+The runtime offers *one* (and only one) timer per task.
+All the task timers are multiplexed on a single hardware timer.
+(can be just a *match* *interrupt* on the free running counter mentioned in the
+[previous paragraph](#time-keeping))
+Every time a timer is armed or expired, the runtime finds the task timer having
+the closest deadline and programs it in the hardware to get an interrupt.
+At the same time, it sets the TASK_EVENT_TIMER event in all tasks whose timer
+deadline has expired.
+The next deadline is computed in interrupt context.
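The multiplexing of per-task timers onto one hardware timer, as described in the timer event paragraph, can be sketched as a single pass over the task timers. This is an illustrative sketch only: `NUM_TASKS`, the `timer_deadline`, `timer_running` and `task_events` variables and the `process_timers()` function are hypothetical names, and returning `UINT64_MAX` for "nothing armed" is an assumption of the example rather than the runtime's actual behavior.

```c
#include <stdint.h>

#define NUM_TASKS 4 /* hypothetical task count for the example */
#define TASK_EVENT_TIMER (UINT32_C(1) << 31)

static uint64_t timer_deadline[NUM_TASKS]; /* absolute deadlines, in us */
static uint32_t timer_running;             /* bit n set: timer of task n armed */
static uint32_t task_events[NUM_TASKS];    /* stand-in for the event bitmaps */

/*
 * Flag every task whose deadline has expired and return the closest
 * remaining deadline, i.e. the value to program in the hardware timer.
 * Returns UINT64_MAX when no timer is left armed.
 */
static uint64_t process_timers(uint64_t now)
{
	uint64_t next = UINT64_MAX;

	for (int id = 0; id < NUM_TASKS; id++) {
		uint32_t bit = UINT32_C(1) << id;

		if (!(timer_running & bit))
			continue;
		if (timer_deadline[id] <= now) {
			/* Expired: disarm it and raise the timer event. */
			timer_running &= ~bit;
			task_events[id] |= TASK_EVENT_TIMER;
		} else if (timer_deadline[id] < next) {
			next = timer_deadline[id];
		}
	}
	return next;
}
```

Because there is one slot per task, the scan is bounded by the task count, which keeps the work done in interrupt context short and predictable.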
+
+Note: given each task has a **single** timer which is also used to wake-up the
+task when `task_wait_event()` is called with a timeout, one needs to be careful
+when using directly the `timer_arm()` function: if this timer is still
+running on the next `task_wait_event()` call, the call
+will fail due to the lack of an available timer.
+
+Memory
+------
+
+### Single address space
+
+There is no memory isolation between tasks (i.e. they all live in the same address
+space). Some architectures implement memory protection mechanisms albeit only to
+differentiate executable areas (e.g. `.code`) from writable areas (e.g. `.bss` or
+`.data`) as there is a **single** **privilege** level for all execution contexts.
+
+As all the memory is implicitly shared between the tasks, the inter-task
+communication can be done by simply writing the data structures in memory
+and using events to wake the other task (provided we have properly thought through
+the concurrent accesses on those structures).
+
+### heap
+
+The data structures should be statically allocated at compile time.
+
+Note: there is no dynamic allocator available (e.g. `malloc()`), not due to
+impossibility to create one but to avoid the negative side effects of
+having one: i.e. poor/unpredictable real-time behavior and possible leaks
+leading to a long-tail of failures.
+
+- TODO: talk about shared memory
+- TODO: where/how we store *panic* *memory* and *sysjump* *parameters*.
+
+### stacks
+
+Each task has its own stack, in addition there is a system stack used for
+startup and interrupts/exceptions.
+
+Note 1: Each task stack is relatively small (e.g. 512 bytes), so one needs to
+be careful about stack usage when implementing features.
+
+Note 2: At the same time, the total size of RAM used by stacks is a big chunk
+of the total RAM consumption, so their sizes need to be carefully tuned.
+(please refer to the [debugging paragraph](#debugging) for additional input on
+this topic.)
+
+## Firmware code organization and multiple copies
+
+- TODO: Details the classical RO / RW partitions and how we sysjump.
+
+power management
+----------------
+
+- TODO: talk about the idle task + WFI (note: interrupts are disabled!)
+- TODO: more about low power idle and the sleep-disable bitmap
+- TODO: adjusting the microsecond timer at wake-up
+
+debugging
+---------
+
+- TODO: our main tool: serial console ...
+(but non-blocking / discard overflow, cflush DO/DONT)
+- TODO: else JTAG stop and go: careful with watchdog and timer
+- TODO: panics and software panics
+- TODO: stack size tuning and canarying
+
+
+- TODO: Address the rest of the comments from https://crrev.com/c/445941
+
+[1]: bitmap: array of bits.