docs: add core runtime design documentation

Start documenting some of the Chromium EC runtime principles and choices.
W.I.P: still a bunch of TODOs

BRANCH=none
BUG=none
TEST=~/trunk/chromium/src/tools/md_browser/md_browser.py -d docs

Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
Change-Id: I9f3d27ab752a714626cc4ec312771367ff67fcea
Reviewed-on: https://chromium-review.googlesource.com/445941
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Aaron Durbin <adurbin@chromium.org>
author: Vincent Palatin <vpalatin@chromium.org> 2017-02-13 16:49:24 +0100
committer: chrome-bot <chrome-bot@chromium.org> 2019-02-19 22:15:32 -0800
commit: ebcfc5d45b349e0b5e6705e9813e4d7240264f49 (patch)
tree: 6a6c3592b9c7f3f196fe5336d3f5c95b853da7b4
parent: 869f0477a16c7a41fb3e6128cf8c852cee0dd59a (diff)
download: chrome-ec-ebcfc5d45b349e0b5e6705e9813e4d7240264f49.tar.gz
1 files changed, 345 insertions, 0 deletions
diff --git a/docs/core_runtime.md b/docs/core_runtime.md
new file mode 100644
index 0000000000..9d81b807be
--- /dev/null
+++ b/docs/core_runtime.md
@@ -0,0 +1,345 @@
+Chromium OS Embedded Controller runtime
+=======================================
+
+Design principles
+-----------------
+
+  1. Never do at runtime what you can do at compile time.
+The goal is to save flash space and computation: use compile-time
+configuration unless you really need to switch at runtime.
+
+  2. Real-time: guarantee low latency (e.g. < 20 us).
+No interrupt disabling, and bounded execution time in interrupt handlers.
+
+  3. Keep it simple: design for the subset of microcontrollers we use,
+targeted at 32-bit single-core CPUs for small systems: 4 kB to 64 kB of data
+RAM, possibly executing in place from flash.
+
+Execution contexts
+------------------
+
+This is a pre-emptible runtime with static tasks.
+It has only 2 possible execution contexts:
+
+- the regular [tasks](#tasks)
+- the [interrupt handlers](#interrupts)
+
+The initial startup is an exception, as described in the
+[dedicated paragraph](#startup).
+
+### tasks
+
+The tasks are statically defined at compile-time.
+They are described for each *board* in the
+[board/$board/ec.tasklist](../board/host/ec.tasklist) file.
+
+They also have a static fixed priority implicitly defined at compile-time by
+their order in the [ec.tasklist](../board/host/ec.tasklist) file (the top-most
+one being the lowest priority aka *task* *1*).
+As a consequence, two different tasks cannot have the same priority.
+
+In order to store its context, each task has its own stack whose (*small*) size
+is defined at compile-time in the [ec.tasklist](../board/host/ec.tasklist) file.
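The ordering rule can be illustrated with an X-macro, the kind of construct a compile-time task list lends itself to. This is a sketch, not the real `ec.tasklist` format. The three-argument `TASK_ALWAYS(name, routine, stack_size)` entries, the task names and the stack sizes are hypothetical, and reserving ID 0 for an idle task is an assumption of the sketch. It shows how list order alone yields each task ID (and therefore its priority) and its stack size, with no runtime registration.

```c
#include <stddef.h>

/* Hypothetical task entry points. */
static void hook_task(void) {}
static void charger_task(void) {}
static void console_task(void) {}

/*
 * Hypothetical task list: the first entry is the lowest priority task
 * (task 1), the last entry the highest priority one.
 */
#define TASK_LIST                                 \
	TASK_ALWAYS(HOOKS, hook_task, 512)        \
	TASK_ALWAYS(CHARGER, charger_task, 384)   \
	TASK_ALWAYS(CONSOLE, console_task, 512)

/* First expansion: task IDs in list order (0 assumed to be the idle task). */
enum task_id {
	TASK_ID_IDLE,
#define TASK_ALWAYS(name, routine, stack) TASK_ID_##name,
	TASK_LIST
#undef TASK_ALWAYS
	TASK_ID_COUNT
};

/* Second expansion: entry point and stack size, indexed by task ID. */
struct task_desc {
	void (*routine)(void);
	size_t stack_size;
};

static const struct task_desc task_descs[TASK_ID_COUNT] = {
	[TASK_ID_IDLE] = { NULL, 256 }, /* hypothetical idle stack size */
#define TASK_ALWAYS(name, routine, stack) [TASK_ID_##name] = { routine, stack },
	TASK_LIST
#undef TASK_ALWAYS
};
```

Moving an entry up or down the list changes its ID, and therefore its priority, at compile time, in line with the first design principle.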
+
+A task can normally be preempted at any time by either interrupts or higher
+priority tasks, see the [preemption section](#scheduling-and-preemption) for
+details and the [locking section](#locking-and-atomicity) for the few cases
+where you need to avoid it.
+
+### interrupts
+
+The hardware interrupt requests are connected to the interrupt handling
+*C* routines declared by the `DECLARE_IRQ` macros, through some chip/core
+specific mechanisms (e.g. depending on whether we have a vectored interrupt
+controller, slave interrupt controllers...).
+
+The interrupts can be nested (i.e. interrupted by a higher priority interrupt).
+All the interrupt vectors are assigned a priority as defined in their
+`DECLARE_IRQ` macro. The number of available priority levels is
+architecture-specific (e.g. 4 on Cortex-M0, 8 on Cortex-M3/M4) and several
+interrupt handlers can have the same priority. An interrupt handler can only be
+interrupted by a handler having a priority **strictly** **greater** than
+its own.
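The nesting rule can be modelled with a small stack of active handler priorities. This is a hypothetical software model: `irq_enter()` and `irq_exit()` are not EC functions. It follows the convention of the text, where a larger number means a more urgent handler. Note that on the Cortex-M NVIC a numerically *lower* priority value is the more urgent one.

```c
#include <stdbool.h>

#define MAX_NESTING 8

/* Priorities of the handlers currently running, innermost last. */
static int active_prio[MAX_NESTING];
static int irq_depth;

/* True if an IRQ of priority 'prio' may preempt the current context. */
static bool irq_can_preempt(int prio)
{
	/* Task context (no handler running) is preempted by any interrupt. */
	if (irq_depth == 0)
		return true;
	/* Only a strictly greater priority nests; an equal one stays pending. */
	return prio > active_prio[irq_depth - 1];
}

/* Enter a handler if allowed; returns false if the IRQ stays pending. */
static bool irq_enter(int prio)
{
	if (irq_depth == MAX_NESTING || !irq_can_preempt(prio))
		return false;
	active_prio[irq_depth++] = prio;
	return true;
}

/* Return from the innermost handler. */
static void irq_exit(void)
{
	if (irq_depth > 0)
		irq_depth--;
}
```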
+
+In most cases, the exceptions (e.g. data/prefetch aborts, software interrupts) can
+be seen as interrupts with a priority strictly greater than all IRQ vectors.
+So they can interrupt any IRQ handler using the same nesting mechanism.
+All fatal exceptions should ultimately lead to a reboot.
+
+### Events
+
+Each task has a *pending* events bitmap[1] implemented as a 32-bit word.
+Several events are pre-defined for all tasks, and the most significant bits of
+the 32-bit bitmap are reserved for them: the timer pending event on bit 31
+([see the corresponding section](#timer-event)), the event to kick the waiters
+on a mutex (bit 30), the requested task wake (bit 29), along with a few
+hardware-specific events.
+The 19 least significant bits are available for task-specific meanings.
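Here is a sketch of the event bitmap layout described above. `TASK_EVENT_TIMER`, `TASK_EVENT_MUTEX` and `TASK_EVENT_WAKE` follow the EC naming, but these definitions are illustrative only. `TASK_EVENT_CUSTOM_MASK`, `CUSTOM_EVENT()`, the charger events and `has_reserved_bits()` are hypothetical helpers made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reserved, system-wide events in the most significant bits. */
#define TASK_EVENT_TIMER (1U << 31) /* the task timer expired */
#define TASK_EVENT_MUTEX (1U << 30) /* a mutex the task waits on was released */
#define TASK_EVENT_WAKE  (1U << 29) /* explicit wake request */

/* The 19 least significant bits are left for task-specific events. */
#define TASK_EVENT_CUSTOM_MASK ((1U << 19) - 1) /* 0x0007ffff */
#define CUSTOM_EVENT(bit)      ((1U << (bit)) & TASK_EVENT_CUSTOM_MASK)

/* Hypothetical events of a charger task. */
#define EVENT_AC_CHANGE CUSTOM_EVENT(0)
#define EVENT_BATT_LOW  CUSTOM_EVENT(1)

/* True if 'events' contains any bit outside the task-specific range. */
static bool has_reserved_bits(uint32_t events)
{
	return (events & ~TASK_EVENT_CUSTOM_MASK) != 0;
}
```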
+
+These event bits are used by the inter-task communication and scheduling
+mechanisms: other tasks **and** interrupt handlers can atomically set them to
+request specific actions from the task. Therefore, the presence of pending
+events in a task bitmap has an impact on its scheduling, as described in the
+[scheduling section](#scheduling-and-preemption).
+These requests are done using the `task_set_event()` and `task_wake()`
+primitives.
+
+The two typical use-cases are:
+
+- a task sends a message to another task (simply using some common memory
+  structures, [see explanation](#single-address-space)) and wants it to
+  process it now.
+- a hardware IRQ occurred and we need to do some long processing to respond to
+  it (e.g. an I2C transaction). The associated interrupt handler cannot do it
+  (for latency reasons), so it will raise an event to ask a task to do it.
+
+The task code chooses when to consume them (or a subset of them) while it is
+running, through the `task_wait_event()` and `task_wait_event_mask()` primitives.
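The producer/consumer pattern behind these primitives can be sketched in portable C11 with `<stdatomic.h>`. `sketch_set_event()` and `sketch_take_events()` are hypothetical stand-ins for `task_set_event()` and `task_wait_event_mask()`. They model a single task's pending-events word and leave out the sleeping and re-scheduling parts.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Pending-events word of the (single) modelled task. */
static _Atomic uint32_t pending_events;

/* Producer side (another task or an IRQ handler): atomically OR bits in. */
static void sketch_set_event(uint32_t event)
{
	atomic_fetch_or(&pending_events, event);
	/* The real runtime would also mark the task ready and reschedule. */
}

/*
 * Consumer side: atomically take and clear only the events in 'mask',
 * leaving the other pending bits for later. Returns the consumed events.
 * The real primitive would sleep instead of returning 0 when none is pending.
 */
static uint32_t sketch_take_events(uint32_t mask)
{
	return atomic_fetch_and(&pending_events, ~mask) & mask;
}
```

For instance, an I2C interrupt handler could call `sketch_set_event()` with a hypothetical "transfer done" bit, and the task loop would consume that bit with `sketch_take_events()` before doing the long processing.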
+
+### Scheduling and preemption
+
+The system has a global bitmap[1] called `tasks_ready` containing one bit
+per task and indicating whether or not it is *ready* *to* *run*
+(i.e. wants/needs to be scheduled).
+A task's ready bit can only be cleared when the task itself calls one of the
+functions explicitly triggering a re-scheduling (e.g. `task_wait_event()`
+or `task_set_event()`) **and** it has no pending event.
+The task ready bit is set by any task or interrupt handler setting an event
+bit for the task (i.e. `task_set_event()`).
+
+The scheduling is based on (and *only* on) the `tasks_ready` bitmap
+(which is derived from the event bitmaps of all the tasks, as explained above).
+
+The scheduling policy is then simply to find the most significant bit set in
+the `tasks_ready` bitmap and schedule the corresponding task.
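Finding the most significant set bit is a single count-leading-zeros operation, so the scheduling decision takes constant time regardless of the number of tasks. The sketch below uses the GCC/Clang `__builtin_clz()` builtin, which maps to the `CLZ` instruction on cores that have it, such as Cortex-M3/M4. It assumes bit 0 is an always-ready idle task, so the bitmap is never empty. `pick_next_task()` is a hypothetical name.

```c
#include <stdint.h>

/*
 * Bit N of 'tasks_ready' set means task N is ready; a higher bit is a
 * higher priority. Bit 0 is assumed to be an always-ready idle task.
 */
static int pick_next_task(uint32_t tasks_ready)
{
	tasks_ready |= 1U; /* idle task (assumption), also avoids clz(0) */
	/* Index of the most significant set bit: 31 - leading zero count. */
	return 31 - __builtin_clz(tasks_ready);
}
```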
+
+Important note: the re-scheduling happens **only** when we are exiting the
+interrupt context. It is done in a non-preemptible context (likely with the
+highest priority). Indeed, a re-scheduling is actually needed only when the
+highest-priority ready task has changed.
+
+- an interrupt handler sets a new event for a task.
+  In this case, `task_set_event` will detect that it is executed in interrupt
+  context and record in the `need_resched_or_profiling` variable that it might
+  need to re-schedule at interrupt return. When the current interrupt is going
+  to return, it will see this bit and take the slow path, making a new
+  scheduling decision and possibly a context switch, instead of the fast path
+  returning to the interrupted task.
+- a task sets an event on another task.
+  The runtime will trigger a software interrupt to force a re-scheduling at its
+  exit.
+- the running task voluntarily relinquishes its current execution rights by
+  calling `task_wait_event()` or a similar function.
+  This triggers the software interrupt, as in the previous case.
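The fast path / slow path decision of the first case can be modelled as follows. Everything here is hypothetical. The variables and functions only mimic the roles described in the text (`need_resched` stands in for `need_resched_or_profiling`), and the context switch itself is reduced to updating `current_task`.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t tasks_ready = 1U; /* bit 0: idle task, always ready */
static int current_task;          /* index of the running task */
static bool in_interrupt;         /* true while an IRQ handler runs */
static bool need_resched;         /* stand-in for need_resched_or_profiling */
static int context_switches;      /* counts slow-path switches */

static int highest_ready(void)
{
	return 31 - __builtin_clz(tasks_ready); /* never 0: bit 0 always set */
}

/* Setting an event marks the task ready; in IRQ context, defer the decision. */
static void sketch_mark_ready(int task)
{
	tasks_ready |= 1U << task;
	if (in_interrupt)
		need_resched = true; /* decided at interrupt return */
	/* From task context, the runtime would raise a software interrupt. */
}

/* Interrupt return: fast path unless a re-schedule was requested. */
static void sketch_irq_return(void)
{
	in_interrupt = false;
	if (!need_resched)
		return; /* fast path: resume the interrupted task */
	need_resched = false;
	int next = highest_ready(); /* slow path: new scheduling decision */
	if (next != current_task) {
		current_task = next; /* real code saves/restores registers here */
		context_switches++;
	}
}
```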
+
+On the re-scheduling path, if the highest-priority ready task does not match
+the currently running one, the runtime performs a context switch: it saves all
+the processor registers on the current task stack, switches the stack pointer
+to the newly scheduled task, and restores the registers from the context
+previously saved there.
+
+### hooks and deferred function
+
+The lowest priority task (i.e. task 1, aka `TASK_ID_HOOKS`) is reserved to
+execute repetitive actions and future actions deferred in time without blocking
+the current task or creating a dedicated task (whose stack memory allocation
+would waste precious RAM).
+
+The HOOKS task has a list of deferred functions and their next deadline.
+Every time it is woken up, it runs through the list and calls the ones whose
+deadline has expired. Before going back to sleep, it arms a timer to the
+closest deadline.
+The deferred functions can be created using the `DECLARE_DEFERRED()` macro.
+Similarly, the `HOOK_SECOND` and `HOOK_TICK` hooks are called periodically by
+the HOOKS task loop (the *tick* duration is platform-defined and shorter than
+the second).
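One pass of the HOOKS task loop over its deferred functions can be sketched as below. The table layout, `run_deferred()` and the `foo()` routine are hypothetical. In the EC code a deferred function is declared with `DECLARE_DEFERRED()` and scheduled with `hook_call_deferred()`, but this sketch does not reproduce their implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEFERRED 4
#define NO_DEADLINE  UINT64_MAX

/* Hypothetical deferred-call entry: routine and absolute deadline (us). */
struct deferred {
	void (*routine)(void);
	uint64_t deadline_us; /* NO_DEADLINE when not scheduled */
};

static struct deferred deferred_table[MAX_DEFERRED];

/* Hypothetical deferred function, counting its invocations. */
static int foo_runs;
static void foo(void) { foo_runs++; }

/*
 * One pass of the HOOKS task loop: call every deferred function whose
 * deadline has expired, then return the closest remaining deadline, which
 * the task would use to arm its timer before going back to sleep.
 */
static uint64_t run_deferred(uint64_t now_us)
{
	uint64_t next = NO_DEADLINE;

	for (int i = 0; i < MAX_DEFERRED; i++) {
		struct deferred *d = &deferred_table[i];

		if (d->routine == NULL || d->deadline_us == NO_DEADLINE)
			continue;
		if (d->deadline_us <= now_us) {
			d->deadline_us = NO_DEADLINE; /* one-shot: clear before calling */
			d->routine();                 /* it may re-schedule itself */
		}
		if (d->deadline_us < next)
			next = d->deadline_us;
	}
	return next;
}
```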
+
+Note: be especially careful about priority inversions when accessing resources
+protected by a mutex (e.g. a shared I2C controller) in a deferred function.
+Indeed, since the HOOKS task has the lowest priority, it might be de-scheduled
+for a long time while holding the mutex and starve higher priority tasks
+trying to access the resource, as there is no priority boosting implemented
+for this case.
+Also be careful about long delays (hundreds of microseconds or more) in hook
+or deferred function handlers, since those will starve other hooks of
+execution time. It is better to implement a state machine where you set up a
+subsequent call to a deferred function than to have a long delay in your
+handler.
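Take a hypothetical power sequence that needs 10 ms between enabling a rail and releasing a reset. Instead of a blocking delay such as `udelay(10000)` in the middle of the handler, it can be split into states, with each call doing one step and re-scheduling itself. `schedule_self()` is a hypothetical stand-in for scheduling the next deferred call, and the hardware accesses are left as comments.

```c
#include <stdint.h>

/* Hypothetical stand-in for scheduling the next deferred call. */
static uint64_t next_delay_us; /* 0: no follow-up call scheduled */
static void schedule_self(uint64_t delay_us) { next_delay_us = delay_us; }

enum seq_state { SEQ_IDLE, SEQ_RAIL_ON, SEQ_DONE };
static enum seq_state seq = SEQ_IDLE;

/*
 * Deferred function body: one step per call. The 10 ms wait happens between
 * two calls, so the HOOKS task can run other hooks in the meantime.
 */
static void power_seq_deferred(void)
{
	next_delay_us = 0; /* no follow-up call unless we schedule one */

	switch (seq) {
	case SEQ_IDLE:
		/* enable_rail(); (hypothetical hardware access) */
		seq = SEQ_RAIL_ON;
		schedule_self(10000); /* call me again in 10 ms */
		break;
	case SEQ_RAIL_ON:
		/* release_reset(); (hypothetical hardware access) */
		seq = SEQ_DONE;
		break;
	case SEQ_DONE:
		break;
	}
}
```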
+
+### watchdog
+
+The system is always protected against misbehaving tasks and interrupt handlers
+by a hardware watchdog rebooting the CPU when it is not serviced in time.
+
+The watchdog is petted in the HOOKS task, typically by declaring a `HOOK_TICK`
+hook doing it at regular intervals. Given this is the lowest priority task,
+this guarantees that all tasks are getting some run time during the watchdog
+period.
+
+Note: that's also why code should not be sprinkled with `watchdog_reload()`
+calls to paper over long-running routine issues.
+
+To help debug bad sequences triggering watchdog reboots, most platforms
+implement a warning mechanism enabled by `CONFIG_WATCHDOG_HELP`.
+It is a timer firing in the middle of the watchdog period if the watchdog
+hasn't been petted by then, and dumping the current execution state on the
+console, mainly to help find a stuck task or handler. Normal execution
+resumes after this alert.
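The relation between petting, the half-period warning and the reboot can be modelled with elapsed-time checks. This is a hypothetical model: the 1.6 s period and all the names are assumptions. On real hardware, the warning comes from a separate timer interrupt rather than from polling.

```c
#include <stdbool.h>
#include <stdint.h>

#define WATCHDOG_PERIOD_US 1600000ULL /* hypothetical 1.6 s period */

static uint64_t last_pet_us;

/* Called from a HOOK_TICK hook in the HOOKS task. */
static void watchdog_pet(uint64_t now_us)
{
	last_pet_us = now_us;
}

/* Half-period check: true when the warning should dump the state. */
static bool watchdog_warning_due(uint64_t now_us)
{
	return now_us - last_pet_us >= WATCHDOG_PERIOD_US / 2;
}

/* Full period without petting: the hardware would reboot the CPU. */
static bool watchdog_expired(uint64_t now_us)
{
	return now_us - last_pet_us >= WATCHDOG_PERIOD_US;
}
```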
+
+### Startup
+
+The startup sequence goes through the following steps:
+
+- the assembly entry routine clears the .bss (uninitialized data),
+  copies the initialized data (and optionally the code if we are not executing
+  from flash), and sets up a stack pointer.
+- we can jump to the `main()` C routine at this point.
+- then we go through the hardware pre-init (before we have all the clocks
+  needed to run the peripherals normally) and init routines, in this rough
+  order: memory protection if any, GPIOs in their default state,
+  prepare the interrupt controller, set the clocks, then timers,
+  enable interrupts, init the debug UART and the watchdog.
+- finally, start the tasks.
+
+For the tasks startup, initially only the HOOKS task is marked as ready,
+so it is the first to start and can call all the `HOOK_INIT` handlers performing
+initializations before actually executing any real task code.
+Then all tasks are marked as ready, and the highest priority one is given
+control.
+
+During the whole startup sequence, until control is given to the first task,
+we are using a special stack called 'system stack' which will later be re-used
+as the interrupt and exception stack.
+
+To prepare the first context switch, the code in `task_pre_init()` stuffs
+every task stack with a *fake* saved context whose program counter
+contains the task start address and whose stack pointer points to its
+reserved stack space.
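The fake initial context can be sketched in plain C. The frame layout (`FRAME_WORDS` slots with the saved PC in the second-to-last one), the stack sizes, the task entry points and `sketch_task_pre_init()` are all hypothetical. On real hardware, the frame layout is dictated by the core's exception and context-switch code.

```c
#include <stdint.h>

#define NUM_TASKS   3
#define STACK_WORDS 64
#define FRAME_WORDS 10 /* hypothetical frame: 8 registers, PC, status word */
#define FRAME_PC    (FRAME_WORDS - 2)

typedef void (*task_entry_t)(void);

struct task {
	uintptr_t stack[STACK_WORDS];
	uintptr_t *sp; /* saved stack pointer, pointing at the saved frame */
};

static struct task task_table[NUM_TASKS];

/* Hypothetical task entry points. */
static void task_a(void) {}
static void task_b(void) {}
static void task_c(void) {}

/*
 * Push a fake saved frame at the top of each task stack, as if the task had
 * been switched out right before its first instruction: the first context
 * switch to it then "restores" zeroed registers and jumps to its entry.
 */
static void sketch_task_pre_init(const task_entry_t entry[NUM_TASKS])
{
	for (int i = 0; i < NUM_TASKS; i++) {
		struct task *t = &task_table[i];
		uintptr_t *frame = t->stack + STACK_WORDS - FRAME_WORDS;

		for (int r = 0; r < FRAME_WORDS; r++)
			frame[r] = 0;
		frame[FRAME_PC] = (uintptr_t)entry[i]; /* saved program counter */
		t->sp = frame;
	}
}
```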
+
+### locking and atomicity
+
+The two main concurrency primitives are lightweight atomic variables and
+heavier mutexes.
+
+The atomic variables are 32-bit integers (which can usually be loaded/stored
+atomically on the architectures we support). The `atomic.h` header
+includes primitives to atomically perform various bit and arithmetic
+operations, using either load-linked/load-exclusive and
+store-conditional/store-exclusive sequences or simpler mechanisms, depending on
+what is available.
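The load-linked/store-conditional style retry loop can be sketched portably with a C11 compare-and-exchange. If another context changed the word between the load and the store, the store fails and the operation is retried. This only illustrates the technique and is not the code of the EC `atomic.h`. `sketch_atomic_or()` and `sketch_atomic_clear()` are hypothetical names; C11 also offers `atomic_fetch_or()` directly.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically OR 'bits' into '*addr'; returns the previous value. */
static uint32_t sketch_atomic_or(_Atomic uint32_t *addr, uint32_t bits)
{
	uint32_t old = atomic_load(addr);

	/* On failure, 'old' is refreshed with the current value: retry. */
	while (!atomic_compare_exchange_weak(addr, &old, old | bits))
		;
	return old;
}

/* Atomically clear 'bits' in '*addr'; returns the previous value. */
static uint32_t sketch_atomic_clear(_Atomic uint32_t *addr, uint32_t bits)
{
	uint32_t old = atomic_load(addr);

	while (!atomic_compare_exchange_weak(addr, &old, old & ~bits))
		;
	return old;
}
```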
+
+The mutexes are actually statically allocated binary semaphores.
+In case of contention, they will make the waiting task sleep
+(removing its ready bit) and use the [event mechanism](#events) to wake up
+the other waiters on unlocking.
+
+Note: the mutexes do NOT implement any priority boosting, so they do not
+prevent the priority inversion phenomenon.
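The contention path can be modelled with a lock flag and a bitmap of waiting tasks, with waiters woken by the mutex event bit. This is a single-threaded, hypothetical model. The real lock must be taken with an atomic operation, and a contended task actually sleeps and retries. `TASK_EVENT_MUTEX` is assumed to be bit 30, as in the events section. Note that nothing boosts the priority of the holder.

```c
#include <stdbool.h>
#include <stdint.h>

#define TASK_EVENT_MUTEX (1U << 30)

/* Hypothetical mutex model: lock flag plus bitmap of waiting tasks. */
struct sketch_mutex {
	bool locked;
	uint32_t waiters; /* bit N set: task N waits for this mutex */
};

static uint32_t task_events[32]; /* per-task pending events */

/* Try to take the mutex; on contention, record 'task' as a waiter (the real
 * code would then sleep until TASK_EVENT_MUTEX and try again). */
static bool sketch_mutex_lock(struct sketch_mutex *m, int task)
{
	if (!m->locked) {
		m->locked = true;
		m->waiters &= ~(1U << task);
		return true;
	}
	m->waiters |= 1U << task;
	return false;
}

/* Release the mutex and post TASK_EVENT_MUTEX to every waiter. */
static void sketch_mutex_unlock(struct sketch_mutex *m)
{
	uint32_t w = m->waiters;

	m->locked = false;
	m->waiters = 0;
	for (int t = 0; t < 32; t++)
		if (w & (1U << t))
			task_events[t] |= TASK_EVENT_MUTEX;
}
```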
+
+Given the runtime runs on a single-core CPU, a spinlock would be equivalent
+to masking interrupts with `interrupt_disable()`, which is strongly
+discouraged as it harms the real-time characteristics of the runtime.
+
+Time
+----
+
+### time keeping
+
+In the runtime, the time is accounted everywhere using a
+**64-bit** **microsecond** count since the microcontroller **cold** **boot**.
+
+Note: The runtime has no notion of wall-time/date, even though a few platforms
+have an RTC inside the microcontroller.
+
+These microsecond timestamps are implemented in the code using the `timestamp_t`
+type and the current timestamp is returned by the `get_time()` function.
+
+The time-keeping is preferably implemented using a 32-bit hardware
+free-running counter at 1 MHz plus a 32-bit word in memory keeping track of
+the high word of the 64-bit absolute time. This word is incremented by the
+32-bit timer rollover (overflow) interrupt.
+
+Note: as a consequence of this implementation, when the 64-bit timestamp is read
+in interrupt context in a handler having a higher priority than the timer IRQ
+(which is somewhat rare), the high 32-bit word might be incoherent (off by one).
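A common way to read such a split 64-bit time coherently is the high/low/high sequence. Read the high word, then the counter, then the high word again, and retry if a rollover was accounted in between. At 1 MHz, the 32-bit counter wraps every 2^32 us, about 71.6 minutes. The variables and `sketch_get_time_us()` are hypothetical, and this is not necessarily how `get_time()` is written. The retry only protects readers that the rollover interrupt can preempt. In a handler with a higher priority than the timer IRQ, the rollover interrupt cannot run, which is precisely the off-by-one case noted above.

```c
#include <stdint.h>

/* Simulated 32-bit free-running 1 MHz counter and the software high word
 * incremented by the rollover interrupt (both hypothetical). */
static volatile uint32_t hw_counter;
static volatile uint32_t time_high_word;

/*
 * Read high word, counter, then high word again; if the high word changed,
 * a rollover was accounted in between, so retry with coherent values.
 */
static uint64_t sketch_get_time_us(void)
{
	uint32_t hi, lo;

	do {
		hi = time_high_word;
		lo = hw_counter;
	} while (hi != time_high_word);

	return ((uint64_t)hi << 32) | lo;
}
```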
+
+### timer event
+
+The runtime offers *one* (and only one) timer per task.
+All the task timers are multiplexed on a single hardware timer (which can be
+just a *match* *interrupt* on the free running counter mentioned in the
+[previous paragraph](#time-keeping)).
+Every time a timer is armed or expires, the runtime finds the task timer having
+the closest deadline and programs it in the hardware to get an interrupt.
+At the same time, it sets the `TASK_EVENT_TIMER` event in all tasks whose timer
+deadline has expired.
+The next deadline is computed in interrupt context.
+
+Note: given each task has a **single** timer, which is also used to wake up the
+task when `task_wait_event()` is called with a timeout, one needs to be careful
+when using the `timer_arm()` function directly: if this timer is still running
+at the next `task_wait_event()` call, that call will fail due to the lack of an
+available timer.
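Two things can be sketched together here: multiplexing the per-task timers on one hardware match register, and the single-timer limitation. `sketch_timer_arm()`, `process_timers()`, the -1 error value and the table layout are all hypothetical. The real `timer_arm()` and timer interrupt code are not reproduced here.

```c
#include <stdint.h>

#define NUM_TASKS        4
#define TASK_EVENT_TIMER (1U << 31)
#define TIMER_DISARMED   UINT64_MAX

static uint64_t task_deadline[NUM_TASKS]; /* absolute deadlines (us) */
static uint32_t task_events[NUM_TASKS];

static void timers_init(void)
{
	for (int t = 0; t < NUM_TASKS; t++)
		task_deadline[t] = TIMER_DISARMED;
}

/* Arm the single timer of 'task'; -1 (busy) if it is already running. */
static int sketch_timer_arm(int task, uint64_t deadline_us)
{
	if (task_deadline[task] != TIMER_DISARMED)
		return -1;
	task_deadline[task] = deadline_us;
	return 0;
}

/*
 * Timer interrupt body: post TASK_EVENT_TIMER to every task whose deadline
 * has passed (disarming its timer) and return the closest remaining deadline,
 * i.e. the value to program in the single hardware match register.
 */
static uint64_t process_timers(uint64_t now_us)
{
	uint64_t next = TIMER_DISARMED;

	for (int t = 0; t < NUM_TASKS; t++) {
		if (task_deadline[t] <= now_us) {
			task_deadline[t] = TIMER_DISARMED;
			task_events[t] |= TASK_EVENT_TIMER;
		} else if (task_deadline[t] < next) {
			next = task_deadline[t];
		}
	}
	return next;
}
```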
+
+Memory
+------
+
+### Single address space
+
+There is no memory isolation between tasks (i.e. they all live in the same
+address space). Some architectures implement a memory protection mechanism,
+albeit only to differentiate executable areas (e.g. `.code`) from writable
+areas (e.g. `.bss` or `.data`), as there is a **single** **privilege** level
+for all execution contexts.
+
+As all the memory is implicitly shared between the tasks, the inter-task
+communication can be done by simply writing the data structures in memory
+and using events to wake the other task (provided the concurrent accesses to
+those structures have been properly thought through).
+
+### heap
+
+Data structures should be statically allocated at compile time.
+
+Note: there is no dynamic allocator available (e.g. `malloc()`), not because
+it would be impossible to create one, but to avoid the negative side effects of
+having one: i.e. poor/unpredictable real-time behavior and possible leaks
+leading to a long tail of failures.
+
+- TODO: talk about shared memory
+- TODO: where/how we store *panic* *memory* and *sysjump* *parameters*.
+
+### stacks
+
+Each task has its own stack, in addition there is a system stack used for
+startup and interrupts/exceptions.
+
+Note 1: Each task stack is relatively small (e.g. 512 bytes), so one needs to
+be careful about stack usage when implementing features.
+
+Note 2: At the same time, the total size of RAM used by stacks is a big chunk
+of the total RAM consumption, so their sizes need to be carefully tuned.
+(please refer to the [debugging paragraph](#debugging) for additional input on
+this topic).
+
+Firmware code organization and multiple copies
+----------------------------------------------
+
+- TODO: Detail the classical RO / RW partitions and how we sysjump.
+
+power management
+----------------
+
+- TODO: talk about the idle task + WFI (note: interrupts are disabled!)
+- TODO: more about low power idle and the sleep-disable bitmap
+- TODO: adjusting the microsecond timer at wake-up
+
+debugging
+---------
+
+- TODO: our main tool: serial console ...
+(but non-blocking / discard overflow, cflush DO/DONT)
+- TODO: else JTAG stop and go: careful with watchdog and timer
+- TODO: panics and software panics
+- TODO: stack size tuning and canarying
+
+
+- TODO: Address the rest of the comments from https://crrev.com/c/445941
+
+[1]: bitmap: array of bits.