Introduction to my RTOS for AVR Processors


What is a Real-Time Operating System

First of all, an operating system can be any basic system that controls the execution of various routines or tasks. It need not be a system that schedules individual programs or tasks; sometimes a finite state machine is responsible for scheduling the workload. In previous projects I used such state machines, but in the end each project needed a new version of it. So I decided to build a real scheduler with individual tasks, in my case called jobs.

Typically a Real-Time Operating System describes a system that has a guaranteed reaction time to external events. This means that the response to an external or internal event occurs within a given time, including the actions required to react and to process the bare minimum of the tasks related to the event.

The required response time depends on the application. Typical response times, although short, rarely need to be much less than 100µsec.

Another criterion is the time the operating system requires to start a task in response to an external or internal event. However, if a higher-priority task is active, the task is only marked as eligible and is not started right away.

There is also a requirement called hard real-time capability. It is mainly used for controlling movements, where the overall time for handling each event must be guaranteed. There are several approaches to this.

In my case I have one event that must be finished within very tight constraints: the Q-Bus interface. To have at least one event with a guaranteed response, regardless of the state of the RT-OS or the jobs, I use the capability of the AVR128Dx and the ATmega480x to have one high-priority interrupt for the Q-Bus interface, where bus cycles must finish within 10µsec. This requires that the RT-OS never disables interrupts and that jobs disable interrupts only for a few cycles.

History of my RTOS for AVR processors

In 2021 I got hold of the source code for a tiny PDP-11 based RTOS. I decided to translate the PDP-11 code into AVR assembler. There are many differences in the architecture, especially regarding interrupt handling and processor status, which required changes to the basics, but the features are the same.

Normally an OS consists of several building blocks, like a scheduler, a task switcher, queues, IO and timers. Sometimes an OS also provides functions for semaphores (aka mutexes) and message passing.

As this is for an embedded controller, no support for loading and removing tasks has been added. A simple job creation routine starts the scheduler, and the first job created is responsible for creating additional jobs.

Basic Implementation Concepts

I first started to develop the RTOS on the ATmega architecture. Later I decided to concentrate on the AVR-DA, AVR-DB and megaAVR 0-Series architectures, as they have many very useful features. Thus it will also run on an Arduino Nano Every. There are some important differences between the old ATmega architecture and the new AVR cores. The third revision of the RTOS will not run on the old ATmega processors, although it should be fairly easy to backport it. A later revision might include the ATmega again, but this has not been decided. In any case, the high-level interrupt capability is only available with the new AVR cores.

The RTOS implements the concept of a JOB. Each job has its own context, that is, its own stack and registers, including the status register and the program counter.

Instead of disabling interrupts during RTOS processing I decided to use a PIN change interrupt. This is done by using a PIN which is configured as output and at the same time is configured as Low Level Interrupt. When the PIN is cleared by software the Pin Change Interrupt Vector of the associated Port is called.

The ISR will then just reset the PIN, acknowledge the interrupt and return to the routine that cleared the PIN. In other words, processing continues at interrupt Level0.

When the OS function has finished, it either exits the OS and continues with the program that called the OS, or executes the system return routine, which can schedule another program or activate the NULL job that runs whenever there is no pending job. Both OS exits execute a reti instruction to return to the normal processing level and consequently allow any interrupt to be executed.

Using a software interrupt at Level0 leaves Level1 free for a second interrupt. This Level1 interrupt can be used for very time-critical tasks. A Level1 interrupt can interrupt a Level0 interrupt and hence also the OS. This keeps the interrupt latency minimal even while the OS is executing. In my case a Q-Bus cycle of the RLV12 Emulator must be finished within 8µsec. When handling Q-Bus cycles in software we need to guarantee that the Q-Bus interrupt handler finishes within this time frame.
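The nesting rule can be illustrated with a small host-side model in C. This is only a sketch: the names irq_state, irq_accept and irq_return are invented for this illustration and are not part of the RTOS, and the model ignores everything except the two interrupt levels.

```c
#include <stdbool.h>

/* Hypothetical host-side model of the two interrupt levels.
   Nothing here is RTOS code; it only mirrors the nesting rule. */
typedef struct {
    bool lvl0_active;   /* a Level0 handler (or the OS) is executing */
    bool lvl1_active;   /* the Level1 handler is executing */
} irq_state;

/* Try to enter a handler of the given level (0 or 1).
   Returns true if the request is accepted right away. */
bool irq_accept(irq_state *s, int level)
{
    if (s->lvl1_active)
        return false;               /* nothing interrupts Level1 */
    if (level == 1) {
        s->lvl1_active = true;      /* Level1 may interrupt Level0 */
        return true;
    }
    if (s->lvl0_active)
        return false;               /* Level0 does not nest in Level0 */
    s->lvl0_active = true;
    return true;
}

/* Model of reti: leave the innermost active level. */
void irq_return(irq_state *s)
{
    if (s->lvl1_active)
        s->lvl1_active = false;
    else
        s->lvl0_active = false;
}
```

In this model a request at Level0 that arrives while the OS is executing is refused and would stay pending, while a Level1 request is accepted at once. That is the property the Q-Bus handler relies on.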

Of course the Level1 interrupt must not make any OS call. In the case of the RLV12 Emulator I use another PIN that is cleared by the Level1 interrupt. This PIN should be on a different PORT than the Level1 interrupt, because in the current version of the RT-OS the pin change interrupt cannot be shared. Of course the Q-Bus interrupt should use a third port, as this pin change interrupt will be configured as the single Level1 interrupt. When the Level1 interrupt exits using reti, the processor is ready to process any pending Level0 interrupt or to continue the OS if it was interrupted by the Level1 interrupt. A Level0 interrupt service routine, on the other hand, may call the OS to unblock or resume a waiting job.

Device Libraries

Parallel to the OS I have written a small serial device driver with ring buffers and a library for the W5500 which uses interrupts to resume jobs.

We will follow the ABI for AVR processors. OS functions take their parameters in the register pairs r25:r24 and r23:r22 and put the return value, if any, into the register pair r25:r24.
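As a rough illustration of this register usage, the following host-side C sketch splits two 16-bit parameters into the bytes the register pairs would hold. The type arg_regs and the functions load_args and return_value are hypothetical names made up for this example. The sketch assumes the usual avr-gcc convention that the higher-numbered register of a pair carries the high byte.

```c
#include <stdint.h>

/* Hypothetical snapshot of the argument registers at an OS call. */
typedef struct {
    uint8_t r25, r24;   /* first 16-bit parameter, later the return value */
    uint8_t r23, r22;   /* second 16-bit parameter */
} arg_regs;

/* Split two 16-bit parameters the way the register pairs hold them:
   the higher-numbered register carries the high byte. */
arg_regs load_args(uint16_t first, uint16_t second)
{
    arg_regs r;
    r.r25 = (uint8_t)(first >> 8);
    r.r24 = (uint8_t)(first & 0xFF);
    r.r23 = (uint8_t)(second >> 8);
    r.r22 = (uint8_t)(second & 0xFF);
    return r;
}

/* Read a 16-bit return value back from r25:r24. */
uint16_t return_value(const arg_regs *r)
{
    return (uint16_t)(((uint16_t)r->r25 << 8) | r->r24);
}
```

For an OS function that takes the address of a data structure as its first parameter, the low byte of the address would therefore be in r24 and the high byte in r25.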

Many times an OS includes support for a file system. I already have a simple FAT-32 library that was created for the RLV12 emulator. I will adapt it to work with the OS, as the long-term plan is to write the MSCP emulator based on this OS.

There is also a module with malloc() and free() that runs at interrupt level. This requires another port for the pin change interrupt for the time being. For the new AVR cores that have the pins distributed to four or more ports, even in low pin-count enclosures, this is not an issue.

More information about pin change interrupts is included in the description of the RT-OS internals.

Jobs

The RTOS defines the context of a job. A job is the only entity that can be scheduled, so for each thread you need to create a job. Each job has its own entry point, stack and priority. A job is either eligible for execution, hibernated or waiting for an IO. IOs are either linked to real IOs or to a record queue. Jobs eligible for execution are executed based on their priority. The highest-priority job keeps executing until a job with a higher priority becomes eligible, or until the job is hibernated or waits for an IO. Jobs can set their own priority in case they need to make sure that their work is done in due time. However, this is not sufficient to get hold of an exclusive resource. That is achieved with mutexes, called locks, that can be acquired and released.
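The selection rule can be sketched as a host-side model in C. The names job_model and pick_job are hypothetical. The real scheduler works on queued JCBs rather than on an array, and the behaviour for equal priorities (first job found wins) is an assumption of this sketch, not something the RTOS promises.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a job for this model only. */
typedef struct {
    uint8_t priority;   /* 0..255, higher number = higher priority */
    bool    eligible;   /* false while hibernated or waiting for an IO */
} job_model;

/* Return the index of the eligible job with the highest priority,
   or -1 if no job is eligible (the NULL job would run then). */
int pick_job(const job_model *jobs, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!jobs[i].eligible)
            continue;
        if (best < 0 || jobs[i].priority > jobs[(size_t)best].priority)
            best = (int)i;
    }
    return best;
}
```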

A job is hibernated when it executes the delay() function. It stays hibernated for the number of ticks given as the parameter of delay(), that is, until the timer has expired. Another option for a job is to wait for an event. The RTOS defines three ways to wait for an event:

  • A job can block itself and wait until an external or internal event performs the action needed to unblock the 'block'.
  • A job can suspend itself and wait for an external or internal event to resume the job. This function also allows a maximum waiting time for the event to be specified.
  • A job can wait on a record queue. When another job signals a record to the queue, the waiting job is released and receives the record. Again, the job waiting for the queue can specify a maximum time to wait for a record. A record is just a 16-bit pointer to a data area.

All these options handle the case where the event occurs before the job blocks itself. For example, if an interrupt unblocks a ‘block’ while no job is waiting for it, the OS remembers that the event has already occurred and does not block the job when it calls the block() function. The same is true for the suspend() and waitqueue() functions, which return immediately if the action that releases the job has already taken place, i.e. the IO has already finished or a record has already been queued.
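The "event before wait" behaviour of a block can be modelled on a host in a few lines of C. This is a sketch of the semantics only. The three states and the names block_model, model_block and model_unblock are assumptions made for the illustration; the actual encoding inside the 16-bit block word is not shown here.

```c
#include <stdbool.h>

/* Hypothetical state of one 'block' in this model. */
typedef enum { BLK_IDLE, BLK_WAITING, BLK_PENDING } block_model;

/* Job side: returns true if the job has to wait, false if the event
   was already recorded and the call returns immediately. */
bool model_block(block_model *b)
{
    if (*b == BLK_PENDING) {
        *b = BLK_IDLE;      /* consume the remembered event */
        return false;
    }
    *b = BLK_WAITING;
    return true;
}

/* Event side (e.g. an ISR): returns true if a waiting job was released,
   false if the event was only remembered for a later block call. */
bool model_unblock(block_model *b)
{
    if (*b == BLK_WAITING) {
        *b = BLK_IDLE;
        return true;
    }
    *b = BLK_PENDING;
    return false;
}
```

Calling model_unblock first and model_block afterwards leaves the job running, which is the case that would otherwise end in a lost wake-up.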

Currently no more than one job can wait for the same event at a time.

In case you need to lock out multiple jobs, there are so-called ’locks’. Once a lock has been acquired, all other jobs requesting it are removed from the queue of eligible jobs and put into the wait queue of the lock. A job that has acquired a ’lock’ is required to release it again, otherwise jobs that try to acquire the same ’lock’ will be blocked forever. These ’locks’ form a mutex and can be used to safely perform modifications on common data structures.
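A host-side C model may help to picture the wait queue of a lock. All names (lock_model, lock_init, lock_acquire, lock_release) are hypothetical. Handing the lock to the longest-waiting job is an assumption of this sketch, as the text does not say in which order waiting jobs are released, and the fixed queue size exists only to keep the model short.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 4       /* arbitrary limit for this model */

typedef struct {
    int    owner;                   /* job id of the holder, -1 if free */
    int    waiters[MAX_WAITERS];    /* jobs parked on the lock, oldest first */
    size_t nwaiters;
} lock_model;

void lock_init(lock_model *l)
{
    l->owner = -1;
    l->nwaiters = 0;
}

/* Returns true if 'job' now holds the lock, false if it was parked.
   Model limitation: a job beyond MAX_WAITERS is not recorded. */
bool lock_acquire(lock_model *l, int job)
{
    if (l->owner < 0) {
        l->owner = job;
        return true;
    }
    if (l->nwaiters < MAX_WAITERS)
        l->waiters[l->nwaiters++] = job;
    return false;
}

/* Release the lock and hand it to the oldest waiter, if any.
   Returns the new owner, or -1 if the lock is free again. */
int lock_release(lock_model *l)
{
    if (l->nwaiters == 0) {
        l->owner = -1;
        return -1;
    }
    l->owner = l->waiters[0];
    for (size_t i = 1; i < l->nwaiters; i++)
        l->waiters[i - 1] = l->waiters[i];
    l->nwaiters--;
    return l->owner;
}
```

The model also shows why a missing release is fatal: without a call to lock_release the parked jobs never leave the waiters array.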

Typically a job is not allowed to block interrupts, i.e. to use the cli instruction. However, sometimes it is more efficient to use cli/sei instead of acquire/release, and if you need to protect data structures shared with interrupts this is often the only way to make sure changes are atomic. There is no fixed rule, but blocking interrupts for ≤2µsec is typically acceptable. This also depends on the requirements of the Level1 interrupt.
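To get a feeling for what 2µsec means in instructions, the time window can be converted into CPU cycles. The helper below is a plain C sketch with a made-up name (cycles_in_window); the clock frequencies in the usage note are example values, not a statement about the clock of a particular board.

```c
#include <stdint.h>

/* Number of whole CPU cycles that fit into a time window.
   f_cpu_hz:  CPU clock in Hz
   window_ns: length of the window in nanoseconds */
uint32_t cycles_in_window(uint32_t f_cpu_hz, uint32_t window_ns)
{
    /* 64-bit intermediate so the product cannot overflow */
    return (uint32_t)(((uint64_t)f_cpu_hz * window_ns) / 1000000000ULL);
}
```

With a 24 MHz clock, cycles_in_window(24000000, 2000) gives 48 cycles, and with 20 MHz it gives 40. A section between cli and sei should therefore stay well below a few dozen cycles.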

Data Structures

The RT-OS uses several data structures, which in fact are just bytes in the SRAM. All data structures must be initialised with zero. They are defined in rtos-vX-Y.inc, which must be included when assembling the firmware. This file also defines the offsets and bit definitions used by the OS. Typically a job is not allowed to manipulate the data structures directly and must use the calls to the RTOS functions. Functions typically only take the address of a data structure in order to identify the lock, block, queue etc.

Internal Data

;--------------------------------------------------------------------------
;
;	A minimal set of routines that allow parallel jobs on an AVR
;	microcontroller. You need to provide at least the following
;	data section
;
		.dseg
		align	4
runjob:		.byte	2		; Initialise with 0
curjob:		.byte	2		; Initialise with 0
hibjob:		.byte	2		; Initialise with 0
ioqueue:	.byte	2		; Initialise with 0
iotime:		.byte	4		; Initialise with 0
tesoutptr:	.byte	2		; Test Output Pointer
tesoutent:	.byte	6		
intlevel:	.byte	1
jobid:		.byte	1
nguard:		.byte	1		; Null Job Stack Guard Byte
;
;	2022-01-26	The stack of the null job must be as large as all nested
;			interrupts may require. In case of two interrupt levels
;			the previous default of 19bytes was not sufficient!
;			Therefore null stack has been increased to 31bytes
;
		.byte	31		; minimum amount of stack for null job
		align	4		; align to nice boundary
nstack:
;
;	and of course data sections for the job control blocks (see create:)
;	and the individual stacks of each process and other resources you
;	need in your application (see other routines)

Job Control Blocks

;--------------------------------------------------------------------------
;
;	JCB	- Job Control Block
;
recordstart	jcb
record		jcb, link, 2		; Link header to queue JCBs
record		jcb, stack, 2		; Saved Stack Pointer
record		jcb, joblist, 2		; Address of queue to which this JCB is queued
record		jcb, priority, 1
record		jcb, flags, 1
;
	.equ	jcb__hibernate_bp = 0		; Task is in hibjob queue [delay()]
	.equ	jcb__hibernate_bm = 0x01
	.equ	jcb__suspend_bp	= 1		; Task is suspended
	.equ	jcb__suspend_bm	= 0x02
	.equ	jcb__wait_bp = 2		; Task is waitqueued
	.equ	jcb__wait_bm = 0x04
;
record		jcb, iostat, 1			; 
record		jcb, jobid, 1			; 
recordend	jcb, size

To create a job control block you need to include a statement like

jcb1:		.byte jcb_size

When creating a job the job control block must be filled with the following information

Offset		Value
jcb_link	Not used, will be initialized by the create function.
jcb_stack	This must be the address of the first byte after the stack allocated to the job. Each job must be given its own stack, and the stack must be large enough to handle the requirements of any interrupt routine, as interrupts are executed with the stack of the job currently running or the stack of the null job.
jcb_joblist	This must be the address of the entry point of the code of the job. It is possible to have more than one job executing the same code, i.e. having the same entry point.
jcb_priority	Each job can have a priority between 0 and 255. Jobs are executed according to their priority: the higher the number, the higher the job's priority.
jcb_flags	Not used, will be initialized by the create function.
jcb_iostat	This is currently only the index of the serial port used when calling the serin or serout routines of the serial driver. 0 means use USARTA, 1 means use USARTB etc.
jcb_jobid	The job ID is a number between 1 and 255. Each job should have a unique job ID, and the job ID should not be 0. The OS does not interpret the value; it is only copied to the logging or test output entry to identify the job.
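For readers who think in C, the JCB can be mirrored as a struct. This is a host-side sketch, not part of the RTOS: jcb_model and jcb_prepare are invented names, the sketch assumes that the record macros assign consecutive offsets in the order listed, and the addresses passed in the usage note are made-up values.

```c
#include <stddef.h>
#include <stdint.h>

/* C mirror of the JCB record; field order and sizes follow the
   record definitions above. Addresses are 16 bits wide here. */
typedef struct {
    uint16_t link;      /* jcb_link:     set by the create function */
    uint16_t stack;     /* jcb_stack:    first byte after the job's stack */
    uint16_t joblist;   /* jcb_joblist:  entry point when creating the job */
    uint8_t  priority;  /* jcb_priority: 0..255, higher runs first */
    uint8_t  flags;     /* jcb_flags:    set by the create function */
    uint8_t  iostat;    /* jcb_iostat:   serial port index */
    uint8_t  jobid;     /* jcb_jobid:    1..255, used for logging */
} jcb_model;

/* Hypothetical helper: fill in the fields the caller has to provide
   and leave everything else zero. */
jcb_model jcb_prepare(uint16_t stack_end, uint16_t entry,
                      uint8_t priority, uint8_t usart, uint8_t jobid)
{
    jcb_model j = {0};          /* all data structures start out zeroed */
    j.stack    = stack_end;
    j.joblist  = entry;
    j.priority = priority;
    j.iostat   = usart;
    j.jobid    = jobid;
    return j;
}
```

A call such as jcb_prepare(0x4100, 0x0200, 100, 0, 1) describes a job with priority 100 and job ID 1 that uses the first USART. On the target the same values would be stored at the jcb_ offsets of a block reserved with .byte jcb_size.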

Note that the first job created will take over the processor and immediately execute. Therefore the creation of further jobs is the task of the first job created. When a job is created the current registers are saved as the context. I.e. when a job starts it will have the same register values as when the job was created using the create() function.

Locks and Blocks

Locks and blocks are just 16-bit words in internal SRAM. For example, the following can be used to get exclusive access to a hardware resource like an SPI interface:

spilock:	.byte	2

This of course is only required if the SPI interface is shared between jobs.

IO Control Blocks

;
;	IOQ	- IO-Queue Control Block
;
;	The IO-Queue Control block is an 8-byte data structure that will be inserted    
;	into the ioqueue if a task suspends itself. It is intended for tasks that wait  
;	on an IO to finish within a certain amount of time. As with block/unblock it is
;	intended to allow an ISR to resume a suspended task, in addition it can drop a  
;	byte into iostat offset to inform the job about the event that caused the       
;	resume. At the same time it allows a task to set a timeout. When a timeout      
;	occurs the IO-Queue Control block will be removed and the job will be resumed   
;	with the timeout flag set in the flags field informing the job that a timeout   
;	has occurred. It is the task of the ISR to check whether a timeout has occurred 
;	should the event trigger the interrupt before the job can take necessary        
;	actions. Typically the ISR should not resume the task anymore as a resume event 
;	will be stored in the flags field to avoid deadlocks when the event occurs      
;	before the job was able to suspend itself.                                      
;
recordstart	ioq
record		ioq, link, 2	
record		ioq, timer, 2
record		ioq, queue, 2
record		ioq, iostat, 1
record		ioq, flags, 1
;
;	Flags in the ioq parameter block 
;
	.equ	ioq__resume_bp	= 2
	.equ	ioq__resume_bm	= 0x04
	.equ	ioq__suspend_bp	= 1
	.equ	ioq__suspend_bm	= 0x02
	.equ	ioq__iodone_bp	= 0
	.equ	ioq__iodone_bm	= 0x01
	.equ	ioq__record_bp	= 3
	.equ	ioq__record_bm	= 0x08
	.equ	ioq__job_bp	= 4
	.equ	ioq__job_bm	= 0x10
recordend	ioq, size

Again, to create such an IO control block you need to include a statement like

ioq2:		.byte ioq_size
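The IO-Queue Control Block can be mirrored in C in the same way. The sketch below is a host-side model with invented names (ioq_model, ioq_model_resume, ioq_model_resume_pending). The field layout and bit masks are copied from the definitions above. That a resume is recorded in the resume bit, and that this bit makes a later suspend return at once, is an assumption based on the comment in the listing.

```c
#include <stddef.h>
#include <stdint.h>

/* C mirror of the 8-byte IOQ record. */
typedef struct {
    uint16_t link;      /* ioq_link */
    uint16_t timer;     /* ioq_timer */
    uint16_t queue;     /* ioq_queue */
    uint8_t  iostat;    /* ioq_iostat: status byte dropped in by the ISR */
    uint8_t  flags;     /* ioq_flags:  bit masks below */
} ioq_model;

/* Bit masks with the same values as the ioq__*_bm definitions. */
enum {
    IOQ_IODONE_BM  = 0x01,
    IOQ_SUSPEND_BM = 0x02,
    IOQ_RESUME_BM  = 0x04,
    IOQ_RECORD_BM  = 0x08,
    IOQ_JOB_BM     = 0x10
};

/* Model of a resume issued by an ISR: store the status byte and
   remember the resume event in the flags field. */
void ioq_model_resume(ioq_model *q, uint8_t status)
{
    q->iostat = status;
    q->flags |= IOQ_RESUME_BM;
}

/* Returns 1 if a resume event is already recorded, so that in this
   model a suspend issued afterwards would not have to wait. */
int ioq_model_resume_pending(const ioq_model *q)
{
    return (q->flags & IOQ_RESUME_BM) != 0;
}
```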