Multitasking and other fun on the PIC.
By Michael Watterson © 2010. All rights reserved. May be copied or printed for personal use only.
Embedded applications usually have some form of Real Time Operating System or multitasking. We examine why the Microchip PIC microcontroller is ill-suited to this approach, and an alternative methodology for Real Time Systems when programming the PIC with JAL. In passing, various techniques of embedded programming using JAL are illustrated, with some suggestions on selecting a PIC or deciding whether a different CPU should be used.
Which PIC and Why is Multitasking difficult?
There is a confusing variety of PIC micros: 10F, 12F, 16F, 17F, 18F, dsPIC, PIC32. Initially we consider only the 8 bit PIC family (see Choosing a PIC in the appendix), as it is the family JAL (Just Another Language) supports and its architecture is difficult for traditional multitasking solutions.
The architecture is basically Harvard type, with the program in Flash, and all the RAM shares one address space with the registers. Microchip actually calls the RAM the “register file” and the operand mnemonic is “f”. Parts below the 18F (10F, 12F, 16F) allow only small values for f and rely on bank-switching flags, even on parts with 256 bytes of RAM. The 18F series can use 8 bit or 12 bit addresses for f. This means even the large 18F (80 pins) can only really use external RAM for table operations, not for general register instructions, as external memory is in the same bus/address space as the program instructions, not the register file. These PICs also have almost all 4096 bytes of “register file” address space in use, as the special function registers share that address space with the 3905 bytes of RAM.
The most serious issue is the Stack. A stack is usually a block of RAM used to save and restore the return address, CPU state and parameters for Procedure, Function and Interrupt calls. Since the PIC doesn’t really have RAM in the conventional sense (only the Register File), the Stack is implemented with dedicated RAM and its own pointer. The lowest spec PICs have only 1 or 2 levels for Return from Interrupt. The 16F series has 3 to 8 levels depending on part. The 18F has the most at 31 levels. Also, only the 18F has Push and Pop instructions to manipulate the stack. Even the 18F stack is not enough for traditional task switching, as you can only have one stack. There are of course awkward "work-arounds", which are slow and use a lot of scarce RAM.
If this seems strange, it’s because in 1977 the General Instrument PIC1650 was a Programmable Interface Computer for their 16 bit part, which had poor I/O features. This part “lives on” as the PIC16F54, which is an almost identical Flash version. The 17F was the first attempt to really enhance the architecture. The 18F is the successful 2nd attempt at that!
Review of Basic Multitasking concepts.
Even in the earliest days of microprocessors in the mid 1970s it was realised that a plateau of performance would eventually be reached for a single CPU. There were also 30 years of experience of High Level language programming. UNIX was new and shiny, and Batch and Time Sharing Mainframe OSes were looking a bit tired. But Fortran, C, Pascal, BASIC (originally a cut down Fortran) and Cobol (the mainstream languages by the late 1970s) had no “built in” concepts of multi-threading, multitasking, concurrency and such, even though Multi-User operating systems with live terminals, rather than batch card/tape jobs, were appearing.
Analysing systems using Data Flow Diagrams (DFD), and also Digital Signal Processing (DSP), naturally leads to solutions that are more obvious to implement as parallel Tasks or Processes. Industrial control and other embedded applications are naturally thought of as “real time systems” with known response times, built from parallel tasks. In contrast UNIX was not (and even Linux today still is not) a “Real Time System”. This has nothing to do with how fast your CPU is. Even 2ms tasks requiring only microseconds to execute can’t be reliably executed in a timely fashion on OS X, Linux or NT (aka Win2000, XP, Vista, Win7), no matter how fast the CPU is or how little else the user is doing, without writing a hardware level device driver. Writing a device driver is not trivial and, on some OSes, may require “signing”, expensive tools, an SDK, Non-Disclosure Agreements and much study.
In this discussion we don’t consider the different levels of concurrency on Intel CPUs, Java or Windows. We will use Task, Process and Thread as synonymous descriptions of parallel execution, with inter-task communication via signals or other shared memory. Generally we will use the more generic “Task”. A Signal is assumed to be “atomic”, i.e. no matter when an Interrupt occurs the Signal is either set or not. A larger area of shared memory is not atomic and requires a Signal set up as a Mutex (Mutual Exclusion). A bit or Boolean variable may not be “atomic”. What kind of variable is suitable to use as a signal is compiler and target dependent. Signals are required to allow safe inter-task communication and synchronisation.
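As a rough sketch of the atomicity point, assuming JAL v2 on an 18F4550: a single byte can serve as a signal shared with an interrupt routine, while a two byte variable is copied with interrupts briefly disabled. The names tick_signal, big_count, timer_isr and read_big_count are hypothetical, timer setup and configuration fuses are omitted, and whether a given byte operation really compiles to a single instruction is compiler dependent.

```jal
include 18f4550
pragma target clock 48_000_000        -- config fuses omitted in this sketch

var volatile byte tick_signal = 0     -- one byte: assumed safe as a signal
var volatile word big_count   = 0     -- two bytes: an interrupt can split an access
var word total = 0

procedure timer_isr() is
   pragma interrupt
   if INTCON_TMR0IF then
      INTCON_TMR0IF = off             -- clear the timer 0 interrupt flag
      tick_signal = 1                 -- send the signal
      big_count = big_count + 1
   end if
end procedure

function read_big_count() return word is
   var word snapshot
   INTCON_GIE = off                   -- keep the ISR out while both bytes are copied
   snapshot = big_count
   INTCON_GIE = on                    -- assumes interrupts were enabled on entry
   return snapshot
end function

INTCON_TMR0IE = on
INTCON_GIE = on

forever loop
   if tick_signal == 1 then           -- signal received
      tick_signal = 0
      total = read_big_count()
   end if
end loop
```

Disabling interrupts around the copy is only one way to protect the larger variable; it is used here because it needs no extra RAM.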
How we got where we are?
Early solutions were Co-operative Multitasking using Co-Routines, as implemented by Modula-2 (1978, PIM2 in 1983, ISO version 1996), and also Occam (released 1983), based on David May’s work on the “Experimental Language for Distributed Computing” (EPL) and Tony Hoare's Communicating Sequential Processes (CSP), started in the late 1970s.
Both have a similar concept of process synchronisation, the Signal. Occam goes much further being a fully concurrent language.
However it’s implemented, and whatever syntax the language or library gives you to program it, the underlying assumption is usually that each task (thread, process) has its own separate memory area, especially the Stack (used for calls and parameters) and Heap (used for temporary variables). With a conventional CPU (i.e. not a PIC) we can have a scheduler that has a table of Stack pointers. Switching task is as simple as saving the current Program Counter, Heap Pointer and CPU state on the Stack, saving the Stack pointer in the table for that task, restoring a previously saved Stack pointer, then restoring CPU state, Heap Pointer and Program Counter from that stack.
Each Task (aka process, thread, co-routine) thus has its own non-overlapping block of memory, except perhaps for Signal (Mutex) protected communications / Message or other shared memory.
Each Task runs until it sends or awaits a signal. If no task awaits the signal, the sending task is suspended till one is awaiting it. If a task awaits a signal, it’s suspended till the signal is received. This is task synchronisation by semaphore.
A Mutex is simply created by using a signal. A task initialises the resource and “sends” the signal; that task can then wait for a signal once the first user consumes the signal. Each task that requires the resource awaits the signal before use and sends the signal after completion. The Resource task eats that and sends a new signal, so that the task using the resource does not wait till someone else needs the resource.
Depending on Signal/Semaphore implementation and the language or library, a mutex may be differently implemented.
Penalties on a regular CPU
- Compared with a purely sequential program, a parallelised program may use much more RAM.
- On a CPU without Virtual Memory and an MMU, the programmer may have to decide how much memory to allocate to a task at creation. It may be difficult without MMU/Virtual Memory to change the allocation at run time.
- Without VM/MMU, ensuring that array bound violations and Stack or Heap overflows don’t destroy more than one process is nearly impossible.
- Deadlock is possible in any parallelised system. This requires different programming skills to avoid.
- Any shared resources must have Mutual Exclusion (Mutex) so only one process at a time has access. Like a toilet cubicle.
- Any shared code must be “re-entrant”. If it uses a global value to hold state (static variable), this needs to somehow be unique to each process.
Advantages
- Digital Signal Processing (DSP) is much simpler to implement.
- Data Flow Diagrams (DFDs), especially those based on realtime sampling of I/O, are much easier to implement.
- A delay “loop” (or While Busy do Nothing) does not “waste” CPU or block other tasks. It takes a fixed overhead no matter how long the delay.
- Real-time response / Embedded control is much easier to write.
- Even interrupts could become unnecessary, implemented by Signals instead.
- If you have an unknown number of cores or distributed CPUs, and this is more than one, then a well designed parallelised Program with a suitable Real Time Operating System can map different Tasks to different cores or CPUs.
Problems with Multitasking on PIC.
- There is simply not the RAM for the conventional model.
- No RAM based Stack. It’s a Hardware stack, and you can only have one of it. You would have to empty and fill the entire stack on a task switch. On less than 18F (10F, 12F, 16F) you can’t Push and Pop items on the Stack at all. It’s only for call and Interrupt returns.
- Even on 18F, you could only task switch easily at the main level of a “task”, not within a Procedure/Function call.
Looks bad? Take a break from Multitasking for a moment and consider JAL.
JAL
Why “Just Another Language” (JAL) rather than C, Basic, Forth or even Pascal or Modula-2?
The design of these languages assumes ample RAM, an accessible Stack and other things that “regular” CPUs have. C, Basic and Forth do exist for the 16F and 18F. Forth obviously uses a fabricated software stack rather than the real Stack pointer. Parameter passing and Function returns are normally on the Stack, but on the PIC they have to be RAM based, i.e. in the “general purpose register file”. JAL and the JAL compiler, in contrast, were designed specifically for the quirky architecture of the 16F. JAL also has some features for embedded programming that are not part of standard C, Basic or Forth:
1) The ChipDef / Chip Include. This allows JAL compiler and JAL programmer to use the hundreds of different PIC, some 10F, 12F, most 16F and most 18F. This is currently about 345 cpu models. There is now an automated process to create these from new Microchip datasheets.
2) ALIAS <some Identifier> IS <some existing variable>. This allows pins, ports and registers to be given meaningful, application dependent names.
3) AT: a variable declaration can be “at” a Register file “location”, or at another variable. AT connects a variable name to a particular pin, port or register address. Or “AT” can act like a “union”:
var dword fred
var byte bill[4] at fred
This means bill[0] is the first byte of fred and bill[3] is the last byte of fred. A dword is always 32 bits.
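The union-like use of AT can be sketched as follows, assuming JAL v2 and the 18F4550 device file. The name low_byte is hypothetical, and the byte order shown in the comments (least significant byte first) is an assumption worth checking against your compiler version.

```jal
include 18f4550
pragma target clock 48_000_000   -- config fuses omitted in this sketch

var dword fred                   -- always 32 bits
var byte  bill[4] at fred        -- the same four bytes, seen as an array

fred = 0x12345678
-- bill[0] .. bill[3] now overlay the four bytes of fred.
-- Assumption: least significant byte first, so bill[0] would hold 0x78.
var byte low_byte = bill[0]

forever loop
   asm nop                       -- nothing else to do in this sketch
end loop
```

No copying takes place: writing to bill[2] changes fred directly, which is what makes AT useful for packing and unpacking multi-byte values.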
If you need a 64 bit variable you can declare
var byte*8 bill
Oddly, there is NO main program in the sense of main() in C. The logic is that when you declare a variable that is a port, you may want to set it up with a value or a direction straight away:
var byte bill[5] = "Hello"
alias GLCD_LED is pin_E1
alias GLCD_LED_direction is pin_E1_direction
GLCD_LED_direction = output
GLCD_LED = on
Typically, the textually last part of your JAL file will be:
forever loop
   -- do stuff
   -- do more stuff
end loop
All embedded programs generally have this as the last part of the main program, or else the CPU would either halt or execute nop (no operation) instructions till the program counter wrapped around and the program restarted.
If you had a Real Time Operating System or Scheduler, it might be started here, or the loop might just have “suspend” in it.
Basics of Real Time
An embedded system or Real Time system is not about speed, per se, but about timely response. It is best to consider it in terms of data flow and sampling. The inputs must be read and the outputs updated at a certain rate, and the processing in between has a maximum latency.
1. Timing and Sampling.
Not all tasks need the same time constraints. Not all inputs or outputs require the same response time. First identify the highest speed input. If it must be polled rather than generating an interrupt, the sample clock must be at least twice as fast (Nyquist). This is the minimum speed then of the master Real Time Clock Interrupt. There may be a limited number of Hardware counters / Interrupts. But really we only need one, and that is more efficient than several in use of Stack, as otherwise they may need to overlap.
Identify all the slower events, sampling and timing required. See if there is a simple common denominator as a faster, but still reasonable clock interrupt. It may be that some intervals required are not an exact integer multiple of this Master Interrupt. In this case the ISR (Interrupt Service Routine) will use two software counters to achieve fractional multiplication. All the higher speed events should be an exact integer multiple. In some cases an inexact multiple is fine.
Example
We need to sample a 1200Hz external clock on a pin and read a data pin. The minimum sample rate of 2400Hz corresponds to a period of 416.67 microseconds. Our clock tick is 64us. In this case we simply over-sample at 256us, i.e. 4 interrupt ticks counted in software. This is 3906.25Hz, which is about 3.26 times the clock rate, exceeding the minimum x2. Counting 5 ticks would give 320us, which would also work, as would 6 at 384us. But 7 ticks at 448us is too slow. The 1200Hz may not be exactly 1200Hz and is not synchronised to the CPU clock. Also the CPU clock may not be exactly as expected. So over-sampling at higher speed reduces jitter and increases margin if the remote system is in error. USARTs (PC serial ports) typically over-sample at x4 to x16.
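The arithmetic behind this example can be laid out as below. The symbols T_max (longest usable sample period) and f_n (sample rate when counting n ticks of 64us) are introduced here only for illustration.

```latex
\begin{align*}
T_{\max} &= \frac{1}{2 \times 1200\,\mathrm{Hz}} \approx 416.67\,\mu\mathrm{s} \\
4 \times 64\,\mu\mathrm{s} = 256\,\mu\mathrm{s} &\;\Rightarrow\; f_4 = 3906.25\,\mathrm{Hz} \approx 3.26 \times 1200\,\mathrm{Hz} \\
5 \times 64\,\mu\mathrm{s} = 320\,\mu\mathrm{s} &\;\Rightarrow\; f_5 = 3125\,\mathrm{Hz} \approx 2.60 \times 1200\,\mathrm{Hz} \\
6 \times 64\,\mu\mathrm{s} = 384\,\mu\mathrm{s} &\;\Rightarrow\; f_6 \approx 2604.17\,\mathrm{Hz} \approx 2.17 \times 1200\,\mathrm{Hz} \\
7 \times 64\,\mu\mathrm{s} = 448\,\mu\mathrm{s} &\;\Rightarrow\; f_7 \approx 2232.14\,\mathrm{Hz} \approx 1.86 \times 1200\,\mathrm{Hz} < 2 \times 1200\,\mathrm{Hz}
\end{align*}
```

Only counts of 6 or fewer stay under T_max, and 6 leaves very little margin for clock error, which is why 4 is chosen.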
For a non-integer multiplication we need to periodically change the reload value of the software counter that counts the “ticks”, switching between two values such that the average count is correct. This does introduce some jitter, equivalent to the (difference in counts) x (tick period).
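A minimal sketch of this non-integer counting, assuming JAL v2 on an 18F4550 with TMR0 as the tick source. The reload values 6 and 7 (an average of 6.5 ticks), the names COUNT_SHORT, COUNT_LONG, tick_count, next_is_long and tick_isr, and the use of pin_b0 as the timed output are all hypothetical. Timer setup and configuration fuses are omitted, so the actual tick period depends on how TMR0 is configured.

```jal
include 18f4550
pragma target clock 48_000_000        -- config fuses omitted in this sketch

const byte COUNT_SHORT = 6            -- hypothetical reload values:
const byte COUNT_LONG  = 7            -- alternating 6 and 7 averages 6.5 ticks

var volatile byte tick_count   = COUNT_SHORT
var volatile bit  next_is_long = true -- which reload value is used next

procedure tick_isr() is
   pragma interrupt
   if INTCON_TMR0IF then
      INTCON_TMR0IF = off             -- clear the timer 0 interrupt flag
      tick_count = tick_count - 1
      if tick_count == 0 then
         if next_is_long then
            tick_count = COUNT_LONG
         else
            tick_count = COUNT_SHORT
         end if
         next_is_long = !next_is_long -- periods run 6, 7, 6, 7 ...
         pin_b0 = !pin_b0             -- the timed event, done inline
      end if
   end if
end procedure

enable_digital_io()
pin_b0_direction = output
INTCON_TMR0IE = on
INTCON_GIE = on

forever loop
   asm nop                            -- nothing else to do in this sketch
end loop
```

Here the two counts differ by one, so the jitter is one tick period. Ratios other than one half would need a different pattern of short and long counts.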
When the software counter/multiplier has counted down, the counter is reloaded with its default value. The nature of the task decides what happens next. If you have an 18F, then you can call a procedure or function that takes longer than the tick time but less than the (tick time) x (software count) period, as the HW stack is 31 deep. On a 10F, 12F or 16F, due to the shallow HW stack (1 to 3 only), the task is ideally inline and must be completed before the next tick. If the task repetition time is longer than the time to execute the main “forever loop”, then you increment a semaphore. A section in the main loop is skipped if the semaphore is zero; otherwise it executes the extra code and decrements the semaphore (which must be a single byte on an 8 bit PIC).
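The semaphore hand-off between the timer interrupt and the main loop might look like the sketch below, assuming JAL v2 on an 18F4550. SLOW_TICKS, slow_count, slow_sem, slow_task and rtc_isr are hypothetical names, toggling pin_b1 stands in for a real job, and timer setup and configuration fuses are omitted. It also assumes the one byte decrement in the main loop is not split by the interrupt, which is compiler dependent.

```jal
include 18f4550
pragma target clock 48_000_000        -- config fuses omitted in this sketch

const byte SLOW_TICKS = 100           -- hypothetical: run the task every 100 ticks

var volatile byte slow_count = SLOW_TICKS
var volatile byte slow_sem   = 0      -- single byte semaphore

procedure slow_task() is
   pin_b1 = !pin_b1                   -- stands in for a job longer than one tick
end procedure

procedure rtc_isr() is
   pragma interrupt
   if INTCON_TMR0IF then
      INTCON_TMR0IF = off             -- clear the timer 0 interrupt flag
      slow_count = slow_count - 1
      if slow_count == 0 then
         slow_count = SLOW_TICKS      -- reload with the default value
         slow_sem = slow_sem + 1      -- signal the main loop
      end if
   end if
end procedure

enable_digital_io()
pin_b1_direction = output
INTCON_TMR0IE = on
INTCON_GIE = on

forever loop
   if slow_sem > 0 then               -- skipped while the semaphore is zero
      slow_task()
      slow_sem = slow_sem - 1         -- one pending run has been served
   end if
   -- other round robin work goes here
end loop
```

Because the semaphore counts rather than just flags, a main loop that briefly falls behind still runs the task the right number of times.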
2. Functions that wait internally
This also includes Procedures that must wait for an external event or have “out” parameters.
We are essentially implementing a Data Flow architecture. So everything is in the main forever loop or repeatedly called by an Interrupt. The solution is for the routine (function, procedure) to use either a shared “private” global variable, if several routines use the same resource (say Serial I/O), or a non-shared, routine-specific “private” global. The first action then is to test the semaphore or mutex variable. If it is not in use, we set it and do the first part of the task. If it is in use, we check whether it holds “our” value and whether the resource required (time, serial port, whatever) is ready. If it’s ready then we service it and set the semaphore/Mutex back to “empty”. If it is for “us” but the resource isn’t ready, we return.
All data has to be returned via “out” parameters. The Function return is always false if the function only did the first part of the “task”, or the “resource” needed to complete the task was busy, or some other function has the “resource”. The Function only returns true when the resource was accessed and the Function reset the “semaphore/Mutex” to “empty”.
Thus in our procedures called by the main “forever loop” the code after the Function call is repeatedly skipped till there is valid data in the “out” parameters.
This method makes “blocking” I/O non-blocking for the main “forever loop” and makes “While busy do nothing” and “delay (xxx)” take almost no overhead in the main loop. We have turned the main “forever loop” into a basic round robin scheduler without the overhead of multiple stacks and an RTOS kernel.
Example function:
function CAT_FreqBCDRd(byte out ModMode, byte out bcdfreq[4]) return bit is
   var bit status = false
   var byte digit
   -- default "out" values, returned whenever the read is not complete
   bcdfreq[0] = 0
   bcdfreq[1] = 1
   bcdfreq[2] = 0
   bcdfreq[3] = 0
   modMode = 16
   if ( _CAT_QueuedCmd == CMD_NONE ) then
      -- resource is free: claim it by queueing our command
      CAT_SendCmd(CAT_NO_PARAM, CMD_FreqRd, true)
      status = false
      --suspend
   elsif (_CAT_QueuedCmd == CMD_FreqRd) then
      -- the queued command is ours: see if the reply has arrived
      if (serial_hw_data_available) then
         digit = 0 -- get back the frequency
         bcdfreq[digit] = serial_hw_data
         for 3 loop
            digit = digit + 1
            if RigResponds() then
               bcdfreq[digit] = serial_hw_data
            else
               exit loop
            end if
         end loop
         if RigResponds() then -- read the Mode
            modMode = serial_hw_data
            status = true
         end if
         _CAT_QueuedCmd = CMD_NONE   -- release the resource
      elsif ( _CAT_retryCount < 1) then
         _CAT_QueuedCmd = CMD_NONE   -- give up and release the resource
      else
         _CAT_retryCount = _CAT_retryCount - 1
      end if
   end if
   return (status)
end function
In this case ( _CAT_QueuedCmd == CMD_NONE ) is testing the shared Mutex /semaphore to see if the resource is free (it’s a serial port). The Function has to send 5 bytes via serial and wait up to 350ms for the remote device to return 5 bytes. Once the remote responds, the 5 returned bytes come in a block, but we have tests in case the communication link is lost or remote is turned off in the middle of transmission.
The CAT_FreqBCDRd thus could be called many times before it does anything. Since we know there is no point in interrogating the remote device more than 3 or 4 times a second, we have a timer that sends a “signal” to part of our main loop, so many times this part of main loop (shown below) is skipped:
Part of main loop:
-- loop
if CheckRadio > 0 then
   case rigModeNow of
      RIGMODE_RX: block
         if !(rigModeNow == rigModeOld) then
            if ! clock_on then
               DrawClockFace(19,35,17, on, off)
               clock_on = true
               DisplayWait()
               DrawMeterFace(42, 16, S_METER_HEIGHT,S_METER_SCALE, on)
            end if
            pttStatus = off
            if CAT_PTT(pttStatus) then
               rigModeOld = rigModeNow
               linkFailTime = LINK_RETRIES
            end if
         else
            RIG_RxPoll()
         end if
      end block
      -- (remaining cases, "end case" and "end if" not shown)
CheckRadio is incremented by the Timer ISR.
Then in RIG_RxPoll() we interrogate the remote radio:
procedure DisplayFrequency() is
   var volatile byte rigFreq[4]
   var byte rigFreqText[10] = "430.125,00"
   var byte modmodeText[3]
   var byte newMode
   if CAT_FreqBCDRd(newMode, rigfreq) then
      -- valid data has arrived in the "out" parameters
      if BCD4toString10 (rigFreq, rigFreqText) then
         timedOut = false
         if newMode < 15 then
            modMode = newMode
         end if
         ScreenCharXY(0,0)
         CharStyleDouble = on
         print_string(ScreenChar,rigFreqText)
         CharStyleDouble = off
         ScreenCharXY(20,1)
         CharStyleBold = on
         ScreenChar = "0"
         CharStyleBold = off
      end if
      ScreenCharXY(15,7)
      if RIG_ModeText(modMode,modmodeText) then
         ScreenChar = "n"
      else
         ScreenChar = " "
      end if
      print_string(ScreenChar,modmodeText)
      linkFailTime = LINK_RETRIES
   elsif (linkFailTime < 1) then
      -- no answer for too long: treat the link as down
      if ! timedOut then
         DisplayWait()
         timedOut = true
      end if
   else
      linkfailtime = linkfailtime - 1
   end if
end procedure
If after a reasonable time the CAT_FreqBCDRd doesn’t return “true” we assume the communications link is broken or the Remote unit is turned off.
To Be Continued! …
Appendix
Complete “Blink LEDs”
for the 18F4550. Always four parts at least:
1) RTC ISR
2) Low level routines that "multitask" using dataflow sampling
3) The Tasks
4) The main forever loop calls the tasks.
include 18f4550
-- even though the external crystal is 20 MHz, the configuration is such that
-- the CPU clock is derived from the 96 MHz PLL clock (div2), therefore set
-- target frequency to 48 MHz
pragma target clock 48_000_000
-- fuses
pragma target PLLDIV P5 -- divide by 5 - 20MHZ_INPUT
pragma target CPUDIV P2 -- OSC1_OSC2_SRC_1_96MHZ_PLL_SRC_2
pragma target USBPLL F48MHZ -- CLOCK_SRC_FROM_96MHZ_PLL_2
pragma target OSC HS_PLL
pragma target FCMEN DISABLED
pragma target IESO DISABLED
pragma target PWRTE ENABLED -- power up timer
pragma target VREGEN ENABLED -- USB voltage regulator
pragma target VOLTAGE V20 -- brown out voltage
pragma target BROWNOUT DISABLED -- no brownout detection
pragma target WDTPS P32K -- watchdog postscaler setting
pragma target WDT DISABLED -- no watchdog
pragma target CCP2MUX pin_C1 -- CCP2 pin
pragma target PBADEN DIGITAL -- digital input port<0..4>
pragma target LPT1OSC LOW_POWER -- low power timer 1
pragma target MCLR EXTERNAL -- master reset on RE3
pragma target STVR DISABLED -- reset on stack over/under flow
pragma target LVP DISABLED -- no low-voltage programming
pragma target XINST ENABLED -- extended instruction set
pragma target DEBUG DISABLED -- background debugging
pragma target CP0 DISABLED -- code block 0 not protected
pragma target CP1 DISABLED -- code block 1 not protected
pragma target CP2 DISABLED -- code block 2 not protected
pragma target CP3 DISABLED -- code block 3 not protected
pragma target CPB DISABLED -- bootblock code not write protected
pragma target CPD DISABLED -- eeprom code not write protected
pragma target WRT0 DISABLED -- table write block 0 not protected
pragma target WRT1 DISABLED -- table write block 1 not protected
pragma target WRT2 DISABLED -- table write block 2 not protected
pragma target WRT3 DISABLED -- table write block 3 not protected
pragma target WRTB DISABLED -- bootblock not write protected
pragma target WRTD DISABLED -- eeprom not write protected
pragma target WRTC DISABLED -- config not write protected
pragma target EBTR0 DISABLED -- table read block 0 not protected
pragma target EBTR1 DISABLED -- table read block 1 not protected
pragma target EBTR2 DISABLED -- table read block 2 not protected
pragma target EBTR3 DISABLED -- table read block 3 not protected
pragma target EBTRB DISABLED -- boot block not protected
enable_digital_io()
-- ----------- Main Program -----------
alias led1 is pin_b1
pin_b1_direction = output
alias led2 is pin_b2
pin_b2_direction = output
alias led3 is pin_b3
pin_b3_direction = output
; private variables for the low level procedures
const byte NUM_DELAYS = 3
var byte CheckDelay[NUM_DELAYS] = {0, 0, 0} -- semaphores
var byte timer[NUM_DELAYS] = {0, 0, 0} -- internal delay timer
-- ms: 1 to 254 x the RTC tick of 1024 usec (ms + 1 must fit in a byte).
function Delay(byte in instance, byte in ms) return bit is
   var bit timedout = false
   if instance < Count (CheckDelay) then -- only bother if it exists
      if CheckDelay[instance] > 0 then -- a tick is pending for this instance
         if timer[instance] < 1 then -- 1st time ever call
            timer[instance] = ms + 1
         elsif timer[instance] == 1 then -- timeout
            timer[instance] = ms + 1
            timedout = true
         else
            timer[instance] = timer[instance] - 1
         end if
         CheckDelay[instance] = checkDelay[instance] - 1
      end if
   end if
   return (timedout)
end function
; the tasks.
procedure BlinkLed1() is
   if Delay(0, 100) then -- delay in 1.024ms steps
      led1 = ! led1
   end if
end procedure
procedure BlinkLed2() is
   if Delay(1, 120) then
      led2 = ! led2
   end if
end procedure
procedure BlinkLed3() is
   if Delay(2, 200) then
      led3 = ! led3
   end if
end procedure
-- The RTC always increments a Semaphore by 1 if it's time to do a task
-- The Task always decrements the Semaphore by 1 when it has completed.
const TICK_INIT = 48 -- 48 x 21.33us = 1024us = 1.024ms
var byte ticks = TICK_INIT -- timer
procedure RTC() is
   pragma interrupt
   var byte instance
   if INTCON_TMR0IF then
      INTCON_TMR0IF = off -- clear the timer 0 interrupt flag
      ticks = ticks - 1 -- one TMR0 overflow = 256 instruction cycles (21.33us at 48 MHz)
      if (ticks < 1) then
         ticks = TICK_INIT
         for Count (CheckDelay) using instance loop
            CheckDelay[instance] = CheckDelay[instance] + 1
         end loop
      end if
   end if
end procedure -- end of ISR
-- Main Program
block
   -- RTC setup
   T0CON_T0CS = 0 -- TMR0 on internal clock
   T0CON_PSA = 1 -- prescaler not assigned,
   -- so no prescaler for TMR0 (= default)
   INTCON_TMR0IE = on -- if your PIC freezes, move these lines
   INTCON_GIE = on -- to see if the ISR causes trouble
   forever loop ;round robin tasks!
      BlinkLed1()
      BlinkLed2()
      BlinkLed3()
   end loop
end block
Choosing a PIC
10F, 12F, 16F or 18F?
Parts below the 18F series have only 1 to 8 Stack entries and only up to approximately 350 bytes of RAM. The 18F series has up to 31 stack levels (still a fixed Hardware stack, i.e. you can’t change the Stack Pointer to relocate it elsewhere in RAM).
There are about 10 models each of 6, 8 and 14 pin PIC in the 10F, 12F and 16F series.
The 18F series is not available in less than an 18 pin package. The lesser 10F, 12F and 16F series are available with as few as 6 pins, but have no 8 bit HW multiply, no PUSH/POP, only 1 to 3 stack levels and less than 350 bytes of RAM. Most are 2K words to 4K words. The 18F are typically 8K words to 64K words. Because an instruction “word” is more than 8 bits but less than 16 bits, the PIC 10F, 12F and 16F series don’t store lookup tables efficiently. Most lookup tables use 8 bit or 16 bit values. The 18F uses a 16 bit word, so lookup tables are more efficient. There are hundreds of 18F models.
JAL currently supports over 350 PICs from the 10F, 12F, 16F and 18F families, including 18FxxJxx and 18FxxKxx devices.
We don’t consider here the 17F, as it’s an obsolete version of the 18F. Nor do we consider the higher end dsPIC or 24F series, as an ARM or the PIC32 (really a MIPS core) is a better choice. Some people mistakenly think the 18F is a 16 bit CPU, as Microchip refers to it as a 16 bit core, but they do group it as an 8 bit CPU. It’s less 16 bit than an 8 bit Z80! The 16 bits is only the instruction size. All the PIC 10, 12, 16, 17 and 18 are 8 bit CPUs, slightly similar to the 8051 rather than the 8080, Z80 or even the 6502/6800 type family.
For 18 pins or more only consider the 18F family
http://www.microchip.com/ParamChartSearch/chart.aspx?branchID=1004&mid=10&lang=en&pageId=74
Selection and Parametric search of all 8bit PIC
http://www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=2696&param=en537796
Get JAL here http://code.google.com/p/jallib/ (Downloads on Right of page)
see also
http://groups.google.com/group/jallib
http://tech.groups.yahoo.com/group/jallist/
http://www.casadeyork.com/jalv2/
http://groups.google.com/group/jaledit
http://groups.google.com/group/jaluino
http://justanotherlanguage.org/