Author Topic: z64 designs an RTOS; madness, awesomeness ensues (Read 3168 times)

z64555 · **on:** August 15, 2012, 08:02:43 pm

About a year ago, I failed at making a fully functional quadrotor. At that time, I determined that I hadn't created a good enough operating system for the self-stabilization, and set upon a quest to make my very own real-time operating system.

Goals:
-Single Application, Multi-processing, Multi-tasking operating system
-Able to execute tasks in "parallel"
-Written in ISO C 9899 (C99) for maximal compatibility across MCU systems and compilers.
-As immune to ****ups as possible

As for the C99 requirement, I'm pretty much limited to using C99 vs. C++ or even Java because of the chip I'm using (the HCS12). And before you say "Get a MCU that's in this century, asshole," I must point out that the University I graduated from is using the HCS12 as its ~~torture device~~ learning tool for EEEN's and CSEN's alike, and I wish to get them involved in the world of ~~hurt~~ embedded systems programming.

Sooo. yeah. So far I've gotten some designs for the kernel itself. It's likely going to be a cross between an Exokernel and a Hybrid kernel, due to the fact that I haven't figured out the framework to use for the hardware interface just yet.

There shall be two lists that the kernel maintains: a Process list (which will handle the objects/systems) and a Task list (which will handle the threads/functions/methods).

The process list will be prioritized to some extent, while the task list is non-prioritized. Both shall be a round-robin style in implementation.

Now, one thing that I may be taking too far/seriously is the ability for processes to have a set of tasks to be run in parallel. In this case, the tasks that are "parallel" fill up the task list when they can. The tasks in the process that need the "parallel" tasks to be complete first have to wait until they are.
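
A minimal sketch of the non-prioritized, round-robin task list described above. The names (`task_list`, `MAX_TASKS`, the two demo tasks) are made up for illustration, and the process list and the "parallel" bookkeeping are left out entirely; this only shows the rotation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 8u   /* hypothetical capacity */

typedef void (*task_fn)(void);

/* Non-prioritized task list: a fixed array visited in order. */
struct task_list {
    task_fn slots[MAX_TASKS];
    size_t  count;   /* number of occupied slots */
    size_t  next;    /* index of the slot to run next */
};

/* Two placeholder tasks that only count how often they ran. */
static unsigned demo_runs[2];
void demo_task_a(void) { demo_runs[0]++; }
void demo_task_b(void) { demo_runs[1]++; }

/* Append a task; returns 0 on success, -1 if the list is full. */
int task_list_add(struct task_list *tl, task_fn fn)
{
    assert(tl != NULL && fn != NULL);
    if (tl->count == MAX_TASKS)
        return -1;
    tl->slots[tl->count++] = fn;
    return 0;
}

/* Run exactly one task, then advance to the next slot (wrapping around).
 * Returns the index that was run, or -1 if the list is empty. */
int task_list_run_next(struct task_list *tl)
{
    size_t i;
    assert(tl != NULL);
    if (tl->count == 0)
        return -1;
    i = tl->next;
    tl->slots[i]();
    tl->next = (i + 1) % tl->count;
    return (int)i;
}
```

Every task gets one turn per lap regardless of importance, which is what "non-prioritized" means here; any prioritizing would happen one level up, when the kernel decides which process gets to fill the list.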

Usage of void pointers appears to be inescapable now.

Nuke · **Reply #1 on:** August 15, 2012, 08:37:47 pm

ever thought of using a parallax propeller. its essentially a multi-core mcu. i also saw an arm chip in a dip 28 package that i wanted to try out.

z64555 · **Reply #2 on:** August 15, 2012, 09:25:22 pm

Regardless of the MCU I select, it's going to need some form of OS in order to run the program I have in mind. ArduPilot got away with it by setting up a whole bunch of loops that run at different times, but it has no preemption abilities for its tasks (since they're rolled into a single thread) and any crash in the system means that the COP timer has to reset the entire thing.

Granted, I haven't heard of any particular catastrophic failures with ArduPilot, but I'm still going the route of making an RTOS (since I should be able to use the same RTOS for other things).

Nuke · **Reply #3 on:** August 15, 2012, 10:00:56 pm

prop just sounds like a good chip for this kind of system. you could dedicate a bunch of cogs to just keeping the imu samples up to date. while others can manage motor control. but the chip's architecture is so exotic that code is unlikely to be very portable. but yea having an os that you can compile for whatever chip you have on hand is a good idea.

you could probably use software driven interrupts (if your mcu supports this) to run priority tasks. so if process a needs to complete to start process b then you could have process a trigger an interrupt when it completes and use the isr to start the next process. of course that kinda thing depends on what the hardware can do (such as the number of software interrupts you can use). say you have 3 processes dealing with each sensor in an imu (accelerometer, gyro, magnetometer) whose job is simply to read and log the data from each sensor. and then another process that uses the data from that log to update the current position and orientation of the vehicle, which needs all 3 of the read and log processes to be complete before running. you could use some kind of conditional priority system whereby the process is only run when all 3 parts of the imu have been logged. so when each process completes it triggers a software interrupt, where the isr determines which process is next. and there the position computation process only runs if all 3 of the imu read processes completed successfully, otherwise it is skipped and goes on to the next process. of course idk ive never thought about os design all that much.
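
A rough sketch of that "only run when all 3 reads are done" idea, using one completion bit per sensor. Everything here is hypothetical (the `IMU_*` names, `update_position`, the dispatch hook); on real hardware `imu_read_complete` would be called from whatever ISR the software interrupt lands in.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per sensor-read process (illustrative names). */
enum {
    IMU_ACCEL = 1 << 0,
    IMU_GYRO  = 1 << 1,
    IMU_MAG   = 1 << 2,
    IMU_ALL   = IMU_ACCEL | IMU_GYRO | IMU_MAG
};

static volatile uint8_t imu_done;   /* set bits = reads completed this cycle */
static unsigned position_updates;   /* how many times the update step ran */

/* Stand-in for the position/orientation computation. */
void update_position(void) { position_updates++; }

/* Called by each read process when it finishes successfully. */
void imu_read_complete(uint8_t sensor_bit)
{
    assert((sensor_bit & ~IMU_ALL) == 0);
    imu_done |= sensor_bit;
}

/* Scheduler hook: run the update only if all three reads completed,
 * otherwise skip it. The flags are cleared either way so the next
 * cycle starts fresh. Returns 1 if the update ran, 0 if skipped. */
int dispatch_position_update(void)
{
    int ran = 0;
    if ((imu_done & IMU_ALL) == IMU_ALL) {
        update_position();
        ran = 1;
    }
    imu_done = 0;
    return ran;
}
```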

z64555 · **Reply #4 on:** August 16, 2012, 02:33:44 am

Yeah, the SWI command will most likely be used, along with the RTI (Real-Time Interrupt) module to provide an OS clock signal by which the kernel times everything. Sure, it's not particularly accurate to the nanosecond, but it's good enough to the millisecond (I think at one time I managed to set it up to be 1.024ms, or something like that). There's also this thing about external clock sources (like an RTC chip), wherein you can hook it up to one of the pins of the MCU and set it up to interrupt whenever that pin changes its state.
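
For what it's worth, a 1.024ms tick falls out naturally if the periodic interrupt divides a 16 MHz oscillator by 2^14 (16384 / 16,000,000 s = 1.024 ms). Both numbers are assumptions for illustration, not something confirmed for this board. A sketch of an OS clock built on such a tick, with the hardware-specific vector setup and flag clearing left out:

```c
#include <assert.h>
#include <stdint.h>

#define OSC_HZ       16000000UL  /* assumed oscillator frequency */
#define RTI_DIVIDER  16384UL     /* assumed divider, 2^14 */

/* Tick period in microseconds: 16384 / 16 MHz = 1024 us. */
#define TICK_US ((uint32_t)(RTI_DIVIDER * 1000000ULL / OSC_HZ))

static volatile uint32_t os_ticks;  /* incremented once per tick interrupt */

/* Body of the tick ISR. */
void os_tick_isr(void) { os_ticks++; }

/* Elapsed time since boot in microseconds (wraps after roughly 71 minutes). */
uint32_t os_time_us(void) { return os_ticks * TICK_US; }

/* Number of whole ticks needed to cover at least `ms` milliseconds. */
uint32_t ms_to_ticks(uint32_t ms)
{
    return (uint32_t)(((uint64_t)ms * 1000u + TICK_US - 1u) / TICK_US);
}
```

With these numbers a 50Hz control loop (20 ms period) needs 20 ticks, which actually comes to 20.48 ms, so the loop runs slightly slow unless the kernel compensates.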

The OS will definitely make use of ISRs and the interrupt vector table, although, again, at this point I haven't figured out how the hardware interfaces will work with the kernel (other than acting like a high priority process).

Nuke · **Reply #5 on:** August 16, 2012, 04:59:51 am

how critical is an rtc for a quad copter? can you just get by with a good oscillator and a timer? not really an os question i was just kinda curious how accurate your time log has to be to get the imu to work right.

z64555 · **Reply #6 on:** August 16, 2012, 09:52:53 pm

Quote from: Nuke on August 16, 2012, 04:59:51 am

how critical is an rtc for a quad copter? can you just get by with a good oscillator and a timer? not really an os question i was just kinda curious how accurate your time log has to be to get the imu to work right.

I don't think having a dedicated RTC is critical, you should be able to get by with any sort of oscillator/timer/software setup.

But then again, that really depends on the size of your quad... I've read that quads larger than 2ft or so in diameter have a bit of trouble being stable with a 50Hz loop, while the so-called micro-quads are really stable at the same frequency.

So I've come to realize that not only must void pointers be invoked, but also function pointers... that have a void pointer as an argument...
and even (heaven forbid) a void pointer as a return.

The function pointers will have to be used when the kernel needs to start a new thread. Each process will have an internal 2D const array of func pointers that will point to the tasks that the process needs done. The kernel will grab one or more func pointers from the process, and load them into the task list, and from there the kernel will begin to run them.

So far, this is what I've come across regarding function pointers:

Code: [Select]

//Function Pointer declaration
return_type (*func_point)( argument_type, argument_type );

//Function Pointer assignment
//Note: the function that the pointer is pointing to must have the same return and argument types
//Note2: a void * argument lets one signature accept a pointer to any object type
func_point = &function;  // preferred: explicit address-of
func_point = function;   // also possible: a function name decays to a pointer

//Execute function that the pointer is pointing to...

(*func_point)( 0, 0 );   // func_point( 0, 0 ) is equivalent
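
A compilable sketch that ties the syntax above to the per-process task table. The layout is a guess at what the "2D const array" could look like: rows are stages that run in order, columns are tasks allowed to run "in parallel" within a stage. All names (`task_fn`, `demo_process`, `run_process`) are hypothetical, and the stand-in kernel simply calls the tasks in sequence instead of scheduling them.

```c
#include <assert.h>
#include <stddef.h>

/* Task signature: opaque argument in, opaque result out. */
typedef void *(*task_fn)(void *arg);

/* Two illustrative tasks; each treats arg as a pointer to an int. */
void *task_increment(void *arg) { int *n = arg; (*n)++;  return arg; }
void *task_double(void *arg)    { int *n = arg; *n *= 2; return arg; }

#define STAGES 2
#define WIDTH  2

/* A process's task table. NULL marks an unused slot. */
static const task_fn demo_process[STAGES][WIDTH] = {
    { &task_increment, &task_increment },
    { &task_double,    NULL            },
};

/* Stand-in for the kernel: walk the table in order and call each task.
 * Returns the number of tasks executed. */
int run_process(const task_fn table[STAGES][WIDTH], void *arg)
{
    int ran = 0;
    assert(table != NULL);
    for (size_t s = 0; s < STAGES; s++) {
        for (size_t t = 0; t < WIDTH; t++) {
            if (table[s][t] != NULL) {
                (*table[s][t])(arg);
                ran++;
            }
        }
    }
    return ran;
}
```

Starting from `int n = 1;`, `run_process(demo_process, &n)` runs three tasks and leaves `n` at 6. Because the table is `const`, the compiler is free to place it in flash rather than RAM.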

Another tricky part I have to overcome is how to perform context switching on the task list: wherein, the kernel has to save all the states of the registers, including the program counter, and save them somewhere until the thread is allowed to execute again.

[Edit]
I'm now split on whether to design the OS to be multi-tasking, or just multi-processing. A multi-tasking OS can pause a thread if it hasn't completed by a deadline, and then swap the context (all of the CPU registers, essentially) so that it can start processing another thread... whereas a multi-processing OS can't (besides the normal hardware interrupt threads).

This indecision largely lies in the context swapping: for multi-tasking RTOS's, this is pretty much routine... and unfortunately it's also costly for CPUs that have a slow clock (since it takes at least 2 cycles per register to swap their contents). Additionally, it's a bit more complicated than with a multi-process OS, because I'd have to write the actual swapping routine in assembler (neither C nor C++ has high-level commands to manipulate registers).

z64555 · **Reply #7 on:** August 27, 2012, 03:18:22 pm

Well, I've settled on multitasking, because even if I were to do a multiprocessing OS, I'd still have to do a context swap whenever a process was paused/canceled.

So, a bit of news on context swapping:

Variables: I might not actually have to save the status of variables, since for the most part they're stored onto the stack until they're no longer needed. However, I might have to worry about them whenever the stack size is different from when the thread was exited, because as far as I know, the compiler tries to stick in as many constants as it can.

Interrupts:
There's no doubt about it, I have to use interrupts in order to access the program counter position. The good thing is that interrupts on the MCU I'm working with automatically save the statuses of the registers in addition to the program counter, so it's just a matter of using some pointer arithmetic to stash the registers into a structure per thread. I have two ASM commands that I can use to achieve a software driven interrupt: SoftWare Interrupt (SWI) and the unimplemented opcode trap (TRAP).

SWI:
It does as it's marketed: saves the registers and the program counter onto the stack before jumping to the SWI interrupt service routine. An RTI (Return from Interrupt) instruction restores the registers and program counter, thereby returning to wherever it jumped from.

TRAP trapnum:
Exactly the same as SWI, but it is triggered automatically whenever the CPU happens upon an unimplemented opcode. Some CPUs may have an individual ISR per TRAP, but it's more likely they have just one for all of them. In any case, the trap ISR should try to recover the CPU from the errant thread. In simpler programs that don't have an OS, this can mean forcing a software reset or immediately stopping execution (if it is in a debug mode).

For programs that do have an OS, it may be possible to terminate the thread/process that caused it to wander and signal that an error has occurred over communications lines and/or output devices. In a debug mode, it may also be able to pull out the stack contents of that errant thread and shoot them over to the monitoring PC or device.

OK, with those points in mind, I plan on using the SWI command to kick execution out of whatever thread it is in. This will be used for functions that manipulate thread/process behavior such as pause_thread, stop_thread, etc. and will most notably be used during times that a thread needs to wait on hardware for data, e.g.:

Code: [Select]

void pause_thread( void )
{
  // Some other stuff to tell the kernel to only pause this thread, such as a semaphore or flag of some sort
  asm( "swi" );  // inline-assembly syntax is compiler-specific
}

void Thread_Foo( void )
{
  start_atd_conversion();
  while ( !atd_is_done )  // atd_is_done must be volatile, since it changes outside this thread
      pause_thread();
}

As for the unimplemented opcode trap, that's been put on the "To Do" list for now.

[Update]

Ok, here's an example structure of the CPU context: (The order, from top to bottom, is in the same order that the MC9S12 saves the context when an interrupt occurs)

Code: [Select]

#include <stdint.h>

struct Thread_context
{
  uint16_t PC;
  uint16_t Y;
  uint16_t X;
  uint16_t D;     // (B:A), where B and A accumulators are 8bit
  uint8_t  CCR;
};

I don't know at this moment whether I should include the Stack Pointer or not, and I'm also not sure whether this should be in C or in ASM... but I'll think about it for some time.
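
A host-side sketch of the "stash the registers into a structure per thread" step, written in C with the stack pointer included as a separate field. Two assumptions to check against the CPU12 reference manual: that the 9-byte interrupt frame sits in memory, from the lowest address up, as CCR, B, A, X, Y, PC (the reverse of the push order, since the stack grows downward), and that 16-bit values are stored big-endian. On the real chip the frame address would come from SP inside the ISR, which needs a line or two of assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saved context for one thread (field names are illustrative). */
struct thread_context {
    uint16_t pc, y, x;
    uint8_t  a, b, ccr;
    uint16_t sp;   /* the thread's own stack pointer */
};

/* Read a big-endian 16-bit value from two consecutive bytes. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Copy the interrupt frame starting at `frame` into ctx and record
 * the stack pointer value the thread had at that moment.
 * Assumed layout, low to high address: CCR, B, A, X, Y, PC. */
void context_save(struct thread_context *ctx, const uint8_t *frame, uint16_t sp)
{
    assert(ctx != NULL && frame != NULL);
    ctx->ccr = frame[0];
    ctx->b   = frame[1];
    ctx->a   = frame[2];
    ctx->x   = be16(&frame[3]);
    ctx->y   = be16(&frame[5]);
    ctx->pc  = be16(&frame[7]);
    ctx->sp  = sp;
}
```

If every thread keeps its own stack, saving just the stack pointer may be enough, because the frame already lives on that thread's stack; copying the registers out like this matters mostly when threads share one stack.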

Nuke · **Reply #8 on:** August 29, 2012, 11:23:02 am

seems like this kinda thing would make your code very cpu specific. you would have to rewrite some important sections to use for other architectures. of course thats nothing that cant be done with a few #ifdefs.

i take it the os would be compiled and flashed onto the cpu, and then your tasks would then be loaded as separate "files" with some kind of crude filesystem. im not sure if your mcu allows you to load precompiled programs from an external memory device like an sd card. avr can kinda do that with a bootloader. so i would assume you have a bootloader built in which could transfer your applications to local memory for execution. is that kind of what you are going for?

z64555 · **Reply #9 on:** August 29, 2012, 06:57:35 pm

There's just no way to make the OS independent of the CPU architecture, especially if you're making one that does context swapping. This is because most high-level languages don't have a mechanism to access the registers themselves. Most C and C++ compilers provide an asm (or __asm) keyword so you can inline asm commands, but again that's still machine specific.

My RTOS is going to be a single-application, multi-processing, multi-tasking OS. It will be statically linked/compiled along with the application it will be running, and main() will most likely be the boot-up process, ending with something like "run_application()"

I'm not going to mess with making a file system now, that's its own headache.

[Edit 2012-9-1]
Something that came to my attention was the matter of stack management per thread, which was the result of me wondering whether or not I should include the stack pointer along with the registers.

Put simply, the traditional programming model assumes that there's only going to be one thread/process/application that will have access to the stack at any given time, and therefore allows said thread/process/application to modify the stack to their liking.

Multi-application/process/threaded operating systems have to have some method of maintaining the stack, either by enforcing strict memory usage/allocation rules during program development or by providing a service/mechanism to do this automatically.

...Since I'm not a big enough douche to play RAM nazi, I'm going to have to come up with some method of stack management.

So far, I'm looking at two options:

1. Partitions:

The RAM is divided up into partitions, and each partition contains stack space and a space for register context.

Pro's:
- Simple to encode
- Sufficient for programs that are mostly inlined
Con's:
- Wastes lots of memory when programs/threads hardly use their stack
- Programs/threads that are heavily sub-routined will rapidly overflow their stack and cause errors

2. Contiguous:

The register contexts are saved as an array of structures, and do not include a stack space. Therefore, programs have full access to the stack and can use any amount that they wish.

Pro's:
- Programs/threads can be as sub-routinized/spaghetti coded as they like
- Can achieve optimal memory usage efficiency
Con's:
- Wastes memory when programs/threads need to reallocate stack resources beyond what they had when first instantiated
- Not simple to encode
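
The partition option is the easier one to pin down, so here is a minimal sketch of it. The sizes and names are invented (`RAM_POOL_SIZE`, four equal partitions, 9 bytes of register context at the start of each), and it assumes a stack that grows downward from the top of its partition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAM_POOL_SIZE   1024u  /* hypothetical RAM set aside for threads */
#define NUM_PARTITIONS  4u
#define CONTEXT_SIZE    9u     /* room for the saved registers */
#define PARTITION_SIZE  (RAM_POOL_SIZE / NUM_PARTITIONS)
#define STACK_SIZE      (PARTITION_SIZE - CONTEXT_SIZE)

static uint8_t ram_pool[RAM_POOL_SIZE];

/* Start of partition i: the register context is stored here. */
uint8_t *partition_context(size_t i)
{
    assert(i < NUM_PARTITIONS);
    return &ram_pool[i * PARTITION_SIZE];
}

/* Initial stack pointer for partition i: one past the partition's
 * last byte, because the stack grows toward lower addresses. */
uint8_t *partition_stack_top(size_t i)
{
    return partition_context(i) + PARTITION_SIZE;
}

/* Nonzero if sp still lies inside partition i's stack region. */
int partition_sp_ok(size_t i, const uint8_t *sp)
{
    const uint8_t *low = partition_context(i) + CONTEXT_SIZE;
    return sp >= low && sp <= partition_stack_top(i);
}
```

Both listed cons are visible here: every thread gets the same 247 bytes of stack whether it needs 20 or 400, and nothing stops a deep call chain from walking below `low` between checks.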

Research into C's malloc, calloc, realloc, and free functions is needed.

P.S. The first con for the contiguous memory reminded me of an odd requirement of strict ANSI C (C89): all variables used within a block must be declared at the top of that block, before any statements, so an iterator can't be declared in the header of a for loop further down (C99 lifted this restriction). Dumb compilers might try to perform a stack reallocation when they reach these loop variables...

Nuke · **Reply #10 on:** September 01, 2012, 05:23:34 pm

you kinda have to be a ram nazi on some of these microcontrollers. i did that with my seven segment array firmware. i tested it on an atmega328 (on an arduino, but using none of the arduino api aside from the bootloader), but the target chip will be an attiny2313. i got it down in size where the core functionality fits but i have to change firmware to switch the communications interface from serial to i2c and vice versa. 512 bytes of ram doesnt really go very far, and i can only afford one buffer or the other. ill need to come up with some kind of shared buffer if im going to include both interfaces, which means hacking ripping and tearing and merging the libraries. flash space is also an issue, but i can just use the eeprom for some of the static data.

have you established a memory map for your system yet? it helps to draw a line in the sand between the os and the stuff running on it, and then partition the remaining space according to the requirements of your tasks. one idea you could use is to allocate memory in blocks. im not sure of what kind of memory you have but say you make a block of say 64 bytes. this block would actually be part of a structure, and would have a header to identify process assignment and have pointers to associated blocks (blocks could be chained together to form a memory space for each task). so assuming a 24 bit memory address space thats 6 bytes for the pointers and probably a byte for process id and a byte for any flags you might need. so that means a 72 byte block.

every program starts with one block and would allocate memory as needed. if it needs more memory it would ask the os to borrow another block, and if there are blocks available the os would give it another one, otherwise pass an error. in the event of an error your tasks would need to be tolerant of this, possibly halting operations until a block is freed to continue. a program keeps the block until the task frees it. you would likely have your garbage collector check to see how much memory a task is actually using, and if its smaller than the size of all your blocks, it would move things around so it could free a block (and making sure all your pointers are remapped as their data are moved), so that tasks dont whore up blocks that are mostly empty. a large block size would be more efficient. and the number of available blocks would limit the number of concurrent tasks. if memory is an issue you can make blocks smaller, and smaller memory addresses means less header. you can store process id or flags in unused bits of the pointers (k thats seriously being a ram nazi).
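
a sketch of the 72 byte block and the borrow/free calls, under a few assumptions: the two 24 bit link fields hold block indices rather than raw addresses, the pool size and the names (`mem_block`, `pool_borrow`, `PID_FREE`) are made up, and the garbage collector / compaction step is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_DATA  64u
#define NUM_BLOCKS  8u         /* hypothetical pool size */
#define PID_FREE    0xFFu      /* marks an unowned block */
#define LINK_NONE   0xFFFFFFu  /* 24-bit "no link" value */

/* 3 + 3 + 1 + 1 + 64 = 72 bytes; every member is a byte, so no padding is expected. */
struct mem_block {
    uint8_t next[3];           /* 24-bit link to the next block in the chain */
    uint8_t prev[3];           /* 24-bit link to the previous block */
    uint8_t pid;               /* owning process id */
    uint8_t flags;
    uint8_t data[BLOCK_DATA];
};

static struct mem_block pool[NUM_BLOCKS];

/* Store and load a 24-bit value, most significant byte first. */
void put24(uint8_t field[3], uint32_t v)
{
    field[0] = (uint8_t)(v >> 16);
    field[1] = (uint8_t)(v >> 8);
    field[2] = (uint8_t)v;
}

uint32_t get24(const uint8_t field[3])
{
    return ((uint32_t)field[0] << 16) | ((uint32_t)field[1] << 8) | field[2];
}

/* Mark every block as free and unlinked. */
void pool_init(void)
{
    for (uint32_t i = 0; i < NUM_BLOCKS; i++) {
        pool[i].pid = PID_FREE;
        pool[i].flags = 0;
        put24(pool[i].next, LINK_NONE);
        put24(pool[i].prev, LINK_NONE);
    }
}

/* Give `pid` one more block, chained after block `tail` (LINK_NONE for
 * its first block). Returns the block index, or -1 if none are free. */
int32_t pool_borrow(uint8_t pid, uint32_t tail)
{
    assert(pid != PID_FREE);
    assert(tail == LINK_NONE || tail < NUM_BLOCKS);
    for (uint32_t i = 0; i < NUM_BLOCKS; i++) {
        if (pool[i].pid != PID_FREE)
            continue;
        pool[i].pid = pid;
        put24(pool[i].prev, tail);
        put24(pool[i].next, LINK_NONE);
        if (tail != LINK_NONE)
            put24(pool[tail].next, i);
        return (int32_t)i;
    }
    return -1;
}

/* Return every block owned by pid to the pool; returns the count freed. */
unsigned pool_release_all(uint8_t pid)
{
    unsigned freed = 0;
    for (uint32_t i = 0; i < NUM_BLOCKS; i++) {
        if (pool[i].pid == pid) {
            pool[i].pid = PID_FREE;
            freed++;
        }
    }
    return freed;
}
```

the -1 return is the "pass an error" case: the task has to cope with it, for example by waiting until another task frees a block.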

z64555 · **Reply #11 on:** September 05, 2012, 07:11:44 pm

Quote from: Nuke on September 01, 2012, 05:23:34 pm

have you established a memory map for your system yet? it helps to draw a line in the sand between the os and the stuff running on it, and then partition the remaining space according to the requirements of your tasks. one idea you could use is to allocate memory in blocks. im not sure of what kind of memory you have but say you make a block of say 64 bytes. this block would actually be part of a structure, and would have a header to identify process assignment and have pointers to associated blocks (blocks could be chained together to form a memory space for each task). so assuming a 24 bit memory address space thats 6 bytes for the pointers and probably a byte for process id and a byte for any flags you might need. so that means a 72 byte block.

I need at least 9 bytes in reserve for register state saves (2 bytes each for D, X, Y, SP, 1 byte for CCR/control flags), but I also need a little more space for any variables allocated during ISR's.

The MC9S12DG256 has 12K of RAM, but I'm more certain than not that some of it is used up by the debugger/bootloader. In fact, that thing is very likely to get in the way of the OS.

Linked sectors, although novel, aren't possible to do in C/C++, simply because the CPU has no way of determining the sector boundaries outside of the kernel. A segmented/partitioned memory system is still possible, but threads can only use/be assigned sectors that are directly next to each other.

The only way to determine if a thread has gone past or is still in its RAM sector is by doing pointer comparisons. M$'s threading system gets past the issue of stack collision by inserting a buffer sector in between thread stacks. This way, the OS can determine when a thread has overflowed its stack, and can safely reallocate the threads as needed.
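
A sketch of both checks in software, for an MCU with no memory protection hardware: a pointer comparison against the stack bounds, plus a buffer zone below the stack filled with a known pattern so that an overflow can be noticed after the fact. This is a stand-in for the idea, not how any desktop OS does it (those rely on the MMU), and the names and sizes are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_SIZE  8u
#define GUARD_BYTE  0xA5u   /* arbitrary fill pattern */

/* A thread stack with a guard zone below it (the stack grows downward). */
struct guarded_stack {
    uint8_t *guard;   /* lowest address: start of the guard zone */
    uint8_t *base;    /* lowest usable stack address */
    uint8_t *top;     /* one past the highest usable address */
};

/* Carve a guard zone and a stack out of `size` bytes at `mem`. */
void guarded_stack_init(struct guarded_stack *gs, uint8_t *mem, size_t size)
{
    assert(gs != NULL && mem != NULL && size > GUARD_SIZE);
    gs->guard = mem;
    gs->base  = mem + GUARD_SIZE;
    gs->top   = mem + size;
    memset(gs->guard, GUARD_BYTE, GUARD_SIZE);
}

/* Pointer comparison: is sp inside the usable region? */
int sp_in_bounds(const struct guarded_stack *gs, const uint8_t *sp)
{
    assert(gs != NULL);
    return sp >= gs->base && sp <= gs->top;
}

/* Has anything overwritten the guard zone? Returns 1 if it is untouched. */
int guard_intact(const struct guarded_stack *gs)
{
    assert(gs != NULL);
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (gs->guard[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}
```

The kernel could run both checks at every context switch: the pointer comparison catches a thread that is currently out of bounds, while the pattern check catches one that dipped into the guard zone and came back.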

Garbage collection should be done during the page/sector reallocation time, as its operation is essentially identical (copying the contents of one sector to others) and expensive.

Quote

every program starts with one block and would allocate memory as needed. if it needs more memory it would ask the os to borrow another block, and if there are blocks available the os would give it another one, otherwise pass an error.

The only way the program can ask the OS for more memory is via a function such as malloc or new, both of which are unfortunately only loosely specified by the C/C++ standards, with the details left up to the compiler and its runtime library. This is perhaps why coding standards such as MISRA, the JSF AV Rules, and NASA's JPL C rules advise against their use outside thread creation. If you want to make sure that such functions don't **** up the memory system, then you're going to have to make your own malloc implementation. (Which in my case I'm hoping I won't have to come to.)

z64555 · **Reply #12 on:** October 18, 2012, 06:16:51 pm

Pardon the

Progress has been stymied a bit due to life dramas that will be kept offline. (oh boo hoo)

I did find out that there's a possible way of determining the memory requirements per process/task/thread/etc. and that's by making liberal use of the sizeof operator in conjunction with a semaphore in the scope of the parent application/process/task. As discussed before, having a fixed memory allocation scheme for each application/process doesn't use memory efficiently, and heavyweight applications/processes (those that have higher memory needs) run the risk of overflowing with such a scheme.

For a single-application OS, this means that each process contains a semaphore that keeps track of its memory requirements. This semaphore is used in conjunction with a pair of macros that are used within each task. Each task, then, uses these macros to increase value of the semaphore when it is instantiated, and decrease when it is completed.

The kernel then checks this semaphore against the amount of memory it has allotted for that process, and allots more memory if needed. Since processes are often a loop of some sort, the kernel can determine the maximum value of the process's semaphore and more or less lock-in their memory requirements. Obviously processes that have a dynamic memory requirement should have a label of some sort and be given a wider buffer space than others. (The buffer space is there to prevent threads from accidentally spilling out of their allotted space, which is often a very, very bad thing).
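
A rough sketch of the macro pair and the kernel-side check. The "semaphore" here is really just a byte counter with a high-water mark, and all of the names (`process_mem`, `MEM_CLAIM`, `MEM_RELEASE`, `kernel_check_allotment`) are invented for illustration. It is not interrupt-safe as written; on the real target the updates would need to happen with interrupts masked.

```c
#include <assert.h>
#include <stddef.h>

/* Per-process memory accounting (field names are illustrative). */
struct process_mem {
    size_t in_use;     /* bytes currently claimed by live tasks */
    size_t peak;       /* highest value in_use has reached */
    size_t allotted;   /* bytes the kernel has set aside for this process */
};

/* Used at task entry: claim room for one object of the given type. */
#define MEM_CLAIM(proc, type)                      \
    do {                                           \
        (proc)->in_use += sizeof(type);            \
        if ((proc)->in_use > (proc)->peak)         \
            (proc)->peak = (proc)->in_use;         \
    } while (0)

/* Used at task exit: give the room back. */
#define MEM_RELEASE(proc, type)                    \
    do {                                           \
        assert((proc)->in_use >= sizeof(type));    \
        (proc)->in_use -= sizeof(type);            \
    } while (0)

/* Kernel-side check: grow the allotment so it covers the observed peak
 * plus a buffer margin. Returns 1 if the allotment was grown, else 0. */
int kernel_check_allotment(struct process_mem *p, size_t margin)
{
    assert(p != NULL);
    if (p->peak + margin > p->allotted) {
        p->allotted = p->peak + margin;
        return 1;
    }
    return 0;
}
```

Since `peak` only ever rises, a process that loops settles on a stable value after its first full pass, which is the "lock-in" behavior described above.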

[Edit] Now that I'm thinking a bit more about it, the macro that increases the semaphore may be trapped so as to schedule an allotment by the kernel as soon as the task is finished processing within its timeshare. If the semaphore exceeds the memory allotment and is dangerously close to exceeding the memory buffer, then the thread is immediately paused and the kernel performs the allotment. Once done, the thread is resumed.

[Edit2] Yet another possible idea is to play around with a PID scheme, or otherwise a predictive algorithm that mathematically monitors the process/task memory needs and adjusts allotments that way.
