on my various radio control projects i have always had a problem with on-the-fly configuration. the job is mostly about taking data from an input (transmitted data from the radio receiver, sensor data, feedback from motor controllers, imu data, etc.), doing some basic transformation, and then mapping it to an output (servo or motor controller, led, telemetry output, etc.).
my first attempt was pretty much a state machine. it took its configuration from an eeprom, which could be configured over a serial terminal. this worked wonderfully, but at a cost: the system was huge. hardly any flash was left for program code to drive more inputs, outputs and other interfaces. it worked with what i had hooked up at the time, but trying to expand on it was proving fruitless. when i tried adding an imu i just couldn't integrate it because there was no code room left.
so, time for a paradigm shift. the mcu i'm working on has 1k of eeprom. i can't put compiled code there, but i can use it to store code for an interpreted language. so i went about designing an instruction format. what i ended up with is a 4-byte bytecode language. this lets me run a 256-instruction program (1024 bytes / 4). this is not a lot, but it's enough to do some basic mapping and even complex maths. i thought about dropping the instructions down to 3 bytes (using the 2 spare bits in the opcode byte to address registers) and storing results to one of 4 registers. but then i realized you would need additional instructions to move data into and out of registers, making code size bigger, and it would only give me an extra 85 instructions (341 instead of 256) over what i already had. so i decided to stick with the 4-byte layout.
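the arithmetic behind that trade-off, as a quick sketch. the constant and function names here are made up for illustration; only the 1k eeprom size and the two instruction widths come from the text above.

```c
#include <stdint.h>

/* Names are illustrative; only the sizes come from the post. */
enum {
    EEPROM_BYTES     = 1024,                              /* 1k of eeprom */
    MAX_INSTR_4BYTE  = EEPROM_BYTES / 4,                  /* 256 */
    MAX_INSTR_3BYTE  = EEPROM_BYTES / 3,                  /* 341, one byte left over */
    EXTRA_WITH_3BYTE = MAX_INSTR_3BYTE - MAX_INSTR_4BYTE  /* 85 */
};

/* Byte offset in eeprom of instruction number pc (4-byte layout). */
static uint16_t instr_offset(uint16_t pc)
{
    return (uint16_t)(pc * 4u);
}
```

with the 4-byte layout the offset is just a shift by two, which is one more small point in its favour on an 8-bit mcu.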
| opcode | param1 | param2 | param3 |
i figure i need at least 64 instructions, so opcodes can be represented with 6 bits. but for my initial run i'm just using a whole byte, wasting the other 2 bits. parameters can be data or "pointers" to 16-bit memory locations. all memory is in a 16-bit integer format. the pointers need to be at least 7 bits, which addresses 64 16-bit words of ram + 64 i/o locations. as with the opcodes, there is a spare bit that i don't know what to do with. i will probably reserve it for future expansion to 128 words (this would use up 256 bytes, an 8th of the 2k of ram on the mcu) and 128 i/o locations (i only need 56 right now, but i may have overlooked something).
the interpreter has no stack and no cpu registers. each parameter can contain either a pointer or a constant. to make decoding a little bit easier i have created a union that lets me look at the 3 parameter bytes in a dozen or so formats. most of them look like this:
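a minimal sketch of how a 7-bit pointer could be resolved. the text above doesn't say which bit separates ram from i/o, so this assumes bit 6 selects the i/o bank and the low 6 bits are the index. `vm_ram`, `vm_io`, `vm_resolve`, `vm_read` and `vm_write` are hypothetical names, and the i/o bank is just an array standing in for real hardware access.

```c
#include <stdint.h>

#define VM_WORDS   64u     /* 64 16-bit words of ram, 64 i/o locations */
#define PTR_IO_BIT 0x40u   /* assumption: bit 6 selects the i/o bank */
#define PTR_INDEX  0x3Fu   /* low 6 bits index into the selected bank */
/* bit 7 is the spare bit and is ignored here */

static int16_t vm_ram[VM_WORDS];
static int16_t vm_io[VM_WORDS];   /* stand-in for real i/o handlers */

/* Map a 7-bit interpreter pointer to the word it refers to. */
static int16_t *vm_resolve(uint8_t ptr)
{
    uint8_t index = ptr & PTR_INDEX;
    return (ptr & PTR_IO_BIT) ? &vm_io[index] : &vm_ram[index];
}

static int16_t vm_read(uint8_t ptr)
{
    return *vm_resolve(ptr);
}

static void vm_write(uint8_t ptr, int16_t value)
{
    *vm_resolve(ptr) = value;
}
```

growing to 128 words per bank later would only mean widening the index mask and moving the bank bit up to bit 7.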
| op | destination pointer | source pointer a | source pointer b |
this takes both sources, does an operation such as addition, and stores the result to the destination. i also have other formats that look like this:
| op | destination pointer | data msb | data lsb |
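which puts a 16-bit constant into memory. the union looks something like this:

    #include <stdint.h>

    struct instruction {
        uint8_t opcode;
        union {
            uint8_t raw[3];   /* the 3 parameter bytes as stored */
            struct { interpPtr dptr; interpPtr sptA; interpPtr sptB; } ptr3;   /* dest, source a, source b */
            struct { interpPtr dptr; int16_t data; } dptr_data;   /* dest, 16-bit constant in the mcu's native byte order */
            ...
        };
    };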
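one thing to watch with the union: overlaying an `int16_t` on the parameter bytes relies on the compiler not padding the struct and on the mcu's byte order matching the msb/lsb layout shown above. as a hedged alternative, here is a sketch that decodes the same two formats by hand. it assumes `interpPtr` is a plain `uint8_t` (the typedef isn't shown above), and `raw_instruction`, `decode_ptr3` and `decode_dptr_data` are names invented for this example.

```c
#include <stdint.h>

typedef uint8_t interpPtr;   /* assumption: the post does not show this typedef */

/* One raw 4-byte instruction as stored in eeprom. */
struct raw_instruction {
    uint8_t opcode;
    uint8_t raw[3];
};

/* | op | destination pointer | source pointer a | source pointer b | */
struct ptr3 {
    interpPtr dptr, sptA, sptB;
};

/* | op | destination pointer | data msb | data lsb | */
struct dptr_data {
    interpPtr dptr;
    int16_t data;
};

static struct ptr3 decode_ptr3(const struct raw_instruction *in)
{
    struct ptr3 p = { in->raw[0], in->raw[1], in->raw[2] };
    return p;
}

static struct dptr_data decode_dptr_data(const struct raw_instruction *in)
{
    struct dptr_data d;
    d.dptr = in->raw[0];
    /* msb first, independent of the mcu's native byte order */
    d.data = (int16_t)(((uint16_t)in->raw[1] << 8) | in->raw[2]);
    return d;
}
```

the shifts cost a few cycles more than the union, so whether this is worth it depends on whether the bytecode ever needs to be written by a tool on a different machine.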
the instruction interpreter is just a big switch statement, with a define for each instruction and one line of code to handle it. i have about 40 instructions so far. this is kind of a large switch, so i might break it up into ranges: find the range first and then the instruction. i will have to experiment with this later to see which is faster. most instructions increment the fake program counter (jumps being the exception), which lets the next instruction be fetched from the eeprom (another performance limiter that i might need to devise a solution for).
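roughly what that switch looks like, as a cut-down sketch. the opcode names and values are hypothetical (the real list of about 40 isn't given here), ram and i/o are flattened into a single `mem` array to keep it short, and `vm_step` is an invented name for "execute one instruction and return the next program counter".

```c
#include <stdint.h>

/* Hypothetical opcodes; the post does not list the real ones. */
#define OP_HALT 0x00
#define OP_LOAD 0x01   /* mem[p1] = (p2 << 8) | p3 */
#define OP_ADD  0x02   /* mem[p1] = mem[p2] + mem[p3] */
#define OP_SUB  0x03   /* mem[p1] = mem[p2] - mem[p3] */
#define OP_JMP  0x04   /* pc = p1 */

#define PROG_END 256u  /* pc value that stops the run loop */

static int16_t mem[128];          /* ram and i/o flattened for brevity */
#define M(p) mem[(p) & 0x7Fu]     /* only the low 7 pointer bits are used */

/* Execute one instruction and return the next program counter. */
static uint16_t vm_step(uint16_t pc, const uint8_t ins[4])
{
    switch (ins[0]) {
    case OP_LOAD:
        M(ins[1]) = (int16_t)(((uint16_t)ins[2] << 8) | ins[3]);
        break;
    case OP_ADD:   /* unsigned maths so overflow wraps instead of being undefined */
        M(ins[1]) = (int16_t)((uint16_t)M(ins[2]) + (uint16_t)M(ins[3]));
        break;
    case OP_SUB:
        M(ins[1]) = (int16_t)((uint16_t)M(ins[2]) - (uint16_t)M(ins[3]));
        break;
    case OP_JMP:
        return ins[1];             /* jumps set the pc directly */
    case OP_HALT:
    default:
        return PROG_END;           /* halt, or unknown opcode */
    }
    return (uint16_t)(pc + 1u);    /* everything else just advances */
}
```

treating an unknown opcode as a halt is a choice made for this sketch; it stops a blank or corrupted eeprom from running garbage.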
the cpu loop is just a while statement that runs so long as the program counter is less than 256, which is the maximum number of instructions that can be stored on the eeprom. the program counter is set to zero before the loop begins. some instructions can terminate the loop to end the program.
anyway, the test compile revealed that the whole interpreter comes in at about 4k, which is much smaller than the 20k+ state machine that i was using before. this leaves more space for code to do other things (like drive more i2c devices and other interfaces).
i will let you know when i have written some bytecode programs and seen how they run. following that i might try to come up with a compiler to make the whole thing easier to work with.
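a sketch of that loop, under a few assumptions: the eeprom is faked with a ram array (`fake_eeprom`), the step function knows only two made-up opcodes, and "terminate" is done by pushing the program counter out of range. note the program counter has to be wider than 8 bits, otherwise the `< 256` test could never fail.

```c
#include <stdint.h>

#define MAX_INSTR 256u   /* 1k eeprom / 4 bytes per instruction */

/* Stand-in for the real eeprom; on the mcu this would be an eeprom read. */
static uint8_t fake_eeprom[MAX_INSTR * 4u];

static void fetch(uint16_t pc, uint8_t ins[4])
{
    for (uint8_t i = 0; i < 4; i++)
        ins[i] = fake_eeprom[pc * 4u + i];
}

/* Hypothetical two-opcode step, just enough to drive the loop. */
#define OP_HALT 0x00
#define OP_INC  0x01     /* counter += p1 */
static int16_t counter;

static uint16_t step(uint16_t pc, const uint8_t ins[4])
{
    if (ins[0] == OP_INC) {
        counter += ins[1];
        return (uint16_t)(pc + 1u);
    }
    return MAX_INSTR;    /* halt: push the pc out of range */
}

/* Run one pass of the program; returns how many instructions executed. */
static uint16_t run_program(void)
{
    uint16_t pc = 0;     /* 16 bits, so it can actually reach 256 */
    uint16_t executed = 0;
    uint8_t ins[4];

    while (pc < MAX_INSTR) {
        fetch(pc, ins);
        pc = step(pc, ins);
        executed++;
    }
    return executed;
}
```

a program that never halts simply runs off the end of the eeprom after instruction 255, which ends the pass the same way.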