Author Topic: designing a bytecode interpteter (Read 8227 times)

Nuke · **on:** July 23, 2013, 09:31:28 pm

on my various radio control projects i have always had a problem with on the fly configuration. the job is mostly about taking data from an input (transmitted data from the radio receiver, sensor data, feedback from motor controllers, imu data, etc), doing some basic transformation, and then mapping it to an output (servo or motor controller, led, telemetry output, etc).

my first attempt was pretty much a state machine, it took its configuration from an eeprom, which could be configured over a serial terminal. this worked wonderfully but at a cost, the system was huge. hardly any flash was available for program code to drive more inputs and outputs and other interfaces. it worked with what i had hooked up at the time, but trying to expand on it was proving fruitless. when i tried adding an imu i just couldn't integrate it because there was no code room left.

so time for a paradigm shift. the mcu im working on has 1k of eeprom. i cant put compiled code there, but i can use that to store code for an interpreted language. so i went about designing an instruction format. what i ended up with is a 4-byte bytecode language. this lets me run a 256 instruction program. this is not a lot but its enough to do some basic mapping and even complex maths. i thought about dropping the instructions down to 3 bytes (using the extra 2 bits of the instruction to address registers) storing results to one of 4 registers. but then i realized you would need additional instructions to move data into and out of registers, making code size bigger, and it would only give me an extra 85 instructions over what i already had. so i decided to stick with the 4-byte layout.

| opcode | param1 | param2 | param3 |

i figure i need at least 64 instructions so opcodes can be represented with 6 bits. but for my initial run im just using a byte, wasting the 2 bits. parameters can be data or "pointers" to 16-bit memory locations. all memory is in a 16-bit integer format. the pointers need to be at least 7 bits. this addresses 64 16-bit words of ram + 64 i/o locations. like with the instructions there is an extra bit that i dont know what to do with. i will probibly reserve it for future expansion to 128 words (this would use up about an 8th of the 2k of ram on the mcu) and 128 i/o locations (i only need 56 right now, but i may have overlooked something).

the interpreter has no stack, and no cpu registers. the parameters can either contain a pointer, or constant. to make decoding a little bit easier i have created union that lets me look up each of the 3 parameters in a dozen or so formats, most of them look like this

| op | destination pointer | source pointer a | source pointer b |

this takes both sources, does an operation, like addition, and stores the value to the destination. though i have other formats that like this:

| op | destination pointer | data msb | data lsb |

which puts a 16 bit constant into memory,it looks something like this:

Code: [Select]

struct instruction{
 uint8_t opcode;
 union{
  uint8_t raw[3];
  struct{ interpPtr dptr; interpPtr sptA; interpPtr sptB; }ptr3;
  struct{ interpPtr dptr; int16_t data; }dptr_data;
 ...
 };
};

the instruction interpreter is just a big switch statement, with each instruction define and one line of code to handle that instruction. i have about 40 instructions so far. this is kind of a large switch so i might break it up into ranges find the range and then the instruction. i will have to experiment with this later to see what is faster. most instructions increment the fake program counter (with the exception of jumps), this allows the next instruction to be fetched from the eeprom (another performance limiter which i might need to devise a solution).

the cpu loop is just a while statement that will run so long as the program counter is less than 256, which is the maximum number of instructions that can be stored on the eeprom. at the start of the loop the program counter is set to zero and the loop begins. some instructions can terminate the program loop to end the program.

anyway the test compile revealed that the whole interpreter comes into about 4k, which is much smaller than the 20k+ state machine that i was using before. this leaves more space for code to do other functions (like drive more i2c devices and other interfaces).

will let you know when i write some bytecode programs and see how they run. following that i might try to come up with a compiler to make the whole thing easier to work with.

Nuke · **Reply #1 on:** July 24, 2013, 03:59:35 am

playing around with the interpreter i decided to expand my pointers to have 7 byte addresses, with the 8th bit slicing it between memory and system+i/o. this now gives me 128 words to work with. i also decided to reserve the upper 64 bytes of the eeprom for nonvolatile storage. this still gives me a 240 instruction program. i had intended to just store data in constants in the instructions, but that wastes space. so what i did was map it to the top 32 bits of the i/o address space, using the same 16-bit word format as the memory area. this gives me the following memory map:

Code: [Select]

-----0x00-----
|--reserved--| - 16 words/32 bytess
-----0x10-----
|-----i/o----| - 80 words/160 bytes
-----0x60-----
|---eeprom---| - 32 words/64 bytes
-----0x80-----
|----ram-----| - 128 words/256 bytes
-----0xFF-----

the reserved section represents the null pointer, and a number of system memory locations for status and control.

i decided that it might be possible that i would need more i/o than 64 words, so i added another 16 words on top of it, should keep my board happy.

the program may now read/write to the top 64 bytes of the eeprom. if the program needs to store any config data. the eeprom is kind of slow to write to, so this will be used sparingly.

extra ram is good too. i also went a head and set up defines so i can remap the memory map at will, such as if i move to a more spacious mcu.

next job is to add some fixed point instuctions. i used a lot of fixed point maths in my state machine. i kinda want to simplify things and stick to a single format, like s10.5 or s9.6. though s7.8 would probibly be the easiest to implement. fixed point is neccisary to get smooth scaling with small values. id like some math functions like trig stuff and sqrt, but those would require allocating a chunk of flash for the neccisary lookup tables. its doable but expensive (probibly adding another k or two to the size of the interpreter).

e

finished most of the fixed point instructions. i decided to make it a variable format system with conversion. a nibble of the cpu status flags now controls what format the fixed point instructions use. and most of the fixed point instructions will use that. there are a couple special instructions, one sets the format to any thing between 15:0 (same as int) to 0:15 (-almost 1 to almost 1, though its not possible to get to 1thanks to the absent of a ones place).

the other special instruction can convert any fixed point format to any other fixed point format. this means if you need to change formats, you can convert your numbers over. some of the instructions can operate on int and result in fixed point numbers of the current format. though it just occured to me i dont have an easy way to convert them back to int. but i got some space for a couple more instructions in there.

many of the fixed point operations will loose data if you get too close to the edges. especially things like multiplies and divides. i cast the operations as int32 and convert it back to int16 after they are done, to prevent myself from left shifting off any bits during the operation. since most of the operations shift it back anyway this should prevent some data loss, especially on the divides. i dont count the sign bit and consider another bit no mans land to prevent overflows, but this doesnt work for multiplies.

the whole thing is adequate because most of the data is usually 10 or 12 bit anyway. servo positions are 10 bit, motor controllers are usually 8 bit speed + 2 bit direction, and input from remote joysticks is also 10 bit. the imu might give me some problems, the gyro is 16 bit, and the magnetometer is 12 bit, and the pressure sensor is the worst at 19 bit pressure, 16 bit temperature, but i might just compute altitude in machine code and then pipe the altitude value to the i/o. the data to and from the radio will also operate in 16 bit chuncks.

thing i like about the radio is i can use the same i/o range for both transmit and receive. writes would be applied to the send buffer, and the reads will come from the receive buffer, which gives me more ios for other functions. i guess the next job will be to work on my radio protocol, i need to move a structure that is about 2x the maximum frame size, and i also need to have a code transfer mode. to write data to the eeprom. before i do that im going to actually verify that my interpreter can interpret code.

Nuke · **Reply #2 on:** July 25, 2013, 10:51:35 pm

now that i got the language i need a way to convert textual codes to byte codes. sorta like an assembler. but i decided i didnt just want an asmalike, i want to include some high level stuff like operators and some basic control structures (loops, conditionals, functions), and variable handling with qualification. i decided to stick with what i know and write the compiler in lua. its string manipulation facilities will no doubt come in handy. i tried working on the variable parsing code last night but my brain wasnt working. but i think i hammered out the jist of the way it will work.

you can use the var or romvar statement to create a named reference to a variable (in ram/rom respectively), which may or may not be initialized. this statement does a couple things, first thing it does is create a pointer in the appropriate memory range. i just have to count up addresses as vars are created and count down as they are destroyed (such as at the end of a control structure). if you initialize the variable it creates a ASN instruction with the data in place of the statement. initialized or not it can only be used in any instruction parameter that call for a pointer.

there will also be a con statement, while using the same format as var, will be required to be initialized. the usage of constants will copy into instructions by value and not by reference. so it behaves more like a #define statement as far as the compiler is concerned. both cons and vars are type qualified by the compiler somewhat, though the number of types supported by the language is going to be very small. the variable (or constant) is stored by the compiler in a table which stores the variable's name, its pointer address, its datatype, its value, whether or not it is initialized. its also stored in the table by named keys so its easy to look up. this should make it easier to assemble instructions.

need to code this up tonight and figure out how to qualify my language's data types in lua. im probibly going to need to move up lua 5.2 to gain access to bitwise operatons, or use a library, and i will probibly also need the pack library so i can import/export binary data. lua for windows should have all this stuff. once if finish writing the variable handling code (the hard part is figuring out how to convert variable format fixed point numbers to hex in lua), its just a matter of some basic parsing and i should be able to assemble basic programs. once that works and is tested i can start adding more features to the programming language.

Phantom Hoover · **Reply #3 on:** July 27, 2013, 12:25:36 pm

In bytecode aren't variables basically just glorified pointers anyway?

Nuke · **Reply #4 on:** July 27, 2013, 01:32:57 pm

in mine, yea pretty much. it really depends on the opcode though. each opcode can take up to 3 parameters:

command..parameter1..parameter2..parameter3 (where the command and each parameter is one byte).

most of the byte codes operate on 3 pointers. in the format p1 <- p2 (some operator) p3, though there are some that take a pointer and a constant, and others that take only constants. consider:

a=b+5

it doesnt make much sense to allocate memory for the 5. you would need another opcode just to load the 5 into memory. it is easier just to encode the data into a parameter (2 in case of a 16-bit constant) in the format:

command, pointer, datamsb, datalsb

this is also an inline operation since data is retrieved from a pointer, operated on, and then returned to the same pointer, so it looks like:

a = a (operator) b, where a is pointed to by p1 and b is the constant encoded into the bytecode. this is so i can operate on constants without having an extra instruction to load them into memory. oddly enough this is also the format used by the opcode to load data into memory. so it helps keep code size smaller. machine code for actual cpus take a few instructions to load data into memory, you usually load a constant into a register and them move the contents of the register to ram, and thats usually 2 or 3 (or more) instructions depending on architecture, though the other side of it is that machine instructions are more tightly encoded. my bytecodes are twice the width of the avr instructions, so its not as compact.

most of the commands use all the parameters, but there are a couple that dont take data at all. i considered doing variable length codes, but that would make the code size for the interpreter noticeably bigger, since i would need more complex decode logic. i want the whole thing to come in under less than 6k with the lookup tables for the trig and sqrt functions. its 3k now.

Phantom Hoover · **Reply #5 on:** July 27, 2013, 02:17:50 pm

Ooh, please tell me you're using the two bits left over in the opcode byte to encode the type of the operands.

Nuke · **Reply #6 on:** July 27, 2013, 09:00:04 pm

im not sure how many opcodes i will end up using (last i counted them it was like 40), it looks like it might be less than 64, but it might not be. but i dont see a the reason of encoding the types in the extra bits. every instruction takes one set of types for its 3 parameters. if i want the same instruction to use different types, i just make a new opcode. when the interpreter looks at an opcode, it just goes through a switch statement (though it would probibly be faster to use function pointers or something), and finds the one line of code that executes the instruction. i am currently using the spare bits to spread out the instructions into different conditional branches to speed things up. this is not optimal but it works for now. the "compiler" im working on will make sure you are putting the right data into the right bytecode and throw an error otherwise.

im also not yet precluding the possibility of reducing the size of the bytecodes. getting them down to 2 or 3 bytes would mean more code can fit in the eeprom. if i do this then the 2 bits will index operation registers. this lets me operate on two 16-bit types, or 4 8-bit types (including pointers), with just the opcode. while this mean less space taken up in rom, this does mean it would take more operations to move data into registers and into and out of memory. with a 2 byte code it now takes 2 operations to load 16 bits into memory. this has the potential of making the person writing the compiler very angry and should be avoided. but it might be faster. 3 byte code may be a happy medium. 2 parameters still lets me do inline operations, such as a=a+b instead of a=b+c. to load 16 bit data into memory i just need to load a constant into a register, and then move it into memory, or i can operate on it in place and forget about moving it anywhere else.

Polpolion · **Reply #7 on:** July 27, 2013, 09:39:26 pm

Quote from: Nuke on July 27, 2013, 09:00:04 pm

when the interpreter looks at an opcode, it just goes through a switch statement (though it would probibly be faster to use function pointers or something), and finds the one line of code that executes the instruction.

A good compiler (from what I've been told) can turn a switch statement into a binary search but using the opcode as an index for an array of function pointers should easily yield better CPI, provided you're doing what I think you're doing. I've always been yelled at for doing stuff like that but since this is embedded stuff you have more leeway with expressiveness.

Nuke · **Reply #8 on:** July 28, 2013, 12:51:14 am

its my first programming language. its not going to be very good.

while im pretty confident that the interpreter will work (compaired to the compiler its tiny), im not going to try running anything on it till i get the compiler to a somewhat usable state (it can assemble lines of text into opcodes and map memory to address space without error, things like control structures will come later). when i do it will just take setting up a timer in machine code to fire off an interrupt after a certain period of time, at which point the interpreter will be suspended and the number of cycles counted so i can compute an instructions/sec. i can then use that as the basis for testing of all the possible variations in the bytecode format, the interpreter itself and even he compilation process. im putting the science back into computer science.

EvidenceOfFault · **Reply #9 on:** July 28, 2013, 02:57:21 am

Quote from: Polpolion on July 27, 2013, 09:39:26 pm

Quote from: Nuke on July 27, 2013, 09:00:04 pm
when the interpreter looks at an opcode, it just goes through a switch statement (though it would probibly be faster to use function pointers or something), and finds the one line of code that executes the instruction.

A good compiler (from what I've been told) can turn a switch statement into a binary search but using the opcode as an index for an array of function pointers should easily yield better CPI, provided you're doing what I think you're doing. I've always been yelled at for doing stuff like that but since this is embedded stuff you have more leeway with expressiveness.

You mean a jump table? I assume the bytecode interpreter will be written in C(++?) and copied over to the device as a binary. A good C compiler will implement a larger switch statement as a jump table automatically, especially if the jump condition codes are about equidistant, and the code in each switched to block is about the same size.

Nuke · **Reply #10 on:** July 28, 2013, 09:18:11 am

its in c++, but i might optimize it in asm once i start testing it.

im kinda using the arduino environment for testing (which i think uses avr-gcc), but i will probibly use avrstudio in the final version. i will probibly just build the interpreter as a library that i can include in a project as needed (arduino or otherwise).

for those interested this is what the interpreter will run on:

hardware wise this is everything you need to control a self stabilizing (but not autonomous) r/c quad copter. radio is a nrf24l01+ module. its not very long range, but they sell a 1000m version with the same pinout at $20 a pop. this imu is a cheap chinese module, with a 3 axis accelerometer (ADXL345), a 3-axis gyro (L3G4200D), a 3 axis magnetometer (HMC5883L), and a pressure/temperature sensor (BMP085).

the board is just the prototype. its build on one of those radio shack stripboards. chip is an atmega 328p @ 16 mhz. its got a reset circuit, 3.3v regulator for the radio, headers for the radio, the imu, icsp programming header, and 4 servo headers. also has a couple jumpers to select master or slave operation on the i2c bus. thats pretty much it

will probibly etch a custom board later on, i figure it will have its own switch mode supply (i want it to 7-12 volts, down to 5, and then have a better 3.3v ldo supply for the radio module, which will be a higher power version of the one im using now), mcu will be in a surface mount package, with a faster clock, and more of the interfaces will be broken out. i was also looking at i2c eeproms and i can get a 256 kbit for a couple bucks, which would hold about 8192 bytecodes.

im kinda working on the gui for the compiler. its going to be fairly basic with an open button a compile button and an output log. the layout is kinda done i just need to write some file loaders. then i can test the compiler to see if its outputting properly formatted bytecodes.

EvidenceOfFault · **Reply #11 on:** July 30, 2013, 10:17:35 am

I've got a question about this project: Do you need to create a bytecode interpreter at all? Can't you simply stream the native machine instructions of the chip? After all, a vm/bytecode is usually used for platform independence (not necessary here), or security (like the ~~xbox one~~ original xbox's xcodes). You'd probably need a buffer on the device, and a loader to jump to it, but it seems easier than an interpreter.
Just an idea...

Nuke · **Reply #12 on:** July 30, 2013, 02:35:42 pm

the architecture is kinda weird. you cant run code from anywhere but flash. you can however build a bootloader and flash the chip on the fly. the bootloader would take a few k (especially if it needed the radio library in that part of the code), and then the code would also take up flash, precious flash which i need for all the possible input and output driver code, and lookup tables for math libraries. the bytecode interpreter lets me run code from other sources, such as the internal eeprom, but also an external one, and even an sd card (thought thats a little overkill for this project), directly, without rebooting the chip. it also uses up resources on the chip which would otherwise go unused.

i figure 90% of the bytecode will just be CPYs, moving data from the radio packet into one of the output channels, or one of the sensor channels going to the output channel for telemetry. but i could just as easily load up a pid controller if i need to. the interpreter itself isn't very complex. the build tools are really the hard part. they are almost working. one or two more functions and its mostly usable. i just need to play less ksp and do more code.

EvidenceOfFault · **Reply #13 on:** July 31, 2013, 10:50:48 am

Um. My brain just crashed on ASSERT sane_architecture.
Just to be safe, that system does have RAM, right? Cause you've only talked about EEPROM and FLASH so far, and now I'm kinda worried... After all, for networking you'd probably use something like a ring buffer?

I can't really imagine C++ being much more useable than C for that kind of memory model. Do you even have a C library in there? A heap?

Well, either you're a glutton for punishment, or that device is too awesome for words. I can't decide which.

Polpolion · **Reply #14 on:** July 31, 2013, 11:32:49 am

well if he's working in the arduino environment he probably has an AVR of some sort, which does indeed have the better part of the C library: http://www.nongnu.org/avr-libc/

C++ standard libraries are unsupported but since you're that strained on memory I can't see them being particularly useful anyway. (not to mention the nature of the task...)

Nuke · **Reply #15 on:** July 31, 2013, 06:26:37 pm

it has 32k flash, 2k sram, and 1k eeprom. this is actually the same chip used on a lot of the older arduinos.

old configuration scheme used up 20k flash, used up at least 1k of sram, and use 256 bytes of eeprom to save the configuration. it also required the serial library which is a load of i could do without

interpreter will use 6k of flash, < 300 bytes of sram, and all of the eeprom. this frees up code space for libraries to drive all the i2c, spi, and gpio modules i want to use.

at some point i will move over to arm mcus, but for know i use what i got. and what i got is mega328s, tiny2313s, tiny84s and tiny85s (certainly not using a tiny) and one mega1284 which is loaded with coll stuff but is physically too big for this application. i could buy new silicon but that would cut into my beer money.

i am a wrapping the interpreter into a class, but thats about it as far as oop goes. most of the other libraries are in the form of classes too. but the program code is mostly:

read radio in packet(s)
sample ALL the sensors
run the interpreter
write to ALL the output devices
write to the radio out packet and transmit

Tomo · **Reply #16 on:** August 01, 2013, 03:58:24 pm

The chip has a true Harvard architecture, it's perfectly sensible for a microcontroller-scale device.
- Actually, pretty much everything that size is Harvard.

Eg ARM Cortex M*, AVR, PIC etc.
- Only information physically stored in Program Memory can be executed. Everything else is just data.

That's why Nuke's looking at bytecode - the AVR simply cannot execute anything not stored in its internal Flash, but a bytecode interpreter could load its bytecode from anywhere.

The larger ARMs and x86 are a Modified Harvard (crossed with Von Neumann), where external Program Memory and Data Memory are the same thing but program and data caches are different.

That said - I'd suggest changing MCU to an ARM. There are a lot of ARM Cortex M*s that are cheap ($2-5) and have really huge onboard Flash and SRAM.

Nuke · **Reply #17 on:** August 04, 2013, 08:32:25 pm

im thinking about getting an arm dev board. theres the arduino due (though its a tad big for what i want to do) and about 50 other arm dev boards out there. there are a few dip packaged arm chips that would be useful to me. so i will go there eventually. having a 32 bit processor makes things easier for me. i can increase the resolution of my fixed point maths somewhat. which will be neccisary for the intense physics maths i will need to use for anything fancy. but for now i will stick to the avrs i have on hand.

Tomo · **Reply #18 on:** August 05, 2013, 01:50:10 pm

I've used the MBed range for a few prototypes, it's definitely the easiest dev kit to get into that I've ever seen!
(30sec from unbox to 'blinking lights')

The LPC11U24 is a rather short on Flash in my opinion (more than half of it vanishes into the mbed libraries), but the LPC1768 is really quite nice.

I've not used the new FRDM-KL25Z, but if it'd been available when I bought mine last year I'd have leapt at it!
- 48MHz core, 128KB FLASH, 16KB RAM. Onboard 3-axis accelerometer... For £10.

I paid more than that for my accelerometer/gyro board alone!

[Edit] Blimey, it's even Arduino-shield compatible. Hadn't realised that! Freescale, you beauties!

Mika · **Reply #19 on:** September 10, 2013, 02:05:34 pm

Holy cow!

How did I miss this thread?

Anyways - I suppose it will be easier to do this in C language, but my original thought would have been to stick with Assembly and the provided instruction set and ditch the bytecode interpreter.

Since it's over a month, how has your interpreter progressed?

News:

Author Topic: designing a bytecode interpteter (Read 8227 times)

designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter

Re: designing a bytecode interpteter