Actions

Difference between revisions of "Design of sxgo"

From Just in Time

Line 118: Line 118:
 
However, since I explicitly do not want to write the decoder by hand and case statements like the above one are hard to produce with template metaprogramming, I've chosen a slightly less efficient, but otherwise very similar approach: a binary decision tree. The implementation resembles the one above. Instead of creating nested case-statements, nested if-statements will test one bit at a time and as soon as a decision can be reached, the right member function will be called.
 
However, since I explicitly do not want to write the decoder by hand and case statements like the above one are hard to produce with template metaprogramming, I've chosen a slightly less efficient, but otherwise very similar approach: a binary decision tree. The implementation resembles the one above. Instead of creating nested case-statements, nested if-statements will test one bit at a time and as soon as a decision can be reached, the right member function will be called.
  
For example, suppose we have a simple microcontroller that uses three-bit instruction words and that nows only three instructions:
+
For example, suppose we have a simple microcontroller that uses three-bit instruction words and that knows only three instructions:
 
{| border="1"
 
{| border="1"
 
|-
 
|-

Revision as of 22:08, 31 October 2010

This page describes the main design ideas behind sxgo. The implementation of this microcontroller emulator revolves around a few main ideas:

  • a meta-programmed list of instructions
  • using meta programming to create a binary decision tree of those instructions for use in an instruction decoder.
  • precompilation of instructions into 'function pointers'.

I presume that the reader is familiar with the C++ programming language. Note that sxgo makes use of Template metaprogramming wherever suitable. Readers that are not familiar with MP may want to look through some of the examples in the wikipedia link...

Instruction Implementation

implementation of the SX emulator

The idea is to have one class that consists of an SX 'state' (ram, rom, special registers, program counter) and that implements the instruction set. See the snippet (of file sx_controller.hpp)below:

<source lang="cpp">

struct clr_fr { /*...*/ }; struct clr_w { /*...*/ }; struct mov_w_not_fr { /*...*/ }; // etc...

struct sx_controller_impl {

   // snip...
   // note that this is simplified source code, this does not set any 
   // flags yet.
   void execute( const clr_fr &, int arg_register)
   {
   	ram( arg_register) = 0;
   }
   void execute( const clr_w &)
   {
   	w = 0;
   }
   void execute( const mov_w_not_fr &, int arg_register)
   {
   	w = ~ram( arg_register);
   }
   // etc...

}; </source>

As you can see, most instructions become very simple member functions and their implementation is quite readable. You may notice something odd: instead of naming member functions clr_frw or clr_w, they're all called execute and take some--otherwise ignored--first argument of some type with names like clr_frw. The reason for this is best explained after some explanation of the implementation of the instruction set and the instruction decoder. It has all to do with the fact that the instruction set is defined independently of it's implementaton and in fact there is more than one implementation of the instructions.

Alternative instruction implementations

Once we've decided the interface for SX instruction implementations, it becomes relatively easy to make different implementations. Sxgo defines several implementations, such as one implementation that does nothing more than return a string representation of the instructions (a disassembler), or one that 'compiles' instructions into a representation that will execute faster (a precompiler). All of these implementations can be fed through a single instruction decoder implementation.

Instruction set

Defining the SX instruction set boils down to two task: describing each SX instruction, specifically their bit patterns, and secondly compiling those instruction descriptions into a list of all SX instructions.

Instruction definitions

As could be seen in the previous Section, the implementation of the sx instruction set was all in terms of an execute method that is overloaded for several types, each type representing one instruction. Defining the instruction set becomes therefore a matter of defining a set of types, one for each instruction. These are the instruction tag types and the execute method is similar to a technique called tag dispatching.

In order to facilitate instruction decoding, the instruction types will be enhanced with information about their bit-patterns. This is in contrast with normal tag dispatching, where the tag types are completely empty types. The instruction list for sxgo can be found in sx_instruction_list.hpp and has the following form: <source lang="cpp"> // ... // define some operand (argument) types. struct register_ : masked_argument< 000011111> {}; struct bit_ : masked_argument< 011100000> {};

// ... // define some instruction word patterns struct clr_w : word< 000001000000> {}; struct add_w_fr : word< 0001110, register_> {}; struct sb_fr_bit : word< 0111 , register_, bit_> {};

// ... </source>

The code above encodes the following information:

operands (masked_argument<...>)
The code shows two operand types (there are more). In an SX microcontroller, operands are always encoded in the same position, registers are encoded in the least significant 5 bits of the instruction word and bit-positions (0-7) are encoded in bits 5, 6 and 7 of an instruction word.
word bit patterns (word<...>)
As can be seen above, the clr w instruction takes no arguments and is recognized by the bit-pattern "000001000000" (SX instructions are always 12 bit words). The add w, fr instruction takes one argument (the register, or ram location where the value can be found to add to the w register). Finally, the sb fr.bit instruction (set a single bit in a ram location) takes an 'fr' argument (the ram location) and a bit number. Note also that, although the bit number is more to the 'left' in the instruction word, the register argument is mentioned first for this instruction word.

Another way of describing this is, that the sb_fr_bit instruction codes an instruction word that looks like this: 0111bbbrrrrr, where bbb represents the bit number and rrrrr represents the ram location.

List of instructions

Before we go deeper into the definition of the word and masked_argument templates, I should first mention that it is not enough to define the structs as written above. For the decoder to do it's work, there must exist a list somewhere that enumerates all instruction words. This list is the true definition of the sx instruction set as it is used by the decoder: <source lang="cpp"> struct sx_instruction_list { // define a list of sx instructions. typedef mpl::vector< iread, mov_w_m, mov_m_w,

// etc, etc, ...

movsz_w_inc_fr > instructions; }; </source>

Instruction decoder

As stated before, it is the task of the instruction decoder to take an instruction word (a number) and based on the value of that word, call an execute function with the right parameters. SX instructions are all 12 bits, and those bits are typically divided in a number of opcode bits (the instruction) and some operand bits (the arguments). The number of opcode bits is not the same for every instruction. The opcode for "jmp" for instance has only three bits (and they have the values 101), the other nine bits are reserved for the 9-bit address to jump to.

There are several ways to go about implementing an instruction decoder. One could group all instructions according to opcode size and start checking for the shortest opcodes first, and then, if no opcode was recognized start looking in the next group of opcodes, etc. This would look somewhat like this: <source lang="cpp"> void decode( int instruction, implementation &impl) {

   switch (instruction & 0xf00) // look at top 4 bits
   {
       case 0x800: impl.execute( retw(), ...); break;
       case 0x900: impl.execute( call(), ...); break;
       // etc...
       default:
           switch (instruction & 0xfe0) // look at top 7 bits
           {
                case 0x320: impl.execute( rr_fr(), ...); break;
                // etc.
           }; break
   }

} </source>

Decision tree

However, since I explicitly do not want to write the decoder by hand and case statements like the above one are hard to produce with template metaprogramming, I've chosen a slightly less efficient, but otherwise very similar approach: a binary decision tree. The implementation resembles the one above. Instead of creating nested case-statements, nested if-statements will test one bit at a time and as soon as a decision can be reached, the right member function will be called.

For example, suppose we have a simple microcontroller that uses three-bit instruction words and that knows only three instructions:

instruction semantics
1xx store xx in accumulator
00b reset bit b of the accumulator
01b set bit b of the accumulator

Then a decision tree in the form of if-statements would look like this: <source lang="cpp"> void decode( int instruction, implementation &imp) {

   if (instruction & 4) // test bit 2
   {
       imp.execute( store(), ...);
   }
   else
   {
       if (instruction & 2) // test bit 1
       {
           imp.execute( setbit(), ...);
       }
       else
       {
           imp.execute( resetbit(), ...);
       }
   }

} </source>

Decision tree metaprogram

Meta programming is employed to translate a list of instructions into a decision tree. The end result is a type that is build up of two types of templates:

decide_nodes
These act as branches of the tree.
call_nodes
these act as leafs of the tree.

The algorithm recursively splits up the instruction set and works as follows:

  1. start decoding at bit number d (discriminator) = 11 (the most significant bit) and the full instruction set
  2. for a given bit number, d, partition the instruction set into those instructions that have a '1' at position d, and those that have a '0'. This is done in instruction_decoder.hpp in the following lines:

<source lang="cpp">

   // split the instructions into two sets: the ones that have a zero at bit 'discriminating_bit' and the
   // ones that have a 1 in that position.
   typedef typename mpl::copy_if< instructions, has_at< discriminating_bit, 0>, mpl::front_inserter< mpl::list<> > >::type zeros;
   typedef typename mpl::copy_if< instructions, has_at< discriminating_bit, 1>, mpl::front_inserter< mpl::list<> > >::type ones;

</source>

  1. for each of the two sets, recurse into step 2 with bit position d-1.
  2. now create a decide_node (an instantiation of a template with three arguments: the discriminating bit d and the left and right result types) putting the result of the recursions in the left and right parts of this node.
  3. The recursion ends when the instruction set contains only one instruction, we don't have to examine any more bits to determine which instruction to call. For such an instruction set, the algorithm will return a call_node.

For the simple three-instruction microcontroller of the previous example, this creates a datastructure that looks like the following: <source lang="cpp">

  decide_node< 2, decide_node< 1, call_node<reset>, call_node<set> >, call_node< store> >;

</source>

Decision tree for the SX instruction decoder, click here to see full image.

So, after all of this, the instruction decoder has a single typedef, instruction_tree that defines a decision tree type that would look pretty much like the one above. Only for the full SX instruction set, the tree contains 53 instructions--and one BREAKPOINT instruction that I added myself--as. The typedef can be found near the bottom of file instruction_decoder.hpp and looks like this: <source lang="cpp"> typedef typename make_tree< typename instruction_list::instructions, 11, 0 >::type instruction_tree; </source>

The picture on the right shows a graphical representation of the SX instruction tree. It shows for instance that an instruction that starts with '101' (has '101' as the left-most digits) is a jmp-instruction (follow the tree from left to right along the path '1'->'0'->'1'). If the decoder encounters the pattern '100' it still needs to examine an extra bit to determine whether it is dealing with a call or a retw instruction.

From meta program to actual function calls

Once we have such a tree, decoding an instruction becomes reasonably easy, we only need a function template, decode_and_call that takes a decide_node instance, examines the bit at the position mentioned in that decide_node, and delegates further decoding to either the left or the right branch of the tree, depending of the value of the examined bit:

<source lang="cpp"> // // This function takes an instruction word, a pointer to an implementation and // a decision tree. It will examine the bit indicated by the template argument // 'bit' and based on its value will delegate to either the left- or the right // side of the tree. template< int bit, typename on_zero, typename on_one> static void decode_and_call( instruction_type word, implementation_type &implementation, const call_tag<decide_node< bit, on_zero, on_one> > &) { if (word & (1<<bit)) { decode_and_call( word, implementation, call_tag<on_one>()); } else { decode_and_call( word, implementation, call_tag<on_zero>()); } } </source> Of course there is an overload to end the recursion when a call_node is reached. This overload will finally call a member function that is supposed to imlement the instruction: <source lang="cpp"> // // If we've arrived at a leaf of the tree, we can call the right member function // of the implementation. The right member function is encoded in the leaf (the call_node) template< class tag> static void decode_and_call( instruction_type word, implementation_type &implementation, const call_tag< call_node< tag> > &) { dispatch( word, implementation, tag()); } </source> An optimizing compiler will inline most of these function calls, resulting in one big function, comprised of many if-statements.

Precompilation

The emulator spends a considerable amount of time in figuring out what instruction to execute, given the instruction word. It does this every time an instruction word at a given ROM location is executed, even though that instruction word will never change! It does make sense to try to do the instruction decoding once for every rom location before running the program.

Sxgos precompilation feature does just that: when a new rom is loaded, the emulator precompiles every instruction and puts the precompiled instruction in a shadow rom. The original rom contents are preserved for instructions like iread to do their work. Then, when executing a program, instructions are fetched from the precompiled shadow rom and executed.

Due to the generic way the instruction decoder was implemented, we shall see that this can be implemented with just a few hundred lines of code (approximately 300 LOC to be precise).

The shadow rom is filled with precompiled instructions and this is what one such compiled instruction looks like: <source lang="cpp"> template< typename implementation> struct compiled_instruction {

   // typedef for the function interface of the 'precompiled' functions.
   typedef void (function_type)( unsigned short, unsigned short, implementation *);

compiled_instruction() :f(0),a1(0), a2(0) { }

explicit compiled_instruction( function_type *interface_, unsigned short arg1 = 0, unsigned short arg2 = 0) :f(interface_),a1(arg1), a2( arg2) { }

void execute( implementation *imp) const { f( a1, a2, imp); }

   bool points_to( function_type *comp) const
   {
       return f == comp;
   }

private: function_type *f; unsigned short a1; unsigned short a2; };

</source> As you can see, an instruction is nothing more than a pointer to a function and two instruction arguments (the operands). Executing an instruction is nothing more than invoking the function pointer with the two stored arguments, a1 and a2 and a pointer to an implementation of the sx instruction set. The function f will then take care that the right member function is called on the implementation object.

At this point, you can probably imagine that there are function implementations for every instruction in the SX instruction set. Sxgo implements these by defining three templates: one for nullary instructions (having no arguments), one for instructions with one argument and one for instructions with two arguments. Below, you can see the implementation for instructions with one argument. This implementation will ignore the second argument call and pass the first argument to an implementation of an SX emulator.

The actual instruction that this object represents is determined by the tag template argument.

<source lang="cpp"> template< typename implementation, typename tag> struct ins1 { static void execute( unsigned short arg1, unsigned short, implementation *imp) { imp->execute( tag(), arg1); } }; </source>

Now that we have the compiled_instruction and the functions that these compiled instructions point to defined, it's time to show the full code of the compiler. Warning: don't blink, this will be a short one

<source lang="cpp"> /// \brief class template that can translate sx-instructions to /// instances that will call a handler with the right arguments. /// /// This class template is used to translate a rom full of sx-instructions into an array full /// of member function pointers each member function points to the implementation of the corresponding /// instruction. template< typename implementation> class sx_compiler { public: typedef compiled_instruction<implementation> slot_type;

sx_compiler( slot_type &location_) :location( location_) { }

template<typename tag> void execute( const tag &) { typedef ins0< implementation, tag> ins; location = slot_type( &ins::execute); }

template<typename tag> void execute( const tag &, int arg1) { typedef ins1< implementation, tag> ins; location = slot_type( &ins::execute, arg1); }

template<typename tag> void execute( const tag &, int arg1, int arg2) { typedef ins2< implementation, tag> ins; location = slot_type( &ins::execute, arg1, arg2); } private: slot_type &location; }; </source> That's it! This is the whole source code of the compiler. This class can be used to translate one SX instruction (a 12-bit number) into a pointer to a function that will execute the instruction. The three execute member functions must be a hint to the sleight of hand that is taking place here: All the hard work of decoding the number into a real function call is performed by the same instruction decoder that was created before...

Below, you'll find an example that shows how the compiler is used together with an instruction decoder to compile one single SX instruction.

<source lang="cpp"> /// compile an sx instruction into a function pointer void compile( address_t address, sx_rom::register_t instruction) {

   // a decoder type that decodes sx instructions (from the sx_instruction_list)
   // and translates them to calls to the compiler (the compiler_type)
       typedef micro_emulator::instruction_decoder<
               sx_instruction_list,
               compiler_type
           > compiler_decoder_type;
       compiler_type c( precompiled[ address]);
       compiler_decoder_type::feed( instruction, c);

}

</source>

Template:Comments