Ws2811 driver code explained

From Just in Time

 

Latest revision as of 22:30, 11 March 2014

This page gives a detailed description of the first version of the WS2811@8MHz driver code mentioned on the page "Driving the WS2811 at 800 kHz with an 8 MHz AVR" and the methodology used to create that code. Note that I've since created a much smaller implementation, which is described on that page. I'm leaving this text as a reference, because the method used to create the old version of the driver is very similar to the one used to create the smaller one.

The intent of the code is to send a serial signal to a given output pin with a strict timing of 10 clock cycles per bit. Below follows a step-by-step explanation. The text assumes some familiarity with AVR assembly, although the code uses a very small subset of the AVR instruction set.
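The relation between the 8 MHz clock, the 10 clock cycles per bit and the 800 kHz bit rate can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the function and constant names are made up for this example and do not appear in the driver.

```python
F_CPU_HZ = 8_000_000     # AVR clock frequency assumed by this page
CYCLES_PER_BIT = 10      # strict budget per transmitted bit


def bit_rate_hz(f_cpu_hz=F_CPU_HZ, cycles_per_bit=CYCLES_PER_BIT):
    """Bits per second when every bit takes a fixed number of clock cycles."""
    return f_cpu_hz / cycles_per_bit


def bit_time_ns(f_cpu_hz=F_CPU_HZ, cycles_per_bit=CYCLES_PER_BIT):
    """Duration of one bit in nanoseconds."""
    return cycles_per_bit * 1_000_000_000 / f_cpu_hz


print(bit_rate_hz())  # 800000.0 bits per second
print(bit_time_ns())  # 1250.0 ns per bit, i.e. 125 ns per clock cycle
```

A single clock cycle lasts 125 ns, so every instruction that is added or removed shifts the signal by a tenth of a bit.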

General description

The code contains an outer loop that loops over the bytes that need to be sent. Bytes are expected in the correct order in memory, that is: in GRB order. An inner loop sends out 2 bits at a time. Essentially there are 4 variants of the inner loop, one each for the combinations 00, 01, 10 and 11. The code quickly determines which of those four loops it needs to be in and then performs all other tasks around the OUT instructions, which must execute at fixed points in those four loops.

Normally, when creating timed loops, the extra clock tick that a taken jump uses is compensated for by adding a NOP in the not-taken code path. We cannot afford such NOP instructions here, so instead each instruction is annotated with the phase within the 20-clock-tick cycle at which it executes, and care is taken that all the output signalling is done at exactly the right phase, while other instructions are performed whenever possible.
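As a rough model of the data flow described above (not of the timing), the sketch below reorders pixel values to GRB and splits each byte into the four 2-bit groups that the inner loop handles. The helper names are hypothetical. Sending the most significant bits first is inferred from the later description, where the next bit is taken from the most significant bit of the data register.

```python
def grb_bytes(pixels):
    """Flatten (r, g, b) tuples into the G, R, B byte order the driver expects."""
    out = []
    for r, g, b in pixels:
        out.extend((g, r, b))
    return bytes(out)


def bit_pairs(data):
    """Yield the four 2-bit groups of every byte, most significant pair first."""
    for byte in data:
        for shift in (6, 4, 2, 0):
            yield format((byte >> shift) & 0b11, "02b")


print(list(bit_pairs(bytes([0b10110100]))))  # ['10', '11', '01', '00']
```

Each yielded string selects one of the four loop variants.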
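The phase annotation can be mimicked with a small helper that walks over a straight-line instruction sequence and records the phase at which each instruction starts. This is only a sketch: the instruction list in the example is hypothetical, and the cycle counts are passed in by hand (on the AVR a taken branch takes 2 cycles and a non-taken branch takes 1).

```python
CYCLE_LENGTH = 20  # clock ticks in one 2-bit cycle


def annotate(instructions, start_phase=0):
    """Return ([(phase, name), ...], phase after the last instruction).

    `instructions` is a list of (name, cycles) tuples in execution order.
    """
    phase = start_phase
    annotated = []
    for name, cycles in instructions:
        annotated.append((phase % CYCLE_LENGTH, name))
        phase += cycles
    return annotated, phase % CYCLE_LENGTH


# Hypothetical fragment: the same branch, once taken and once not taken.
taken, end_taken = annotate([("out", 1), ("lsl", 1), ("brcs (taken)", 2)])
skipped, end_skipped = annotate([("out", 1), ("lsl", 1), ("brcs (not taken)", 1)])
print(end_taken, end_skipped)  # 4 3
```

Instead of padding the not-taken path with a NOP, the two paths simply continue at different phases, and the code that follows on each path is laid out accordingly.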

Steps

Empty loop.png

Skeleton

This image shows code that will continuously (and endlessly) send the bit sequence "10" in a strict 20-clock-cycle loop. This will be the skeleton to which the rest of the code is added. The code starts out as 4 of those skeletons (one each for "00", "01", "10" and "11"), fitted with jumps from one to the other to create the right bit sequence.

Note that the OUT instructions can't be moved to any other point in time, so these form the pillars around which the rest of the code will be added.

The code is presented in 4 columns:

  1. a column that shows the phase within the 2-bit cycle (00-19)
  2. a graphical representation of the waveform shape that is being emitted
  3. jump labels, these are structurally named, more on that later
  4. the actual instructions. For readability, I'm going to omit the NOP instructions in following images.

Labels are named Labcd, Mabcd, Habcd or Pabcd. The digits cd encode the phase at that point (the 00 in L1x00). The digits ab represent the two-bit sequence that is being sent, with x representing "don't know" (the 1x in L1x00 means that we know that the first bit is 1 and we don't know the second bit value yet). The starting letters encode where we are in the byte: L means we may be at the last 2 bits, M means we're definitely not in the last two bits, and H means that we're finishing off the last bit of the complete byte sequence. P marks a preamble: a few clock cycles that are fitted just before another label, as explained further on.

Sending a continuous stream of "10101010101010..." is of limited use, so let's see if we can bring some variation into the bit sequence.
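To make the naming scheme concrete, here is a small parser for such labels. It is only an illustration of the convention described above; the Label type and parse_label function are invented for this sketch and are not part of the driver or its tooling.

```python
import re
from typing import NamedTuple

# One position letter, two bit characters (0, 1 or x), two phase digits.
LABEL_RE = re.compile(r"^([LMHP])([01x]{2})(\d{2})$")
CYCLE_LENGTH = 20


class Label(NamedTuple):
    position: str  # 'L', 'M', 'H' or 'P'
    bits: str      # the two-bit sequence, 'x' meaning "don't know"
    phase: int     # phase within the 20-tick cycle (0-19)


def parse_label(text):
    """Split a label such as 'L1x00' into its three parts."""
    match = LABEL_RE.match(text)
    if match is None:
        raise ValueError(f"not a structured label: {text!r}")
    position, bits, phase = match.groups()
    if int(phase) >= CYCLE_LENGTH:
        raise ValueError(f"phase out of range in {text!r}")
    return Label(position, bits, int(phase))


print(parse_label("L1x00"))  # Label(position='L', bits='1x', phase=0)
print(parse_label("Hx017"))  # Label(position='H', bits='x0', phase=17)
```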


Jump on second bit value.png

Determine second bit

Let's assume that at label L1x00 we already know that the first bit of the 2-bit sequence is a 1 and that there is a register called data that contains the second bit value in its most significant bit (and that may have even more bit values in bits 6-1). The only thing we have to do now is to shift that bit into the carry flag. If that second bit is zero, we're already on the right track, but if that bit is one, then we should switch tracks to a "11" sequence. That switch is exactly what the highlighted code in this image does.

Note that the code jumps from an instruction that is in phase 02. Because a jump takes 2 clock ticks, it will arrive at an instruction that is in phase 04 (two steps further in generating the wave form). Hence the label L1104. The 11 means we know we're emitting the sequence 11 and the 04 means we're at phase 04.

Also notice that at the end of the "L1104"-block, we don't have time to jump to the "L1x00" label, because we need to push down the line (in phase 18) and quickly move it up again (in phase 00). We therefore jump sooner, in phase 16. But we cannot land in L1x00, because then we'd skip two clock cycles (we're jumping from phase 16, which means it's phase 18 where we land). We therefore fit two extra clock cycles to represent phases 18 and 19 before L1x00 and call these P1x18.
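The two label names derived above follow from the same small calculation: a taken jump costs 2 clock ticks, so the landing phase is the jump phase plus 2, and a preamble must fill whatever is left until phase 00. A minimal sketch of that arithmetic, with made-up helper names:

```python
CYCLE_LENGTH = 20       # clock ticks in one 2-bit cycle
TAKEN_JUMP_CYCLES = 2   # a taken jump uses two clock ticks


def landing_phase(jump_phase):
    """Phase of the first instruction executed after a taken jump."""
    return (jump_phase + TAKEN_JUMP_CYCLES) % CYCLE_LENGTH


def landing_label(prefix, bits, jump_phase):
    """Name of the label a jump from `jump_phase` has to land on."""
    return f"{prefix}{bits}{landing_phase(jump_phase):02d}"


def preamble_cycles(phase):
    """Clock ticks a preamble must cover to reach phase 00 again."""
    return (CYCLE_LENGTH - phase) % CYCLE_LENGTH


print(landing_label("L", "11", 2))         # L1104
print(landing_label("P", "1x", 16))        # P1x18
print(preamble_cycles(landing_phase(16)))  # 2
```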

Now that we've adapted the sequence to the value of the second bit of the 2-bit sequence, let's adapt to the first bit as well. Adapting to the first bit of the sequence must be done before the sequence starts. Therefore, at the end of the sequence, we'll now add the shift-and-branch instructions that jump depending on the first bit of the next sequence. This is shown in the next picture.

Determine first bit of next sequence

Jump on first bit of next sequence.png

The image above shows all the blocks necessary to output all four combinations of 2 bits. Every 2-bit sequence starts at either L1x00 or L0x00, depending on the value of the first bit, then depending on the value of the second bit, the code may jump to L1104 or L0005 respectively. At the end of each block the value of the first bit of the next sequence is determined and the code jumps to L1x00, L0x00 or one of its preambles (the Pnx-labels).
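The routing between the four blocks can be summarised as a tiny lookup. The sketch below is a simplification under stated assumptions. It ignores the preamble labels and all timing. It also assumes that the L0x00 block stays on the "01" track by default and jumps to L0005 when the second bit is 0; that is inferred from the label name (00 at phase 05) and is not stated explicitly in the text.

```python
def blocks_visited(pair):
    """Return the block labels executed for a 2-bit string such as '10'."""
    first, second = pair
    if first == "1":
        # L1x00 plays out "10" unless the second bit switches tracks to "11".
        return ["L1x00", "L1104"] if second == "1" else ["L1x00"]
    # Assumed mirror image: L0x00 continues as "01" unless the second bit is 0.
    return ["L0x00", "L0005"] if second == "0" else ["L0x00"]


for pair in ("00", "01", "10", "11"):
    print(pair, blocks_visited(pair))
```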

The leftmost two blocks are the same ones as in the previous image. The only difference is the added jumps to the rightmost block in case the next sequence starts with a 0. The two blocks on the right form a mirror image of those on the left.

This code is enough to emit the right signals for a complete byte of data, but at some point we will have run out of bits in register data and we'll have to replenish them. That's what we'll concentrate on in the next step. We'll zoom in again on the L1x00 block, but essentially all the steps that we'll describe need to be done for all four blocks shown here.

End of byte.png

Last bits of a byte

Now that both bits of the sequence have been determined, it is high time to see if these might be the last bits of the byte. For that purpose we keep a count of bit pairs in a register, called bits here. The code decreases the bit counter and jumps to M1006 if these are not the last bits of the byte. In that case we don't have much to do and the block starting at M1006 is pretty much complete; it just plays out the signals of the bit sequence, checks the first bit of the next sequence and jumps appropriately.

If these are the last bits, we have quite a lot to do, so it is a good thing the code continues directly after the BRNE instruction, without having to spend an extra clock cycle in the jump. At this point the code has to:

  1. load the next byte value
  2. reset the bit counter (to 4, we're counting pairs of bits)
  3. decrease the byte counter and determine whether we're actually at the end of the byte sequence as well.

This is what is shown in the next figures. Note again that whatever is described in the next steps must be reproduced in all four variants of this loop.
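The bookkeeping listed above can be modelled without any timing. The following sketch walks over a byte sequence the way the text describes: it counts bit pairs down from 4, reloads the data register after the last pair of a byte, and stops when the byte counter reaches zero. The tags 'M', 'L' and 'H' are borrowed loosely from the label prefixes; the function itself is hypothetical and expects at least one byte.

```python
PAIRS_PER_BYTE = 4


def walk(data):
    """Return [(pair, tag), ...] for a non-empty byte sequence.

    tag is 'M' (not the last pair of a byte), 'L' (last pair of a byte)
    or 'H' (last pair of the whole sequence).
    """
    events = []
    remaining = len(data)   # plays the role of the 16-bit byte counter
    index = 1
    current = data[0]       # plays the role of the data register
    bits = PAIRS_PER_BYTE   # plays the role of the bits register
    while True:
        pair = (current >> 6) & 0b11
        current = (current << 2) & 0xFF
        bits -= 1
        if bits != 0:
            events.append((pair, "M"))
            continue
        # Last pair of this byte: reload, reset the counter, check for the end.
        current = data[index] if index < len(data) else 0
        index += 1
        bits = PAIRS_PER_BYTE
        remaining -= 1
        if remaining == 0:
            events.append((pair, "H"))
            return events
        events.append((pair, "L"))


print(walk(bytes([0b11100100, 0b00011011])))
```

Note that, as in the description, the next byte is fetched before the byte counter is checked; the model substitutes 0 when there is no next byte.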

Loading the next byte.png

Fetch the next byte

Now we're ready to load the next byte and to reset the bit counter to 4. Notice that the single instruction LD data, X+ takes two clock cycles; we mark that with "^^^" in the next clock cycle. Notice also that we have to hurry, because particularly in the L1104 block we're quickly approaching the end of the cycle. Luckily there's only one thing left to do: see if the current byte is actually the last byte of the sequence.
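The "^^^" convention for multi-cycle instructions can be reproduced with a small renderer that prints one row per clock tick. This is a sketch only: the starting phase and the LDI instruction in the example are assumptions chosen for illustration and are not read off the actual image.

```python
CYCLE_LENGTH = 20


def render(instructions, start_phase):
    """Return one text row per clock tick for (text, cycles) tuples.

    Ticks that are still occupied by the previous instruction show '^^^'.
    """
    rows = []
    phase = start_phase
    for text, cycles in instructions:
        rows.append(f"{phase % CYCLE_LENGTH:02d}  {text}")
        for extra in range(1, cycles):
            rows.append(f"{(phase + extra) % CYCLE_LENGTH:02d}  ^^^")
        phase += cycles
    return rows


# Hypothetical placement: a 2-cycle load followed by a 1-cycle counter reset.
for row in render([("ld data, X+", 2), ("ldi bits, 4", 1)], start_phase=7):
    print(row)
```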

Final byte of the sequence.png

The final bit

Now it's time to see if we're done completely. The code decreases a 16-bit byte counter (again a two-cycle instruction) and jumps out (to Hx017) if it is zero.

Complete code

Below is the complete assembly code arranged in blocks. The orange-colored blocks M0008 and M0107 are left out of the actual code, because they are functionally equivalent to the blocks starting at Mx008 and Mx107 respectively. This code is available as a spreadsheet on GitHub. The final assembly code is also available on GitHub. The order of the blocks in that final assembly source was determined by a dedicated C++ program, so that all of the conditional jumps are within the allowed range.

Ws2811 instruction table.png
