Difference between revisions of "Driving the WS2811 at 800 kHz with an 8 MHz AVR"

Revision as of 13:53, 11 February 2013

WS2811 LED controllers are hot. HackaDay has mentioned them three times in the last two months. Reason enough to order a WS2811 LED string on eBay and start researching.

Normally I'd go straight to the datasheet and start working from there, but in this particular case the datasheets are not very informative. Luckily, the HackaDay links provide some excellent discussions. This one by Alan Burlison is especially helpful. That article not only explains in great detail why a library like FastSPI isn't guaranteed to work, but it also comes with working code for a 16 MHz AVR that appears rock solid in its timing. Another good source was a post by "Cunning_Fellow", although I only took a cursory glance at his article, since his software solution was the last item in a longish text.

Small problem: I didn't have any 16 MHz crystals in stock, so I ordered a few, on eBay again, and sat back for the 25-day shipping time to pass. 25 days is a long time. The LED strip had arrived and was sitting on my desk. 25 days is a really long time. Maybe it could work off an AVR on its internal 8 MHz oscillator? It would be a lot of work. But 25 days is a very, very long time.

So, that is how I got to sit down and write my 8 MHz version of a WS2811 bit banger running at 800 kHz. The challenge is of course that I have exactly 10 clock cycles for every bit and exactly 80 cycles for every byte, no more, no less. I wanted the timing to be as rock-steady as Alan's, give or take the imprecise nature of the AVR's internal oscillator.

You can see the result below. For the impatient: example source code can be found here. The assembly is absolutely unreadable, but I explain everything as best as I can after the video. I'll also explain how a spreadsheet came to be the best IDE for this type of coding.

And of course, if you're creating a hardware project that controls more than 1 LED, you're going to have to demonstrate it with a Knight Rider display...

The challenge

For a full description of the protocol required to communicate with a WS2811, please refer to either Alan's page or the datasheet. In summary, the microcontroller should send a serial signal containing 3 bytes for every LED in the chain, in GRB order. The bits of this signal are encoded in a special way. See the figure below.

This image shows a sequence of a "0" followed by a "1". Every bit starts with a rising edge. For zeros, the signal drops back to low "quickly", while for ones the signal stays high and drops nearer the end of the bit. I've chosen the following timing, in line with Alan's observations and recommendations:

Zero: 250ns up, 1000ns down
One: 1000ns up, 250ns down

This gives a total duration of 1250ns for every bit, or 10μs per byte. These timings do not fall in the ranges permitted by the datasheet, but Alan describes clearly why that should not be a problem. With an 8 MHz clock a tick lasts 125ns, so 1250ns means 10 clock ticks per bit. That is not a lot. A typical, naive implementation would need to do the following things at every bit:

determine whether the next bit is a 1 or a 0
decrement a bit counter and determine whether the end of a byte has been reached; if it has:
1. determine if we're at the end of the total sequence
2. load a new byte in the data register
3. decrement the byte counter
4. reset the bit counter
jump back to the first step

Oh yes, and that is of course in addition to actually switching the output levels.
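The per-bit steps listed above can be modelled on a PC before any assembly is written. The Python sketch below is only a model under stated assumptions: it uses the 8 MHz clock and the 250ns/1000ns timings chosen in this article, and every name in it (`encode_naive`, `grb`, `TICK_NS` and so on) is made up for illustration. It returns one line level per clock tick, which makes the 10-ticks-per-bit budget visible.

```python
# Host-side model of the WS2811 bit stream, one list entry per 8 MHz clock tick.
# All names are illustrative; this is not the AVR code from the article.

F_CPU_HZ = 8_000_000
TICK_NS = 1_000_000_000 // F_CPU_HZ      # 125 ns per clock tick
BIT_NS = 1250                            # total bit time chosen in the article
TICKS_PER_BIT = BIT_NS // TICK_NS        # 10
HIGH_TICKS = {0: 250 // TICK_NS,         # "0": 2 ticks high, 8 ticks low
              1: 1000 // TICK_NS}        # "1": 8 ticks high, 2 ticks low


def encode_naive(data):
    """Follow the per-bit steps literally and return the line level per tick."""
    levels = []
    bytes_left = len(data)               # byte counter
    index = 0
    while bytes_left:                    # end of the total sequence?
        current = data[index]            # load a new byte
        index += 1
        bytes_left -= 1                  # decrement the byte counter
        bits_left = 8                    # reset the bit counter
        while bits_left:
            bit = (current >> 7) & 1     # next bit, most significant first
            current = (current << 1) & 0xFF
            high = HIGH_TICKS[bit]
            levels += [1] * high + [0] * (TICKS_PER_BIT - high)
            bits_left -= 1               # decrement the bit counter
    return levels


def grb(red, green, blue):
    """Order one LED's colour bytes the way the WS2811 expects them."""
    return bytes([green, red, blue])


if __name__ == "__main__":
    wave = encode_naive(grb(0xFF, 0x00, 0x80))
    print(TICKS_PER_BIT, len(wave))      # ticks per bit, ticks for one LED
```

On a PC the bookkeeping costs nothing, so the model always produces a clean waveform. On the AVR every one of those commented steps has to be paid for out of the same 10 ticks.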

All of that does not fit into a single 10-clock time frame. Luckily, it doesn't have to: why not, instead of having a 10-tick loop over one bit, use a 20-tick loop over 2 bits? That is the central idea behind the code: at the cost of extra program space, create a sequence of assembly instructions that has 4 different states, one for every possible two-bit combination, instead of 2 states, one for every possible bit value. As it turns out, adding the end-of-byte and end-of-data criteria increases the number of states to 10.
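The two-bits-per-loop idea can also be sanity-checked on a PC. The sketch below is a model, not the article's assembly: it assumes 10 ticks per bit with 2 high ticks for a "0" and 8 for a "1", and the names `PAIR_WAVES`, `encode_by_pairs` and `encode_by_bits` are invented here. It builds one fixed 20-tick waveform per two-bit combination and compares the pairwise encoding against a plain bit-by-bit one.

```python
# Model of the "20-tick loop over 2 bits" idea; names are illustrative only.

TICKS_PER_BIT = 10
HIGH_TICKS = {0: 2, 1: 8}                # high ticks for a "0" and for a "1"


def bit_wave(bit):
    """Ten line levels for a single bit."""
    high = HIGH_TICKS[bit]
    return [1] * high + [0] * (TICKS_PER_BIT - high)


# One fixed 20-tick waveform for each of the four two-bit combinations.
PAIR_WAVES = {(a, b): bit_wave(a) + bit_wave(b)
              for a in (0, 1) for b in (0, 1)}


def encode_by_pairs(data):
    """Emit each byte as four bit pairs, most significant pair first."""
    levels = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            pair = (byte >> shift) & 0b11
            levels += PAIR_WAVES[(pair >> 1, pair & 1)]
    return levels


def encode_by_bits(data):
    """Reference encoder that handles one bit at a time."""
    levels = []
    for byte in data:
        for shift in range(7, -1, -1):
            levels += bit_wave((byte >> shift) & 1)
    return levels


if __name__ == "__main__":
    sample = bytes([0x00, 0xA5, 0xFF])
    print(encode_by_pairs(sample) == encode_by_bits(sample))
```

The output is the same either way. What changes is the budget: the bookkeeping now has to fit once into 20 ticks instead of once into 10, and a byte is finished after 4 passes instead of 8.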

Defining the puzzle

Juggling so many states and jumping from one to the other without introducing phase errors turns out to be interesting. I spent a couple of lonely lunch breaks and several pages in my little (paper!) notebook before I even figured out how to describe the problem. When a notation became clear, however, the going was easy enough and this exercise turned into one of the nicer kinds of pastimes. Because there is no way I can convey the amount of work that went into solving the puzzle, here comes the full code in my notation system of choice:

The image above shows pseudo assembly code in the yellow blocks. To the left of each yellow block is a graphic representing the waveform being generated. Tilt your head to the right to see the more conventional waveform graphic. Each horizontal row in the yellow blocks represents a clock tick, not necessarily an instruction word. To the left of each waveform graphic there are numbers from 00 to 19 that represent the "phase" at the corresponding clock tick.

The yellow code blocks are organized into 4 columns, one for every two-bit combination. As can be seen from the waveforms, these columns, from left to right, represent the combinations "10", "11", "00" and "01" respectively.

What makes this notation so convenient is the fact that I can now easily determine the waveform phase at each point in the code and can also check whether a jump lands in the correct phase. A taken jump costs two clock cycles on the AVR, so each jump at phase n (0 ≤ n < 20) should land at a label which is placed at phase n + 2 (modulo 20). Put differently: each jump should be to a label that is two rows down from the jump location (or 18 rows up).

The drawn waveforms make it easy to verify that when I jump from the middle of a wave, the code lands in a place where that same waveform is continued. They also show clearly where the 'high' and 'low' need to go.
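The phase rule is mechanical enough to check with a few lines of code instead of by eye. The following Python sketch is a hypothetical helper, not part of the article's tooling: `check_jumps` and `landing_phase` are invented names, and the labels and phases in the example are made up rather than read from the instruction table.

```python
# Phase bookkeeping check, mirroring what the spreadsheet rows make visible.
# The labels and phases used in the example are invented for illustration.

PERIOD = 20          # clock ticks in one two-bit loop
JUMP_TICKS = 2       # the "n + 2" rule: a taken jump occupies two ticks


def landing_phase(jump_phase):
    """Phase at which execution resumes after a jump issued at jump_phase."""
    return (jump_phase + JUMP_TICKS) % PERIOD


def check_jumps(label_phase, jumps):
    """Return the jumps whose target label sits at the wrong phase.

    label_phase maps a label name to its phase (0..19).
    jumps is a list of (phase of the jump, target label) tuples.
    """
    return [(phase, target) for phase, target in jumps
            if label_phase[target] != landing_phase(phase)]


if __name__ == "__main__":
    labels = {"A": 0, "B": 7}                    # hypothetical labels
    jumps = [(18, "A"), (5, "B"), (6, "B")]      # the last one is off by one
    print(check_jumps(labels, jumps))
```

An empty result means every jump lands in phase. Any tuple that comes back points at a jump that would stretch or shorten a bit by at least one tick.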

Reading the code

The code starts (not shown in the tables above) with the first data byte in the 'data' register, left-shifted so that the first bit (the most significant bit) is in the Carry flag. From there, depending on the Carry flag, the code continues at label L0x or L1x. These labels are in the top left and top right of the illustration. This is where the code in the tables takes off.

At labels L0x and L1x, the first bit of the bit-pair is known, but the second isn't (hence the 'x' in L1x). The code will now determine the value of the second bit, and if necessary switch tracks (take a jump to a block of code that is one column to the left or right). An example of such a track switch is the jump to L00 right after label L0x.

Now that we're in the right column, we know how the coming 2 bits will play out and we can concentrate on the other tests that need to be performed. The first test is whether we're at the final two bits of the byte. If that isn't the case, the code will remain in its yellow block and only needs to determine the value of the next bit (the first bit of the next bit-pair). At the end of these blocks, code will return to L1x and L0x again.

If we detect that we're at the end of a byte, the code jumps one row of blocks down, to one of the M-labels. At this point we're well underway in the first bit and a lot of things need to be done before we can jump to one of the L0x/L1x labels again. The following tasks are done in whatever order fits best in between the "up"s and "down"s:

reset the bit counter to 4 (we're counting bit pairs)
load the next byte from memory. This takes 2 clock cycles and therefore this instruction takes two rows in the spreadsheet (remember that a row represents a clock cycle, not an instruction word).

After we've done that, it is high time to establish whether we have reached the end of the byte sequence. In that case the code jumps to the H1 or H0 label, where it just finishes the waveform for the final bit and terminates.

If there is more data, we're still not finished, because we need to determine the value of the first bit of the next bit-pair. This is done just in time before jumping to either L0x or L1x again. Actually, the M11 column was one cycle too short to perform the jump, so this block must be positioned right before L1x (P1x actually) so that we can fall through to that state without losing a clock tick.
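The control flow described in this section can be followed in a small host-side model. The Python sketch below is an interpretation of the prose and not a translation of the assembly: it ignores timing altogether, `shift_left` merely imitates a left shift that moves the top bit into Carry, and `walk` is a made-up name. It records which two-bit column is visited for every bit pair.

```python
# Logical walk through the states described above, with all timing left out.
# Function names are illustrative; label names appear only in the comments.


def shift_left(value):
    """Imitate an 8-bit left shift: return (new value, carry)."""
    carry = (value >> 7) & 1
    return (value << 1) & 0xFF, carry


def walk(data):
    """Return the two-bit columns visited, for example ["10", "01", ...]."""
    visited = []
    if not data:
        return visited
    index = 0
    current, carry = shift_left(data[index])    # first bit is now in Carry
    pairs_left = 4                              # we're counting bit pairs
    while True:
        first = carry                           # at L0x or L1x
        current, carry = shift_left(current)    # determine the second bit
        visited.append(f"{first}{carry}")       # now in the right column
        pairs_left -= 1
        if pairs_left == 0:                     # end of byte: the M-labels
            pairs_left = 4                      # reset the pair counter
            index += 1
            # The text describes the load happening before this test; the
            # model tests first so that it never indexes past the data.
            if index == len(data):              # end of data: H0 or H1
                return visited
            current = data[index]               # load the next byte
        current, carry = shift_left(current)    # first bit of the next pair


if __name__ == "__main__":
    print(walk(bytes([0xA5, 0x3C])))
```

Each byte contributes exactly four entries, and the first bit of a pair is always known one step before the pair itself is chosen, which is the same look-ahead the L0x/L1x labels express.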


@@ Line 1: / Line 1: @@
 WS2811 LED controllers are hot. [http://hackaday.com/?s=ws2811 HackaDay] has mentioned them three times in the last two months. Reason enough to order a WS2811 led string on [http://www.ebay.com/sch/i.html?LH_BIN=1&_sop=15&_nkw=ws2811+led+string&LH_PrefLoc=2 ebay] and start researching.
-Normally I'd go straight to the datasheet and start working from there, but in this particular case the datasheets are [www.nooelec.com/files/WS2811.pdf not so very informative]. Luckily, the HackaDay links provide some excellent discussions. [http://bleaklow.com/2012/12/02/driving_the_ws2811_at_800khz_with_a_16mhz_avr.html This one] by Alan Burlison is especially helpful. That article not only explains in great detail why a library like FastSPI isn't guaranteed to work, but it comes with working code for a 16Mhz AVR that appears rock solid in its timing. Another good source was [http://www.instructables.com/id/My-response-to-the-WS2811-with-an-AVR-thing a post] by "Cunning_Fellow", although I only took a cursory glance at his article, since his software solution was the last item in a longish text.
+Normally I'd go straight to the datasheet and start working from there, but in this particular case the datasheets are [http://www.nooelec.com/files/WS2811.pdf not so very informative]. Luckily, the HackaDay links provide some excellent discussions. [http://bleaklow.com/2012/12/02/driving_the_ws2811_at_800khz_with_a_16mhz_avr.html This one] by Alan Burlison is especially helpful. That article not only explains in great detail why a library like FastSPI isn't guaranteed to work, but it comes with working code for a 16Mhz AVR that appears rock solid in its timing. Another good source was [http://www.instructables.com/id/My-response-to-the-WS2811-with-an-AVR-thing a post] by "Cunning_Fellow", although I only took a cursory glance at his article, since his software solution was the last item in a longish text.
 Small problem: I didn't have any 16Mhz crystals on stock, so I ordered a few, on ebay again and sat back for the 25 day shipping time to pass. 25 Days is a long time. The led strip had arrived and was sitting on my desk. 25 Days is a really long time. Maybe it could work off an AVR on its internal 8Mhz oscillator? It would be a lot of work. But 25 days is a very, very, long time.
@@ Line 15: / Line 15: @@
 ==The challenge==
-For a full description of how to communicate with a WS2811, please refer to either [http://bleaklow.com/2012/12/02/driving_the_ws2811_at_800khz_with_a_16mhz_avr.html Alans page] or the [http://www.nooelec.com/files/WS2811.pdf datasheet]. In summary, the microcontroller should send a serial signal containing 3 bytes for every LED in the chain, in GRB-order. The bits of this signal are encoded in a special way. See the figure below.
+For a full description of the required protocl to communicate with a WS2811, please refer to either [http://bleaklow.com/2012/12/02/driving_the_ws2811_at_800khz_with_a_16mhz_avr.html Alans page] or the [http://www.nooelec.com/files/WS2811.pdf datasheet]. In summary, the microcontroller should send a serial signal containing 3 bytes for every LED in the chain, in GRB-order. The bits of this signal are encoded in a special way. See the figure below.
 [[File:Ws2811 waveform.png|illustration of a WS2811 waveform]]
@@ Line 22: / Line 22: @@
 *Zero: 250ns up, 1000ns down
 *One: 1000ns up, 250ns down
-Giving a total duration of 1250ns for every bit, or 10&mu;s per byte. 1250ns means '''10 clock ticks per bit'''. That is not a lot. A typical, naive implementation would need to do the following things at every bit:
+Giving a total duration of 1250ns for every bit, or 10&mu;s per byte. These timings do not fall in the ranges permitted by the data sheet, but Alan describes clearly why that should not be a problem. 1250ns means '''10 clock ticks per bit'''. That is not a lot. A typical, naive implementation would need to do the following things at every bit:
 # determine whether the next bit is a 1 or a 0
 # decrease a bit counter and determine if the end of a byte has been reached, if at the end:
@@ Line 35: / Line 35: @@
 ==Defining the puzzle==
-Juggling with so many states, jumping from one to the other without introducing phase errors turned out to be ''interesting''. I spent a couple of lonely lunch breaks and several pages in my little (paper!) notebook before I even figured out how to describe the problem. When a notation became clear, however, the going was easy enough and this exercise turned into one of the nicer puzzles. Because there is no way I can relate the amount of work that went into solving the puzzle, here comes the full code in the notation system of my choice: a spreadsheet:
+Juggling with so many states, jumping from one to the other without introducing phase errors turns out to be ''interesting''. I spent a couple of lonely lunch breaks and several pages in my little (paper!) notebook before I even figured out how to describe the problem. When a notation became clear, however, the going was easy enough and this exercise turned into one of the nicer kinds of pastimes. Because there is no way I can relate the amount of work that went into solving the puzzle, here comes the full code in my notation system of choice:
 [[File:Ws2811 instruction table.png|572px]]
-The image above shows pseudo assembly code in the yellow blocks. To the left of each yellow block is a graphic representing the line state. Tilt your head to the right to see the more conventional waveform graphic. Each line in the yellow blocks represents a clock tick, not necessarily an instruction word. To the left of each waveform graphic there are numbers from 00 to 19 that represent the "phase" at the corresponding clock tick.
+The image above shows pseudo assembly code in the yellow blocks. To the left of each yellow block is a graphic representing the wave form being generated. Tilt your head to the right to see the more conventional waveform graphic. Each horizontal row in the yellow blocks represents a clock tick, not necessarily an instruction word. To the left of each waveform graphic there are numbers from 00 to 19 that represent the "phase" at the corresponding clock tick.
+The yellow code blocks are organized into 4 columns, one for every two-bit combination. As can be seen from the waveforms, these columns, from left to right, represent the combinations "10", "11", "00" and "01" respectively.
 What makes this notation so convenient is the fact that I can now easily determine the waveform phase at each point in the code and can also check whether a jump lands in the correct phase. Each jump at phase n (0 < n < 20) should land at a label which is placed at phase n + 2 (modulo 20). Put differently: each jump should be to a label that is two lines down from the jump location (or 18 lines up).
-The waveforms make it easy to verify that when I jump in the middle of a wave, the code lands in a place where that same wave form is continued.
+The drawn waveforms make it easy to verify that when I jump from the middle of a wave, the code lands in a place where that same wave form is continued. It also shows clearly where the 'high' and 'low' need to go.
 ==Reading the code==
-<to be continued>
+The code starts (not shown in the tables above) with the first data byte in the 'data' register, left-shifted so that the first bit (the most significant bit) is in the Carry flag. From there, depending on the carry flag the code should then continue to the label L0x or L1x. These labels are in the top left and top right of the illustration. This is where the code in the tables takes off.
+At labels L0x and L1x, the first bit of the bit-pair is known, but the second isn't (hence the 'x' in L1x). The code will now determine the value of the second bit, and if necessary switch tracks (take a jump to a block of code that is one column to the left or right). An example of such a track switch is the jump to L00 right after label L0x.
+Now that we're in the right column, we know how the coming 2 bits will play out and we can concentrate on the other tests that need to be performed. The first test is whether we're at the final two bits of the byte. If that isn't the case, the code will remain in its yellow block and only needs to determine the value of the next bit (the first bit of the next bit-pair). At the end of these blocks, code will return to L1x and L0x again.
+If we detect that we're at the end of a byte, the code jumps one row of blocks down, to one of the M-labels. At this point we're well underway in the first bit and a lot of things need to be done before we can jump to one of the L0x/L1x labels again. The following tasks are done in whatever order fits best in between the "up"s and "down"s:
+* reset the bit counter to 4 (we're counting bit ''pairs'')
+* load the next byte from memory. This takes 2 clock cycles and therefore this instruction takes two rows in the spreadsheet (remember that a row represents a clock cycle, not an instruction word).
+After we've done that, it is high time to establish if we are maybe at the end of the byte sequence. In that case the code jumps to the H1 or H0 label where it just finishes the waveform for the final bit and terminates.
+We're still not finished, because we need to determine the value of the first bit of the next bit-pair. This is done just in time before jumping to either L0x or L1x again. Actually, the M11 column was one cycle too short to perform the jump and so this block must be positioned right before L1x (P1x actually) so that we can transition to that state without losing a clock tick.
+<to be continued>
 {{ShowComments|show=True}}

From Just in Time
