Driving the WS2811 at 800 kHz with an 8 MHz AVR

From Just in Time

This page describes how to drive a WS2811 from an 8 MHz or 9.6 MHz AVR like an Atmel ATmega88, ATtiny2313 or ATtiny13 '''without added components''' such as an external oscillator. With only 30 instructions for the driver loop (110 bytes for the complete function), the driver code presented here is likely to be the [https://github.com/DannyHavenith/ws2811/blob/master/ws2811/ws2811_8.h#L49 shortest code] that delivers the exact timing (i.e. exactly 10 clock cycles for 1 bit, exactly 60 clock cycles for 6 bits, etc.). With even less code, a "sparse driver" for 9.6 MHz MCUs can drive long LED strings with only a few bytes of buffer in RAM.
  
Of course, if you're creating a hardware project that controls more than 1 LED, you're going to have to demonstrate it with a Knight Rider LED sequence (which, I just learned, is actually called a [https://en.wikipedia.org/wiki/Glen_A._Larson Larson] scanner)... The sources for all the demonstrations in these videos can be found on [https://github.com/DannyHavenith/ws2811 github].
{|
|- valign="top"
|colspan="3"|{{#ev:youtube|mP7vBw4UeME|400|left|Knight Rider on Steroids ([https://github.com/DannyHavenith/ws2811/blob/master/effects/chasers.hpp])}}
|- valign="top"
|{{#ev:youtube|MdX45a3OFS0|300||the [https://hackaday.io/project/9393-wheres-my-christmas-light/log/50423-the-most-simplest-is-the-most-easiest where's my christmas light] project}}
|{{#ev:youtube|_wkTtAk2V_k|300||"Water Torture" or "Lava Drops" demo ([https://github.com/DannyHavenith/ws2811/blob/master/effects/water_torture.hpp source code], [[WS2811 "Water torture"|details]])}}
|{{#ev:youtube|jAm7nVRvY_I|300||[[Driving a large WS2811 LED string with an ATtiny13 and nothing else|Special sparse driver]] allows an attiny13 to drive arbitrarily large LED strings from 64 bytes of memory}}
|- valign="top"
|{{#ev:youtube|mItPxAtkuVI|300||Flares demo on an attiny13 ([https://github.com/DannyHavenith/ws2811/blob/master/effects/flares.hpp source code])}}
|{{#ev:youtube|VM84Q1vWSZE|300||Fireflies!}}
|{{#ev:youtube|otCWPGf6O6w|300||Fire without flies}}
|}
<div style="clear: both"></div>

==Download source code==
The library is header-only. Using it is a matter of downloading the source code and including it in your AVR code. Example source code can be found [https://github.com/DannyHavenith/ws2811 here]. The code comes as an avr-eclipse project that consists largely of C++ demonstration code, with the main driver function written in assembly in the files [https://github.com/DannyHavenith/ws2811/blob/master/ws2811/ws2811_8.h ws2811_8.h] and [https://github.com/DannyHavenith/ws2811/blob/master/ws2811/ws2811_96.h ws2811_96.h] (for the 9.6 MHz version).
  
I don't recommend trying to understand the assembly code by reading these sources. How the code functions is described [[#Defining the puzzle|below]]. Usage information is in [[#Usage|the next section]]. The rest of this page describes the 8 MHz version. The 9.6 MHz code was added later, but was created in the same way.

'''New:''' If you'd like to see the assembly code in action, take a look at the [http://rurandom.org/avrgo_demo/ online AVR emulator]!

[[File:Avrgo demo screenshot.png|400px|link=https://rurandom.org/avrgo_demo/]]

==Usage==
You'll need a C++ compiler for this to work (turning ws2811.h into "pure C" is left as an exercise to the reader). I am told that this works just as well on an Arduino, but I haven't tested this myself. Remember that this code was written and optimized for 8 MHz and 9.6 MHz; it would run too fast on a 16 MHz Arduino. From the sources, you'll need the files ''ws2811.h'', ''ws2811_8.h'', ''ws2811_96.h'' and ''rgb.h'', though you only include "ws2811.h". A simple example of how to use this code:

<syntaxhighlight lang="c++">
#include <avr/io.h> // for _BV()

#define WS2811_PORT PORTD // ** your port here **
#include "ws2811.h" // this will auto-select the 8 MHz or 9.6 MHz version

using ws2811::rgb;

namespace {
  const int output_pin = 3;
  rgb buffer[] = { rgb(255,255,255), rgb(0,0,255)};
}

int main()
{
  // don't forget to configure your output pin,
  // the library doesn't do that for you.
  // in this example DDRD, because we're using PORTD.
  DDRD = _BV( output_pin);

  // send the RGB-values in buffer via pin 3
  // you can control up to 8 led strips from one AVR with this code, as long as they
  // are connected to pins of the same port. Just
  // provide the pin number that you want to send the values to here.
  send( buffer, output_pin);

  // alternatively, if you don't statically know the size of the buffer
  // or you have a pointer-to-rgb instead of an array-of-rgb.
  send( buffer, sizeof buffer/ sizeof buffer[0], output_pin);
  for(;;);
}
</syntaxhighlight>
  
==History==
WS2811 LED controllers are hot. Projects using WS2811 (or WS2812, WS2812B or NeoPixel) LED strips have been featured on [http://hackaday.com/?s=ws2811 HackaDay] several times in the last few months. One feature showed how an AVR clocked at 16 MHz could send data at the required high rates. Inspired by this, I ordered an LED strip and 16 MHz oscillators from ebay. The LED strip arrived quickly, but the oscillators took weeks to arrive, which gave me plenty of time to think about the possibility of driving these LED strips from an 8 MHz ATmega88 without an external oscillator. With only 10 clock ticks per bit, this was going to be a challenge.

Normally I'd go straight to the datasheet and start working from there, but in this particular case the datasheets are [http://www.nooelec.com/files/WS2811.pdf not so very informative]. Luckily, the HackaDay links provide some excellent discussions. [http://bleaklow.com/2012/12/02/driving_the_ws2811_at_800khz_with_a_16mhz_avr.html This one] by Alan Burlison is especially helpful. That article not only explains in great detail why a library like FastSPI isn't guaranteed to work, but it also comes with working code for a 16 MHz AVR that appears rock solid in its timing.

Small problem: I didn't have any 16 MHz crystals in stock, so I ordered a few, on ebay again, and sat back to wait out the 25-day shipping time. 25 days is a long time. The LED strip had arrived and was sitting on my desk. 25 days is a really long time. Maybe it could work off an AVR on its internal 8 MHz oscillator? It would be a lot of work. But 25 days is a very, very long time.

So, that is how I got to sit down and write my 8 MHz version of a WS2811@800 kHz bit banger. The challenge is of course that I have 10 clock cycles for every bit, no more, no less, and 80 cycles for every byte, no more, no less. '''I wanted the timing to be as rock-steady as Alan's''', give or take the imprecise nature of the AVR internal oscillator. The part about it being steady was important to me. People have argued that the code can be made a lot easier if you're willing to have a few extra clock cycles in between bytes or triplets, and that such code works for them. I agree that such code is a lot easier to create or read. It's trivial, in fact. However, the WS2811's datasheets are ambiguous at best with regard to the maximum allowed delay between bytes (or bits) and anyway, I liked the challenge of trying to have zero clock ticks of delay between bytes or triplets.

==The challenge==
 
For a full description of the required protocol to communicate with a WS2811, please refer to either [http://bleaklow.com/2012/12/02/driving_the_ws2811_at_800khz_with_a_16mhz_avr.html Alan's page] or the [http://www.nooelec.com/files/WS2811.pdf datasheet]. In summary, the microcontroller should send a serial signal containing 3 bytes for every LED in the chain, in GRB order. The bits of this signal are encoded in a special way. See the figure below.
  
 
[[File:Ws2811 waveform.png|illustration of a WS2811 waveform]]
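
To make the byte order concrete, here is a minimal C++ sketch of how one LED's colour could be laid out for transmission. The <code>Rgb</code> struct and the <code>to_wire_bits</code> helper are hypothetical names used for illustration only and are not part of the library. The sketch assumes that each byte goes out most significant bit first.

<syntaxhighlight lang="c++">
#include <array>
#include <cstdint>

// Hypothetical colour type for illustration (not the library's rgb class).
struct Rgb { std::uint8_t red, green, blue; };

// Expand one LED colour into the 24 bits that go on the wire:
// green first, then red, then blue, each byte most significant bit first.
std::array<bool, 24> to_wire_bits(const Rgb &c)
{
    const std::uint8_t wire_bytes[3] = { c.green, c.red, c.blue };
    std::array<bool, 24> bits{};
    for (int byte = 0; byte < 3; ++byte)
        for (int bit = 0; bit < 8; ++bit)
            bits[byte * 8 + bit] = (wire_bytes[byte] >> (7 - bit)) & 1;
    return bits;
}
</syntaxhighlight>

Each of these 24 bits then becomes one pulse in the waveform shown in the figure.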
 
 
Oh yes, and that is of course in addition to actually switching the output levels.
 
  
All of that does not fit into a single 10-clock time frame. Luckily, it doesn't have to. My first version of this driver partially unrolled the bit loop into a 2-bit loop. This allowed all the actions described above to fit within the loop, but it also required 4 versions of the loop (one for every 2-bit combination). The code would jump from one version of the loop to another as appropriate.

When writing code for the 9.6 MHz version and the [[Driving a large WS2811 LED string with an ATtiny13 and nothing else|version for sparse LED strings]] (strings where most LEDs are off), I figured out a way to have one small loop for each bit, with the code for the last two bits unrolled, giving enough time to fetch the next byte and reset the bit counter. This resulted in the much smaller driver code that I have now.
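
The cycle budget behind all this is simple arithmetic and can be checked at compile time. The sketch below is illustrative only; <code>cycles_per_bit</code> is a hypothetical helper and not part of the driver.

<syntaxhighlight lang="c++">
// Clock cycles available per WS2811 bit at an 800 kHz bit rate.
constexpr unsigned long ws2811_bit_rate_hz = 800000UL;

constexpr unsigned long cycles_per_bit(unsigned long cpu_hz)
{
    return cpu_hz / ws2811_bit_rate_hz;
}

static_assert(cycles_per_bit(8000000UL) == 10, "8 MHz: 10 cycles per bit");
static_assert(cycles_per_bit(9600000UL) == 12, "9.6 MHz: 12 cycles per bit");
static_assert(8 * cycles_per_bit(8000000UL) == 80, "8 MHz: 80 cycles per byte");
</syntaxhighlight>

A 16 MHz part gets 20 cycles per bit, which is why code tuned for 8 MHz would run twice too fast there.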
  
 
==Defining the puzzle==
 
===Inventing a notation===
Juggling many states and jumping from one piece of code to another without introducing phase errors turns out to be ''interesting''. I spent a couple of lonely lunch breaks and several pages in my little (paper!) notebook before I even figured out how to describe the problem. Once a notation became clear, however, the going was easy enough and this exercise turned into one of the nicer kinds of pastimes.
[[File:Ws2811 driver code.png|link=https://github.com/DannyHavenith/ws2811/blob/master/design/ws2811@8Mhz.ods?raw=true]]

The image above shows the full code for the driver in a spreadsheet, with pseudo assembly code in the yellow blocks. To the left of each yellow block is a graphic representing the waveform being generated. Tilt your head to the right to see the more conventional waveform graphic. The blue blocks show where the signal could be high or low, depending on the current bit value being sent. Each horizontal row in the yellow blocks represents a clock tick, not necessarily an instruction word. To the left of each waveform graphic there are numbers from 00 to 19 that represent the "phase" at the corresponding clock tick. Phases 00-09 are those of the first 7 bits, phases 10-19 are those of the last bit.
  
What makes this notation so convenient is that I can now easily determine the waveform phase at each point in the code and can also check whether a jump lands in the correct phase. Each jump at phase n (0 <= n < 09) should land at a label which is placed at phase n + 2 (modulo 10), because jumps take 2 clock cycles. Put differently: each jump should be to a label that is two lines down from the jump location (or 8 or 18 lines up).

The drawn waveforms make it easy to verify that when I jump from the middle of a wave, the code lands in a place where that same waveform is continued. They also show clearly where the 'up' and 'down' statements that set the actual signal levels need to go.

Wherever there is a "^^^" in the table, it means that the previous instruction takes 2 clock cycles, so that particular clock cycle still belongs to the previous instruction.
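
The landing rule for jumps is easy to check mechanically. The following is a minimal sketch, assuming the 10-phase bit loop described above; <code>landing_phase</code> and <code>jump_is_in_phase</code> are hypothetical helpers for illustration and are not part of the driver or the spreadsheet.

<syntaxhighlight lang="c++">
constexpr int phases_per_bit = 10; // 10 clock ticks per bit at 8 MHz
constexpr int jump_cycles    = 2;  // a taken jump costs 2 clock cycles

// Phase (0..9) in which execution continues after a jump taken at jump_phase.
constexpr int landing_phase(int jump_phase)
{
    return (jump_phase + jump_cycles) % phases_per_bit;
}

// True if a label at label_phase (00..19 in the spreadsheet) is a valid
// target for a jump taken at jump_phase.
constexpr bool jump_is_in_phase(int jump_phase, int label_phase)
{
    return landing_phase(jump_phase) == label_phase % phases_per_bit;
}

static_assert(jump_is_in_phase(4, 6), "two rows down");
static_assert(jump_is_in_phase(8, 0), "wraps around: 8 rows up");
static_assert(!jump_is_in_phase(4, 7), "off by one tick");
</syntaxhighlight>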
  
===How the code works===
In summary, the code works as follows: the start of a bit waveform occurs at label ''s00''. At this point the value of the bit to be sent is assumed to be in the carry flag. The line is pulled high and, if the current bit (carry flag) is a zero bit, it is pulled low two clock cycles later. Then a bit counter is decremented and, if we're not in the second-to-last bit, we continue the second half of the waveform by jumping to label ''cont06'', which is above ''s00''. From ''cont06'' the code just waits a while, then brings the line down (which has no effect if the line was already brought down) and shifts the next bit from the data byte into the carry flag. From here the code falls through into label ''s00'', ready to transmit the next bit.
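
The waveform that results from this can be modelled in a few lines of C++. This is only a sketch of the signal, not the driver itself: the tick at which a one bit is pulled low (<code>one_high_ticks</code>) is an assumed value chosen for illustration, and <code>bit_waveform</code> and <code>byte_waveform</code> are hypothetical helpers. It also assumes that bits are sent most significant bit first.

<syntaxhighlight lang="c++">
#include <cstdint>
#include <string>

constexpr int ticks_per_bit   = 10; // 8 MHz clock, 800 kHz bit rate
constexpr int zero_high_ticks = 2;  // a zero bit is pulled low two ticks after going high
constexpr int one_high_ticks  = 6;  // ASSUMPTION: illustrative value only

// One bit as a string of 'H' (line high) and 'L' (line low), one char per tick.
std::string bit_waveform(bool bit)
{
    const int high = bit ? one_high_ticks : zero_high_ticks;
    return std::string(high, 'H') + std::string(ticks_per_bit - high, 'L');
}

// Eight back-to-back bit waveforms with no extra ticks in between.
std::string byte_waveform(std::uint8_t data)
{
    std::string wave;
    for (int i = 0; i < 8; ++i) {
        wave += bit_waveform(data & 0x80); // top bit plays the role of the carry flag
        data <<= 1;                        // shift the next bit into position
    }
    return wave;
}
</syntaxhighlight>

Whatever the exact high time of a one bit, every bit occupies exactly 10 ticks, so a byte always takes 80 ticks.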
* load the next byte from memory. This takes 2 clock cycles and therefore this instruction takes two rows in the spreadsheet (remember that a row represents a clock cycle, not an in