Driving the WS2811 at 800 kHz with an 8 MHz AVR

From Just in Time

WS2811 LED controllers are hot. Projects using WS2811 LED strips have been featured on HackaDay several times in the last few months. One feature showed how an AVR clocked at 16 MHz could send data at the required high rates. Inspired by this, I ordered an LED strip and 16 MHz oscillators from eBay. The LED strip arrived quickly, but the oscillators took weeks to arrive, which gave me plenty of time to think about the possibility of driving these LED strips from an 8 MHz ATmega88 without an external oscillator. With only 10 clock ticks per bit, this was going to be a challenge.

It turns out that this is doable, with the same output timing as the 16 MHz version...

On this page I'll describe how to drive a WS2811 from an 8 MHz or 9.6 MHz AVR like an Atmel ATmega88, ATtiny2313 or ATtiny13 without added components such as an external oscillator.

In order to write the first version of this time-critical code:

  • I used a spreadsheet (or in fact: some tabular layout) instead of an IDE so that I could keep an eye on the timing of each instruction I write;
  • I combined assembly code fragments using a dedicated C++ program to minimize jump distances.

In the latest version of the driver, the C++ program wasn't needed anymore, because the current driver is so small (32 instructions) that it falls completely within the range of an AVR conditional jump instruction.



For the impatient: example source code can be found here. The code comes as an avr-eclipse project that consists largely of C++ demonstration code, with the main driver function in assembly in files ws2811_8.h and ws2811_96.h (for the 9.6 MHz version). I don't recommend trying to understand the assembly code by reading these sources. How the code functions is described below. Usage information can be found after the videos. The rest of this page describes the 8 MHz version. The 9.6 MHz code was added later, but was created in the same way.


And of course, if you're creating a hardware project that controls more than 1 LED, you're going to have to demonstrate it with a Knight Rider display (which, I just learned, is actually called a Larson scanner)... The sources for all the demonstrations in these videos can be found on github.

Knight Rider on Steroids (source code)
"Water Torture" or "Lava Drops" demo (source code)
Special sparse driver allows an attiny13 to drive arbitrarily large LED strings from 64 bytes of memory
Flares demo on an attiny13 (source code)


You'll need a C++ compiler for this to work (turning ws2811.h into "pure C" is left as an exercise to the reader). I am told that this works just as well on an Arduino, but I haven't tested this myself. From the sources, you'll need files ws2811.h, ws2811_8.h, ws2811_96.h and rgb.h. A simple example of how to use this code:

#include <avr/io.h> // for _BV()
#define WS2811_PORT PORTD // ** your port here **
#include "ws2811.h" // this will auto-select the 8 MHz or 9.6 MHz version
using ws2811::rgb;
namespace {
  const int output_pin = 3;
  rgb buffer[] = { rgb(255,255,255), rgb(0,0,255)};
}
int main()
{
  // don't forget to configure your output pin,
  // the library doesn't do that for you.
  // in this example DDRD, because we're using PORTD.
  DDRD = _BV( output_pin);
  // send the RGB-values in buffer via pin 3
  // you can control up to 8 led strips from one AVR with this code, as long as they
  // are connected to pins of the same port. Just
  // provide the pin number that you want to send the values to here.
  send( buffer, output_pin);
  // alternatively, if you don't statically know the size of the buffer
  // or you have a pointer-to-rgb instead of an array-of-rgb.
  send( buffer, sizeof buffer/ sizeof buffer[0], output_pin);
}


Normally I'd go straight to the datasheet and start working from there, but in this particular case the datasheets are not very informative. Luckily, the HackaDay links provide some excellent discussions. This one by Alan Burlison is especially helpful. That article not only explains in great detail why a library like FastSPI isn't guaranteed to work, but it also comes with working code for a 16 MHz AVR that appears rock solid in its timing.

Small problem: I didn't have any 16 MHz crystals in stock, so I ordered a few, on eBay again, and sat back to wait for the 25-day shipping time to pass. 25 days is a long time. The LED strip had arrived and was sitting on my desk. 25 days is a really long time. Maybe it could work off an AVR on its internal 8 MHz oscillator? It would be a lot of work. But 25 days is a very, very long time.

So, that is how I got to sit down and write my 8 MHz version of a WS2811@800 kHz bit banger. The challenge is of course that I have 10 clock cycles for every bit, no more, no less, and 80 cycles for every byte, no more, no less. I wanted the timing to be as rock-steady as Alan's, give or take the imprecise nature of the AVR internal oscillator.

The challenge

For a full description of the required protocol to communicate with a WS2811, please refer to either Alan's page or the datasheet. In summary, the microcontroller should send a serial signal containing 3 bytes for every LED in the chain, in GRB order. The bits of this signal are encoded in a special way. See the figure below.

illustration of a WS2811 waveform

This image shows a sequence of a "0" followed by a "1". Every bit starts with a rising edge. For zeros, the signal drops back to low "quickly", while for ones the signal stays high and drops nearer the end of the bit. I've chosen the following timing, in line with Alan's observations and recommendations:

  • Zero: 250ns up, 1000ns down
  • One: 1000ns up, 250ns down

Giving a total duration of 1250ns for every bit, or 10μs per byte. These timings do not fall in the ranges permitted by the data sheet, but Alan describes clearly why that should not be a problem. 1250ns means 10 clock ticks per bit. That is not a lot. A typical, naive implementation would need to do the following things at every bit:

  1. determine whether the next bit is a 1 or a 0
  2. decrease a bit counter and determine if the end of a byte has been reached, if at the end:
    1. determine if we're at the end of the total sequence
    2. load a new byte in the data register
    3. decrement the byte counter
    4. reset the bit counter
  3. jump back to the first step

Oh yes, and that is of course in addition to actually switching the output levels.
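The numbers above can be checked with a little arithmetic. The sketch below is not part of the driver: the names cycles_for, rgb_value, encode_grb and byte_waveform are made up for illustration. It assumes bits are sent most significant bit first and maps 250 ns and 1000 ns to 2 and 8 cycles at 8 MHz.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of whole CPU cycles that fit in a time span (illustrative helper).
constexpr long cycles_for(long f_cpu_hz, long nanoseconds)
{
    return f_cpu_hz / 1000L * nanoseconds / 1000000L;
}

static_assert(cycles_for(8000000L, 1250) == 10, "10 cycles per bit at 8 MHz");
static_assert(cycles_for(8000000L, 250) == 2, "short pulse is 2 cycles at 8 MHz");

// Hypothetical stand-in for the library's rgb type.
struct rgb_value { std::uint8_t red, green, blue; };

// Serialize LEDs into the byte stream the strip expects: G, R, B per LED.
std::vector<std::uint8_t> encode_grb(const std::vector<rgb_value>& leds)
{
    std::vector<std::uint8_t> bytes;
    for (const rgb_value& led : leds) {
        bytes.push_back(led.green);
        bytes.push_back(led.red);
        bytes.push_back(led.blue);
    }
    return bytes;
}

// Expand one byte into per-cycle line levels at 8 MHz, MSB first (assumed):
// a zero is 2 cycles high + 8 low, a one is 8 cycles high + 2 low.
std::vector<bool> byte_waveform(std::uint8_t value)
{
    std::vector<bool> levels;
    for (int bit = 7; bit >= 0; --bit) {
        const int high_cycles = ((value >> bit) & 1) ? 8 : 2;
        for (int tick = 0; tick < 10; ++tick)
            levels.push_back(tick < high_cycles);
    }
    assert(levels.size() == 80); // 80 cycles for every byte
    return levels;
}
```

Calling cycles_for(9600000L, 1250) gives 12, which is the per-bit budget of the 9.6 MHz variant.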

All of that does not fit into a single 10-clock time frame. Luckily, it doesn't have to. My first version of this driver partially unrolled the bit loop into a 2-bit loop. This allowed all those actions described above to fit within the loop, but it also required 4 versions of the loop (one for every 2-bit combination). The code would jump from one version of the loop to the other as appropriate.

When writing code for the 9.6 MHz version and the version for sparse LED strings (strings where most LEDs are off), I figured out a way to have one small loop for each bit, with the code for the last two bits unrolled, giving enough time to fetch the next byte and reset the bit counter. This resulted in the much smaller driver code that I have now.

Defining the puzzle

Juggling with many states, jumping from one piece of code to the other without introducing phase errors turns out to be interesting. I spent a couple of lonely lunch breaks and several pages in my little (paper!) notebook before I even figured out how to describe the problem. When a notation became clear, however, the going was easy enough and this exercise turned into one of the nicer kinds of pastimes.

Ws2811 driver code.png

The image above shows a spreadsheet with pseudo assembly code in the yellow blocks. To the left of each yellow block is a graphic representing the wave form being generated. Tilt your head to the right to see the more conventional waveform graphic. The blue blocks show where the signal could be high or low, depending on the current bit value being sent. Each horizontal row in the yellow blocks represents a clock tick, not necessarily an instruction word. To the left of each waveform graphic there are numbers from 00 to 19 that represent the "phase" at the corresponding clock tick. Phases 00-09 are those of the first 7 bits, phases 10-19 are those of the last bit.

What makes this notation so convenient is the fact that I can now easily determine the waveform phase at each point in the code and can also check whether a jump lands in the correct phase. Each jump at phase n (0 <= n < 09) should land at a label which is placed at phase n + 2 (modulo 10), because jumps take 2 clock cycles. Put differently: each jump should be to a label that is two lines down from the jump location (or 8 or 18 lines up).
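The phase rule can be written down as a small check. This is only a sketch of the bookkeeping, with made-up names (landing_phase, jump_keeps_phase); rows are the spreadsheet rows 0 to 19, one per clock tick.

```cpp
#include <cassert>

// A jump takes 2 clock cycles, so a jump issued at phase n has to land on a
// label at phase (n + 2) modulo 10.
constexpr int landing_phase(int jump_phase)
{
    return (jump_phase + 2) % 10;
}

// True when a jump in spreadsheet row jump_row may target a label in row
// label_row without shifting the waveform. Rows 0-9 and 10-19 share phases.
bool jump_keeps_phase(int jump_row, int label_row)
{
    assert(jump_row >= 0 && jump_row < 20);
    assert(label_row >= 0 && label_row < 20);
    return label_row % 10 == landing_phase(jump_row % 10);
}
```

A label two rows down, 8 rows up or 18 rows up from the jump all pass this check, because those distances are the same modulo 10.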

The drawn waveforms make it easy to verify that when I jump from the middle of a wave, the code lands in a place where that same waveform is continued. They also show clearly where the 'up' and 'down' statements that set the actual signal levels need to go.

Wherever there is a "^^^" in the table, it means that the previous instruction takes 2 clock cycles, so that particular clock cycle still belongs to the previous instruction.

In summary, the code works as follows: The start of a bit waveform occurs at label s00. The line is pulled high and if the current bit is a zero bit, it is pulled low two clock cycles later. Then a bit counter is decreased and if we're not in the second-to-last bit, we continue the second half of the waveform by jumping to label cont06, which is above s00.

If we're in the second-to-last bit, the code continues downward. We need to free up the data register for the next byte, so we quickly test the last bit of the current byte and then branch into one of two essentially equivalent pieces of code. The code on the left hand side generates a "1"-waveform, while the code on the right generates a "0" for the last bit of the byte. In between the OUT-instructions we find some free cycles to reset the bit counter (to 7), to load the next byte and to decrease the 16-bit byte counter. If indeed there is a next byte to send, we jump up to either label cont07 or cont09 where the rest of the bit waveform is generated before we continue with the bits of the next byte.
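The control flow described above can be modelled in portable C++. The following is a simplified sketch, not the driver itself: send_model is a made-up name, it only tracks which bit goes out when (no cycle timing), and it assumes bits are shifted out most significant bit first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the sequence of bits that would be put on the wire.
std::vector<int> send_model(const std::vector<std::uint8_t>& bytes)
{
    std::vector<int> sent;
    std::size_t remaining = bytes.size(); // plays the role of the byte counter
    if (remaining == 0) return sent;
    std::size_t next = 0;
    std::uint8_t data = bytes[next++];
    for (;;) {
        // Small loop: the bit counter starts at 7 and covers the first 7 bits.
        for (int counter = 7; counter > 0; --counter) {
            sent.push_back((data & 0x80) ? 1 : 0);
            data = static_cast<std::uint8_t>(data << 1);
        }
        // Unrolled tail: test the last bit first, which frees the data
        // register, then load the next byte while the last bit is being sent.
        const int last_bit = (data & 0x80) ? 1 : 0;
        --remaining;
        if (remaining != 0) data = bytes[next++];
        sent.push_back(last_bit);
        if (remaining == 0) break;
    }
    assert(sent.size() == bytes.size() * 8);
    return sent;
}
```

The point of the structure is that the byte fetch and counter updates happen only once per byte, inside the unrolled tail, so the small per-bit loop stays within its cycle budget.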

Combining the code

The latest version of the code is pretty small (32 instructions/64 bytes), but earlier versions were bigger, requiring jumps over longer address distances. This posed a problem, because a jump from the end of the code right to the beginning would be too long for the branch instructions of the AVR.

Note how all conditional jumps are in the form of branch instructions ("BRCC", "BREQ", etc). There is one important limit to these relative branches: they can only jump to a range of [PC - 63, PC + 64] (with PC the address of the branch instruction)! Any instruction more than 64 instructions away from the branch cannot be reached.
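The range follows from the way AVR conditional branches encode their target: a signed 7-bit offset k with -64 <= k <= 63, applied as PC + k + 1. A minimal sketch of that check, with made-up function names and addresses counted in instruction words:

```cpp
#include <cassert>

// Offset field k that a branch at branch_address needs to reach
// target_address, given that the CPU computes target = PC + k + 1.
constexpr int branch_offset(int branch_address, int target_address)
{
    return target_address - (branch_address + 1);
}

// True when k fits in the signed 7-bit field, i.e. when the target lies in
// [PC - 63, PC + 64].
constexpr bool branch_reaches(int branch_address, int target_address)
{
    return branch_offset(branch_address, target_address) >= -64
        && branch_offset(branch_address, target_address) <= 63;
}

static_assert(branch_reaches(100, 164), "PC + 64 is the farthest forward target");
static_assert(!branch_reaches(100, 165), "PC + 65 is out of range");
```

For a 32-instruction driver, branch_reaches(31, 0) holds, which is why the latest version no longer needs any layout tricks.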

At first I tried to piece the code together manually in a spreadsheet that would calculate the maximum jump distance for me. After a few failed attempts I gave up and decided that computers are better at this. In the end, I just wrote a dedicated program in C++ that uses some common sense heuristics to shuffle the blocks of code around until it finds a sequence in which all jumps are within range.
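To give an idea of what such a program does, here is a small sketch. It is not the original tool: instead of heuristics it simply tries every block order (brute force), which is only practical for a handful of blocks. All names (branch, code_block, layout_is_valid, find_layout) are made up, and every branch is assumed to target the first instruction of a block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// A conditional branch at `offset` words into its block, aimed at the start
// of block number `target_block`.
struct branch { int offset; std::size_t target_block; };

struct code_block {
    int size;                     // length in instruction words
    std::vector<branch> branches; // branches located inside this block
};

// True when every branch stays within [PC - 63, PC + 64] for this order.
bool layout_is_valid(const std::vector<code_block>& blocks,
                     const std::vector<std::size_t>& order)
{
    std::vector<int> start(blocks.size(), 0);
    int address = 0;
    for (std::size_t index : order) {
        start[index] = address;
        address += blocks[index].size;
    }
    for (std::size_t index = 0; index < blocks.size(); ++index) {
        for (const branch& b : blocks[index].branches) {
            const int distance = start[b.target_block] - (start[index] + b.offset);
            if (distance < -63 || distance > 64) return false;
        }
    }
    return true;
}

// Try block orders until one is valid; returns an empty vector if none is.
std::vector<std::size_t> find_layout(const std::vector<code_block>& blocks)
{
    std::vector<std::size_t> order(blocks.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    do {
        if (layout_is_valid(blocks, order)) return order;
    } while (std::next_permutation(order.begin(), order.end()));
    return {};
}
```

With blocks of 50, 40 and 10 words, where the first branches to the third and the third branches back to the first, the order 0, 1, 2 fails (the backward branch spans 99 words) while 0, 2, 1 works.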

After this, it became a matter of just pasting the code blocks into one sequence and changing some of the pseudo instructions into real instructions.


The main point of this text is not that I can show 4 (four!) Larson scanners in one led strip. Actually there are two different points I am trying to make:

First of all, it is possible to control WS2811 LED strips from an AVR without an external 16 MHz oscillator and I want to tell the world.

Secondly, during this exercise I discovered that this kind of extremely time-critical code can be tackled with a number of techniques:

  • unrolling loops. That is not a new technique, but in this case it not only saves on the number of test-and-jump-to-the-starts (the normal reason to unroll a loop), but also decreases the number of other tests and allows me to sweep a few precious left-over clock cycles into contiguous blocks.
  • when code is "phase critical", abandon the idea of a list-of-instructions and organize the code in "phase aligned" side-by-side blocks, where a jump is most often a jump "to the right" or "left".
  • Use software to optimize code layout in memory. I am not aware of any assembler that will automatically do this when jump labels are out of reach, but I know I have wished for it more than once.

Comments? Questions?

16 February 2013 08:25:36
Great stuff, just ordered some led strips...

could you tweak this code to run at 16mhz? i got a small atmega32u2 board with 16mhz crystal id like to play with..

16 February 2013 12:12:40
Any reason you couldn't use Alan Burlison's version (http://bleaklow.com/2012/12/02/driving_the_ws2811_at_800khz_with_a_16mhz_avr.html)? It does the same thing at 16 MHz. My version would take up more program space if I added the NOPs to bring it down to half the speed.

I do have a 9.6Mhz version forthcoming, so that I can run a few of these LEDs with a $0.66 attiny13.
2 April 2013 20:42:20
Hi Danny,

I'm trying to get this to run on an 8mhz arduino, and it doesn't appear to be working (ATMega328p with 8mhz crystal).
I've got it wired up properly (I wrote simple code that is just PORTD=0x08, 8x NOP, PORTD=0x0, 2x NOP), and sent it 24 times. And I get a white pixel from pin 3 as expected.
However, your code doesn't appear to do anything.
My code is simply:
#define WS2811_PORT PORTD
#include "WS2811.h"
#define NUM_LEDS 2
rgb buffer[NUM_LEDS];
void setup() {
   buffer[0] = rgb(255,255,255);
   buffer[1] = rgb(0,0,255);
void loop() {

Am I doing anything obviously wrong? Would you expect your code to work on an Arduino?
2 April 2013 20:46:05
Erm. I didn't set my pinMode. Sorry! Not sure how to delete a comment... It's working great now!
2 April 2013 21:56:55
Cool, thanks for the feedback. I'm glad it works for you too now. If you don't mind I'll leave your comment here and also add a remark in the instructions for use, so that others are reminded to set their pin modes as well...

I expect this type of thing happens quite often. I know that I've spent an impressive amount of time debugging projects where I had forgotten to set the correct pin modes.
28 May 2013 05:35:31
if you could manage to use that to feed input from a PC/RasPI/... directly to the LEDs you could get some highlevel control using http://niftyled.de
(e.g. to combine various multiple LED chains) We're always happy about testers/contributors :)
12 July 2013 20:36:22
I've been working on a browser-based JavaScript simulation of LED Strips, and I've been converting your color driver code. So far I've done Flares, Chasers, and Water Torture. Some of the code is still a bit sloppy, and the Water Torture code is currently the best encapsulation. I'm getting ready to refactor the others, then finish up with the Color Cycle code.

I might abstract some bits a little more before I'm done. It's basically a framework so that I can test my own color changing algorithms before going to hardware.


I'll put it up on Github soon, too.
12 July 2013 23:02:16
That's impressive! This could be very useful to test new patterns. On linux, I had to switch from firefox to chromium to make it work, but it's definitely promising. I can imagine that this is especially interesting for those who want to create different Christmas lighting patterns...
30 August 2013 20:19:32
What compiler are you using?  I'm trying to build this in Atmel Studio 6.1 (AVR-GCC/AVR-G++), and ws2811_8.h gives a compiler error

"inconsistent operand constraints in an 'asm'"

at line 177.
30 August 2013 22:10:31
I'm running this with AVR-GCC on linux from within eclipse. The avr-g++ version I have is 4.7.2 ("avr-g++ -v"). For which MCU are you compiling this? Could it be that your MCU doesn't have a C-port? In that case you can change the port by defining WS2811_PORT before including ws2811_8.h
Running avrdude from eclipse under linux
1 September 2013 07:02:17
I was trying to compile for the ATtiny10, and I did change the port to PORTB.  Tried targeting the Atmega88 and it worked fine, so it must just be the chip.  I was hoping the ATtiny would have worked.  Unlike some of the others like the ATtiny13, it actually does have an 8MHz internal oscillator, and the SOIC-8 package would be pretty nice...
1 September 2013 07:11:26
Looks like the ATtiny45/85 will work.  Or at least it compiles.
1 September 2013 10:38:25
I'd like to reproduce the issue, but my version of avr-gcc won't even compile for attiny10, it says "avr-g++: error: unrecognized argument in option ‘-mmcu=attiny10’".
1 September 2013 11:11:38
Btw., given the methods described on this page, it should be possible to write an attiny13/9.6Mhz version. Single bits should then take 12 cycles where the short pulse takes 3 and the long pulse takes 9 cycles. That will render a shape that fits the specs for the ws2811. I guess that would be worthwhile since the tiny13 seems to be the cheapest version around (at around $5 for 10pcs).
I may spend some time on this, but don't let that stop you from trying it yourself, it's fun :-)
1 September 2013 23:27:21
I finally got around to refactoring my code some, and it's up on Github:


I was having problems converting your color cycle code to work right in my JavaScript framework. I might try to go back and revisit it sometime. The ColorWave driver is my own from-scratch replacement. My JS version uses native Math.sin(), but in Arduino land people might want to use integer-based approximations (there are plenty of code samples around).

The Chasers code was the first pattern driver I converted, and I was a little looser in translating from your code. With Water Torture, I started with a much more direct conversion, basically just copying your code and modifying it "in-place" into JavaScript. However, I know it has some issues, and I put a couple of "cheats" in.

But basically, I'm pretty happy with how the actual ws2812.js and ledstrip.js code came out.
1 September 2013 23:33:38
Oh, and I had Flares working with my original code, but I still need to refactor it to work with API updates in my ledstrip code since then. In its current form, it's pretty married to the assumption that certain global variables are available, and I want to make it more of a self-contained black-box. I'll probably end up changing it significantly from your original code.
3 September 2013 07:03:56
Yeah, the ATtiny4/5/9/10 chips are apparently pretty notoriously difficult to work with, but the $0.70 price tag and SOT-23 package are pretty sweet, especially for a project like this.
3 September 2013 22:21:11
OK. The Git repository now has updated code for 9.6 MHz MCUs like the attiny13. Be aware that most of the demos for my code won't fit in an attiny13 (both flash and RAM).
5 September 2013 22:10:31
They're still looking good. Will you be moving some of your versions of the animations to silicon as well?
14 October 2013 21:37:12
This is great.

One quick question.  I am trying to run it on hardware that has output pins on both PORTB and PORTD.  I can get the LEDS to work on one or the other, but not both.

Any thoughts?
19 October 2013 09:50:26
This code can send on one port at a time only and can't switch ports at run-time. If you're willing to spend twice the amount of code space, you could adapt ws2811_8.h so that you can include it twice, re-defining WS2811_PORT each time. You would need to remove the include guards, of course. It's not trivial, but it can be done:

One reasonably easy way to do that is to wrap the send function in a namespace. You can't use the token "WS2811_PORT" to generate that namespace name (all kinds of trouble because that token resolves to something like 'PORTB', which in turn gets preprocessed into other tokens) , so it would be easiest to just create a second define WS2811_NAMESPACE.

In summary, you could wrap the send(...) function (the three-argument one) in a namespace: "namespace WS2811_NAMESPACE { ..." and then, each time you include ws2811.h, you define both WS2811_PORT and WS2811_NAMESPACE. Finally, you can send to the right port using a statement like ws2811::namespace_for_port_x::send( my_leds, led_count, pin). Typically, in a function, you would state "using namespace ws2811::namespace_for_port_x" and then just use the send() function.
18 January 2014 19:40:12
Hello Danny!
I appreciate your effort for writing this code, it's great.
Managed to compile your code and upload to an attiny2313, but some strange thing happens:
when connecting the attiny's output pin to the ledstrip's DIN, some random, changing colors appear on the strip.
If the DIN of the ledstrip is not connected to attiny's pin, and I power cycle (turn down and then up) the dedicated power source of the strip, all leds are off.
Analyzed the signal generated by the program using a digital analyzer.
I would like to have a confirmation from you, if this is the desired output signal:
- 0 bit: 250us at 5v, 1000us at 0v
- 1 bit: 1000us at 5v, 250us at 0v

My code looks like:
 #define WS2811_PORT PORTB
 DDRB = _BV(channel);
 ws2811::rgb colors[1];
 colors[0].red = 0xff;
 colors[0].green = 0x00;
 colors[0].blue = 0xff;

I would appreciate a hint for solving this problem.

18 January 2014 19:58:35
0 bit: 0.25us at 5V and 1.0us at 0V
1 bit: 1.0us at 5V and 0.25us at 0V
18 January 2014 20:57:48
Hi Adam,
The timing you're seeing is indeed as intended. You should see the pin at 5V in its normal state, then drop low for at least 40us, then 24 bits of grb and finally the line should be at 5V again. There's nothing obviously wrong with the code you're showing as well. Are you defining the WS2811_PORT symbol before including ws2811.h? And how is channel defined?
18 January 2014 22:20:00
Hei Danny,

Thanks for your quick answer!
static const uint8_t channel = 5;
#define F_CPU 8000000UL
#define WS2811_PORT PORTB
#include "ws2811.h"

Meanwhile I found out, that my ledstrip probably is not using ws2811 but ws2812 (led is integrated with control chip)
The chip on the strip has 6 pins (like in ws2812 spec); ws2811 has 8 pins.
In 2812 datasheet the timings are other that in the 2811 datasheet.
Probably this is the problem.

Thanks, again, for your answer
Best regards,
19 January 2014 18:08:24
Hey Danny,

Managed to find the problem, it seems that's the power source used (a PC power source).
Cut one led off the strip, and powered using the power from usb port, it's working.

Thanks again, for your feedback,
19 January 2014 21:45:10
You're welcome Adam. Good luck with your project!
21 January 2014 14:57:50
Hello Danny!
The problem was caused by using separate ground for microcontroller and ledstrip...
If both run on the same power source, it's working!

Best regards,
14 February 2014 21:16:39
Has anyone got this to run in C? If it is possible, please post the main.c code, because I can't manage it.
15 February 2014 10:03:43
All the demos are pretty much tied to C++. The basic driver code however should be relatively easy to adapt to a plain C compiler: (1) in ws2811_8.h and ws2811_96.h remove the namespace declaration, (2) in ws2811.h remove the whole part that's within the namespace declaration (plus the declaration itself). And finally in rgb.h remove the constructors.

You have to remember then, that the structs must be initialized in GRB-order, so a string of yellow leds must be initialized as:

struct rgb yellow[] = { {255, 255, 0}, { 255, 255, 0} /*etc...*/};
17 February 2014 20:07:07
Unfortunately, I don't think I can manage to turn this code into pure C.
17 February 2014 20:33:17
Which libraries are necessary for the water effect?
17 February 2014 21:08:21
Maybe someone can be found who would help me run water_torture in pure C in Eclipse: bigjack at wp.pl
18 March 2014 18:57:26
Apart from the WS2811 driver code described here, there is no need for a library. I've added hyperlinks to the source code under each of the videos.