Fast, arduino compatible digital pin functions

From Just in Time

 

Latest revision as of 18:51, 2 November 2015


This page describes a library that offers overloads of Arduino's digital pin functions (digitalWrite(), digitalRead(), shiftOut(), shiftIn()), but with native performance. For example, the digitalWrite() function of this library produces 1 inlined assembler instruction and runs in 2 clock cycles instead of Arduino's 50+ cycles. This library can be used to include Arduino code in "raw" AVR projects without suffering a memory footprint hit. It is also offered as an Arduino library, providing a much faster implementation of the digital pin functions.

The library is header file only and produces binary code that is as fast and as small as hand-crafted assembly code. When changing more than one bit at a time, the resulting code is generally faster than the C-style equivalent using #defined macros and bitwise logical operators.

Fast or readable?

When reading or setting pin values on AVRs, there are typically only two options: the readable way or the fast way. The readable way is offered by the Arduino platform and consists of digital pin functions like digitalRead, shiftOut, etc. The fast way is available both on Arduino and on ‘raw’ AVR and consists of bitwise AND and OR operations. In code this looks like this:

<source lang="cpp">
// **************************************************
// The 'readable' way. Use digitalWrite() and friends

int myPin = 12;

void setup()
{
    pinMode( myPin, OUTPUT);
}

void loop()
{
    digitalWrite( myPin, HIGH);
    digitalWrite( myPin, LOW);
}

// **************************************************
// The 'fast' way. Use bitwise operators and defines.

#define MYPINPORT PORTB
#define MYPINDDR  DDRB
#define MYPINMASK _BV(4)

void setup()
{
    MYPINDDR |= MYPINMASK;
}

void loop()
{
    MYPINPORT |= MYPINMASK;
    MYPINPORT &= ~MYPINMASK;
}
</source>

Apart from the obvious differences, there is another, more subtle difference between the two approaches: on Arduino, a pin is typically given a name by storing its number in an integer variable, which is resolved at run time. The pins and ports for bitwise operations are normally declared using preprocessor macros, which get resolved at compile time. There are hybrid cases, such as the Arduino SoftwareSerial library[1], where a pin number is converted once into a bit mask at run time and then that bit mask is used for the remainder of the program in logical AND and OR operations.
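To make the run-time versus hybrid distinction concrete, here is a minimal host-side sketch. It does not touch real hardware: sim_port stands in for the AVR port registers, and lookup(), slow_write() and CachedPin are hypothetical names, not part of the Arduino core or of SoftwareSerial. The only thing borrowed from the Arduino Uno is its pin layout (pins 0 to 7 on port D, pins 8 to 13 on port B).

```cpp
#include <cassert>
#include <cstdint>

enum Port : std::uint8_t { PORT_B = 0, PORT_D = 1 };

// Simulated output registers (stand-ins for PORTB and PORTD).
static std::uint8_t sim_port[2] = {0, 0};

struct PinInfo {
    Port         port;
    std::uint8_t mask;
};

// Translate an Arduino Uno pin number into a port and a bit mask.
inline PinInfo lookup(std::uint8_t pin)
{
    assert(pin <= 13);
    if (pin < 8) {
        return PinInfo{PORT_D, static_cast<std::uint8_t>(1u << pin)};
    }
    return PinInfo{PORT_B, static_cast<std::uint8_t>(1u << (pin - 8))};
}

// Set or clear the bits described by info in the simulated register.
inline void write_bits(const PinInfo &info, bool high)
{
    std::uint8_t &reg = sim_port[info.port];
    reg = static_cast<std::uint8_t>(high ? (reg | info.mask) : (reg & ~info.mask));
}

// Run-time style: the pin number is translated on every single call.
inline void slow_write(std::uint8_t pin, bool high)
{
    write_bits(lookup(pin), high);
}

// Hybrid style: translate once in the constructor, then reuse the stored mask.
class CachedPin {
public:
    explicit CachedPin(std::uint8_t pin) : info_(lookup(pin)) {}
    void write(bool high) const { write_bits(info_, high); }
private:
    PinInfo info_;
};
```

In this sketch slow_write(12, true) and CachedPin(12).write(true) end up setting the same bit; the difference is only in how often the translation is paid for.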

It is partly the run-time resolving of pin numbers that makes Arduino’s digital pin functions so slow[2]; simply setting or clearing a single output pin will set you back more than 50 clock cycles![3] As a developer, you have to choose between fast, but less readable bitwise operators in combination with macros, or readable but slow Arduino digital pin functions.

This choice can be clearly seen when looking at the implementations of several standard Arduino libraries. The strictly timed SoftwareSerial library, as mentioned before, uses bitwise operators[1], while for example the LiquidCrystal library can afford to use the digital pin functions[4].

At the same time, both ‘raw’ AVR developers and Arduino developers use AVR-GCC, which is a full-fledged, modern C++ compiler. Modern C++ compilers allow techniques like template metaprogramming (TMP), which in turn allows the compiler to perform almost arbitrarily complex processing before generating the assembler instructions that end up in your AVR’s firmware. Shouldn’t it be possible to use the compiler to solve the readable/fast dilemma?

It should, and it is.

Fast and readable

Look at the following code and spot the differences with the digital pin functions as shown earlier:

<source lang="cpp">
#include "FastPins.h"
using namespace FastPins;

DigitalPin<12>::type myPin;

void setup()
{
    pinMode( myPin, OUTPUT);
}

void loop()
{
    digitalWrite( myPin, HIGH);
    digitalWrite( myPin, LOW);
}
</source>

You see? Almost the same code. One difference that you can’t see is that now the digitalWrite() and pinMode() functions each compile into 1 assembler instruction that executes in 2 clock cycles.

The big difference is the #include "FastPins.h": now the digital pin functions are not the original Arduino ones, but overloads. The other difference is that myPin in the code above is no longer an integer which is read and interpreted at run-time. Instead, it is a variable of some special type, where the pin number is part of the type. This means that no time is spent at run time to determine which hardware address to use and which bit to set in that hardware.

The generated code is also smaller; in my Arduino environment, the binary size for the code above shrinks from 882 bytes to 472 bytes. For code that contains a shiftOut-call, code shrinks from 1042 bytes to 508 bytes.

Also: if you specify a nonsensical pin number (like, say, 42) you will be punished with a compiler error instead of being silently ignored at run time, which is the Arduino treatment.
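The idea of a pin number that lives in the type can be illustrated with a small host-side sketch. This is not the FastPins implementation: PinSketch and digitalWriteSketch are hypothetical names, the two sim_ variables stand in for PORTB and PORTD, and the pin layout assumed is that of the Arduino Uno. Because port and mask are compile-time constants, nothing has to be looked up when the function runs, and an impossible pin number is rejected by a static_assert.

```cpp
#include <cstdint>

// Simulated registers (stand-ins for PORTB and PORTD).
static std::uint8_t sim_portb = 0;
static std::uint8_t sim_portd = 0;

// Hypothetical pin type: the pin number is a template argument,
// so the port and the bit mask are known to the compiler.
template <int PinNumber>
struct PinSketch {
    static_assert(PinNumber >= 0 && PinNumber <= 13,
                  "no such pin on this board");

    static constexpr bool on_port_d = PinNumber < 8;
    static constexpr std::uint8_t mask =
        static_cast<std::uint8_t>(1u << (on_port_d ? PinNumber : PinNumber - 8));

    static std::uint8_t &port() { return on_port_d ? sim_portd : sim_portb; }
};

// Overload selected purely by the type of its first argument;
// the argument itself carries no run-time data.
template <int PinNumber>
inline void digitalWriteSketch(PinSketch<PinNumber>, bool high)
{
    typedef PinSketch<PinNumber> pin;
    std::uint8_t &reg = pin::port();
    reg = static_cast<std::uint8_t>(high ? (reg | pin::mask) : (reg & ~pin::mask));
}
```

With these definitions, declaring a PinSketch<42> stops the build with the static_assert message instead of doing nothing at run time.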

I'll describe some disadvantages later—or rather: consequences, but let me just show you one more advantage: if you want to squeeze the last clock tick from your code, the functions in the FastPins library allow setting or clearing several bits at the same time. The library will generate optimal code, which means that if a call simultaneously sets or resets bits in the same AVR port, the fastest code will be emitted to do that. For example:

<source lang="cpp">
#include "FastPins.h"
using namespace FastPins;

DigitalPin<12>::type led1; // this is in port B
DigitalPin< 7>::type led2; // port D
DigitalPin<11>::type led3; // port B again

void setup()
{
    pinMode( led1 | led2 | led3, OUTPUT);
}

void loop()
{
    // this will combine the output to led1 and led3
    // for optimum performance.
    digitalWrite( led1 | led2 | led3, HIGH);
}
</source>

In the above code we're asserting pins led1, led2 and led3 at the same time. Because led1 and led3 are in the same AVR port, they offer an optimization opportunity. Instead of creating code that is equivalent to this C-code:

<source lang="cpp">

   PORTB |= LED1_MASK; // 2 clocks
   PORTD |= LED2_MASK; // 2 clocks
   PORTB |= LED3_MASK; // 2 clocks

</source>

The library generates code that is equivalent to the following C-code:

<source lang="cpp">

   PORTB |= (LED1_MASK | LED3_MASK); // 3 clocks
   PORTD |= LED2_MASK;               // 2 clocks

</source>

...which results in slightly faster assembly code[5]. The gain is not enormous compared to the already fast new overloads (1 clock tick in the example above), but adding even more pins is free (will happen in the same 3 clock ticks) and in tight inner loops it feels good to know that there is absolutely no faster way to toggle the pins.
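One way such merging could work is to let the | operator build a new type that holds one combined mask per port. The following host-side sketch shows that idea under stated assumptions: PinSetSketch, PinSketch and setHighSketch are hypothetical names rather than the library's own, the sim_ variables stand in for PORTB and PORTD, and register_writes only exists to count how many register updates take place.

```cpp
#include <cstdint>

// Simulated registers and a counter for the number of register updates.
static std::uint8_t sim_portb = 0;
static std::uint8_t sim_portd = 0;
static int register_writes = 0;

// A set of pins, described entirely by one compile-time mask per port.
template <std::uint8_t MaskB, std::uint8_t MaskD>
struct PinSetSketch {};

// A single Arduino Uno pin is a set with exactly one bit in one of the masks.
template <int N>
using PinSketch = PinSetSketch<
    static_cast<std::uint8_t>(N >= 8 ? 1u << (N - 8) : 0u),
    static_cast<std::uint8_t>(N < 8 ? 1u << N : 0u)>;

// Combining two sets ORs the masks per port, at compile time.
template <std::uint8_t B1, std::uint8_t D1, std::uint8_t B2, std::uint8_t D2>
constexpr PinSetSketch<(B1 | B2), (D1 | D2)>
operator|(PinSetSketch<B1, D1>, PinSetSketch<B2, D2>)
{
    return {};
}

// One register update per port that has at least one pin in the set.
template <std::uint8_t MaskB, std::uint8_t MaskD>
inline void setHighSketch(PinSetSketch<MaskB, MaskD>)
{
    if (MaskB != 0) { sim_portb |= MaskB; ++register_writes; }
    if (MaskD != 0) { sim_portd |= MaskD; ++register_writes; }
}
```

Calling setHighSketch(led1 | led2 | led3) with pins 12, 7 and 11 performs two register updates in this sketch, one per port, instead of three.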

Consequences

FastPins is not completely a drop-in replacement for Arduino pin functions. Because declared pins are not integers anymore, but all have different types, they cannot simply be used in existing Arduino libraries. The following is currently not possible:

<source lang="cpp">
#include "FastPins.h"
using namespace FastPins;

DigitalPin<12>::type rxPin;
DigitalPin<11>::type txPin;

SoftwareSerial mySerial(rxPin, txPin);
</source>

Although it is certainly possible to create the conversion operators to translate DigitalPins to integers, the SoftwareSerial object would still have the "old" Arduino performance. In fact, any library that would want to use fast pin definitions and that supports configurable pins, should be written as a template that accepts either DigitalPins or pin numbers as template arguments. A fastpin-version of SoftwareSerial would then be instantiated as:

<source lang="cpp"> SoftwareSerial<rxPinType, txPinType> mySerial; </source>

Additionally, SoftwareSerial becoming a template would mean that most of its implementation would move to a header file and its methods would be instantiated repeatedly for each rx/tx combination. SoftwareSerial in particular could be problematic, because people often use this class out of a need to have more than one serial port. For most other libraries, however, this is not a problem because these libraries control some piece of hardware and inherently implement singletons.
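The effect of repeated instantiation can be seen in a small host-side sketch. SerialSketch and PortBPinSketch are hypothetical stand-ins (this is not a real SoftwareSerial port), and sim_portb replaces the hardware register. Each rx/tx combination produces a distinct class, and the compiler generates a separate copy of every method that is used for each of them.

```cpp
#include <cstdint>
#include <type_traits>

// Simulated register (stand-in for PORTB).
static std::uint8_t sim_portb = 0;

// Hypothetical pin type: one bit of port B, fixed at compile time.
template <int Bit>
struct PortBPinSketch {
    static void set(bool high)
    {
        const std::uint8_t mask = static_cast<std::uint8_t>(1u << Bit);
        sim_portb = static_cast<std::uint8_t>(high ? (sim_portb | mask)
                                                   : (sim_portb & ~mask));
    }
};

// Hypothetical serial class: the pins are type arguments, not constructor arguments.
template <typename RxPin, typename TxPin>
class SerialSketch {
public:
    static void idle()      { TxPin::set(true);  } // a UART line idles high
    static void start_bit() { TxPin::set(false); } // a start bit pulls it low
};

// Two serial ports are two unrelated types, each with its own generated code.
typedef SerialSketch<PortBPinSketch<4>, PortBPinSketch<3>> FirstPort;
typedef SerialSketch<PortBPinSketch<2>, PortBPinSketch<1>> SecondPort;
```

In a real library every method of the class would be duplicated this way, which is where the extra flash usage for a second port would come from.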

Writing libraries

I've implemented some libraries using this type of pin definitions. As stated above, any such library should take its pin definitions as template arguments because pins are now defined as types, not integer values.

Still, more choices need to be made, because there are two possible styles of providing the pin definitions: provide a struct with named members for each of the pins, or provide each pin as a template argument. Currently, I'm using the first approach. For example, the fast-pins version of a bit-banging SPI device is used as follows:

<source lang="cpp">
/// This struct defines which pins are used by the bit banging spi device
struct spi_pins {
    DigitalPin< 8>::type mosi; // B0
    DigitalPin< 9>::type miso; // B1
    DigitalPin<10>::type clk;  // B2
};
typedef bitbanged_spi<spi_pins> spi;

void f()
{
    // ...
    spi::transmit_receive( value);
    // ...
}
</source>

This is a little more verbose than just providing the types as template arguments, but, as with named function arguments, readers of this code will have a much easier time understanding what the spi class does.
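How a template might pick the pin types out of such a struct can be sketched with decltype. The code below is an assumption about one possible approach, not the actual bitbanged_spi implementation: BitbangedSpiSketch and PinSketch are hypothetical names, sim_portb replaces the hardware register, and reading a pin simply returns the simulated output bit, so using the same bit for mosi and miso gives a loopback.

```cpp
#include <cstdint>

// Simulated register (stand-in for PORTB); reads return what was written.
static std::uint8_t sim_portb = 0;

// Hypothetical pin type: one bit of the simulated port.
template <int Bit>
struct PinSketch {
    static constexpr std::uint8_t mask = static_cast<std::uint8_t>(1u << Bit);
    static void write(bool high)
    {
        sim_portb = static_cast<std::uint8_t>(high ? (sim_portb | mask)
                                                   : (sim_portb & ~mask));
    }
    static bool read() { return (sim_portb & mask) != 0; }
};

// The pins struct only supplies types; its members are never instantiated here.
template <typename Pins>
struct BitbangedSpiSketch {
    typedef decltype(Pins::mosi) mosi;
    typedef decltype(Pins::miso) miso;
    typedef decltype(Pins::clk)  clk;

    // Shift one byte out, most significant bit first, sampling miso
    // while the clock is high (SPI mode 0 style).
    static std::uint8_t transmit_receive(std::uint8_t out)
    {
        std::uint8_t in = 0;
        for (int bit = 7; bit >= 0; --bit) {
            mosi::write(((out >> bit) & 1) != 0);
            clk::write(true);
            in = static_cast<std::uint8_t>((in << 1) | (miso::read() ? 1 : 0));
            clk::write(false);
        }
        return in;
    }
};

// Loopback wiring for the simulation: miso reads the same bit mosi drives.
struct loopback_pins {
    PinSketch<0> mosi;
    PinSketch<0> miso;
    PinSketch<2> clk;
};
typedef BitbangedSpiSketch<loopback_pins> loopback_spi;
```

With the loopback wiring, loopback_spi::transmit_receive() returns the byte it was given, which makes the sketch easy to check on a PC.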

Is it worth it?

In summary, the fast pins library offers fast, low-footprint digital pin functions without sacrificing readability. Especially for the simple use cases, these functions offer these features without any noticeable disadvantages.

When writing libraries, some things need to be taken into account, such as the fact that most of the library implementation now becomes part of a header file and pin definitions can no longer be provided as constructor arguments, but should be type arguments to some template.

For all "raw" AVR projects, these fast pin definitions—and the underlying library with a less arduino-like interface—have offered me nothing but advantages: never slower, and sometimes faster, than native code, combined with nicely readable application code.

In my view, the fast pins library is another example of how the C++ programming language can help you make the readable/optimised dilemma go away.

Download

The Arduino-ified library is available here: FastPins.zip. The sources are also on github (https://github.com/DannyHavenith/avr_utilities), together with the original library (with function names like "set()" and "clear()").

References

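  1. SoftwareSerial source code (https://github.com/arduino/Arduino/blob/master/hardware/arduino/avr/libraries/SoftwareSerial/SoftwareSerial.cpp); take a look at the tx_pin_write() or rx_pin_read() methods.
  2. Well, that, and checking whether the pin is maybe used for PWM at the time; see the implementation of the digitalWrite() function (https://github.com/arduino/Arduino/blob/master/hardware/arduino/avr/cores/arduino/wiring_digital.c).
  3. "To use or not use digitalWrite" (http://billgrundmann.wordpress.com/2009/03/03/to-use-or-not-use-writedigital/), a blog by Bill Grundman who, unlike me, is not too lazy to hook up a scope to his AVR and do measurements.
  4. LiquidCrystal source code (https://github.com/arduino/Arduino/blob/master/libraries/LiquidCrystal/src/LiquidCrystal.cpp).
  5. The compiler's optimizer will never combine these OR expressions by itself because the ports are declared volatile. Successive writes to a volatile register will never be combined into a single write.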
