Arduino Code – Performance

I’m currently making a self-balancing robot, controlled by an Arduino Nano, and decided to use NEMA 17 stepper motors to drive it.

The DRV8825 motor drivers are great, but cannot be driven using a PWM signal. Consequently the Arduino code must send a ‘step’ signal to the correct pin many times per second, easily in the thousands if you require a decent amount of speed, especially if you’re micro-stepping.

Performance is critical…

Making the stepper rotate one full turn requires 200 full steps. In my case I’m using a micro-step level of ‘8’, meaning I actually have to send 1,600 ‘step’ signals (200 * 8) to the DRV8825 input pin for each revolution. So a speed of 2 revolutions per second actually needs 3,200 steps per second! Getting this sort of speed (combined with code to calculate motor acceleration, the angle of the robot, etc) is no simple feat!
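To put that number in context, here is a small sketch of the arithmetic. It assumes the figures above (200 full steps, 8x micro-stepping) and a 16 MHz clock. The names STEPS_PER_REV, CPU_HZ, stepsPerSecond and cyclesPerStep are my own, made up for illustration.

```cpp
#include <stdint.h>

// Values taken from the text; change them to match your own motor and driver.
constexpr uint32_t FULL_STEPS_PER_REV = 200;   // full steps per revolution
constexpr uint32_t MICROSTEPS         = 8;     // micro-step level
constexpr uint32_t STEPS_PER_REV      = FULL_STEPS_PER_REV * MICROSTEPS;
constexpr uint32_t CPU_HZ             = 16000000UL;  // assumed 16 MHz clock

// Step pulses per second needed for a given whole number of revolutions per second.
constexpr uint32_t stepsPerSecond(uint32_t revsPerSecond) {
  return STEPS_PER_REV * revsPerSecond;
}

// CPU clock cycles available between two consecutive step pulses.
constexpr uint32_t cyclesPerStep(uint32_t stepsPerSec) {
  return CPU_HZ / stepsPerSec;
}

// Compile-time checks of the numbers quoted in the text.
static_assert(STEPS_PER_REV == 1600, "200 full steps * 8 micro-steps");
static_assert(stepsPerSecond(2) == 3200, "2 rev/s needs 3,200 steps/s");
static_assert(cyclesPerStep(3200) == 5000, "5,000 cycles per step at 16 MHz");
```

At 3,200 steps per second there are 312.5 us (5,000 clock cycles) between pulses for each motor, and everything else the loop does has to fit inside that budget.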

To achieve the best code performance you need to really analyse how long each loop cycle takes, trimming down unnecessary calculations as far as possible. You may or may not know that different types of calculation take different amounts of time. For example, multiplying two ints is faster than multiplying two doubles, and any sort of ‘divide’ operation is significantly slower again (see the table below).

There are various tricks you can use to keep performance high:

  • Use the simplest possible numeric types.
  • Apply fixed point math.
  • Pre-calculate as many values as possible.
  • Consider that a ‘divide’ operation can be achieved by multiplying by a value’s reciprocal. (E.g. x / y is the same as x * (1 / y), and (1 / y) might be pre-calculated.)
  • Perform some operations (such as reading gyro values) only periodically. (Made simpler by using my Timed Events code!)
  • Use faster pin I/O code.
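As a rough illustration of the fixed point and reciprocal tricks, here is a minimal sketch that replaces an integer divide with a multiply and a shift. The helper names (reciprocalQ16, divideByReciprocal) and the 16-bit fraction are my own assumptions for the example, not code from the robot. The result is an approximation and can be off by one for some inputs, so check it against the range of values you actually use.

```cpp
#include <stdint.h>

// Q16 fixed point: a fraction f is stored as the integer f * 65536.
constexpr uint8_t FRAC_BITS = 16;

// Pre-calculate 1/divisor once (for example in setup()), rounded to nearest.
// The divisor must not be zero.
constexpr uint32_t reciprocalQ16(uint16_t divisor) {
  return ((1UL << FRAC_BITS) + divisor / 2) / divisor;
}

// Approximate x / divisor using one 32-bit multiply and one shift.
// The product always fits in 32 bits because x <= 65535 and recipQ16 <= 65536.
inline uint16_t divideByReciprocal(uint16_t x, uint32_t recipQ16) {
  return (uint16_t)(((uint32_t)x * recipQ16) >> FRAC_BITS);
}
```

A typical use would be to store `reciprocalQ16(10)` in a variable once, then call `divideByReciprocal(value, storedReciprocal)` inside the loop instead of writing `value / 10`. This only pays off when the same divisor is reused many times, and it is worth timing both versions on your own board.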

There’s an extremely handy table showing how long various Arduino operations take. It’s definitely worth keeping at hand! I’ll include a copy here for reference:

Speed test
F_CPU = 16000000 Hz
1/F_CPU = 0.0625 us
The next tests are runtime compensated for overhead
Interrupts are still enabled, because millis() is used for timing
  nop                       : 0.063 us
  avr gcc I/O               : 0.125 us
  Arduino digitalRead       : 3.585 us
  Arduino digitalWrite      : 5.092 us
  pinMode                   : 4.217 us
  multiply byte             : 0.632 us
  divide byte               : 5.412 us
  add byte                  : 0.569 us
  multiply integer          : 1.387 us
  divide integer            : 14.277 us
  add integer               : 0.883 us
  multiply long             : 6.100 us
  divide long               : 38.687 us
  add long                  : 1.763 us
  multiply float            : 7.110 us
  divide float              : 79.962 us
  add float                 : 9.227 us
  itoa()                    : 13.397 us
  ltoa()                    : 126.487 us
  dtostrf()                 : 78.962 us
  random()                  : 51.512 us
  y |= (1<<x)               : 0.569 us
  bitSet()                  : 0.569 us
  analogRead()              : 111.987 us
  analogWrite() PWM         : 11.732 us
  delay(1)                  : 1006.987 us
  delay(100)                : 99999.984 us
  delayMicroseconds(2)      : 0.506 us
  delayMicroseconds(5)      : 3.587 us
  delayMicroseconds(100)    : 99.087 us

Notice how much longer a ‘divide’ operation takes! For a float it’s about 11x slower than a multiply!
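The table also shows why faster pin I/O matters: digitalWrite takes 5.092 us, while a direct ‘avr gcc I/O’ write takes 0.125 us, roughly 40 times less. Here is a minimal sketch of direct port writes. The helpers pinHigh and pinLow are hypothetical names of my own, and the pin used in the usage note is only an example.

```cpp
#include <stdint.h>

// Set or clear one pin by writing its port register directly.
// 'port' is a reference to an 8-bit output register (e.g. PORTD on an AVR),
// 'mask' has a 1 in the bit position of the pin to change.
inline void pinHigh(volatile uint8_t &port, uint8_t mask) {
  port |= mask;             // set the pin's bit, leave the others alone
}

inline void pinLow(volatile uint8_t &port, uint8_t mask) {
  port &= (uint8_t)~mask;   // clear the pin's bit, leave the others alone
}
```

On a Nano, assuming the step pin is wired to D3 (bit 3 of PORTD) and has already been set as an output with pinMode, a step pulse would be `pinHigh(PORTD, _BV(3));` followed by a short delay and `pinLow(PORTD, _BV(3));`. The pulse has to stay high for the minimum time given in the DRV8825 datasheet, so don’t drop the delay without checking it.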
