Efficient C Code for ARM Devices
Consider also the constraints that features built into the language place on compiler optimization. Pointers in C are extremely powerful but, because of that power, pointer aliasing greatly restricts the freedom of the compiler to optimize.
In the following sequence, each input value must be loaded twice. This is because the compiler must assume that there is some possibility that the output array and the input array overlap.
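Some languages provide extra keywords that help here. In C (from C99 onwards), the ‘restrict’ qualifier lets you promise the compiler that data reached through one pointer is not also accessed through another, so a value that has already been loaded need not be loaded again.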
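As a minimal sketch of the aliasing problem (the function names and the arithmetic are hypothetical and are not the article's original listing), the first function below uses each input value twice with a store in between, so the compiler has to assume the store may have changed the input. The second makes the no-overlap promise with ‘restrict’, which permits a single load per element.

```c
#include <stddef.h>

/* Without restrict: the store to out[2*i] might overlap in[i], so the
   compiler must assume in[i] may have changed and load it a second time. */
void expand_plain(int *out, const int *in, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = in[i] + 1;
        out[2 * i + 1] = in[i] - 1;
    }
}

/* With restrict: the caller promises that out and in do not overlap, so
   the compiler is free to load in[i] once and reuse the value. */
void expand_restrict(int *restrict out, const int *restrict in, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = in[i] + 1;
        out[2 * i + 1] = in[i] - 1;
    }
}
```

Both functions give the same results when the arrays are distinct. Calling the ‘restrict’ version with overlapping arrays is undefined behaviour, so the promise has to be true.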
Consider also where the precise definition of the language may work against you. For instance, most integer data types default to a signed representation. This means that most arithmetic operations are carried out to generate a signed result, often requiring extra instructions to normalize the result for correct size and sign.
If you actually want unsigned arithmetic, make sure that your types are defined correctly to avoid these unnecessary operations.
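The following simple division (which, with an unsigned variable, could be implemented using a simple shift) results in quite a complex sequence of instructions. Signed division in C rounds toward zero, whereas an arithmetic shift rounds toward negative infinity, so the compiler must add instructions to correct the result for negative values.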
Here is the resulting code output.
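To illustrate the difference at source level, here is a minimal sketch with hypothetical function names and an assumed divisor of 4; it is not the article's original listing, and the instructions actually generated depend on the compiler and its options.

```c
/* Signed: C requires the quotient to be truncated toward zero, so -7 / 4
   is -1. A plain arithmetic shift right by 2 would give -2 instead, so the
   compiler has to adjust negative inputs before it can shift. */
int quarter_signed(int x)
{
    return x / 4;
}

/* Unsigned: dividing by 4 is exactly a logical shift right by 2, so no
   correction is needed. */
unsigned int quarter_unsigned(unsigned int x)
{
    return x / 4u;
}
```

If the value can never be negative, declaring it unsigned states that fact to the compiler and removes the need for the correction.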
Most languages incorporate features which allow you to pass “meta-information” to the compiler which will assist in code generation. A common example is the ‘const’ modifier which allows you to inform the compiler that a particular data item will not change.
This has two benefits. First, an object defined ‘const’ can be allocated to ROM rather than occupying valuable RAM. Second, when ‘const’ qualifies a function parameter (typically the target of a pointer), the function promises not to change that item through the parameter. The compiler can then warn you if you attempt to change it and, where it can establish that the item really cannot change, it can optimize on that basis.
In the following example, the compiler does not need to reload ‘foo’ in the if statement as it is able to assume that the function call will not change it.
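A minimal sketch of this pattern follows, assuming a hypothetical helper named twice and a file-scope definition of ‘foo’; it is not the article's original listing. The assumption is safest when the object itself is defined ‘const’, as it is here. A pointer-to-const parameter alone does not stop other code from modifying an object that was not defined ‘const’.

```c
/* Defined const at file scope: a toolchain may place it in ROM
   (for example a read-only data section) instead of RAM. */
static const int foo = 42;

/* Hypothetical helper: the const-qualified parameter is a promise not to
   write through p, and the compiler can diagnose any attempt to do so. */
int twice(const int *p)
{
    return 2 * (*p);
}

int check_foo(void)
{
    int result = twice(&foo);

    /* foo is a const object, so modifying it would be undefined behaviour.
       The compiler may therefore reuse the known value rather than
       reloading foo after the call. */
    if (foo == 42) {
        result += 1;
    }
    return result;
}
```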
When developing for an embedded processor, you are typically writing within several constraints. It is important to know what these limits are and to work within them.
Any processor supports a well-defined instruction set. You should familiarize yourself with that instruction set and make sure that you use it effectively. In the case of ARM processors, most support at least two instruction sets, and sometimes three.
It is a compile-time choice which instruction set to use. Usually, the Thumb-2 instruction set is used as it provides the best balance between performance, functionality and code density. In some circumstances, it can be beneficial to select the ARM instruction set for high performance in critical code regions.
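With GCC and Clang, this choice is commonly made with the -mthumb and -marm options. The sketch below (the function name is hypothetical) reports which instruction set a translation unit was built for, assuming the predefined macros __thumb2__, __thumb__ and __arm__ that those compilers provide; other toolchains may use different names.

```c
/* Reports the instruction set this file was compiled for. Assumes the
   GCC/Clang predefined macros; on a non-ARM host the fallback string is
   returned. */
const char *instruction_set_name(void)
{
#if defined(__thumb2__)
    return "Thumb-2";
#elif defined(__thumb__)
    return "Thumb";
#elif defined(__arm__)
    return "ARM";
#else
    return "not a 32-bit ARM target";
#endif
}
```

A check like this can be useful in a build that mixes instruction sets, to confirm that a performance-critical file really was compiled the way you intended.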
Traditionally, assembly code has been used to extract maximum performance from machines by optimizing data access and by re-ordering instructions to minimize pipeline hazards and load-use penalties. For two reasons, this is less useful in a modern system.
Firstly, compilers are much better at exploiting the architecture. Secondly, modern processors often incorporate out-of-order or superscalar execution hardware which works to minimize pipeline hazards at execution time.