Lecture 15
Optimizations
Lecture
Outline
- Reducing amount of work
- Data optimization
- Loop optimizations
- Logic optimizations
- Function optimizations
- Optimizing compilers and their limitations
List of Optimizations
Data structures
- Packing and encoding
- Augmentation
- Precomputation
- Compile-time initialization
- Caching
- Lazy evaluation
- Sparsity
Loops
- Hoisting
- Sentinels
- Loop unrolling
- Loop fusion
- Eliminating wasted iterations
Logic
- Constant folding and propagation
- Common-subexpression elimination
- Algebraic identities
- Short-circuiting
- Ordering tests
- Creating a fast path
- Combining tests
Functions
- Inlining
- Tail-recursion elimination
- Coarsening recursion
Workshop
Outline
- Discuss optimizations from the lecture, whoch are related to hardware.
- Use the RISC-V toolchain (use compiler explorer Godbolt)
to compile C programs with optimizations.
- Study examples of optimized and unoptimized C programs.
- Review the assembly code generate by the compiler.
- Try different levels of compiler optimizations (
-01,-02).
- Practice implementing optimizations in the RISC-V assembly.
Examples
-
Inlining and constant folding.
#include <stdio.h> int square(int x) { return x * x; } int add(int x, int y) { return x + y; } int main() { int x = 10; int y = 5; int z = add(square(x), y); printf("Result = %d\n", z); return 0; }- Compile the program without optimizations and then with optimizations. See the difference.
- Add the
inlinedirective to the function declarations. What happens after you do this? - Modify the program so that
xandyare read from the user input. How does it affect optimizations?
-
Loop unrolling.
#include <stdio.h> void fcopy(char *source, char *target) { for (int i = 0; i < 256; ++i) { target[i] = source[i]; } } int main() { char in[256]; scanf("%s", in); char out[256]; fcopy(in, out); printf("Result = %s\n", out); return 0; }- Compile the program without optimizations and then with optimizations (
-03). See the difference.
- Compile the program without optimizations and then with optimizations (
-
Memory accesses.
#include <stdio.h> void add (int x, int y, int z, int *result) { *result += x; *result += y; *result += z; } int main() { int x, y, z; scanf("%d%d%d", &x, &y, &z); int result; add(x, y, z, &result); printf("Result=%d\n", result); return 0; }- Compile the program without optimizations and then with optimizations (
-01). See the difference.
- Compile the program without optimizations and then with optimizations (
-
Branch prediction.
int test(int a, int b) { if (a < b) [[unlikely]] { int x = a * 16 + 7; int y = b / 2 + 4; return x / y; } else { int x = a * 2 + 1; int y = b * 3 + 2; return x * y; } }Use
[[likely]]and[[unlikely]]C++ attributes or C macros to modify the program structure.#define likely(x) __builtin_expect(!!(x), 1) #define unlikely(x) __builtin_expect(!!(x), 0)The idea: the standard branch prediction heuristic assumes that a forward branch is always taken. This means that the instructions of the
ifbranch will be fetched by the CPU. However, we can mark the condition as unlikely and the compiler will change the program structure in such a way that theelsebranch instructions will be fetched by the CPU.- Compile the program marking different branches with
[[unlikely]]/[[likely]]and see the difference.
- Compile the program marking different branches with
Tasks
-
Explain why passing arguments to functions by reference or by pointer is not a good idea unless it is really necessary. Explain the difference between:
void add(int* z, int* x, int* y) { *z = *x + *y; }and:
int add(int x, int y) { return x + y; } -
Write a function in RISC-V assembly, which accepts as arguments an array of 16-bit values and returns the result of following expression:
A[0] - A[1] + A[2] - A[3] + [A4] ... +- A[N-1]. Use the loop unrolling technique to make calculations faster.An unoptimized C implementation look like this:
#include <stdio.h> int func(short *array, int size) { int result = 0; for (int i = 0; i < size; ++i) { short value = array[i]; if (i % 2 == 0) { result += value; } else { result -= value; } } return result; } int main() { short arr[] = {1, 2, 3, 4, 5}; printf("Result= %d\n", func(arr, sizeof(arr)/sizeof(short))); return 0; }
Homework
Finish all the tasks.
References
- Optimizing program performance. Chapter 5 in [CSPP].
- MIT 6.172. Lecture 2. Slides.
- Loop unrolling (Wikipedia).
- Loop-invariant code motion (Wikipedia).
- Profiling (Wikipedia).
- Program optimization (Wikipedia).
- Branch prediction macros in GCC (GeeksForGeeks).
- C++ attribute: likely, unlikely (since C++20).