Lecture 15
Optimizations
Lecture
Outline
- Reducing amount of work
- Data optimization
- Loop optimizations
- Logic optimizations
- Function optimizations
- Optimizing compilers and their limitations
List of Optimizations
Data structures
- Packing and encoding
- Augmentation
- Precomputation
- Compile-time initialization
- Caching
- Lazy evaluation
- Sparsity
Loops
- Hoisting
- Sentinels
- Loop unrolling
- Loop fusion
- Eliminating wasted iterations
Logic
- Constant folding and propagation
- Common-subexpression elimination
- Algebraic identities
- Short-circuiting
- Ordering tests
- Creating a fast path
- Combining tests
Functions
- Inlining
- Tail-recursion elimination
- Coarsening recursion
Workshop
Outline
- Discuss optimizations from the lecture, whoch are related to hardware.
- Use the RISC-V toolchain to compile C programs with optimizations.
- Study examples of optimized and unoptimized C programs.
- Review the assembly code generate by the compiler.
- Try different levels of compiler optimizations (
-01
,-02
).
- Practice implementing optimizations in the RISC-V assembly.
Using the RISC-V Toolchain
-
Run the Linux Ubuntu 20.04 LTS with RISC-V toolchain VM in your VirtualBox.
-
Use the password
acos2020
to log in and open the Bash terminal. -
(Optional) Connect to the VM from your host OS (MacOS or Windows) via SSH by executing the following command in the terminal (use the same password):
ssh acos@localhost -p2022
-
Create a C program (e.g.
prog.c
) using your favourite editor (nano, mcedit, vim, etc.). -
Compile the C program to the assembly language:
riscv64-unknown-linux-gnu-gcc prog.c -S
-
Compile the C program to the assembly language with optimizations enabled:
riscv64-unknown-linux-gnu-gcc prog.c -S -O1
-
Compare the optimized and the unoptimized versions of the assembly program (the
prog.s
file). -
Compile the assembly program to the object format and link it to an executable file:
riscv64-unknown-linux-gnu-gcc prog.s -o prog -static
-
Run the program using the Spike simulator:
spike $RISCV/riscv64-unknown-linux-gnu/bin/pk prog
See the list of optimization flags supported by GCC here.
See the RISC-V options for GCC here.
Examples
- Inlining and constant folding.
#include <stdio.h>
int square(int x) {
return x * x;
}
int add(int x, int y) {
return x + y;
}
int main() {
int x = 10;
int y = 5;
int z = add(square(x), y);
printf("Result = %d\n", z);
return 0;
}
- Compile the program without optimizations and then with optimizations. See the difference.
- Add the
inline
directive to the function declarations. What happens after you do this? - Modify the program so that
x
andy
are read from the user input. How does it affect optimizations?
- Loop unrolling.
#include <stdio.h>
void fcopy(char source[256], char target[256]) {
for (int i = 0; i < 256; ++i) {
target[i] = source[i];
}
}
int main() {
char in[256];
scanf("%s", in);
char out[256];
fcopy(in, out);
printf("Result = %s\n", out);
return 0;
}
- Compile the program without optimizations and then with optimizations (
-03
). See the difference.
- Memory accesses.
#include <stdio.h>
void add (int x, int y, int z, int *result) {
*result += x;
*result += y;
*result += z;
}
int main() {
int x, y, z;
scanf("%d%d%d", &x, &y, &z);
int result;
add(x, y, z, &result);
printf("Result=%d\n", result);
return 0;
}
- Compile the program without optimizations and then with optimizations (
-01
). See the difference.
Tasks
-
Explain why passing arguments to functions by reference or by pointer is not a good idea unless it is really necessary. Explain the difference between:
void add(int* z, int* x, int* y) { *z = *x + *y; }
and:
int add(int x, int y) { return x + y; }
-
Write a function in RISC-V assembly, which accepts as arguments an array of 16-bit values and returns the result of following expression:
A[0] - A[1] + A[2] - A[3] + [A4] ... +- A[N-1]
. Use the loop unrolling technique to make calculations faster.An unoptimized C implementation look like this:
#include <stdio.h> int func(short *array, int size) { int result = 0; for (int i = 0; i < size; ++i) { short value = array[i]; if (i % 2 == 0) { result += value; } else { result -= value; } } return result; } int main() { short arr[] = {1, 2, 3, 4, 5}; printf("Result= %d\n", func(arr, sizeof(arr)/sizeof(short))); return 0; }
Homework
Finish all the tasks.
References
- Optimizing program performance. Chapter 5 in [CSPP].
- MIT 6.172. Lecture 2. Slides.
- Loop unrolling (Wikipedia).
- Loop-invariant code motion (Wikipedia).
- Profiling (Wikipedia).
- Program optimization (Wikipedia).