View on GitHub

Computer Architecture and Operating Systems

Course taught at Faculty of Computer Science of Higher School of Economics

Lecture 15

Optimizations

Lecture

Slides (PDF, PPTX).

Outline

Reducing amount of work
Data optimization
Loop optimizations
Logic optimizations
Function optimizations
Optimizing compilers and their limitations

List of Optimizations

Data structures

Packing and encoding
Augmentation
Precomputation
Compile-time initialization
Caching
Lazy evaluation
Sparsity

Loops

Hoisting
Sentinels
Loop unrolling
Loop fusion
Eliminating wasted iterations

Logic

Constant folding and propagation
Common-subexpression elimination
Algebraic identities
Short-circuiting
Ordering tests
Creating a fast path
Combining tests

Functions

Inlining
Tail-recursion elimination
Coarsening recursion

Workshop

Outline

Discuss optimizations from the lecture, whoch are related to hardware.
Use the RISC-V toolchain to compile C programs with optimizations.
- Study examples of optimized and unoptimized C programs.
- Review the assembly code generate by the compiler.
- Try different levels of compiler optimizations (-01, -02).
Practice implementing optimizations in the RISC-V assembly.

Using the RISC-V Toolchain

Note: If the toolchain is unavailable, use compiler explorer Godbolt.

Run the Linux Ubuntu 20.04 LTS with RISC-V toolchain VM in your VirtualBox.
Use the password acos2020 to log in and open the Bash terminal.
(Optional) Connect to the VM from your host OS (MacOS or Windows) via SSH by executing the following command in the terminal (use the same password):
```
ssh acos@localhost -p2022 
```
Create a C program (e.g. prog.c) using your favourite editor (nano, mcedit, vim, etc.).
Compile the C program to the assembly language:
```
riscv64-unknown-linux-gnu-gcc prog.c -S
```
Compile the C program to the assembly language with optimizations enabled:
```
riscv64-unknown-linux-gnu-gcc prog.c -S -O1
```
Compare the optimized and the unoptimized versions of the assembly program (the prog.s file).
Compile the assembly program to the object format and link it to an executable file:
```
riscv64-unknown-linux-gnu-gcc prog.s -o prog -static
```

Run the program using the Spike simulator:

spike $RISCV/riscv64-unknown-linux-gnu/bin/pk prog

See the list of optimization flags supported by GCC here.

See the RISC-V options for GCC here.

Examples

Inlining and constant folding.
```
#include <stdio.h>
  
int square(int x) {
    return x * x;
}
    
int add(int x, int y) {
    return x + y;
}
    
int main() {
    int x = 10;
    int y = 5;
    int z = add(square(x), y);
    printf("Result = %d\n", z);
    return 0;
}
```
- Compile the program without optimizations and then with optimizations. See the difference.
- Add the inline directive to the function declarations. What happens after you do this?
- Modify the program so that x and y are read from the user input. How does it affect optimizations?

Loop unrolling.

#include <stdio.h>
 
void fcopy(char *source, char *target) {
    for (int i = 0; i < 256; ++i) {
       target[i] = source[i];
    }
}
    
int main() {
    char in[256];
    scanf("%s", in);
    char out[256];
    fcopy(in, out);
    printf("Result = %s\n", out);
    return 0;
}

Compile the program without optimizations and then with optimizations (-03). See the difference.

Memory accesses.

#include <stdio.h>
  
void add (int x, int y, int z, int *result) {
    *result += x;
    *result += y;
    *result += z;
}
    
int main() {
    int x, y, z;
    scanf("%d%d%d", &x, &y, &z);
    int result;
    add(x, y, z, &result);
    printf("Result=%d\n", result);
    return 0;
}

Compile the program without optimizations and then with optimizations (-01). See the difference.

Branch prediction.
```
int test(int a, int b) {
    if (a < b) [[unlikely]] {
        int x = a * 16 + 7;
        int y = b / 2 + 4;
        return x / y;
    } else {
        int x = a * 2 + 1;
        int y = b * 3 + 2;
        return x * y;
    }    
}
```
Use [[likely]] and [[unlikely]] C++ attributes or C macros to modify the program structure.
```
#define likely(x)      __builtin_expect(!!(x), 1) 
#define unlikely(x)    __builtin_expect(!!(x), 0) 
```
The idea: the standard branch prediction heuristic assumes that a forward branch is always taken. This means that the instructions of the if branch will be fetched by the CPU. However, we can mark the condition as unlikely and the compiler will change the program structure in such a way that the else branch instructions will be fetched by the CPU.
- Compile the program marking different branches with [[unlikely]]/[[likely]] and see the difference.

Tasks

Explain why passing arguments to functions by reference or by pointer is not a good idea unless it is really necessary. Explain the difference between:
```
void add(int* z, int* x, int* y) {
    *z = *x + *y;
}
```
and:
```
int add(int x, int y) {
    return x + y;
}
```

Write a function in RISC-V assembly, which accepts as arguments an array of 16-bit values and returns the result of following expression: A[0] - A[1] + A[2] - A[3] + [A4] ... +- A[N-1]. Use the loop unrolling technique to make calculations faster.

An unoptimized C implementation look like this:

#include <stdio.h>
   
int func(short *array, int size) {
    int result = 0;
    for (int i = 0; i < size; ++i) {
        short value = array[i];
        if (i % 2 == 0) {
            result += value;
        } else {
            result -= value;
        }
    }
    return result;
}

int main() {
    short arr[] = {1, 2, 3, 4, 5};
    printf("Result= %d\n", func(arr, sizeof(arr)/sizeof(short)));
    return 0;
}

Homework

Finish all the tasks.

References

Optimizing program performance. Chapter 5 in [CSPP].
MIT 6.172. Lecture 2. Slides.
Loop unrolling (Wikipedia).
Loop-invariant code motion (Wikipedia).
Profiling (Wikipedia).
Program optimization (Wikipedia).
Branch prediction macros in GCC (GeeksForGeeks).
C++ attribute: likely, unlikely (since C++20).