Computer Architecture and Operating Systems

Course taught at the Faculty of Computer Science of the Higher School of Economics

Lecture 13

More instruction-level parallelism. Multiple issue and out-of-order execution.

Lecture

Slides (PDF, PPTX).

Outline

Multiple issue processors
Dynamic and static scheduling
Out-of-order execution

Workshop

Outline

System calls in RISC-V (additional topic related to exceptions)
Working with files and heap allocation
Experimenting with a 6-stage dual-issue RISC-V processor (use the Ripes simulator)
Experimenting with branch prediction (use the RARS simulator)

System calls in RARS (RISC-V Assembly)

System call

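System calls are exceptions generated by user code and serviced by the environment (the operating system). In RARS, a program requests a service by placing the service number in register a7 and its arguments in a0, a1, ..., and then executing the ecall instruction. The environment runs the service code in kernel mode, which has full access to all resources. This code handles the interaction with a specific device and its driver and then returns control to the user program.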
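As a rough illustration of this mechanism, the sketch below issues two system calls in RARS. It is not one of the course examples: the label `msg` and its text are hypothetical, and the two service numbers used here (4 = PrintString, 10 = Exit) are RARS services that are not part of the list in this workshop.

```asm
# Minimal sketch of a system call in RARS (illustrative, not a course example).
    .data
msg:    .asciz "Hello from a system call\n"   # hypothetical message

    .text
main:
    li   a7, 4          # a7 selects the service: 4 = PrintString in RARS
    la   a0, msg        # a0 carries the argument: address of the string
    ecall               # raise the exception; the environment prints the string

    li   a7, 10         # 10 = Exit in RARS
    ecall               # terminate the program; control does not come back
```

The same three-step pattern (service number in a7, arguments in a0 and up, then ecall) applies to the file and heap services as well.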

Standard system calls (available in any OS) supported by RARS, with their service numbers in parentheses:

open (1024): opens a file with the specified path
- Input: a0 = null-terminated string with the path, a1 = flags
- Output: a0 = the file descriptor, or -1 if an error occurred
- Supported flags: read-only (0), write-only (1), and write-append (9). The write-only flag creates the file if it does not exist, so it is effectively write-create. The write-append flag starts writing at the end of an existing file.
close (57): closes a file
- Input: a0 = the file descriptor to close
- Output: N/A
read (63): reads from a file descriptor into a buffer
- Input: a0 = the file descriptor, a1 = the address of the buffer, a2 = the maximum length to read
- Output: a0 = the length read, or -1 if an error occurred
write (64): writes to a file from a buffer
- Input: a0 = the file descriptor, a1 = the address of the buffer, a2 = the length to write
- Output: a0 = the number of characters written
sbrk (9): allocates heap memory
- Input: a0 = the amount of memory in bytes
- Output: a0 = the address of the allocated block
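The calls listed above can be combined roughly as follows. This is a minimal sketch, not the course's own example code: the file name `sketch.txt`, the labels, and the 64-byte buffer size are assumptions made for illustration, and the relative path is resolved against whatever working directory RARS was started from. PrintInt (1) and Exit (10) are additional RARS services used only to report the result.

```asm
# Sketch: write a short text to a file, then read it back into a heap buffer.
    .data
path:   .asciz "sketch.txt"        # hypothetical file name
text:   .ascii "Hello, file!\n"    # 13 bytes, no terminator needed for write

    .text
main:
    li   a7, 1024        # open(path, flags)
    la   a0, path
    li   a1, 1           # flag 1 = write-only (creates the file if missing)
    ecall
    bltz a0, done        # a0 = -1 means the file could not be opened
    mv   s0, a0          # s0 = file descriptor

    li   a7, 64          # write(fd, buffer, length)
    mv   a0, s0
    la   a1, text
    li   a2, 13          # length of the text in bytes
    ecall

    li   a7, 57          # close(fd)
    mv   a0, s0
    ecall

    li   a7, 9           # sbrk(size): allocate the read buffer in the heap
    li   a0, 64          # assumed buffer size in bytes
    ecall
    mv   s1, a0          # s1 = address of the allocated block

    li   a7, 1024        # open(path, flags) again, now for reading
    la   a0, path
    li   a1, 0           # flag 0 = read-only
    ecall
    bltz a0, done
    mv   s0, a0

    li   a7, 63          # read(fd, buffer, max length)
    mv   a0, s0
    mv   a1, s1
    li   a2, 64
    ecall
    mv   s2, a0          # s2 = number of bytes read, or -1 on error

    li   a7, 57          # close(fd)
    mv   a0, s0
    ecall

    li   a7, 1           # PrintInt: show how many bytes were read
    mv   a0, s2
    ecall
done:
    li   a7, 10          # Exit
    ecall
```

The file descriptor is copied to a saved register (s0) right after open because a0 is overwritten by the arguments and results of the following calls.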

Examples

file_write.s - writing text to a file
file_read.s - reading text from a file
heap_alloc.s - allocating memory in the heap

Memory Layout

Dual-Issue RISC-V CPU (Ripes Simulator)

Dual-Issue RISC-V

Examples

Run the add_scalar.s example and see how many CPU clock cycles it takes.

Branch History Table (RARS Simulator)

Branch Prediction

Run the programs from lectures 4-7 in the RARS simulator with the “Branch History Table” plugin connected. See how well it predicts branch outcomes with different settings.
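If a small dedicated test case is useful alongside the lecture programs, something like the sketch below could be loaded into RARS. It is a hypothetical program written for this purpose, with an arbitrary iteration count of 100. It contains two conditional branches with very different behavior: branch A alternates between taken and not taken, while branch B is taken on every iteration except the last.

```asm
# Hypothetical test program with one hard and one easy branch.
    .text
main:
    li   t0, 0            # t0 = loop counter i
    li   t1, 100          # t1 = number of iterations (arbitrary)
    li   t2, 0            # t2 = accumulator
loop:
    andi t3, t0, 1        # t3 = i mod 2
    beqz t3, even         # branch A: taken for even i, not taken for odd i
    addi t2, t2, -1       # odd i: subtract 1
    j    next
even:
    addi t2, t2, 1        # even i: add 1
next:
    addi t0, t0, 1        # i = i + 1
    blt  t0, t1, loop     # branch B: taken until the final iteration
    li   a7, 10           # Exit
    ecall
```

A predictor that simply repeats the last outcome will tend to mispredict branch A, since each outcome is the opposite of the previous one, whereas branch B should be easy for any history-based setting.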

Tasks

Write a program that creates a copy of the specified file. Input arguments:
- The names of the source and target files are read from standard input (use system call 8).
- The buffer that stores the data being copied is allocated in the heap (use system call 9). The buffer size is read from standard input.
- Buffers for storing source and target names are also allocated in the heap (their size is 256 bytes).
Optimize the add_scalar.s program so that it wastes fewer CPU cycles. Use the loop-unrolling technique (merge two or more loop iterations into one). How many cycles does it take now?
Write an optimized version of the PlusMinus program that solves the problem of incorrect branch prediction with loop unrolling (the “even” and “odd” operations must be done in the same loop iteration).
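For reference, the loop-unrolling technique mentioned in the tasks can be sketched on a generic loop. The code below is not add_scalar.s or PlusMinus: the array `arr`, its contents, and the constant 10 are hypothetical, and the sketch assumes an even number of elements so that no cleanup iteration is needed.

```asm
# Sketch: add a constant to every array element, two elements per iteration.
    .data
arr:    .word 1, 2, 3, 4, 5, 6, 7, 8   # hypothetical data, even element count

    .text
main:
    la   t0, arr          # t0 = pointer to the current pair of elements
    addi t1, t0, 32       # t1 = end address (8 words * 4 bytes)
    li   t2, 10           # t2 = constant to add (arbitrary)
loop:
    lw   t3, 0(t0)        # load two consecutive elements
    lw   t4, 4(t0)
    add  t3, t3, t2       # the two additions are independent of each other
    add  t4, t4, t2
    sw   t3, 0(t0)        # store both results back
    sw   t4, 4(t0)
    addi t0, t0, 8        # advance the pointer by two words
    bne  t0, t1, loop     # one pointer update and one branch per two elements
    li   a7, 10           # Exit
    ecall
```

Compared with a one-element-per-iteration loop, the pointer update and the branch are executed half as often, and the two load/add/store chains use different registers, which may give a dual-issue pipeline more independent instructions to pair.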

Homework

TODO

References

Parallelism via Instructions. Section 4.10 in [CODR].
Advanced Microarchitecture. Section 7.7 in [DDCA].
Instruction-Level Parallelism and Its Exploitation. Chapter 3 in [CAQA] (Advanced).
Superscalar processor (Wikipedia).
Out-of-order execution (Wikipedia).
Register renaming (Wikipedia).
Branch predictor (Wikipedia).