Computer Architecture and Operating Systems

Course taught at the Faculty of Computer Science of the Higher School of Economics

Lecture 13

More instruction-level parallelism. Multiple issue and out-of-order execution.

Lecture

Slides (PDF, PPTX).

Outline

Examples

| CPU Model | Microarchitecture | Generation | Issue Width |
|---|---|---|---|
| Core i7-3615QM | Ivy Bridge | 3rd Gen | 4 |
| Core i7-8665U | Whiskey Lake | 8th Gen | 4 |
| Core i7-1260P | Golden Cove (P-core) | 12th Gen | 6 |
| | Gracemont (E-core) | | 5 |
| Core i7-13700 | Raptor Cove (P-core) | 13th Gen | 6 |
| | Gracemont (E-core) | | 5 |

Workshop

Outline

Dual-Issue RISC-V CPU (Ripes Simulator)

Dual-Issue RISC-V

Examples

Run the add_scalar.s example and see how many CPU clock cycles it takes.

Figures: Ripes 1, Ripes 2.

Branch History Table (RARS Simulator)

Branch Prediction

Run the programs from lectures 4-7 in the RARS simulator with the “Branch History Table” plugin connected. See how well it predicts branch outcomes under different settings.
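To build intuition before experimenting, the sketch below models a single branch-history-table entry as a textbook n-bit saturating counter and counts correct predictions for a given sequence of branch outcomes. The function name `simulate_bht` and its parameters are hypothetical, and the update rule is the standard textbook one, which may differ from the exact rule the RARS plugin uses.

```python
def simulate_bht(outcomes, bits=2, initial=0):
    """Count correct predictions for one branch.

    Assumption: a textbook n-bit saturating counter, not necessarily
    the exact algorithm implemented by the RARS plugin.
    outcomes: iterable of booleans, True means the branch was taken.
    """
    max_state = (1 << bits) - 1      # highest counter value
    threshold = 1 << (bits - 1)      # states >= threshold predict "taken"
    state = initial
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= threshold
        if predicted_taken == taken:
            correct += 1
        # Move the counter toward the observed outcome, saturating at the ends.
        if taken:
            state = min(state + 1, max_state)
        else:
            state = max(state - 1, 0)
    return correct


# A loop branch: taken 9 times, then falls through once.
loop_pattern = [True] * 9 + [False]
# A branch that alternates on every iteration (e.g. an even/odd test).
alternating = [True, False] * 5

print(simulate_bht(loop_pattern, bits=1), simulate_bht(loop_pattern, bits=2))
print(simulate_bht(alternating, bits=1), simulate_bht(alternating, bits=2))
```

In this model the loop branch is predicted well once the counter warms up, while the strictly alternating branch is mispredicted on every iteration with a 1-bit counter and on half of them with a 2-bit counter.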

Tasks

  1. Optimize the add_scalar.s program so that it wastes fewer CPU cycles. Use the loop-unrolling technique (merge two or more loop iterations into one). How many cycles does it take now?

  2. Write an optimized version of the PlusMinus program that solves the branch misprediction problem with loop unrolling (the “even” and “odd” operations must be done in the same loop iteration).
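Both tasks rely on loop unrolling. As a language-neutral illustration (the tasks themselves are to be solved in RISC-V assembly), the Python sketch below contrasts a loop with a parity branch in every iteration against a version unrolled by two. It assumes that PlusMinus adds even-indexed elements and subtracts odd-indexed ones, which this page does not spell out; the function names are hypothetical.

```python
def plus_minus_branchy(values):
    """One element per iteration; the parity test alternates on every pass."""
    total = 0
    for i, v in enumerate(values):
        if i % 2 == 0:       # taken, not taken, taken, ... hard to predict
            total += v
        else:
            total -= v
    return total


def plus_minus_unrolled(values):
    """Two elements per iteration; no parity test inside the loop body."""
    total = 0
    n = len(values)
    i = 0
    while i + 1 < n:
        total += values[i]       # "even" operation
        total -= values[i + 1]   # "odd" operation
        i += 2
    if i < n:                    # leftover element when the length is odd
        total += values[i]
    return total


data = [1, 2, 3, 4, 5]
print(plus_minus_branchy(data), plus_minus_unrolled(data))
```

The unrolled loop keeps only the loop-control branch, which is taken on every iteration but the last, and it executes half as many loop-control instructions. The leftover check handles arrays of odd length.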

References