This course aims to provide a strong foundation for students to understand the modern eras of computer architecture (i.e., the single-core era, multi-core era, and accelerator era) and to apply these insights and principles to future computer designs. The course is structured around the three primary building blocks of general-purpose computing systems: processors, memories, and networks.

The first half of the course focuses on the fundamentals of each building block. Topics include instruction set architecture; single-cycle, FSM, and pipelined processor microarchitecture; direct-mapped vs.~set-associative cache memories; memory protection, translation, and virtualization; FSM and pipelined cache microarchitecture; cache optimizations; and integrating processors, memories, and networks. The second half of the course delves into more advanced techniques and will enable students to understand how these three building blocks can be integrated to build a modern shared-memory multicore system. Topics include superscalar execution, out-of-order execution, register renaming, memory disambiguation, branch prediction, and speculative execution; multithreaded, VLIW, and SIMD processors; and memory synchronization, consistency, and coherence. Students will learn how to evaluate design decisions in the context of past, current, and future application requirements and technology constraints.

This course includes a significant project decomposed into four lab assignments. Throughout the semester, students will gradually design, implement, test, and evaluate a complete multicore system capable of running simple parallel applications at the register-transfer level.