Class
|
Topic
|
Reading
|
Milestones
|
Review 3/27
|
Review of performance metrics
|
3rd sections 1.4 and 1.6
(4th 1.8 and 1.9)
(JLB I.2 and I.3)
|
|
Review of pipelining
|
3rd Appendix A.1 to A.4
(4th A.1 to A.4)
(JLB II.1.1 to II.1.5)
|
Dynamic branch prediction 3/29
|
Dynamic branch prediction
|
3rd sections 3.4 and 3.5, pp. 265-266, and Fig. 3-40 p. 249
(4th 2.3 and 2.9 pp. 121-127)
(JLB IV.1 and sidebar)
|
|
Simulation (in section)
|
SimpleScalar documentation,
project report guidelines, and
a sample project report,
all available from the homework section.
|
Execution cores 4/3, 4/5, 4/10, 4/12
|
Superscalars and static scheduling
|
3rd pp. 215-220, and A.5 and A.8
(4th: unfortunately, the authors decided to skip "in-order" scheduling;
see A.5 and A.7, however)
(JLB III.1 and sidebar)
|
Homework 1 due 4/5
|
Dynamic scheduling and Tomasulo's algorithm
|
3rd 3.2, 3.3, and pp. 220-224
(4th 2.4 and 2.5)
(JLB III.2 and sidebar)
|
R10000-style dynamic scheduling (a physical register pool)
|
The Smith/Sohi article, for superscalars in a nutshell.
In the R10000 article, read from Register mapping,
p. 32, through Register files, p. 35.
(JLB IV.3)
|
Pentium-style dynamic scheduling (reorder buffers)
|
3rd 3.7, 3.10
(4th 2.6, 2.9 pp. 128-128, 2.10)
You might also want to look at
two articles on the Pentium Pro (pdf)
and (pdf). At this point, you only need
to read the sections on the pipeline and dynamic scheduling.
(JLB III.3)
|
VLIW 4/17
|
VLIW machines
|
3rd pp. 315-319, section 4.5 pp. 340-344, and 4.7 pp. 363-367
(4th section 2.7 and appendix G, TBD)
(JLB III.4)
|
Homework 2 due 4/19
|
Midterm 1 4/24, in class.
|
Memory hierarchy 4/19, 4/26
|
Hardware support for load speculation
|
3rd pp. 349-351
(4th TBD)
(JLB V.2)
|
|
Review of basic caching techniques (in section)
|
|
Advanced caching techniques
|
3rd pp. 413-430, 435-448
(4th 5.2)
(JLB VI.1-VI.3 and sidebar)
|
Multiprocessors 5/1, 5/3, 5/8, and 5/10
|
Overview of multiprocessing
|
3rd section 6.1
(4th 4.1)
(JLB VII.1)
|
Homework 3 due 5/3
Homework 4 due 5/10
|
Cache coherence, snooping, and directory protocols
|
3rd sections 6.3 and 6.5
(4th 4.2 and 4.4)
(JLB VII.2.1 and VII.2.2)
|
Synchronization
|
3rd section 6.7
(4th 4.5)
(JLB VII.3)
|
Multithreading 5/15 and 5/17
|
Introduction and Tera-style multithreading
|
Read the Tera paper (PDF).
Tera's runtime system (not required - this is just in
case the OS/RT students are interested).
(JLB VIII.1.1 and VIII.1.2)
|
|
Simultaneous multithreading
|
3rd section 6.9
(4th 3.5)
The SMT paper
(JLB VIII.1.3)
|
Multithreaded CMPs
|
(4th 4.8)
|
Dataflow Computers 5/22 and 5/24
|
Dataflow Machines
|
After reading them over, I don't think any of the papers on the
early dataflow machines are appropriate for classroom use. There
are no general overview papers. So just listen to the lecture.
|
|
WaveScalar architecture and implementation
|
The WaveScalar Architecture and
An overview of the WaveScalar implementation.
|
The End Game 5/29
|
Wrap-up, the final and course evaluations
|
Guest speakers 5/31
|
Martha Mercaldi, CSE graduate student
|
Component Self-assembly, aka Brick and Mortar
|
Extra Credit Homework due 5/31
|
David Bacon, CSE Research Faculty
|
Quantum Computers
|
Midterm 2 6/4 at 10:30, in class.
|