For background material, the undergraduate text "Computer Organization and Design, the Hardware/Software Interface, Second Edition" by Hennessy and Patterson contains a good introduction to basic material, including a review of basic arithmetic and the five-stage pipeline.
Students are expected to have a working knowledge of the following topics:
The following list of reading topics can help students gain a solid working knowledge of these topics.
Students should review the historical papers below and be aware of some of the original arguments for RISC and out-of-order execution. Also, the ISCA Retrospective is an important source of some of the original insights and motivations for important research projects. Note that many of these papers are included in "Readings in Computer Architecture" by Mark Hill, Norman Jouppi, and Gurindar Sohi.
- David Patterson and David Ditzel, "The Case for the Reduced Instruction Set Computer," ACM SIGARCH Computer Architecture News 8 (15 Oct 1980)
- Douglas Clark and William Strecker, "Comments on 'The Case for the Reduced Instruction Set Computer' by Patterson and Ditzel," ACM SIGARCH Computer Architecture News 8 (15 Oct 1980)
- David Patterson and Carlo Sequin, "RISC I: A Reduced Instruction Set VLSI Computer", Proceedings of the International Symposium on Computer Architecture (ISCA) 1981.
- David Ditzel and David Patterson, "Retrospective on High-Level Language Computer Architecture," Proceedings of the International Symposium on Computer Architecture (ISCA) 1981
- G. Amdahl, G. Blaauw, F. Brooks, Jr., "Architecture of the IBM System/360", IBM Journal, April 1964
- J.E. Thornton, "Parallel Operation in the Control Data 6600", Proceedings of the Fall Joint Computer Conference, vol 26, pp. 33-40, 1964
- E. Hauck and B. Dent, "Burroughs' B6500/B7500 Stack Mechanism," AFIPS SJCC, 1968
- Richard Russell, "The CRAY-1 Computer System", Communications of the ACM, 21(1) 63-72, January 1978
- David Moon, "Symbolics Architecture", IEEE Computer, 1987
Appendix A of "Computer Architecture: a Quantitative Approach, Second Edition" by Hennessy and Patterson contains some basic information on computer arithmetic. Also, see chapter 4 of the undergraduate text "Computer Organization and Design, the Hardware/Software Interface, Second Edition" by Hennessy and Patterson.
Review Chapters 3 and 4, as well as Appendix B, of "Computer Architecture: a Quantitative Approach, Second Edition" by Hennessy and Patterson. Also review Chapters 5, 6, and 7 of the undergraduate text "Computer Organization and Design, the Hardware/Software Interface, Second Edition" by Hennessy and Patterson. Make sure to thoroughly understand the Smith and Pleszkun paper below.
Papers on Out-Of-Order Execution:
- James Smith and Andrew Pleszkun, "Implementation of Precise Interrupts in Pipelined Processors", Proceedings of the International Symposium on Computer Architecture (ISCA) 1985
- R. Tomasulo, "An Efficient Algorithm for Exploiting Multiple Arithmetic Units", IBM Journal, January 1967
- Wen-mei Hwu and Yale Patt, "HPSm, a High Performance Restricted Data Flow Architecture Having Minimal Functionality," Proceedings of the International Symposium on Computer Architecture (ISCA) 1986.
- James Smith, "Decoupled Access/Execute Computer Architectures", Proceedings of the International Symposium on Computer Architecture (ISCA), 1982.
- Jack Dennis and David Misunas, "A Preliminary Architecture for a Basic Data-Flow Processor", Proceedings of the International Symposium on Computer Architecture (ISCA) 1975.
- Gregory Papadopoulos and David Culler, "Monsoon: an Explicit Token-Store Architecture", Proceedings of the International Symposium on Computer Architecture (ISCA) 1990.
VLIW Papers:
- Robert Colwell, Robert Nix, John O'Donnell, David Papworth, and Paul Rodman, "A VLIW Architecture for a Trace Scheduling Compiler", IEEE Transactions on Computers, Vol 37, No. 8, August 1988
- Joseph Fisher, "Very Long Instruction Word Architectures and the ELI-512", Proceedings of the International Symposium on Computer Architecture (ISCA), 1983.
- Michael Smith, Mark Horowitz, and Monica Lam, "Efficient Superscalar Performance Through Boosting", Proceedings of the Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 1992
Other Relevant Advanced Pipelining Papers:
- Dean Tullsen, Susan Eggers, Henry Levy, "Simultaneous Multithreading: Maximizing On-Chip Parallelism", Proceedings of the International Symposium on Computer Architecture (ISCA-22), 1995
- M. Franklin and G. S. Sohi, "The Expandable Split Window Paradigm for Exploiting Fine-Grain Parallelism," Proceedings of the International Symposium on Computer Architecture (ISCA-19), 1992
- E. Rotenberg, Q. Jacobson, Y. Sazeides, and J. Smith, "Trace Processors", Proceedings of the International Symposium on Microarchitecture (MICRO-30), 1997
Thoroughly review Chapter 4 of "Computer Architecture: a Quantitative Approach, Second Edition" by Hennessy and Patterson. Note that this chapter is a bit light on prediction technology, so we have supplemented below with relevant papers.
Papers on Prediction:
- Dirk Grunwald and Artur Klauser, "Confidence Estimation for Speculation Control", Proceedings of the International Symposium on Computer Architecture (ISCA), 1998
- Tse-Yu Yeh and Yale Patt, "Alternative Implementations of Two-level Adaptive Branch Prediction", Proceedings of the International Symposium on Computer Architecture (ISCA), 1992.
- Cliff Young, Nicolas Gloy, and Michael D. Smith, "A Comparative Analysis of Schemes for Correlated Branch Prediction", Proceedings of the International Symposium on Computer Architecture (ISCA), 1995.
- Cliff Young and Michael D. Smith. "Improving the Accuracy of Static Branch Prediction Using Branch Correlation," Proc. 6th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 1994.
- George Z. Chrysos and Joel S. Emer, "Memory Dependence Prediction using Store Sets", Proceedings of the International Symposium on Computer Architecture (ISCA), 1998
- Andreas Moshovos, Scott E. Breach, T.N. Vijaykumar, and Gurindar S. Sohi, "Dynamic Speculation and Synchronization of Data Dependences", Proceedings of the International Symposium on Computer Architecture (ISCA), 1997
- Avinash Sodani and Gurindar S. Sohi, "Dynamic Instruction Reuse", Proceedings of the International Symposium on Computer Architecture (ISCA), 1997
Chapter 5 of "Computer Architecture: a Quantitative Approach, Second Edition" by Hennessy and Patterson contains extensive discussion of caching technologies. The relevant papers below supplement this information.
Papers on Caching and Prefetching:
- David Kroft, "Lockup-Free Instruction Fetch/Prefetch Cache Organization", Proceedings of the International Symposium on Computer Architecture (ISCA), 1981
- Norman Jouppi, "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers", Proceedings of the International Symposium on Computer Architecture (ISCA), 1990
- Todd Mowry, Monica Lam, and Anoop Gupta, "Design and Evaluation of a Compiler Algorithm for Prefetching", Proceedings of the Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 1992
- Anoop Gupta, John Hennessy, Kourosh Gharachorloo, Todd Mowry, and Wolf-Dietrich Weber, "Comparative Evaluation of Latency Reducing and Tolerating Techniques", Proceedings of the International Symposium on Computer Architecture (ISCA), 1991
- Subbarao Palacharla and R.E. Kessler, "Evaluating Stream Buffers as a Secondary Cache Replacement", Proceedings of the International Symposium on Computer Architecture (ISCA-21), 1994.
- Keith Farkas, Paul Chow, Norman Jouppi, and Zvonko Vranesic, "Memory-System Design Considerations for Dynamically-Scheduled Processors", Proceedings of the International Symposium on Computer Architecture (ISCA-24), 1997
- B. Ramakrishna Rau, "Pseudo-Randomly Interleaved Memory", Proceedings of the International Symposium on Computer Architecture (ISCA-18), 1991
Students should be aware of basic technology for parallel processing. This includes an understanding of memory models and cache coherence, as well as an awareness of experimental prototypes. Chapter 8 of "Computer Architecture: a Quantitative Approach, Second Edition" by Hennessy and Patterson contains some basic material on parallel processing. A recent book that covers many topics of parallel computing is "Parallel Computer Architecture: A Hardware/Software Approach" by David E. Culler and Jaswinder Pal Singh.
Papers on Cache Coherence:
- Wei C. Yen, David W.L. Yen, and King-Sun Fu, "Data Coherence Problem in a Multicache System," IEEE Transactions on Computers, Vol. C-34, No. 1, January 1985.
- Sarita Adve and Mark Hill, "Weak Ordering -- A New Definition," Proceedings of the International Symposium on Computer Architecture (ISCA), June 1990
- Kourosh Gharachorloo, Daniel Lenoski, James Laudon, Phillip Gibbons, Anoop Gupta, and John Hennessy, "Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors," Proceedings of the International Symposium on Computer Architecture, 1990.
- Mark D. Hill, "Multiprocessors Should Support Simple Memory-Consistency Models," IEEE Computer, August 1998
- David Chaiken, John Kubiatowicz, and Anant Agarwal, "LimitLESS Directories: A Scalable Cache Coherence Scheme." Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IV), pages 224-234, April 1991.
Papers on Experimental Parallel Machines:
- Daniel Lenoski, James Laudon, Truman Joe, David Nakahira, Luis Stevens, Anoop Gupta, and John Hennessy, "The DASH Prototype: Implementation and Performance," Proceedings of the International Symposium on Computer Architecture, 1992.
- Michael D. Noakes, Deborah A. Wallach, and William J. Dally, "The J-Machine Multicomputer: An Architectural Evaluation," Proceedings of the 20th International Symposium on Computer Architecture, 1993.
- Steven Reinhardt, James Larus, and David Wood, "Tempest and Typhoon: User-Level Shared Memory," Proceedings of the International Symposium on Computer Architecture, 1994.
- Anant Agarwal, Ricardo Bianchini, David Chaiken, Kirk L. Johnson, David Kranz, John Kubiatowicz, Beng-Hong Lim, Kenneth Mackenzie, and Donald Yeung, "The MIT Alewife Machine: Architecture and Performance," Proceedings of the International Symposium on Computer Architecture, 1995
- Jeffrey Kuskin, David Ofelt, Mark Heinrich, John Heinlein, Richard Simoni, Kourosh Gharachorloo, John Chapin, David Nakahira, Joel Baxter, Mark Horowitz, Anoop Gupta, Mendel Rosenblum, and John Hennessy, "The Stanford FLASH Multiprocessor," Proceedings of the 21st International Symposium on Computer Architecture, pages 302-313, Chicago, IL, April 1994.
- Stefanos N. Damianakis, Angelos Bilas, Cezary Dubnicki, and Edward W. Felten, "Client-Server Computing on the SHRIMP Multicomputer." IEEE Micro 17(1):8-18, February 1997.
See Chapter 6 of "Computer Architecture: a Quantitative Approach, Second Edition" by Hennessy and Patterson, which covers much of the relevant material. Make sure to understand the basic performance metrics for disks and file systems as presented in this chapter. The chapter also presents queueing theory for M/G/1 queues in the context of disk I/O.
Students are expected to have a basic understanding of networking issues, the TCP/IP protocol stack, and fast networking interfaces (for parallel processors). In addition, students should be aware of the basic issues in deadlock avoidance for network routers.
General Networking:
- Chapter 7 of "Computer Architecture: a Quantitative Approach, Second Edition" by Hennessy and Patterson contains some discussion of networking.
- An understanding of TCP/IP and similar protocol issues can be gained from "Internetworking with TCP/IP, Second Edition" by D.E. Comer.
- In addition, "Parallel Computer Architecture: A Hardware/Software Approach" by David E. Culler and Jaswinder Pal Singh contains material on networking for parallel processors.
Network Router Design/Deadlock Avoidance:
- Dally, William J., "Performance Analysis of k-ary n-cube Interconnection Networks," IEEE Transactions on Computers, Vol. 39, No. 6, June 1990, pp. 775-785.
Network Interface Design:
- Thorsten von Eicken, David E. Culler, Seth Copen Goldstein, and Klaus Erik Schauser, "Active Messages: a Mechanism for Integrated Communication and Computation," Proceedings of the International Symposium on Computer Architecture, 1992.
- Kenneth Mackenzie, John Kubiatowicz, Anant Agarwal, and Frans Kaashoek. "Exploiting Two-Case Delivery for Fast Protected Messaging." Proceedings of the 4th Int'l Symposium on High Performance Computer Architecture, Feb. 1998.
- Shubhendu S. Mukherjee, Babak Falsafi, Mark D. Hill, and David A. Wood, "Coherent Network Interfaces for Fine-Grain Communication." Proceedings of the 23rd Annual International Symposium on Computer Architecture (ISCA), May 1996.
Students should understand the basic M/M/1 and M/G/1 queues, as well as simple Hamming codes.
Reading on Queueing Theory:
- Chapter 6 of "Computer Architecture: a Quantitative Approach, Second Edition" by Hennessy and Patterson contains some basic discussion of Queueing Theory. In that chapter, the M/G/1 queue is presented in the context of disk I/O.
- For a more in-depth discussion, see "Queueing Systems, Volume I: Theory" by Leonard Kleinrock.
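As a quick self-check on the queueing material, the following minimal Python sketch computes the mean time a request waits in queue (excluding its own service) using the standard Pollaczek-Khinchine result for an M/G/1 queue. The function names and the disk numbers in the usage note are illustrative assumptions, not taken from the readings; setting the squared coefficient of variation to 1 gives the M/M/1 case.

```python
def mg1_wait(arrival_rate, mean_service, scv):
    """Mean waiting time in queue (excluding service) for an M/G/1 queue.

    arrival_rate: Poisson arrival rate (requests per second)
    mean_service: mean service time (seconds)
    scv: squared coefficient of variation of the service time,
         i.e. variance / mean**2 (1.0 for exponential service)
    """
    rho = arrival_rate * mean_service  # server utilization
    if not 0 <= rho < 1:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    # Pollaczek-Khinchine mean-value formula
    return mean_service * rho * (1 + scv) / (2 * (1 - rho))


def mm1_wait(arrival_rate, mean_service):
    """M/M/1 is the special case with exponential service (scv = 1)."""
    return mg1_wait(arrival_rate, mean_service, 1.0)


if __name__ == "__main__":
    # Hypothetical disk: 50 requests/s, 10 ms mean service time.
    print(mm1_wait(50, 0.010))       # exponential service times
    print(mg1_wait(50, 0.010, 0.0))  # constant service times (M/D/1)
```

At 50% utilization the exponential-service case waits about one mean service time in queue, while constant service times halve that, which illustrates how service-time variability drives queueing delay.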
Reading on Error Correction:
- Any basic text on error correction will present the Hamming codes. One very straightforward book is "Error-Correcting Codes and Finite Fields. Student Edition", by Oliver Pretzel, Oxford University Press. This book provides a gentle introduction to basic Hamming codes as well as algebraic codes such as Reed-Solomon codes.
- M. Y. Hsiao, "A Class of Optimal Minimum Odd-weight-column SEC-DED Codes", IBM J. Res. Develop., vol. 14, no. 4, July 1970
- Shigeo Kaneda, "A Class of Odd-Weight-Column SEC-DED-SbED Codes for Memory System Applications", IEEE Transactions on Computers, vol. C-33, no. 8, August 1984
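For orientation on the simple Hamming codes mentioned above, here is a minimal Python sketch of the textbook (7,4) Hamming code with parity bits at positions 1, 2, and 4. The function names and bit ordering are assumptions chosen for illustration; this is the basic single-error-correcting code only, not the SEC-DED constructions from the Hsiao and Kaneda papers.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions are 1-indexed; parity bits occupy positions
    1, 2, and 4, and data bits occupy positions 3, 5, 6, and 7.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit.

    Returns (data_bits, error_position); error_position is 0 when the
    syndrome indicates no error, otherwise the 1-indexed flipped position.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # binary value = position of the error
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # inject a single-bit error at position 5
    print(hamming74_decode(word))
```

Because each position is checked by the parity bits whose indices sum to that position, the syndrome read as a binary number points directly at the flipped bit.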
Students are assumed to have a basic understanding of technology issues and tradeoffs. One good source of information is "Circuits, Interconnections, and Packaging for VLSI" by H. B. Bakoglu. However, this book contains some fairly advanced material and should not be considered "required reading" as a whole. A good basic text is "Application-Specific Integrated Circuits," by Michael John Sebastian Smith, Addison-Wesley, 1997. This text is online at: http://www-ee.eng.hawaii.edu/~msmith/ASICs/HTML/ASICs.htm. Students should have a basic understanding of technology scaling, wire resistance, basic memory technologies, and IC implementation alternatives, among other things. Students should also understand current trends in modern VLSI processor design, how advances in processing affect architecture, and how those advances are used to improve performance, cost, power, or all three.
Relevant Reading:
- For a basic understanding of IC implementation alternatives (full-custom, standard cells, FPGAs) and the tradeoffs among them in performance, cost, and power, see Chapter 1 of "Application-Specific Integrated Circuits". Also, Chapter 1 of "Computer Architecture: a Quantitative Approach, Second Edition" by Hennessy and Patterson contains IC cost modeling formulas.
- For a basic understanding of CMOS transistors, gate implementations, and wire and gate delay, see Chapters 2 and 3 of "Application-Specific Integrated Circuits".
- D.W. Dobberpuhl, et al., "A 200-MHz 64-b Dual-Issue CMOS Microprocessor," IEEE Journal of Solid-State Circuits, vol. 27, no. 11, November 1992.
- J. Montanaro, et al., "A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor," IEEE Journal of Solid-State Circuits, vol. 31, no. 11, November 1996.
- Subbarao Palacharla, Norman P. Jouppi, and J. E. Smith, "Complexity-Effective Superscalar Processors", Proceedings of the 24th International Symposium on Computer Architecture (ISCA-24), 1997