Computer Architecture Formulas
1. CPU time = Instruction count ! Clock cycles per instruction ! Clock cycle time
2. X is n times faster than Y: n =
3. Amdahl’s Law: Speedupoverall = =
4.
5.
6.
7. Availability = Mean time to fail / (Mean time to fail + Mean time to repair)
8.
where Wafer yield accounts for wafers that are so bad they need not be tested and is a parameter called the process-complexity factor, a measure of manufacturing difficulty. ranges from 11.5 to 15.5 in 2011.
9. Means—arithmetic (AM), weighted arithmetic (WAM), and geometric (GM):
AM = WAM = GM =
where Timei is the execution time for the ith program of a total of n in the workload, Weighti is the weighting of the ith program in the workload.
10. Average memory-access time = Hit time + Miss rate ! Miss penalty
11. Misses per instruction = Miss rate ! Memory access per instruction
12. Cache index size: 2index = Cache size /(Block size ! Set associativity)
13. Power Utilization Effectiveness (PUE) of a Warehouse Scale Computer =
Rules of Thumb
1. Amdahl/Case Rule: A balanced computer system needs about 1 MB of main memory capacity and 1 megabit per second of I/O bandwidth per MIPS of CPU performance.
2. 90/10 Locality Rule: A program executes about 90% of its instructions in 10% of its code.
3. Bandwidth Rule: Bandwidth grows by at least the square of the improvement in latency.
4. 2:1 Cache Rule: The miss rate of a direct-mapped cache of size N is about the same as a two-way set- associative cache of size N/2.
5. Dependability Rule: Design with no single point of failure.
6. Watt-Year Rule: The fully burdened cost of a Watt per year in a Warehouse Scale Computer in North America in 2011, including the cost of amortizing the power and cooling infrastructure, is about $2.
Execution timeY Execution timeX/ PerformanceX PerformanceY/=
Execution timeold Execution timenew -------------------------------------------
1
1 Fractionenhanced–# ) Fractionenhanced Speedupenhanced ------------------------------------+
---------------------------------------------------------------------------------------------
Energydynamic 1 2/ Capacitive load Voltage 2
!!
Powerdynamic 1 2/ Capacitive load! Voltage 2 Frequency switched! !
Powerstatic Currentstatic Voltage!
Die yield Wafer yield 1 1 Defects per unit area Die area!+ )(/ N!=
1 n --- Timei
i 1=
n
Weighti Timei!
i 1=
n n Timei
i 1=
n
Total Facility Power IT Equipment Power --------------------------------------------------
(
N N
In Praise of Computer Architecture: A Quantitative Approach Sixth Edition
“Although important concepts of architecture are timeless, this edition has been thoroughly updated with the latest technology developments, costs, examples, and references. Keeping pace with recent developments in open-sourced architec- ture, the instruction set architecture used in the book has been updated to use the RISC-V ISA.”
—from the foreword by Norman P. Jouppi, Google
“Computer Architecture: A Quantitative Approach is a classic that, like fine wine, just keeps getting better. I bought my first copy as I finished up my undergraduate degree and it remains one of my most frequently referenced texts today.”
—James Hamilton, Amazon Web Service
“Hennessy and Patterson wrote the first edition of this book when graduate stu- dents built computers with 50,000 transistors. Today, warehouse-size computers contain that many servers, each consisting of dozens of independent processors and billions of transistors. The evolution of computer architecture has been rapid and relentless, butComputer Architecture: A Quantitative Approach has kept pace, with each edition accurately explaining and analyzing the important emerging ideas that make this field so exciting.”
—James Larus, Microsoft Research
“Another timely and relevant update to a classic, once again also serving as a win- dow into the relentless and exciting evolution of computer architecture! The new discussions in this edition on the slowing of Moore's law and implications for future systems are must-reads for both computer architects and practitioners working on broader systems.”
—Parthasarathy (Partha) Ranganathan, Google
“I love the ‘Quantitative Approach’ books because they are written by engineers, for engineers. John Hennessy and Dave Patterson show the limits imposed by mathematics and the possibilities enabled by materials science. Then they teach through real-world examples how architects analyze, measure, and compromise to build working systems. This sixth edition comes at a critical time: Moore’s Law is fading just as deep learning demands unprecedented compute cycles. The new chapter on domain-specific architectures documents a number of prom- ising approaches and prophesies a rebirth in computer architecture. Like the scholars of the European Renaissance, computer architects must understand our own history, and then combine the lessons of that history with new techniques to remake the world.”
—Cliff Young, Google
She Zinan
This page intentionally left blank
Computer Architecture A Quantitative Approach
Sixth Edition
John L. Hennessy is a Professor of Electrical Engineering and Computer Science at Stanford University, where he has been a member of the faculty since 1977 and was, from 2000 to 2016, its 10th President. He currently serves as the Director of the Knight-Hennessy Fellow- ship, which provides graduate fellowships to potential future leaders. Hennessy is a Fellow of the IEEE and ACM, a member of the National Academy of Engineering, the National Acad- emy of Science, and the American Philosophical Society, and a Fellow of the American Acad- emy of Arts and Sciences. Among his many awards are the 2001 Eckert-Mauchly Award for his contributions to RISC technology, the 2001 Seymour Cray Computer Engineering Award, and the 2000 John von Neumann Award, which he shared with David Patterson. He has also received 10 honorary doctorates.
In 1981, he started the MIPS project at Stanford with a handful of graduate students. After completing the project in 1984, he took a leave from the university to cofound MIPS Com- puter Systems, which developed one of the first commercial RISC microprocessors. As of 2017, over 5 billion MIPS microprocessors have been shipped in devices ranging from video games and palmtop computers to laser printers and network switches. Hennessy subse- quently led the DASH (Director Architecture for Shared Memory) project, which prototyped the first scalable cache coherent multiprocessor; many of the key ideas have been adopted in modern multiprocessors. In addition to his technical activities and university responsibil- ities, he has continued to work with numerous start-ups, both as an early-stage advisor and an investor.
David A. Patterson became a Distinguished Engineer at Google in 2016 after 40 years as a UC Berkeley professor. He joined UC Berkeley immediately after graduating from UCLA. He still spends a day a week in Berkeley as an Emeritus Professor of Computer Science. His teaching has been honored by the Distinguished Teaching Award from the University of California, the Karlstrom Award from ACM, and the Mulligan Education Medal and Under- graduate Teaching Award from IEEE. Patterson received the IEEE Technical Achievement Award and the ACM Eckert-Mauchly Award for contributions to RISC, and he shared the IEEE Johnson Information Storage Award for contributions to RAID. He also shared the IEEE John von NeumannMedal and the C & C Prize with John Hennessy. Like his co-author, Patterson is a Fellow of the American Academy of Arts and Sciences, the Computer History Museum, ACM, and IEEE, and he was elected to the National Academy of Engineering, the National Academy of Sciences, and the Silicon Valley Engineering Hall of Fame. He served on the Information Technology Advisory Committee to the President of the United States, as chair of the CS division in the Berkeley EECS department, as chair of the Computing Research Association, and as President of ACM. This record led to Distinguished Service Awards from ACM, CRA, and SIGARCH. He is currently Vice-Chair of the Board of Directors of the RISC-V Foundation.
At Berkeley, Patterson led the design and implementation of RISC I, likely the first VLSI reduced instruction set computer, and the foundation of the commercial SPARC architec- ture. He was a leader of the Redundant Arrays of Inexpensive Disks (RAID) project, which led to dependable storage systems frommany companies. He was also involved in the Network of Workstations (NOW) project, which led to cluster technology used by Internet companies and later to cloud computing. His current interests are in designing domain-specific archi- tectures for machine learning, spreading the word on the open RISC-V instruction set archi- tecture, and in helping the UC Berkeley RISELab (Real-time Intelligent Secure Execution).
Computer Architecture A Quantitative Approach
Sixth Edition
John L. Hennessy Stanford University
David A. Patterson University of California, Berkeley
With Contributions by
Krste Asanovi!c University of California, Berkeley Jason D. Bakos University of South Carolina Robert P. Colwell R&E Colwell & Assoc. Inc. Abhishek Bhattacharjee Rutgers University Thomas M. Conte Georgia Tech Jos!e Duato Proemisa Diana Franklin University of Chicago David Goldberg eBay
Norman P. Jouppi Google Sheng Li Intel Labs Naveen Muralimanohar HP Labs Gregory D. Peterson University of Tennessee Timothy M. Pinkston University of Southern California Parthasarathy Ranganathan Google David A. Wood University of Wisconsin–Madison Cliff Young Google Amr Zaky University of Santa Clara
Morgan Kaufmann is an imprint of Elsevier 50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
© 2019 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data A catalog record for this book is available from the Library of Congress
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library
ISBN: 978-0-12-811905-1
For information on all Morgan Kaufmann publications visit our website at https://www.elsevier.com/books-and-journals
Publisher: Katey Birtcher Acquisition Editor: Stephen Merken Developmental Editor: Nate McFadden Production Project Manager: Stalin Viswanathan Cover Designer: Christian J. Bilbow
Typeset by SPi Global, India
http://www.elsevier.com/permissions
https://www.elsevier.com/books-and-journals
To Andrea, Linda, and our four sons
This page intentionally left blank
Foreword
by Norman P. Jouppi, Google
Much of the improvement in computer performance over the last 40 years has been provided by computer architecture advancements that have leveraged Moore’s Law and Dennard scaling to build larger and more parallel systems. Moore’s Law is the observation that the maximum number of transistors in an integrated circuit doubles approximately every two years. Dennard scaling refers to the reduc- tion of MOS supply voltage in concert with the scaling of feature sizes, so that as transistors get smaller, their power density stays roughly constant. With the end of Dennard scaling a decade ago, and the recent slowdown of Moore’s Law due to a combination of physical limitations and economic factors, the sixth edition of the preeminent textbook for our field couldn’t be more timely. Here are some reasons.
First, because domain-specific architectures can provide equivalent perfor- mance and power benefits of three or more historical generations of Moore’s Law and Dennard scaling, they now can provide better implementations than may ever be possible with future scaling of general-purpose architectures. And with the diverse application space of computers today, there are many potential areas for architectural innovation with domain-specific architectures. Second, high-quality implementations of open-source architectures now have a much lon- ger lifetime due to the slowdown in Moore’s Law. This gives them more oppor- tunities for continued optimization and refinement, and hence makes them more attractive. Third, with the slowing of Moore’s Law, different technology compo- nents have been scaling heterogeneously. Furthermore, new technologies such as 2.5D stacking, new nonvolatile memories, and optical interconnects have been developed to provide more than Moore’s Law can supply alone. To use these new technologies and nonhomogeneous scaling effectively, fundamental design decisions need to be reexamined from first principles. Hence it is important for students, professors, and practitioners in the industry to be skilled in a wide range of both old and new architectural techniques. All told, I believe this is the most exciting time in computer architecture since the industrial exploitation of instruction-level parallelism in microprocessors 25 years ago.
The largest change in this edition is the addition of a new chapter on domain- specific architectures. It’s long been known that customized domain-specific archi- tectures can have higher performance, lower power, and require less silicon area than general-purpose processor implementations. However when general-purpose
ix
processors were increasing in single-threaded performance by 40% per year (see Fig. 1.11), the extra time to market required to develop a custom architecture vs. using a leading-edge standard microprocessor could cause the custom architecture to lose much of its advantage. In contrast, today single-core performance is improving very slowly, meaning that the benefits of custom architectures will not be made obsolete by general-purpose processors for a very long time, if ever. Chapter 7 covers several domain-specific architectures. Deep neural networks have very high computation requirements but lower data precision requirements – this combination can benefit significantly from custom architectures. Two example architectures and implementations for deep neural networks are presented: one optimized for inference and a second optimized for training. Image processing is another example domain; it also has high computation demands and benefits from lower-precision data types. Furthermore, since it is often found in mobile devices, the power savings from custom architectures are also very valuable. Finally, by nature of their reprogrammability, FPGA-based accelerators can be used to implement a variety of different domain-specific architectures on a single device. They also can benefit more irregular applications that are frequently updated, like accelerating internet search.
Although important concepts of architecture are timeless, this edition has been thoroughly updated with the latest technology developments, costs, examples, and references. Keeping pace with recent developments in open-sourced architecture, the instruction set architecture used in the book has been updated to use the RISC-V ISA.
On a personal note, after enjoying the privilege of working with John as a grad- uate student, I am now enjoying the privilege of working with Dave at Google. What an amazing duo!
x ■ Foreword
Contents
Foreword ix
Preface xvii
Acknowledgments xxv
Chapter 1 Fundamentals of Quantitative Design and Analysis
1.1 Introduction 2 1.2 Classes of Computers 6 1.3 Defining Computer Architecture 11 1.4 Trends in Technology 18 1.5 Trends in Power and Energy in Integrated Circuits 23 1.6 Trends in Cost 29 1.7 Dependability 36 1.8 Measuring, Reporting, and Summarizing Performance 39 1.9 Quantitative Principles of Computer Design 48 1.10 Putting It All Together: Performance, Price, and Power 55 1.11 Fallacies and Pitfalls 58 1.12 Concluding Remarks 64 1.13 Historical Perspectives and References 67