
Computer Architecture Formulas

1. CPU time = Instruction count × Clock cycles per instruction × Clock cycle time

2. X is n times faster than Y: n = Execution time_Y / Execution time_X = Performance_X / Performance_Y

3. Amdahl's Law: Speedup_overall = Execution time_old / Execution time_new = 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)

4. Energy_dynamic ∝ 1/2 × Capacitive load × Voltage²

5. Power_dynamic ∝ 1/2 × Capacitive load × Voltage² × Frequency switched

6. Power_static ∝ Current_static × Voltage

7. Availability = Mean time to fail / (Mean time to fail + Mean time to repair)

8. Die yield = Wafer yield × 1 / (1 + Defects per unit area × Die area)^N

where Wafer yield accounts for wafers that are so bad they need not be tested, and N is a parameter called the process-complexity factor, a measure of manufacturing difficulty. N ranged from 11.5 to 15.5 in 2011.
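Applied numerically, these formulas are one-liners. A minimal Python sketch (the function names are my own, not from the book):

```python
def cpu_time(instruction_count, cpi, clock_cycle_time):
    # Formula 1: CPU time = instruction count x CPI x clock cycle time.
    return instruction_count * cpi * clock_cycle_time

def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    # Formula 3: overall speedup when only a fraction of the task is enhanced.
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

def availability(mttf, mttr):
    # Formula 7: mean time to fail over total time between failures.
    return mttf / (mttf + mttr)

def die_yield(wafer_yield, defects_per_area, die_area, n=12.0):
    # Formula 8; n is the process-complexity factor (11.5-15.5 in 2011).
    return wafer_yield * (1.0 / (1.0 + defects_per_area * die_area) ** n)

# Enhancing 40% of a task by 10x speeds up the whole task by only ~1.56x.
print(round(amdahl_speedup(0.4, 10), 2))  # 1.56
```

Note how Amdahl's Law caps the benefit: even an infinite speedup on 40% of the task can never exceed 1/0.6 ≈ 1.67x overall.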

9. Means—arithmetic (AM), weighted arithmetic (WAM), and geometric (GM):

AM = (1/n) × Σ_{i=1..n} Time_i
WAM = Σ_{i=1..n} Weight_i × Time_i
GM = (∏_{i=1..n} Time_i)^(1/n)

where Time_i is the execution time for the ith program of a total of n in the workload, and Weight_i is the weighting of the ith program in the workload.
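A short sketch of the three means over a workload's execution times (illustrative code, not from the book):

```python
from math import prod

def arithmetic_mean(times):
    return sum(times) / len(times)

def weighted_arithmetic_mean(times, weights):
    # Weights are assumed to sum to 1.
    return sum(w * t for w, t in zip(weights, times))

def geometric_mean(times):
    return prod(times) ** (1.0 / len(times))

times = [1.0, 10.0]
print(arithmetic_mean(times))                        # 5.5
print(geometric_mean(times))                         # ~3.162 (sqrt of 10)
print(weighted_arithmetic_mean(times, [0.9, 0.1]))   # 1.9
```

The geometric mean is the one used to summarize SPEC ratios, because it is independent of which machine is chosen as the reference.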

10. Average memory-access time = Hit time + Miss rate × Miss penalty

11. Misses per instruction = Miss rate × Memory accesses per instruction

12. Cache index size: 2^index = Cache size / (Block size × Set associativity)

13. Power Utilization Effectiveness (PUE) of a Warehouse Scale Computer = Total Facility Power / IT Equipment Power
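The memory-hierarchy and PUE formulas, sketched in Python with hypothetical numbers (my own function names):

```python
from math import log2

def amat(hit_time, miss_rate, miss_penalty):
    # Formula 10: average memory-access time, in the same unit as hit_time.
    return hit_time + miss_rate * miss_penalty

def cache_index_bits(cache_size, block_size, associativity):
    # Formula 12: 2^index = cache size / (block size x set associativity).
    return int(log2(cache_size / (block_size * associativity)))

def pue(total_facility_power, it_equipment_power):
    # Formula 13: PUE >= 1; closer to 1 means less power overhead.
    return total_facility_power / it_equipment_power

# A 32 KiB, 4-way set-associative cache with 64-byte blocks has a 7-bit index.
print(cache_index_bits(32 * 1024, 64, 4))  # 7
# 1-cycle hit, 5% miss rate, 20-cycle penalty -> 2 cycles on average.
print(amat(1, 0.05, 20))                   # 2.0
```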

Rules of Thumb

1. Amdahl/Case Rule: A balanced computer system needs about 1 MB of main memory capacity and 1 megabit per second of I/O bandwidth per MIPS of CPU performance.

2. 90/10 Locality Rule: A program executes about 90% of its instructions in 10% of its code.

3. Bandwidth Rule: Bandwidth grows by at least the square of the improvement in latency.

4. 2:1 Cache Rule: The miss rate of a direct-mapped cache of size N is about the same as a two-way set-associative cache of size N/2.

5. Dependability Rule: Design with no single point of failure.

6. Watt-Year Rule: The fully burdened cost of a Watt per year in a Warehouse Scale Computer in North America in 2011, including the cost of amortizing the power and cooling infrastructure, is about $2.


In Praise of Computer Architecture: A Quantitative Approach Sixth Edition

“Although important concepts of architecture are timeless, this edition has been thoroughly updated with the latest technology developments, costs, examples, and references. Keeping pace with recent developments in open-sourced architecture, the instruction set architecture used in the book has been updated to use the RISC-V ISA.”

—from the foreword by Norman P. Jouppi, Google

“Computer Architecture: A Quantitative Approach is a classic that, like fine wine, just keeps getting better. I bought my first copy as I finished up my undergraduate degree and it remains one of my most frequently referenced texts today.”

—James Hamilton, Amazon Web Services

“Hennessy and Patterson wrote the first edition of this book when graduate students built computers with 50,000 transistors. Today, warehouse-size computers contain that many servers, each consisting of dozens of independent processors and billions of transistors. The evolution of computer architecture has been rapid and relentless, but Computer Architecture: A Quantitative Approach has kept pace, with each edition accurately explaining and analyzing the important emerging ideas that make this field so exciting.”

—James Larus, Microsoft Research

“Another timely and relevant update to a classic, once again also serving as a window into the relentless and exciting evolution of computer architecture! The new discussions in this edition on the slowing of Moore's law and implications for future systems are must-reads for both computer architects and practitioners working on broader systems.”

—Parthasarathy (Partha) Ranganathan, Google

“I love the ‘Quantitative Approach’ books because they are written by engineers, for engineers. John Hennessy and Dave Patterson show the limits imposed by mathematics and the possibilities enabled by materials science. Then they teach through real-world examples how architects analyze, measure, and compromise to build working systems. This sixth edition comes at a critical time: Moore’s Law is fading just as deep learning demands unprecedented compute cycles. The new chapter on domain-specific architectures documents a number of promising approaches and prophesies a rebirth in computer architecture. Like the scholars of the European Renaissance, computer architects must understand our own history, and then combine the lessons of that history with new techniques to remake the world.”

—Cliff Young, Google


Computer Architecture A Quantitative Approach

Sixth Edition

John L. Hennessy is a Professor of Electrical Engineering and Computer Science at Stanford University, where he has been a member of the faculty since 1977 and was, from 2000 to 2016, its 10th President. He currently serves as the Director of the Knight-Hennessy Fellowship, which provides graduate fellowships to potential future leaders. Hennessy is a Fellow of the IEEE and ACM, a member of the National Academy of Engineering, the National Academy of Science, and the American Philosophical Society, and a Fellow of the American Academy of Arts and Sciences. Among his many awards are the 2001 Eckert-Mauchly Award for his contributions to RISC technology, the 2001 Seymour Cray Computer Engineering Award, and the 2000 John von Neumann Award, which he shared with David Patterson. He has also received 10 honorary doctorates.

In 1981, he started the MIPS project at Stanford with a handful of graduate students. After completing the project in 1984, he took a leave from the university to cofound MIPS Computer Systems, which developed one of the first commercial RISC microprocessors. As of 2017, over 5 billion MIPS microprocessors have been shipped in devices ranging from video games and palmtop computers to laser printers and network switches. Hennessy subsequently led the DASH (Directory Architecture for Shared Memory) project, which prototyped the first scalable cache coherent multiprocessor; many of the key ideas have been adopted in modern multiprocessors. In addition to his technical activities and university responsibilities, he has continued to work with numerous start-ups, both as an early-stage advisor and an investor.

David A. Patterson became a Distinguished Engineer at Google in 2016 after 40 years as a UC Berkeley professor. He joined UC Berkeley immediately after graduating from UCLA. He still spends a day a week in Berkeley as an Emeritus Professor of Computer Science. His teaching has been honored by the Distinguished Teaching Award from the University of California, the Karlstrom Award from ACM, and the Mulligan Education Medal and Undergraduate Teaching Award from IEEE. Patterson received the IEEE Technical Achievement Award and the ACM Eckert-Mauchly Award for contributions to RISC, and he shared the IEEE Johnson Information Storage Award for contributions to RAID. He also shared the IEEE John von Neumann Medal and the C & C Prize with John Hennessy. Like his co-author, Patterson is a Fellow of the American Academy of Arts and Sciences, the Computer History Museum, ACM, and IEEE, and he was elected to the National Academy of Engineering, the National Academy of Sciences, and the Silicon Valley Engineering Hall of Fame. He served on the Information Technology Advisory Committee to the President of the United States, as chair of the CS division in the Berkeley EECS department, as chair of the Computing Research Association, and as President of ACM. This record led to Distinguished Service Awards from ACM, CRA, and SIGARCH. He is currently Vice-Chair of the Board of Directors of the RISC-V Foundation.

At Berkeley, Patterson led the design and implementation of RISC I, likely the first VLSI reduced instruction set computer, and the foundation of the commercial SPARC architecture. He was a leader of the Redundant Arrays of Inexpensive Disks (RAID) project, which led to dependable storage systems from many companies. He was also involved in the Network of Workstations (NOW) project, which led to cluster technology used by Internet companies and later to cloud computing. His current interests are in designing domain-specific architectures for machine learning, spreading the word on the open RISC-V instruction set architecture, and in helping the UC Berkeley RISELab (Real-time Intelligent Secure Execution).

Computer Architecture A Quantitative Approach

Sixth Edition

John L. Hennessy Stanford University

David A. Patterson University of California, Berkeley

With Contributions by

Krste Asanović, University of California, Berkeley
Jason D. Bakos, University of South Carolina
Robert P. Colwell, R&E Colwell & Assoc. Inc.
Abhishek Bhattacharjee, Rutgers University
Thomas M. Conte, Georgia Tech
José Duato, Proemisa
Diana Franklin, University of Chicago
David Goldberg, eBay

Norman P. Jouppi, Google
Sheng Li, Intel Labs
Naveen Muralimanohar, HP Labs
Gregory D. Peterson, University of Tennessee
Timothy M. Pinkston, University of Southern California
Parthasarathy Ranganathan, Google
David A. Wood, University of Wisconsin–Madison
Cliff Young, Google
Amr Zaky, University of Santa Clara

Morgan Kaufmann is an imprint of Elsevier
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

© 2019 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-811905-1

For information on all Morgan Kaufmann publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Katey Birtcher
Acquisition Editor: Stephen Merken
Developmental Editor: Nate McFadden
Production Project Manager: Stalin Viswanathan
Cover Designer: Christian J. Bilbow

Typeset by SPi Global, India

To Andrea, Linda, and our four sons


Foreword

by Norman P. Jouppi, Google

Much of the improvement in computer performance over the last 40 years has been provided by computer architecture advancements that have leveraged Moore’s Law and Dennard scaling to build larger and more parallel systems. Moore’s Law is the observation that the maximum number of transistors in an integrated circuit doubles approximately every two years. Dennard scaling refers to the reduction of MOS supply voltage in concert with the scaling of feature sizes, so that as transistors get smaller, their power density stays roughly constant. With the end of Dennard scaling a decade ago, and the recent slowdown of Moore’s Law due to a combination of physical limitations and economic factors, the sixth edition of the preeminent textbook for our field couldn’t be more timely. Here are some reasons.

First, because domain-specific architectures can provide equivalent performance and power benefits of three or more historical generations of Moore’s Law and Dennard scaling, they now can provide better implementations than may ever be possible with future scaling of general-purpose architectures. And with the diverse application space of computers today, there are many potential areas for architectural innovation with domain-specific architectures. Second, high-quality implementations of open-source architectures now have a much longer lifetime due to the slowdown in Moore’s Law. This gives them more opportunities for continued optimization and refinement, and hence makes them more attractive. Third, with the slowing of Moore’s Law, different technology components have been scaling heterogeneously. Furthermore, new technologies such as 2.5D stacking, new nonvolatile memories, and optical interconnects have been developed to provide more than Moore’s Law can supply alone. To use these new technologies and nonhomogeneous scaling effectively, fundamental design decisions need to be reexamined from first principles. Hence it is important for students, professors, and practitioners in the industry to be skilled in a wide range of both old and new architectural techniques. All told, I believe this is the most exciting time in computer architecture since the industrial exploitation of instruction-level parallelism in microprocessors 25 years ago.

The largest change in this edition is the addition of a new chapter on domain-specific architectures. It’s long been known that customized domain-specific architectures can have higher performance, lower power, and require less silicon area than general-purpose processor implementations. However, when general-purpose processors were increasing in single-threaded performance by 40% per year (see Fig. 1.11), the extra time to market required to develop a custom architecture vs. using a leading-edge standard microprocessor could cause the custom architecture to lose much of its advantage. In contrast, today single-core performance is improving very slowly, meaning that the benefits of custom architectures will not be made obsolete by general-purpose processors for a very long time, if ever.

Chapter 7 covers several domain-specific architectures. Deep neural networks have very high computation requirements but lower data precision requirements – this combination can benefit significantly from custom architectures. Two example architectures and implementations for deep neural networks are presented: one optimized for inference and a second optimized for training. Image processing is another example domain; it also has high computation demands and benefits from lower-precision data types. Furthermore, since it is often found in mobile devices, the power savings from custom architectures are also very valuable. Finally, by nature of their reprogrammability, FPGA-based accelerators can be used to implement a variety of different domain-specific architectures on a single device. They also can benefit more irregular applications that are frequently updated, like accelerating internet search.

Although important concepts of architecture are timeless, this edition has been thoroughly updated with the latest technology developments, costs, examples, and references. Keeping pace with recent developments in open-sourced architecture, the instruction set architecture used in the book has been updated to use the RISC-V ISA.

On a personal note, after enjoying the privilege of working with John as a graduate student, I am now enjoying the privilege of working with Dave at Google. What an amazing duo!


Contents

Foreword ix

Preface xvii

Acknowledgments xxv

Chapter 1 Fundamentals of Quantitative Design and Analysis

1.1 Introduction 2
1.2 Classes of Computers 6
1.3 Defining Computer Architecture 11
1.4 Trends in Technology 18
1.5 Trends in Power and Energy in Integrated Circuits 23
1.6 Trends in Cost 29
1.7 Dependability 36
1.8 Measuring, Reporting, and Summarizing Performance 39
1.9 Quantitative Principles of Computer Design 48
1.10 Putting It All Together: Performance, Price, and Power 55
1.11 Fallacies and Pitfalls 58
1.12 Concluding Remarks 64
1.13 Historical Perspectives and References 67

Case Studies and Exercises by Diana Franklin 67

Chapter 2 Memory Hierarchy Design

2.1 Introduction 78
2.2 Memory Technology and Optimizations 84
2.3 Ten Advanced Optimizations of Cache Performance 94
2.4 Virtual Memory and Virtual Machines 118
2.5 Cross-Cutting Issues: The Design of Memory Hierarchies 126
2.6 Putting It All Together: Memory Hierarchies in the ARM Cortex-A53 and Intel Core i7 6700 129
2.7 Fallacies and Pitfalls 142
2.8 Concluding Remarks: Looking Ahead 146
2.9 Historical Perspectives and References 148


Case Studies and Exercises by Norman P. Jouppi, Rajeev Balasubramonian, Naveen Muralimanohar, and Sheng Li 148

Chapter 3 Instruction-Level Parallelism and Its Exploitation

3.1 Instruction-Level Parallelism: Concepts and Challenges 168
3.2 Basic Compiler Techniques for Exposing ILP 176
3.3 Reducing Branch Costs With Advanced Branch Prediction 182
3.4 Overcoming Data Hazards With Dynamic Scheduling 191
3.5 Dynamic Scheduling: Examples and the Algorithm 201
3.6 Hardware-Based Speculation 208
3.7 Exploiting ILP Using Multiple Issue and Static Scheduling 218
3.8 Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation 222
3.9 Advanced Techniques for Instruction Delivery and Speculation 228
3.10 Cross-Cutting Issues 240
3.11 Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor Throughput 242
3.12 Putting It All Together: The Intel Core i7 6700 and ARM Cortex-A53 247
3.13 Fallacies and Pitfalls 258
3.14 Concluding Remarks: What’s Ahead? 264
3.15 Historical Perspective and References 266

Case Studies and Exercises by Jason D. Bakos and Robert P. Colwell 266

Chapter 4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures

4.1 Introduction 282
4.2 Vector Architecture 283
4.3 SIMD Instruction Set Extensions for Multimedia 304
4.4 Graphics Processing Units 310
4.5 Detecting and Enhancing Loop-Level Parallelism 336
4.6 Cross-Cutting Issues 345
4.7 Putting It All Together: Embedded Versus Server GPUs and Tesla Versus Core i7 346
4.8 Fallacies and Pitfalls 353
4.9 Concluding Remarks 357
4.10 Historical Perspective and References 357

Case Study and Exercises by Jason D. Bakos 357

Chapter 5 Thread-Level Parallelism

5.1 Introduction 368
5.2 Centralized Shared-Memory Architectures 377
5.3 Performance of Symmetric Shared-Memory Multiprocessors 393
5.4 Distributed Shared-Memory and Directory-Based Coherence 404
5.5 Synchronization: The Basics 412
5.6 Models of Memory Consistency: An Introduction 417
5.7 Cross-Cutting Issues 422
5.8 Putting It All Together: Multicore Processors and Their Performance 426
5.9 Fallacies and Pitfalls 438
5.10 The Future of Multicore Scaling 442
5.11 Concluding Remarks 444
5.12 Historical Perspectives and References 445

Case Studies and Exercises by Amr Zaky and David A. Wood 446

Chapter 6 Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism

6.1 Introduction 466
6.2 Programming Models and Workloads for Warehouse-Scale Computers 471
6.3 Computer Architecture of Warehouse-Scale Computers 477
6.4 The Efficiency and Cost of Warehouse-Scale Computers 482
6.5 Cloud Computing: The Return of Utility Computing 490
6.6 Cross-Cutting Issues 501
6.7 Putting It All Together: A Google Warehouse-Scale Computer 503
6.8 Fallacies and Pitfalls 514
6.9 Concluding Remarks 518
6.10 Historical Perspectives and References 519

Case Studies and Exercises by Parthasarathy Ranganathan 519

Chapter 7 Domain-Specific Architectures

7.1 Introduction 540
7.2 Guidelines for DSAs 543
7.3 Example Domain: Deep Neural Networks 544
7.4 Google’s Tensor Processing Unit, an Inference Data Center Accelerator 557
7.5 Microsoft Catapult, a Flexible Data Center Accelerator 567
7.6 Intel Crest, a Data Center Accelerator for Training 579
7.7 Pixel Visual Core, a Personal Mobile Device Image Processing Unit 579
7.8 Cross-Cutting Issues 592
7.9 Putting It All Together: CPUs Versus GPUs Versus DNN Accelerators 595
7.10 Fallacies and Pitfalls 602
7.11 Concluding Remarks 604
7.12 Historical Perspectives and References 606

Case Studies and Exercises by Cliff Young 606


Appendix A Instruction Set Principles

A.1 Introduction A-2
A.2 Classifying Instruction Set Architectures A-3
A.3 Memory Addressing A-7
A.4 Type and Size of Operands A-13
A.5 Operations in the Instruction Set A-15
A.6 Instructions for Control Flow A-16
A.7 Encoding an Instruction Set A-21
A.8 Cross-Cutting Issues: The Role of Compilers A-24
A.9 Putting It All Together: The RISC-V Architecture A-33
A.10 Fallacies and Pitfalls A-42
A.11 Concluding Remarks A-46
A.12 Historical Perspective and References A-47

Exercises by Gregory D. Peterson A-47

Appendix B Review of Memory Hierarchy

B.1 Introduction B-2
B.2 Cache Performance B-15
B.3 Six Basic Cache Optimizations B-22
B.4 Virtual Memory B-40
B.5 Protection and Examples of Virtual Memory B-49
B.6 Fallacies and Pitfalls B-57
B.7 Concluding Remarks B-59
B.8 Historical Perspective and References B-59

Exercises by Amr Zaky B-60

Appendix C Pipelining: Basic and Intermediate Concepts

C.1 Introduction C-2
C.2 The Major Hurdle of Pipelining—Pipeline Hazards C-10
C.3 How Is Pipelining Implemented? C-26
C.4 What Makes Pipelining Hard to Implement? C-37
C.5 Extending the RISC-V Integer Pipeline to Handle Multicycle Operations C-45
C.6 Putting It All Together: The MIPS R4000 Pipeline C-55
C.7 Cross-Cutting Issues C-65
C.8 Fallacies and Pitfalls C-70
C.9 Concluding Remarks C-71
C.10 Historical Perspective and References C-71

Updated Exercises by Diana Franklin C-71


Online Appendices

Appendix D Storage Systems

Appendix E Embedded Systems by Thomas M. Conte

Appendix F Interconnection Networks by Timothy M. Pinkston and José Duato

Appendix G Vector Processors in More Depth by Krste Asanovic

Appendix H Hardware and Software for VLIW and EPIC

Appendix I Large-Scale Multiprocessors and Scientific Applications

Appendix J Computer Arithmetic by David Goldberg

Appendix K Survey of Instruction Set Architectures

Appendix L Advanced Concepts on Address Translation by Abhishek Bhattacharjee

Appendix M Historical Perspectives and References

References R-1

Index I-1



Preface

Why We Wrote This Book

Through six editions of this book, our goal has been to describe the basic principles underlying what will be tomorrow’s technological developments. Our excitement about the opportunities in computer architecture has not abated, and we echo what we said about the field in the first edition: “It is not a dreary science of paper machines that will never work. No! It’s a discipline of keen intellectual interest, requiring the balance of marketplace forces to cost-performance-power, leading to glorious failures and some notable successes.”

Our primary objective in writing our first book was to change the way people learn and think about computer architecture. We feel this goal is still valid and important. The field is changing daily and must be studied with real examples and measurements on real computers, rather than simply as a collection of definitions and designs that will never need to be realized. We offer an enthusiastic welcome to anyone who came along with us in the past, as well as to those who are joining us now. Either way, we can promise the same quantitative approach to, and analysis of, real systems.

As with earlier versions, we have strived to produce a new edition that will continue to be as relevant for professional engineers and architects as it is for those involved in advanced computer architecture and design courses. Like the first edition, this edition has a sharp focus on new platforms—personal mobile devices and warehouse-scale computers—and new architectures—specifically, domain-specific architectures. As much as its predecessors, this edition aims to demystify computer architecture through an emphasis on cost-performance-energy trade-offs and good engineering design. We believe that the field has continued to mature and move toward the rigorous quantitative foundation of long-established scientific and engineering disciplines.


This Edition

The ending of Moore’s Law and Dennard scaling is having as profound an effect on computer architecture as did the switch to multicore. We retain the focus on the extremes in size of computing, with personal mobile devices (PMDs) such as cell phones and tablets as the clients and warehouse-scale computers offering cloud computing as the server. We also maintain the other theme of parallelism in all its forms: data-level parallelism (DLP) in Chapters 1 and 4, instruction-level parallelism (ILP) in Chapter 3, thread-level parallelism in Chapter 5, and request-level parallelism (RLP) in Chapter 6.

The most pervasive change in this edition is switching from MIPS to the RISC-V instruction set. We suspect this modern, modular, open instruction set may become a significant force in the information technology industry. It may become as important in computer architecture as Linux is for operating systems.

The newcomer in this edition is Chapter 7, which introduces domain-specific architectures with several concrete examples from industry.

As before, the first three appendices in the book give basics on the RISC-V instruction set, memory hierarchy, and pipelining for readers who have not read a book like Computer Organization and Design. To keep costs down but still supply supplemental material that is of interest to some readers, nine more appendices are available online at https://www.elsevier.com/books-and-journals/book-companion/9780128119051. There are more pages in these appendices than there are in this book!

This edition continues the tradition of using real-world examples to demonstrate the ideas, and the “Putting It All Together” sections are brand new. The “Putting It All Together” sections of this edition include the pipeline organizations and memory hierarchies of the ARM Cortex A8 processor, the Intel Core i7 processor, the NVIDIA GTX-280 and GTX-480 GPUs, and one of the Google warehouse-scale computers.

Topic Selection and Organization

As before, we have taken a conservative approach to topic selection, for there are many more interesting ideas in the field than can reasonably be covered in a treatment of basic principles. We have steered away from a comprehensive survey of every architecture a reader might encounter. Instead, our presentation focuses on core concepts likely to be found in any new machine. The key criterion remains that of selecting ideas that have been examined and utilized successfully enough to permit their discussion in quantitative terms.

Our intent has always been to focus on material that is not available in equivalent form from other sources, so we continue to emphasize advanced content wherever possible. Indeed, there are several systems here whose descriptions cannot be found in the literature. (Readers interested strictly in a more basic introduction to computer architecture should read Computer Organization and Design: The Hardware/Software Interface.)


An Overview of the Content

Chapter 1 includes formulas for energy, static power, dynamic power, integrated circuit costs, reliability, and availability. (These formulas are also found on the front inside cover.) Our hope is that these topics can be used through the rest of the book. In addition to the classic quantitative principles of computer design and performance measurement, it shows the slowing of performance improvement of general-purpose microprocessors, which is one inspiration for domain-specific architectures.

Our view is that the instruction set architecture is playing less of a role today than in 1990, so we moved this material to Appendix A. It now uses the RISC-V architecture. (For quick review, a summary of the RISC-V ISA can be found on the back inside cover.) For fans of ISAs, Appendix K was revised for this edition and covers 8 RISC architectures (5 for desktop and server use and 3 for embedded use), the 80x86, the DEC VAX, and the IBM 360/370.

We then move on to memory hierarchy in Chapter 2, since it is easy to apply the cost-performance-energy principles to this material, and memory is a critical resource for the rest of the chapters. As in the past edition, Appendix B contains an introductory review of cache principles, which is available in case you need it. Chapter 2 discusses 10 advanced optimizations of caches. The chapter includes virtual machines, which offer advantages in protection, software management, and hardware management, and play an important role in cloud computing. In addition to covering SRAM and DRAM technologies, the chapter includes new material both on Flash memory and on the use of stacked die packaging for extending the memory hierarchy. The PIAT examples are the ARM Cortex A8, which is used in PMDs, and the Intel Core i7, which is used in servers.

Chapter 3 covers the exploitation of instruction-level parallelism in high-performance processors, including superscalar execution, branch prediction (including the new tagged hybrid predictors), speculation, dynamic scheduling, and simultaneous multithreading. As mentioned earlier, Appendix C is a review of pipelining in case you need it. Chapter 3 also surveys the limits of ILP. Like Chapter 2, the PIAT examples are again the ARM Cortex A8 and the Intel Core i7. While the third edition contained a great deal on Itanium and VLIW, this material is now in Appendix H, indicating our view that this architecture did not live up to the earlier claims.

The increasing importance of multimedia applications such as games and video processing has also increased the importance of architectures that can exploit data-level parallelism. In particular, there is a rising interest in computing using graphical processing units (GPUs), yet few architects understand how GPUs really work. We decided to write a new chapter in large part to unveil this new style of computer architecture. Chapter 4 starts with an introduction to vector architectures, which acts as a foundation on which to build explanations of multimedia SIMD instruction set extensions and GPUs. (Appendix G goes into even more depth on vector architectures.) This chapter introduces the Roofline performance model and then uses it to compare the Intel Core i7 and the NVIDIA GTX 280 and GTX 480 GPUs. The chapter also describes the Tegra 2 GPU for PMDs.
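The Roofline model mentioned above bounds attainable throughput by the lesser of peak compute rate and peak memory bandwidth times arithmetic intensity. A minimal sketch, using made-up machine parameters rather than figures from the book:

```python
def roofline_gflops(arith_intensity, peak_gflops, peak_bw_gbs):
    # Attainable GFLOP/s = min(peak compute, memory bandwidth x flops-per-byte).
    return min(peak_gflops, peak_bw_gbs * arith_intensity)

# Hypothetical machine: 100 GFLOP/s peak compute, 25 GB/s memory bandwidth.
# Below the ridge point (4 flops/byte here) a kernel is memory-bound;
# above it, compute-bound.
print(roofline_gflops(1.0, 100, 25))   # 25.0 (memory-bound)
print(roofline_gflops(8.0, 100, 25))   # 100 (compute-bound)
```

Plotting this bound against arithmetic intensity on log-log axes gives the characteristic slanted-then-flat "roofline" the chapter uses to compare machines.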
Chapter 5 describes multicore processors. It explores symmetric and distributed-memory architectures, examining both organizational principles and performance. The primary additions to this chapter include more comparison of multicore organizations, including the organization of multicore-multilevel caches, multicore coherence schemes, and on-chip multicore interconnect. Topics in synchronization and memory consistency models are next. The example is the Intel Core i7. Readers interested in more depth on interconnection networks should read Appendix F, and those interested in larger scale multiprocessors and scientific applications should read Appendix I.

Chapter 6 describes warehouse-scale computers (WSCs). It was extensively revised based on help from engineers at Google and Amazon Web Services. This chapter integrates details on design, cost, and performance of WSCs that few architects are aware of. It starts with the popular MapReduce programming model before describing the architecture and physical implementation of WSCs, including cost. The costs allow us to explain the emergence of cloud computing, whereby it can be cheaper to compute using WSCs in the cloud than in your local datacenter. The PIAT example is a description of a Google WSC that includes information published for the first time in this book.
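The MapReduce model that opens Chapter 6 splits a computation into a map phase that emits key-value pairs and a reduce phase that combines all pairs sharing a key. A toy single-process sketch of the classic word-count example (real MapReduce frameworks distribute these phases across thousands of WSC nodes):

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    """Reduce: group pairs by key and sum the counts per word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["the cloud", "the warehouse scale computer"]
pairs = [p for d in docs for p in map_phase(d)]
print(reduce_phase(pairs))  # -> {'the': 2, 'cloud': 1, ...}
```

The appeal for WSCs is that both phases are embarrassingly parallel: map tasks run independently per input split, and reduce tasks run independently per key partition.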
