Concepts of Programming Languages: Building a Scanner
Programming Language Pragmatics
FOURTH EDITION

Michael L. Scott
Department of Computer Science
University of Rochester
Morgan Kaufmann is an imprint of Elsevier, 225 Wyman Street, Waltham, MA 02451, USA
British Library Cataloguing-in-Publication Data: A catalogue record for this book is available from the British Library.
Library of Congress Cataloging-in-Publication Data: A catalog record for this book is available from the Library of Congress.
For information on all MK publications visit our website at http://store.elsevier.com/
ISBN: 978-0-12-410409-9
Copyright © 2016, 2009, 2006, 1999 Elsevier Inc.
About the Author
Michael L. Scott is a professor and past chair of the Department of Computer Science at the University of Rochester. He received his Ph.D. in computer sciences in 1985 from the University of Wisconsin–Madison. From 2014 to 2015 he was a Visiting Scientist at Google. His research interests lie at the intersection of programming languages, operating systems, and high-level computer architecture, with an emphasis on parallel and distributed computing. His MCS mutual exclusion lock, co-designed with John Mellor-Crummey, is used in a variety of commercial and academic systems. Several other algorithms, co-designed with Maged Michael, Bill Scherer, and Doug Lea, appear in the java.util.concurrent standard library. In 2006 he and Dr. Mellor-Crummey shared the ACM SIGACT/SIGOPS Edsger W. Dijkstra Prize in Distributed Computing.
Dr. Scott is a Fellow of the Association for Computing Machinery, a Fellow of the Institute of Electrical and Electronics Engineers, and a member of Usenix, the Union of Concerned Scientists, and the American Association of University Professors. The author of more than 150 refereed publications, he served as General Chair of the 2003 ACM Symposium on Operating Systems Principles (SOSP) and as Program Chair of the 2007 ACM SIGPLAN Workshop on Transactional Computing (TRANSACT), the 2008 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), and the 2012 International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). In 2001 he received the University of Rochester’s Robert and Pamela Goergen Award for Distinguished Achievement and Artistry in Undergraduate Teaching.
Contents
Foreword xxiii
Preface xxv
I FOUNDATIONS 3

1 Introduction 5
1.1 The Art of Language Design 7
1.2 The Programming Language Spectrum 11
1.3 Why Study Programming Languages? 14
1.4 Compilation and Interpretation 17
1.5 Programming Environments 24
1.6 An Overview of Compilation 26
    1.6.1 Lexical and Syntax Analysis 28
    1.6.2 Semantic Analysis and Intermediate Code Generation 32
    1.6.3 Target Code Generation 34
    1.6.4 Code Improvement 36
1.7 Summary and Concluding Remarks 37
1.8 Exercises 38
1.9 Explorations 39
1.10 Bibliographic Notes 40
2 Programming Language Syntax 43
2.1 Specifying Syntax: Regular Expressions and Context-Free Grammars 44
    2.1.1 Tokens and Regular Expressions 45
    2.1.2 Context-Free Grammars 48
    2.1.3 Derivations and Parse Trees 50
2.2 Scanning 54
    2.2.1 Generating a Finite Automaton 56
    2.2.2 Scanner Code 61
    2.2.3 Table-Driven Scanning 65
    2.2.4 Lexical Errors 65
    2.2.5 Pragmas 67
2.3 Parsing 69
    2.3.1 Recursive Descent 73
    2.3.2 Writing an LL(1) Grammar 79
    2.3.3 Table-Driven Top-Down Parsing 82
    2.3.4 Bottom-Up Parsing 89
    2.3.5 Syntax Errors C 1 . 102
2.4 Theoretical Foundations C 13 . 103
    2.4.1 Finite Automata C 13
    2.4.2 Push-Down Automata C 18
    2.4.3 Grammar and Language Classes C 19
2.5 Summary and Concluding Remarks 104
2.6 Exercises 105
2.7 Explorations 112
2.8 Bibliographic Notes 112
3 Names, Scopes, and Bindings 115
3.1 The Notion of Binding Time 116
3.2 Object Lifetime and Storage Management 118
    3.2.1 Static Allocation 119
    3.2.2 Stack-Based Allocation 120
    3.2.3 Heap-Based Allocation 122
    3.2.4 Garbage Collection 124
3.3 Scope Rules 125
    3.3.1 Static Scoping 126
    3.3.2 Nested Subroutines 127
    3.3.3 Declaration Order 130
    3.3.4 Modules 135
    3.3.5 Module Types and Classes 139
    3.3.6 Dynamic Scoping 142
3.4 Implementing Scope C 26 . 144
    3.4.1 Symbol Tables C 26
    3.4.2 Association Lists and Central Reference Tables C 31
3.5 The Meaning of Names within a Scope 145
    3.5.1 Aliases 145
    3.5.2 Overloading 147
3.6 The Binding of Referencing Environments 152
    3.6.1 Subroutine Closures 153
    3.6.2 First-Class Values and Unlimited Extent 155
    3.6.3 Object Closures 157
    3.6.4 Lambda Expressions 159
3.7 Macro Expansion 162
3.8 Separate Compilation C 36 . 165
    3.8.1 Separate Compilation in C C 37
    3.8.2 Packages and Automatic Header Inference C 40
    3.8.3 Module Hierarchies C 41
3.9 Summary and Concluding Remarks 165
3.10 Exercises 167
3.11 Explorations 175
3.12 Bibliographic Notes 177
4 Semantic Analysis 179
4.1 The Role of the Semantic Analyzer 180
4.2 Attribute Grammars 184
4.3 Evaluating Attributes 187
4.4 Action Routines 195
4.5 Space Management for Attributes C 45 . 200
    4.5.1 Bottom-Up Evaluation C 45
    4.5.2 Top-Down Evaluation C 50
4.6 Tree Grammars and Syntax Tree Decoration 201
4.7 Summary and Concluding Remarks 208
4.8 Exercises 209
4.9 Explorations 214
4.10 Bibliographic Notes 215
5 Target Machine Architecture C 60 . 217

5.1 The Memory Hierarchy C 61
5.2 Data Representation C 63
    5.2.1 Integer Arithmetic C 65
    5.2.2 Floating-Point Arithmetic C 67
5.3 Instruction Set Architecture (ISA) C 70
    5.3.1 Addressing Modes C 71
    5.3.2 Conditions and Branches C 72
5.4 Architecture and Implementation C 75
    5.4.1 Microprogramming C 76
    5.4.2 Microprocessors C 77
    5.4.3 RISC C 77
    5.4.4 Multithreading and Multicore C 78
    5.4.5 Two Example Architectures: The x86 and ARM C 80
5.5 Compiling for Modern Processors C 88
    5.5.1 Keeping the Pipeline Full C 89
    5.5.2 Register Allocation C 93
5.6 Summary and Concluding Remarks C 98
5.7 Exercises C 100
5.8 Explorations C 104
5.9 Bibliographic Notes C 105
II CORE ISSUES IN LANGUAGE DESIGN 221

6 Control Flow 223

6.1 Expression Evaluation 224
    6.1.1 Precedence and Associativity 226
    6.1.2 Assignments 229
    6.1.3 Initialization 238
    6.1.4 Ordering within Expressions 240
    6.1.5 Short-Circuit Evaluation 243
6.2 Structured and Unstructured Flow 246
    6.2.1 Structured Alternatives to goto 247
    6.2.2 Continuations 250
6.3 Sequencing 252
6.4 Selection 253
    6.4.1 Short-Circuited Conditions 254
    6.4.2 Case/Switch Statements 256
6.5 Iteration 261
    6.5.1 Enumeration-Controlled Loops 262
    6.5.2 Combination Loops 266
    6.5.3 Iterators 268
    6.5.4 Generators in Icon C 107 . 274
    6.5.5 Logically Controlled Loops 275
6.6 Recursion 277
    6.6.1 Iteration and Recursion 277
    6.6.2 Applicative- and Normal-Order Evaluation 282
6.7 Nondeterminacy C 110 . 283
6.8 Summary and Concluding Remarks 284
6.9 Exercises 286
6.10 Explorations 292
6.11 Bibliographic Notes 294
7 Type Systems 297
7.1 Overview 298
    7.1.1 The Meaning of “Type” 300
    7.1.2 Polymorphism 302
    7.1.3 Orthogonality 302
    7.1.4 Classification of Types 305
7.2 Type Checking 312
    7.2.1 Type Equivalence 313
    7.2.2 Type Compatibility 320
    7.2.3 Type Inference 324
    7.2.4 Type Checking in ML 326
7.3 Parametric Polymorphism 331
    7.3.1 Generic Subroutines and Classes 333
    7.3.2 Generics in C++, Java, and C# C 119 . 339
7.4 Equality Testing and Assignment 340
7.5 Summary and Concluding Remarks 342
7.6 Exercises 344
7.7 Explorations 347
7.8 Bibliographic Notes 348
8 Composite Types 351
8.1 Records (Structures) 351
    8.1.1 Syntax and Operations 352
    8.1.2 Memory Layout and Its Impact 353
    8.1.3 Variant Records (Unions) C 136 . 357
8.2 Arrays 359
    8.2.1 Syntax and Operations 359
    8.2.2 Dimensions, Bounds, and Allocation 363
    8.2.3 Memory Layout 368
8.3 Strings 375
8.4 Sets 376
8.5 Pointers and Recursive Types 377
    8.5.1 Syntax and Operations 378
    8.5.2 Dangling References C 144 . 388
    8.5.3 Garbage Collection 389
8.6 Lists 398
8.7 Files and Input/Output C 148 . 401
    8.7.1 Interactive I/O C 148
    8.7.2 File-Based I/O C 149
    8.7.3 Text I/O C 151
8.8 Summary and Concluding Remarks 402
8.9 Exercises 404
8.10 Explorations 409
8.11 Bibliographic Notes 410
9 Subroutines and Control Abstraction 411
9.1 Review of Stack Layout 412
9.2 Calling Sequences 414
    9.2.1 Displays C 163 . 417
    9.2.2 Stack Case Studies: LLVM on ARM; gcc on x86 C 167 . 417
    9.2.3 Register Windows C 177 . 419
    9.2.4 In-Line Expansion 419
9.3 Parameter Passing 422
    9.3.1 Parameter Modes 423
    9.3.2 Call by Name C 180 . 433
    9.3.3 Special-Purpose Parameters 433
    9.3.4 Function Returns 438
9.4 Exception Handling 440
    9.4.1 Defining Exceptions 444
    9.4.2 Exception Propagation 445
    9.4.3 Implementation of Exceptions 447
9.5 Coroutines 450
    9.5.1 Stack Allocation 453
    9.5.2 Transfer 454
    9.5.3 Implementation of Iterators C 183 . 456
    9.5.4 Discrete Event Simulation C 187 . 456
9.6 Events 456
    9.6.1 Sequential Handlers 457
    9.6.2 Thread-Based Handlers 459
9.7 Summary and Concluding Remarks 461
9.8 Exercises 462
9.9 Explorations 467
9.10 Bibliographic Notes 468
10 Data Abstraction and Object Orientation 471
10.1 Object-Oriented Programming 473
    10.1.1 Classes and Generics 481
10.2 Encapsulation and Inheritance 485
    10.2.1 Modules 486
    10.2.2 Classes 488
    10.2.3 Nesting (Inner Classes) 490
    10.2.4 Type Extensions 491
    10.2.5 Extending without Inheritance 494
10.3 Initialization and Finalization 495
    10.3.1 Choosing a Constructor 496
    10.3.2 References and Values 498
    10.3.3 Execution Order 502
    10.3.4 Garbage Collection 504
10.4 Dynamic Method Binding 505
    10.4.1 Virtual and Nonvirtual Methods 508
    10.4.2 Abstract Classes 508
    10.4.3 Member Lookup 509
    10.4.4 Object Closures 513
10.5 Mix-In Inheritance 516
    10.5.1 Implementation 517
    10.5.2 Extensions 519
10.6 True Multiple Inheritance C 194 . 521
    10.6.1 Semantic Ambiguities C 196
    10.6.2 Replicated Inheritance C 200
    10.6.3 Shared Inheritance C 201
10.7 Object-Oriented Programming Revisited 522
    10.7.1 The Object Model of Smalltalk C 204 . 523
10.8 Summary and Concluding Remarks 524
10.9 Exercises 525
10.10 Explorations 528
10.11 Bibliographic Notes 529
III ALTERNATIVE PROGRAMMING MODELS 533

11 Functional Languages 535
11.1 Historical Origins 536
11.2 Functional Programming Concepts 537
11.3 A Bit of Scheme 539
    11.3.1 Bindings 542
    11.3.2 Lists and Numbers 543
    11.3.3 Equality Testing and Searching 544
    11.3.4 Control Flow and Assignment 545
    11.3.5 Programs as Lists 547
    11.3.6 Extended Example: DFA Simulation in Scheme 548
11.4 A Bit of OCaml 550
    11.4.1 Equality and Ordering 553
    11.4.2 Bindings and Lambda Expressions 554
    11.4.3 Type Constructors 555
    11.4.4 Pattern Matching 559
    11.4.5 Control Flow and Side Effects 563
    11.4.6 Extended Example: DFA Simulation in OCaml 565
11.5 Evaluation Order Revisited 567
    11.5.1 Strictness and Lazy Evaluation 569
    11.5.2 I/O: Streams and Monads 571
11.6 Higher-Order Functions 576
11.7 Theoretical Foundations C 212 . 580
    11.7.1 Lambda Calculus C 214
    11.7.2 Control Flow C 217
    11.7.3 Structures C 219
11.8 Functional Programming in Perspective 581
11.9 Summary and Concluding Remarks 583
11.10 Exercises 584
11.11 Explorations 589
11.12 Bibliographic Notes 590
12 Logic Languages 591
12.1 Logic Programming Concepts 592
12.2 Prolog 593
    12.2.1 Resolution and Unification 595
    12.2.2 Lists 596
    12.2.3 Arithmetic 597
    12.2.4 Search/Execution Order 598
    12.2.5 Extended Example: Tic-Tac-Toe 600
    12.2.6 Imperative Control Flow 604
    12.2.7 Database Manipulation 607
12.3 Theoretical Foundations C 226 . 612
    12.3.1 Clausal Form C 227
    12.3.2 Limitations C 228
    12.3.3 Skolemization C 230
12.4 Logic Programming in Perspective 613
    12.4.1 Parts of Logic Not Covered 613
    12.4.2 Execution Order 613
    12.4.3 Negation and the “Closed World” Assumption 615
12.5 Summary and Concluding Remarks 616
12.6 Exercises 618
12.7 Explorations 620
12.8 Bibliographic Notes 620
13 Concurrency 623
13.1 Background and Motivation 624
    13.1.1 The Case for Multithreaded Programs 627
    13.1.2 Multiprocessor Architecture 631
13.2 Concurrent Programming Fundamentals 635
    13.2.1 Communication and Synchronization 635
    13.2.2 Languages and Libraries 637
    13.2.3 Thread Creation Syntax 638
    13.2.4 Implementation of Threads 647
13.3 Implementing Synchronization 652
    13.3.1 Busy-Wait Synchronization 653
    13.3.2 Nonblocking Algorithms 657
    13.3.3 Memory Consistency 659
    13.3.4 Scheduler Implementation 663
    13.3.5 Semaphores 667
13.4 Language-Level Constructs 669
    13.4.1 Monitors 669
    13.4.2 Conditional Critical Regions 674
    13.4.3 Synchronization in Java 676
    13.4.4 Transactional Memory 679
    13.4.5 Implicit Synchronization 683
13.5 Message Passing C 235 . 687
    13.5.1 Naming Communication Partners C 235
    13.5.2 Sending C 239
    13.5.3 Receiving C 244
    13.5.4 Remote Procedure Call C 249
13.6 Summary and Concluding Remarks 688
13.7 Exercises 690
13.8 Explorations 695
13.9 Bibliographic Notes 697
14 Scripting Languages 699
14.1 What Is a Scripting Language? 700
    14.1.1 Common Characteristics 701
14.2 Problem Domains 704
    14.2.1 Shell (Command) Languages 705
    14.2.2 Text Processing and Report Generation 712
    14.2.3 Mathematics and Statistics 717
    14.2.4 “Glue” Languages and General-Purpose Scripting 718
    14.2.5 Extension Languages 724
14.3 Scripting the World Wide Web 727
    14.3.1 CGI Scripts 728
    14.3.2 Embedded Server-Side Scripts 729
    14.3.3 Client-Side Scripts 734
    14.3.4 Java Applets and Other Embedded Elements 734
    14.3.5 XSLT C 258 . 736
14.4 Innovative Features 738
    14.4.1 Names and Scopes 739
    14.4.2 String and Pattern Manipulation 743
    14.4.3 Data Types 751
    14.4.4 Object Orientation 757
14.5 Summary and Concluding Remarks 764
14.6 Exercises 765
14.7 Explorations 769
14.8 Bibliographic Notes 771
IV A CLOSER LOOK AT IMPLEMENTATION 773

15 Building a Runnable Program 775

15.1 Back-End Compiler Structure 775
    15.1.1 A Plausible Set of Phases 776
    15.1.2 Phases and Passes 780
15.2 Intermediate Forms 780
    15.2.1 GIMPLE and RTL C 273 . 782
    15.2.2 Stack-Based Intermediate Forms 782
15.3 Code Generation 784
    15.3.1 An Attribute Grammar Example 785
    15.3.2 Register Allocation 787
15.4 Address Space Organization 790
15.5 Assembly 792
    15.5.1 Emitting Instructions 794
    15.5.2 Assigning Addresses to Names 796
15.6 Linking 797
    15.6.1 Relocation and Name Resolution 798
    15.6.2 Type Checking 799
15.7 Dynamic Linking C 279 . 800
    15.7.1 Position-Independent Code C 280
    15.7.2 Fully Dynamic (Lazy) Linking C 282
15.8 Summary and Concluding Remarks 802
15.9 Exercises 803
15.10 Explorations 805
15.11 Bibliographic Notes 806
16 Run-Time Program Management 807
16.1 Virtual Machines 810
    16.1.1 The Java Virtual Machine 812
    16.1.2 The Common Language Infrastructure C 286 . 820
16.2 Late Binding of Machine Code 822
    16.2.1 Just-in-Time and Dynamic Compilation 822
    16.2.2 Binary Translation 828
    16.2.3 Binary Rewriting 833
    16.2.4 Mobile Code and Sandboxing 835
16.3 Inspection/Introspection 837
    16.3.1 Reflection 837
    16.3.2 Symbolic Debugging 845
    16.3.3 Performance Analysis 848
16.4 Summary and Concluding Remarks 850
16.5 Exercises 851
16.6 Explorations 853
16.7 Bibliographic Notes 854
17 Code Improvement C 297 . 857

17.1 Phases of Code Improvement C 299
17.2 Peephole Optimization C 301
17.3 Redundancy Elimination in Basic Blocks C 304
    17.3.1 A Running Example C 305
    17.3.2 Value Numbering C 307
17.4 Global Redundancy and Data Flow Analysis C 312
    17.4.1 SSA Form and Global Value Numbering C 312
    17.4.2 Global Common Subexpression Elimination C 315
17.5 Loop Improvement I C 323
    17.5.1 Loop Invariants C 323
    17.5.2 Induction Variables C 325
17.6 Instruction Scheduling C 328
17.7 Loop Improvement II C 332
    17.7.1 Loop Unrolling and Software Pipelining C 332
    17.7.2 Loop Reordering C 337
17.8 Register Allocation C 344
17.9 Summary and Concluding Remarks C 348
17.10 Exercises C 349
17.11 Explorations C 353
17.12 Bibliographic Notes C 354
A Programming Languages Mentioned 859
B Language Design and Language Implementation 871
C Numbered Examples 877
Bibliography 891
Index 911
Foreword
Programming languages are universally accepted as one of the core subjects that every computer scientist must master. The reason is clear: these languages are the main notation we use for developing products and for communicating new ideas. They have influenced the field by enabling the development of those multimillion-line programs that shaped the information age. Their success is owed to the long-standing effort of the computer science community in the creation of new languages and in the development of strategies for their implementation. The large number of computer scientists mentioned in the footnotes and bibliographic notes in this book by Michael Scott is a clear manifestation of the magnitude of this effort, as is the sheer number and diversity of topics it contains.
Over 75 programming languages are discussed. They represent the best and most influential contributions in language design across time, paradigms, and application domains. They are the outcome of decades of work that led initially to Fortran and Lisp in the 1950s, to numerous languages in the years that followed, and, in our times, to the popular dynamic languages used to program the Web. The 75-plus languages span numerous paradigms, including imperative, functional, logic, static, dynamic, sequential, shared-memory parallel, distributed-memory parallel, dataflow, high-level, and intermediate languages. They include languages for scientific computing, for symbolic manipulations, and for accessing databases. This rich diversity of languages is crucial for programmer productivity and is one of the great assets of the discipline of computing.
Cutting across languages, this book presents a detailed discussion of control flow, types, and abstraction mechanisms. These are the representations needed to develop programs that are well organized, modular, easy to understand, and easy to maintain. Knowledge of these core features and of their incarnation in today’s languages is a basic foundation for being an effective programmer and for better understanding computer science today.
Strategies to implement programming languages must be studied together with the design paradigms. One reason is that the success of a language depends on the quality of its implementation. Also, the capabilities of these strategies sometimes constrain the design of languages. The implementation of a language starts with the parsing and lexical scanning needed to compute the syntactic structure of programs. Today’s parsing techniques, described in Part I, are among the most beautiful algorithms ever developed and are a great example of the use of mathematical objects to create practical instruments. They are worth studying just as an intellectual achievement. They are of course of great practical value, and a good way to appreciate the greatness of these strategies is to go back to the first Fortran compiler and study the ad hoc, albeit highly ingenious, strategy used to implement precedence of operators by the pioneers who built that compiler.
The other usual components of an implementation are those that carry out the translation from the high-level language representation to a lower-level form suitable for execution by real or virtual machines. The translation can be done ahead of time, during execution (just in time), or both. The book discusses these approaches and implementation strategies, including the elegant mechanisms of translation driven by parsing. To produce highly efficient code, translation routines apply strategies to avoid redundant computations, make efficient use of the memory hierarchy, and take advantage of intra-processor parallelism. These sometimes conflicting goals are addressed by the optimization components of compilers. Although this topic is typically outside the scope of a first course on compilers, the book gives the reader access to a good overview of program optimization in Part IV.
An important recent development in computing is the popularization of parallelism and the expectation that, in the foreseeable future, performance gains will mainly be the result of effectively exploiting this parallelism. The book responds to this development by presenting the reader with a range of topics in concurrent programming, including mechanisms for synchronization, communication, and coordination across threads. This information will become increasingly important as parallelism consolidates as the norm in computing.
Programming languages are the bridge between programmers and machines. It is in them that algorithms must be represented for execution. The study of programming language design and implementation offers great educational value by requiring an understanding of the strategies used to connect the different aspects of computing. By presenting such an extensive treatment of the subject, Michael Scott’s Programming Language Pragmatics is a great contribution to the literature and a valuable source of information for computer scientists.
David Padua Siebel Center for Computer Science
University of Illinois at Urbana-Champaign
Preface
A course in computer programming provides the typical student’s first exposure to the field of computer science. Most students in such a course will have used computers all their lives, for social networking, email, games, web browsing, word processing, and a host of other tasks, but it is not until they write their first programs that they begin to appreciate how applications work. After gaining a certain level of facility as programmers (presumably with the help of a good course in data structures and algorithms), the natural next step is to wonder how programming languages work. This book provides an explanation. It aims, quite simply, to be the most comprehensive and accurate languages text available, in a style that is engaging and accessible to the typical undergraduate. This aim reflects my conviction that students will understand more, and enjoy the material more, if we explain what is really going on.
In the conventional “systems” curriculum, the material beyond data structures (and possibly computer organization) tends to be compartmentalized into a host of separate subjects, including programming languages, compiler construction, computer architecture, operating systems, networks, parallel and distributed computing, database management systems, and possibly software engineering, object-oriented design, graphics, or user interface systems. One problem with this compartmentalization is that the list of subjects keeps growing, but the number of semesters in a Bachelor’s program does not. More important, perhaps, many of the most interesting discoveries in computer science occur at the boundaries between subjects. Computer architecture and compiler construction, for example, have inspired each other for over 50 years, through generations of supercomputers, pipelined microprocessors, multicore chips, and modern GPUs. Over the past decade, advances in virtualization have blurred boundaries among the hardware, operating system, compiler, and language run-time system, and have spurred the explosion in cloud computing. Programming language technology is now routinely embedded in everything from dynamic web content, to gaming and entertainment, to security and finance.
Increasingly, both educators and practitioners have come to emphasize these sorts of interactions. Within higher education in particular, there is a growing trend toward integration in the core curriculum. Rather than give the typical student an in-depth look at two or three narrow subjects, leaving holes in all the others, many schools have revised the programming languages and computer organization courses to cover a wider range of topics, with follow-on electives in
various specializations. This trend is very much in keeping with the ACM/IEEE-CS Computer Science Curricula 2013 guidelines [SR13], which emphasize the need to manage the size of the curriculum and to cultivate both a “system-level perspective” and an appreciation of the interplay between theory and practice. In particular, the authors write,
Graduates of a computer science program need to think at multiple levels of detail and abstraction. This understanding should transcend the implementation details of the various components to encompass an appreciation for the structure of computer systems and the processes involved in their construction and analysis [p. 24].
On the specific subject of this text, they write
Programming languages are the medium through which programmers precisely describe concepts, formulate algorithms, and reason about solutions. In the course of a career, a computer scientist will work with many different languages, separately or together. Software developers must understand the programming models underlying different languages and make informed design choices in languages supporting multiple complementary approaches. Computer scientists will often need to learn new languages and programming constructs, and must understand the principles underlying how programming language features are defined, composed, and implemented. The effective use of programming languages, and appreciation of their limitations, also requires a basic knowledge of programming language translation and static program analysis, as well as run-time components such as memory management [p. 155].
The first three editions of Programming Language Pragmatics (PLP) had the good fortune of riding the trend toward integrated understanding. This fourth edition continues and strengthens the “systems perspective” while preserving the central focus on programming language design.
At its core, PLP is a book about how programming languages work. Rather than enumerate the details of many different languages, it focuses on concepts that underlie all the languages the student is likely to encounter, illustrating those concepts with a variety of concrete examples, and exploring the tradeoffs that explain why different languages were designed in different ways. Similarly, rather than explain how to build a compiler or interpreter (a task few programmers will undertake in its entirety), PLP focuses on what a compiler does to an input program, and why. Language design and implementation are thus explored together, with an emphasis on the ways in which they interact.
Changes in the Fourth Edition
In comparison to the third edition, PLP-4e includes
1. New chapters devoted to type systems and composite types, in place of the older single chapter on types
2. Updated treatment of functional programming, with extensive coverage of OCaml
3. Numerous other reflections of changes in the field
4. Improvements inspired by instructor feedback or a fresh consideration of familiar topics
Item 1 in this list is perhaps the most visible change. Chapter 7 was the longest in previous editions, and there is a natural split in the subject material. Reorganization of this material for PLP-4e afforded an opportunity to devote more explicit attention to the subject of type inference, and to its role in ML-family languages in particular. It also facilitated an update and reorganization of the material on parametric polymorphism, which was previously scattered across several different chapters and sections.
Item 2 reflects the increasing adoption of functional techniques into mainstream imperative languages, as well as the increasing prominence of SML, OCaml, and Haskell in both education and industry. Throughout the text, OCaml is now co-equal with Scheme as a source of functional programming examples. As noted in the previous paragraph, there is an expanded section (7.2.4) on the ML type system, and Section 11.4 includes an OCaml overview, with coverage of equality and ordering, bindings and lambda expressions, type constructors, pattern matching, and control flow and side effects. The choice of OCaml, rather than Haskell, as the ML-family exemplar reflects its prominence in industry, together with classroom experience suggesting that—at least for many students—the initial exposure to functional thinking is easier in the context of eager evaluation. To colleagues who wish I’d chosen Haskell, my apologies!
Other new material (Item 3) appears throughout the text. Wherever appropriate, reference has been made to features of the latest languages and standards, including C & C++11, Java 8, C# 5, Scala, Go, Swift, Python 3, and HTML 5. Section 3.6.4 pulls together previously scattered coverage of lambda expressions, and shows how these have been added to various imperative languages. Complementary coverage of object closures, including C++11’s std::function and std::bind, appears in Section 10.4.4. Section c-5.4.5 introduces the x86-64 and ARM architectures in place of the x86-32 and MIPS used in previous editions. Examples using these same two architectures subsequently appear in the sections on calling sequences (9.2) and linking (15.6). Coverage of the x86 calling sequence continues to rely on gcc; the ARM case study uses LLVM. Section 8.5.3 introduces smart pointers. R-value references appear in Section 9.3.1. JavaFX replaces Swing in the graphics examples of Section 9.6.2. Appendix A has new entries for Go, Lua, Rust, Scala, and Swift.
Finally, Item 4 encompasses improvements to almost every section of the text. Among the more heavily updated topics are FOLLOW and PREDICT sets (Section 2.3.3); Wirth’s error recovery algorithm for recursive descent (Section c-2.3.5); overloading (Section 3.5.2); modules (Section 3.3.4); duck typing (Section 7.3); records and variants (Section 8.1); intrusive lists (removed from the running example of Chapter 10); static fields and methods (Section 10.2.2);
mix-in inheritance (moved from the companion site back into the main text, and updated to cover Scala traits and Java 8 default methods); multicore processors (pervasive changes to Chapter 13); phasers (Section 13.3.1); memory models (Section 13.3.3); semaphores (Section 13.3.5); futures (Section 13.4.5); GIMPLE and RTL (Section c-15.2.1); QEMU (Section 16.2.2); DWARF (Section 16.3.2); and language genealogy (Figure A.1).
To accommodate new material, coverage of some topics has been condensed or even removed. Examples include modules (Chapters 3 and 10), variant records and with statements (Chapter 8), and metacircular interpretation (Chapter 11). Additional material—the Common Language Infrastructure (CLI) in particular—has moved to the companion site. Throughout the text, examples drawn from languages no longer in widespread use have been replaced with more recent equivalents wherever appropriate. Almost all remaining references to Pascal and Modula are merely historical. Most coverage of Occam and Tcl has also been dropped.
Overall, the printed text has grown by roughly 40 pages. There are 5 more “Design & Implementation” sidebars, 35 more numbered examples, and about 25 new end-of-chapter exercises and explorations. Considerable effort has been invested in creating a consistent and comprehensive index. As in earlier editions, Morgan Kaufmann has maintained its commitment to providing definitive texts at reasonable cost: PLP-4e is far less expensive than competing alternatives, but larger and more comprehensive.
The Companion Site
To minimize the physical size of the text, make way for new material, and allow students to focus on the fundamentals when browsing, over 350 pages of more advanced or peripheral material can be found on a companion web site: booksite.elsevier.com/web/9780124104099. Each companion-site (CS) section is represented in the main text by a brief introduction to the subject and an “In More Depth” paragraph that summarizes the elided material.
Note that placement of material on the companion site does not constitute a judgment about its technical importance. It simply reflects the fact that there is more material worth covering than will fit in a single volume or a single-semester course. Since preferences and syllabi vary, most instructors will probably want to assign reading from the CS, and most will refrain from assigning certain sections of the printed text. My intent has been to retain in print the material that is likely to be covered in the largest number of courses.
Also included on the CS are pointers to on-line resources and compilable copies of all significant code fragments found in the text (in more than two dozen languages).
Design & Implementation Sidebars
Like its predecessors, PLP-4e places heavy emphasis on the ways in which language design constrains implementation options, and the ways in which anticipated implementations have influenced language design. Many of these connections and interactions are highlighted in some 140 “Design & Implementation” sidebars. A more detailed introduction appears in Sidebar 1.1. A numbered list appears in Appendix B.
Numbered and Titled Examples
Examples in PLP-4e are intimately woven into the flow of the presentation. To make it easier to find specific examples, to remember their content, and to refer to them in other contexts, a number and a title for each is displayed in a marginal note. There are over 1000 such examples across the main text and the CS. A detailed list appears in Appendix C.
Exercise Plan
Review questions appear throughout the text at roughly 10-page intervals, at the ends of major sections. These are based directly on the preceding material, and have short, straightforward answers.
More detailed questions appear at the end of each chapter. These are divided into Exercises and Explorations. The former are generally more challenging than the per-section review questions, and should be suitable for homework or brief projects. The latter are more open-ended, requiring web or library research, substantial time commitment, or the development of subjective opinion. Solutions to many of the exercises (but not the explorations) are available to registered instructors from a password-protected web site: visit textbooks.elsevier.com/web/9780124104099.
How to Use the Book
Programming Language Pragmatics covers almost all of the material in the PL “knowledge units” of the Computing Curricula 2013 report [SR13]. The languages course at the University of Rochester, for which this book was designed, is in fact one of the featured “course exemplars” in the report (pp. 369–371). Figure 1 illustrates several possible paths through the text.
For self-study, or for a full-year course (track F in Figure 1), I recommend working through the book from start to finish, turning to the companion site as each “In More Depth” section is encountered. The one-semester course at Rochester (track R) also covers most of the book, but leaves out most of the CS sections, as well as bottom-up parsing (2.3.4), logic languages (Chapter 12), and the second halves of Chapters 15 (Building a Runnable Program) and 16 (Run-time Program Management). Note that the material on functional programming (Chapter 11 in particular) can be taught in either OCaml or Scheme.

Figure 1 Paths through the text (diagram not reproduced). Darker shaded regions indicate supplemental “In More Depth” sections on the companion site. Section numbers are shown for breaks that do not correspond to supplemental material. The tracks are F, the full-year/self-study plan; R, the one-semester Rochester plan; P, the traditional Programming Languages plan (which would also de-emphasize implementation material throughout the chapters shown); C, the compiler plan (which would also de-emphasize design material throughout the chapters shown); and Q, the 1+2 quarter plan: an overview quarter and two independent, optional follow-on quarters, one language-oriented, the other compiler-oriented. The figure also marks material to be skimmed by students in need of review.
Some chapters (2, 4, 5, 15, 16, 17) have a heavier emphasis than others on implementation issues. These can be reordered to a certain extent with respect to the more design-oriented chapters. Many students will already be familiar with much of the material in Chapter 5, most likely from a course on computer organization; hence the placement of the chapter on the companion site. Some students may also be familiar with some of the material in Chapter 2, perhaps from a course on automata theory. Much of this chapter can then be read quickly as well, pausing perhaps to dwell on such practical issues as recovery from syntax errors, or the ways in which a scanner differs from a classical finite automaton.
A traditional programming languages course (track P in Figure 1) might leave out all of scanning and parsing, plus all of Chapter 4. It would also de-emphasize the more implementation-oriented material throughout. In place of these, it could add such design-oriented CS sections as multiple inheritance (10.6), Smalltalk (10.7.1), lambda calculus (11.7), and predicate calculus (12.3).
PLP has also been used at some schools for an introductory compiler course (track C in Figure 1). The typical syllabus leaves out most of Part III (Chapters 11 through 14), and de-emphasizes the more design-oriented material throughout. In place of these, it includes all of scanning and parsing, Chapters 15 through 17, and a slightly different mix of other CS sections.
For a school on the quarter system, an appealing option is to offer an introductory one-quarter course and two optional follow-on courses (track Q in Figure 1). The introductory quarter might cover the main (non-CS) sections of Chapters 1, 3, 6, 7, and 8, plus the first halves of Chapters 2 and 9. A language-oriented follow-on quarter might cover the rest of Chapter 9, all of Part III, CS sections from Chapters 6 through 9, and possibly supplemental material on formal semantics, type theory, or other related topics. A compiler-oriented follow-on quarter might cover the rest of Chapter 2; Chapters 4–5 and 15–17; CS sections from Chapters 3 and 9–10; and possibly supplemental material on automatic code generation, aggressive code improvement, programming tools, and so on.
Whatever the path through the text, I assume that the typical reader has already acquired significant experience with at least one imperative language. Exactly which language it is shouldn’t matter. Examples are drawn from a wide variety of languages, but always with enough comments and other discussion that readers without prior experience should be able to understand easily. Single-paragraph introductions to more than 60 different languages appear in Appendix A. Algorithms, when needed, are presented in an informal pseudocode that should be self-explanatory. Real programming language code is set in "typewriter" font. Pseudocode is set in a sans-serif font.
Supplemental Materials
In addition to supplemental sections, the companion site contains complete source code for all nontrivial examples, and a list of all known errors in the book. Additional resources are available on-line at textbooks.elsevier.com/web/9780124104099. For instructors who have adopted the text, a password-protected page provides access to
Editable PDF source for all the figures in the book
Editable PowerPoint slides
Solutions to most of the exercises
Suggestions for larger projects
Acknowledgments for the Fourth Edition
In preparing the fourth edition, I have been blessed with the generous assistance of a very large number of people. Many provided errata or other feedback on the third edition, among them Yacine Belkadi, Björn Brandenburg,
Bob Cochran, Daniel Crisman, Marcelino Debajo, Chen Ding, Peter Drake, Michael Edgar, Michael Glass, Sérgio Gomes, Allan Gottlieb, Hossein Hadavi, Chris Hart, Thomas Helmuth, Wayne Heym, Scott Hoge, Kelly Jones, Ahmed Khademzadeh, Eleazar Enrique Leal, Kyle Liddell, Annie Liu, Hao Luo, Dirk Müller, Holger Peine, Andreas Priesnitz, Mikhail Prokharau, Harsh Raju, and Jingguo Yao. I also remain indebted to the many individuals acknowledged in previous editions, and to the reviewers, adopters, and readers who made those editions a success.
Anonymous reviewers for the fourth edition provided a wealth of useful suggestions; my thanks to all of you! Special thanks to Adam Chlipala of MIT for his detailed and insightful suggestions on the coverage of functional programming. My thanks as well to Nelson Beebe (University of Utah) for pointing out that compilers cannot safely use integer comparisons for floating-point numbers that may be NaNs; to Dan Scarafoni for prompting me to distinguish between FIRST/EPS of symbols and FIRST/EPS of strings in the algorithm to generate PREDICT sets; to Dave Musicant for suggested improvements to the description of deep binding; to Allan Gottlieb (NYU) for several key clarifications regarding Ada semantics; and to Benjamin Kowarsch for similar clarifications regarding Objective-C. Problems that remain in all these areas are entirely my own.
In preparing the fourth edition, I have drawn on 25 years of experience teaching this material to upper-level undergraduates at the University of Rochester. I am grateful to all my students for their enthusiasm and feedback. My thanks as well to my colleagues and graduate students, and to the department’s administrative, secretarial, and technical staff for providing such a supportive and productive work environment. Finally, my thanks to David Padua, whose work I have admired since I was in graduate school; I am deeply honored to have him as the author of the Foreword.
As they were on previous editions, the staff at Morgan Kaufmann have been a genuine pleasure to work with, on both a professional and a personal level. My thanks in particular to Nate McFadden, Senior Development Editor, who shepherded both this and the previous two editions with unfailing patience, good humor, and a fine eye for detail; to Mohana Natarajan, who managed the book’s production; and to Todd Green, Publisher, who upholds the personal touch of the Morgan Kaufmann imprint within the larger Elsevier universe.
Most important, I am indebted to my wife, Kelly, for her patience and support through endless months of writing and revising. Computing is a fine profession, but family is what really matters.
Michael L. Scott Rochester, NY
August 2015
I Foundations

A central premise of Programming Language Pragmatics is that language design and implementation are intimately connected; it’s hard to study one without the other.
The bulk of the text—Parts II and III—is organized around topics in language design, but with detailed coverage throughout of the many ways in which design decisions have been shaped by implementation concerns.
The first five chapters—Part I—set the stage by covering foundational material in both design and implementation. Chapter 1 motivates the study of programming languages, introduces the major language families, and provides an overview of the compilation process. Chapter 3 covers the high-level structure of programs, with an emphasis on names, the binding of names to objects, and the scope rules that govern which bindings are active at any given time. In the process it touches on storage management; subroutines, modules, and classes; polymorphism; and separate compilation.
Chapters 2, 4, and 5 are more implementation oriented. They provide the background needed to understand the implementation issues mentioned in Parts II and III. Chapter 2 discusses the syntax, or textual structure, of programs. It introduces regular expressions and context-free grammars, which designers use to describe program syntax, together with the scanning and parsing algorithms that a compiler or interpreter uses to recognize that syntax. Given an understanding of syntax, Chapter 4 explains how a compiler (or interpreter) determines the semantics, or meaning, of a program. The discussion is organized around the notion of attribute grammars, which serve to map a program onto something else that has meaning, such as mathematics or some other existing language. Finally, Chapter 5 (entirely on the companion site) provides an overview of assembly-level computer architecture, focusing on the features of modern microprocessors most relevant to compilers. Programmers who understand these features have a better chance not only of understanding why the languages they use were designed the way they were, but also of using those languages as fully and effectively as possible.
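Because this study of syntax culminates in building a scanner, a concrete preview may be useful. The short C program below is a minimal hand-written scanner. It is an illustrative sketch only, not code from the text, and its names (next_token, TOK_ID, and so on) are invented for the example; it divides standard input into just three token classes: identifiers, integer literals, and single-character operators. Chapter 2 shows how such code can be derived systematically from regular expressions by way of finite automata.

    /* Minimal hand-written scanner (illustrative sketch only; not code
       from the text).  Tokens: identifiers, integer literals, and
       single-character operators.  White space is discarded. */
    #include <ctype.h>
    #include <stdio.h>

    typedef enum { TOK_ID, TOK_INT, TOK_OP, TOK_EOF } token_class;

    /* Scan one token from stdin into buf; return its class. */
    token_class next_token(char buf[], int max) {
        int c = getchar();
        while (isspace(c)) c = getchar();      /* skip white space */
        if (c == EOF) { buf[0] = '\0'; return TOK_EOF; }
        int i = 0;
        if (isalpha(c)) {                      /* identifier: letter (letter|digit)* */
            while (isalnum(c) && i < max - 1) { buf[i++] = (char) c; c = getchar(); }
            ungetc(c, stdin);                  /* push back one character of lookahead */
            buf[i] = '\0';
            return TOK_ID;
        }
        if (isdigit(c)) {                      /* integer literal: digit digit* */
            while (isdigit(c) && i < max - 1) { buf[i++] = (char) c; c = getchar(); }
            ungetc(c, stdin);
            buf[i] = '\0';
            return TOK_INT;
        }
        buf[0] = (char) c;                     /* anything else: one-character operator */
        buf[1] = '\0';
        return TOK_OP;
    }

    int main(void) {
        char buf[128];
        token_class t;
        while ((t = next_token(buf, sizeof buf)) != TOK_EOF)
            printf("%d %s\n", (int) t, buf);
        return 0;
    }

Note the single character of lookahead pushed back with ungetc: a scanner generally cannot tell that a token has ended until it has read the first character beyond it, which is one of the practical ways a scanner differs from a classical finite automaton.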
1 Introduction

The first electronic computers were monstrous contraptions, filling several rooms, consuming as much electricity as a good-size factory, and costing millions of 1940s dollars (but with much less computing power than even the simplest modern cell phone). The programmers who used these machines believed that the computer’s time was more valuable than theirs. They programmed in machine language. Machine language is the sequence of bits that directly controls a processor, causing it to add, compare, move data from one place to another, and so forth at appropriate times. Specifying programs at this level of detail is an enormously tedious task. The following program calculates the greatest common divisor (GCD) of two integers, using Euclid’s algorithm. It is written in machine language, expressed here as hexadecimal (base 16) numbers, for the x86 instruction set.

EXAMPLE 1.1 (GCD program in x86 machine language):
55 89 e5 53 83 ec 04 83 e4 f0 e8 31 00 00 00 89 c3 e8 2a 00 00 00 39 c3 74 10 8d b6 00 00 00 00 39 c3 7e 13 29 c3 39 c3 75 f6 89 1c 24 e8 6e 00 00 00 8b 5d fc c9 c3 29 d8 eb eb 90
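For contrast, the same computation can be expressed in a handful of lines in a high-level language. The following is an illustrative sketch in C (not a listing from the text), assuming the subtraction-based form of Euclid’s algorithm, in which the larger of the two values is repeatedly replaced by the difference of the two until they are equal:

    /* Euclid's algorithm for the greatest common divisor, shown in C
       for contrast with the machine code above (an illustrative
       sketch, not a listing from the text). */
    #include <stdio.h>

    int main(void) {
        int i, j;
        if (scanf("%d%d", &i, &j) != 2 || i <= 0 || j <= 0) return 1;
        while (i != j) {            /* when i == j, both hold the GCD */
            if (i > j) i = i - j;
            else j = j - i;
        }
        printf("%d\n", i);
        return 0;
    }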
As people began to write larger programs, it quickly became apparent that a less error-prone notation was required. Assembly languages were invented to allow operations to be