Tenth Edition
MODERN DATABASE MANAGEMENT
Editorial Director: Sally Yagan
Editor in Chief: Eric Svendsen
Executive Editor: Bob Horan
Editorial Project Manager: Kelly Loftus
Editorial Assistant: Jason Calcano
Director of Marketing: Patrice Lumumba Jones
Marketing Manager: Anne Fahlgren
Marketing Assistant: Melinda Jensen
Senior Managing Editor: Judy Leale
Project Manager: Becca Richter
Senior Operations Supervisor: Arnold Vila
Operations Specialist: Ilene Kahn
Senior Art Director: Jayne Conte
Cover Designer: Suzanne Behnke
Cover Art: Fotolia © vuifah
Manager, Visual Research: Karen Sanatar
Permissions Project Manager: Shannon Barbe
Media Project Manager, Editorial: Denise Vaughn
Media Project Manager, Production: Lisa Rinaldi
Supplements Editor: Kelly Loftus
Full-Service Project Management: PreMediaGlobal
Composition: PreMediaGlobal
Printer/Binder: Edwards Brothers
Cover Printer: Lehigh-Phoenix Color/Hagerstown
Text Font: Palatino
Credits and acknowledgments borrowed from other sources and reproduced, with permission, in this textbook appear on the appropriate page within the text.
Microsoft® and Windows® are registered trademarks of the Microsoft Corporation in the U.S.A. and other countries. Screen shots and icons reprinted with permission from the Microsoft Corporation. This book is not sponsored or endorsed by or affiliated with the Microsoft Corporation.
Copyright © 2011, 2009, 2007, 2005, 2002 Pearson Education, Inc., publishing as Prentice Hall, One Lake Street, Upper Saddle River, New Jersey 07458. All rights reserved. Manufactured in the United States of America. This publication is protected by Copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission(s) to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed in initial caps or all caps.
Library of Congress Cataloging-in-Publication Data
Hoffer, Jeffrey A. Modern database management / Jeffrey A. Hoffer, V. Ramesh, Heikki Topi. — 10th ed.
p. cm. Includes index. ISBN 0-13-608839-2 (alk. paper)
1. Database management. I. Ramesh, V. II. Topi, Heikki. III. Title. QA76.9.D3M395 2011 005.74—dc22
2010017419
10 9 8 7 6 5 4 3 2 1
ISBN 10: 0-13-608839-2 ISBN 13: 978-0-13-608839-4
Tenth Edition
MODERN DATABASE MANAGEMENT
Jeffrey A. Hoffer University of Dayton
V. Ramesh Indiana University
Heikki Topi Bentley University
Prentice Hall Boston Columbus Indianapolis New York San Francisco Upper Saddle River
Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto Delhi Mexico City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo
To Patty, for her sacrifices, encouragement, and support for over 28 years of being a textbook author widow. To my students and colleagues, for being receptive and critical and for challenging me to be a better teacher.
—J.A.H.
To Gayathri, for her sacrifices and patience these past 20 years. To my parents for letting me make the journey abroad, and to my cat, Raju, for being a part of our family for almost 20 years.
—V.R.
To Anne-Louise, for her loving support, encouragement, and patience. To Leila, whose laughter and joy of life continue to teach me about what is truly important. To my teachers, colleagues, and students, from whom I continue to learn every day.
—H.T.
Founding author of Modern Database Management, Fred McFadden, passed away on August 9, 2009. Fred was a dedicated educator for 30 years in the College of Business at the University of Colorado, Colorado Springs. He received his bachelor’s degree in Mechanical Engineering from Michigan State University, his MBA from the University of California, Los Angeles, and his PhD in Industrial Engineering from Stanford University. He began writing Modern Database Management in 1980 and was considered a leading information systems educator in database management, systems analysis, and decision support, all areas in which he was a scholarly author. Fred’s work on the initial design of this textbook was pioneering, as few books existed then to present information technology to business students.
Fred was an inspiration to his students and colleagues. An outstanding communicator with a strong sense of clarity and the needs of students, he was a mentor to his co-authors. Fred’s first concern was always what was best for the students using the book, and he worked tirelessly to make passages succinct, readable, and motivating. He taught through examples and imaginatively told stories with graphics. He was skilled at blending the latest and best industry practices with leading research results into material accessible to all readers, whether undergraduate or graduate students. Fred was encouraging to his co-authors, always prepared to take on any writing assignment, yet never so prideful of his writing as to not accept comments with respect. Fred was actively involved in writing this text through the 8th edition, and he remained a confidant and guide after he ceased active writing.
Besides his professional contributions, Fred more than anything else was a caring, gentle, passionate person. Growing up on a farm in Michigan taught him to love the outdoors and to have a strong sense of caring for his neighbor, whom Fred saw as everyone.
The co-authors of Modern Database Management, 10th edition, are humbled to dedicate this edition to Fred R. McFadden, our friend and colleague.
Fred R. McFadden 1933–2009
BRIEF CONTENTS
PART I The Context of Database Management 1 Chapter 1 The Database Environment and Development Process 2
PART II Database Analysis 55 Chapter 2 Modeling Data in the Organization 57
Chapter 3 The Enhanced E-R Model 113
PART III Database Design 153 Chapter 4 Logical Database Design and the Relational Model 155
Chapter 5 Physical Database Design and Performance 207
PART IV Implementation 241 Chapter 6 Introduction to SQL 243
Chapter 7 Advanced SQL 289
Chapter 8 Database Application Development 335
Chapter 9 Data Warehousing 375
PART V Advanced Database Topics 431 Chapter 10 Data Quality and Integration 433
Chapter 11 Data and Database Administration 461
Chapter 12 Overview: Distributed Databases 512
Chapter 13 Overview: Object-Oriented Data Modeling 516
Chapter 14 Overview: Using Relational Databases to Provide Object Persistence 525
Appendices Appendix A: Data Modeling Tools and Notation 535
Appendix B: Advanced Normal Forms 545
Appendix C: Data Structures 551
Glossary of Acronyms 563
Glossary of Terms 565
Index 573
Available Online at www.pearsonhighered.com/hoffer Chapter 12 Distributed Databases 12-1
Chapter 13 Object-Oriented Data Modeling 13-1
Chapter 14 Using Relational Databases to Provide Object Persistence 14-1
CONTENTS
Preface xxv
Part I The Context of Database Management 1 An Overview of Part One 1
Chapter 1 The Database Environment and Development Process 2
Learning Objectives 2
Data Matter! 2
Introduction 3
Basic Concepts and Definitions 5
Data 5
Data Versus Information 6
Metadata 7
Traditional File Processing Systems 8
File Processing Systems at Pine Valley Furniture Company 8
Disadvantages of File Processing Systems 9 PROGRAM-DATA DEPENDENCE 9 DUPLICATION OF DATA 9 LIMITED DATA SHARING 9 LENGTHY DEVELOPMENT TIMES 9 EXCESSIVE PROGRAM MAINTENANCE 9
The Database Approach 10
Data Models 10 ENTITIES 10 RELATIONSHIPS 10
Relational Databases 10
Database Management Systems 11
Advantages of the Database Approach 12 PROGRAM-DATA INDEPENDENCE 13 PLANNED DATA REDUNDANCY 13 IMPROVED DATA CONSISTENCY 13 IMPROVED DATA SHARING 13 INCREASED PRODUCTIVITY OF APPLICATION DEVELOPMENT 13 ENFORCEMENT OF STANDARDS 13 IMPROVED DATA QUALITY 14 IMPROVED DATA ACCESSIBILITY AND RESPONSIVENESS 14 REDUCED PROGRAM MAINTENANCE 14 IMPROVED DECISION SUPPORT 14
Cautions About Database Benefits 15
Costs and Risks of the Database Approach 15 NEW, SPECIALIZED PERSONNEL 15 INSTALLATION AND MANAGEMENT COST AND COMPLEXITY 15 CONVERSION COSTS 15 NEED FOR EXPLICIT BACKUP AND RECOVERY 15 ORGANIZATIONAL CONFLICT 16
Components of the Database Environment 16
The Range of Database Applications 17
Personal Databases 18
Two-Tier Client/Server Databases 18
Multitier Client/Server Databases 19
Enterprise Applications 20
Evolution of Database Systems 21
1960s 21
1970s 23
1980s 23
1990s 23
2000 and Beyond 23
The Database Development Process 24
Systems Development Life Cycle 25 PLANNING—ENTERPRISE MODELING 26 PLANNING—CONCEPTUAL DATA MODELING 26 ANALYSIS—CONCEPTUAL DATA MODELING 26 DESIGN—LOGICAL DATABASE DESIGN 26 DESIGN—PHYSICAL DATABASE DESIGN AND DEFINITION 27 IMPLEMENTATION—DATABASE IMPLEMENTATION 27 MAINTENANCE—DATABASE MAINTENANCE 27
Alternative IS Development Approaches 28
Three-Schema Architecture for Database Development 29
Managing the People Involved in Database Development 31
Developing a Database Application for Pine Valley Furniture Company 31
Simplified Project Data Model Example 33
A Current Pine Valley Furniture Company Project Request 35
Project Planning 36
Analyzing Database Requirements 37
Designing the Database 38
Using the Database 41
Administering the Database 42 Summary 42 • Key Terms 43 • Review Questions 44 • Problems and Exercises 45 • Field Exercises 46 • References 47 • Further Reading 47 • Web Resources 48
CASE: Mountain View Community Hospital 49
Part II Database Analysis 55 An Overview of Part Two 55
Chapter 2 Modeling Data in the Organization 57 Learning Objectives 57
Introduction 57
The E-R Model: An Overview 59
Sample E-R Diagram 59
E-R Model Notation 61
Modeling the Rules of the Organization 62
Overview of Business Rules 63 THE BUSINESS RULES PARADIGM 63
Scope of Business Rules 64 GOOD BUSINESS RULES 64 GATHERING BUSINESS RULES 64
Data Names and Definitions 65 DATA NAMES 65 DATA DEFINITIONS 66 GOOD DATA DEFINITIONS 66
Modeling Entities and Attributes 68
Entities 68 ENTITY TYPE VERSUS ENTITY INSTANCE 68 ENTITY TYPE VERSUS SYSTEM INPUT, OUTPUT, OR USER 69 STRONG VERSUS WEAK ENTITY TYPES 69 NAMING AND DEFINING ENTITY TYPES 70
Attributes 72 REQUIRED VERSUS OPTIONAL ATTRIBUTES 72 SIMPLE VERSUS COMPOSITE ATTRIBUTES 73 SINGLE-VALUED VERSUS MULTIVALUED ATTRIBUTES 73 STORED VERSUS DERIVED ATTRIBUTES 74 IDENTIFIER ATTRIBUTE 74 NAMING AND DEFINING ATTRIBUTES 76
Modeling Relationships 77
Basic Concepts and Definitions in Relationships 78 ATTRIBUTES ON RELATIONSHIPS 79 ASSOCIATIVE ENTITIES 80
Degree of a Relationship 81 UNARY RELATIONSHIP 81 BINARY RELATIONSHIP 82 TERNARY RELATIONSHIP 83
Attributes or Entity? 84
Cardinality Constraints 86 MINIMUM CARDINALITY 87 MAXIMUM CARDINALITY 87
Some Examples of Relationships and Their Cardinalities 87 A TERNARY RELATIONSHIP 88
Modeling Time-Dependent Data 89
Modeling Multiple Relationships Between Entity Types 92
Naming and Defining Relationships 93
E-R Modeling Example: Pine Valley Furniture Company 95
Database Processing at Pine Valley Furniture 97
Showing Product Information 97
Showing Product Line Information 98
Showing Customer Order Status 98
Showing Product Sales 100 Summary 100 • Key Terms 101 • Review Questions 101 • Problems and Exercises 102 • Field Exercises 108 • References 109 • Further Reading 109 • Web Resources 110
CASE: Mountain View Community Hospital 111
Chapter 3 The Enhanced E-R Model 113 Learning Objectives 113
Introduction 113
Representing Supertypes and Subtypes 114
Basic Concepts and Notation 115 AN EXAMPLE OF A SUPERTYPE/SUBTYPE RELATIONSHIP 116 ATTRIBUTE INHERITANCE 117 WHEN TO USE SUPERTYPE/SUBTYPE RELATIONSHIPS 117
Representing Specialization and Generalization 118 GENERALIZATION 118 SPECIALIZATION 119 COMBINING SPECIALIZATION AND GENERALIZATION 120
Specifying Constraints in Supertype/Subtype Relationships 121
Specifying Completeness Constraints 121 TOTAL SPECIALIZATION RULE 121 PARTIAL SPECIALIZATION RULE 122
Specifying Disjointness Constraints 122 DISJOINT RULE 122 OVERLAP RULE 122
Defining Subtype Discriminators 123 DISJOINT SUBTYPES 123 OVERLAPPING SUBTYPES 124
Defining Supertype/Subtype Hierarchies 125 AN EXAMPLE OF A SUPERTYPE/SUBTYPE HIERARCHY 125 SUMMARY OF SUPERTYPE/SUBTYPE HIERARCHIES 126
EER Modeling Example: Pine Valley Furniture Company 127
Entity Clustering 130
Packaged Data Models 133
A Revised Data Modeling Process with Packaged Data Models 135
Packaged Data Model Examples 137 Summary 142 • Key Terms 143 • Review Questions 143 • Problems and Exercises 144 • Field Exercises 147 • References 147 • Further Reading 147 • Web Resources 148
CASE: Mountain View Community Hospital 149
Part III Database Design 153 An Overview of Part Three 153
Chapter 4 Logical Database Design and the Relational Model 155
Learning Objectives 155
Introduction 155
The Relational Data Model 156
Basic Definitions 156 RELATIONAL DATA STRUCTURE 157 RELATIONAL KEYS 157 PROPERTIES OF RELATIONS 158 REMOVING MULTIVALUED ATTRIBUTES FROM TABLES 158
Sample Database 158
Integrity Constraints 160
Domain Constraints 160
Entity Integrity 160
Referential Integrity 162
Creating Relational Tables 163
Well-Structured Relations 164
Transforming EER Diagrams into Relations 165
Step 1: Map Regular Entities 166 COMPOSITE ATTRIBUTES 166 MULTIVALUED ATTRIBUTES 167
Step 2: Map Weak Entities 167 WHEN TO CREATE A SURROGATE KEY 169
Step 3: Map Binary Relationships 169 MAP BINARY ONE-TO-MANY RELATIONSHIPS 169 MAP BINARY MANY-TO-MANY RELATIONSHIPS 170 MAP BINARY ONE-TO-ONE RELATIONSHIPS 170
Step 4: Map Associative Entities 171 IDENTIFIER NOT ASSIGNED 171 IDENTIFIER ASSIGNED 172
Step 5: Map Unary Relationships 173 UNARY ONE-TO-MANY RELATIONSHIPS 173 UNARY MANY-TO-MANY RELATIONSHIPS 174
Step 6: Map Ternary (and n-ary) Relationships 175
Step 7: Map Supertype/Subtype Relationships 176
Summary of EER-to-Relational Transformations 178
Introduction to Normalization 178
Steps in Normalization 179
Functional Dependencies and Keys 179 DETERMINANTS 181 CANDIDATE KEYS 181
Normalization Example: Pine Valley Furniture Company 182
Step 0: Represent the View in Tabular Form 182
Step 1: Convert to First Normal Form 183 REMOVE REPEATING GROUPS 183 SELECT THE PRIMARY KEY 184 ANOMALIES IN 1NF 184
Step 2: Convert to Second Normal Form 185
Step 3: Convert to Third Normal Form 186 REMOVING TRANSITIVE DEPENDENCIES 186
Determinants and Normalization 187
Step 4: Further Normalization 188
Merging Relations 188
An Example 188
View Integration Problems 189 SYNONYMS 189 HOMONYMS 189 TRANSITIVE DEPENDENCIES 190 SUPERTYPE/SUBTYPE RELATIONSHIPS 190
A Final Step for Defining Relational Keys 190 Summary 192 • Key Terms 194 • Review Questions 194 • Problems and Exercises 195 • Field Exercises 202 • References 202 • Further Reading 202 • Web Resources 202
CASE: Mountain View Community Hospital 203
Chapter 5 Physical Database Design and Performance 207 Learning Objectives 207
Introduction 207
The Physical Database Design Process 208
Physical Database Design as a Basis for Regulatory Compliance 209
Data Volume and Usage Analysis 210
Designing Fields 211
Choosing Data Types 212 CODING TECHNIQUES 212 HANDLING MISSING DATA 214
Denormalizing and Partitioning Data 214
Denormalization 214 OPPORTUNITIES FOR AND TYPES OF DENORMALIZATION 215 DENORMALIZE WITH CAUTION 217
Partitioning 218
Designing Physical Database Files 220
File Organizations 221 SEQUENTIAL FILE ORGANIZATIONS 222 INDEXED FILE ORGANIZATIONS 222 HASHED FILE ORGANIZATIONS 225
Clustering Files 227
Designing Controls for Files 228
Using and Selecting Indexes 229
Creating a Unique Key Index 229
Creating a Secondary (Nonunique) Key Index 229
When to Use Indexes 230
Designing a Database for Optimal Query Performance 231
Parallel Query Processing 231
Overriding Automatic Query Optimization 232 Summary 233 • Key Terms 233 • Review Questions 234 • Problems and Exercises 234 • Field Exercises 237 • References 237 • Further Reading 237 • Web Resources 237
CASE: Mountain View Community Hospital 238
Part IV Implementation 241 An Overview of Part Four 241
Chapter 6 Introduction to SQL 243 Learning Objectives 243
Introduction 243
Origins of the SQL Standard 245
The SQL Environment 246
Defining a Database in SQL 251
Generating SQL Database Definitions 252
Creating Tables 252
Creating Data Integrity Controls 255
Changing Table Definitions 256
Removing Tables 257
Inserting, Updating, and Deleting Data 257
Batch Input 258
Deleting Database Contents 259
Updating Database Contents 259
Internal Schema Definition in RDBMSs 260
Creating Indexes 260
Processing Single Tables 261
Clauses of the SELECT Statement 261
Using Expressions 263
Using Functions 264
Using Wildcards 267
Using Comparison Operators 267
Using Null Values 268
Using Boolean Operators 268
Using Ranges for Qualification 271
Using Distinct Values 271
Using IN and NOT IN with Lists 273
Sorting Results: The ORDER BY Clause 274
Categorizing Results: The GROUP BY Clause 275
Qualifying Results by Categories: The HAVING Clause 276
Using and Defining Views 278 MATERIALIZED VIEWS 281 Summary 281 • Key Terms 282 • Review Questions 282 • Problems and Exercises 283 • Field Exercises 286 • References 286 • Further Reading 287 • Web Resources 287
CASE: Mountain View Community Hospital 288
Chapter 7 Advanced SQL 289 Learning Objectives 289
Introduction 289
Processing Multiple Tables 290
Equi-join 291
Natural Join 292
Outer Join 293
Union Join 295
Sample Join Involving Four Tables 295
Self-Join 297
Subqueries 298
Correlated Subqueries 303
Using Derived Tables 304
Combining Queries 305
Conditional Expressions 307
More Complicated SQL Queries 308
Tips for Developing Queries 310
Guidelines for Better Query Design 311
Ensuring Transaction Integrity 313
Data Dictionary Facilities 314
SQL:200n Enhancements and Extensions to SQL 317
Analytical and OLAP Functions 317
New Data Types 318
Other Enhancements 319
Programming Extensions 319
Triggers and Routines 320
Triggers 321
Routines 323
Embedded SQL and Dynamic SQL 326 Summary 328 • Key Terms 329 • Review Questions 329 • Problems and Exercises 330 • Field Exercises 333 • References 333 • Further Reading 333 • Web Resources 333
CASE: Mountain View Community Hospital 334
Chapter 8 Database Application Development 335 Learning Objectives 335
Location, Location, Location! 335
Introduction 336
Client/Server Architectures 336
Partitioning an Application 337
Databases in a Two-Tier Architecture 339
A VB.NET Example 341
A Java Example 343
Three-Tier Architectures 344
Web Application Components 346
Languages for Creating Web Pages 348
Databases in Three-Tier Applications 348
A JSP Web Application 349
A PHP Example 353
An ASP.NET Example 353
Key Considerations in Three-Tier Applications 355
Stored Procedures 356
Transactions 357
Database Connections 359
Key Benefits of Three-Tier Applications 359
Extensible Markup Language (XML) 360
Storing XML Documents 362
Retrieving XML Documents 362
Displaying XML Data 365
XML and Web Services 365 Summary 369 • Key Terms 369 • Review Questions 370 • Problems and Exercises 370 • Field Exercises 371 • References 371 • Further Reading 371 • Web Resources 371
CASE: Mountain View Community Hospital 373
Chapter 9 Data Warehousing 375 Learning Objectives 375
Introduction 375
Basic Concepts of Data Warehousing 377
A Brief History of Data Warehousing 378
The Need for Data Warehousing 378 NEED FOR A COMPANY-WIDE VIEW 378 NEED TO SEPARATE OPERATIONAL AND INFORMATIONAL SYSTEMS 380
Data Warehousing Success 381
Data Warehouse Architectures 382
Independent Data Mart Data Warehousing Environment 382
Dependent Data Mart and Operational Data Store Architecture: A Three-Level Approach 384
Logical Data Mart and Real-Time Data Warehouse Architecture 386
Three-Layer Data Architecture 389 ROLE OF THE ENTERPRISE DATA MODEL 390 ROLE OF METADATA 390
Some Characteristics of Data Warehouse Data 390
Status Versus Event Data 390
Transient Versus Periodic Data 391
An Example of Transient and Periodic Data 391 TRANSIENT DATA 391 PERIODIC DATA 393 OTHER DATA WAREHOUSE CHANGES 393
The Derived Data Layer 394
Characteristics of Derived Data 394
The Star Schema 395 FACT TABLES AND DIMENSION TABLES 395 EXAMPLE STAR SCHEMA 396 SURROGATE KEY 398 GRAIN OF THE FACT TABLE 398 DURATION OF THE DATABASE 399 SIZE OF THE FACT TABLE 399 MODELING DATE AND TIME 400
Variations of the Star Schema 401 MULTIPLE FACT TABLES 401 FACTLESS FACT TABLES 402
Normalizing Dimension Tables 403 MULTIVALUED DIMENSIONS 403 HIERARCHIES 404
Slowly Changing Dimensions 406
Determining Dimensions and Facts 408
Column Databases: A New Alternative for Data Warehouses 410
The User Interface 411
Role of Metadata 412
SQL OLAP Querying 412
Online Analytical Processing (OLAP) Tools 414 SLICING A CUBE 415 DRILL-DOWN 415 SUMMARIZING MORE THAN THREE DIMENSIONS 415
Data Visualization 415
Business Performance Management and Dashboards 417
Data-Mining Tools 418 DATA-MINING TECHNIQUES 418 DATA-MINING APPLICATIONS 419 Summary 420 • Key Terms 420 • Review Questions 421 • Problems and Exercises 421 • Field Exercises 425 • References 426 • Further Reading 426 • Web Resources 426
CASE: Mountain View Community Hospital 428
Part V Advanced Database Topics 431 An Overview of Part Five 431
Chapter 10 Data Quality and Integration 433 Learning Objectives 433
Introduction 433
Data Governance 434
Managing Data Quality 435
Characteristics of Quality Data 436 EXTERNAL DATA SOURCES 437 REDUNDANT DATA STORAGE AND INCONSISTENT METADATA 438 DATA ENTRY PROBLEMS 438 LACK OF ORGANIZATIONAL COMMITMENT 438
Data Quality Improvement 438 GET THE BUSINESS BUY-IN 438 CONDUCT A DATA QUALITY AUDIT 439 ESTABLISH A DATA STEWARDSHIP PROGRAM 440 IMPROVE DATA CAPTURE PROCESSES 441 APPLY MODERN DATA MANAGEMENT PRINCIPLES AND TECHNOLOGY 441 APPLY TQM PRINCIPLES AND PRACTICES 441
Summary of Data Quality 442
Master Data Management 442
Data Integration: An Overview 443
General Approaches to Data Integration 444 DATA FEDERATION 444 DATA PROPAGATION 444
Data Integration for Data Warehousing: The Reconciled Data Layer 445
Characteristics of Data After ETL 446
The ETL Process 446 MAPPING AND METADATA MANAGEMENT 447 EXTRACT 447 CLEANSE 448 LOAD AND INDEX 450
Data Transformation 452
Data Transformation Functions 452 RECORD-LEVEL FUNCTIONS 452 FIELD-LEVEL FUNCTIONS 453 Summary 455 • Key Terms 455 • Review Questions 456 • Problems and Exercises 456 • Field Exercises 457 • References 457 • Further Reading 458 • Web Resources 458
CASE: Mountain View Community Hospital 459
Chapter 11 Data and Database Administration 461 Learning Objectives 461 Introduction 462
The Roles of Data and Database Administrators 463
Traditional Data Administration 463
Traditional Database Administration 465
Trends in Database Administration 466
Data Warehouse Administration 468
Summary of Evolving Data Administration Roles 469
The Open Source Movement and Database Management 469
Managing Data Security 471
Threats to Data Security 471
Establishing Client/Server Security 473 SERVER SECURITY 473 NETWORK SECURITY 473
Application Security Issues in Three-Tier Client/Server Environments 473 DATA PRIVACY 475
Database Software Data Security Features 476
Views 476
Integrity Controls 477
Authorization Rules 479
User-Defined Procedures 480
Encryption 480
Authentication Schemes 481 PASSWORDS 481 STRONG AUTHENTICATION 482
Sarbanes-Oxley (SOX) and Databases 482
IT Change Management 483
Logical Access to Data 483 PERSONNEL CONTROLS 483 PHYSICAL ACCESS CONTROLS 483
IT Operations 484
Database Backup and Recovery 484
Basic Recovery Facilities 484 BACKUP FACILITIES 484 JOURNALIZING FACILITIES 485 CHECKPOINT FACILITY 485 RECOVERY MANAGER 486
Recovery and Restart Procedures 486 DISK MIRRORING 486 RESTORE/RERUN 487 MAINTAINING TRANSACTION INTEGRITY 487 BACKWARD RECOVERY 488 FORWARD RECOVERY 489
Types of Database Failure 490 ABORTED TRANSACTIONS 490 INCORRECT DATA 490 SYSTEM FAILURE 491 DATABASE DESTRUCTION 491
Disaster Recovery 491
Controlling Concurrent Access 492
The Problem of Lost Updates 492
Serializability 492
Locking Mechanisms 493 LOCKING LEVEL 493 TYPES OF LOCKS 494 DEADLOCK 495 MANAGING DEADLOCK 495
Versioning 496
Data Dictionaries and Repositories 498
Data Dictionary 498
Repositories 498
Overview of Tuning the Database for Performance 500
Installation of the DBMS 500
Memory and Storage Space Usage 501
Input/Output (I/O) Contention 501
CPU Usage 502
Application Tuning 502
Data Availability 503
Costs of Downtime 503
Measures to Ensure Availability 504 HARDWARE FAILURES 504 LOSS OR CORRUPTION OF DATA 504 HUMAN ERROR 504 MAINTENANCE DOWNTIME 504 NETWORK-RELATED PROBLEMS 505 Summary 505 • Key Terms 505 • Review Questions 506 • Problems and Exercises 507 • Field Exercises 509 • References 509 • Further Reading 510 • Web Resources 510
CASE: Mountain View Community Hospital 511
Chapter 12 Overview: Distributed Databases 512 Learning Objectives 512
Overview 512
Objectives and Trade-offs 513
Options for Distributing a Database 513
Distributed DBMS 514
Query Optimization 514 Chapter Review 515 • References 515 • Further Reading 515 • Web Resources 515
Chapter 13 Overview: Object-Oriented Data Modeling 516 Learning Objectives 516
Overview 516
Unified Modeling Language 517
Object-Oriented Data Modeling 517
Representing Aggregation 523 Chapter Review 523 • References 523 • Further Reading 524 • Web Resources 524
Chapter 14 Overview: Using Relational Databases to Provide Object Persistence 525
Learning Objectives 525
Overview 525
Providing Persistence for Objects Using Relational Databases 526 CALL-LEVEL APPLICATION PROGRAMMING INTERFACES 527 SQL QUERY MAPPING FRAMEWORKS 527 OBJECT-RELATIONAL MAPPING FRAMEWORKS 527 PROPRIETARY APPROACHES 527 SELECTING THE RIGHT APPROACH 528
Object-Relational Mapping Example 529 MAPPING FILES 529
Responsibilities of Object-Relational Mapping Frameworks 532 Summary 533 • Chapter Review 534 • References 534 • Further Reading 534 • Web Resources 534
Appendix A Data Modeling Tools and Notation 535 Comparing E-R Modeling Conventions 535
Visio Professional 2003 Notation 535 ENTITIES 539 RELATIONSHIPS 539
CA ERwin Data Modeler r7.3 Notation 539 ENTITIES 539 RELATIONSHIPS 539
Sybase PowerDesigner 15 Notation 541 ENTITIES 542 RELATIONSHIPS 542
Oracle Designer Notation 542 ENTITIES 542 RELATIONSHIPS 542
Comparison of Tool Interfaces and E-R Diagrams 542
Appendix B Advanced Normal Forms 545 Boyce-Codd Normal Form 545
Anomalies in Student Advisor 545
Definition of Boyce-Codd Normal Form (BCNF) 546
Converting a Relation to BCNF 546
Fourth Normal Form 547
Multivalued Dependencies 549
Higher Normal Forms 549 Key Terms 550 • References 550 • Web Resources 550
Appendix C Data Structures 551 Pointers 551
Data Structure Building Blocks 552
Linear Data Structures 554
Stacks 555
Queues 555
Sorted Lists 556
Multilists 558
Hazards of Chain Structures 558
Trees 559
Balanced Trees 559 Reference 562
Glossary of Acronyms 563
Glossary 565
Index 573
ONLINE CHAPTERS
Chapter 12 Distributed Databases 12-1 Learning Objectives 12-1 Introduction 12-1
Objectives and Trade-offs 12-4
Options for Distributing a Database 12-6
Data Replication 12-6 SNAPSHOT REPLICATION 12-7 NEAR-REAL-TIME REPLICATION 12-8 PULL REPLICATION 12-8 DATABASE INTEGRITY WITH REPLICATION 12-8 WHEN TO USE REPLICATION 12-8
Horizontal Partitioning 12-9
Vertical Partitioning 12-10
Combinations of Operations 12-11
Selecting the Right Data Distribution Strategy 12-12
Distributed DBMS 12-13
Location Transparency 12-15
Replication Transparency 12-16
Failure Transparency 12-17
Commit Protocol 12-17
Concurrency Transparency 12-18 TIME-STAMPING 12-19
Query Optimization 12-19
Evolution of Distributed DBMSs 12-21
Remote Unit of Work 12-22
Distributed Unit of Work 12-22
Distributed Request 12-23
Distributed DBMS Products 12-23 Summary 12-24 • Key Terms 12-25 • Review Questions 12-25 • Problems and Exercises 12-26 • Field Exercises 12-27 • References 12-28 • Further Reading 12-28 • Web Resources 12-28
Chapter 13 Object-Oriented Data Modeling 13-1 Learning Objectives 13-1 Introduction 13-1 Unified Modeling Language 13-3
Object-Oriented Data Modeling 13-4
Representing Objects and Classes 13-4
Types of Operations 13-6
Representing Associations 13-7
Representing Association Classes 13-10
Representing Derived Attributes, Derived Associations, and Derived Roles 13-12
Representing Generalization 13-12
Interpreting Inheritance and Overriding 13-17
Representing Multiple Inheritance 13-18
Representing Aggregation 13-19
Business Rules 13-22
Object Modeling Example: Pine Valley Furniture Company 13-23 Summary 13-25 • Key Terms 13-26 • Review Questions 13-26 • Problems and Exercises 13-29 • Field Exercises 13-35 • References 13-35 • Further Reading 13-36 • Web Resources 13-36
Chapter 14 Using Relational Databases to Provide Object Persistence 14-1
Learning Objectives 14-1 Introduction 14-1 Object-Relational Impedance Mismatch 14-3
Providing Persistence for Objects Using Relational Databases 14-6
Common Approaches 14-6 CALL-LEVEL APPLICATION PROGRAMMING INTERFACES 14-6 SQL QUERY MAPPING FRAMEWORKS 14-7 OBJECT-RELATIONAL MAPPING FRAMEWORKS 14-7 PROPRIETARY APPROACHES 14-7
Selecting the Right Approach 14-8 CALL-LEVEL APIS 14-8 SQL QUERY MAPPING FRAMEWORKS 14-9 ORM FRAMEWORKS 14-9
Object-Relational Mapping Example Using Hibernate 14-10
Foundation 14-10
Mapping Files 14-11
Hibernate Configuration 14-15
Mapping Object-Oriented Structures to a Relational Database 14-16
Class 14-16
Inheritance: Superclass–Subclass 14-17
One-to-One Association 14-17
Many-to-One and One-to-Many Associations 14-17
Aggregation and Composition 14-19
Many-to-Many Associations 14-19
Responsibilities of Object-Relational Mapping Frameworks 14-20
HQL 14-21 Summary 14-25 • Key Terms 14-25 • Review Questions 14-26 • Problems and Exercises 14-26 • Field Exercises 14-27 • References 14-27 • Further Reading 14-27 • Web Resources 14-27
PREFACE
This text is designed to be used with an introductory course in database management. Such a course is usually required as part of an information systems curriculum in business schools, computer technology programs, and applied computer science departments. The Association for Information Systems (AIS), the Association for Computing Machinery (ACM), and the International Federation of Information Processing Societies (IFIPS) curriculum guidelines (e.g., IS 2010) all outline this type of database management course. Previous editions of this text have been used successfully for more than 27 years at both the undergraduate and graduate levels, as well as in management and professional development programs.
WHAT’S NEW IN THIS EDITION?
This 10th edition of Modern Database Management updates and expands materials in areas undergoing rapid change due to improved managerial practices, database design tools and methodologies, and database technology. Later we detail changes to each chapter. The themes of this 10th edition reflect the major trends in the information systems field and the skills required of modern information systems graduates:
• Data quality and database processing accuracy, which are extremely important given national and international regulations such as the Sarbanes-Oxley Act, Basel II, COSO, and HIPAA that now require organizations to comply with standards for reporting accurate financial data and ensuring data privacy. Material on data quality and master data management has been updated with stronger coverage of the people, process, and technology aspects and of internationally accepted best practices for information systems development and management (specifically, ITIL).
• Integration of data from multiple internal and external databases and data sources, which is now common for building data warehouses and other types of enterprise systems, and for dealing with the rapid organizational changes in information systems brought on by corporate reorganizations, mergers, and acquisitions. These first two themes are addressed in the revised Chapter 10 on data quality and integration, which updates and improves the focus of the material and introduces the latest principles in these areas.
• Demonstrating knowledge of how to use databases in the context of developing database applications in two- and three-tier client/server environments. In this 10th edition (in Chapters 8 and 14), we provide examples of how to connect to databases from popular programming languages such as Java and VB.NET as well as Web development languages such as Java Server Pages (JSP), ASP.NET, and PHP. Coverage of XML has also been revised to emphasize the role of XML in data storage and retrieval.
• Linking object-oriented information systems development environments (such as Java Technology and Microsoft .NET) with the mainstream technology for maintaining organizational data (relational databases) and, in the process, dealing with significant paradigm differences between object-oriented and relational frameworks. This major change, which was introduced in the ninth edition and has been updated for the 10th edition, reflects a rapidly changing environment for database processing.
Also, we are very excited to now provide on the student Companion Web site several new, custom-developed short videos that address key concepts and skills from different sections of the book. These videos, produced using Camtasia by the textbook authors, help students to learn difficult material by using both the printed text and a mini lecture or tutorial. Videos have been developed to support Chapters 1 (introduction to database), 2 and 3 (conceptual data modeling), 4 (normalization), and 6 and 7 (SQL).
More will be produced with future editions. Look for special icons on the opening page of these chapters to call attention to these videos, and go to www.pearsonhighered.com/hoffer to find these videos.
Specific improvements to the textbook have been made in the following areas:
• Arranged the Problems and Exercises into roughly increasing order of difficulty to make it easier for instructors and students to select problems and exercises for practice and assignments.
• Applied standard data naming conventions throughout the book to make it easier for students to distinguish data elements from conceptual to physical forms.
• Clarified system requirements through systems modeling and design and outlined a process to use the increasingly popular industry and business function commercial data models to speed up the systems development process. The new material focuses on changes to the database development process when an organization uses packaged data models. Students are now better prepared to understand why these data models are important and how to read and work with (tailor) them.
• Expanded coverage of SQL, with a few more frequently used components of the language. We have also created new figures to graphically depict the set processing logic of SQL queries, which gives students, especially visual learners, new tools to use when writing queries.
• Included new screen captures to reflect the latest database technologies and an updated Web Resources section in each chapter that lists Web sites that can provide the student with information on the latest database trends and expanded background details on important topics covered in the text.
• Reduced the length of the printed book, which we began doing with the eighth edition. The reduced length is more consistent with what our reviewers say can be covered in a database course today, given the need for depth of coverage in the most important topics. Specifically, for the 10th edition, we combined the first two chapters from the ninth edition into one, so that students can more quickly cover/review background topics and then dig into the material central to database management. We have also combined the two chapters from the ninth edition on client/server and Internet databases into one chapter addressing database issues in a multitier computing environment. We continue to update the chapters on distributed databases, the object-oriented data model, and using relational databases to provide object persistence, including an overview in the printed textbook and full versions on the textbook’s Web site. Care has been given to the layout of figures and tables to also reduce the length of the book, while adding some new figures and figure elements to better link the text narrative with the figures. The reduced length should encourage more students to purchase and read the text, without any loss of coverage and learning. The book is also now available through CourseSmart, an innovative e-book delivery system.
MODERN DATABASE MANAGEMENT: A RETRO AND FUTURE PERSPECTIVE
This 10th edition is a humbling milestone. We are extremely grateful for the support of adopters, reviewers, students, colleagues, editors, and publisher staff who have been with us for some or, in a few cases, all of the past 27 years. Database technology has “grown up” over these years, from a resource for only the most sophisticated organizations to a mainstay of almost any computing environment. Some topics, such as relational databases, have been a central part of the text from the beginning; others, such as data warehousing, business intelligence, object-oriented databases, and databases on the Internet, are newer. Database management could once be explained in 531 pages that were about 80 percent the size of current pages, and now it takes 624 larger pages (really, we aren’t just wordier). One of the original authors of this text is still a co-author, while a newer generation of database academic
experts now contributes to these pages with zest and creativity. The original book authors were educated in fields other than business information systems, whereas today our newer authors are experienced and educated in this rich field central to the success of modern organizations.
As a book that we believe has succeeded in leading the database management textbook market, this book is positioned to continue (in some printed or electronic form) for at least another 27 years. Writing this book has been and remains an awesome responsibility. We authors realize that the course that this text supports will be the foundation for student careers with databases. Over the years, we’ve seen students reading our book on airplanes while traveling on business, and, believe it or not, reading it on a Florida beach during spring break. The authors remain committed to presenting material with sound pedagogy, including topics (both easy and difficult, traditional and emerging) that are critical for the practical success of database professionals, and being informed by research that reveals what will be the “next big thing” in database management. It is in this spirit that we celebrate our milestone edition, and lay the foundation for many more editions to come.
FOR THOSE NEW TO MODERN DATABASE MANAGEMENT
Modern Database Management has been a leading text since its first edition in 1983. In spite of this market leadership position, some instructors have used other good database management texts. Why might you want to switch at this time? There are several good reasons to switch to Modern Database Management, including:
• One of our goals, in every edition, has been to lead other books in coverage of the latest principles, concepts, and technologies. See what we have added for the 10th edition in “What’s New in This Edition.” In the past, we have led in coverage of object-oriented data modeling and UML, Internet databases, data warehousing, and the use of CASE tools in support of data modeling. For the 10th edition, we are taking the lead on database development for Internet-based applications, data quality and integration, the linking of object-oriented development environments with relational databases, and the increasingly important role of packaged database models as a component of agile, rapid development of information systems. We also have, for the first time, Camtasia-produced tutorial videos to accompany the book, with more to come in future editions.
• While remaining current, this text focuses on what leading practitioners say is most important for database developers. We work with many practitioners, including the professionals of the Data Management Association (DAMA) and The Data Warehousing Institute (TDWI), leading consultants, technology leaders, and authors of articles in the most widely read professional publications. We draw on these experts to ensure that what the book includes is important and covers not only important entry-level knowledge and skills, but also those fundamentals and mindsets that lead to long-term career success.
• In this highly successful book in its 10th edition, material is presented in a way that has been viewed as very accessible to students. Our methods have been refined through continuous market feedback for over 27 years, as well as through our own teaching. Overall, the pedagogy of the book is sound. We use many illustrations that help to make important concepts and techniques clear. We use the most modern notations. The organization of the book is flexible, so you can use chapters in whatever sequence makes sense for your students. We supplement the book with data sets to facilitate hands-on, practical learning, and with new media resources to make some of the more challenging topics more engaging.
• You may have particular interest in introducing SQL early in your course. Our text can accommodate this. First, we cover SQL in depth, devoting two full chapters to this core technology of the database field. Second, we include many SQL examples in early chapters. Third, many instructors have successfully used the two SQL chapters early in their course. Although logically appearing in the life cycle of systems development as Chapters 6 and 7, part of the implementation section of the text, many instructors have used these chapters immediately after
Chapter 1 or in parallel with other early chapters. Finally, we use SQL throughout the book, for example, to illustrate Web application connections to relational databases in Chapter 8, online analytical processing in Chapter 9, and accessing relational databases from object-oriented development environments in Chapter 14.
• We have the latest in supplements and Web site support for the text. See the supplement package for details on all the resources available to you and your students.
• This text is written to be part of a modern information systems curriculum with a strong business systems development focus. Topics are included and addressed so as to reinforce principles from other typical courses, such as systems analysis and design, networking, Web site design and development, MIS principles, and computer programming. Emphasis is on the development of the database component of modern information systems and on the management of the data resource. Thus, the text is practical, supports projects and other hands-on class activities, and encourages linking database concepts to concepts being learned throughout the curriculum the student is taking.
SUMMARY OF ENHANCEMENTS TO EACH CHAPTER
The following sections present a chapter-by-chapter description of the major changes in this edition. Each chapter description presents a statement of the purpose of that chap- ter, followed by a description of the changes and revisions that have been made for the 10th edition. Each paragraph concludes with a description of the strengths that have been retained from prior editions.
Part I: The Context of Database Management
CHAPTER 1: THE DATABASE ENVIRONMENT AND DEVELOPMENT PROCESS
This chapter discusses the role of databases in organizations and previews the major topics in the remainder of the text. This chapter has undergone extensive reorganization for the 10th edition because it is a consolidation of two previous chapters, allowing students to more quickly cover material that previews the rest of the book. After presenting a brief introduction to the basic terminology associated with storing and retrieving data, the chapter presents a well-organized comparison of traditional file-processing systems and modern database technology. The chapter then introduces the core components of a database environment and the range of database applications that are currently in use within organizations: personal, two-tier, multitier, and enterprise applications. The explanation of enterprise databases includes databases that are part of enterprise resource planning systems and data warehouses. A brief history of the evolution of database technology, from pre-database files to modern object-relational technologies, is also presented. The chapter then goes on to explain the process of database development in the context of structured life cycle, prototyping, and agile methodologies. The presentation remains consistent with the companion systems analysis text by Hoffer, George, and Valacich. The chapter also discusses important issues in database development, including management of the diverse group of people involved in database development and frameworks for understanding database architectures and technologies (e.g., the three-schema architecture). Reviewers frequently note the compatibility of this chapter with what students learn in systems analysis and design classes.