Pearson RENTAL EDITION - RENTAL ONLY, NOT FOR SALE I
THIRTEENTH EDITION
MODERN DATABASE
JEFFREY A. HOFFER
V. RAMESH
HEIKKI TOPI
Complete Listing of Chapter Opening Cases, Insight Cases, E-commerce in Action Cases, and Case Studies
CHAPTER 1 THE REVOLUTION IS JUST BEGINNING Opening Case: Everything on Demand: The "Uberization" of E-commerce
Insight on Technology: Will Ap ps Make the Web Irrelevant?
Insight on Business: Startup Boot Camp
Insight on Society: Facebook and the Age of Privacy
Case Study: Pinterest: A Picture Is Worth a Thousand Words
CHAPTER 2 E-COMMERCE BUSINESS MODELS AND CONCEPTS Opening Case: Tweet Tweet: Will Twitter Ever Find a Business Model that Works?
Insight on Society: Foursquare: Check Your Privacy at the Door
Insight on Business: Crowdfunding Takes Off
Insight on Technology: Will the Connected Car Become the Next Hot Ente rtainment Vehicle?
Case Study: Dollar Shave Club: From Viral Video to $I Bill ion in Just Five Years
CHAPTER 3 E-COMMERCE INFRASTRUCTURE: THE INTERNET, WEB, AND MOBILE PLATFORM
Opening Case: Voice-Controlled Intell igent Digital Ass istants: Will They Revolutionize E-commerce?
Insight on Society: Government Regu lation and Surveillance of the Internet
Insight on Technology: The Rise ofHTMLS
Insight on Business: The Apple \.Vatch: Bringing The Internet of Things to Your Wrist
Case Study: Akamai Technologies: Attempting to Keep Supply Ahead of Demand
CHAPTER 4 BUILDING AN E-COMMERCE PRESENCE: WEBSITES, MOBILE SITES, AND APPS Opening Case: The Wall Street Jou rnal: Redesign ing for Today's Platforms
Insight on Business: Weebly lviakes Creating \.Yebsites Easy
Insight on Society: Designing for Accessibility
Insight on Technology: Carnival Cruise Ships Go Mobi le
Case Study: Dick's Sporting Goods: Taking Control of Its E-commerce Operations
CHAPTER 5 E-COMMERCE SECURITY AND PAYMENT SYSTEMS Opening Case: Cyberwar: MAD 2.0
Insight on Society: Equifax: Really Big Data Hacked
Insight on Technology: Think Your Smartphone Is Secure?
Insight on Business: Bitcoin
Case Study: The Mobi le Payment lviarketplace: Goat Rodeo
CHAPTER 6 E-COMMERCE MARKETING AND ADVERTISING CONCEPTS Opening Case: Video Ads: Shoot, Click, Buy
Insight on Business: Are the Very Rich Different From You and lvie?
Insight on Technology: The Long Ta il: Big Hits and Big Misses
Insight on Society: Every Move You Take, Every Click You Make, We'll Be Tracking You
Case Study: Programmatic Advertising: Real-Time lviarketing
CHAPTER 7 SOCIAL, MOBILE, AND LOCAL MARKETING
Opening Case: Facebook: Putting Social Marketing to Work
Insight on Technology: Optimizing Social Marketing with Simply Measured
Insight on Society: Marketing to Children of the Web in the Age of Social Networks
Insight on Business: Mobile Marketing Goes 3-D
Case Study: ExchangeHunte rJumper.com: Bui lding a Brand with Social lvlarketing
CHAPTER 8 ETHICAL, SOCIAL, AND POLITICAL ISSUES IN E-COMMERCE
Opening Case: The Right To Be Forgotten: Europe Leads on Internet Privacy
Insight on Technology: Apple: Defender of Privacy?
Insight on Business: Internet Sales Tax Battle
Insight on Society: The Internet Drug Bazaar
Case Study: The Pirate Bay: Searching for a Safe Haven
CHAPTER 9 ONLINE RETAILING AND SERVICES
Opening Case: Blue Nile Sparkles for Your Cleopatra
E-commerce in Action: Amazon
Insight on Technology: Big Data and Predictive Marketing
Insight on Society: Phony Reviews
Insight on Business: Food on Demand: Instacart and GrubHub
Case Study: OpenTable: Your Reservation Is Waiting
CHAPTER 10 ONLINE CONTENT AND MEDIA
Opening Case: Cord Cutte rs and Cord Shavers: The Emerging Internet Broadcasting System (!BS)
Insight on Society: Are lvli llennials Really All That Different?
Insight on Business: Vox: Native Digita l News
Insight on Technology: Hollywood and the Internet: Let's Cut a Deal
Case Study: Netfl ix: How Does This Movie End?
CHAPTER 11 SOCIAL NETWORKS, AUCTIONS, AND PORTALS
Opening Case: Social Network Fever Spreads to the Professions
Insight on Society: The Dark Side of Social Networks
Insight on Technology: Trapped Inside the Facebook Bubble?
Insight on Business: Verizon Doubles Down on Portals
Case Study: eBay Evolves
CHAPTER 12 828 E-COMMERCE: SUPPLY CHAIN MANAGEMENT AND COLLABORATIVE COMMERCE
Opening Case: Amazon Takes on B2B with Amazon Business
Insight on Society: Where's My !Pad? Supply Chain Risk and Vulne rability
Insight on Technology: Your Shoes Are in the Cloud
Insight on Business: Walmart Develops a Private Industrial Network
Case Study: Elemica: Cooperation, Collaboration, and Community
OTHER MIS TITLES OF INTEREST
Introductory MIS
Experienc ing MIS, 8/e Kroenke & Boyle ©2019
Using MIS, 10/e Kroenke & Boyle ©2018
Management Information Systems, 15/e Laudon & Laudon ©2018
Essentials of MIS, 13/e Laudon & Laudon ©2019
Processes, Systems, and Information: An Introduction to MIS, 3/e McKinney & Kroenke ©2019
Information Systems Today, 8/e Valacich & Schneider ©2018
Introduction to Information Systems, 3/ e Wallace ©2018
Database
Hands-on Database, 2/e Conger ©2014
Modern Database Management, 13/e Hoffer, Ramesh & Topi ©2019
Database Concepts, 8/e Kroenke, Auer, Vandenburg, Yoder ©2018
Database Processing, 15/e Kroenke & Auer ©2019
Systems Analysis and Design
Modern Systems Analysis and Design, 8/e Hoffer, George & Valacich ©2017
Systems Analysis and Design, 10/e Kendall & Kendall ©2019
Decision Support Systems
Business Intelligence, Analytics, and Data Science, 4/e Sharda, Delen & Turban ©2018
Business Intelligence and Analytics: Systems for Decision Support, 10/ e Sharda, Delen & Turban ©2014
Data Communications & Networking
Applied Networking Labs, 2/e Boyle ©2014
Digital Business Networks Dooley ©2014
Business Data Networks and Security, 11/e Panko & Panko ©2019
Electronic Commerce
E-commerce 2018: Business. Technology. Society, 14/e Laudon & Traver ©2019
Enterprise Resource Planning
Enterprise Systems for Management, 2/e Motiwalla & Thompson ©2012
Project Management
Project Management: Process, Technology and Practice Vaidyanathan ©2013
THIRTEENTH EDITION
MODERN DATABASE MANAGEMENT
TH I RTEENTH EDITION
MODERN DATABASE MANAGEMENT
1) Pearson •
Jeffrey A. H offer University of Dayton
V. Ramesh Indiana University
Heikki Topi Bentley University
330 Hudson Street, NY NY 10013
Vice Pres ident, IT & Careers: Andrew Gilfillan Senior Portfolio Man ager: Samantha Lewis Managing Prod ucer: Laura Burgess Associate Content Producer: Stephany Harrington Portfolio Management Assis tant: Madeline Houpt Director of Prod uct Marketing: Brad Parkins Product Marketing Man ager: Heather Taylor Product Marketing Assistant Jesika Bethea Field Marketing Manager: Molly Schmid t Field Marketing Assistant: Kelli Fisher Cover Im age: VICTOR HABBICK VISIONS/ Getty Images
Vice President, Prod uct Model Management: Jason Fournier Senior Product Model Manag er: Eric Hakanson Lead, Production and Dig ital Studio: Heather Darby Digital Studio Course Prod ucer: Jaimie Noy Program Monitor: Danica Monzor, SPi Global Full-Service Project Manaf ment:
Neha Bhargava, Cenveo Publisher Services Comp os ition: Cenveo Publisher Services Printer/ Binder: LSC Communications Cover Printer: Phoenix Color Text Font Palatino LT Pro
Credits and acknowledgments borrowed from other sources and reproduced, with permission, in this textbook appear on the appropriate page within text.
Microsoft and / or its respective suppliers make no representations about the suitability of the information contained in the documents and related graphics published as part of the services for any purpose. All such documents and related g raphics are provided "as is" without warranty of any kind. M;crosoft and / or its respective suppliers hereby disclaim all warranties and conditions w ith regard lo this information, including aU warranties and conditions of merchantability, w hether express, implied or s tatutory, fitness for a particular purpose, title and non- infringemenl. In no event shall Microsoft and/ or its respective suppliers be liable for any special, indirect or consequential damages or any damages whatsoever resulting from loss of use, data or profits, whether in an action o f contract, neg ligence or other tortious action, arising out of or in connection w ith the use or performance o f information available from the senrices.
The documents and related graphics contained herein could include technical inaccuracies or typographical errors. Changes are periodically added to the information herein. Microsoft and/or its respective suppliers may make improvements and/ or changes in the product(s) and/ or the program(s) described herein al any time. Partial screen shots may be viewed in full within the software version specified.
Trademarks Microsoft® Windows®, and Microsoft Office® are registered trademarks of the Microsoft Corporation in the U.S.A. and other countries. This book is not sponsored or endorsed by or affiliated with the Microsoft Corporation.
Copyright© 2019, 2016, 2013 by Pearson Education, Inc. All rights reserved. Manufactured in the United States of America. This publication is protected by Copyright, and permission should be obtained from the publisher prior lo any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, request forms and the appropriate contacts within the Pearson Education Global Rights & Permissions department, please visit www.pearsoned.com I permissions.
Acknowledgements of third party content appear on the appropriate page within the text, which constitutes an extension of this copyright page.
Unless otherwise indicated herein, any third-party trademarks that may appear in this work are the proper ty of their respective owners and any references to third-party trademarks, logos or other trade dress are for demonstrative or descriptive purposes only. Such references are not intended to imply any sponsorship, endorsement, authorization, or promotion of Pearson's products by the owners of such marks, or any relationship between the owner and Pearson Education, Inc. or its affiliates, authors, licensees or distributors.
Library of Congress Cataloging-in-Publication Data
Names: Hoffer, Jeffrey A., author. I Ramesh, V. (Venkataraman), author. I Topi, Heikki, author.
Title: Modern database management I Jeffrey A. Hoffer, University of Dayton, V. Ramesh, Indiana University, Heikki Topi, Bentley University.
Description: Thirteenth edition. I Boston: Pearson Education, [2017) I Includes bibliographical references and index.
Identifiers: LCCN 2017048422 1 ISBN 9780134773650 (alk. paper) I ISBN 0133544613 (alk. paper)
Subjects: LCSH: Database management. Classification: LCC QA76.9.D3 M395 2017 I DOC 005.7~c23 LC record available al https:/ / lccn.Joc.gov / 2017048422
10 9 8 7 6 5 4 3 2 1
1) Pearson • ISBN 10: 0-13-477365-9 ISBN 13: 978-0-13-477365-0
To Patty, for her sacrifices, encouragement, and support for more than 35 years of being a textbook author widow. To my students and colleagues, for being
receptive and critical and for challenging me to be a better teacher.
- J.A.H.
To Gayathri, for her sacrifices and patience these past 25 years. To my parents, for letting me make the journey abroad, and to my cat, Raju, who was a part of our
family for more than 20 years.
- V.R.
To Anne-Louise, for her loving support, encouragement, and patience. To Leila and Saara, whose laughter and joy of life continue to teach me about what is
truly important. To my teachers, colleagues, and students, from whom I continue to learn every day.
- H.T.
BRIEF CONTENTS
Part I The Context of Database Management 1 Chapter 1 The Database Environment and Development Process 3
Part II Database Analysis and Logical Design 53
Chapter 2 Modeling Data in the Organization 55 Chapter 3 The Enhanced E-R Model 115
Chapter 4 Logical Database Design and the Relational Model 153
Part Ill Database Implementation and Use 205
Chapter 5 Introduction to SQL 207
Chapter 6 Advanced SQL 251
Chapter 7 Databases in Applications 297
Chapter 8 Physical Database Design and Database Infrastructure 333
Part IV Advanced Database Topics 385
Chapter 9 Data Warehousing and Data Integration 387
Chapter 10 Big Data Technologies 444 Chapter 11 Analytics and Its Implications 474
Chapter 12 Data and Database Administration w ith Focus on Data Quality 503
Glossary of Acronyms 529 Glossary of Terms 531
Index 539
Available Online at www.pearsonhighered.com/hoffer Chapter 13 Distributed Databases 13-1
Chapter 14 Object-Oriented Data Modeling 14-1
Appendices
Appendix A Data Modeling Tools and Notation A-1 Appendix B Advanced Normal Forms B-1
Appendix C Data Structures C-1
VII
CONTENTS
Preface xxv
Part I The Context of Database Management 1 An Overview of Part I 1
Chapter 1 The Database Envi ronment and Development Process 3 Learning Objectives 3
Data Matter! 4
Introduction 5
Basic Concepts and Definit ions 6
Data 6
Data versus Informat ion 7
Metadat a 8
Traditional File Processing Systems 9
Fi le Processing Systems at Pine Valley Furniture Company 9
Disadvantages of Fi le Processing Systems 10 PROGRAM· DATA DEPENDENCE 10
DUPLICATION OF DATA 10 LIMITED DATA SHARING 10
LENGTHY DEVELOPMENT T IMES 10
EXCESSIVE PROGRAM MAINTENANCE 11
The Database Approach 11
Data Models 11 ENTITIES 11 RELATIONSH IPS 11
Relat ional Databases 12
Database Management Systems 13
Advant ages of t he Database Approach 13 PROGRAM-DATA INDEPENDENCE 13 PLANNED DATA REDUNDANCY 14
IMPROVED DATA CONSISTENCY 14
IMPROVED DATA SHARING 14
INCREASED PRODUCTIVITY OF APPLICATION DEVELOPMENT 14
ENFORCEMENT OF STANDARDS 15
IMPROVED DATA QUALITY 15 IMPROVED DATA ACCESSIBILITY AND RESPONSIVENESS 15
REDUCED PROGRAM MAINTENANCE 16
IMPROVED DECISION SUPPORT 16 CAUTIONS ABOUT DATABASE BENEFITS 16
COSTS AND RISKS OF THE DATABASE APPROACH 16
New, SPECIALIZED PERSONNEL 16 INSTALLATION AND MANAGEMENT (OST AND COMPLEXITY 17
CONVERSION COSTS 17
N EED FOR EXPLICIT BACKUP AND RECOVERY 17 ORGANIZATIONAL CONFLICT 17
Integrated Data Management Framework 17
Components of t he Dat abase Environment 18
IX
x Contents
The Database Development Process 20
Systems Development Life Cycle 21 PLANNING- ENTERPRISE MODELING 21
PLANNING-CONCEPTUAL DATA MODELING 21 ANALYSIS-CONCEPTUAL DATA MODELING 22
DESIGN- LOGICAL DATABASE DESIGN 23 D ESIGN- PHYSICAL DATABASE D ESIGN AND DEFINITION 23 IMPLEMENTATION- DATABASE IMPLEMENTATION 23 MAINTENANCE-DATABASE MAINTENANCE 24
Alternative Information Systems Development Approaches 24
Three-Schema Architecture for Database Development 25
Managing the People Involved in Database Development 27
Evolution of Database Systems 27
1960s 29
1970s 29
1980s 29
1990s 30
2000 and Beyond 30
The Range of Database Applications 30
Personal Databases 31
Departmental Multi-Tiered Client/Server Databases 31
Enterprise Applications 32 ENTERPRISE SYSTEMS 32 DATA WAREHOUSES 33 DATA LAKE 34
Developing a Database Application for Pine Valley Furniture Company 35
Database Evolution at Pine Valley Furniture Company 36
Project Planning 36
Analyzing Database Requirements 37
Designing the Database 40
Using the Database 42
Administering the Database 43
Future of Databases at Pine Valley 43 Summary 44 • Key Terms 45 • Review Questions 45 • Problems and Exercises 46 • Field Exercises 48 • References 49 • Further Reading 49 • Web Resources 50
... CASE: Forondo Artist Management Excellence Inc. 51
Part II Database Analysis and Logical Design 53 An Overview of Part II 53
Chapter 2 Modeling Data in the Organization 55 Learning Objectives 55
Introduction 55
The E-R Model: An Overview 58
Sample E-R Diagram 58
E-R Model Notation 60
Modeling the Rules of the Organization 61
Overview of Business Rules 62 THE BUSINESS RULES PARADIGM 62
Scope of Business Rules 63 GOOD BUSINESS RULES 63
GATHERING BUSINESS RULES 64
Data Names and Definitions 64 DATA N AMES 64 DATA DEFINITIONS 65
GOOD DATA DEFINITIONS 65
Modeling Entit ies and Attributes 67
Ent ities 67 ENTITY TYPE VERSUS ENTITY INSTANCE 67 ENTITY TYPE VERSUS SYSTEM INPUT, OUTPUT, OR USER 67
STRONG VERSUS WEAK ENTITY TYPES 68
NAMING AND DEFINING ENTITY TYPES 69
Attributes 71 REQUIRED VERSUS OPTIONAL ATTRIBUTES 7 1
SIMPLE VERSUS COMPOSITE ATTRIBUTES 72
SINGLE-VALUED VERSUS M ULTIVALUED ATTRIBUTES 72 STORED VERSUS DERIVED ATTRIBUTES 73
IDENTIFIER ATTRIBUTE 73
NAMING AND DEFINING ATTRIBUTES 74
Modeling Relationships 76
Basic Concepts and Def init ions in Relat ionships 77 ATTRIBUTES ON RELATIONSHIPS 78
AsSOCIATIVE ENTITIES 78
Degree of a Relationship 80 UNARY RELATIONSHIP 81 B INARY RELATIONSHIP 82
TERNARY RELATIONSHIP 82
Attributes or Entity? 83
Cardinality Constraints 85 M INIMUM CARDINALITY 85 MAXIMUM CARDINALITY 86
Some Examples of Relationships and Their Cardinalities 86 A TERNARY RELATIONSHIP 87
Modeling Time-Dependent Data 88
Modeling Mult iple Relationships Between Entity Types 90
Naming and Defining Relationships 92 E-R Modeling Example: Pine Val ley Furniture Company 93
Database Processing At Pine Valley Furniture 96 Showing Product Information 96
Showing Product Line Information 96
Showing Customer Order Status 97
Showing Product Sales 98 Summary 99 • Key Terms 100 • Review Questions 100 • Problems and Exercises 101 • Field Exercises 111 • References 112 • Further Reading 112 • Web Resources 112
.,. CASE: ForondoArtist Management Excellence Inc. 11 3
Contents xi
PINE VALLEY FURNITURE
xu Contents
Chapter 3 The Enhanced E-R Model 115 Learning Objectives 115
Introduction 115
Representing Supertypes and Subtypes 116
Basic Concepts and Notation 117 AN EXAMPLE OF A SUPERTYPEISUBTYPE RELATIONSHIP 118
ATTRIBUTE INHERITANCE 119
WHEN TO USE SuPERTYPEISUBTYPE RELATIONSHIPS 119
Representing Specialization and Generalizat ion 120 GENERALIZATION 120
SPECIALIZATION 121
COMBINING SPECIALIZATION AND GENERALIZATION 122
Specifying Constraints in Supertype/Subtype Relationships 123
Specifying Completeness Constraints 123 TOTAL SPECIALIZATION RULE 123
PARTIAL SPECIALIZATION RULE 123
Specifying Disjointness Const raints 124 D ISJOINT RULE 124
OVERLAP RULE 125
Defining Subtype Discriminators 125 D ISJOINT SUBTYPES 12 5
OVERLAPPING SUBTYPES 126
Defining Supertype/Subtype Hierarchies 127 AN EXAMPLE OF A SUPERTYPEISUBTYPE HIERARCHY 128
SUMMARY OF SUPERTYPE/SUBTYPE H IERARCHIES 128
EER Modeling Example: Pine Valley Furniture Company 128
Entity Clustering 132
Packaged Data Models 135
A Revised Data Modeling Process with Packaged Data Models 137
Packaged Data Model Examples 139 Summary 144 • Key Terms 145 • Review Questions 145 • Problems and Exercises 146 • Field Exercises 149 • References 149 • Further Reading 150 • Web Resources 150
.,. CASE: Forondo Artist Management Excellence Inc. 151
Chapter 4 Logical Database Design and the Relat ional Model 153 Learning Objectives 153
Introduction 153
The Relational Data Model 154
Basic Definitions 154 RELATIONAL D ATA STRUCTURE 155
RELATIONAL KEYS 155
PROPERTIES OF RELATIONS 156
REMOVING MULTIVALUED ATTRIBUTES FROM TABLES 156
Sample Database 157
Integrity Constraints 158
Domain Constraints 158
Entity Integrity 158
Referent ial Integrity 160
Creating Relational Tables 161
Well-Structured Relations 162
Transforming EER Diagrams into Relations 163
Step 1: Map Regular Entities 164 COMPOSITE AITRIBUTES 164
MULTIVALUED AITRIBUTES 165
Step 2: Map Weak Entities 165 WHEN TO CREATE A SURROGATE KEY 166
Step 3: Map Binary Relationships 167 MAP BINARY ONE-TO-MANY RELATIONSHIPS 167
MAP BINARY M ANY-TO· MANY RELATIONSHIPS 168
MAP BINARY 0NE-T0· 0NE RELATIONSHIPS 168
Step 4: Map Associative Entities 169 IDENTIFIER NOT ASSIGNED 169
IDENTIFIER AsSIGNED 170
Step 5: Map Unary Relationships 171 UNARY 0NE·TO· M ANY RELATIONSHIPS 171
UNARY M ANY-TO· MANY RELATIONSHIPS 172
Step 6: Map Ternary (and n-ary) Relationships 173
Step 7: Map Supertype/Subtype Relationships 174
Summary of EER-to-Relational Transformations 176
Introduction to Normalization 176
Steps in Normalization 177
Functional Dependencies and Keys 177 DETERMINANTS 179
CANDIDATE KEYS 179
Normalization Example: Pine Val ley Furniture Company 180
Step 0: Represent the View in Tabular Form 180
Step 1: Convert to First Normal Form 181 REMOVE REPEATING GROUPS 181
SELECT THE PRIMARY KEY 182
ANOMALIES IN 1 NF 182
Step 2: Convert to Second Normal Form 183
Step 3: Convert to Third Normal Form 184 REMOVING TRANSITIVE DEPENDENCIES 184
Determinants and Normalization 185
Step 4: Further Normalization 185
Merging Relations 186
An Example 186
View Integration Problems 186 SYNONYMS 187
HOMONYMS 187
TRANSITIVE D EPENDENCIES 187
SuPERTYPE/SuSTYPE RELATIONSHIPS 188
A Final Step for Defining Relational Keys 188 Summary 191 • Key Terms 191 • Review Questions 191 • Problems and Exercises 192 • Field Exercises 201 • References 202 • Further Reading 202 • Web Resources 202
.,. CASE: Forondo Artist Management Excellence Inc. 203
1'
Contents XIII
PINE VALLEY FURNITURE
,- ·-=-- . I I'~
xiv Contents
1' ·--·-C'- • I r~
Part Ill Database Implementation and Use 205 An Overview of Part Ill 205
Chapter 5 Introduction to SQL 207 Learning Objectives 207
Introduction 207
Origins of t he SQL Standard 209
The SQL Environment 211
SQL Data Types 213
Def ining A Dat abase in SQL 216
Generating SQL Database Definitions 216
Creating Tables 217
Creating Data Integrity Controls 220
Changing Table Definitions 221
Removing Tables 221
Inserting, Updating, and Deleting Data 222
Bat ch Input 223
Deleting Database Contents 223
Updating Database Contents 224
Internal Schema Definition in RDBMSs 225
Creating Indexes 225
Processing Single Tables 226
Clauses of the SELECT Statement 226
Using Expressions 228
Using Functions 229
Using Wildcards 232
Using Comparison Operators 232
Using Null Values 233
Using Boolean Operators 233
Using Ranges for Qualif ication 236
Using Distinct Values 236
Using IN and NOT IN w ith Lists 238
Sorting Results: The ORDER BY Clause 239
Cat egorizing Results: The GROUP BY Clause 240
Qualifying Results by Cat egories: The HAVING Clause 241 Summary 243 • Key Terms 243 • Review Questions 243 • Problems and Exercises 244 • Field Exercises 248 • References 248 • Further Reading 249 • Web Resources 249
"" CASE: Forondo Artist Management Excellence Inc. 250
Chapter 6 Advanced SQL 251 Learning Objectives 251
Introduction 251
Processing Multiple Tables 252
Equi-Join 253
Natura l Join 254
Outer Join 255
Sample Join Involving Four Tables 257
Self-Join 258
Subqueries 260
Correlated Subqueries 265
Using Derived Tables 267
Combinings Queries 267
Conditional Expressions 269
More Complicated SQL Queries 270
Tips for Developing Queries 272
Guidelines for Better Query Design 274
Using and Defining Views 275
Materialized Views 279
Triggers and Routines 279
Triggers 280
Routines and Other Programming Extensions 282
Example Routine in Oracle's PUSQL 284
Data Dictionary Faci lities 285
Recent Enhancements and Extensions to SQL 287
Analytical and OLAP Functions 287
New Temporal Features in SQL 288
Other Enhancements 288 Summary 289 • Key Terms 290 • Review Questions 290 • Problems and Exercises 291 • Field Exercises 294 • References 294 • Further Reading 295 • Web Resources 295
.,. CASE: Forondo Artist Management Excellence Inc. 296
Chapter 7 Databases in Applications 297 Learning Objectives 297
Location, Location, Location! 297
Introduction 298
Cl ient/Server Architectures 298
Databases in Three-Tier Applications 302
A Java Web Application 303
A Python Web Application 307
Key Considerations in Three-Tier Applications 313
Stored Procedures 313
Transactions 313
Database Connections 315
Key Benefits of Three-Tier Applications 31 S
Transaction Integrity 316
Controlling Concurrent Access 318
The Problem of Lost Updates 318
Serial izabi lity 319
Locking Mechanisms 319 LOCKING LEVEL 319 TYPES OF LOCKS 320
DEADLOCK 321
MANAGING DEADLOCK 321
Versioning 322
Contents xv
xvi Contents
Managing Data Security in an Application Context 324
Threats t o Data Security 324
Est ablishing Client/Server Security 325 SERVER SECURITY 326
NETWORK SECURITY 326
Application Security Issues in Three-Tier Client/Server Environments 326 DATA PRIVACY 327
Summary 329 • Key Terms 329 • Review Questions 329 • Problems and Exercises 330 • Field Exercises 331 • References 331 • Further Reading 331 • Web Resources 331
,.. CASE: Forondo Artist Management Excellence Inc. 332
Chapter 8 Physical Database Design and Database Infrastructure 333 Learning Objectives 333
Introduction 334
The Physical Database Design Process 335
Who Is Responsible for Physical Database Design? 335
Physical Database Design as a Basis for Regulatory Compliance 336
SOX and Databases 337 IT CHANGE MANAGEMENT 337
LOGICAL ACCESS TO DATA 337
IT OPERATIONS 338
Data Volume and Usage Analysis 338
Designing Fields 340
Choosing Data Types 340 CODING TECHN IQUES 341
CONTROLLING DATA INTEGRITY 342
HANDLING M ISSING DATA 343
Denormalizing and Partitioning Data 343
Denormalization 343 OPPORTUNITIES FOR AND TYPES OF DENORMALIZATION 344
DENORMALIZE W ITH CAUTION 345
Partitioning 347
Designing Physical Database Files 348
Fi le Organizations 350 HEAP FILE ORGANIZATION 350
SEQUENTIAL FILE ORGANIZATIONS 350
INDEXED FILE ORGANIZATIONS 352
HASHED f lLE ORGANIZATIONS 353
Clustering Files 353
Designing Controls for Files 354
Using and Selecting Indexes 354
Creating a Unique Key Index 354
Creating a Secondary (Nonunique) Key Index 355
When t o Use Indexes 355
Designing a Database for Opt imal Query Performance 356
Parallel Query Processing 357
Overriding Automatic Query Optimizat ion 358
Data Dictionaries and Repositories 358
Data Dictionary 359
Repositories 359
Database Software Data Security Features 361
Views 361
Integrity Controls 362
Authorization Rules 363
User-Defined Procedures 365
Encryption 365
Authent ication Schemes 365 PASSWORDS 366 STRONG AUTHENTICATION 366
Database Backup and Recovery 367
Basic Recovery Facilities 367 BACKUP FACILITIES 367
JOURNALIZING FACILITIES 368 CHECKPOINT FACILITY 368
RECOVERY MANAGER 369
Recovery and Rest art Procedures 369 D ISK M IRRORING 369
RESTORE/RERUN 370
BACKWARD RECOVERY 370
FORWARD RECOVERY 371
Types of Database Failure 371 ABORTED TRANSACTIONS 372 INCORRECT DATA 372 SYSTEM FAILURE 372
DATABASE DESTRUCTION 372
Disaster Recovery 373
Cloud-Based Database Infrastructure 373
Cloud-Based Models for Providing Data Management Services 373
Benefits and Downsides of Using Cloud-Based Data Management Services 374 Summary 375 • Key Terms 376 • Review Questions 377 • Problems and Exercises 378 • Field Exercises 382 • References 383 • Further Reading 383 • Web Resources 383
... CASE: Forondo Artist Management Excellence Inc. 384
Part IV Advanced Database Topics 385 An Overview of Part IV 385
Chapter 9 Data Warehousing and Data Integration 387 Learning Objectives 387
Introduction 387
Basic Concepts of Data Warehousing 390
A Brief History of Data Warehousing 390
The Need for Data Warehousing 390 N EED FOR A COMPANY- W IDE V IEW 390
N EED TO SEPARATE OPERATIONAL AND INFORMATIONAL SYSTEMS 393
Data Warehouse Architectures 393
;::-!. . ' • •
rz
Contents xv11
xv111 Contents
Independent Data Mart Data Warehousing Environment 394
Dependent Data Mart and Operational Data Store Architecture: A Three-Level Approach 395
Logical Data Mart and Real-Time Data Warehouse Architecture 397
Three-Layer Data Architecture 400 ROLE OF THE ENTERPRISE DATA MODEL 400 ROLE OF METADATA 400
Some Characteristics of Dat a Warehouse Data 401
Status versus Event Data 401
Transient versus Periodic Data 402
An Example of Transient and Periodic Data 402 TRANSIENT DATA 404
PERIODIC DATA 404 OTHER DATA WAREHOUSE CHANGES 404
The Derived Data Layer 405
Characteristics of Derived Data 405
The Star Schema 406 FACT TABLES AND D IMENSION TABLES 406
EXAMPLE STAR SCHEMA 407 SURROGATE Kev 408
GRAIN OF THE FACT TABLE 409
DURATION OF THE DATABASE 410 SIZE OF THE FACT TABLE 410
MODELING DATE AND T IME 41 1
Variations of the St ar Schema 412 MULTIPLE FACT TABLES 412 FACTLESS FACT TABLES 413
Normalizing Dimension Tables 414 MULTIVALUED DIMENSIONS 414
HIERARCHIES 415
Slowly Changing Dimensions 417
Determining Dimensions and Facts 420
Data Integration: An Overview 422
General Approaches to Dat a Integration 422 DATA FEDERATION 423 DATA PROPAGATION 423
Data Integration for Data Warehousing: The Reconci led Dat a Layer 424
Characteristics of Data after ETL 424
The ETL Process 425 MAPPING AND METADATA MANAGEMENT 425 EXTRACT 426
CLEANSE 427 LOAD AND INDEX 429
Data Transformation 430
Data Transformation Functions 431 RECORD- LEVEL FUNCTIONS 431 FIELD-LEVEL FUNCTIONS 432
Data Warehouse Administrat ion 434
The Future of Data Warehousing: Integration with Other Forms of Data Management and Analytics 434
Speed of Processing 435 Moving the Data Warehouse into the Cloud 435
Dealing w ith Unstructured Data 436 Summary 436 • Key Terms 437 • Review Questions 437 • Problems and Exercises 438 • Field Exercises 442 • References 442 • Further Reading 443 • Web Resources 443
Chapter 10 Big Data Technologies 444 Learning Objectives 444
Introduction 444
Moving Beyond Transactional and Data Warehousing Databases 446
Big Data 446
NoSQL 448 Classif ication of NoSQL DBMSs 450
KEY·VALUE STORES 450
DOCUMENT STORES 451
WIDE· COLUMN STORES 451
G RAPH-O RIENTED DATABASES 451
NoSQL Examples 451 REDIS 451
MONGODB 452
A PACHE CASSANDRA 452
NE04J 452
A NoSQL Example: MongoDB 452 DOCUMENTS 452
COLLECTIONS 454
RELATIONSH IPS 454
QUERYING MONGODB 455
Impact of NoSQL on Database Professionals 456
Hadoop 458
Components of Hadoop 459 THE HADOOP DISTRIBUTED FILE SYSTEM (HDFS) 459
MAPREDUCE 459
PIG 461
HIVE 461
HBASE 462
A Practical Introduction to Pig 462 LOADING DATA 462
TRANSFORMING DATA 463
A Practical Introduction to Hive 465 CREATING A TABLE 465
LOADING DATA INTO THE TABLE 465
PROCESSING THE DATA 466
Integrated Analytics and Data Science Platforms 466 HP H AVEN 466
TERADATA ASTER 467
IBM BIG DATA PLATFORM 469
Contents xix
xx Contents
Putting It Al l Together: Int egrated Data Architecture 469 Summary 471 • Key Terms 471 • Review Questions 471 • Problems and Exercises 472 • References 472 • Further Reading 473 • Web Resources 473
Chapter 11 Analytics and Its Implications 474 Learning Objectives 474
Introduction 474
Analytics 475
Types of Analytics 475
Use of Descript ive Analytics 477 SQL OLAP QUERYING 478 OLAP TOOLS 480 DATA V ISUALIZATION 482
BUSINESS PERFORMANCE MANAGEMENT AND DASHBOARDS 483
Use of Predictive Analytics 484 DATA M INING TOOLS 485 EXAMPLES OF PREDICTIVE ANALYTICS 486
Use of Prescript ive Analytics 487
Key User Tools for Analytics 488 ANALYTICAL AND OlAP FUNCTIONS 489
R 490 PYTHON 491
APACHE SPARK 492
Data Management Inf rastructure for Analytics 493
Impact of Big Data and Analytics 495
Applications of Big Data and Analytics 495 BUSINESS 496 E-GOVERNM ENT AND POLITICS 496
SCIENCE AND TECHNOLOGY 496
SMART HEALTH AND WELL-B EING 497 SECURITY AND PUBLIC SAFETY 497
Implicat ions of Big Data Analytics and Decision Making 497 PERSONAL PRIVACY VERSUS COLLECTIVE BEN EFITS 498
OWNERSHIP AND Access 498
QUALITY AND R EUSE OF DATA AND A LGORITHMS 498 TRANSPARENCY AND VALIDATION 498
(HANGING N ATURE OF WORK 499 DEMANDS FOR WORKFORCE CAPABILITIES AND EDUCATION 499
Summary 499 • Key Terms 500 • Review Questions 500 • Problems and Exercises 500 • References 501 • Further Reading 502
Chapter 12 Data and Database Administration w ith Focus on Data Quality 503
Learning Objectives 503
Introduction 503
Overview of Data and Database Administration 505
Data Administ rat ion 505
Database Administration 506
TRADITIONAL DATABASE ADMINISTRATION 506
TRENDS IN DATABASE ADMINISTRATION 508
Evolving Data Administration Roles 510
The Open Source Movement and Database Management 511 Data Governance 512
Managing Data Quality 513 Characteristics of Quality Data 514
EXTERNAL DATA SOURCES 515 REDUNDANT D ATA STORAGE AND INCONSISTENT METADATA 516 DATA ENTRY PROBLEMS 516 LACK OF ORGANIZATIONAL COMMITMENT 516
Data Quality Improvement 516 GET THE BUSINESS 8UY· IN 516 CONDUCT A DATA QUALITY AUDIT 517 ESTABLISH A DATA STEWARDSHIP PROGRAM 518
IMPROVE DATA CAPTURE PROCESSES 518 APPLY MODERN DATA MANAGEMENT PRINCIPLES AND TECHNOLOGY 519
APPLY TQM PRINCIPLES AND PRACTICES 519
Summary of Data Quality 519
Data Availabil ity 520
Costs of Downtime 520 Measures to Ensure Availabil ity 521
HARDWARE FAILURES 521 Loss OR CORRUPTION OF D ATA 521
HUMAN ERROR 521 MAINTENANCE DOWNTIME 521 N ETWORK-RELATED PROBLEMS 521
Master Data Management 521 Summary 523 • Key Terms 523 • Review Questions 524 • Problems and Exercises 524 • Field Exercises 526 • References 526 • Further Reading 527 • Web Resources 527
Glossary of Acronyms 529
Glossary of Terms 531
Index 539
Contents xxi
xxii Online Chapters
ONLINE CHAPTERS
Chapter 13 Distributed Databases 13-1 Learning Objectives 13-1
Introduction 13-1
Objectives and Trade-Offs 13-4
Options for Distributing a Database 13-6
Data Replication 13-6 SNAPSHOT REPLICATION 13-7
N EAR-REAL-TIME REPLICATION 13·8
PULL REPLICATION 13-8
DATABASE INTEGRITY W ITH REPLICATION 13-8
WHEN TO Use REPLICATION 13-9
Horizontal Partitioning 13-9
Vertical Partitioning 13-10
Combinations of Operations 13-11
Selecting the Right Data Distribution Strategy 13-12
Distributed DBMS 13-13
Location Transparency 13-1 S
Replication Transparency 13-16
Failure Transparency 13-17
Commit Protocol 13-17
Concurrency Transparency 13-18 TIME STAMPING 13-19
Query Optimization 13-19
Evolution of Distributed DBMSs 13-22 REMOTE UNIT OF WORK 13-22
D ISTRIBUTED UNIT OF WORK 13-22
D ISTRIBUTED REQUEST 13-23
Summary 13-23 • Key Terms 13-24 • Review Questions 13-24 • Problems and Exercises 13-25 • Field Exercises 13-27 • References 13-27 • Further Reading 13-27 • Web Resources 13-27
Chapter 14 Object-Oriented Data Modeling 14-1 Learning Objectives 14-1
Introduction 14-1
Unified Modeling Language 14-3
Object-Oriented Data Modeling 14-4
Representing Objects and Classes 14-4
Types of Operations 14-7
Representing Associations 14-7
Representing Association Classes 14-11
Representing Derived Attributes, Derived Associations, and Derived Roles 14-12
Representing Generalization 14-13
Interpret ing Inheritance and Overriding 14-18
Representing Multiple Inheritance 14-19
Representing Aggregation 14-19
Business Rules 14-22
www.pearsonhighered.com/ hoffer xx111
Object Modeling Example: Pine Valley Furniture Company 14-23 Summary 14-25 • Key Terms 14-26 • Review Questions 14-26 • Problems and Exercises 14-30 • Field Exercises 14-37 • References 14-37 • Further Reading 14-38 • Web Resources 14-38
Appendix A Data Modeling Tools and Notation A-1 Comparing E-R Modeling Conventions A-1
Visio Professiona I 2016 Notation A-1 ENTITIES A-5
RELATIONSHIPS A-5
CA ERwin Data Modeler 9.7 Notation A-5 ENTITIES A·S RELATIONSHIPS A-5
SAP Sybase PowerDesigner 16.6 Notation A-7 ENTITIES A·8 RELATIONSH IPS A·8
Oracle Designer Notation A-8 ENTITIES A·8
RELATIONSHIPS A·8
Comparison of Tool Interfaces and E-R Diagrams A-8
Appendix B Advanced Normal Forms 8·1 Boyce-Codd Normal Form B-1
Anomalies in Student Advisor B-1
Definition of Boyce-Codd Normal Form (BCNF) B-2
Converting a Relation to BCNF B-2
Fourth Normal Form B-3
Multivalued Dependencies B-5
Higher Normal Forms B-S Key Terms 8-6 • References 8-6 • Web Resources 8-6
Appendix C Data Structures C-1 Pointers C-1
Data Structure Building Blocks C-2
Linear Data Structures C-4
Stacks C-5
Queues C-S
Sorted Lists C-6
Multilists C-8
Hazards of Chain Structures C-8
Trees C-9
Balanced Trees C-9 References C-12
PREFACE
This text is designed for introductory courses in database management. Such a course is usually required as part of an information systems curriculum in business schools, comp uter technology programs, and applied computer science departments. The Association for Information Systems (AIS), the Association for Computing Machinery (ACM), and the International Federation of Information Processing Societies (IFIPS) curricul um guidelines (e.g., IS 2010 and MSIS 2016) all outline this type of database management course or the competencies a student completing the course is expected to have. Previous editions of this text have been used successfully for more than 35 years at both the undergrad uate and graduate levels as well as in management and professional development programs.
WHATS NEW IN THIS EDITION?
This 13th edition of Modern Database Managernent up dates and expands materials in areas undergoing rapid change as a result of improved managerial practices, database design tools and methodologies, and database technology. Later, we detail changes to each chapter. The themes of this 13th ed ition reflect the major trends in the information systems field and the skills required of modern information systems grad uates. The most important changes are as follows:
• The book has been restructured in several important ways. Chapter 7 on databases in applications now also includes segments on transaction integ- ri ty, designing multi-user solutions, and application level security, b ringing these important perspectives together w ith their context. The revised chap- ter on physical database design and database infrastructure (new Chapter 8) includes also coverage of database secu rity, backup and recovery, cloud -based database solutions, and other essential database infrastructure topics. Th is new comprehensive structure on physical design and infrastructure is now placed after the SQL ch apters. The new version of Chapter 9 integrates mate- rial on data warehousing and data integrity in a conceptually natural pair- ing. Recognizing the way in which analytics capabili ties rely on all types of data management solutions, Chapter 11, on analytics and implications, is now separate from Chapter 10, on big data. Finally, Chapter 12 brings together data and database administration with data quality, emphasizing the essential connections between the three.
• The part structure of the book has been redesigned to be fully aligned with the new chapter structure.
• We have introduced a new overarching framework (Fig ure 1-5), which gives our readers a clearer overview of structure of the book and its core topic areas. The framework communicates clearly the increasing importance of informational systems (divided into Analytics- Data Warehousing and Analytics- Big Data) in addition to this book's traditional strength of transactional systems.
• Given the continued and still increasing interest in big data and analytics, we have continued to expand content in this area. The book has now separate chapters on big data technologies (Chapter 10) and analytics (Chapter 11). In addition to general coverage of NoSQL and Hadoop technologies, Chapter 10 provides also d etailed examples of MongoDB, Pig, and Hive. Chapter 11 includes extended coverage of R, Python, and Apache Spark- all essential technologies for analytics professionals that allow a link between analytics and data management architectures.
• We emphasize the increasing importance of cloud-based database solutions, mobile technologies, and agile development throughout the book.
• Chapter 1 now better recognizes the broad range of enterprise level applications data management solutions enable and support, including enterprise systems, data warehouses, and data lakes.
xxv
xxvi Preface
• Chapter 7 on databases in applications now includes an extensive example dem- onstrating the use of Python in the context of database-driven applications.
• The instructor's manual will have more material to support the case Forondo Artist Management Excellence that was introduced in the 12th edition.
In addition to the new topics covered, specific improvements to the textbook have been made in the following areas:
• Every chapter went through significant edits to streamline coverage to ensure rel- evance with current technologies and eliminate redundancies.
• The entire book has been edited so that its language clearly reflects its focus on the readers as learners instead of authors as teachers
• End-of-chapter material (review questions, problems and exercises, and/or field exercises) in every chapter has been revised with new and modified questions and exercises.
• We continued to update the figures in several chapters to reflect the changing landscape of technologies that are being used in modern organizations.
• The Web Resources section in each chapter was updated to ensure that students have information on the latest database trends and expanded background details on important topics covered in the text.
• The book continues to be available through VitalSource, an innovative e-book delivery system, and as an electronic book in the Kindle format.
Also, we continue to provide on the student Companion Web site several custom-developed short videos that address key concepts and skills from different sections of the book. These videos, produced by the textbook authors, help students learn difficult material by using both the printed text and a mini-lecture or tutorial. Videos have been developed to support Chapters 1 (introduction to database), 2 and 3 (conceptual data modeling), 4 (normalization), and 6 and 7 (SQL). Look for special icons on the opening page of these chapters to call attention to these videos, and go to www.pearsonhighered.com/hoffer to find these videos.
FOR THOSE NEW TO MODERN DATABASE MANAGEMENT
Modern Database Manngernent has been a leading text since its first edition in 1983. In spite of this market leadership position, some instructors have used other good data- base management texts. Why might you want to switch at this time? There are several good reasons:
• One of our goals, in every edition, has been to lead other books in coverage of the latest principles, concepts, and technologies. See what we have added for the 13th edition in ''What's New in This Edition?" In the past, we have Jed in coverage of object-oriented data modeling and UML, Internet databases, data warehous- ing, and the use of CASE tools in support of data modeling. For the 13th edition, we continue this tradition by continuing to expand and improve coverage of big data and analytics, focusing on what every database student needs to understand about these topics.
• While remaining current, this text focuses on what leading practitioners say is most important for database developers. We work with many practitioners, including the professionals of the Data Management Association (DAMA) and The Data Warehousing Institute (TDWI), leading consultants, technology leaders, and authors of articles in the most widely read professional publications. We draw on these experts to ensure that what the book includes is important and covers not only important entry-level knowledge and skills but also those fundamentals and mind-sets that lead to long-term career success.
• In the 13th edition of this highly successful book, material is presented in a way that has been viewed as very accessible to students. Our methods have been refined through continuous market feedback for more than 35 years as well as through our own teaching. Overall, the pedagogy of the book is sound, and we believe that the new framework that we introduced in Chapter 1 will htrther strengthen our students'
understanding of the big picture of data management. We use many illustrations that help make important concepts and techniques clear. We use the most modem nota- tions. The organization of the book is flexible, so you can use chapters in whatever sequence makes sense for your students. We supplement the book with data sets to facilitate hands-on, practical learning and with new media resources to make some of the more challenging topics more engaging.
• Our text can accommodate structural flexibility. For example, you may have partic- ular interest in introducing SQL early in your course. Our text makes this possible. First, we cover SQL in depth, devoting two full chapters to this core technology of the database field. Second, we include many SQL examples in early chapters. Third, many instructors have successfully used the two SQLchapters early in their course. Although logically appearing in the life cycle of systems development as Chapters 5 and 6, part of the implementation section of the text, many instructors have used these chapters immediately after Chapter 1 or in parallel with other early chapters. Finally, we use SQL throughout the book, for example, to illustrate Web application connections to relational databases in Chapter 7 and online ana- lytical processing in Chapter 11.
• We have the latest in supplements and Web site support for the text. See the sup- plement package for details on all the resources available to you and your students.
• This text is written to be part of a modern information systems curriculum with a strong business systems development focus. Topics are included and addressed so as to reinforce principles from other typical courses, such as systems analysis and design, networking, Web site design and development, MIS principles, and appli- cation development. Emphasis is on the development of the database component of modern information systems and on the management of the data resource. Thus, the text is practical, supports projects and other hands-on class activities, and encourages linking database concepts to concepts being learned throughout the curriculum the student is taking.
SUMMARY OF ENHANCEMENTS TO EACH CHAPTER
The following sections present a chapter-by-chapter description of the major changes in this edition. Each chapter description presents a statement of the purpose of that chapter, followed by a description of the changes and revisions that have been made for the 13th ed ition. Each paragraph concludes with a description of the strengths that have been retained from prior editions.
PART I: THE CONTEXT OF DATABASE MANAGEMENT
Chapter 1: The Database Environment and Development Process This chapter d iscusses the role of databases in organizations and previews the major topics in the remainder of the text. The primary change to this chapter has been the introduction of a new integrated data management framework (Figure 1-5) and sup- porting text accompanying it. This framework recognizes the increasing importance of the infonnational systems in addition to the trad itional focus of this book on transac- tional systems. After presenting a brief introduction to the basic terminology associated with storing and retrieving data, the chapter presents a well-organized comparison of traditional file processing systems and modern database technology. The chapter then introduces the core components of a database environment. It then goes on to explain the process of database development in the context of structured life cycle, prototyp- ing, and agile methodologies. The chapter also discusses important issues in data- base development, including management of the d iverse group of people involved in database development and frameworks for understanding database architectures and technologies (e.g., the three-schema architecture). Reviewers frequently note the compatibility of this chapter with what students learn in systems analysis and design classes. A brief history of the evolution of database technology, from pre-database files to modern object-relational technologies, is presented. The chapter also provides
Preface xxvi i
xxvii i Preface
an overview of the range of database applications that are currently in use within organizations- personal, multi-tier, and enterprise applications. The explanation of enterprise databases includes databases that are part of enterprise resource planning systems and data warehouses. The chapter concludes with a description of the process of developing a database in a fictitious company, Pine Valley Furniture. This descrip- tion closely mirrors the steps in database development described earlier in the chapter. The first chapter provides an introd uction to the FAME case, which then continues through the book until Chapter 8.
PART II: DATABASE ANALYSIS AND LOGICAL DESIGN
Chapter 2: Modeling Data in the Organization This chapter presents a thorough introd uction to conceptual data modeling with the entity-relationship (E-R) model. The chapter title emphasizes the reason for the E-R model: to unambiguously document the rules of the b usiness that influence database design. Specific subsections explain in detail how to name and define elements of a data model, which are essential in developing an unambiguous E-R diagram. The chapter continues to proceed from simple to more complex examples, and it concludes with a comprehensive E-R d iagram for the Pine Valley Furniture Company. In the 13th ed ition, we have provided six new problems and exercises; these new exercises present some more modern situations, such as Internet of Things applications for databases. A variety of other problems and exercises as well as review questions have been changed to emphasize important topics of the chapter. Appendix A provides information on dif- ferent data modeling tools and notations.
Chapter 3: The Enhanced E-R Model This chapter presents a discussion of several advanced E-R data model constructs, pri- marily supertype/subtype relationships. As in Chapter 2, problems and exercises have been revised, with three new exercises and several building on or extending the new exer- cises from Chapter 2. The third part of the new FAME case is presented in this chapter. The chapter continues to present thorough coverage of supertype/subtype relationships and includes a comprehensive example of an extended E-R data model for the Pine Valley Furniture Company.
Chapter 4: Logical Database Design and the Relational Model This chapter describes the process of converting a conceptual data model to the relational data model, as well as how to merge new relations into an existing normalized database. It provides a conceptually sound and practically relevant introd uction to normalization, emphasizing the importance of the use of functional dependencies and determinants as the basis for normalization. Concepts of normalization and normal forms are extended in Append ix B. The chapter features a discussion of the characteristics of foreign keys and introd uces the important concept of a nonintelligent enterprise key. Enterprise keys (also called surrogate keys for data warehouses) are emphasized as some concepts of object- orientation have migrated into the relational technology world . New problems and exer- cises are included that d raw upon the new problems and exercises from Chapters 2 and 3 for relational modeling and normalization. The chapter continues to emphasize the basic concepts of the relational data model and the role of the database designer in the logical design process.
PART Ill: DATABASE IMPLEMENTATION AND USE
Chapter 5: Introduction to SQL This chapter (Chapter 6 in 12th edition) presents a thorough introd uction to the SQL used by most DBMSs (SQL:1999) and introduces the changes that are included in the latest standards (SQL: 2011 and SQL:2016). This edition adds coverage of the new features of SQL:2016, including row pattern recognition, JSON support, and extended analytical
capabilities. The new edition also clarifies coverage of SQL data types and, overall, makes it easier to move from relational design in Chapter 4 directly to database implementation without the material on physical database design (now in Chapter 8). The coverage of SQL is extensive and divided between this chapter and Chapter 6. This chapter includes exam- ples of SQL code, using mostly SQL:1999 and SQL:2016 syntax, as well as some Oracle 12c and Microsoft SQL Server syntax. Some unique features of MySQL are mentioned. In this edition, coverage of views has been moved to Chapter 6. Chapter 5 explains the SQL commands needed to create and maintain a database and to program single-table queries. Five review questions and 13 problems and exercises have been added to the chapter or modified extensively. The chapter continues to use the Pine Valley Furniture Company case to illustrate a wide variety of practical queries and query results.
Chapter 6: Advanced SQL This chapter (Chapter 7 in 12th edition) continues the description of SQL, with a care- h tl explanation of mttltiple-table queries, transaction integrity, data dictionaries, dynamic and materialized views, triggers and stored proced ures (the differences between them are now more clearly explained), and embedding SQL in other programming language programs. All forms of the OUTER JOIN command are covered. Standard SQL (with an updated focus on SQL:2016) is also used. The revised version of the chapter includes now thorough coverage of views and the purposes for which they are used, including their role in enabling security and privacy solutions. This chapter illustrates how to store the results of a query in a derived table, the CAST command to convert data between different data types, and the CASE command for doing conditional processing in SQL. Emphasis continues on the set-processing style of SQL compared with the record processing of pro- gramming languages with which the student may be familiar. The section on routines has been revised to provide clarified, expanded, and more current coverage of this topic. The material of transaction integrity, has, however been moved to Chapter 7, where it most naturally belongs. The chapter continues to contain a clear explanation of subqueries and correlated subqueries, two of the most complex and powerhtl constructs in SQL. At the end, the chapter discusses material that is new to this chapter: data dictionary facilities (in practice, using SQL to understand the structure of the database) and recent extensions and enhancements to SQL. Chapter review material has been updated with 13 new problems and exercises and three new review questions.
Chapter 7: Databases in Applications This chapter (Chapter 8 in 12th edition) provides a modern discussion of the concepts of client/server architecture and applications, middleware, and da tabase access in contemporary database environments. The chapter has been structurally significantly modified to p rovide additional clarity, including the integration of material on a two- tiered architecture into the section on three-tiered architecture. In addition to a revised example of writing a Java web application, there is an entire new section- including an extensive and detailed example-on writing Web applications with Python, a widely used general purpose p rogramming language that has become very popular in ana- lytics. Sections on transaction integrity, concurrent access, and application level data security have been revised and moved to this chapter to provide add itional conceptual clarity. Material on cloud computing has been moved to Chapter 8 on database infra- structure. Review questions and problems and exercises have been updated.
Chapter 8: Physical Database Design and Database Infrastructure This chapter (Chapter 5 in the 12th edition) describes the steps that are essential in achiev- ing an efficient database design, with a strong focus on those aspects of database design and implementation that are typically within the control of a database professional in a modern database environment. In addition, several new topics on database infrastruc- ture have been integrated into this chapter to improve the structural clarity of the book, including data dictionaries and repositories, general database software security features, and database backup and recovery. A revised and extended section on cloud-based database irtfrastructure completes the chapter. Overall, the chapter emphasizes ways to
Preface xx1x
xxx Preface
improve database performance, with references to specific techniques available in Oracle and other DBMSs to achieve this goal. The discussion of indexes includes descriptions of the types of indexes that are widely available in database technologies as techniques to improve query processing speed. Appendix C provides excellent background on funda- mental data structures for programs of study that need coverage of this topic. The chapter continues to emphasize the physical design process and the goals of that process. Review questions and problems and exercises have been updated and extended based on the new structure and content of the chapter.
PART IV: ADVANCED DATABASE TOPICS
Chapter 9: Data Warehousing and Data Integration This chapter describes the basic concepts of data warehousing, the reasons data ware- housing is regarded as critical to competitive advantage in many organizations, and the database design activities and structures unique to data warehousing. The most important change of this chapter is the integration of material on data integration (formerly in Chapter 10 in the 12th ed ition) into it. This change strengthens the read- ers' ability to understand the essential role of data integration in data warehousing (particularly in ETL and other aspects of data preparation), and it clarifies the struc- ture of the book. Topics covered in this chapter include alternative data warehouse architectures and the dimensional data model (or star schema) for data warehouses. In this ed ition, additional attention is given to cloud-based implementation of data warehouses. Throughout the chapter, several details have been updated to ensure technical correctness. Operational data store and independent, dependent, and logi- cal data marts are defined. The chapter includes multiple new and revised review questions and problems and exercises.
Chapter 10: Big Data Technologies This chapter incorporates big data infrastructure material from Chapter 11 in the 12th edition, significantly expanding it and making it more directly applicable with sub- stantial detailed descriptive examples of MongoDB (the most popular NoSQL data- base) and Pig (scripting language and task a utomation environment for Hadoop) and Hive (an SQL-like declarative language for querying data stored in Hadoop). This new version of the material gives the students a much more p ractical, hand s-on sense of the p urposes for which these well-known tools can be used and how they can serve the goals of b ig data management. The chapter also includes several new prob lems and exercises based on these environments. Overall, the chapter helps the readers understand how big data technologies have expanded the possibilities for analytics-driven innovation through advanced informational systems that are pushing boundaries further in terms of volume, velocity, and variety of data wh ile paying continuous attention to value and veracity of big data .
Chapter 11 : Analytics and its Implications Chapter 11 offers integrated coverage of analytics, including descriptive, p redictive, and prescriptive analytics. It is based on material on analytics in the b ig data and analytics chapter in the 12th edition, expand ing it with comprehensive new sections on R, Python, and Apache Spark and bringing in material on analytical functions in SQL. The d iscussion on analytics is linked not only to the coverage of big data b ut also the material on data warehousing in Chapter 9 and the general d iscussion on data management in Chapter 1 (as indicated in the new framework in Chapter 1). The chapter also covers approaches and technologies used by analytics profession- als, such as on-line analytical p rocessing, data visualization, business performance management and dashboards, d ata mining, and text m ining. Finally, the chapter integrates the coverage of big data and analytics technologies to the individual, orga- nizational, and societal implications of these capabilities. Review q uestions on the new material have been added.
Chapter 12: Data and Database Administration with Focus on Data Quality This chapter p resen ts a thorough d iscussion of the importance and roles of d ata and da tabase ad ministration and describes a number of the key issues that ar ise when these functions are performed. This chapter emphasizes the changing roles and approaches of data and database administration, with a renewed and strength- ened emphasis on data quali ty. The chapter both d iscusses essential characteristics of high-quality data and the mechanisms that organizations need to p ut in place to enable data quality improvement. Data governance, data availability, and master data management are also covered. The chapter continues to emphasize the criti- cal importance of data and database management in managing data as a corporate asset.
Chapter 13: Distributed Databases This chapter- available on the book's Web site- reviews the role, technologies, and unique database design opportunities of d istrib uted databases. The objectives and trade-offs for distrib uted databases, data replication alternatives, factors in selecting a data distribution strategy, and distrib uted database vendors and products are covered. This chapter provides thorough coverage of database concurrency access controls. Many reviewers have ind icated that they are seldom able to cover this chapter in an introductory course, but having the material available is critical for advanced students or special topics.
Chapter 14: Object-Oriented Data Modeling This chapter presents an introduction to object-oriented modeling using Object Management Group's Unified Modeling l anguage (UML). This chapter has been care- fully reviewed to ensure consistency with the latest UML notation and best ind ustry practices. UML p rovides an industry-standard notation for representing classes and objects. The chapter continues to emphasize basic object-oriented concepts, such as inheritance, encapsulation, composition, and polymorphism. As with Chapter 13, Chapter 14 is available on the textbook's Web site.
APPENDICES
In the 13th edition three appendices are available on the book's Web site and are intended for those who wish to explore certain topics in greater depth.
Appendix A: Data Modeling Tools and Notation This append ix add resses a need raised by many readers- how to translate the E-R notation in the text into the form used by the CASE tool or the DBMS used in class. Specifically, this appendix compares the notations of CA ERwin Data Modeler r9.7, Oracle SQL Data Modeler 4.2, SAP Sybase PowerDesigner 16.6, and Microsoft Visio Professional 2016. Tables and illustrations show the notations used for the same constructs in each of these popular software packages.
Appendix B: Advanced Normal Forms This appendix presents a description (with examples) of Boyce-Codd and fourth normal forms, including an example of BCNF to show how to handle overlapping candidate keys. Other normal forms are briefly introduced . The Web Resources section includes a reference for information on many advanced normal form topics.
Appendix C: Data Structures This appendix describes several data structures that often underlie database imple- mentations. Topics include the use of pointers, stacks, queues, sorted lists, inverted lists, and trees.
Preface xxxi
xxxii Preface
PEDAGOGY
A number of additions and improvements have been made to end-of-chapter materials to provide a wider and richer range of choices for the user. The most important of these improvements are the following:
1. Revietv Questions Questions have been updated to support new and enhanced chapter material.
2. Proble1ns and Exercises This section has been reviewed in every chapter, and many chapters contain new problems and exercises to support updated chapter material. Of special interest are questions in many chapters that give students opportunities to use the data sets provided for the text. Problems and exercises are presented in roughly increasing order of difficulty, which should help instructors and students find exercises appropriate for what they want to accomplish.
3. Field Exercises This section provides a set of "hands-on" mini-cases that can be assigned to individual students or to small teams of students. Field exercises range from directed field trips to Internet searches and other types of research exercises.
4. Case The 13th edition of this book includes the same mini-case that was introduced in the 12th edition: Forondo Artist Management Excellence Inc. (FAME). In the first three chapters, the case begins with a description provided in the "voice" of one or more stakeholders, revealing a new dimension of requirements to the reader. Each chapter has project assignments intended to provide guidance on the types of deliv- erables instructors could expect from students, some of which tie together issues and activities across chapters. These project assignments can be completed by individual students or by small project teams. This case provides an excellent means for stu- dents to gain hands-on experience with the concepts and tools they have studied. The instructor's manual will include new materials to support the use of the case.
5. Web Resources Each chapter contains a list of updated and validated URLs for Web sites that contain information that supplements the chapter. These Web sites cover online publication archives, vendors, electronic publications, ind ustry stan- dards organizations, and many other sources. These sites allow students and instructors to find updated product information, innovations that have appeared since the printing of the book, background information to explore topics in greater depth, and resources for writing research papers.
We continue to provide several pedagogical features that help make the 13th edi- tion widely accessible to instructors and students. These features include the following:
1. Lean1i11g objectives appear at the beginning of each chapter, as a preview of the major concepts and skills students will learn from that chapter. The learning objectives- carefully updated to be aligned with the new chapter stn1cture-also provide a great study review aid for students as they prepare for assignments and examinations.
2. Chapter introductions and s111111naries both encapsulate the main concepts of each chapter and link material to related chapters, providing students with a compre- hensive conceptual framework for the course.
3. TI1e chapter review includes the Review Questions, Problems and Exercises, and Field Exercises discussed earlier and also contains a Key Terms list to test the stu- dent's grasp of important concepts, basic facts, and significant issues.
4. A running glossary defines key terms in the page margins as they are discussed in the text. These terms are also defined at the end of the text, in the Glossary of Terms. Also included is the end-of-book Glossary of Acronyms for abbreviations commonly used in database management.
ORGANIZATION
We encourage instructors to customize their use of this book to meet the needs of both their curriculum and student career paths. The mod ular nature of the text, its broad cov- erage, its extensive illustrations, and its inclusion of advanced topics and emerging issues make customization easy. The many references to current publications and Web sites
can help instructors develop supplemental reading lists or expand classroom discussion beyond material presented in the text. The use of appendices for several advanced topics allows instructors to easily include or omit these topics.
The modular nature of the text allows the instructor to omit certain chapters or to cover chapters in a different sequence. For example, an instructor who wishes to emphasize data modeling may cover Chapter 14 (available on the book's Web site) on object-oriented data modeling along with or instead of Chapters 2 and 3. An instructor who wishes to cover only basic entity-relationship concepts (but not the enhanced E-R model) may skip Chapter 3 or cover it after Chapter 4 on the relational model.
We have contacted many adopters of Modern Database Manage1nent and asked them to share with us their syllabi. Most adopters cover the chapters in sequence, but several alternative sequences have also been successh1l. These alternatives include the following:
• Some instructors cover Chapter 12 on data and database administration immediately after Chapter 8 on physical database design and the relational model.
• To introd uce SQL as early as possible, many instn1ctors have effectively covered 12th edition Chapters 6 and 7 (SQL) immediately after Chapter 4; therefore, we have now placed them as Chapters 5 and 6. Some have even covered the new Chapter 5 immediately after Chapter 1, which the book makes possible.
• Many instructors have students read appendices along with chapters, such as reading Appendix on data modeling notations with Chapter 2 or Chapter 3 on E-R modeling, Appendix B on advanced normal forms with Chapter 4 on the relational model, and Appendix Con data structures with Chapter 8.
THE SUPPLEMENT PACKAGE: WWW.PEARSONHIGHERED.COM/HOFFER
A comprehensive and flexible technology support package is available to enhance the teaching and learning experience. All instructor and student supplements are available on the text Web site: www.pearsonhighered.com/hoffer.
For Students The following online resources are available to students:
• Coniplete d·zapters on distributed databases and object-oriented data 1nodeling as well as appendices focusing on data modeling notations, advanced normal forms, and data structures allow you to learn in depth about topics that are not covered in the textbook.
• Accon1panying databases are also provided. Two versions of the Pine Valley Furniture Company case have been created and populated for the 13th edition. One version is scoped to match the textbook examples. A second version is fleshed out with more data and tables. This version is not complete, however, so that students can create missing tables and additional forms, reports, and modules. Databases are provided in several formats (ASCII tables, Oracle script, and Microsoft Access), but formats vary for the two versions. Some documentation of the databases is also provided. Both versions of the PVFC database are also provided on Teradata University Network.
• $(!l)eral custo1n-d(!l)e/oped short videos that address key concepts and skills frorn different sections of the book help students learn material that may be more difficult to under- stand by using both the printed text and a mini lecture.
For Instructors The following online resources are available to instructors:
• The Instructor's Resource Manual by Heikki Topi, Bentley University, provides chapter-by-chapter instructor objectives, classroom ideas, and answers to Review Questions, Problems and Exercises, Field Exercises, and Project Case Questions.
Preface xxxiii
xxxiv Preface
The Instructor's Resource Manual is available for download on the instructor area of the text's Web site.
• The Test Bank and TestGen, by John Russo, Wentworth Institute of Technology, includes a comprehensive set of test questions in multiple-choice, true/false, and short-answer format, ranked according to level of difficulty and referenced with page numbers and topic headings from the text. The Test Bank is available in Microsoft Word and as the computerized TestGen. TestGen is a comprehensive suite of tools for testing and assessment. It allows instructors to easily create and distrib- ute tests for their courses, either by printing and distributing through traditional methods or by online delivery via a local area network (LAN) server. Test Manager features Screen Wizards to assist you as you move through the program, and the software is backed with full technical support.
• PowerPoint presentation slides, by Michel Mitri, James Madison University, feature lecture notes that highlight key terms and concepts. Instructors can customize the presentation by adding their own slides or editing existing ones.
• The linage Libran; is a collection of the text art organized by chapter. It includes all figures, tables, and screenshots (as permission allows) and can be used to enhance class lectures and PowerPoint slides.
• Accompanying databases are also provided. Two versions of the Pine Valley Furniture Company case have been created and populated for the 13th edition. One version is scoped to match the textbook examples. A second version is fleshed out with more data and tables. This version is not complete, however, so that students can create missing tables and additional forms, reports, and modules. Databases are provided in several formats (ASCII tables, Oracle script, and Microsoft Access), but formats vary for the two versions. Some documentation of the databases is also provided. Both versions of the PVFC database are also available on Teradata University Network.
VITALSOURCE eTEXTBOOK
VitalSource eTextbooks were developed for students looking to save on required or recommended textbooks. Students simply select their eText by title or author and pur- chase immediate access to the content for the duration of the course using any major credit card . With a VitalSource eText, students can search for specific key words or page numbers, take notes online, p rint out reading assignments that incorporate lecture notes, and bookmark important passages for later review. For more information or to purchase a Vita!Source eTextbook, visit www.vitalsource.com.
ACKNOWLEDGMENTS
We are grateful to numerous individuals who contributed to the p reparation of Modern Database Managen1en.t, 13th ed ition. First, we w ish to thank our reviewers for their detailed suggestions and insigh ts, characteristic of their thoughtful teaching style. As always, analysis of topics and depth of coverage provided by the reviewers were crucial. Our reviewers and others who gave us many useful comments to improve the text include Tamara Babaian, Bentley University; Subhajyoti Bandyopadhyay, University of Florida; Gary Baram, Temple University; Bijoy Bordoloi, Southern Illinois University, Edwardsville; Timothy Bridges, University of Central Oklahoma; Traci Carte, University of Oklahoma; Laurie Crawford, Franklin University; Wingyan Chung, Santa Clara University; Jagdish Gangolly, State University of New York at Albany; Jon Gant, Syracuse University; Jinzh u Gao, University of the Pacific; Monica Garfield, Bentley University; Rick Gibson, American University; Joy Godin, Georgia College & State University; Jian Guan, University of Louisville; Chengqi Guo, James Madison University; Connie Hecker, Missouri Western State University; William H. Hochstettler Ill, Franklin University; Dinakar Jayarajan, Illinois Institute of Technology; Michael Johnson, Christopher Newport University; Weiling Ke, Clarkson University; Dongwon Lee, Pennsylvania State University; Ingyu Lee, Troy University; Linda
LeSage, Davenport University; Chang-Yang Lin, Eastern Kentucky University; Brian Mennecke, Iowa State University; Kazuo Nakatani, Florida Gulf Coast University; Oat-Dao Nguyen, California State University, Northridge; Fred Niederman, Saint Louis University; Selwyn Piramuthu, University of Florida; Lara Preiser-Houy, California State Polytechnic University, Pomona; John Russo, Wentworth Institute of Technology; Becky Rutherfoord, Kennesaw State University; Ioulia Rytikova, George Mason University; Richard Segall, Arkansas State University; Sharlene Smith, Gaston College; John Snyder, Colorado Mesa University; Josephine Stanley-Brown, Norfolk State University; Chelley Vician, University of St. Thomas; Ruth Weldon, University of St. Francis; and Daniel S. Weaver, Messiah College; Zuopeng Zhang, State University of New York Plattsburgh; Dana Zhu, Iowa State University; Songhua Zu, New Jersey Institute of Technology.
We received excellent input from experts in industry, including Steve Williams (President, DecisionPath Consulting), Tom Victory (DecisionPath Consulting), Todd Walter, Carrie Ballinger, Rob Armstrong, and David Schoeff (all of Teradata Corp); Chad Gronbach and Philip DesAutels (Microsoft Corp.); Peter Gauvin (Ball Aerospace); and Michael Alexander (Open Access Technology, International).
We are very thankful to Ge Yan, Indiana University, for his contributions to some of the technical material in Chapter 7. We also want to thank Heikki Topi, Bentley University, for his role as author of the Instructor's Resource Man.uni. In addition to his d uties as author, Heikki took on this additional task and has been diligent in preparing the Instructor's Resource Manual; in the process he has helped us clarify and fix vari- ous parts of the text. We also want to recognize the important role played by Chelley Vician of the University of St. Thomas, the author of several previous editions of the Instructor's Resource Manual; her work added great value to this book. We also thank Sven Aelterman, Troy University, for his many excellent suggestions for improvements and clarifications throughout the text.
We are also grateful to the staff and associates of Pearson for their support and guidance throughout this project. In particular, we wish to thank Senior Portfolio Manager Samantha Lewis for her support through this revision process; Program Monitor Danica Monzor (SPi Global), and Associate Project Manager Neha Bhargava (Cenveo), who kept us on track and made sure everything was complete; and Associate Content Producer Stephany Harrington.
While finalizing this edition of Modern Database Manngernent, we pause to remem- ber with deep gratitude the contributions of Dr. Fred McFadden and Dr. Mary Prescott, coauthors of previous editions of this text. Fred and Mary are not with us anymore, but their contributions to MDBM, both content and spirit, continue to be directly and indirectly included in this book.
Finally, we give immeasurable thanks to our spouses, who endured many eve- nings and weekends of solitude for the thrill of seeing a book cover hang on a den wall. In particular, we marvel at the commitment of Patty Hoffer, who has lived the lonely life of a textbook author's spouse through 13 editions over more than 35 years of late-night and weekend writing. We also want to sincerely thank Anne-Louise Klaus for being willing to continue her wholehearted support for Heikki's involve- ment in the project. Although the book project was no longer new for Gayathri Mani, her continued support and understanding are very much appreciated. Much of the value of this text is due to their patience, encouragement, and love, b ut we alone bear the responsibility for any errors or omissions between the covers.
Jeffrey A. Hoffer
V. Ramesh
Heikki Topi
Preface xxxv
PARTI
The Context of Database Manage111ent
AN OVERVIEW OF PART I
In this chapter and opening part of the book, we set the context and provide basic database concepts and definitions used throughout t he text. In this part, you w ill understand database management as an excit ing, challenging, and growing f ield that provides numerous career opportunit ies for information systems students. Databases cont inue to become a more common part of everyday living and a more central component of business operat ions. From t he database t hat stores contact information in your smartphone or tablet to the very large databases that support ent erprise-wide information systems and provide important insights for organizational decision makers, databases have become the central points of data storage that were originally envisioned decades ago. Customer relationship management and Internet shopping are examples of two database-dependent activities that have developed in recent years. The development of data warehouses and "big data" repositories that provide managers the opportunity for deeper and broader historical analysis of data and specific guidance for f uture actions also continues to take on more importance.
We begin by providing basic definit ions of data, database, metadata, database management system, data warehouse, and other terms associated w it h this environment. We compare databases with the older fi le management systems they replaced and describe several important advantages t hat are enabled by the carefully planned use of databases. You w ill see a framework that provides an integrated perspective to bot h transactional and analytic use of various data management technologies to be used throughout this book and in your career.
The chapter also describes the general steps followed in the analysis, design, implementation, and administration of databases. Further, t his chapter also illustrates how the database development process fits into the overall information systems development process. Database development for both structured life cycle and prototyping methodologies is explained. We introduce enterprise data modeling, w hich sets the range and general contents of organizational databases. This is often the f irst step in database development. You w ill learn about the concept of schemas and the three-schema architecture, which is the dominant approach in modern database systems. We describe the major components of t he database environment and the types of applications as wel l as multi-t ier and ent erprise databases. Enterprise databases include those that are used to support ent erprise resource planning syst ems and data
Chapter 1 The Database Env ironment and Dev elopment Process
1
2 Part I • The Context of Database Managen1ent
warehouses. Finally, we describe the ro les in w hich you might be involved as part of a database development project. The Pine Val ley Furniture Company case is introduced and used to illustrate many of the principles and concepts of database management. This case is used t hroughout t he text as a continuing example of the use of database management systems.
The Database Environm.ent and Developm.ent Process LEARNING OBJECTIVES After studying this chapter, you should be able to:
a Concisely define each of the following key terms: database, data, information, metadata, database application, data model, entity, relational database, database management system (DBMS), data independence, user view, constrai nt, data mod eling and d esign tools, repository, enterprise data modeling, systems development life cycle (SDLC), conceptual schema, logical schema, physical schema, prototyping, agile software development, project, enterprise resource planning (ERP) system, data warehouse, and data lake.
a Name several limitations of conventional file processing systems.
a Explain at least 10 advantages of the database approach compared to trad itional file processing.
a Identify several costs and risks of the database approach.
a Distinguish between operational (transactional) and informational (data warehousing and big data) data management approaches and related technologies.
a List and briefly describe nine components of a typical database environment.
a Identify four categories of applications that use databases and their key characteristics.
a Describe the life cycle of a systems development project, with an emphasis on the purpose of database analysis, design, and implementation activities.
a Explain the prototyping and agile-d evelopment approaches to database and application development.
a Explain the roles of individ uals who design, implement, use, and administer databases.
a Explain the differences between personal, m ulti-tiered, and enterprise-level data management solutions.
a Explain the differences among external, conceptual, and internal schemas and the reasons for the use of a three-schema architecture for databases.
Visit www.pearsonhighered.com/ hotter t o view the accompanying video for th is chapter.
3
4 Part I • The Context of Database Managen1ent
DATA MATIER!
The amount of data being generated, stored, and processed is growing by leaps and bounds. According to a McKinsey Global Institute Report (Manyika et al., 2011), it is estimated that in 2010 alone, global enterprises stored more than 7 exabytes of data (an exabyte is a bill ion gigabytes), w hile consumers stored more t han 6 exabytes of new data on devices such as personal computers, smartphones, tablets, and notebooks. That is a lot of data! As more and more of the world becomes digital and the products we use every day, such as watches, refrigerators, and so forth, become smarter, the amount of data t hat needs to be generated, stored, and processed w ill only continue to grow.
The availability of al l of these data is also opening up unparalleled opportunities for companies to leverage data for various purposes. A recent study by IBM (IBM, 2011) shows t hat one of the top priorit ies for CE Os in the coming years is t he ability to use insights and intel ligence t hat can be gleaned from data for competitive advantage. The McKinsey Global Institute Report (Manyika et al., 2011) estimates that by appropriately leveraging the data available to t hem, the U.S. retai l industry can see up to a 60 percent increase in net margin, and manufacturing can realize up to a SO percent reduction in product development costs.
The availability of large amounts of data is also fueling innovation in companies and allowing them to think different ly and creatively about various aspects of t heir businesses. Below you wil l find some examples from a variety of domains:
1. The Memorial Sloan-Kettering Cancer center is using IBM Watson (do you remember Watson beating Ken Jennings in Jeopardy?) to help ana- lyze the information from medical literature, research, past case histories, and best practices to help provide oncologists w ith evidence-based recom- mendations ( www-93 5. i bm .com/se rvices/ multi media/MS K_ Case_Study _ IM C14794.pdf) .
2. Cont inental Airlines (now United) invested in a real-time business intel li- gence capability and was able to dramatically improve its customer service and operations. For example, it can now track w hether a high-value customer is experiencing a delay in a trip, w here and when the customer will arrive at the airport, and the gate the customer must go to make t he next connection (Anderson-Lehman et al., 2004).
3. A leading fast-food chain uses video information from its fast-food lane to determine what food products to display on its (digital) menu board. If the lines are long, t he menu displays items that can be served quickly. If the lines are short, the menu displays higher-margin but slower-to-prepare items (Laskowski, 2014).
4. Nagoya Railroad analyzes data about its customers' travel habits along with their shopping and dining habits to better understand its customers. For example, it was able to ident ify that young women who used a particular train station for their commute also tended to eat at a particular type of res- taurant and buy from certain types of stores. This information allows Nagoya Rai lroad to create a targeted marketing campaign (http://public.dhe.ibm .com/common/ssi/ecm/en/ytc03707usen/ YTC03707USEN.PDF).
5. Kroger, a fast-growing grocery store chain, was able to increase the return rates of its direct mail campaigns to a very high level of 70 percent by per- sonalizing the offers based on the data the company had col lected regarding their customer's purchasing behavior (Groenfeldt, 2013).
At the heart of all the above examples is the ability to col lect, organize, and manage data. This is precisely t he focus of this textbook. This understanding w ill give you t he power to support any business strategy and the deep satisfaction that comes f rom knowing how to organize data so that f inancial, marketing, or customer service questions can be answered almost as soon as t hey are asked. Enjoy!
1 • The Database Environmen t and Developn1ent Process 5
INTRODUCTION
Over t he past two decades, data have become strategic assets for most organizations. Databases store, manipulate, and retrieve data in nearly every type of organization, including business, health care, education, government, libraries, and many scientific f ields. Individuals w ith various personal devices and employees using enterprise- wide distributed applications depend on database technology. Customers and other remote users access databases through diverse technologies, such as automated teller machines, Web browsers, smartphones, and intelligent living and office environments. Most Web-based applications depend on a database foundation.
Following this period of rapid growth, will t he demand for databases and database technology level off? Very likely not! In the highly competitive environment of today, there is every indication that database technology w ill assume even greater importance. Managers seek to use knowledge derived f rom databases for competitive advantage. For example, detailed sales databases can be mined to determine customer buying patterns as a basis for advertising and marketing campaigns. Organizations embed procedures called alerts in databases to warn of unusual condit ions, such as impending stock shortages or opportunities to sell additional products, and to trigger appropriate actions. Analytics in its various forms- including big data analytics- depends on databases and other data management technologies.
Although t he future of databases is assured, much work remains to be done. Many organizations have a prol iferation of incompatible databases that were developed to meet immediate needs rather than based on a planned strategy or a well-managed evolution. Enormous amounts of data are trapped in older, "legacy" systems, and the data are often of poor quality. New skills are required to design and manage data warehouses and other repositories of data and to fully leverage all the data that are being captured in the organization. There is a shortage of skills in areas such as database analysis, database design, database applicat ion development, and business analytics. You w ill learn about these and other important issues in this textbook to equip you for the jobs of the future.
A course in database management has emerged as one of the most important courses in the information systems curriculum today. Further, many schools have added addit ional elective courses in data warehousing, data mining, and other aspects of business analytics to provide in-depth coverage of t hese important topics. As information systems professionals, you must be prepared to analyze database requirements and design and to implement databases within the context of information systems development. You also must be prepared to consult with end users and show t hem how t hey can use databases (or data warehouses) to build decision models and systems for competitive advantage. The widespread use of databases attached to Web sites t hat return dynamic information to users of these sites requires that you understand not only how to link databases to the Web- based applications but also how to secure those databases so that their contents can be viewed but not compromised by outside users.
In this chapter, you w ill learn about the basic concepts of databases and database management systems (DBMSs). You wi ll review traditional file management systems and some of their shortcomings t hat led to the database approach. Next, you w ill consider the benefits, costs, and risks of using the database approach. We review the range of technologies used to build, use, and manage databases; describe the types of applicat ions that use databases (personal, multi-t ier, and enterprise); and describe how databases have evolved over the past f ive decades. The chapter also presents a framework that w ill help you understand traditional and emerging data management approaches and technologies in a joint context.
Because a database is one part of an information system, this chapter also examines how the database development process f its into the overall information systems development process. The chapter emphasizes the need to coordinate database development with all the other activit ies in the development of a complete
6 Part I • The Context of Database Managen1ent
information system. It includes highlights from a hypothetical database development process at Pine Valley Furniture Company. Using t his example, the chapter introduces tools for developing databases on personal computers and t he process of extracting data f rom enterprise databases for use in stand-alone applications.
There are several reasons for discussing database development at this point. First, alt hough you may have used the basic capabilities of a database management system, such as Microsoft Access, you may not yet have developed an understanding of how these databases were developed. Using simple examples, this chapter briefly illustrates what you will be able to do after you complete a database course using t his text. Thus, t his chapter helps you develop a vision and context for each topic developed in detail in subsequent chapters.
Second, many students learn best from a text full of concrete examples. Although all of t he chapters in this text contain numerous examples, i llustrations, and actual database designs and code, each chapter concentrates on a specific aspect of database management. This chapter wi ll help you understand, with minimal technical details, how all of t hese individual aspects of database management are related and how database development tasks and ski lls relate to what you are learning in other information systems courses.
Finally, many instructors want you to begin the initial steps of a database development group or individual project early in your database course. This chapter gives you an idea of how to structure a database development project sufficient to begin a course exercise. Obviously, because this is only the f irst chapter, many of the examples and notations you w ill encounter in this chapter are much simpler than those required for your project, for ot her course assignment s, or in a real organization.
One note of caut ion: You will not learn how to design or develop databases just f rom this chapter. Sorry! You wi ll discover that t he content of this chapter is introductory and simplif ied. Many of t he notations used in this chapter are not exactly like the ones you wi ll learn in subsequent chapters. Our purpose in this chapter is to give you a general understanding of the key steps and types of skills, not to teach you specific techniques. You w ill, however, learn fundamental concepts and definitions and develop an intuition and mot ivation for t he skills and knowledge presented in later chapters.
BASIC CONCEPTS AND DEFINITIONS
Database A database is an organized collection of logically related data . Not many words in the An organized collection of logically definition, but have you looked at the size of this book? There is a lot to do to fulfill this related data. definition .
A database may be of any size and complexity. For example, a salesperson may maintain a small database of customer contacts-consisting of a few megabytes of data-on her lap top computer. A large corporation may build a large d atabase con- sisting of several terabytes of data (a terabyte is a trillion bytes) on a large mainframe computer that is used for decision support applications. Very large data warehouses contain more than a petabyte of d ata. (A petabyte is a quadrillion bytes.) The assumption throughout the text is that all databases are computer based.
Data
Historically, the term data referred to facts concerning objects and events that could be recorded and stored on computer media. For example, in a salesperson's database, the data would include facts such as customer name, address, and telephone number. This type of data is called structured d ata. The most important structured data types are numeric, character, and d ates. Structured data are stored in tabular form (in tables, rela- tions, arrays, spreadsheets, and so forth) and are most commonly found in traditional databases and data warehouses.
The traditional definition of data now needs to be expanded to reflect a new reality: Databases today are used to store objects sucll as documents, e-mails, tweets, Facebook
1 • The Database Environment and Developn1ent Process 7
posts, GPS information, maps, photographic images, sound, and video segments in addition to structured data. For example, the salesperson's database might include a photo image of the customer contact. It might also include a sound recording or video clip about the most recent prod uct. This type of data is referred to as unstructured data, or as multimedia data. Today, structured and unstructured data are often combined in the same database to create a true multimedia environment. For example, an automo- bile repair shop can combine structured data (describing customers and automobiles) with multimedia data (photo images of the damaged autos and scanned images of insurance claim forms). One of the defining elements of "big data" technologies, which will also be covered later in this book, is that they provide capabilities to deal with highly heterogeneous data.
An expanded definition of data that includes structured and unstructured types is "a stored representation of objects and events that have meaning and importance in the user's environment."
Data versus Information
The terms data and inforrnation are closely related and in fact are often used interchange- ably. However, it is useful to distinguish between data and information. Information is data that have been processed in such a way that the knowledge of the person who uses the data is increased. For example, consider the following list of facts:
Baker, Kenneth D.
Doyle, Joan E.
Finkle, Clive R.
Lewis, John C.
Mcferran, Debra R.
324917628 476193248 548429344 551742186 409723145
These facts satisfy our definition of data, but most people would agree that the data are useless in their present form. Even if you guessed that this is a list of people's names paired with their Social Security numbers, the data remain useless because you would have no idea what the entries mean. Notice what happens when you see the same data in a context, as shown in Figure 1-la.
By adding a few additional data items and providing some structure, you are able to recognize a class roster for a particular course. This is useful information to some users, such as the course instructor and the registrar's office. Of course, as general awareness of the importance of strong data security has increased, few organizations use Social Security numbers as identifiers any longer. Instead, most organizations use an internally generated number for identification purposes.
Another way to convert data into information is to summarize them or otherwise process and present them for human interpretation. For example, Figure 1-lb shows
Class Roster
Course: MGT 500 Business Policy
Semester: Spring 2018
Section: 2
Name ID Major GPA
Baker, Kenneth D. 324917628 MGT 2.9 Doyle, Joan E. 476193248 MKT 3.4 Finkle, Clive R. 548429344 PRM 2.8 Lewis, John C. 551742186 MGT 3.7 Mcferran, Debra R. 409723145 IS 2.9 Sisneros, Michael 392416582 ACCT 3.3
Data
Stored representations of objects and events that have meaning and importance in the user's environment.
Information
Data that have been processed in such a way as to increase the knowledge of the person who uses the data.
FIGURE 1-1 Converting data to information
(a) Data in context
8 Part I • The Context of Database Managen1ent
FIGURE 1·1 (cont inued)
(b) Summarized data
Metadata
Data that describe the properties or characteris tics of end-user data and the context of those data.
IS (15%)
MGT (20%)
ACCT (25%)
Percent Enrollment by Major (2018)
Cl)
c (I) 'O "300
---- = actual •••••••••••• = estimated
••• ••
2013 2014 2015 2016 2017 2018
Year Enrollment Projections
summarized student enrollment data presented as graphical information. This informa- tion could be used as a basis for deciding whether to add new courses or to hire new faculty members.
In practice, according to our definitions, databases today may contain data, information, or both. For example, a database may contain an image of the class roster document shown in Figure 1-la. Also, data are often preprocessed and stored in sum- marized form in databases that are used for decision support. Throughout this text, we use the term database without distinguishing its contents as data or information.
Metadata
As discussed earlier, data become useful only when placed in some context. The primary mechanism for providing context for data is metadata. Metadata are data that describe the properties or characteristics of end-user data and the context of that data. Some of the properties that are typically described include data names, definitions, length (or size), and allowable values. Metadata describing data context include the source of the data, where the data are stored, ownership (or stewardship), and usage. Although it may seem circular, many people think of metadata as "data about data."
Some sample metadata for the Class Roster (Figure 1-la) are listed in Table 1-1. For each data item that appears in the Class Roster, the metadata show the data item name, the data type, length, minimum and maximum allowable values (where appropriate), a brief description of each data item, and the source of the data (sometimes called the systen1 of record). Notice the distinction between data and metadata. Metadata are once removed from data. That is, metadata describe the properties of data but are separate from that data. Thus, the metadata shown in Table 1-1 do not include any sample data from the Class Roster of Figure 1-l a. Metadata enable database designers and users to under- stand what data exist, what the data mean, and how to distinguish between data items
TABLE 1-1 Example Metadata for Class Roster
Data Item Metadata
Name Type Length Min Max Description Source
Course Alphanumeric 30 Course ID and name Academic Un it
Section Integer 1 1 9 Section number Registrar
Semester Alphanumeric 10 Semester and year Registrar Name Alphanumeric 30 Student name Student IS
ID Integer 9 Student ID (SSN) Student IS
Major Alphanumeric 4 Student major Student IS GPA Decimal 3 0.0 4.0 Student grade point average Academic Un it
1 • The Database Environment and Developn1ent Process 9
that at first glance look similar. Managing metadata is at least as crucial as managing the associated data because data without clear meaning can be confusing, misinterpreted, or erroneous. Typically, much of the metadata are stored as part of the database and may be retrieved using the same approaches that are used to retrieve data or information.
Data can be stored in files (think Excel sheets) or in databases. In the following sections, you will learn about the progression from file processing systems to databases and the advantages and disadvantages of each.
TRADITIONAL FILE PROCESSING SYSTEMS
When computer-based data processing was first available, there were no databases. To be useful for business applications, computers had to store, manipulate, and retrieve large files of data. Computer file processing systems were developed for this purpose. Although these systems have evolved over time, their basic structure and purpose have changed little over several decades.
As business applications became more complex, it became evident that traditional file processing systems had a number of shortcomings and limitations (described next). As a result, these systems have been replaced by database processing systems in most business applications today. Nevertheless, you should have at least some familiarity with file processing systems since understanding the problems and limitations inherent in file processing systems can help you avoid these same problems when designing database systems. It should be noted that Excel files, in general, fall into the same category as file systems and suffer from the same drawbacks listed below. Informal use of Excel for management of data is believed to continue to be quite widespread, although valid research results regarding this are difficult to find.
File Processing Systems at Pine Valley Furniture Company
Early computer applications at Pine Valley Furniture used the traditional file processing approach. This approach to information systems design met the data processing needs of individual departments rather than the overall information needs of the organization. The information systems group typically responded to users' requests for new systems by developing (or acquiring) new computer programs for individual applications, such as inventory control, accounts receivable, or human resource management. No overall map, plan, or model guided application growth.
Three of the computer applications based on the file processing approach are shown in Figure 1-2. The systems illustrated are Order Filling, Invoicing, and Payroll.
FIGURE 1-2 Old fi le processing systems at Pine Valley Furniture Company
Orders Department Accounting Department
• PINE i--.---1 VALLEY 1,----'• '---,l FURN ITURE
Payroll Department
Program A Program B Program C Program A Program B Program A Program B
l .--------'--i ~J Order Filling r Sy;em 1
Customer Master
File
Inventory Master
File
Back Order File
Invoicing System
Inventory Pricing
File
Customer Master
File
Payroll System
Employee Master
File
10 Part I • The Context of Database Managen1ent
Database application
An application program (or set of related programs) Iha! is used to perform a series of database activities (create, read, update, and delete) on behalf of database users.
TABLE 1-2 Disadvantages of File Processing Systems
Program-data dependence
Duplication of data
Limited data sharing
Lengthy development times
Excessive program maintenance
The figure also shows the major data files associated with each application. A file is a collection of related records. For example, the Order Filling System has three files: Customer Master, Inventory Master, and Back Order. Notice that there is duplication of some of the files used by the three applications, which is typical of file processing systems.
Disadvantages of File Processing Systems
Several disadvantages associated with conventional file processing systems are listed in Table 1-2 and described briefly next. It is important to understand these issues because if you do not follow the database management practices described in this book, some of these disadvantages can also become issues for databases as well.
PROGRAM-DATA DEPENDENCE File descriptions are stored within each database application program that accesses a given file. For example, in the Invoicing System in Figure 1-2, Program A accesses the Inventory Pricing File and the Customer Master File. Because the program contains a detailed file description for these files, any change to a file structure requires changes to the file descriptions for all programs that access the file.
Notice in Figure 1-2 that the Customer Master File is used in the Order Filling System and the Invoicing System. Suppose it is decided to change the customer address field length in the records in this file from 30 to 40 characters. The file descriptions in each program that is affected (up to five programs) would have to be modified. It is often difficult even to locate all programs affected by such changes. Worse, errors are often introduced when making such changes.
DUPLICATION OF DATA Because applications are often developed independently in file processing systems, unplanned duplicate data files are the rule rather than the exception. For example, in Figure 1-2, the Order Filling System contains an Inventory Master File, whereas the Invoicing System contains an Inventory Pricing File. These files contain data describing Pine Valley Furniture Company's products, such as prod- uct description, unit price, and quantity on hand . This duplication is wasteful because it requires additional storage space and increased effort to keep all files up to date. Data formats may be inconsistent, data values may not agree, or both . Reliable metadata are very difficult to establish in file processing systems. For example, the same data item may have different names in different files, or, conversely, the same name may be used for different data items in different files.
LIMITED DATA SHARING With the traditional file processing approach, each applica- tion has its own private files, and users have little opportunity to share data outside their own applications. Notice in Figure 1-2, for example, that users in the Accounting Department have access to the Invoicing System and its files, but they probably do not have access to the Order Filling System or to the Payroll System and their files. Man- agers often find that a requested report requires a major programming effort because data must be drawn from several incompatible files in separate systems. When different organizational units own these d ifferent files, additional management barriers must be overcome.
LENGTHY DEVELOPMENT TIMES With traditional file processing systems, each new application requires that the developer essentially start from scratch by designing
1 • The Database Environn1ent and Development Process 11
new file formats and descriptions and then writing the file access logic for each new program. The lengthy development times required are inconsistent with tod ay's fast- paced business environment, in which time to market (or time to prod uction for an information system) is a key business success factor.
EXCESSIVE PROGRAM M AINTENANCE The preceding factors all combined to cre- ate a heavy program maintenance load in organizations that relied on traditional file processing systems. In fact, as much as 80 percent of the total information system's development budget might be devoted to program maintenance in such organizations. This, in turn, means that resources (time, people, and money) are not being spent on developing new applications.
As d iscussed above, these d isadvantages are true also in situations when individ- uals and organizational units maintain important organizational data in Excel spread- sheets. Further, it is important to note that many of the disadvantages of file processing you have learned about can also be limitations of databases if an organization does not properly apply the database approach . For example, if an organization develops many separately managed databases (say, one for each division or business function) with little or no coordination of the metadata, uncontrolled data d uplication, limited data sharing, lengthy development time, and excessive program maintenance can occur. Thus, the database approach, which is explained in the next section, is as much a way to manage organizational data as it is a set of technologies for defining, creating, maintain- ing, and using these data.
THE DATABASE APPROACH
So, how do you overcome the flaws of file processing? No, you do not call Ghostbusters, but you can do something better: You should follow the database approach. You will first learn some core concepts that are hmdamental in understand ing the database approach to managing data. You will then discover how the database approach can overcome the limitations of the file processing approach.
Data Models
Designing a database properly is hmdamental to establishing a database that meets the needs of the users. Data models capture the nature of and relationships among data and are used at different levels of abstraction as a database is conceptualized and designed . The effectiveness and efficiency of a database is directly associated with the structure of the database. Various graphical systems exist that convey this structure and are used to produce data models that can be understood by end users, systems analysts, and database designers. Chapters 2 and 3 are devoted to developing your understand- ing of data modeling, as is Chapter 14, on the book's Web site, which addresses a differ- ent approach using object-oriented data modeling. A typical data model is made up of entities, attributes, and relationsh ips, and the most common data modeling representa- tion is the entity-relationship model. A brief description is presented next. More details will be forthcoming in Chapters 2 and 3.
ENTITIES Customers and orders are objects about which a business maintains infor- mation. They are referred to as "entities." An entity is like a noun in that it describes a person, a place, an object, an event, or a concept in the b usiness environment for which information must be recorded and retained. CUSTOMER and ORDER are entities in Figure 1-3a. The data you are interested in capturing about the entity (e.g., Customer Name) is called an attribute. Data are recorded for many customers. Each customer's information is referred to as an in.stance of CUSTOMER.
RELATIONSHIPS A well-structured database establishes the relationships between entities that exist in organizational data so that desired information can be retrieved. Most relationships are one-to-many (1:M) or many-to-many (M:N). A customer can place (the Places relationship) more than one order with a company. However, each
Data model
Graphical systems used to capture the natu re and relationships among data.
Entity
A person, a place, an object, an event, or a concept in the user environment about which the organization wishes to maintain data.
12 Part I • The Context of Database Managen1ent
FIGURE 1-3 Comparison of enterprise- and project-level data models
(a) Segment of an enterprise data model
(b) Segment of a project data model
Relational database
A database that represents data as a collection of tables in w hich all data relationships are represented by common values in related tables.
CUSTOMER
Places
I I\ Is Plac ed By
ORDER
\ I/ Contain s
I I\ Is Cont ained In
PRODUCT
CUSTOMER PRODUCT Customer ID Product ID Customer Name Standard Price
Places Has
I I\ Is Placed By I \Is For
ORDER Order ID Contains / ORDER LINE Customer ID Is Contained I 11' Quantity Order Date
order is usually associated with (the Is Placed By relationship) a particular customer. Figure 1-3a shows the 1:M relationship of customers who may place one or more orders; the 1:M nature of the relationship is marked by the crow's foot attached to the rectangle (entity) labeled ORDER. This relationship appears to be the same in Figures 1-3a and 1-3b. However, the relationship between orders and products is M:N. An order may be for one or more products, and a product may be included on more than one order. It is worthwhile noting that Figure 1-3a is an enterprise-level model, where it is necessary to include only the higher-level relationships of customers, orders, and products. The project-level diagram shown in Figure 1-3b includes additional levels of details, such as the further details of an order.
Relational Databases
Relational databases establish the relationships between entities by means of common fields included in a file, called a relatwn. The relationship between a customer and the customer's order depicted in the data models in Figure 1-3 is established by including the customer's number with the customer's order. Thus, a customer's identification number
1 • The Database Environn1ent and Development Process 13
is included in the file (or relation) that holds customer information such as name, address, and so forth. Every time the customer places an order, the customer identification number is also included in the relation that holds order information. Relational databases use the identification number to establish the relationship between customer and order.
Database Management Systems
A database management system (D BMS) is a software system that enables the use of a database approach. The primary purpose of a DBMS is to provide a systematic method of creating, updating, storing, and retrieving the data stored in a database. It enables end users and application programmers to share data, and it enables data to be shared among multiple applications rather than propagated and stored in new files for every new application (Mullins, 2002). A DBMS also provides facilities for controlling data access, enforcing data integrity, managing concurrency control, and restoring a database. You will learn about these DBMS features in detail in Chapters 7 and 8.
Now that you understand the basic elements of a database approach, you are in a good position to try to understand the differences between a database approacll and a file-based approacll. Let us begin by comparing Figures 1-2 and 1-4. Figure 1-4 depicts a representation (entities) of how the data can be considered to be stored in the database. Notice that unlike Figure 1-2, in Figure 1-4, there is only one place where the CUSTOMER information is stored rather than the two Customer Master Files. Both the Order Filling System and the Invoicing System will access the data contained in the single CUSTOMER entity. Further, what CUSTOMER information is stored, how it is stored, and how it is accessed are likely not closely tied to either of the two systems. All of this enables you to acllieve the advantages listed in the next section. Of course, it is important to note that a real-life database will likely include thousands of entities and relationships among them.
Advantages of the Database Approach
The primary advantages of a database approach, enabled by DBMSs, are summarized in Table 1-3 and described next.
PROGRAM-DATA INDEPENDENCE The separation of data descriptions (metadata) from the application programs that use the data is called data independence. With the database approach, data descriptions are stored in a central location called the repository.
INVENTORY CUSTOMER PRICING
HISTORY
Places \ I/ Keeps Price Changes For
I I\ Is Placed By Has Pric e Changes Of -~----~
ORDER ...._ Contains _/
INVENTORY / Is Contained In'
EMPLOYEE
Generates Is Contai ned In
I I\ Completes
-..... BACKORDER /
Contains
Database management system (DBMS)
A software system that is used to create, maintain, and provide oontrolled access lo user databases.
Data independence
The separation of data descriptions from the application programs that use the data.
FIGURE 1-4 Enterprise model for Figure 1-3 segments
14 Part I • The Context of Database Managen1ent
User view
A logical description of some portion of the database that is required by a user to perform some tas k.
TABLE 1-3 Advantages of the Database Approach
Program-data independence
Planned data redundancy
Improved data consistency
Improved data sharing
Increased product ivity of application development
Enforcement of standards
Improved data quality
Improved data accessibility and responsiveness
Reduced program maintenance
Improved decision support
This property of database systems allows an organization's data to change and evolve (within limits) without changing the application programs that process the data.
PLANNED DATA REDUNDANCY Good database design attempts to integrate previously separate (and redundant) data files into a single, logical structure. Ideally, each primary fact is recorded in only one place in the database. For example, facts about a product, such as the Pine Valley oak computer desk, its finish, price, and so forth, are recorded together in one place in the Product table, which contains data about each of Pine Val- ley's products. The database approach does not eliminate redundancy entirely, but it enables the designer to control the type and amount of redundancy. At other times, it may be desirable to include some limited redundancy to improve database perfor- mance, as you will see in later chapters.
IMPROVED DATA CONSISTENCY By eliminating or controlling data redundancy, you can greatly reduce the opportunities for inconsistency. For example, if a customer's address is stored only once, we cannot d isagree about the customer 's address. When the customer's address changes, recording the new address is greatly simplified because the address is stored in a single place. Finally, you avoid wasting storage space that results from redundant data storage.
IMPROVED DATA SHARING A database is designed as a shared corporate resource. Authorized internal and external users are granted permission to use the database, and each user (or group of users) is provided one or more user views into the database to facilitate this use. A user view is a logical description of some portion of the database that is required by a user to perform some task. A user view is often developed by identifying a form or report that the user needs on a regular basis. For example, an employee working in human resources will need access to confidential employee data; a customer needs access to the product catalog available on Pine Valley's Web site. The views for the human resources employee and the customer are drawn from completely different areas of one unified database.
INCREASED PRODUCTIVITY OF APPLICATION DEVELOPMENT A major advantage of the database approach is that it greatly reduces the cost and time for developing new business applications. There are three important reasons that database applications can often be developed much more rapidly than conventional file applications:
1. Assuming that the database and the related data capture and maintenance appli- cations have already been designed and implemented, the application developer can concentrate on the specific h1nctions required for the new application without having to worry about file design or low-level implementation details.
2. The database management system provides a number of high-level productivity tools, such as forms and report generators, and high-level languages that automate
1 • The Database Environn1ent and Development Process 15
some of the activities of database design and implementation. You will learn about many of these tools in subsequent chapters.
3. Significant improvement in application developer productivity, estimated to be as high as 60 percent (Long, 2005), is currently being realized through the use of Web services based on the use of standard Internet protocols and a universally accepted data format (XML).
ENFORCEMENT OF STANDARDS When the database approach is implemented with hill management support, the database administration function should be granted single-point authority and responsibility for establishing and enforcing data standards. These standards will include naming conventions, data quality standards, and uniform procedures for accessing, updating, and protecting data. The data repository provides database administrators with a powerhil set of tools for developing and enforcing these standards. Unfortunately, the failure to implement a strong database administration h1nction is perhaps the most common source of database failures in organizations. You will learn about the database administration (and related data administration) func- tions in Chapter 12.
IMPROVED DATA QUALITY Concern with poor quality data is a common theme in strategic planning and database administration today. In 2011 alone, poor data qual- ity is estimated to have cost the U.S. economy almost $3 trillion, almost twice the size of the federal deficit (http://hollistibbetts.sys-con.com/nod e/1975126). The database approach provides a number of tools and processes to improve data quality. Two of the more important are the following:
1. Database designers can specify integrity constraints that are enforced by the DBMS. A constraint is a rule that cannot be violated by database users. We describe Constraint numerous types of constraints (also called "business rules") in Chapters 2 and 3. A rule that cannot be violated by If a customer places an order, the constraint that ensures that the customer and database users. the order remain associated is called a "relational integrity constraint," and it pre- vents an order from being entered without specifying who placed the order.
2. One of the objectives of a data warehouse environment is to clean up (or "scrub") operational data before they are placed in the data warehouse (Jordan, 1996). Do you ever receive multiple copies of a catalog? The company that sends you three copies of each of its mailings could recognize significant postage and printing savings if its data were scrubbed, and its understanding of its customers would also be enhanced if it could determine a more accurate count of existing customers. You will learn about data warehouses and data integration in Chapter 9 and the potential for improving data quality in Chapter 12.
IMPROVED DATA ACCESSIBILITY AND RESPONSIVENESS With a relational database, end users without programming experience can often retrieve and display data, even when they cross traditional departmental boundaries. For example, an employee can display information about computer desks at Pine Valley Furniture Company with the following query:
SELECT*
FROM Product_T
WHERE ProductDescription = "Computer Desk";
The language used in this query is called Structured Query Language, or SQL. (You will study this language in detail in Chapters 5 and 6.) Although the queries constructed can be 111.uch more complex, the basic structure of the query is easy for even novice, non- prograrnmers to grasp. If they understand the structure and names of the data that fit within their view of the database, they soon gain the ability to retrieve answers to new questions without having to rely on a professional application developer. This can be dangerous; queries should be thoroughly tested to be sure they are returning accurate data before relying on their results, and novices may not understand that challenge.
16 Part I • The Context of Database Managen1ent
REDUCED PROGRAM MAINTENANCE Stored data must be changed frequently for a variety of reasons: New data item types are added, data formats are changed, and so forth. A celebrated example of this problem was the well-known "year 2000" problem, in which common two-digit year fields were extended to four digits to accommodate the rollover from the year 1999 to the year 2000.
In a file processing environment, the data descriptions and the logic for accessing data are built into individual application programs (this is the program-data depen- dence issue described earlier). As a result, changes to data formats and access methods inevitably result in the need to modify application programs. In a database environment, data are more independent of the application programs that use them. Within limits, you can change either the data or the application programs that use the data without necessitating a change in the other factor. As a result, program maintenance can be sig- nificantly reduced in a modern database environment.
IMPROVED DECISION SUPPORT Some databases are designed expressly for decision support applications. For example, some databases are designed to support customer relationship management, whereas others are designed to support financial analysis or supply chain management. You will study how databases are tailored for different decision support applications and analytical styles in Chapters 9 through 11.
Cautions about Database Benefits
The previous section identified 10 major potential benefits of the database approach. However, we must caution you that many organizations have been frustrated in attempting to realize some of these benefits. For example, the goal of data indepen- dence (and, therefore, reduced program maintenance) has proven elusive due to the limitations of older data models and database management software. Fortunately, the relational model and the newer object-oriented model provide a significantly better environment for achieving these benefits. Another reason for failure to achieve the intended benefits is poor organizational p lanning and database implementation; even the best data management software cannot overcome such deficiencies. For this reason, you will learn about the importance of database planning and design throughout this text.
Costs and Risks of the Database Approach
A database is not a silver bullet, and it does not have the magic power of Harry Potter. As with any other business decision, the database approach entails some additional costs and risks that must be recognized and managed when it is implemented (see Table 1-4).
NEW, SPECIALIZED PERSONNEL Frequently, organizations that adopt the database approach need to hire or train individuals to design and implement databases, provide database administration services, and manage a staff of new people. Further, because of the rapid changes in technology, these new people will have to be retrained or upgraded on a regular basis. This personnel increase may be more than offset by other productivity gains, but an organization should recognize the need for these specialized skills, which are required to obtain the most from the potential benefits. You will learn about the staff requirements for database management in Chapter 12.
TABLE 1-4 Costs and Risks of the Database Approach
New, specialized personnel
Installation and management cost and complexity
Conversion costs
Need for explicit backup and recovery
Organizational conflict
1 • The Database Environn1ent and Development Process 17
INSTALLATION AND MANAGEMENT COST AND COMPLEXITY A multi-user database management system is a large and complex suite of software that has a high initial cost, requires a staff of trained personnel to install and operate, and has s ubstantial annual maintenance and support costs. Installing such a system may also require upgrades to the hardware and data communications systems in the organization. Substantial training is normally required on an ongoing basis to keep up with new releases and upgrades. Additional or more sophisticated and costly database soft- ware may be needed to provide security and to ensure proper concurrent updating of shared data.
CONVERSION COSTS The term Iegaci; syste1n is widely used to refer to older applications in an organization that are based on file processing and/ or older database technology. The cost of converting these older systems to modern database technology- measured in terms of dollars, time, and organizational commitment- may often seem prohibi- tive to an organization. The use of data warehouses is one strategy for continuing to use older systems while at the same time exploiting modern database technology and techniques (Ritter, 1999).
NEED FOR EXPLICIT BACKUP AND RECOVERY A shared corporate database must be accurate and available at all times. This requires that comprehensive procedures be developed and used for providing backup copies of data and for restoring a database when damage occurs. These considerations have acquired increased urgency in today's security-conscious environment. A modern database management system normally automates many more of the backup and recovery tasks than a file system. You will learn about procedures for security, backup, and recovery in Chapter 8.
ORGANIZATIONAL CONFLICT A shared database requires a consensus on data defi- nitions and ownership as well as responsibilities for accurate data maintenance. Experience has shown that conflicts on data definitions, data formats and coding, rights to update shared data, and associated issues are frequent and often difficult to resolve. Handling these issues requires organizational commitment to the database approach, organizationally astute database administrators, and a sound evolutionary approach to database development.
If strong top management support of and commitment to the database approach are Jacking, end-user development of stand-alone databases is likely to proliferate. These databases do not follow the general database approach that we have described, and they are unlikely to provide the benefits described earlier. In the extreme, they may lead to a pattern of inferior decision making that threatens the well-being or existence of an organization.
INTEGRATED DATA MANAGEMENT FRAMEWORK
The database approach described above is associated with relational database technologies and used primarily as a foundation for the design and implementation of transaction processing systems (operational syste111s) . Data management technologies are also increasingly often used as infonnational systetns, as a foundation for analytics, or the systematic analysis and interpretation of data to improve our understanding of a real-world domain. Transactional systems are still the core of this book with a focus on relational databases and the SQL language. These technologies continue, in practice, to be a fundamental source of data for all areas of data management, and no other technology is as widely used. They form the foundation on which business activities of modem organizations are built because they enable the way in which organizations interact and do business with their stakeholders.
This book does, however, also cover data management technologies intended primarily for enabling and supporting analytics. They can be divided into two major categories: data warehousing and big data. Data warehousing has existed as a concept since late 1980s, and, as you will learn in Chapter 9, both conceptual approaches and implementation technologies for data warehousing are already well developed and
18 Part I • The Context of Database Managen1ent
FIGURE 1-5 Integrated data management framework Operational Informational
Transactional Analytical- Analytical- Data Warehousing Big Data
Technology Relational Relational Non-relational
Modeling Conceptual data modeling with (E)ER (Chapters 2 and 3)
Logical data modeling with the Design relational model; Normalization
(Chapter 4) Data warehousing Big data technologies,
Physical design of relational and data integration including Hadoop &
Infrastructure databases; Security; Cloud (Chapter 9) NoSOL (Chapter 10)
computing (Chapter 8)
SOL (Chapters 5 and 6) Access
Applications with SOL (Chapter 7)
Data analysis Analytics and its implications (Chapter 11)
Governance Lifecycle (Chapter 1) and data Governance, data quality, and master data management (Chapter 12) management
mature. Indeed, most traditional data warehouses are implemented using the same relational technologies as transactional systems. Big data technologies have emerged as another category of informational systems since the early 2010s. They are charac- terized by their ability to deal with large volurnes of data with a variety of data types arriving to the organizational systems with high velocihJ (the so-called three Vs of big data). A major d ifference between big data systems compared to both data warehousing and operational, transaction-focused systems is that structures of the latter are typically expected to be carefully designed before data are stored in them ("schema on write"), whereas many of the big data analytics technologies are intended to be used in the "schema on read" mode. In the latter approach, the structure of the data and the rela- tionships between the data elements will be determined later, either right before or at the time of the use of the data.