FUNDAMENTALS OF DATABASE
MANAGEMENT SYSTEMS
Second Edition
MARK L. GILLENSON Fogelman College of Business and Economics
University of Memphis
John Wiley & Sons, Inc.
CREDITS
VP & PUBLISHER Don Fowley
EDITOR Beth Lang Golub
EDITORIAL ASSISTANT Elizabeth Mills
MARKETING MANAGER Christopher Ruel
DESIGNER James O’Shea
SENIOR PRODUCTION MANAGER Janis Soo
SENIOR PRODUCTION EDITOR Joyce Poh
This book was set in 10/12 Times New Roman by LaserWords and printed and bound by RR Donnelley. The cover was printed by RR Donnelley.
This book is printed on acid free paper.
Founded in 1807, John Wiley & Sons, Inc. has been a valued source of knowledge and understanding for more than 200 years, helping people around the world meet their needs and fulfill their aspirations. Our company is built on a foundation of principles that include responsibility to the communities we serve and where we live and work. In 2008, we launched a Corporate Citizenship Initiative, a global effort to address the environmental, social, economic, and ethical challenges we face in our business. Among the issues we are addressing are carbon impact, paper specifications and procurement, ethical conduct within our business and among our vendors, and community and charitable support. For more information, please visit our website: www.wiley.com/go/citizenship.
Copyright © 2012, 2005 John Wiley & Sons, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc. 222 Rosewood Drive, Danvers, MA 01923, website www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, (201)748-6011, fax (201)748-6008, website http://www.wiley.com/go/permissions.
Evaluation copies are provided to qualified academics and professionals for review purposes only, for use in their courses during the next academic year. These copies are licensed and may not be sold or transferred to a third party. Upon completion of the review period, please return the evaluation copy to Wiley. Return instructions and a free of charge return mailing label are available at www.wiley.com/go/returnlabel. If you have chosen to adopt this textbook for use in your course, please accept this book as your complimentary desk copy. Outside of the United States, please contact your local sales representative.
Library of Congress Cataloging-in-Publication Data
Gillenson, Mark L. Fundamentals of database management systems / Mark L. Gillenson.—2nd ed.
p. cm. Includes index. ISBN 978-0-470-62470-8 (pbk.) 1. Database management. I. Title. QA76.9.D3G5225 2011 005.74—dc23
2011039274
Printed in the United States of America 10 9 8 7 6 5 4 3 2 1
OTHER JOHN WILEY & SONS, INC. DATABASE BOOKS BY MARK L. GILLENSON
Strategic Planning, Systems Analysis, and Database Design (with Robert Goldberg), 1984
DATABASE Step-by-Step 1st edition, 1985 2nd edition, 1990
To my mother Sunny’s memory
and to my favorite mother-in-law, Moo
BRIEF CONTENTS
Preface
About The Author
CHAPTER 1 DATA: THE NEW CORPORATE RESOURCE
CHAPTER 2 DATA MODELING
CHAPTER 3 THE DATABASE MANAGEMENT SYSTEM CONCEPT
CHAPTER 4 RELATIONAL DATA RETRIEVAL: SQL
CHAPTER 5 THE RELATIONAL DATABASE MODEL: INTRODUCTION
CHAPTER 6 THE RELATIONAL DATABASE MODEL: ADDITIONAL CONCEPTS
CHAPTER 7 LOGICAL DATABASE DESIGN
CHAPTER 8 PHYSICAL DATABASE DESIGN
CHAPTER 9 OBJECT-ORIENTED DATABASE MANAGEMENT
CHAPTER 10 DATA ADMINISTRATION, DATABASE ADMINISTRATION, AND DATA DICTIONARIES
CHAPTER 11 DATABASE CONTROL ISSUES: SECURITY, BACKUP AND RECOVERY, CONCURRENCY
CHAPTER 12 CLIENT/SERVER DATABASE AND DISTRIBUTED DATABASE
CHAPTER 13 THE DATA WAREHOUSE
CHAPTER 14 DATABASES AND THE INTERNET
Index
CONTENTS
Preface
About The Author
CHAPTER 1 DATA: THE NEW CORPORATE RESOURCE
Introduction
The History of Data
  The Origins of Data
  Data Through the Ages
  Early Data Problems Spawn Calculating Devices
  Swamped with Data
  Modern Data Storage Media
Data in Today’s Information Systems Environment
  Using Data for Competitive Advantage
  Problems in Storing and Accessing Data
  Data as a Corporate Resource
  The Database Environment
Summary
CHAPTER 2 DATA MODELING
Introduction
Binary Relationships
  What is a Binary Relationship?
  Cardinality
  Modality
  More About Many-to-Many Relationships
Unary Relationships
  One-to-One Unary Relationship
  One-to-Many Unary Relationship
  Many-to-Many Unary Relationship
Ternary Relationships
Example: The General Hardware Company
Example: Good Reading Book Stores
Example: World Music Association
Example: Lucky Rent-A-Car
Summary
CHAPTER 3 THE DATABASE MANAGEMENT SYSTEM CONCEPT
Introduction
Data Before Database Management
  Records and Files
  Basic Concepts in Storing and Retrieving Data
The Database Concept
  Data as a Manageable Resource
  Data Integration and Data Redundancy
  Multiple Relationships
  Data Control Issues
  Data Independence
DBMS Approaches
Summary
CHAPTER 4 RELATIONAL DATA RETRIEVAL: SQL
Introduction
Data Retrieval with the SQL SELECT Command
  Introduction to the SQL SELECT Command
  Basic Functions
  Built-In Functions
  Grouping Rows
  The Join
  Subqueries
  A Strategy for Writing SQL SELECT Commands
Example: Good Reading Book Stores
Example: World Music Association
Example: Lucky Rent-A-Car
Relational Query Optimizer
  Relational DBMS Performance
  Relational Query Optimizer Concepts
Summary
CHAPTER 5 THE RELATIONAL DATABASE MODEL: INTRODUCTION
Introduction
The Relational Database Concept
  Relational Terminology
  Primary and Candidate Keys
  Foreign Keys and Binary Relationships
Data Retrieval from a Relational Database
  Extracting Data from a Relation
  The Relational Select Operator
  The Relational Project Operator
  Combination of the Relational Select and Project Operators
  Extracting Data Across Multiple Relations: Data Integration
Example: Good Reading Book Stores
Example: World Music Association
Example: Lucky Rent-A-Car
Summary
CHAPTER 6 THE RELATIONAL DATABASE MODEL: ADDITIONAL CONCEPTS
Introduction
Relational Structures for Unary and Ternary Relationships
  Unary One-to-Many Relationships
  Unary Many-to-Many Relationships
  Ternary Relationships
Referential Integrity
  The Referential Integrity Concept
  Three Delete Rules
Summary
CHAPTER 7 LOGICAL DATABASE DESIGN
Introduction
Converting E-R Diagrams into Relational Tables
  Introduction
  Converting a Simple Entity
  Converting Entities in Binary Relationships
  Converting Entities in Unary Relationships
  Converting Entities in Ternary Relationships
  Designing the General Hardware Co. Database
  Designing the Good Reading Bookstores Database
  Designing the World Music Association Database
  Designing the Lucky Rent-A-Car Database
The Data Normalization Process
  Introduction to the Data Normalization Technique
  Steps in the Data Normalization Process
  Example: General Hardware Co.
  Example: Good Reading Bookstores
  Example: World Music Association
  Example: Lucky Rent-A-Car
Testing Tables Converted from E-R Diagrams with Data Normalization
Building the Data Structure with SQL
Manipulating the Data with SQL
Summary
CHAPTER 8 PHYSICAL DATABASE DESIGN
Introduction
Disk Storage
  The Need for Disk Storage
  How Disk Storage Works
File Organizations and Access Methods
  The Goal: Locating a Record
  The Index
  Hashed Files
Inputs to Physical Database Design
  The Tables Produced by the Logical Database Design Process
  Business Environment Requirements
  Data Characteristics
  Application Characteristics
  Operational Requirements: Data Security, Backup, and Recovery
Physical Database Design Techniques
  Adding External Features
  Reorganizing Stored Data
  Splitting a Table into Multiple Tables
  Changing Attributes in a Table
  Adding Attributes to a Table
  Combining Tables
  Adding New Tables
Example: Good Reading Book Stores
Example: World Music Association
Example: Lucky Rent-A-Car
Summary
CHAPTER 9 OBJECT-ORIENTED DATABASE MANAGEMENT
Introduction
Terminology
Complex Relationships
  Generalization
  Inheritance of Attributes
  Operations, Inheritance of Operations, and Polymorphism
  Aggregation
  The General Hardware Co. Class Diagram
  The Good Reading Bookstores Class Diagram
  The World Music Association Class Diagram
  The Lucky Rent-A-Vehicle Class Diagram
Encapsulation
Abstract Data Types
Object/Relational Database
Summary
CHAPTER 10 DATA ADMINISTRATION, DATABASE ADMINISTRATION, AND DATA DICTIONARIES
Introduction
The Advantages of Data and Database Administration
  Data as a Shared Corporate Resource
  Efficiency in Job Specialization
  Operational Management of Data
  Managing Externally Acquired Databases
  Managing Data in the Decentralized Environment
The Responsibilities of Data Administration
  Data Coordination
  Data Planning
  Data Standards
  Liaison to Systems Analysts and Programmers
  Training
  Arbitration of Disputes and Usage Authorization
  Documentation and Publicity
  Data’s Competitive Advantage
The Responsibilities of Database Administration
  DBMS Performance Monitoring
  DBMS Troubleshooting
  DBMS Usage and Security Monitoring
  Data Dictionary Operations
  DBMS Data and Software Maintenance
  Database Design
Data Dictionaries
  Introduction
  A Simple Example of Metadata
  Passive and Active Data Dictionaries
  Relational DBMS Catalogs
  Data Repositories
Summary
CHAPTER 11 DATABASE CONTROL ISSUES: SECURITY, BACKUP AND RECOVERY, CONCURRENCY
Introduction
Data Security
  The Importance of Data Security
  Types of Data Security Breaches
  Methods of Breaching Data Security
  Types of Data Security Measures
Backup and Recovery
  The Importance of Backup and Recovery
  Backup Copies and Journals
  Forward Recovery
  Backward Recovery
  Duplicate or ‘‘Mirrored’’ Databases
  Disaster Recovery
Concurrency Control
  The Importance of Concurrency Control
  The Lost Update Problem
  Locks and Deadlock
  Versioning
Summary
CHAPTER 12 CLIENT/SERVER DATABASE AND DISTRIBUTED DATABASE
Introduction
Client/Server Databases
Distributed Database
  The Distributed Database Concept
  Concurrency Control in Distributed Databases
  Distributed Joins
  Partitioning or Fragmentation
  Distributed Directory Management
  Distributed DBMSs: Advantages and Disadvantages
Summary
CHAPTER 13 THE DATA WAREHOUSE
Introduction
The Data Warehouse Concept
  The Data is Subject Oriented
  The Data is Integrated
  The Data is Non-Volatile
  The Data is Time Variant
  The Data Must Be High Quality
  The Data May Be Aggregated
  The Data is Often Denormalized
  The Data is Not Necessarily Absolutely Current
Types of Data Warehouses
  The Enterprise Data Warehouse (EDW)
  The Data Mart (DM)
  Which to Choose: The EDW, the DM, or Both?
Designing a Data Warehouse
  Introduction
  General Hardware Co. Data Warehouse
  Good Reading Bookstores Data Warehouse
  Lucky Rent-A-Car Data Warehouse
  What About a World Music Association Data Warehouse?
Building a Data Warehouse
  Introduction
  Data Extraction
  Data Cleaning
  Data Transformation
  Data Loading
Using a Data Warehouse
  On-Line Analytic Processing
  Data Mining
Administering a Data Warehouse
Challenges in Data Warehousing
Summary
CHAPTER 14 DATABASES AND THE INTERNET
Introduction
Database Connectivity Issues
Expanded Set of Data Types
Database Control Issues
  Performance
  Availability
  Scalability
  Security and Privacy
Data Extraction into XML
Summary
INDEX
PREFACE
PURPOSE OF THIS BOOK
A course in database management has become well established as a required course in both undergraduate and graduate management information systems degree programs. This is as it should be, considering the central position of the database field in the information systems environment. Indeed, a solid understanding of the fundamentals of database management is crucial for success in the information systems field. An IS professional should be able to talk to the users in a business setting, ask the right questions about the nature of their entities, their attributes, and the relationships among them, and quickly decide whether their existing data and database designs are properly structured or not. An IS professional should be able to design new databases with confidence that they will serve their owners and users well. An IS professional should be able to guide a company in the best use of the various database-related technologies.
Over the years, at the same time that database management has increased in importance, it has also increased tremendously in breadth. In addition to such fundamental topics as data modeling, relational database concepts, logical and physical database design, and SQL, a basic set of database topics today includes object-oriented databases, data administration, data security, distributed databases, data warehousing, and Web databases, among others. The dilemma faced by database instructors and by database books is to cover as much of this material as is reasonably possible so that students will come away with a solid background in the fundamentals without being overwhelmed by the tremendous breadth and depth of the field. Exposure to too much material in too short a time at the expense of developing a sound foundation is of no value to anyone. We believe that a one-semester course in database management should provide a firm grounding in the fundamentals of databases and provide a solid survey of the major database subfields, while deliberately not being encyclopedic in its coverage. With these goals in mind, this book:
■ Is designed to be a carefully and clearly written, friendly, narrative introduction to the subject of database management that can reasonably be completed in a one-semester course.
■ Provides a clear exposition of the fundamentals of database management while at the same time presenting a broad survey of all of the major topics of the field. It is an applied book of important basic concepts and practical material that can be used immediately in business.
■ Makes extensive use of examples. Four major examples are used throughout the text where appropriate, plus two minicases that are included among the chapter exercises at the end of every chapter. Having multiple examples solidifies the material and helps the student not miss the point because of the peculiarities of a particular example.
■ Starts with the basics of data and file structures and then builds up in a progressive, step-by-step way through the distinguishing characteristics of database.
■ Has a story and accompanying photograph of a real company’s real use of database management at the beginning of every chapter. This is both for motivational purposes and to give the book a more practical, real-world feel.
■ Includes a chapter on SQL that concentrates on the data-retrieval aspect and applies to essentially every relational database product on the market.
NEW IN THE SECOND EDITION
It is important to reflect advances in the database management systems environment in this book as the world of information systems continues to progress. Furthermore, we want to continue adding materials for the benefit of the students who use this book. Thus we have made the following changes to the second edition.
■ A ‘‘mobile chapter’’ on data retrieval with SQL that can be covered early in the book, where it appears as Chapter 4, or later in the book after the chapters on database design. This is introduced in response to a large reviewer survey that indicated a roughly 50–50 split between instructors who like to introduce data retrieval with SQL early in their courses to engage their students in hands-on exercises as soon as possible to pique their interest and instructors who feel that data retrieval with SQL should come after database design.
■ Internet-accessible databases that match the four main examples running through the book’s chapters for hands-on student practice in data retrieval with SQL, plus additional hands-on material.
■ The conversion of the book’s entity-relationship diagrams to today’s standard practice format that is compatible with MS Visio, among other software tools.
■ The addition of examples for creating and updating databases using SQL.
■ The addition of ‘‘It’s Your Turn’’ exercises and the new formatting of the ‘‘Concepts in Action’’ real example vignettes.
■ The merging of the material about disk devices and access methods and file organizations into the chapter on physical database design, to create a complete package on this subject in one chapter.
ORGANIZATION OF THIS BOOK
The book effectively divides into two halves. After the introduction in Chapter 1, Chapter 2 lays the foundation of data modeling. Chapter 3 describes the fundamental concepts of databases and contrasts them with ordinary files. Importantly, this is done separately from and prior to the discussion of relational databases. Chapter 4 is the ‘‘mobile chapter’’ on data retrieval with SQL that can be covered as Chapter 4 or can be covered after the chapters on database design. Chapters 5 and 6 explain the major concepts of relational databases. In turn, this is done separately from and prior to the discussion of logical database design in Chapter 7 and physical database design (yes, a whole chapter on this subject) in Chapter 8. Separating out general database concepts from relational database concepts from relational database design serves to bring the student along gradually and deliberately with the goal of a solid understanding at the end.
Then, in the second half of the book, each chapter describes one or more of the major database subfields. These latter chapters are generally independent and for the most part can be approached in any order. They include Chapter 9 on object-oriented database, Chapter 10 on data administration, database administration, and data dictionaries, Chapter 11 on security, backup and recovery, and concurrency, Chapter 12 on client/server database and distributed database, Chapter 13 on the data warehouse, and Chapter 14 on database and the Internet.
SUPPLEMENTS (www.wiley.com/college/gillenson)
The Web site includes several resources designed to aid the learning process:
■ PowerPoint slides for each chapter that instructors can use as is or tailor as they wish and that students can use both to take notes on in the classroom and to help in studying at home.
■ Quizzes for each chapter that students can take on their own to test their knowledge.
■ For instructors: The Instructors’ Manual, written by the author. For each chapter it includes a guide to presenting the chapter, discussion stimulation points, and answers to every question, exercise, and minicase at the end of each chapter.
■ For instructors: The Test Bank, written by the author. Questions are organized by chapter and are designed to test the level of understanding of the chapter’s concepts, as well as such basic knowledge as the definitions of key terms presented in the chapter.
Database Software
Now available to educational institutions adopting this Wiley textbook is a free 3-year membership to the MSDN Academic Alliance. The MSDN AA is designed to provide the easiest and most inexpensive way for academic departments to make the latest Microsoft software available in labs, classrooms, and on student and instructor PCs.
Database software, including Access and SQL Server, is available through this Wiley and Microsoft publishing partnership, free of charge with the adoption of Gillenson’s textbook. (Note that schools that have already taken advantage of this opportunity through Wiley are not eligible again, and Wiley cannot offer free membership renewals.) Each copy of the software is the full version with no time limitation, and can be used indefinitely for educational purposes. Contact your Wiley sales representative for details. For more information about the MSDN AA program, go to http://msdn.microsoft.com/academic.
ACKNOWLEDGMENTS
I would like to thank the reviewers of the manuscript for their time, their efforts, and their insightful comments:
Paul Bergstein, University of Massachusetts Dartmouth
Susan Bickford, Tallahassee Community College
Jim Q. Chen, St. Cloud State University
Shamsul Chowdhury, Roosevelt University
Deloy Cole, Greenville College
Terrence Fries, Indiana University of Pennsylvania
Dick Grant, Seminole Community College
Betsy Headrick, Chattanooga State Community College
Shamim Khan, Columbus State University
Barbara Klein, University of Michigan-Dearborn
Karl Konsdorf, Sinclair Community College
Yunkai Liu, Gannon University
Margaret McClintock, Mississippi University for Women
Thomas Mertz, Kansas State University
Keith R. Nelms, Piedmont College
Bob Nielson, Dixie State College
Rachida F. Parks, Pennsylvania State University
Lara Preiser-Houy, California State University Pomona
Il-Yeol Song, Drexel University
Brian West, University of Louisiana at Lafayette
R. Alan Whitehurst, Southern Virginia University
Diana Wolfe, Oklahoma State University at Oklahoma City
Hong Zhou, Saint Joseph College
In addition, I would like to acknowledge and thank several people who read and provided helpful comments on specific chapters and portions of the manuscript: Mark Cooper of FedEx Corp., Satish Puranam of the University of Memphis, David Tegarden of Virginia Tech, and Trent Sanders.
I would also like to thank the people and companies who agreed to participate in the Concepts in Action vignettes that appear at the beginning of each chapter and, in some cases, later in the chapters. I strongly believe that business students should not have to study subjects like database management in a vacuum. Rather, they should be regularly reminded of the real ways in which real companies put these concepts and techniques to use. Whether the products involved are power tools, auto parts, toys, or books, it is important always to remember that database management supports businesses in which millions and billions of dollars are at stake every year. Thus, the people and companies who participated in these vignettes have significantly added to the educational experience of the students using this book.
Finally, I would like to thank the crew at John Wiley & Sons for their continuous support and professionalism, in particular Rachael Leblond, my editor for this edition of the book, and Beth Lang Golub, my long-time editor and friend, and her excellent staff.
Mark L. Gillenson Memphis, TN
April 2011
ABOUT THE AUTHOR
Dr. Mark L. Gillenson has been practicing, researching, teaching, writing, and, most importantly, thinking, about data and database management for over 35 years, split between working for the IBM Corporation and being a professor in the academic world. While working for IBM he designed databases for IBM’s corporate headquarters, consulted on database issues for some of IBM’s largest customers, taught database management at the prestigious IBM Systems Research Institute in New York, and conducted database seminars throughout the United States and on four continents. In one such seminar, he taught introduction to database to an IBM development group that went on to develop one of IBM’s first relational database management system products, SQL/DS.
Dr. Gillenson conducted some of the earliest studies on data and database administration and has written extensively about that subject as well as about database design. He is an associate editor of the Journal of Database Management, with which he has been associated since its inception. This is his third book on database management, all published by John Wiley & Sons, Inc. Dr. Gillenson is currently a professor of MIS in the Fogelman College of Business and Economics of The University of Memphis. His degrees are from Rensselaer Polytechnic Institute and The Ohio State University.
Oh, and speaking of interesting kinds of data, as a graduate student Dr. Gillenson invented the world’s first computerized facial compositor and codeveloped an early computer graphics system that, among other things, was used to produce some of the special effects in the first Star Wars movie.
CHAPTER 1
DATA: THE NEW CORPORATE RESOURCE
The development of database management systems, as well as the development of modern computers, came about as a result of society’s recognition of the crucial importance of storing, managing, and retrieving its rapidly expanding volumes of business data. To understand how far we have come in this regard, it is important to know where we began and how the concept of managing data has developed. This chapter begins with the historical background of the storage and uses of data and then continues with a discussion of the importance of data to the modern corporation.
OBJECTIVES
■ Explain why humankind’s interest in data dates back to ancient times.
■ Describe how data needs have historically driven many information technology developments.
■ Describe the evolution of data storage media during the last century.
■ Relate the idea of data as a corporate resource that can be used to gain a competitive advantage to the development of the database management systems environment.
CHAPTER OUTLINE
Introduction
The History of Data
  The Origins of Data
  Data Through the Ages
  Early Data Problems Spawn Calculating Devices
  Swamped with Data
  Modern Data Storage Media
Data in Today’s Information Systems Environment
  Using Data for Competitive Advantage
  Problems in Storing and Accessing Data
  Data as a Corporate Resource
  The Database Environment
Summary
INTRODUCTION
What a fascinating world we live in today! Technological advances are all around us in virtually every aspect of our daily lives. From cellular telephones to satellite television to advanced aircraft to modern medicine to computers—especially computers—high tech is with us wherever we look. Businesses of every description and size rely on computers and the information systems they support to a degree that would have been unimaginable just a few short years ago. Businesses routinely use automated manufacturing and inventory-control techniques, automated financial transaction procedures, and high-tech marketing tools. As consumers, we take for granted being able to call our banks, insurance companies, and department stores to instantly get up-to-the-minute information on our accounts. And everyone, businesses and consumers alike, has come to rely on the Internet for instant worldwide communications. Beneath the surface, the foundation for all of this activity is data: the stored facts that we need to manage all of our human endeavors.
This book is about data. It’s about how to think about data in a highly organized and deliberate way. It’s about how to store data efficiently and how to retrieve it effectively. It’s about ways of managing data so that the exact data that we need will be there when we need it. It’s about the concept of assembling data into a highly organized collection called a ‘‘database’’ and about the sophisticated software known as a ‘‘database management system’’ that controls the database and oversees the database environment. It’s about the various approaches people have taken to database management and about the roles people have assumed in the database environment. We will see many real-world examples of data usage throughout this book.
Computers came into existence because we needed help in processing and using the massive amounts of data we have been accumulating. Is the converse true? Could data exist without computers? The answer to this question is a resounding ‘‘yes.’’ In fact, data has existed for thousands of years in some very interesting, if by today’s standards crude, forms. Furthermore, some very key points in the history of the development of computing devices were driven, not by any inspiration about computing for computing’s sake, but by a real need to efficiently handle a pesky data management problem. Let’s begin by tracing some of these historical milestones in the evolution of data and data management.
THE HISTORY OF DATA
The Origins of Data
What is data? To start, what is a single piece of data? A single piece of data is a single fact about something we are interested in. Think about the world around you, about your environment. In any environment there are things that are important to you and there are facts about those things that are worth remembering. A ‘‘thing’’ can be an obvious object like an automobile or a piece of furniture. But the concept of an object is broad enough to include a person, an organization like a company, or an event that took place such as a particular meeting. A fact can be any characteristic of an object. In a university environment it may be the fact that student Gloria Thomas has completed 96 credits; or it may be the fact that Professor Howard Gold graduated from Ohio State University; or it may be the fact that English 349 is being
The History of Data 3
CONCEPTS
IN ACT ION
1-A AMAZON.COM
When one thinks of online shopping, one of the first companies that comes to mind is certainly Amazon.com. This highly innovative company, based in Seattle, WA, was one of the first online stores and has consistently been one of the most successful. Amazon.com seeks to be the world’s most customer-centric company, where customers can find and discover anything they might want to buy online. Amazon.com and its sellers list millions of unique new and used items in categories such as electronics, computers, kitchen products and house- wares, books, music, DVDs, videos, camera and photo items, toys, baby and baby registry, software, computer and video games, cell phones and service, tools and hardware, travel services, magazine subscriptions, and outdoor living products. Through Amazon Marketplace, zShops and Auctions, any business or individual can sell virtually anything to Amazon.com’s millions of customers. Demonstrating the reach of the Internet, Amazon.com has sold to people in over 220 countries.
‘‘Photo Courtesy of Amazon.com’’
Initially implemented in 1995 and continually improved ever since, Amazon.com’s ‘‘order pipeline’’ is a very sophisticated, information-intensive system that accepts, processes, and fulfills customer orders. When someone visits Amazon.com’s Web site, its system tries to enhance the shopping experience by offering the customer products on a personalized basis, based on past buying patterns. Once an order is placed, the system validates the customer’s credit-card information and sends the customer an email order confirmation. It then goes through a process of determining how best to fulfill the order, including deciding which of several fulfillment sites from which to ship the goods. When the order is shipped, the system emails the customer a shipping confirmation. Throughout the entire process, the system keeps track of the current status of every order at any point in time.
Amazon.com’s order pipeline system is totally built on relational database technology. Most of it uses Oracle running on Hewlett Packard Unix systems. In order to
4 C h a p t e r 1 Data: The New Corporate Resource
achieve high degrees of scalability and availability, the system is organized around the concept of distributed databases, including replicated data that is updated simultaneously at several domestic and international locations. The system is integrated with the Oracle Financials enterprise resource planning (ERP) system and the transactional data is shared with the company’s accounting and finance functions. In addition, Amazon.com has built a multiterabyte data warehouse that imports its transactional data and creates a decision support system with a menu-based facility system of its own design.
Programs utilizing the data warehouse send personally targeted promotional mailers to the company’s customers.
Amazon.com’s database includes hundreds of individual tables. Among these are catalog tables listing its millions of individual books and other products, a customer table with millions of records, personalization tables, promotional tables, shopping-cart tables that handle the actual purchase transactions, and order-history tables. An order processing subsystem that determines which fulfillment center to ship goods from uses tables that keep track of product inventory levels in these centers.
held in Room 830 of Alumni Hall. In a commercial environment, it may be the fact that employee John Baker’s employee number is 137; or it may be the fact that one of a company’s suppliers, the Superior Products Co., is located in Chicago; or it may be the fact that the refrigerator with serial number 958304 was manufactured on November 5, 2004.
Actually, people have been interested in data for at least the past 12,000 years. While today we often associate the concept of data with the computer, historically there have been many more primitive methods of data storage and handling.
In the ancient Middle East, shepherds kept track of their flocks with pebbles, Figure 1.1. As each sheep left its pen to graze, the shepherd placed one pebble in a small sack. When all of the sheep had left, the shepherd had a record of how many sheep were out grazing. When the sheep returned, the shepherd discarded one pebble for each animal, and if any pebbles were left over once the last sheep was in, he knew that some of his sheep still hadn’t returned or were missing. This is, indeed, a primitive but legitimate example of data storage and retrieval. What is important to realize about this example is that the count of the number of sheep going out and coming back in was all that the shepherd cared about in his ‘‘business environment’’ and that his primitive data storage and retrieval system satisfied his needs.
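The shepherd's procedure can be sketched in a few lines of Python. This is only an illustration of the counting idea described above; the function name `pebbles_left` and the flock sizes are assumptions made for the example.

```python
def pebbles_left(sheep_out, sheep_back):
    """Model the shepherd's sack (a hypothetical helper for illustration).

    One pebble goes into the sack for each sheep that leaves the pen,
    and one pebble is discarded for each sheep that returns.
    """
    sack = 0
    for _ in range(sheep_out):    # "store": one pebble per departing sheep
        sack += 1
    for _ in range(sheep_back):   # "retrieve": one pebble out per returning sheep
        sack -= 1
    return sack                   # pebbles remaining = sheep still out


# Assumed example: 40 sheep go out to graze and 37 come back.
missing = pebbles_left(sheep_out=40, sheep_back=37)
print(missing)  # 3
```

The sack stores a single fact, the number of sheep currently out, which is all the data the shepherd's environment required.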
FIGURE 1.1 Shepherd using pebbles to keep track of sheep
Excavations in the Zagros region of Iran, dated to 8500 B.C., have unearthed clay tokens or counters that we think were used for record keeping in primitive forms of accounting. Such tokens have been found at sites from present-day Turkey to Pakistan and as far afield as the present-day Khartoum in Sudan, dating as long ago as 7000 B.C. By 3000 B.C., in the present-day city of Susa in Iran, the use of such tokens had reached a greater level of sophistication. Tokens with special markings on them, Figure 1.2, were sealed in hollow clay vessels that accompanied commercial goods in transit. These primitive bills of lading certified the contents of the shipments. The tokens represented the quantity of goods being shipped and, obviously, could not be tampered with without the clay vessel being broken open. Inscriptions on the outside of the vessels and the seals of the parties involved provided a further record. The external inscriptions included such words or concepts as ‘‘deposited,’’ ‘‘transferred,’’ and ‘‘removed.’’
FIGURE 1.2 Ancient clay tokens used to record goods in transit
At about the same time that the Susa culture existed, people in the city-state of Uruk in Sumeria kept records in clay texts. With pictographs, numerals, and ideographs, they described land sales and business transactions involving bread, beer, sheep, cattle, and clothing. Other Neolithic means of record keeping included storing tallies as cuts and notches in wooden sticks and as knots in rope. The former continued in use in England as late as the medieval period; South American Indians used the latter.
Data Through the Ages
As in Susa and Uruk, much of the very early interest in data can be traced to the rise of cities. Simple subsistence hunting, gathering, and, later, farming had only limited use for the concept of data. But when people live in cities they tend to specialize in the goods and services they produce. They become dependent on one another, bartering and using money to trade these goods and services for mutual survival. This trade encouraged record keeping (the recording of data) to track how much someone has produced and what it can be bartered or sold for.
FIGURE 1.3 New types of data with the advance of civilization (panels: bill of lading, calendar, family tree)
As time went on, more and different kinds of data and records were kept. These included calendars, census data, surveys, land ownership records, marriage records, records of church contributions, and family trees, Figure 1.3. Increasingly sophisticated merchants had to keep track of inventories, shipments, and wage payments in addition to production data. Also, as farming went beyond the subsistence level and progressed to the feudal manor stage, there was a need to keep data on the amount of produce to consume, to barter with, and to keep as seed for the following year.
The Crusades took place from the late eleventh to the late thirteenth centuries. One side effect of the Crusades was a broader view of the world on the part of the Europeans, with an accompanying increase in interest in trade. A common method of trade in that era was the establishment of temporary partnerships among merchants, ship captains, and owners to facilitate commercial voyages. This increased level of commercial sophistication brought with it another round of increasingly complex record keeping, specifically, double-entry bookkeeping.
Double-entry bookkeeping originated in the trading centers of fourteenth-century Italy. The earliest known example, from a merchant in Genoa, dates to the year 1340. Its use gradually spread, but it was not until 1494, in Venice (about 25 years after Venice’s first movable-type printing press came into use), that a Franciscan monk named Luca Pacioli published his ‘‘Summa de Arithmetica, Geometria, Proportioni et Proportionalita,’’ a work important in spreading the use of double-entry bookkeeping. Of course, as a separate issue, the increasing use of paper and the printing press furthered the advance of record keeping as well.
As the dominance of the Italian merchants declined, other countries became more active in trade and thus in data and record keeping. Furthermore, as the use of temporary trading partnerships declined and more stable long-term mercantile organizations were established, other types of data became necessary. For example, annual as opposed to venture-by-venture statements of profit and loss were needed. In 1673 the ‘‘Code of Commerce’’ in France required every businessman to draw up a balance sheet every two years. Thus the data had to be periodically accumulated for reporting purposes.
Early Data Problems Spawn Calculating Devices
It was also in the seventeenth century that data began to prompt people to take an interest in devices that could ‘‘automatically’’ process their data, if only in a rudimentary way. Blaise Pascal produced one of the earliest and best-known of these devices in France in the 1640s, reputedly to help his father track the data associated with his job as a tax collector, Figure 1.4. This was a small box containing interlocking gears that was capable of doing addition and subtraction. In fact, it was the forerunner of today’s mechanical automobile odometers.
In 1805, Joseph Marie Jacquard of France invented a device that automatically reproduced patterns used in textile weaving. The heart of the device was a series of cards with holes punched in them; the holes allowed strands of material to be interwoven in a sequence that produced the desired pattern, Figure 1.5. While Jacquard’s loom wasn’t a calculating device as such, his method of storing fabric patterns, a form of graphic data, as holes in punched cards was a very clever means of data storage that would have great importance for computing devices to follow.
Charles Babbage, a nineteenth-century English mathematician and inventor, picked up Jacquard’s concept of storing data in punched cards. In 1833, Babbage began to think about an invention that he called the ‘‘Analytical Engine.’’ Although he never completed it (the state of the art of machinery was not developed enough), its design included many of the principles of modern computers. The Analytical Engine was to consist of a ‘‘store’’ for holding data items and a ‘‘mill’’ for operating upon them. Babbage was very impressed by Jacquard’s work with punched cards. In fact, the Analytical Engine was to be able to store calculation instructions in punched cards. These would be fed into the machine together with punched cards containing data, would operate on that data, and would produce the desired result.
FIGURE 1.4 Blaise Pascal and his adding machine. Photo courtesy of IBM Archives.
FIGURE 1.5 The Jacquard loom recorded patterns in punched cards. Photo courtesy of IBM Archives.
Swamped with Data
In the late 1800s, an enormous (for that time) data storage and retrieval problem and greatly improved machining technology ushered in the era of modern information processing. The 1880 U.S. Census took about seven years to compile by hand. With a rapidly expanding population fueled by massive immigration, it was estimated that with the same manual techniques, the compilation of the 1890 census would not be completed until after the 1900 census data had begun to be collected. The solution to processing census data was provided by a government engineer named Herman Hollerith. Basing his work on Jacquard’s punched-card concept, he arranged to have the census data stored in punched cards. He built devices to punch the holes into cards and devices to sort the cards, Figure 1.6. Wire brushes touching the cards completed circuits when they came across the holes and advanced counters. The equipment came to be classified as ‘‘electromechanical,’’ ‘‘electro’’ because it was powered by electricity and ‘‘mechanical’’ because the electricity powered mechanical counters that tabulated the data. By using Hollerith’s equipment, the total population count of the 1890 census was completed a month after all the data was in. The complete set of tabulations, including data on questions that had never before even been practical to ask, took two years to complete. In 1896, Hollerith formed the Tabulating Machine Company to produce and commercially market his devices. That company, combined with several others, eventually formed what is today the International Business Machines Corporation (IBM).
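The tabulating idea described above, in which each hole found on a card advances a counter, can be sketched in Python. This is a loose illustration, not a model of the actual card layout: the function name `tabulate`, the representation of a card as a set of labels, and the labels themselves are all assumptions made for the example.

```python
from collections import Counter


def tabulate(cards):
    """Advance one counter for every punched position found on every card.

    Each card is represented (hypothetically) as a set of labels, one per
    punched hole, standing in for the circuits completed by the wire brushes.
    """
    counters = Counter()
    for card in cards:
        for hole in card:
            counters[hole] += 1   # a completed circuit advances the counter
    return counters


# Assumed example: three census cards with made-up categories.
cards = [{"male", "farmer"}, {"female", "farmer"}, {"male", "clerk"}]
totals = tabulate(cards)
print(totals["farmer"], totals["male"], totals["clerk"])  # 2 2 1
```

Because every card is read once and every hole updates its own counter, many questions can be tallied in a single pass through the cards.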
FIGURE 1.6 Herman Hollerith and his tabulator/sorter, circa 1890
Towards the turn of the century, immigrants kept coming and the U.S. population kept expanding. The Census Bureau, while using Hollerith’s equipment, continued experimenting on its own to produce even more advanced data-tabulating machinery. One of its engineers, James Powers, developed devices to automatically feed cards into the equipment and automatically print results. In 1911 he formed the Powers Tabulating Machine Company, which eventually formed the basis for the UNIVAC division of the Sperry Corporation, which eventually became the Unisys Corporation.
From the days of Hollerith and Powers through the 1940s, commercial data processing was performed on a variety of electromechanical punched-card-based devices. They included calculators, punches, sorters, collators, and printers. The data was stored in punched cards, while the processing instructions were implemented as collections of wires plugged into specially designed boards that in turn were inserted into slots in the electromechanical devices. Indeed, electromechanical equipment overlapped with electronic computers, which were introduced commercially in the mid-1950s.
In fact, the introduction of electronic computers in the mid-1950s coincided with a tremendous boom in economic development that raised the level of data storage and retrieval requirements another notch. This was a time of rapid commercial growth in the post-World War II U.S.A. as well as the rebuilding of Europe and the Far East. From this time onward, the furious pace of new data storage and retrieval requirements, as more and more commercial functions and procedures were automated, and of technological advances in computing devices has been one big blur. From this point on, it would be virtually impossible to tie advances in computing devices to specific, landmark data storage and retrieval needs. And there is no need to try to do so.
Modern Data Storage Media
Paralleling the growth of equipment to process data was the development of new media on which to store the data. The earliest form of modern data storage was punched paper tape, which was introduced in the 1870s and 1880s in conjunction with early teletype equipment. Of course we’ve already seen that Hollerith in the 1890s and Powers in the early 1900s used punched cards as a storage medium. In fact, punched cards were the only data storage medium used in the increasingly sophisticated electromechanical accounting machines of the 1920s, 1930s, and 1940s. They were still used extensively in the early computers of the 1950s and 1960s and could even be found well into the 1970s in smaller information systems installations, to a progressively reduced degree.
YOUR TURN
1.1 THE DEVELOPMENT OF DATA
The need to organize and store data has arisen many times and in many ways throughout history. In addition to the data-focused events presented in this chapter, what other historical events can you think of that have made people think about organizing and storing data? As a hint, you might think about the exploration and conquest of new lands, wars, changes in type of governments such as the introduction of democracy, and the implications of new inventions such as trains, printing presses, and electricity.
QUESTION: Develop a timeline showing several historical events that influenced the need to organize and store data. Include a few noted in this chapter as well as a few that you can think of independently.
The middle to late 1930s saw the beginning of the era of erasable magnetic storage media, with Bell Laboratories experimenting with magnetic tape for sound storage. By the late 1940s, there was early work on the use of magnetic tape for recording data. By 1950, several companies, including RCA and Raytheon, were developing the magnetic tape concept for commercial use. Both UNIVAC and Raytheon offered commercially available magnetic tape units in 1952, followed by IBM in 1953, Figure 1.7. During the mid-1950s and into the mid-1960s, magnetic tape gradually became the dominant data-storage medium in computers. Magnetic tape technology has been continually improved since then and is still in limited use today, particularly for archived data.
FIGURE 1.7 Early magnetic tape drive, circa 1953
The original concept that eventually grew into the magnetic disk actually began to be developed at MIT in the late 1930s and early 1940s. By the early 1950s, several companies including UNIVAC, IBM, and Control Data had developed prototypes of magnetic ‘‘drums’’ that were the forerunners of magnetic disk technology. In 1953, IBM began work on its 305 RAMAC (Random Access Method of Accounting and Control) fixed disk storage device. By 1954 there was a multi-platter version, which became commercially available in 1956, Figure 1.8.
During the mid-1960s a massive conversion from tape to magnetic disk as the preeminent data storage medium began, and disk storage is still the data storage medium of choice today. After the early fixed disks, the disk storage environment became geared towards the removable disk-pack philosophy, with a dozen or more packs being juggled on and off a single drive as a common ratio. But, with the increasingly tight environmental controls that fixed disks permitted, more data per square inch (or square centimeter) could be stored on fixed disk devices. Eventually, the disk drives on mainframes and servers, as well as the fixed disks or ‘‘hard drives’’ of PCs, all became non-removable, sealed units. But the removable disk concept stayed with us a while in the form of PC diskettes and the Iomega Corp.’s Zip Disks, and today in the form of so-called external hard drives that can be easily moved from one computer to another simply by plugging them into a USB port.
These have been joined by the laser-based, optical technology compact disk (CD), introduced as a data storage medium in 1985. Originally, data could be recorded on these CDs only at the factory and, once created, they were non-erasable. Now, data can be recorded on them, erased, and re-recorded in a standard PC. Finally, solid-state technology has become so miniaturized and inexpensive that a popular option for removable media today is the flash drive.
FIGURE 1.8 IBM RAMAC disk storage device, circa 1956
DATA IN TODAY’S INFORMATION SYSTEMS ENVIRONMENT
Using Data for Competitive Advantage
Today’s computers are technological marvels. Their speeds, compactness, ease of use, price as related to capability, and, yes, their data storage capacities are truly amazing. And yet, our fundamental interest in computers is the same as that of the ancient Middle-Eastern shepherds in their pebbles and sacks: they are the vehicles we need to store and utilize the data that is important to us in our environment.
Indeed, data has become indispensable in every kind of modern business and government organization. Data, the applications that process the data, and the computers on which the applications run are fundamental to every aspect of every kind of endeavor. When speaking of corporate resources, people used to list such items as capital, plant and equipment, inventory, personnel, and patents. Today, any such list of corporate resources must include the corporation’s data. It has even been suggested that data is the most important corporate resource because it describes all of the others.
Data can provide a crucial competitive advantage for a company. We routinely speak of data and the information derived from it as competitive weapons in hotly contested industries. For example, FedEx had a significant competitive advantage when it first provided access to its package tracking data on its Web site. Then, once one company in an industry develops a new application that takes advantage of its data, the other companies in the industry are forced to match it to remain competitive. This cycle continually moves the use of data to ever-higher levels, making it an ever more important corporate resource than before. Examples of this abound. Banks give their customers online access to their accounts. Package shipping companies provide up-to-the-minute information on the whereabouts of a package. Retailers send manufacturers product sales data that the manufacturers use to adjust inventories and production cycles. Manufacturers automatically send their parts suppliers inventory data and expect the suppliers to use the data to keep a steady stream of parts flowing.
Problems in Storing and Accessing Data
But being able to store and provide efficient access to a company’s data while also maintaining its accuracy so that it can be used to competitive advantage is anything but simple. In fact, several factors make it a major challenge. First and foremost, the volume or amount of data that companies have is massive and growing all the time. Walmart estimates that its data warehouse (a type of database we will explore later) alone contains hundreds of terabytes (trillions of characters) of data and is constantly growing. The number of people who want access to the data is also growing: at one time, only a select group of a company’s own employees were concerned with retrieving its data, but this has changed. Now, not only do vastly more of a company’s employees demand access to the company’s data, but so do the company’s customers and trading partners. All major banks today give their depositors Internet access to their accounts. Increasingly tightly linked ‘‘supply chains’’ require that companies provide other companies, such as their suppliers and customers, with access to their data.
The combination of huge volumes of data and large numbers of people demanding access to it has created a major performance challenge. How do you sift through so much data for so many people and give them the data that they want in an acceptably small amount of time? How much patience would you have with an insurance company that kept you on the phone for five or ten minutes while it retrieved claim data about which you had a question? Of course, the tremendous advances in computer hardware, including data storage hardware, have helped; indeed, it would have been impossible to have gone as far as we have in information systems without them. But as the hardware continues to improve, the volumes of data and the number of people who want access to it also increase, making it a continuing struggle to provide them with acceptable response times.
YOUR TURN
1.2 DATA AS A COMPETITIVE WEAPON
Think about a company with which you or your family regularly does business. This might be a supermarket, a department store, or a pharmacy, as examples. What kind of data do you think they collect about their suppliers, their inventory, their sales, and their customers? What kind of data do you think they should collect and how do you think they might be able to use it to gain a competitive advantage?
QUESTION: Choose one of the companies that you or your family does business with and develop a plan for the kinds of data it might collect and the ways in which it might use the data to gain a business advantage over its competitors.
Other factors that enter into data storage and retrieval include data security, data privacy, and backup and recovery. Data security involves a company protecting its data from theft, malicious destruction, deliberate attempts to make phony changes to the data (e.g., someone trying to increase his own bank account balance), and even accidental damage by the company’s own employees. Data privacy implies assuring that even employees who normally have access to the company’s data (much less outsiders) are given access only to the specific data they need in their work. Put another way, sensitive data such as employee salary data and personal customer data should be accessible only by employees whose job functions require it. Backup and recovery means the ability to reconstruct data if it is lost or corrupted, say in a hardware failure. The extreme case, in which an entire information system is destroyed by fire, a hurricane, or other calamity, is known as disaster recovery.
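The data privacy idea, giving each employee access only to the data the job requires, can be illustrated with a small Python sketch. The role names, field names, and salary figure below are hypothetical, and the approach is a simplified illustration rather than a description of how any particular system enforces privacy.

```python
# Hypothetical job roles and the fields each one is allowed to read.
ALLOWED_FIELDS = {
    "payroll_clerk": {"employee_number", "name", "salary"},
    "receptionist": {"employee_number", "name"},
}


def visible_record(record, role):
    """Return only the fields of a record that the given role may see."""
    allowed = ALLOWED_FIELDS.get(role, set())   # unknown roles see nothing
    return {field: value for field, value in record.items() if field in allowed}


# Assumed example record; the salary figure is invented for illustration.
employee = {"employee_number": 137, "name": "John Baker", "salary": 52000}
print(visible_record(employee, "receptionist"))
# {'employee_number': 137, 'name': 'John Baker'}
```

The same stored record yields different views for different job functions, so the sensitive salary field is never handed to a role that does not need it.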
Another whole dimension involves maintaining the accuracy of a company’s data. Historically, and in many cases even today, the same data is stored several, sometimes many, times within a company’s information system. Why does this happen? For several reasons. Many companies are simply not organized to share data among multiple applications. Every time a new application is written, new data files are created to store its data. As recently as the early 1990s, I spoke to a database administration manager (more on this type of position later) in the securities industry who told me that one of the reasons he was hired was to reduce duplicate data appearing in as many as 60–70 files! Furthermore, depending on how database files are designed, data can even be duplicated within a single file. We will explore this issue much more in this book, but for now, suffice it to say that duplicate data, either in multiple files or in a single file, can cause major data accuracy problems.
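A short Python sketch shows how duplicate data in multiple files can lead to an accuracy problem. The two application files, the city values, and the helper name `find_conflicts` are assumptions made for illustration.

```python
def find_conflicts(files):
    """Return the record keys whose copies disagree between any two files."""
    conflicts = set()
    names = list(files)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared_keys = files[first].keys() & files[second].keys()
            for key in shared_keys:
                if files[first][key] != files[second][key]:
                    conflicts.add(key)
    return sorted(conflicts)


# Two hypothetical applications each keep their own copy of employee 137.
payroll_file = {137: {"name": "John Baker", "city": "Chicago"}}
benefits_file = {137: {"name": "John Baker", "city": "Chicago"}}

# A change of address is recorded by only one of the applications.
payroll_file[137]["city"] = "Memphis"

print(find_conflicts({"payroll": payroll_file, "benefits": benefits_file}))  # [137]
```

Once the copies disagree, nothing in the data itself says which one is correct. Storing each fact only once avoids the problem.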
Data as a Corporate Resource
Every corporate resource must be carefully managed so that the company can keep track of it, protect it, and distribute it to those people and purposes in the company that need it. Furthermore, public companies have a responsibility to their shareholders to competently manage the company’s assets. Can you imagine a company’s money just sort of out there somewhere without being carefully managed? In fact, the chief financial officer with a staff of accountants and financial professionals is responsible for the money, with outside accounting firms providing independent audits of it. Typically vice presidents of personnel and their staffs are responsible for the administrative functions necessary to manage employee affairs. Production managers at various levels are responsible for parts inventories, and so on. Data is no exception.
But data may just be the most difficult corporate resource to manage. In data, we have a resource of tremendous volume, billions, trillions, and more individual pieces of data, each piece of which is different from the next. And it has the characteristic that much of it is in a state of change at any one time. It’s not as if we’re talking about managing a company’s employees. Even the largest companies have only a few hundred thousand of them, and they don’t change all that frequently. Or the money a company has: sure, there is a lot of it, but it’s all the same in the sense that a dollar that goes to payroll is the same kind of dollar that goes to paying a supplier for raw materials.
As far back as the early to mid-1960s, barely ten years after the introduction of commercially viable electronic computers, some forward-looking companies began to realize that storing each application’s data separately, in simple files, was becoming problematic and would not work in the long run, for just the reasons that we’ve talked about: the increasing volumes of data (even way back then), the increasing demand for data access, the need for data security, privacy, backup, and recovery, and the desire to share data and cut down on data redundancy. Several things were becoming clear. The task was going to require both a new kind of software to help manage the data and progressively faster hardware to keep up with the increasing volumes of data and data access demands. And data-management specialists would have to be developed, educated, and made responsible for managing the data as a corporate resource.
Out of this need was born a new kind of software, the database management system (DBMS), and a new category of personnel, with titles like database administrator and data management specialist. And yes, hardware has progressively gotten faster and cheaper for the performance it provides. The integration of these advances adds up to much more than the simple sum of their parts. They add up to the database environment.
The Database Environment
Back in the early 1960s, the emphasis in what was then called data processing was on programming. Data was little more than a necessary afterthought in the application development process and in running the data-processing installation. There was a good reason for this. By today’s standards, the rudimentary computers of the time had very small main memories and very simplistic operating systems. Even relatively basic application programs had to be shoehorned into main memory using low-level programming techniques and a lot of cleverness. But then, as we progressed further into the 1960s and beyond, two things happened simultaneously that made this picture change forever. One was that main memories became progressively larger and cheaper and operating systems became much more powerful. Plus, computers progressively became faster and cheaper on a price/performance basis. All these changes had the effect of permitting the use of higher-level programming languages that were easier for a larger number of personnel to use, allowing at least some of the emphasis to shift elsewhere. Well, nature hates a vacuum, and the second thing was that, at the same time that all of this was happening, companies started becoming aware of the value of thinking of data as a corporate resource and using it as a competitive weapon.
The result was the development of database management systems (DBMS) software and the creation of the ‘‘database environment.’’ Supported by ever-improved hardware and specialized database personnel, the database environment is designed largely to correct all the problems of the non-database environment. It encourages data sharing and the control of data redundancy with important improvements in data accuracy. It permits storage of vast volumes of data with acceptable access and response times for database queries. And it provides the tools to control data security, data privacy, and backup and recovery.
This book is a straightforward introduction to the fundamentals of database in the current information systems environment. It is designed to teach you the important concepts of the database approach and also to teach you specific skills, such as how to design relational databases, how to improve database performance, and how to retrieve data from relational databases using the SQL language. In addition, as you proceed through the book you will explore such topics as entity-relationship diagrams, object-oriented database, database administration, distributed database, data warehousing, Internet database issues, and others.
We start with the basics of database and take a step-by-step approach to exploring all the various components of the database environment. Each chapter progressively adds more to an understanding of both the technical and managerial aspects of the field. Database is a very powerful concept. Overall it provides ingenious solutions to a set of very difficult problems. As a result, it tends to be a multifaceted and complex subject that can appear difficult when one attempts to swallow it in one gulp. But database is approachable and understandable if we proceed carefully, cautiously, and progressively step by step. And this is an understanding that no one involved in information systems can afford to be without.
SUMMARY
Recognition of the commercial importance of data, of storing it, and of retrieving it can be traced back to ancient times. As trade routes lengthened and cities grew larger, data became increasingly important. Eventually, the importance of data led to the development of electromechanical calculating devices and then to modern electronic computers, complete with magnetic and optical disk-based data storage media.
While the use of data has given many companies a competitive advantage in their industries, the storage and retrieval of today’s vast amounts of data holds many challenges. These include speedy retrieval of data when many people try to access the data at the same time, maintaining the accuracy of the data, the issue of data security, and the ability to recover the data if it is lost.
The recognition that data is a critical corporate resource and that managing data is a complex task has led to the development and continuing refinement of specialized software known as database management systems, the subject of this book.
KEY TERMS
Balance sheet
Barter
Calculating devices
Census
Compact disk
Competitive advantage
Corporate resource
Data
Data storage
Database
Database environment
Database management system
Disk drive
Double-entry bookkeeping
Electromechanical equipment
Electronic computer
Flash drive
Information processing
Magnetic disk
Magnetic drum
Magnetic tape
Optical disk
Punched cards
Punched paper tape
Record keeping
Tally
Token
QUESTIONS
1. What did the Middle Eastern shepherds’ pebbles and sacks, Pascal’s calculating device, and Hollerith’s punched-card devices all have in common?
2. What did the growth of cities have to do with the need for data?
3. What did the growth of trade have to do with the need for data?
4. What did Jacquard’s textile weaving device have to do with the development of data?
5. Choose what you believe to be the: a. One most important b. Two most important c. Three most important landmark events in the