Page 215
section 6.1
Data, Information, and Databases
LEARNING OUTCOMES
6.1 Explain the four primary traits that determine the value of information.
6.2 Describe a database, a database management system, and the relational database model.
6.3 Identify the business advantages of a relational database.
6.4 Explain the business benefits of a data-driven website.
THE BUSINESS BENEFITS OF HIGH-QUALITY INFORMATION
LO 6.1: Explain the four primary traits that determine the value of information.
Information is powerful. Information can tell an organization how its current operations are performing and help it estimate and strategize about how future operations might perform. The ability to understand, digest, analyze, and filter information is key to growth and success for any professional in any industry. Remember that new perspectives and opportunities can open up when you have the right data that you can turn into information and ultimately business intelligence.
Information is everywhere in an organization. Managers in sales, marketing, human resources, and management need information to run their departments and make daily decisions. When addressing a significant business issue, employees must be able to obtain and analyze all the relevant information so they can make the best decision possible. Information comes at different levels, formats, and granularities. Information granularity refers to the extent of detail within the information (fine and detailed or coarse and abstract). Employees must be able to correlate the different levels, formats, and granularities of information when making decisions. For example, a company might be collecting information from various suppliers to make needed decisions, only to find that the information is in different levels, formats, and granularities. One supplier might send detailed information in a spreadsheet, whereas another supplier might send summary information in a Word document, and still another might send a collection of information from emails. Employees will need to compare these differing types of information for what they commonly reveal to make strategic decisions. Figure 6.4 displays the various levels, formats, and granularities of organizational information.
Successfully collecting, compiling, sorting, and finally analyzing information from multiple levels, in varied formats, and exhibiting different granularities can provide tremendous insight into how an organization is performing. Exciting and unexpected results can include potential new markets, new ways of reaching customers, and even new methods of doing business. After understanding the different levels, formats, and granularities of information, managers next want to look at the four primary traits that help determine the value of information (see Figure 6.5).
Information Type: Transactional and Analytical
As discussed previously in the text, the two primary types of information are transactional and analytical. Transactional information encompasses all of the information contained within a single business process or unit of work, and its primary purpose is to support daily operational tasks. Organizations need to capture and store transactional information to perform operational tasks and repetitive decisions such as analyzing daily sales reports and production schedules to determine how much inventory to carry. Consider Walmart, which handles more than 1 million customer transactions every hour, and Facebook, which keeps track of 400 million active users (along with their photos, friends, and web links). In addition, every time a cash register rings up a sale, a deposit or withdrawal is made from an ATM, or a receipt is given at the gas pump, the transactional information must be captured and stored.
FIGURE 6.4
Levels, Formats, and Granularities of Organizational Information
Analytical information encompasses all organizational information, and its primary purpose is to support the performance of managerial analysis tasks. Analytical information is useful when making important decisions such as whether the organization should build a new manufacturing plant or hire additional sales personnel. Analytical information makes it possible to do many things that previously were difficult to accomplish, such as spot business trends, prevent diseases, and fight crime. For example, credit card companies crunch through billions of transactional purchase records to identify fraudulent activity. Indicators such as charges in a foreign country or consecutive purchases of gasoline send a red flag highlighting potential fraudulent activity.
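The two fraud indicators mentioned above can be sketched as a simple scan over transactional records. This is only an illustration: the `Purchase` fields, the `HOME_COUNTRY` value, and the sample rows are assumptions made for the example, not a description of how any credit card company actually detects fraud.

```python
from dataclasses import dataclass

HOME_COUNTRY = "US"  # assumption: every cardholder in this sketch lives here


@dataclass
class Purchase:
    card: str
    country: str
    category: str


def flag_suspicious(purchases):
    """Return the positions of purchases that match either indicator."""
    flagged = []
    previous_by_card = {}  # last purchase seen for each card
    for position, purchase in enumerate(purchases):
        previous = previous_by_card.get(purchase.card)
        foreign_charge = purchase.country != HOME_COUNTRY
        consecutive_gasoline = (
            previous is not None
            and previous.category == "gasoline"
            and purchase.category == "gasoline"
        )
        if foreign_charge or consecutive_gasoline:
            flagged.append(position)
        previous_by_card[purchase.card] = purchase
    return flagged


# Hypothetical transactional records
purchases = [
    Purchase("1111", "US", "grocery"),
    Purchase("1111", "US", "gasoline"),
    Purchase("1111", "US", "gasoline"),  # second gasoline purchase in a row
    Purchase("2222", "FR", "dining"),    # charge in a foreign country
]
suspicious = flag_suspicious(purchases)
print(suspicious)
```

Each row on its own is transactional information; scanning many rows for a pattern is what turns it into analytical information.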
Walmart was able to use its massive amount of analytical information to identify many unusual trends, such as a correlation between storms and Pop-Tarts. Yes, Walmart discovered an increase in the demand for Pop-Tarts during the storm season. Armed with that valuable information, the retail chain was able to stock up on Pop-Tarts that were ready for purchase when customers arrived. Figure 6.6 displays different types of transactional and analytical information.
FIGURE 6.5
The Four Primary Traits of the Value of Information
Information Timeliness
Timeliness is an aspect of information that depends on the situation. In some firms or industries, information that is a few days or weeks old can be relevant, whereas in others information that is a few minutes old can be almost worthless. Some organizations, such as 911 response centers, stock traders, and banks, require up-to-the-second information. Other organizations, such as insurance and construction companies, require only daily or even weekly information.
Real-time information means immediate, up-to-date information. Real-time systems provide real-time information in response to requests. Many organizations use real-time systems to uncover key corporate transactional information. The growing demand for real-time information stems from organizations’ need to make faster and more effective decisions, keep smaller inventories, operate more efficiently, and track performance more carefully. Information also needs to be timely in the sense that it meets employees’ needs, but no more. If employees can absorb information only on an hourly or daily basis, there is no need to gather real-time information in smaller increments.
Most people request real-time information without understanding one of the biggest pitfalls associated with real-time information—continual change. Imagine the following scenario: Three managers meet at the end of the day to discuss a business problem. Each manager has gathered information at different times during the day to create a picture of the situation. Each manager’s picture may be different because of the time differences. Their views on the business problem may not match because the information they are basing their analysis on is continually changing. This approach may not speed up decision making, and it may actually slow it down. Business decision makers must evaluate the timeliness of the information for every decision. Organizations do not want to find themselves using real-time information to make a bad decision faster.
Information Quality
Business decisions are only as good as the quality of the information used to make them. Information inconsistency occurs when the same data element has different values. Consider, for example, the work required to update the records of a customer who has changed her last name after marriage. Changing this information in only a few organizational systems will lead to data inconsistencies, causing customer 123456 to be associated with two last names. Information integrity issues occur when a system produces incorrect, inconsistent, or duplicate data. Data integrity issues can cause managers to consider the system reports invalid and to make decisions based on other sources.
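A minimal sketch of how such an inconsistency might be detected follows. The three systems, their contents, and the last names are hypothetical; only customer number 123456 comes from the example above.

```python
from collections import defaultdict

# Hypothetical copies of the same data element (last name) in three systems.
# The name was updated in shipping but not in billing or marketing.
billing = {123456: "Jones"}
shipping = {123456: "Smith"}
marketing = {123456: "Jones", 987654: "Lee"}


def find_inconsistent(systems):
    """Return customers whose last name differs between systems."""
    names_by_customer = defaultdict(set)
    for records in systems.values():
        for customer_id, last_name in records.items():
            names_by_customer[customer_id].add(last_name)
    return {
        customer_id: sorted(names)
        for customer_id, names in names_by_customer.items()
        if len(names) > 1
    }


conflicts = find_inconsistent(
    {"billing": billing, "shipping": shipping, "marketing": marketing}
)
print(conflicts)
```

Storing the last name in only one place would remove the need for this kind of check.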
FIGURE 6.6
Transactional versus Analytical Information
FIGURE 6.7
Five Common Characteristics of High-Quality Information
To ensure that your systems do not suffer from data integrity issues, review Figure 6.7 for the five characteristics common to high-quality information: accuracy, completeness, consistency, timeliness, and uniqueness. Figure 6.8 provides an example of several problems associated with using low-quality information, including:
1. Completeness. The customer’s first name is missing.
2. Another issue with completeness. The street address contains only a number and not a street name.
3. Consistency. There may be a duplication of information since there is a slight difference between the two customers in the spelling of the last name. Similar street addresses and phone numbers make this likely.
FIGURE 6.8
Example of Low-Quality Information
APPLY YOUR KNOWLEDGE
BUSINESS DRIVEN MIS
Determining Information Quality Issues
Real People magazine is geared toward working individuals and provides articles and advice on everything from car maintenance to family planning. The magazine is currently experiencing problems with its distribution list. More than 30 percent of the magazines mailed are returned because of incorrect address information, and each month it receives numerous calls from angry customers complaining that they have not yet received their magazines. Below is a sample of Real People’s customer information. Create a report detailing all the issues with the information, potential causes of the information issues, and solutions the company can follow to correct the situation.
4. Accuracy. This may be inaccurate information because the customer’s phone and fax numbers are the same. Some customers might have the same number for phone and fax, but the fact that the customer also has this number in the email address field is suspicious.
5. Another issue with accuracy. There is inaccurate information because a phone number is located in the email address field.
6. Another issue with completeness. The information is incomplete because there is not a valid area code for the phone and fax numbers.
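Several of the problems listed above can be caught by simple automated checks. The sketch below is illustrative only: the field names, the phone format (NNN-NNN-NNNN), and the sample records are assumptions and do not reproduce the data in Figure 6.8.

```python
import re


def quality_issues(record):
    """Return a list of quality problems found in one customer record."""
    issues = []
    if not record.get("first_name"):
        issues.append("completeness: first name is missing")
    # A street value made up only of digits has a number but no street name.
    if re.fullmatch(r"\d+", record.get("street", "").strip()):
        issues.append("completeness: street has a number but no name")
    if "@" not in record.get("email", ""):
        issues.append("accuracy: email field does not hold an email address")
    # Assumed format: three-digit area code, then a seven-digit number.
    if not re.fullmatch(r"\d{3}-\d{3}-\d{4}", record.get("phone", "")):
        issues.append("completeness: phone lacks a valid area code")
    return issues


# Hypothetical records
good = {"first_name": "Ana", "street": "12 Elm St",
        "email": "ana@example.com", "phone": "303-555-0100"}
bad = {"first_name": "", "street": "4511",
       "email": "555-0100", "phone": "555-0100"}

for problem in quality_issues(bad):
    print(problem)
```

Checks like these find obvious gaps, but judgment calls such as whether two similar records describe the same customer still need human review.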
Nestlé uses 550,000 suppliers to sell more than 100,000 products in 200 countries. However, due to poor information, the company was unable to evaluate its business effectively. After some analysis, it found that it had 9 million records of vendors, customers, and materials, half of which were duplicated, obsolete, inaccurate, or incomplete. The analysis discovered that some records abbreviated vendor names, and other records spelled out the vendor names. This created multiple accounts for the same customer, making it impossible to determine the true value of Nestlé’s customers. Without being able to identify customer profitability, a company runs the risk of alienating its best customers. 2
Knowing how low-quality information issues typically occur can help a company correct them. Addressing these errors will significantly improve the quality of company information and the value to be extracted from it. The four primary reasons for low-quality information are:
1. Online customers intentionally enter inaccurate information to protect their privacy.
2. Different systems have different information entry standards and formats.
3. Data-entry personnel enter abbreviated information to save time or erroneous information by accident.
4. Third-party and external information contains inconsistencies, inaccuracies, and errors.
Understanding the Costs of Using Low-Quality Information
Using the wrong information can lead managers to make erroneous decisions. Erroneous decisions in turn can cost time, money, reputations, and even jobs. Some of the serious business consequences that occur due to using low-quality information to make decisions are:
Inability to track customers accurately.
Difficulty identifying the organization’s most valuable customers.
Inability to identify selling opportunities.
Lost revenue opportunities from marketing to nonexistent customers.
The cost of sending undeliverable mail.
Difficulty tracking revenue because of inaccurate invoices.
Inability to build strong relationships with customers.
Understanding the Benefits of Using High-Quality Information
High-quality information can significantly improve the chances of making a good decision and directly increase an organization’s bottom line. One company discovered that even with its large number of golf courses, Phoenix, Arizona, is not a good place to sell golf clubs. An analysis revealed that typical golfers in Phoenix are tourists and conventioneers who usually bring their clubs with them. The analysis further revealed that two of the best places to sell golf clubs in the United States are Rochester, New York, and Detroit, Michigan. Equipped with this valuable information, the company was able to place its stores strategically and launch its marketing campaigns.
High-quality information does not automatically guarantee that every decision made is going to be a good one, because people ultimately make decisions and no one is perfect. However, such information ensures that the basis of the decisions is accurate. The success of the organization depends on appreciating and leveraging the true value of timely and high-quality information.
Information Governance
Information is a vital resource, and users need to be educated on what they can and cannot do with it. To ensure that a firm manages its information correctly, it will need special policies and procedures establishing rules on how the information is organized, updated, maintained, and accessed. Every firm, large and small, should create an information policy concerning data governance. Data governance refers to the overall management of the availability, usability, integrity, and security of company data. Master data management (MDM) is the practice of gathering data and ensuring that it is uniform, accurate, consistent, and complete, including such entities as customers, suppliers, products, sales, employees, and other critical entities that are commonly integrated across organizational systems. MDM is commonly included in data governance. A company that supports a data governance program has defined a policy that specifies who is accountable for various portions or aspects of the data, including its accuracy, accessibility, consistency, timeliness, and completeness. The policy should clearly define the processes concerning how to store, archive, back up, and secure the data. In addition, the company should create a set of procedures identifying accessibility levels for employees. Then, the firm should deploy controls and procedures that enforce government regulations and compliance with mandates such as Sarbanes-Oxley.
STORING INFORMATION USING A RELATIONAL DATABASE MANAGEMENT SYSTEM
LO 6.2: Describe a database, a database management system, and the relational database model.
The core component of any system, regardless of size, is a database and a database management system. Broadly defined, a database maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses). A database management system (DBMS) creates, reads, updates, and deletes data in a database while controlling access and security. Managers send requests to the DBMS, and the DBMS performs the actual manipulation of the data in the database. Companies store their information in databases, and managers access these systems to answer operational questions such as how many customers purchased Product A in December or what the average sales were by region. Two primary tools are available for retrieving information from a DBMS. First is a query-by-example (QBE) tool that helps users graphically design the answer to a question against a database. Second is a structured query language (SQL) that asks users to write lines of code to answer questions against a database. Managers typically interact with QBE tools, and MIS professionals have the skills required to code SQL. Figure 6.9 displays the relationship between a database, a DBMS, and a user. Some of the more popular examples of DBMS include MySQL, Microsoft Access, SQL Server, FileMaker, Oracle, and FoxPro.
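To make the SQL route concrete, the sketch below answers the question of how many customers purchased Product A in December. It uses Python's built-in sqlite3 module as a stand-in DBMS; the `orders` table, its columns, and the sample rows are assumptions invented for the illustration.

```python
import sqlite3

# An in-memory database stands in for the company database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer TEXT, product TEXT, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Ana", "Product A", "2023-12-02"),
        ("Ana", "Product A", "2023-12-15"),  # same customer, counted once
        ("Ben", "Product A", "2023-12-20"),
        ("Cy", "Product B", "2023-12-21"),   # different product
        ("Dee", "Product A", "2023-11-30"),  # different month
    ],
)

# The request goes to the DBMS as SQL; the DBMS does the data manipulation.
(count,) = conn.execute(
    "SELECT COUNT(DISTINCT customer) FROM orders "
    "WHERE product = 'Product A' AND strftime('%m', order_date) = '12'"
).fetchone()
print(count)
```

A QBE tool would let a manager build the same request by selecting the table, fields, and criteria graphically instead of writing the SELECT statement.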
APPLY YOUR KNOWLEDGE
BUSINESS DRIVEN DEBATE
Excel or Access?
Excel is a great tool with which to perform business analytics. Your friend, John Cross, owns a successful publishing company specializing in Do It Yourself books. John started the business 10 years ago and has slowly grown it to 50 employees and $1 million in sales. John has been using Excel to run the majority of his business, tracking book orders, production orders, shipping orders, and billing. John even uses Excel to track employee payroll and vacation dates. To date, Excel has done the job, but as the company continues to grow, the tool is becoming inadequate.
You believe John could benefit from moving from Excel to Access. John is skeptical of the change because Excel has done the job up to now, and his employees are comfortable with the current processes and technology. John has asked you to prepare a presentation explaining the limitations of Excel and the benefits of Access. In a group, prepare the presentation that will help convince John to make the switch.
A data element (or data field) is the smallest or basic unit of information. Data elements can include a customer’s name, address, email, discount rate, preferred shipping method, product name, quantity ordered, and so on. Data models are logical data structures that detail the relationships among data elements by using graphics or pictures.
Metadata provides details about data. For example, metadata for an image could include its size, resolution, and date created. Metadata about a text document could contain document length, date created, author’s name, and summary. Each data element is given a description, such as Customer Name; metadata is provided for the type of data (text, numeric, alphanumeric, date, image, binary value) and descriptions of potential predefined values such as a certain area code; and finally the relationship is defined. A data dictionary compiles all of the metadata about the data elements in the data model. Looking at a data model along with reviewing the data dictionary provides tremendous insight into the database’s functions, purpose, and business rules.
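One way to picture a data dictionary is as a lookup structure that holds the metadata for each data element. The sketch below is a simplified illustration; the element names, the permitted area codes, and the `relationship` wording are hypothetical.

```python
# Hypothetical data dictionary: one metadata entry per data element.
data_dictionary = {
    "CustomerName": {
        "description": "Customer Name",
        "type": "text",
        "allowed_values": None,  # no predefined list of values
        "relationship": "belongs to CUSTOMER",
    },
    "AreaCode": {
        "description": "Telephone area code",
        "type": "numeric",
        "allowed_values": {303, 720},  # assumed predefined values
        "relationship": "belongs to CUSTOMER",
    },
}


def value_is_allowed(element, value):
    """Check a value against the predefined values recorded for an element."""
    allowed = data_dictionary[element]["allowed_values"]
    return allowed is None or value in allowed


print(data_dictionary["AreaCode"]["type"])
print(value_is_allowed("AreaCode", 999))
```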
DBMS use three primary data models for organizing information—hierarchical, network, and the relational database, the most prevalent. A relational database model stores information in the form of logically related two-dimensional tables. A relational database management system allows users to create, read, update, and delete data in a relational database. Although the hierarchical and network models are important, this text focuses only on the relational database model.
FIGURE 6.9
Relationship of Database, DBMS, and User
Storing Data Elements in Entities and Attributes
For flexibility in supporting business operations, managers need to query or search for the answers to business questions such as which artist sold the most albums during a certain month. The relationships in the relational database model help managers extract this information. Figure 6.10 illustrates the primary concepts of the relational database model—entities, attributes, keys, and relationships. An entity (also referred to as a table) stores information about a person, place, thing, transaction, or event. The entities, or tables, of interest in Figure 6.10 are TRACKS, RECORDINGS, MUSICIANS, and CATEGORIES. Notice that each entity is stored in a different two-dimensional table (with rows and columns).
Attributes (also called columns or fields) are the data elements associated with an entity. In Figure 6.10, the attributes for the entity TRACKS are TrackNumber, TrackTitle, TrackLength, and RecordingID. Attributes for the entity MUSICIANS are MusicianID, MusicianName, MusicianPhoto, and MusicianNotes. A record is a collection of related data elements (in the MUSICIANS table, these include “3, Lady Gaga, gag.tiff, Do not bring young kids to live shows”). Each record in an entity occupies one row in its respective table.
Creating Relationships Through Keys
To manage and organize various entities within the relational database model, you use primary keys and foreign keys to create logical relationships. A primary key is a field (or group of fields) that uniquely identifies a given record in a table. In the table RECORDINGS, the primary key is the field RecordingID that uniquely identifies each record in the table. Primary keys are a critical piece of a relational database because they provide a way of distinguishing each record in a table; for instance, imagine you need to find information on a customer named Steve Smith. Simply searching the customer name would not be an ideal way to find the information because there might be 20 customers with the name Steve Smith. This is the reason the relational database model uses primary keys to identify each record uniquely. Using Steve Smith’s unique ID allows a manager to search the database to identify all information associated with this customer.
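The Steve Smith scenario can be sketched with Python's built-in sqlite3 module. The `customers` table, its columns, and the cities are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# customer_id is the primary key, so every record must have a unique value.
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Steve Smith", "Denver"), (2, "Steve Smith", "Boston")],
)

# Searching by name is ambiguous: it returns every Steve Smith.
by_name = conn.execute(
    "SELECT * FROM customers WHERE name = 'Steve Smith'"
).fetchall()

# Searching by primary key returns exactly one record.
by_key = conn.execute(
    "SELECT * FROM customers WHERE customer_id = 2"
).fetchall()

# The DBMS refuses a second record with an existing primary key value.
try:
    conn.execute("INSERT INTO customers VALUES (2, 'Sara Jones', 'Austin')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(len(by_name), by_key, duplicate_rejected)
```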
FIGURE 6.10
Primary Concepts of the Relational Database Model
APPLY YOUR KNOWLEDGE
BUSINESS DRIVEN START-UP
2 Trillion Rows of Data Analyzed Daily—No Problem
eBay is the world’s largest online marketplace, with 97 million global users selling anything to anyone at a yearly total of $62 billion—more than $2,000 every second. Of course with this many sales, eBay is collecting the equivalent of the Library of Congress worth of data every three days that must be analyzed to run the business successfully. Luckily, eBay discovered Tableau!
Tableau started at Stanford when Chris Stolte, a computer scientist; Pat Hanrahan, an Academy Award–winning professor; and Christian Chabot, a savvy business leader, decided to solve the problem of helping ordinary people understand big data. The three created Tableau, which bridged two computer science disciplines: computer graphics and databases. No more need to write code or understand the relational database keys and categories; users simply drag and drop pictures of what they want to analyze. Tableau has become one of the most successful data visualization tools on the market, winning multiple awards, international expansion, and millions in revenue and spawning multiple new inventions. 3
Tableau is revolutionizing business analytics, and this is only the beginning. Visit the Tableau website and become familiar with the tool by watching a few of the demos. Once you have a good understanding of the tool, create three questions eBay might be using Tableau to answer, including the analysis of its sales data to find patterns, business insights, and trends.
A foreign key is a primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables. For instance, Black Eyed Peas in Figure 6.10 is one of the musicians appearing in the MUSICIANS table. Its primary key, MusicianID, is “2.” Notice that MusicianID also appears as an attribute in the RECORDINGS table. By matching these attributes, you create a relationship between the MUSICIANS and RECORDINGS tables that states the Black Eyed Peas (MusicianID 2) have several recordings, including The E.N.D., Monkey Business, and Elepunk. In essence, MusicianID in the RECORDINGS table creates a logical relationship (who was the musician that made the recording) to the MUSICIANS table. Creating the logical relationship between the tables allows managers to search the data and turn it into useful information.
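The matching of MusicianID between the two tables can be sketched as a join, again using sqlite3. The table names, MusicianID values, and the three Black Eyed Peas titles come from the text; the RecordingID values, the RecordingTitle column name, and the placeholder row for MusicianID 3 are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MUSICIANS (MusicianID INTEGER PRIMARY KEY, MusicianName TEXT)"
)
# MusicianID in RECORDINGS is a foreign key pointing at MUSICIANS.
conn.execute(
    "CREATE TABLE RECORDINGS ("
    "RecordingID INTEGER PRIMARY KEY, RecordingTitle TEXT, "
    "MusicianID INTEGER REFERENCES MUSICIANS(MusicianID))"
)
conn.executemany(
    "INSERT INTO MUSICIANS VALUES (?, ?)",
    [(2, "Black Eyed Peas"), (3, "Lady Gaga")],
)
conn.executemany(
    "INSERT INTO RECORDINGS VALUES (?, ?, ?)",
    [
        (1, "The E.N.D.", 2),
        (2, "Monkey Business", 2),
        (3, "Elepunk", 2),
        (4, "Placeholder Recording", 3),  # hypothetical row for contrast
    ],
)

# Matching the key values answers "which recordings did this musician make?"
rows = conn.execute(
    "SELECT r.RecordingTitle FROM RECORDINGS r "
    "JOIN MUSICIANS m ON r.MusicianID = m.MusicianID "
    "WHERE m.MusicianName = 'Black Eyed Peas' "
    "ORDER BY r.RecordingID"
).fetchall()
titles = [title for (title,) in rows]
print(titles)
```

The musician's name is stored once, in MUSICIANS; the RECORDINGS table carries only the key value that points to it.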
Coca-Cola Relational Database Example
Figure 6.11 illustrates the primary concepts of the relational database model for a sample order of soda from Coca-Cola. Figure 6.11 offers an excellent example of how data is stored in a database. For example, the order number is stored in the ORDER table, and each line item is stored in the ORDER LINE table. Entities include CUSTOMER, ORDER, ORDER LINE, PRODUCT, and DISTRIBUTOR. Attributes for CUSTOMER include Customer ID, Customer Name, Contact Name, and Phone. Attributes for PRODUCT include Product ID, Description, and Price. The columns in the table contain the attributes.
Consider Hawkins Shipping, one of the distributors appearing in the DISTRIBUTOR table. Its primary key, Distributor ID, is DEN8001. Distributor ID also appears as an attribute in the ORDER table. This establishes that Hawkins Shipping (Distributor ID DEN8001) was responsible for delivering orders 34561 and 34562 to the appropriate customer(s). Therefore, Distributor ID in the ORDER table creates a logical relationship (who shipped what order) between ORDER and DISTRIBUTOR.
FIGURE 6.11
Potential Relational Database for Coca-Cola Bottling Company of Egypt (TCCBCE)
USING A RELATIONAL DATABASE FOR BUSINESS ADVANTAGES
LO 6.3: Identify the business advantages of a relational database.
Many business managers are familiar with Excel and other spreadsheet programs they can use to store business data. Although spreadsheets are excellent for supporting some data analysis, they offer limited functionality in terms of security, accessibility, and flexibility and can rarely scale to support business growth. From a business perspective, relational databases offer many advantages over using a text document or a spreadsheet, as displayed in Figure 6.12.
Increased Flexibility
Databases tend to mirror business structures, and a database needs to handle changes quickly and easily, just as any business needs to be able to do. Equally important, databases need to provide flexibility in allowing each user to access the information in whatever way best suits his or her needs. The distinction between logical and physical views is important in understanding flexible database user views. The physical view of information deals with the physical storage of information on a storage device. The logical view of information focuses on how individual users logically access information to meet their own particular business needs.
In the database illustration from Figure 6.10, for example, one user could perform a query to determine which recordings had a track length of four minutes or more. At the same time, another user could perform an analysis to determine the distribution of recordings as they relate to the different categories. For example, are there more R&B recordings than rock, or are they evenly distributed? This example demonstrates that although a database has only one physical view, it can easily support multiple logical views that provide for flexibility.
Consider another example—a mail-order business. One user might want a report presented in alphabetical format, in which case, the last name should appear before first name. Another user, working with a catalog mailing system, would want customer names appearing as first name and then last name. Both are easily achievable but different logical views of the same physical information.
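The mail-order example can be sketched with SQL views, which are one common way a DBMS presents different logical views of the same stored data. The table, view names, and customer names below are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One physical table holds the customer names.
conn.execute("CREATE TABLE customers (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Maria", "Lopez"), ("Al", "Chen")],
)

# Logical view 1: last name first, for an alphabetical report.
conn.execute(
    "CREATE VIEW report_view AS "
    "SELECT last_name || ', ' || first_name AS display_name FROM customers"
)
# Logical view 2: first name then last name, for catalog mailing labels.
conn.execute(
    "CREATE VIEW mailing_view AS "
    "SELECT first_name || ' ' || last_name AS display_name FROM customers"
)

report = [name for (name,) in conn.execute(
    "SELECT display_name FROM report_view ORDER BY display_name")]
mailing = [name for (name,) in conn.execute(
    "SELECT display_name FROM mailing_view ORDER BY display_name")]
print(report)
print(mailing)
```

Neither view stores its own copy of the names, so a correction made in the table shows up in both.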
Increased Scalability and Performance
In its first year of operation, the official website of the American Family Immigration History Center, www.ellisisland.org, generated more than 2.5 billion hits. The site offers immigration information about people who entered America through the Port of New York and Ellis Island between 1892 and 1924. The database contains more than 25 million passenger names that are correlated to 3.5 million images of ships’ manifests. 4
The database had to be scalable to handle the massive volumes of information and the large numbers of users expected for the launch of the website. In addition, the database needed to perform quickly under heavy use. Some organizations must be able to support hundreds or thousands of users, including employees, partners, customers, and suppliers, who all want to access and share the same information. Databases today scale to exceptional levels, allowing all types of users and programs to perform information-processing and information-searching tasks.
FIGURE 6.12
Business Advantages of a Relational Database
Reduced Information Redundancy
Information redundancy is the duplication of data, or the storage of the same data in multiple places. Redundant data can cause storage issues along with data integrity issues, making it difficult to determine which values are the most current or most accurate. Employees become confused and frustrated when faced with incorrect information causing disruptions to business processes and procedures. One primary goal of a database is to eliminate information redundancy by recording each piece of information in only one place in the database. This saves disk space, makes performing information updates easier, and improves information quality.
Increased Information Integrity (Quality)
Information integrity is a measure of the quality of information. Integrity constraints are rules that help ensure the quality of information. The database design needs to consider integrity constraints. The database and the DBMS ensure that users can never violate these constraints. There are two types of integrity constraints: (1) relational and (2) business critical.
Relational integrity constraints are rules that enforce basic and fundamental information-based constraints. For example, a relational integrity constraint would not allow someone to create an order for a nonexistent customer, provide a markup percentage that was negative, or order zero pounds of raw materials from a supplier. A business rule defines how a company performs certain aspects of its business and typically results in either a yes/no or true/false answer. Stating that merchandise returns are allowed within 10 days of purchase is an example of a business rule. Business-critical integrity constraints enforce business rules vital to an organization’s success and often require more insight and knowledge than relational integrity constraints. Consider a supplier of fresh produce to large grocery chains such as Kroger. The supplier might implement a business-critical integrity constraint stating that no product returns are accepted after 15 days past delivery. That would make sense because of the chance of spoilage of the produce. Business-critical integrity constraints tend to mirror the very rules by which an organization achieves success.
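The sketch below shows how the three relational examples above might be declared so that the DBMS itself rejects bad data, followed by the produce supplier's business rule written as a small function. The table layout, column names, and sample values are assumptions; sqlite3 is used only as a convenient stand-in DBMS.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(customer_id), "
    "markup_pct REAL CHECK (markup_pct >= 0), "  # no negative markup
    "pounds REAL CHECK (pounds > 0))"            # no zero-pound orders
)
conn.execute("INSERT INTO customers VALUES (1)")


def try_insert(row):
    """Return True if the DBMS accepts the order, False if a constraint blocks it."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "valid order": try_insert((100, 1, 12.5, 40)),
    "nonexistent customer": try_insert((101, 99, 12.5, 40)),
    "negative markup": try_insert((102, 1, -5, 40)),
    "zero pounds": try_insert((103, 1, 12.5, 0)),
}


def return_accepted(delivered, requested, limit_days=15):
    """Business rule: no returns after 15 days past delivery."""
    return (requested - delivered).days <= limit_days


print(results)
print(return_accepted(date(2024, 3, 1), date(2024, 3, 17)))
```

The relational constraints live in the table definition, whereas the 15-day rule encodes knowledge about the business and would change if the supplier changed its return policy.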
The specification and enforcement of integrity constraints produce higher-quality information that will provide better support for business decisions. Organizations that establish specific procedures for developing integrity constraints typically see an increase in accuracy that then increases the use of organizational information by business professionals.
Increased Information Security
Managers must protect information, like any asset, from unauthorized users or misuse. As systems become increasingly complex and highly available over the Internet on many devices, security becomes an even bigger issue. Databases offer many security features, including passwords to provide authentication, access levels to determine who can access the data, and access controls to determine what type of access they have to the information.
For example, customer service representatives might need read-only access to customer order information so they can answer customer order inquiries; they might not have or need the authority to change or delete order information. Managers might require access to employee files, but they should have access only to their own employees’ files, not the employee files for the entire company. Various security features of databases can ensure that individuals have only certain types of access to certain types of information.
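The access rules in this example can be sketched as a small permission table plus a filter on employee files. The role names, table names, and records are hypothetical, and real DBMS products configure access levels and access controls through their own security features rather than application code like this.

```python
# Hypothetical access controls: which actions each role may perform on each table.
ACCESS = {
    "customer_service": {"orders": {"read"}},
    "order_manager": {"orders": {"read", "update", "delete"}},
}


def is_allowed(role, action, table):
    """Return True if the role may perform the action on the table."""
    return action in ACCESS.get(role, {}).get(table, set())


def files_visible_to(manager_id, employee_files):
    """Managers see only the files of their own employees."""
    return [f for f in employee_files if f["manager_id"] == manager_id]


# Hypothetical employee files
employee_files = [
    {"employee": "E1", "manager_id": "M1"},
    {"employee": "E2", "manager_id": "M2"},
]

print(is_allowed("customer_service", "read", "orders"))
print(is_allowed("customer_service", "delete", "orders"))
print(files_visible_to("M1", employee_files))
```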
Security risks are increasing as more and more databases and DBMSs move to data centers run in the cloud. The biggest challenges when using cloud computing are ensuring the security and privacy of the information in the database. Implementing data governance policies and procedures that outline the data management requirements can help ensure safe and secure cloud computing.
APPLY YOUR KNOWLEDGE
BUSINESS DRIVEN ETHICS AND SECURITY
Unethical Data Mining
Mining large amounts of data can create a number of benefits for business, society, and governments, but it can also create a number of ethical questions surrounding an invasion of privacy or misuse of information. Facebook recently came under fire for its data mining practices as it followed 700,000 accounts to determine whether posts with highly emotional content are more contagious. The study concluded that highly emotional texts are contagious, just as with real people. Highly emotional positive posts received multiple positive replies whereas highly emotional negative posts received multiple negative replies. Although the study seems rather innocent, many Facebook users were outraged; they felt the study was an invasion of privacy because the 700,000 accounts had no idea Facebook was mining their posts. As a Facebook user, you willingly consent that Facebook owns every bit and byte of data you post and, once you press submit, Facebook can do whatever it wants with your data. Do you agree or disagree that Facebook has the right to do whatever it wants with the data its 1.5 billion users post on its site? 5
DRIVING WEBSITES WITH DATA
LO 6.4: Explain the business benefits of a data-driven website.
A content creator is the person responsible for creating the original website content. A content editor is the person responsible for updating and maintaining website content. Static information includes fixed data incapable of change in the event of a user action. Dynamic information includes data that change based on user actions. For example, a static website supplies only information that will not change until the content editor changes it, whereas a dynamic website changes information based on user requests such as movie ticket availability, airline prices, or restaurant reservations. Dynamic website information is stored in a dynamic catalog, or an area of a website that stores information about products in a database.
Websites change for site visitors depending on the type of information they request. Consider, for example, an automobile dealer. The dealer would create a database containing data elements for each car it has available for sale, including make, model, color, year, miles per gallon, a photograph, and so on. Website visitors might click Porsche and then enter their specific requests such as price range or year made. Once the user hits Go, the website automatically provides a custom view of the requested information. The dealer must create, update, and delete automobile information as the inventory changes.
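The automobile dealer example can be sketched as a database query that runs when the visitor hits Go. This is a simplified illustration only: the car table, its columns, the sample inventory, and the search_cars function are assumptions made for the sketch, and a real site would add a web layer in front of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE car (make TEXT, model TEXT, color TEXT, year INTEGER, mpg INTEGER, price REAL)"
)
# Hypothetical inventory rows; the dealer would create, update, and delete these over time.
conn.executemany(
    "INSERT INTO car VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Porsche", "911", "Red", 2015, 22, 84000.0),
        ("Porsche", "Cayenne", "Black", 2012, 19, 41000.0),
        ("Porsche", "Boxster", "Silver", 2016, 24, 52000.0),
        ("Honda", "Civic", "Blue", 2016, 35, 18000.0),
    ],
)


def search_cars(make, min_price, max_price, min_year):
    """Return the custom view a visitor requested: matching cars, cheapest first."""
    rows = conn.execute(
        "SELECT model, year, price FROM car "
        "WHERE make = ? AND price BETWEEN ? AND ? AND year >= ? "
        "ORDER BY price",
        (make, min_price, max_price, min_year),
    )
    return rows.fetchall()


matches = search_cars("Porsche", 40000, 60000, 2015)
print(matches)
```

The page content is generated from whatever is in the table at request time, so updating the inventory updates the website without editing any pages.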
A data-driven website is an interactive website kept constantly updated and relevant to the needs of its customers using a database. Data-driven capabilities are especially useful when a firm needs to offer large amounts of information, products, or services. Visitors can become quickly annoyed if they find themselves buried under an avalanche of information when searching a website. A data-driven website can help limit the amount of information displayed to customers based on unique search requirements. Companies even use data-driven websites to make information in their internal databases available to customers and business partners.
There are a number of advantages to using the web to access company databases. First, web browsers are much easier to use than directly accessing the database by using a custom-query tool. Second, the web interface requires few or no changes to the database model. Finally, it costs less to add a web interface in front of a DBMS than to redesign and rebuild the system to support changes. Additional data-driven website advantages include:
Easy to manage content: Website owners can make changes without relying on MIS professionals; users can update a data-driven website with little or no training.
FIGURE 6.13
Zappos.com—A Data-Driven Website
FIGURE 6.14
BI in a Data-Driven Website
Easy to store large amounts of data: Data-driven websites can keep large volumes of information organized. Website owners can use templates to implement changes for layouts, navigation, or website structure. This improves website reliability, scalability, and performance.
Easy to eliminate human errors: Data-driven websites trap data-entry errors, eliminating inconsistencies while ensuring that all information is entered correctly.
Zappos credits its success as an online shoe retailer to its vast inventory of nearly 3 million products available through its dynamic data-driven website. The company built its data-driven website catering to a specific niche market: consumers who were tired of finding that their most-desired items were always out of stock at traditional retailers. Zappos’ highly flexible, scalable, and secure database helped it rank as the most available Internet retailer. Figure 6.13 displays the Zappos data-driven website illustrating a user querying the database and receiving information that satisfies the user’s request. 6
Companies can gain valuable business knowledge by viewing the data accessed and analyzed from their websites. Figure 6.14 displays how running queries or using analytical tools, such as a PivotTable, on the database that is attached to the website can offer insight into the business, such as items browsed, frequent requests, items bought together, and so on.
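As a rough sketch of the kind of analysis described above, the following example counts frequently purchased items and items bought together. The order data are invented for illustration, and the approach (simple counting with Python's standard library) is an assumption; it stands in for whatever query or PivotTable a company would actually run.

```python
from collections import Counter
from itertools import combinations

# Hypothetical orders pulled from the database behind a website.
orders = [
    ["running shoes", "socks"],
    ["running shoes", "socks", "insoles"],
    ["boots", "socks"],
    ["running shoes", "socks"],
]

# How often each item appears across all orders.
item_counts = Counter(item for order in orders for item in order)

# How often each pair of items appears in the same order.
pair_counts = Counter(
    pair
    for order in orders
    for pair in combinations(sorted(set(order)), 2)
)

print(item_counts.most_common(2))
print(pair_counts.most_common(1))
```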
section 6.2
Business Intelligence
LEARNING OUTCOMES
6.5Identify the advantages of using business intelligence to support managerial decision making.
6.6Define data warehousing and data marts and explain how they support business decisions.
6.7Describe the three organizational methods for analyzing big data.
SUPPORTING DECISIONS WITH BUSINESS INTELLIGENCE
LO 6.5: Identify the advantages of using business intelligence to support managerial decision making.
Many organizations today find it next to impossible to understand their own strengths and weaknesses, let alone those of their biggest competitors, because the enormous volume of organizational data is inaccessible to all but the MIS department. Organizational data include far more than simple structured data elements in a database; they also include unstructured data such as voice mail, customer phone calls, text messages, video clips, and numerous new forms of data such as tweets from Twitter.
The Problem: Data Rich, Information Poor
An ideal business scenario would be as follows. A business manager on his way to meet with a client reviews historical customer data and realizes that the client’s ordering volume has substantially decreased. As he drills down into the data, he notices the client had a support issue with a particular product. He quickly calls the support team to find out all of the information and learns that a replacement for the defective part can be shipped in 24 hours. In addition, he learns that the client has visited the website and requested information on a new product line. Armed with all this information, the business manager is prepared for a productive meeting with his client. He now understands the client’s needs and issues, and he can address new sales opportunities with confidence.
For many companies, the preceding example is simply a pipe dream. Attempting to gather all of the client information would actually take hours or even days to compile. With so much data available, it is surprisingly hard for managers to get information, such as inventory levels, past order history, or shipping details. Managers send their information requests to the MIS department where a dedicated person compiles the various reports. In some situations, responses can take days, by which time the information may be outdated and opportunities lost. Many organizations find themselves in the position of being data rich and information poor. Even in today’s electronic world, managers struggle with the challenge of turning their business data into business intelligence.
APPLY YOUR KNOWLEDGE
BUSINESS DRIVEN INNOVATION
News Dots
Gone are the days of staring at boring spreadsheets and trying to understand how the data correlate. With innovative data visualization tools, managers can arrange different ways to view the data, providing new forms of pattern recognition not offered by simply looking at numbers. Slate, a news publication, developed a new data visualization tool called News Dots that offers readers a different way of viewing the daily news through trends and patterns. The News Dots tool scans about 500 stories a day from major publications and then tags the content with important keywords such as people, places, companies, and topics. Surprisingly, the majority of daily news overlaps as the people, places, and stories are frequently connected. Using News Dots, you can visualize how the news fits together, much like a giant social network. News Dots uses circles (or dots) to represent the tagged content and arranges them according to size. The more frequently a certain topic is tagged, the larger the dot and its relationship to other dots. The tool is interactive, and users simply click a dot to view which stories mention that topic and which other topics it connects to in the network, such as a correlation among the U.S. government, Federal Reserve, Senate, bank, and Barack Obama. 7
How can data visualization help identify trends? What types of business intelligence could you identify if your college used a data visualization tool to analyze student information? What types of business intelligence could you identify if you used a data visualization tool to analyze the industry in which you plan to compete?
The Solution: Business Intelligence
Employee decisions are numerous, and they include providing service information, offering new products, and supporting frustrated customers. Employees can base their decisions on data, experience, or knowledge and preferably a combination of all three. Business intelligence can provide managers with the ability to make better decisions. A few examples of how different industries use business intelligence include:
Airlines: Analyze popular vacation locations with current flight listings.
Banking: Understand customer credit card usage and nonpayment rates.
Health care: Compare the demographics of patients with critical illnesses.
Insurance: Predict claim amounts and medical coverage costs.
Law enforcement: Track crime patterns, locations, and criminal behavior.
Marketing: Analyze customer demographics.
Retail: Predict sales, inventory levels, and distribution.
Technology: Predict hardware failures.
Figure 6.15 displays how organizations using BI can find the cause of many issues and problems simply by asking “Why?” The process starts by analyzing a report such as sales amounts by quarter. Managers will drill down into the report looking for why sales are up or why sales are down. Once they understand why a certain location or product is experiencing an increase in sales, they can share the information in an effort to raise enterprisewide sales. Once they understand the cause of a decrease in sales, they can take effective action to resolve the issue. Here are a few examples of how managers can use BI to answer tough business questions:
FIGURE 6.15
How BI Can Answer Tough Customer Questions
Where has the business been? Historical perspective offers important variables for determining trends and patterns.
Where is the business now? Looking at the current business situation allows managers to take effective action to solve issues before they grow out of control.
Where is the business going? Setting strategic direction is critical for planning and creating solid business strategies.
Ask a simple question, such as who is my best customer or what is my worst-selling product, and you might get as many answers as you have employees. Databases, data warehouses, and data marts can provide a single source of “trusted” data that can answer questions about customers, products, suppliers, production, finances, fraud, and even employees. They can also alert managers to inconsistencies or help determine the causes and effects of enterprisewide business decisions. All business aspects can benefit from the added insights provided by business intelligence, and you, as a business student, will benefit from understanding how MIS can help you make intelligent decisions.
THE BUSINESS BENEFITS OF DATA WAREHOUSING
LO 6.6: Define data warehousing and data marts and explain how they support business decisions.
In the 1990s, as organizations began to need more timely information about their business, they found that traditional management information systems were too cumbersome to provide relevant information efficiently and effectively. Most of the systems were in the form of operational databases that were designed for specific business functions, such as accounting, order entry, customer service, and sales, and were not appropriate for business analysis for the reasons shown in Figure 6.16.
During the latter half of the 20th century, the numbers and types of operational databases increased. Many large businesses found themselves with information scattered across multiple systems with different file types (such as spreadsheets, databases, and even word processing files), making it almost impossible for anyone to use the information from multiple sources. Completing reporting requests across operational systems could take days or weeks using antiquated reporting tools that were ineffective for running a business. From this need, the data warehouse was born as a place where relevant information could be stored and accessed for making strategic queries and reports.
A data warehouse is a logical collection of information, gathered from many operational databases, that supports business analysis activities and decision-making tasks. The primary purpose of a data warehouse is to combine information, more specifically, strategic information, throughout an organization into a single repository in such a way that the people who need that information can make decisions and undertake business analysis. A key idea within data warehousing is to collect information from multiple systems in a common location that uses a universal querying tool. This allows operational databases to run where they are most efficient for the business, while providing a common location using a familiar format for the strategic or enterprisewide reporting information.
FIGURE 6.16
Reasons Business Analysis Is Difficult from Operational Databases
Data warehouses go a step further by standardizing information. Gender, for instance, can be referred to in many ways (Male, Female, M/F, 1/0), but it should be standardized in a data warehouse with one common way of referring to each data element that stores gender (M/F). Standardization of data elements allows for greater accuracy, completeness, and consistency and increases the quality of the information used in making strategic business decisions. The data warehouse, then, is simply a tool that enables business users, typically managers, to be more effective in many ways, including:
Developing customer profiles.
Identifying new-product opportunities.
Improving business operations.
Identifying financial issues.
Analyzing trends.
Understanding competitors.
Understanding product performance. (See Figure 6.17 for the three core concepts of data warehousing.)
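The gender standardization described earlier in this section can be sketched as a simple lookup that maps each source system's spelling to one warehouse code. The mapping below is an assumption for illustration; in particular, which of 1 and 0 means male or female differs by source system and would have to be confirmed for each one.

```python
# Hypothetical mapping from source-system values to the warehouse standard (M/F).
# The meaning of "1" and "0" is an assumption and varies between real systems.
GENDER_CODES = {
    "male": "M", "m": "M", "1": "M",
    "female": "F", "f": "F", "0": "F",
}


def standardize_gender(raw):
    """Return 'M' or 'F', or None when the value is not recognized."""
    key = str(raw).strip().lower()
    return GENDER_CODES.get(key)


source_values = ["Male", "F", 1, "female ", "unknown"]
standardized = [standardize_gender(value) for value in source_values]
print(standardized)
```

Returning None for unrecognized values, rather than guessing, leaves those records visible for later review.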
DATA MARTS
Businesses collect a tremendous amount of transactional information as part of their routine operations. Marketing, sales, and other departments would like to analyze these data to understand their operations better. Although databases store the details of all transactions (for instance, the sale of a product) and events (hiring a new employee), data warehouses store that same information but in an aggregated form more suited to supporting decision-making tasks. Aggregation, in this instance, can include totals, counts, averages, and the like.
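To make the idea of aggregation concrete, the sketch below starts with individual sale transactions and produces the counts, totals, and averages a data warehouse might store. The sale table and its values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (product TEXT, amount REAL)")
# Hypothetical transaction-level detail, as an operational database would hold it.
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [("A", 10.0), ("A", 20.0), ("B", 5.0), ("B", 15.0), ("B", 40.0)],
)

# Aggregated form: one row per product with a count, a total, and an average.
summary = conn.execute(
    "SELECT product, COUNT(*), SUM(amount), AVG(amount) "
    "FROM sale GROUP BY product ORDER BY product"
).fetchall()
print(summary)
```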
FIGURE 6.17
Three Core Concepts of Data Warehousing
The data warehouse modeled in Figure 6.18 compiles information from internal databases (or transactional and operational databases) and external databases through extraction, transformation, and loading. Extraction, transformation, and loading (ETL) is a process that extracts information from internal and external databases, transforms it using a common set of enterprise definitions, and loads it into a data warehouse. The data warehouse then sends portions (or subsets) of the information to data marts. A data mart contains a subset of data warehouse information. To distinguish between data warehouses and data marts, think of data warehouses as having a more organizational focus and data marts as having a functional focus. Figure 6.18 provides an illustration of a data warehouse and its relationship to internal and external databases, ETL, and data marts.
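The ETL flow can be sketched with three small functions. Everything here is a simplified assumption: the field names, the two sample sources, and the use of plain Python lists in place of real databases. The data mart at the end is shown simply as a filtered subset of the warehouse.

```python
def extract():
    """Pull rows from an internal and an external source (hard-coded stand-ins)."""
    internal = [{"cust": "ACME Corp", "sales_usd": "1200.50", "region": "west"}]
    external = [{"customer_name": "Beta LLC", "revenue": 800, "territory": "EAST"}]
    return internal, external


def transform(internal, external):
    """Convert both sources to one common set of enterprise definitions."""
    rows = []
    for r in internal:
        rows.append({"customer": r["cust"],
                     "sales": float(r["sales_usd"]),
                     "region": r["region"].title()})
    for r in external:
        rows.append({"customer": r["customer_name"],
                     "sales": float(r["revenue"]),
                     "region": r["territory"].title()})
    return rows


def load(rows, warehouse):
    """Append the transformed rows to the warehouse."""
    warehouse.extend(rows)


warehouse = []
load(transform(*extract()), warehouse)

# A data mart holds a subset of the warehouse, here for a hypothetical West sales team.
west_mart = [row for row in warehouse if row["region"] == "West"]
print(warehouse)
print(west_mart)
```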
FIGURE 6.18
Data Warehouse Model
Multidimensional Analysis
A relational database contains information in a series of two-dimensional tables. In a data warehouse and data mart, information contains layers of columns and rows. For this reason, most data warehouses and data marts are multidimensional databases. A dimension is a particular attribute of information. Each layer in a data warehouse or data mart represents information according to an additional dimension. An information cube is the common term for the representation of multidimensional information. Figure 6.19 displays a cube (cube a) that represents store information (the layers), product information (the rows), and promotion information (the columns).
After creating a cube of information, users can begin to slice and dice the cube to drill down into the information. The second cube (cube b) in Figure 6.19 displays a slice representing promotion II information for all products at all stores. The third cube (cube c) in Figure 6.19 displays only information for promotion III, product B, at store 2. By using multidimensional analysis, users can analyze information in a number of ways and with any number of dimensions. Users might want to add dimensions of information to a current analysis, including product category, region, and even forecasted versus actual weather. The true value of a data warehouse is its ability to provide multidimensional analysis that allows users to gain insights into their information.
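One way to picture slicing and dicing is to model the cube as a lookup keyed by store, product, and promotion. The sketch below follows the dimensions in Figure 6.19 (three stores, five products, four promotions), but the sales figures are placeholders generated by a formula so the example runs; they are not real data, and real multidimensional databases use far more efficient storage.

```python
from itertools import product as cartesian

stores = [1, 2, 3]
products = ["A", "B", "C", "D", "E"]
promotions = ["I", "II", "III", "IV"]

# Placeholder values, one cell per (store, product, promotion) combination.
cube = {
    (s, p, promo): 100 * s + 10 * products.index(p) + promotions.index(promo)
    for s, p, promo in cartesian(stores, products, promotions)
}

# Slice (like cube b): fix one dimension, promotion II, across all products and stores.
promo_ii_slice = {key: value for key, value in cube.items() if key[2] == "II"}

# Single cell (like cube c): promotion III, product B, at store 2.
cell = cube[(2, "B", "III")]

print(len(cube), len(promo_ii_slice), cell)
```

Adding a dimension such as region would simply extend the key, which is why analysts can keep layering new perspectives onto the same information.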
Data warehouses and data marts are ideal for off-loading some of the querying against a database. For example, querying a database to obtain an average of sales for Product B at Store 2 while Promotion III is under way might create a considerable processing burden for a database, increasing the time it takes another person to enter a new sale into the same database. If an organization performs numerous queries against a database (or multiple databases), aggregating that information into a data warehouse will be beneficial.
Information Cleansing or Scrubbing
Dirty data is erroneous or flawed data (see Figure 6.20). The complete removal of dirty data from a source is impractical or virtually impossible. According to Gartner Inc., dirty data is a business problem, not an MIS problem. Over the next two years, more than 25 percent of critical data in Fortune 1000 companies will continue to be flawed; that is, the information will be inaccurate, incomplete, or duplicated.
Obviously, maintaining quality information in a data warehouse or data mart is extremely important. To increase the quality of organizational information and thus the effectiveness of decision making, businesses must formulate a strategy to keep information clean. Information cleansing or scrubbing is a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information.
FIGURE 6.19
A Cube of Information for Performing a Multidimensional Analysis on Three Stores for Five Products and Four Promotions
APPLY YOUR KNOWLEDGE
BUSINESS DRIVEN DISCUSSION
Butterfly Effects
The butterfly effect, an idea from chaos theory in mathematics, refers to the way a minor event, like the movement of a butterfly’s wing, can have a major impact on a complex system like the weather. Dirty data can have the same impact on a business as the butterfly effect. Organizations depend on the movement and sharing of data throughout the organization, so the impact of data quality errors is costly and far-reaching. Such data issues often begin with a tiny mistake in one part of the organization, but the butterfly effect can produce disastrous results, making its way through MIS systems to the data warehouse and other enterprise systems. When dirty data or low-quality data enters organizational systems, a tiny error such as a spelling mistake can lead to revenue loss, process inefficiency, and failure to comply with industry and government regulations. Explain how the following errors can affect an organization:
A cascading spelling mistake
Inaccurate customer records
Incomplete purchasing history
Inaccurate mailing address
Duplicate customer numbers for different customers
Specialized software tools exist that use sophisticated procedures to analyze, standardize, correct, match, and consolidate data warehouse information. This step is vitally important because data warehouses often contain information from several databases, some of which can be external to the organization. In a data warehouse, information cleansing occurs first during the ETL process and again once the information is in the data warehouse. Companies can choose information cleansing software from several vendors, including Oracle, SAS, Ascential Software, and Group 1 Software. Ideally, scrubbed information is accurate and consistent.
FIGURE 6.20
Dirty Data Problems
Looking at customer information highlights why information cleansing is necessary. Customer information exists in several operational systems. In each system, all the details could change, from the customer ID to contact information, depending on the business process the user is performing (see Figure 6.21).
Figure 6.22 displays a customer name entered differently in multiple operational systems. Information cleansing allows an organization to fix these types of inconsistencies in the data warehouse. Figure 6.23 displays the typical events that occur during information cleansing.
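A small part of information cleansing, matching a customer name that was typed differently in several systems, can be sketched as follows. The customer names and the normalization rules are hypothetical and deliberately simple; commercial cleansing tools apply far more sophisticated matching.

```python
import re
from collections import defaultdict

# Company suffixes to ignore when matching; an illustrative, incomplete list.
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co"}


def normalize_name(name):
    """Reduce a customer name to a comparable form."""
    cleaned = re.sub(r"[^\w\s]", "", name)                  # drop punctuation
    cleaned = re.sub(r"\s+", " ", cleaned).strip().lower()  # collapse spaces, lowercase
    words = [word for word in cleaned.split() if word not in SUFFIXES]
    return " ".join(words)


# Hypothetical entries for the same customers from different operational systems.
records = [
    "J.D. Industries, Inc.",
    "JD  INDUSTRIES",
    "jd industries incorporated",
    "Acme Supply Co.",
]

groups = defaultdict(list)
for record in records:
    groups[normalize_name(record)].append(record)

print(dict(groups))
```

Each group can then be consolidated into a single standardized customer entry in the data warehouse.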
FIGURE 6.21
Contact Information in Operational Systems
FIGURE 6.22
Standardizing a Customer Name in Operational Systems
FIGURE 6.23
Information Cleansing Activities
FIGURE 6.24
The Cost of Accurate and Complete Information
Achieving perfect information is almost impossible. The more complete and accurate a company wants its information to be, the more it costs (see Figure 6.24). Companies may also trade accuracy for completeness. Accurate information is correct, whereas complete information has no blanks. A birth date of 2/31/10 is an example of complete but inaccurate information (February 31 does not exist). An address containing Denver, Colorado, without a zip code is an example of accurate information that is incomplete. Many firms complete data quality audits to determine the accuracy and completeness of their data. Most organizations determine a percentage of accuracy and completeness high enough to make good decisions at a reasonable cost, such as 85 percent accurate and 65 percent complete.
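A very small data quality audit along these lines might look like the sketch below. The records, the list of required fields, and the rule that accuracy means "the birth date is a real calendar date" are all assumptions chosen to mirror the two examples in the text; a real audit would check many more rules.

```python
from datetime import datetime

REQUIRED_FIELDS = ("birth_date", "city", "state", "zip")

# Hypothetical customer records.
records = [
    {"birth_date": "2/31/10", "city": "Denver", "state": "Colorado", "zip": "80202"},
    {"birth_date": "4/15/85", "city": "Denver", "state": "Colorado", "zip": ""},
    {"birth_date": "7/4/90", "city": "Aurora", "state": "Colorado", "zip": "80014"},
]


def is_complete(record):
    """Complete means no required field is blank."""
    return all(record.get(field) for field in REQUIRED_FIELDS)


def is_accurate(record):
    """Accurate, for this sketch, means the birth date is a real calendar date."""
    try:
        datetime.strptime(record["birth_date"], "%m/%d/%y")
        return True
    except ValueError:  # raised for impossible dates such as February 31
        return False


complete_flags = [is_complete(r) for r in records]
accurate_flags = [is_accurate(r) for r in records]
pct_complete = 100 * sum(complete_flags) / len(records)
pct_accurate = 100 * sum(accurate_flags) / len(records)
print(round(pct_complete), round(pct_accurate))
```

The first record is complete but inaccurate and the second is accurate but incomplete, matching the two cases described above.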
THE POWER OF BIG DATA ANALYTICS
LO 6.7: Describe the three organizational methods for analyzing big data.
Companies are collecting more data than ever. Historically, data were housed in functional systems that were not integrated, such as customer service, finance, and human resources. Today companies can gather all of the functional data together by the zettabyte, but finding a way to analyze the data is incredibly challenging. Figure 6.25 displays the three methods organizations are using to dissect, analyze, and understand organizational data.
FIGURE 6.25
Three Organizational Methods for Analyzing Big Data
Data Mining
Data mining is the process of analyzing data to extract information not offered by the raw data alone. Data mining can also begin at a summary information level (coarse granularity) and progress through increasing levels of detail (drilling down) or the reverse (drilling up). Companies use data-mining techniques to compile a complete picture of their operations, all within a single view, allowing them to identify trends and improve forecasts. Consider Best Buy, which used data-mining tools to identify that 7 percent of its customers accounted for 43 percent of its sales, so the company reorganized its stores to accommodate those customers.
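A finding like the Best Buy example comes from asking what share of total sales the top customers generate. The sketch below shows that calculation on invented figures; the customer IDs, amounts, and the top_customer_share function are assumptions for illustration and do not reproduce Best Buy's data or method.

```python
def top_customer_share(sales_by_customer, customer_fraction):
    """Return the share of total sales generated by the top fraction of customers."""
    ranked = sorted(sales_by_customer.values(), reverse=True)
    top_n = max(1, round(len(ranked) * customer_fraction))
    return sum(ranked[:top_n]) / sum(ranked)


# Hypothetical annual sales per customer: one heavy buyer and nine light buyers.
sales_by_customer = {"C01": 900}
sales_by_customer.update({f"C{n:02d}": 100 for n in range(2, 11)})

share = top_customer_share(sales_by_customer, 0.10)
print(f"Top 10% of customers account for {share:.0%} of sales")
```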
To perform data mining, users need data-mining tools. Data-mining tools use a variety of techniques to find patterns and relationships in large volumes of information that predict future behavior and guide decision making. Data mining uncovers trends and patterns, which analysts use to build models that, when exposed to new information sets, perform a variety of information analysis functions. Data-mining tools for data warehouses help users uncover business intelligence in their data. Figure 6.26 displays the data-mining analysis methods used to uncover patterns and trends for business analysis such as:
Analyzing customer buying patterns to predict future marketing and promotion campaigns.
Building budgets and other financial information.
Detecting fraud by identifying deceptive spending patterns.
Finding the best customers who spend the most money.
Keeping customers from leaving or migrating to competitors.
Promoting and hiring employees to ensure success for both the company and the individual.
FIGURE 6.26
Data Mining Analysis Methods
Data mining enables these companies to determine relationships among such internal factors as price, product positioning, or staff skills, and external factors such as economic indicators, competition, and customer demographics. In addition, it enables companies to determine the impact on sales, customer satisfaction, and corporate profits and to drill down into summary information to view detailed transactional data. With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual’s purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.
Netflix uses data mining to analyze each customer’s film-viewing habits to provide recommendations for other customers with Cinematch, its movie recommendation system. Using Cinematch, Netflix can present customers with a number of additional movies they might want to watch based on the customer’s current preferences. Netflix’s innovative use of data mining provides its competitive advantage in the movie rental industry. Data mining uses specialized technologies and functionalities such as query tools, reporting tools, multidimensional analysis tools, statistical tools, and intelligent agents to uncover patterns displayed in Figure 6.27.
FIGURE 6.27
Data-Mining Techniques
Big Data Analytics
Structured data has a defined length, type, and format and includes numbers, dates, or strings such as Customer Address. Structured data is typically stored in a traditional system such as a relational database or spreadsheet and accounts for about 20 percent of the data that surrounds us. The sources of structured data include:
Machine-generated data is created by a machine without human intervention. Machine-generated structured data includes sensor data, point-of-sale data, and web log data.
Human-generated data is data that humans, in interaction with computers, generate. Human-generated structured data includes input data, click-stream data, and gaming data.
Unstructured data is not defined, does not follow a specified format, and is typically free-form text such as emails, Twitter tweets, and text messages. Unstructured data accounts for about 80 percent of the data that surrounds us. The sources of unstructured data include:
Machine-generated unstructured data: satellite images, scientific atmosphere data, and radar data.
Human-generated unstructured data: text messages, social media data, and emails.
Big data is a collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools. The four common characteristics of big data are detailed in Figure 6.28. Big data requires sophisticated tools to analyze all the unstructured information from millions of customers, devices, and machine interactions. Big data is analyzed for marketing trends in business as well as in the fields of manufacturing, medicine, and science.
FIGURE 6.28
Four Common Characteristics of Big Data
Distributed computing processes and manages algorithms across many machines in a computing environment. Big data tools use distributed computing to store and analyze data across databases stored around the globe. Traditional analytical tools focus on basic business intelligence, including querying and reporting of historical data against a relational database. Traditional data-mining tools focus on history and explain where the organization has been. Advanced analytics focuses on forecasting future trends and producing insights using sophisticated quantitative methods, including statistics, descriptive and predictive data mining, simulation, and optimization. Advanced analytics uses data patterns to make forward-looking predictions to explain to the organization where it is headed. A data scientist extracts knowledge from data by performing statistical analysis, data mining, and advanced analytics on big data to identify trends, market changes, and other relevant information. Figure 6.29 displays the techniques a data scientist will use to perform big data advanced analytics.
Data Visualization
Traditional bar graphs and pie charts can be boring, at best confusing, and at worst misleading. As databases and graphics collide more and more, people are creating infographics (information graphics), which display information graphically so it can be easily understood. Infographics present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format. Infographics are exciting and quickly convey a story users can understand without having to analyze numbers, tables, and boring charts. Great data visualizations provide insights into something new about the underlying patterns and relationships. Just think of the periodic table of elements and imagine if you had to look at an Excel spreadsheet showing each element and the associated attributes in a table format. This would be not only difficult to understand but easy to misinterpret. By placing the elements in the visual periodic table, you quickly grasp how the elements relate and the associated hierarchy. Infographics perform the same function for business data as the periodic table does for chemical elements.
FIGURE 6.29
Big Data Advanced Analytical Techniques
APPLY YOUR KNOWLEDGE
BUSINESS DRIVEN GLOBALIZATION
Integrity Information Inc.
Congratulations! You have just been hired as a consultant for Integrity Information Inc., a start-up business intelligence consulting company. Your first job is to help work with the sales department in securing a new client, The Warehouse. The Warehouse has been operating in the United States for more than a decade, and its primary business is to sell wholesale low-cost products. The Warehouse is interested in hiring Integrity Information Inc. to clean up the data that are stored in its U.S. database. To determine how good your work is, the client would like your analysis of the following spreadsheet. The Warehouse is also interested in expanding globally and wants to purchase several independent wholesale stores located in Australia, Thailand, China, Japan, and the United Kingdom. Before the company moves forward with the venture, it wants to understand what types of data issues it might encounter as it begins to transfer data from each global entity to the data warehouse. Please create a list detailing the potential issues The Warehouse can anticipate encountering as it consolidates the global databases into a single data warehouse. 8
Analysis paralysis occurs when the user goes into an emotional state of over-analysis (or over-thinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome. In the time of big data, analysis paralysis is a growing problem. One solution is to use data visualizations to help people make decisions faster. Data visualization describes technologies that allow users to see or visualize data to transform information into a business perspective. Data visualization is a powerful way to simplify complex data sets by placing data in a format that is easily grasped and understood far more quickly than the raw data alone. Data visualization tools move beyond Excel graphs and charts into sophisticated analysis techniques such as controls, instruments, maps, time-series graphs, and more. Data visualization tools can help uncover correlations and trends in data that would otherwise go unrecognized. Business intelligence dashboards track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls, allowing users to manipulate data for analysis. The majority of business intelligence software vendors offer a number of data visualization tools and business intelligence dashboards. A data artist is a business analytics specialist who uses visual tools to help people understand complex data.
Big data is one of the most promising technology trends occurring today. Of course, notable companies such as Facebook, Google, and Netflix are gaining the most business insights from big data currently, but many smaller markets are entering the scene, including retail, insurance, and health care. Over the next decade, as big data starts to improve your everyday life by providing insights into your social relationships, habits, and careers, you can expect to see the need for data scientists and data artists dramatically increase.
LEARNING OUTCOME REVIEW
Learning Outcome 6.1: Explain the four primary traits that determine the value of information.
Information is data converted into a meaningful and useful context. Information can tell an organization how its current operations are performing and help it estimate and strategize about how future operations might perform. It is important to understand the different levels, formats, and granularities of information along with the four primary traits that help determine the value of information, which include (1) information type: transactional and analytical; (2) information timeliness; (3) information quality; and (4) information governance.
Learning Outcome 6.2: Describe a database, a database management system, and the relational database model.
A database maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses). A database management system (DBMS) creates, reads, updates, and deletes data in a database while controlling access and security. A DBMS provides methodologies for creating, updating, storing, and retrieving data in a database. In addition, a DBMS provides facilities for controlling data access and security, allowing data sharing and enforcing data integrity. The relational database model allows users to create, read, update, and delete data in a relational database.
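The four DBMS operations named above (create, read, update, delete) can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the employee table, its columns, and the sample rows are hypothetical and do not come from the chapter.

```python
import sqlite3

# In-memory database; the table, columns, and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL, department TEXT)"
)

# Create: add two records.
conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (1, "Ana Ruiz", "Sales"))
conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (2, "Ben Cho", "Marketing"))

# Read: retrieve the records that were just stored.
rows_before = conn.execute(
    "SELECT name, department FROM employee ORDER BY employee_id"
).fetchall()

# Update: move employee 1 to a different department.
conn.execute(
    "UPDATE employee SET department = ? WHERE employee_id = ?", ("Marketing", 1)
)

# Delete: remove employee 2.
conn.execute("DELETE FROM employee WHERE employee_id = ?", (2,))

rows_after = conn.execute(
    "SELECT employee_id, name, department FROM employee"
).fetchall()
print(rows_after)
```

Here SQLite plays the role of the DBMS: the program never touches the stored data directly, it only asks the DBMS to carry out each operation.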
Learning Outcome 6.3: Identify the business advantages of a relational database.
Many business managers are familiar with Excel and other spreadsheet programs they can use to store business data. Although spreadsheets are excellent for supporting some data analysis, they offer limited functionality in terms of security, accessibility, and flexibility and can rarely scale to support business growth. From a business perspective, relational databases offer many advantages over using a text document or a spreadsheet, including increased flexibility, increased scalability and performance, reduced information redundancy, increased information integrity (quality), and increased information security.
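One of the advantages listed above, increased information integrity, can be illustrated with a short sketch in which the database itself rejects bad data, something a spreadsheet would accept without complaint. The customer and sales_order tables and the rule that totals cannot be negative are hypothetical examples, not taken from the chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Customer details are stored once; orders refer to them by key,
# which is how a relational design reduces redundancy.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE sales_order (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total REAL NOT NULL CHECK (total >= 0)
    )"""
)
conn.execute("INSERT INTO customer VALUES (1, 'Acme Wholesale')")
conn.execute("INSERT INTO sales_order VALUES (100, 1, 250.0)")


def try_insert(params):
    """Attempt to insert an order and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO sales_order VALUES (?, ?, ?)", params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# Relational integrity: customer 99 does not exist, so the order is refused.
orphan = try_insert((101, 99, 80.0))
# Business-style rule (assumed for this sketch): an order total cannot be negative.
negative = try_insert((102, 1, -5.0))
order_count = conn.execute("SELECT COUNT(*) FROM sales_order").fetchone()[0]
print(orphan, negative, order_count)
```

Both invalid orders are refused, so only the original valid order remains in the table.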
Learning Outcome 6.4: Explain the business benefits of a data-driven website.
A data-driven website is an interactive website kept constantly updated and relevant to the needs of its customers using a database. Data-driven capabilities are especially useful when the website offers a great deal of information, products, or services because visitors are frequently annoyed if they are buried under an avalanche of information when searching a website. Many companies use the web to make some of the information in their internal databases available to customers and business partners.
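The idea of a page that stays current because it is built from a database can be sketched as follows. The product table, the render_catalog function, and the price filter are hypothetical names chosen for illustration; a real site would sit behind a web server, which is omitted here.

```python
import sqlite3
from html import escape

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("Desk lamp", 19.99), ("Office chair", 89.5)],
)


def render_catalog(connection, max_price):
    """Build an HTML list of the products a visitor asked for (price filter)."""
    rows = connection.execute(
        "SELECT name, price FROM product WHERE price <= ? ORDER BY price",
        (max_price,),
    ).fetchall()
    items = "".join(
        f"<li>{escape(name)}: ${price:.2f}</li>" for name, price in rows
    )
    return f"<ul>{items}</ul>"


page_before = render_catalog(conn, 50)

# Updating the database changes the page; no HTML is edited by hand.
conn.execute("INSERT INTO product VALUES (?, ?)", ("Notebook", 12.0))
page_after = render_catalog(conn, 50)
print(page_after)
```

Because the visitor's filter is applied in the query, the page shows only the matching products rather than the whole catalog.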
Learning Outcome 6.5: Identify the advantages of using business intelligence to support managerial decision making.
Many organizations today find it next to impossible to understand their own strengths and weaknesses, let alone those of their biggest competitors, due to enormous volumes of organizational data being inaccessible to all but the MIS department. Organizational data include far more than simple structured data elements in a database; the set of data also includes unstructured data such as voice mail, customer phone calls, text messages, and video clips, along with numerous new forms of data, such as tweets from Twitter. Managers today find themselves in the position of being data rich and information poor, and they need to implement business intelligence systems to solve this challenge.
Learning Outcome 6.6: Define data warehousing and data marts and explain how they support business decisions.
A data warehouse is a logical collection of information, gathered from many different operational databases, that supports business analysis and decision making. The primary value of a data warehouse is to combine information, more specifically, strategic information, throughout an organization into a single repository in such a way that the people who need that information can make decisions and undertake business analysis.
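Combining information from several operational databases into a single repository usually means standardizing it first. The sketch below assumes two hypothetical source systems (a sales system and a billing system) that record the same customer with different field names, date formats, and number formats; none of these names or values come from the chapter.

```python
import sqlite3
from datetime import datetime

# Two operational sources with inconsistent formats (illustrative data).
sales_system = [
    {"customer": "ACME CORP", "sale_date": "03/15/2024", "amount": "1,200.50"}
]
billing_system = [
    {"cust_name": "Acme Corp", "date": "2024-03-20", "total": 300.0}
]


def from_sales(row):
    """Transform a sales-system row into the warehouse's common format."""
    return (
        row["customer"].title(),
        datetime.strptime(row["sale_date"], "%m/%d/%Y").date().isoformat(),
        float(row["amount"].replace(",", "")),
    )


def from_billing(row):
    """Transform a billing-system row into the warehouse's common format."""
    return (row["cust_name"].title(), row["date"], float(row["total"]))


# Load the standardized rows into one repository.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_revenue (customer TEXT, day TEXT, amount REAL)"
)
rows = [from_sales(r) for r in sales_system] + [
    from_billing(r) for r in billing_system
]
warehouse.executemany("INSERT INTO fact_revenue VALUES (?, ?, ?)", rows)

# Once standardized, a single query answers a cross-system question.
total_by_customer = warehouse.execute(
    "SELECT customer, SUM(amount) FROM fact_revenue GROUP BY customer"
).fetchall()
print(total_by_customer)
```

Without the transformation step, "ACME CORP" and "Acme Corp" would be counted as two different customers, which is the kind of inconsistency a warehouse load has to resolve.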
Learning Outcome 6.7: Describe the three organizational methods for analyzing big data.
Data mining, big data analytics, and data visualization are the three methods organizations are using to dissect, analyze, and understand organizational data. Data mining is the process of analyzing data to extract information not offered by the raw data alone. Data mining can also begin at a summary information level (coarse granularity) and progress through increasing levels of detail (drilling down), or the reverse (drilling up). Big data is a collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools. Data visualization describes technologies that allow users to see or visualize data to transform information into a business perspective.
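Drilling down and drilling up amount to summarizing the same records at finer or coarser granularity. A minimal sketch, assuming a small hypothetical list of sales records by region and city:

```python
from collections import defaultdict

# (region, city, sales amount); illustrative records only.
sales = [
    ("West", "Seattle", 120),
    ("West", "Portland", 80),
    ("East", "Boston", 200),
    ("East", "New York", 150),
    ("West", "Seattle", 30),
]


def summarize(records, level):
    """Sum sales at the requested granularity: 'region' (coarse) or 'city' (fine)."""
    totals = defaultdict(int)
    for region, city, amount in records:
        key = (region,) if level == "region" else (region, city)
        totals[key] += amount
    return dict(totals)


# Coarse granularity: one total per region.
by_region = summarize(sales, "region")

# Drill down: the detail behind the West region's total.
west_detail = {
    key: total
    for key, total in summarize(sales, "city").items()
    if key[0] == "West"
}
print(by_region)
print(west_detail)
```

Drilling up is the reverse step: starting from the city totals and rolling them back into the regional figure.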
OPENING CASE QUESTIONS
1. Knowledge: List the reasons a business would want to display information in a graphic or visual format.
2. Comprehension: Describe how a business could use a business intelligence digital dashboard to gain an understanding of how the business is operating.
3. Application: Explain how a marketing department could use data visualization tools to help with the release of a new product.
4. Analysis: Categorize the five common characteristics of high-quality information and rank them in order of importance for Hotels.com.
5. Synthesis: Develop a list of some possible entities and attributes located in the Hotels.com database.
6. Evaluate: Assess how Hotels.com is using BI to identify trends and change associated business processes.
KEY TERMS
Advanced analytics
Analysis paralysis
Attribute
Big data
Business-critical integrity constraint
Business rule
Business intelligence dashboard
Content creator
Content editor
Data dictionary
Data element (or data field)
Data governance
Data mart
Data mining
Data model
Data quality audit
Data visualization
Data visualization tools
Data warehouse
Database
Database management system (DBMS)
Data-driven website
Data-mining tool
Data artist
Data scientist
Dirty data
Distributed computing
Dynamic catalog
Dynamic information
Entity
Extraction, transformation, and loading (ETL)
Foreign key
Human-generated data
Infographic (or information graphic)
Information cleansing or scrubbing
Information cube
Information granularity
Information inconsistency
Information integrity
Information integrity issues
Information redundancy
Integrity constraint
Logical view of information
Machine-generated data
Master data management (MDM)
Metadata
Physical view of information
Primary key
Query-by-example (QBE) tool
Real-time information
Real-time system
Record
Relational database management system
Relational database model
Relational integrity constraint
Static information
Structured data
Structured query language (SQL)
Time-series information
Unstructured data
REVIEW QUESTIONS
1. How does a database turn data elements into information?
2. Why does a business need to be concerned with the quality of its data?
3. How can data governance help protect a business from hackers?
4. Why would a company care about the timeliness of its data?
5. What are the five characteristics common to high-quality information?
6. What is data governance and its importance to a company?
7. What are the four primary traits that help determine the value of information?
8. What is the difference between an entity and an attribute?
9. What are the advantages of a relational database?
10. What are the advantages of a data-driven website?
11. What is a data warehouse and why would a business want to implement one?
12. Why would you need to use multidimensional analysis?
13. What is the purpose of information cleansing (or scrubbing)?
14. Why would a department want a data mart instead of just accessing the entire data warehouse?
15. Why would a business be data rich but information poor?
CLOSING CASE ONE
Data Visualization: Stories for the Information Age
At the intersection of art and algorithm, data visualization schematically abstracts information to bring about a deeper understanding of the data, wrapping it in an element of awe. Although the practice of visually representing information is arguably the foundation of all design, a newfound fascination with data visualization has been emerging. After The New York Times and The Guardian recently opened their online archives to the public, artists rushed to dissect nearly two centuries’ worth of information, elevating this art form to new prominence.
For artists and designers, data visualization is a new frontier of self-expression, powered by the proliferation of information and the evolution of available tools. For enterprise, it is a platform for displaying products and services in the context of the cultural interaction that surrounds them, reflecting consumers’ increasing demand for corporate transparency.
“Looking at something ordinary in a new way makes it extraordinary,” says Aaron Koblin, one of the more recent pioneers of the discipline. As technology lead of Google’s Creative Labs in San Francisco, he spearheaded the search giant’s Chrome Experiments series designed to show off the speed and reliability of the Chrome browser.
Forget Pie Charts and Bar Graphs
Data visualization has nothing to do with pie charts and bar graphs. And it’s only marginally related to infographics, information design that tends to be about objectivity and clarification. Such representations simply offer another iteration of the data—restating it visually and making it easier to digest. Data visualization, on the other hand, is an interpretation, a different way to look at and think about data that often exposes complex patterns or correlations.
Data visualization is a way to make sense of the ever-increasing stream of information with which we’re bombarded and provides a creative antidote to the analysis paralysis that can result from the burden of processing such a large volume of information. “It’s not about clarifying data,” says Koblin. “It’s about contextualizing it.”
Today algorithmically inspired artists are reimagining the art-science continuum through work that frames the left-brain analysis of data in a right-brain creative story. Some use data visualization as a bridge between alienating information and its emotional impact—see Chris Jordan’s portraits of global mass culture. Others take a more technological angle and focus on cultural utility—the Zoetrope project offers a temporal and historical visualization of the ephemeral web. Still others are pure artistic indulgence—like Koblin’s own Flight Patterns project, a visualization of air traffic over North America.
How Business Can Benefit
There are real implications for business here. Most cell phone providers, for instance, offer a statement of a user’s monthly activity. Most often it’s an overwhelming table of various numerical measures of how much you talked, when, with whom, and how much it cost. A visual representation of this data might help certain patterns emerge, revealing calling habits and perhaps helping users save money.
Companies can also use data visualization to gain new insight into consumer behavior. By observing and understanding what people do with the data—what they find useful and what they dismiss as worthless—executives can make the valuable distinction between what consumers say versus what they do. Even now, this can be a tricky call to make from behind the two-way mirror of a traditional qualitative research setting.
It’s essential to understand the importance of creative vision along with the technical mastery of software. Data visualization isn’t about using all the data available, but about deciding which patterns and elements to focus on, building a narrative, and telling the story of the raw data in a different, compelling way.
Ultimately, data visualization is more than complex software or the prettying up of spreadsheets. It’s not innovation for the sake of innovation. It’s about the most ancient of social rituals: storytelling. It’s about telling the story locked in the data differently, more engagingly, in a way that draws us in, makes our eyes open a little wider and our jaw drop ever so slightly. And as we process it, it can sometimes change our perspective altogether.9
Questions
1. Identify the effects poor information might have on a data visualization project.
2. How does data visualization use database technologies?
3. How could a business use data visualization to identify new trends?
4. What is the correlation between data mining and data visualization?
5. Is data visualization a form of business intelligence? Why or why not?
6. What security issues are associated with data visualization?
7. What might happen to a data visualization project if it failed to cleanse or scrub its data?
CLOSING CASE TWO
Zillow
Zillow.com is an online, web-based real estate site helping homeowners, buyers, sellers, renters, real estate agents, mortgage professionals, property owners, and property managers find and share information about real estate and mortgages. Zillow allows users to access, anonymously and free of charge, the kinds of tools and information previously reserved for real estate professionals. Zillow’s databases cover more than 90 million homes, which represents 95 percent of the homes in the United States. Adding to the sheer size of its databases, Zillow recalculates home valuations for each property every day, so it can provide historical graphs on home valuations over time. In some areas, Zillow is able to display 10 years of valuation history, a value-added benefit for many of its customers. This collection of data represents an operational data warehouse for anyone visiting the website.
As soon as Zillow launched its website, it immediately generated a massive amount of traffic. As the company expanded its services, the founders knew the key to its success would be the site’s ability to process and manage massive amounts of data quickly, in real time. The company identified a need for accessible, scalable, reliable, secure databases that would enable it to continue to increase the capacity of its infrastructure indefinitely without sacrificing performance. Zillow’s traffic continues to grow despite the weakened real estate market; the company is experiencing annual traffic growth of 30 percent, and about a third of all U.S. mortgage professionals visit the site in a given month.
Data Mining and Business Intelligence
Zestimate values on Zillow use data-mining features for spotting trends across property valuations. Data mining also allows the company to see how accurate Zestimate values are over time. Zillow has also built the industry’s first search by monthly payment, allowing users to find homes that are for sale and rent based on a monthly payment they can afford. Along with the monthly payment search, users can also enter search criteria such as the number of bedrooms or bathrooms.
Zillow also launched a new service aimed at changing the way Americans shop for mortgages. Borrowers can use Zillow’s new Mortgage Marketplace to get custom loan quotes from lenders without having to give their names, addresses, phone numbers, or Social Security numbers, or field unwanted telephone calls from brokers competing for their business. Borrowers reveal their identities only after contacting the lender of their choice. The company is entering a field of established mortgage sites such as LendingTree.com and Experian Group’s Lowermybills.com, which charge mortgage companies for borrower information. Zillow, which has an advertising model, says it does not plan to charge for leads.
For mortgage companies, the anonymous leads come free; they can make a bid based on information provided by the borrower, such as salary, assets, credit score, and the type of loan. Lenders can browse borrower requests and see competing quotes from other brokers before making a bid.10
Questions
1. List the reasons Zillow would need to use a database to run its business.
2. Describe how Zillow uses business intelligence to create a unique product for its customers.
3. How could the marketing department at Zillow use a data mart to help with the release of a new product launch?
4. Categorize the five common characteristics of high-quality information and rank them in order of importance to Zillow.
5. Develop a list of some possible entities and attributes of Zillow’s mortgage database.
6. Assess how Zillow uses a data-driven website to run its business.
CRITICAL BUSINESS THINKING
1. Information–Business Intelligence or a Diversion from the Truth? President Obama used part of his commencement address at Virginia’s Hampton University to criticize the flood of incomplete information or downright incorrect information that flows in the 24-hour news cycle. The president said, “You’re coming of age in a 24/7 media environment that bombards us with all kinds of content and exposes us to all kinds of arguments, some of which don’t always rank all that high on the truth meter. With iPods and iPads and Xboxes and PlayStations—none of which I know how to work—information becomes a distraction, a diversion, a form of entertainment, rather than a tool of empowerment, rather than the means of emancipation.”11 Do you agree or disagree with President Obama’s statement? Who is responsible for verifying the accuracy of online information? What should happen to companies that post inaccurate information? What should happen to individuals who post inaccurate information? What should you remember when reading or citing sources for online information?
2. Illegal Database Access. Goldman Sachs has been hit with a $3 million lawsuit by Ipreo, a company that alleges the brokerage firm stole intellectual property from its database of market intelligence facts. The lawsuit, filed in 2010 in the U.S. District Court for the Southern District of New York, claims Goldman Sachs employees used other people’s access credentials to log on to Ipreo’s proprietary database, dubbed Bigdough. Offered on a subscription basis, Bigdough provides detailed information on more than 80,000 contacts within the financial industry. Ipreo complained to the court that Goldman Sachs employees illegally accessed Bigdough at least 264 times in 2008 and 2009.12 Do you agree or disagree with the lawsuit? Should Goldman Sachs be held responsible for rogue employees’ behavior? What types of policies should Goldman Sachs implement to ensure that this does not occur again?
3. Data Storage. Information is one of the most important assets of any business. Businesses must ensure information accuracy, completeness, consistency, timeliness, and uniqueness. In addition, businesses must have a reliable backup service. Thanks in part to cloud computing, there are many data hosting services on the Internet. These sites offer storage of information that can be accessed from anywhere in the world. These data hosting services include Hosting (www.hosting.com), Mozy (www.mozy.com), My Docs Online (www.mydocsonline.com), and Box (www.box.net). Visit a few of these sites along with several others you find through research. Which sites are free? Are there limits to how much you can store? If so, what is the limit? What type of information can you store (video, text, photos, etc.)? Can you allow multiple users with different passwords to access your storage area? Are you contractually bound for a certain duration (annual, etc.)? Are different levels of services provided such as personal, enterprise, and work group? Does it make good business sense to store business data on the Internet? What about personal data?