Chapter #1:
Beginning of the End … Or the End of the Beginning?
The past few years have been challenging for Good Tunes & More (GT&M), a
business that traces its roots to Good Tunes, a store that exclusively sold music
CDs and vinyl records.
GT&M first broadened its merchandise to include home entertainment
and computer systems (the “More”), and then undertook an expansion to take
advantage of prime locations left empty by bankrupt former competitors. Today,
GT&M finds itself at a crossroads. Hoped-for increases in revenues that have
failed to occur and declining profit margins due to the competitive pressures of
online sellers have led management to reconsider the future of the business.
While some investors in the business have argued for an orderly retreat, closing
stores and limiting the variety of merchandise, GT&M CEO Emma Levia
has decided to “double down” and expand the business by purchasing Whitney
Wireless, a successful three-store chain that sells smartphones and other mobile
devices.
Levia foresees creating a brand new “A-to-Z” electronics retailer but
first must establish a fair and reasonable price for the privately held Whitney
Wireless.
To do so, she has asked a group of analysts to identify the data that
would be helpful in setting a price for the wireless business. As part of that
group, you quickly realize that you need the data that would help to verify the
contents of the wireless company’s basic financial statements.
You focus on data associated with the company’s profit and loss statement
and quickly realize the need for sales and expense-related variables. You begin to
think about what the data for such variables would look like and how to collect those
data. You realize that you are starting to apply the DCOVA framework to the objective
of helping Levia acquire Whitney Wireless.
Chapter 1 Defining and Collecting Data
Contents
1.1 Defining Variables
1.2 Collecting Data
1.3 Types of Sampling Methods
1.4 Types of Survey Errors
Think About This: New Media Surveys/Old Sampling Problems
Using Statistics: Beginning of the End … Revisited
Chapter 1 Excel Guide
Chapter 1 Minitab Guide
Objectives
• Understand issues that arise when defining variables
• How to define variables
• How to collect data
• Identify the different ways to collect a sample
• Understand the types of survey errors
Business Statistics: A First Course, Seventh Edition, by David M. Levine, Kathryn A. Szabat, and David F. Stephan. Published by Pearson.
Copyright © 2016 by Pearson Education, Inc.
ISBN: 978-1-323-26258-0
1.1 Defining Variables
When Emma Levia decides to purchase Whitney Wireless, she has defined a new
goal or business objective for GT&M. Business objectives can arise from any
level of management and can be as varied as the following:
• A marketing analyst needs to assess the effectiveness of a new online advertising campaign.
• A pharmaceutical company needs to determine whether a new drug is more effective
than those currently in use.
• An operations manager wants to improve a manufacturing or service process.
• An auditor needs to review a company’s financial transactions to determine whether the
company is in compliance with generally accepted accounting principles.
Establishing an objective marks the end of a problem definition process. This end triggers
the new process of identifying the correct data to support the objective. In the GT&M scenario,
having decided to buy Whitney Wireless, Levia needs to identify the data that would be helpful
in setting a price for the wireless business. This process of identifying the correct data triggers
the start of applying the tasks of the DCOVA framework. In other words, the end of problem
definition marks the beginning of applying statistics to business decision making.
Identifying the correct data to support a business objective is a two-part job that requires
defining variables and collecting the data for those variables. These tasks are the first two tasks
of the DCOVA framework first defined in Section GS.1, which can be restated here as:
• Define the variables that you want to study to solve a problem or meet an objective.
• Collect the data for those variables from appropriate sources.
This chapter discusses these two tasks, which must always be done before the Organize, Visualize,
and Analyze tasks.
Defining variables at first may seem to be the simple process of making the list of things one
needs to help solve a problem or meet an objective. However, consider the GT&M scenario.
Most would quickly agree that yearly sales of Whitney Wireless would be part of the data
needed to meet Levia’s objective, but just placing “yearly sales” on a list could lead to confusion
and miscommunication: Does this variable refer to sales per year for the entire chain or
for individual stores? Does the variable refer to net or gross sales? Are the yearly sales values
expressed in number of units or as currency amounts such as U.S. dollar sales?
These questions illustrate that for each variable of interest that you identify you must supply
an operational definition, a universally accepted meaning that is clear to all associated
with an analysis. Operational definitions should also classify the variable, as explained in the
next section, and may include additional facts such as units of measures, allowed range of
values, and definitions of specific variable values, depending on how the variable is classified.
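One way to picture an operational definition is as a small record that answers, in writing, the questions raised above about “yearly sales.” The following is only a minimal sketch: the class name, its fields, and the particular answers chosen (net sales, entire chain, U.S. dollars, fiscal year) are hypothetical illustrations, not the definitions that the GT&M analysts would necessarily adopt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperationalDefinition:
    """Records what a variable means so that everyone reads it the same way."""
    name: str
    meaning: str
    variable_type: str                # "categorical" or "numerical"
    unit: Optional[str] = None        # unit of measure; None if not applicable
    minimum: Optional[float] = None   # smallest allowed value, if any

    def allows(self, value: float) -> bool:
        """Return True if a numerical value respects the allowed range."""
        return self.minimum is None or value >= self.minimum


# Hypothetical answers to the questions raised about "yearly sales".
yearly_sales = OperationalDefinition(
    name="yearly_sales",
    meaning="Net sales per fiscal year for the entire three-store chain",
    variable_type="numerical",
    unit="U.S. dollars",
    minimum=0.0,
)

print(yearly_sales.meaning)
print(yearly_sales.allows(1_250_000.0))  # made-up amount, used only to show the check
```

Writing the definition down in one place means that a value such as a negative sales figure can be flagged at collection time rather than discovered during analysis.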
Classifying Variables by Type
When you operationally define a variable, you must classify the variable as being either categorical
or numerical. Categorical variables (also known as qualitative variables) take categories
as their values. Numerical variables (also known as quantitative variables) have values
that represent a counted or measured quantity. Classification also affects a variable’s operational
definition, and getting the classification correct is important because certain statistical methods
can be applied correctly only to one type or the other, while other methods may need a specific mix
of variable types.
Categorical variables can take the form of yes-and-no questions such as “Do you have a
Twitter account?” (in which yes and no form the variable’s two categories) or describe a trait
or characteristic that has many categories such as undergraduate class standing (which might
have the defined categories freshman, sophomore, junior, and senior). When defining a categorical
variable, the list of permissible category values must be included and each category
value should be defined, too, e.g., that a “freshman” is a student who has completed fewer
than 32 credit hours. Overlooking these requirements can lead to confusion and incorrect data
collection. In one famous example, when persons were asked by researchers to fill in a value
for the categorical variable sex, many answered yes and not male or female, the values that the
researchers intended. (Perhaps this is the reason that gender has replaced sex on many data collection
forms: gender’s operational definition is more self-apparent.)
Student Tip
Providing operational definitions for concepts is important, too, when writing a textbook! The
end-of-chapter Key Terms gives you an index of operational definitions and the most fundamental
definitions are presented in boxes such as the page 3 box that defines variable and data.
The operational definitions of numerical variables are affected by whether the variable being
defined is discrete or continuous. Discrete variables such as “number of items purchased”
or “total amount paid” are numerical values that arise from a counting process. Continuous
variables such as “time spent on checkout line” or “distance from home to store” have numerical
values that arise from a measuring process and those values depend on the precision of the
measuring instrument used. For example, “time spent on checkout line” might be 2, 2.1, 2.14,
or 2.143 minutes, depending on the precision of the timing instrument being used. Units of
measures and the level of precision should be part of the operational definitions of continuous
variables, e.g., “tenths of a second” for “time spent on checkout line.” The definitions of any
numerical variable can include the allowed range of values, such as “must be greater than 0”
for “number of items purchased.”
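The difference between a counted and a measured value can be sketched in a few lines of code. The example below is an illustration under stated assumptions: the function names are hypothetical, and the checkout time is assumed to be recorded in seconds and stored to the nearest tenth of a second, following the precision mentioned above.

```python
def is_valid_item_count(value) -> bool:
    """Discrete: a whole-number count that must be greater than 0."""
    # bool is excluded explicitly because Python treats True/False as integers.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def record_checkout_time(seconds: float) -> float:
    """Continuous: store a measured time at the defined precision (tenths of a second)."""
    if seconds < 0:
        raise ValueError("time spent on checkout line cannot be negative")
    return round(seconds, 1)


print(is_valid_item_count(3))        # a count of 3 items passes the range check
print(is_valid_item_count(2.5))      # 2.5 items is not a count
print(record_checkout_time(128.44))  # the stored value keeps one decimal place
```

The count is either valid or not, whereas the measured time is always an approximation whose stored form depends on the precision written into the operational definition.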
When defining variables for survey collection (discussed in Section 1.2), thinking about
the responses you seek helps classify variables as Table 1.1 demonstrates. Thinking about how
a variable will be used to solve a problem or meet an objective can also be helpful when you
define a variable. The variable age might be a numerical (discrete) variable in some cases or
might be categorical with categories such as child, young adult, middle-aged, and retirement
aged in other contexts.
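The two treatments of age can be illustrated with a short sketch that maps a numerical age onto the categories named above. The cutoff ages used here (18, 40, and 65) are hypothetical; the text does not define them, and an actual operational definition would have to state them explicitly.

```python
def age_category(age_in_years: int) -> str:
    """Map a numerical age to a category, using hypothetical cutoffs."""
    if age_in_years < 0:
        raise ValueError("age cannot be negative")
    if age_in_years < 18:       # assumed cutoff, not from the text
        return "child"
    if age_in_years < 40:       # assumed cutoff, not from the text
        return "young adult"
    if age_in_years < 65:       # assumed cutoff, not from the text
        return "middle-aged"
    return "retirement aged"


for age in (10, 25, 50, 70):
    print(age, age_category(age))
```

Note that the mapping runs in one direction only: a category can be derived from a numerical age, but the original age cannot be recovered from the category.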
Table 1.1 Identifying Types of Variables
Question                                                       Responses         Variable Type
Do you have a Facebook profile?                                ❑ Yes ❑ No        Categorical
How many text messages have you sent in the past three days?   ______            Numerical (discrete)
How long did the mobile app update take to download?           ______ seconds    Numerical (continuous)
Learn More
Read the Short Takes for Chapter 1 for more examples of classifying variables as either
categorical or numerical.
Problems for Section 1.1
Learning the Basics
1.1 Four different beverages are sold at a fast-food restaurant:
soft drinks, tea, coffee, and bottled water. Explain why the
type of beverage sold is an example of a categorical variable.
1.2 U.S. businesses are listed by size: small, medium, and large. Explain
why business size is an example of a categorical variable.
1.3 The time it takes to download a video from the Internet is
measured. Explain why the download time is a continuous
numerical variable.
Applying the Concepts
SELF Test
1.4 For each of the following variables, determine
whether the variable is categorical or numerical. If the
variable is numerical, determine whether the variable is discrete or
continuous.
a. Number of cellphones in the household
b. Monthly data usage (in MB)
c. Number of text messages exchanged per month
d. Voice usage per month (in minutes)
e. Whether the cellphone is used for email
1.5 The following information is collected from students upon
exiting the campus bookstore during the first week of classes.
a. Amount of time spent shopping in the bookstore
b. Number of textbooks purchased
c. Academic major
d. Gender
Classify each of these variables as categorical or numerical. If the
variable is numerical, determine whether the variable is discrete or
continuous.
1.6 For each of the following variables, determine whether the
variable is categorical or numerical. If the variable is numerical,
determine whether the variable is discrete or continuous.
a. Name of Internet service provider
b. Time, in hours, spent surfing the Internet per week
c. Whether the individual uses a mobile phone to connect to the
Internet
d. Number of online purchases made in a month
e. Where the individual uses social networks to find sought-after
information
1.7 For each of the following variables, determine whether the
variable is categorical or numerical. If the variable is numerical,
determine whether the variable is discrete or continuous.
a. Amount of money spent on clothing in the past month
b. Favorite department store
c. Most likely time period during which shopping for clothing
takes place (weekday, weeknight, or weekend)
d. Number of pairs of shoes owned
1.8 Suppose the following information is collected from Robert
Keeler on his application for a home mortgage loan at the Metro
County Savings and Loan Association.
a. Monthly payments: $2,227
b. Number of jobs in past 10 years: 1
c. Annual family income: $96,000
d. Marital status: Married
Classify each of the responses by type of data.
1.9 One of the variables most often included in surveys is income.
Sometimes the question is phrased “What is your income
(in thousands of dollars)?” In other surveys, the respondent is
asked to “Select the circle corresponding to your income level”
and is given a number of income ranges to choose from.
a. In the first format, explain why income might be considered
either discrete or continuous.
b. Which of these two formats would you prefer to use if you
were conducting a survey? Why?
1.10 If two students score a 90 on the same examination,
what arguments could be used to show that the underlying
variable—test score—is continuous?
1.11 The director of market research at a large department store
chain wanted to conduct a survey throughout a metropolitan area
to determine the amount of time working women spend shopping
for clothing in a typical month.
a. Indicate the type of data the director might want to collect.
b. Develop a first draft of the questionnaire needed in (a) by writing
three categorical questions and three numerical questions
that you feel would be appropriate for this survey.
1.2 Collecting Data
After defining the variables that you want to study, you can proceed with the data collection
task. Collecting data is a critical task because if you collect data that are flawed by biases,
ambiguities, or other types of errors, the results you will get from using such data with even
the most sophisticated statistical methods will be suspect or in error. (For a famous example of
flawed data collection leading to incorrect results, read the Think About This essay on page 21.)
Data collection consists of identifying data sources, deciding whether the data you collect
will be from a population or a sample, cleaning your data, and sometimes recoding variables.
The rest of this section explains these aspects of data collection.
Data Sources
You collect data from either primary or secondary data sources. You are using a primary data
source if you collect your own data for analysis. You are using a secondary data source if the
data for your analysis have been collected by someone else.
You collect data by using any of the following:
• Data distributed by an organization or individual
• The outcomes of a designed experiment
• The responses from a survey
• The results of conducting an observational study
• Data collected by ongoing business activities
Market research companies and trade associations distribute data pertaining to specific industries
or markets. Investment services provide business and financial data on publicly listed
companies. Syndicated services such as The Nielsen Company provide consumer research data to
telecom and mobile media companies. Print and online media companies also distribute data that
they may have collected themselves or may be republishing from other sources.
The outcomes of a designed experiment are a second data source. For example, a consumer
electronics company might conduct an experiment that compares the sales of mobile
electronics merchandise for different store locations. Note that developing a proper experimental
design is mostly beyond the scope of this book, but Chapter 10 discusses some of the
fundamental experimental design concepts.
Survey responses represent a third type of data source. People being surveyed are asked
questions about their beliefs, attitudes, behaviors, and other characteristics. For example,
people could be asked which store location for mobile electronics merchandise is preferable.
(Such a survey could lead to data that differ from the data collected from the outcomes of the
designed experiment of the previous paragraph.) Surveys can be affected by any of the four
types of errors that are discussed in Section 1.4.
Observational study results are a fourth data source. A researcher collects data by directly
observing a behavior, usually in a natural or neutral setting. Observational studies are a common