Chapter #1:
Beginning of the End … Or the End of the Beginning?
The past few years have been challenging for Good Tunes & More (GT&M), a
business that traces its roots to Good Tunes, a store that exclusively sold music
CDs and vinyl records.
GT&M first broadened its merchandise to include home entertainment
and computer systems (the “More”), and then undertook an expansion to take
advantage of prime locations left empty by bankrupt former competitors. Today,
GT&M finds itself at a crossroads. Hoped-for increases in revenues that have
failed to occur and declining profit margins due to the competitive pressures of
online sellers have led management to reconsider the future of the business.
While some investors in the business have argued for an orderly retreat, closing
stores and limiting the variety of merchandise, GT&M CEO Emma Levia has decided to
“double down” and expand the business by purchasing Whitney Wireless, a successful
three-store chain that sells smartphones and other mobile devices.
Levia foresees creating a brand new “A-to-Z” electronics retailer but
first must establish a fair and reasonable price for the privately held Whitney
Wireless.
To do so, she has asked a group of analysts to identify the data that
would be helpful in setting a price for the wireless business. As part of that
group, you quickly realize that you need the data that would help to verify the
contents of the wireless company’s basic financial statements.
You focus on data associated with the company’s profit and loss statement
and quickly realize the need for sales and expense-related variables. You begin to
think about what the data for such variables would look like and how to collect those
data. You realize that you are starting to apply the DCOVA framework to the objective
of helping Levia acquire Whitney Wireless.
Chapter 1 Defining and Collecting Data
Contents
1.1 Defining Variables
1.2 Collecting Data
1.3 Types of Sampling Methods
1.4 Types of Survey Errors
Think About This: New Media Surveys/Old Sampling Problems
Using Statistics: Beginning of the End … Revisited
Chapter 1 Excel Guide
Chapter 1 Minitab Guide
Objectives
Understand issues that arise when defining variables
How to define variables
How to collect data
Identify the different ways to collect a sample
Understand the types of survey errors
Business Statistics: A First Course, Seventh Edition, by David M. Levine, Kathryn A. Szabat, and David F. Stephan. Published by Pearson.
Copyright © 2016 by Pearson Education, Inc.
ISBN: 978-1-323-26258-0
1.1 Defining Variables
In deciding to purchase Whitney Wireless, Emma Levia has defined a new
goal or business objective for GT&M. Business objectives can arise from any
level of management and can be as varied as the following:
• A marketing analyst needs to assess the effectiveness of a new online advertising campaign.
• A pharmaceutical company needs to determine whether a new drug is more effective
than those currently in use.
• An operations manager wants to improve a manufacturing or service process.
• An auditor needs to review a company’s financial transactions to determine whether the
company is in compliance with generally accepted accounting principles.
Establishing an objective marks the end of a problem definition process. This end triggers
the new process of identifying the correct data to support the objective. In the GT&M scenario,
having decided to buy Whitney Wireless, Levia needs to identify the data that would be helpful
in setting a price for the wireless business. This process of identifying the correct data triggers
the start of applying the tasks of the DCOVA framework. In other words, the end of problem
definition marks the beginning of applying statistics to business decision making.
Identifying the correct data to support a business objective is a two-part job that requires
defining variables and collecting the data for those variables. These tasks are the first two tasks
of the DCOVA framework, first defined in Section GS.1, which can be restated here as:
• Define the variables that you want to study to solve a problem or meet an objective.
• Collect the data for those variables from appropriate sources.
This chapter discusses these two tasks, which must always be done before the Organize, Visualize,
and Analyze tasks.
Defining variables at first may seem to be the simple process of making the list of things one
needs to help solve a problem or meet an objective. However, consider the GT&M scenario.
Most would quickly agree that yearly sales of Whitney Wireless would be part of the data
needed to meet Levia’s objective, but just placing “yearly sales” on a list could lead to confusion
and miscommunication: Does this variable refer to sales per year for the entire chain or
for individual stores? Does the variable refer to net or gross sales? Are the yearly sales values
expressed in number of units or as currency amounts such as U.S. dollar sales?
These questions illustrate that for each variable of interest that you identify you must supply
an operational definition, a universally accepted meaning that is clear to all associated
with an analysis. Operational definitions should also classify the variable, as explained in the
next section, and may include additional facts such as units of measures, allowed range of
values, and definitions of specific variable values, depending on how the variable is classified.
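One way to make an operational definition concrete is to write it down as a structured record. The following is a minimal sketch in Python, not a standard format: the class name `OperationalDefinition` and all of its field names are assumptions made for illustration, and the "yearly sales" entry is one hypothetical way of answering the questions raised above (entire chain, net sales, U.S. dollars).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class OperationalDefinition:
    """One variable's operational definition (field names are illustrative)."""
    name: str
    meaning: str                       # the agreed-upon meaning, in plain words
    var_type: str                      # "categorical" or "numerical"
    units: Optional[str] = None        # units of measure, if any
    allowed_range: Optional[Tuple[float, float]] = None  # (min, max)
    categories: Optional[Tuple[str, ...]] = None         # permissible categories

    def __post_init__(self):
        # Every definition must classify the variable.
        if self.var_type not in ("categorical", "numerical"):
            raise ValueError("var_type must be 'categorical' or 'numerical'")
        # A categorical variable must list its permissible category values.
        if self.var_type == "categorical" and not self.categories:
            raise ValueError("a categorical variable needs its categories")


# Hypothetical definition: the scope, net-versus-gross choice, and units
# are assumptions chosen for this example, not facts from the scenario.
yearly_sales = OperationalDefinition(
    name="yearly_sales",
    meaning="Net sales per fiscal year for the entire Whitney Wireless chain",
    var_type="numerical",
    units="U.S. dollars",
    allowed_range=(0.0, float("inf")),
)
print(yearly_sales.meaning, "in", yearly_sales.units)
```

Writing the definition this way forces each open question to be answered before any data are collected.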
Classifying Variables by Type
When you operationally define a variable, you must classify the variable as being either categorical
or numerical. Categorical variables (also known as qualitative variables) take categories
as their values. Numerical variables (also known as quantitative variables) have values
that represent a counted or measured quantity. Classification also affects a variable’s operational
definition, and getting the classification correct is important because certain statistical methods
can be applied correctly only to one type or the other, while other methods may need a specific mix
of variable types.
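To see why classification matters, consider a short Python sketch in which the summary computed depends on the variable's type. The function name `summarize` and the sample values are made up for illustration; the point is only that categories can be counted but not averaged, while numerical values can be averaged.

```python
from collections import Counter
from statistics import mean


def summarize(values, var_type):
    """Return a summary suited to the variable's classification."""
    if var_type == "categorical":
        # Categories can be tallied, but a mean would be meaningless.
        return dict(Counter(values))
    if var_type == "numerical":
        return mean(values)
    raise ValueError("var_type must be 'categorical' or 'numerical'")


has_twitter = ["yes", "no", "yes", "yes"]   # made-up survey responses
items_purchased = [3, 1, 4, 2]              # made-up counts

print(summarize(has_twitter, "categorical"))
print(summarize(items_purchased, "numerical"))
```

If a categorical variable were misclassified as numerical, the call would fail or, worse, produce a number with no meaning.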
Student Tip
Providing operational definitions for concepts is important, too, when writing a textbook! The
end-of-chapter Key Terms gives you an index of operational definitions and the most fundamental
definitions are presented in boxes such as the page 3 box that defines variable and data.
Categorical variables can take the form of yes-and-no questions such as “Do you have a
Twitter account?” (in which yes and no form the variable’s two categories) or describe a trait
or characteristic that has many categories such as undergraduate class standing (which might
have the defined categories freshman, sophomore, junior, and senior). When defining a categorical
variable, the list of permissible category values must be included and each category
value should be defined, too, e.g., that a “freshman” is a student who has completed fewer
than 32 credit hours. Overlooking these requirements can lead to confusion and incorrect data
collection. In one famous example, when persons were asked by researchers to fill in a value
for the categorical variable sex, many answered yes and not male or female, the values that the
researchers intended. (Perhaps this is the reason that gender has replaced sex on many data collection
forms—gender’s operational definition is more self-apparent.)
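The two requirements for a categorical variable, a list of permissible values and a definition of each value, can be sketched in Python as follows. Only the freshman cutoff (fewer than 32 credit hours) comes from the text; the sophomore and junior cutoffs, and the function names `class_standing` and `validate_category`, are assumptions invented for this example.

```python
# Upper bounds (exclusive) in completed credit hours.
# Only the freshman bound is from the text; the others are hypothetical.
CLASS_STANDING_UPPER_BOUNDS = [
    ("freshman", 32),
    ("sophomore", 64),   # hypothetical cutoff
    ("junior", 96),      # hypothetical cutoff
]


def class_standing(credit_hours):
    """Map completed credit hours to a defined class-standing category."""
    if credit_hours < 0:
        raise ValueError("credit hours cannot be negative")
    for category, upper_bound in CLASS_STANDING_UPPER_BOUNDS:
        if credit_hours < upper_bound:
            return category
    return "senior"


def validate_category(response, permissible):
    """Return the cleaned response if it is a permissible value."""
    cleaned = response.strip().lower()
    if cleaned not in permissible:
        raise ValueError(f"{response!r} is not one of {sorted(permissible)}")
    return cleaned


print(class_standing(31))
print(validate_category(" Male ", {"male", "female"}))
```

With the permissible values stated up front, a response such as "yes" to the variable sex is rejected at collection time instead of surfacing later as bad data.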
The operational definitions of numerical variables are affected by whether the variable being
defined is discrete or continuous. Discrete variables such as “number of items purchased”
or “total amount paid” are numerical values that arise from a counting process. Continuous
variables such as “time spent on checkout line” or “distance from home to store” have numerical
values that arise from a measuring process and those values depend on the precision of the
measuring instrument used. For example, “time spent on checkout line” might be 2, 2.1, 2.14,
or 2.143 minutes, depending on the precision of the timing instrument being used. Units of
measures and the level of precision should be part of the operational definitions of continuous
variables, e.g., “tenths of a second” for “time spent on checkout line.” The definitions of any
numerical variable can include the allowed range of values, such as “must be greater than 0”
for “number of items purchased.”
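The effect of instrument precision, and the use of an allowed range, can be illustrated with a short Python sketch. The helper names `record_measurement` and `validate_count` are hypothetical, and the raw reading of 2.143 minutes is the example value from the paragraph above, not real data.

```python
from decimal import Decimal


def record_measurement(raw, precision):
    """Round a measured value to the precision named in its definition.

    `raw` is the reading as a string; `precision` is a string such as
    "0.1" (tenths) or "0.01" (hundredths).
    """
    return Decimal(raw).quantize(Decimal(precision))


def validate_count(value):
    """Accept only whole numbers greater than 0, e.g. items purchased."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("a count must be a whole number")
    if value <= 0:
        raise ValueError("a count must be greater than 0")
    return value


raw_minutes = "2.143"   # time spent on checkout line, as measured
for precision in ("1", "0.1", "0.01", "0.001"):
    print(precision, record_measurement(raw_minutes, precision))

print(validate_count(3))
```

The same underlying time is recorded as four different values depending on the stated precision, which is why the precision belongs in the operational definition.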
When defining variables for survey collection (discussed in Section 1.2), thinking about
the responses you seek helps classify variables as Table 1.1 demonstrates. Thinking about how
a variable will be used to solve a problem or meet an objective can also be helpful when you
define a variable. The variable age might be a numerical (continuous) variable in some cases or
might be categorical with categories such as child, young adult, middle-aged, and retirement
aged in other contexts.
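The same age data can serve either purpose, as this Python sketch suggests. The category labels come from the paragraph above, but the cutoff ages (18, 40, and 65) are assumptions chosen only for illustration; an actual operational definition would have to state them explicitly.

```python
from bisect import bisect_right

# Hypothetical lower bounds, in years, for the second through fourth labels.
AGE_CUTOFFS = [18, 40, 65]
AGE_LABELS = ["child", "young adult", "middle-aged", "retirement aged"]


def age_category(age_years):
    """Convert a numerical age into one of the defined categories."""
    if age_years < 0:
        raise ValueError("age cannot be negative")
    # bisect_right counts how many cutoffs are less than or equal to the age.
    return AGE_LABELS[bisect_right(AGE_CUTOFFS, age_years)]


ages = [7, 18, 39.5, 65]   # made-up ages in years
print([age_category(a) for a in ages])
```

Keeping the numerical value and deriving the category from it preserves both uses; going from categories back to exact ages is not possible.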
Problems for Section 1.1
Learning the Basics
1.1 Four different beverages are sold at a fast-food restaurant:
soft drinks, tea, coffee, and bottled water. Explain why the
type of beverage sold is an example of a categorical variable.
1.2 U.S. businesses are listed by size: small, medium, and large. Explain
why business size is an example of a categorical variable.
1.3 The time it takes to download a video from the Internet is
measured. Explain why the download time is a continuous
numerical variable.
Applying the Concepts
SELF Test
1.4 For each of the following variables, determine
whether the variable is categorical or numerical. If the
variable is numerical, determine whether the variable is discrete or
continuous.
a. Number of…