CHAPTER 5
Analyzing Performance Measures
Think of data as the raw materials that you convert into information. Data by themselves, however, are not likely to be very useful. Column after column of numbers mean very little. To make the data meaningful you need to organize and present them, that is, the data need to become information. As you skim a newspaper, listen to a presentation, or read a report you may mistakenly assume that creating the graphs and statistics took little work. This is not true. Someone thought through how to present the data so that you and others could quickly understand and interpret them.
One of your tasks as a program manager is to decide how to organize and present data. There is no single best way to graph or analyze a set of data. You may create several graphs and try different statistics as you search for patterns that make sense. In this chapter we focus on the basic tasks for organizing performance data: entering data into a spreadsheet, creating tables and graphs, and describing variations in individual variables. The same skills apply to surveys, program evaluations, and community assessment, which we cover in later chapters. Also note that in Chapter 8 we will discuss analyzing relationships between and among variables. First, however, we will cover the terminology of measurement scales. Familiarity with these terms will facilitate our discussion of various statistics in this chapter and later.
MEASUREMENT SCALES
Measurement scales or levels of measurement describe the relationship among the values of a variable. You will find the terminology associated with measurement scales useful as you decide what statistics to use. The basic scales are nominal, ordinal, interval, and ratio scales.
Nominal scales identify and label the values of a variable. You cannot place the values of a nominal variable along a continuum; nor can you rank individual cases according to their values. Even though numbers are sometimes assigned, these numbers have no particular importance beyond allowing you to classify and count how many cases belong in each category. For example, imagine that an organization, the Happy Housing Center, records why people seek its services. The variable Reason for Seeking Services has four values: “Laid off or lost job,” “Rental housing needs repairs,” “Rent increased,” and “Eviction.” A nominal scale reports how many requests for services are in each category:
1 = Laid off or lost job
2 = Rental housing needs repairs
3 = Rent increased
4 = Eviction
The numbers are simply a device to identify categories; letters of the alphabet or other symbols could replace the numbers and the meaning of the scale would be unchanged. Remember, too, that values of nominal scales are not ranked. Thus, the numbering system in our example does not imply that an eviction has a greater or lesser value than being laid off.
Ordinal scales identify and categorize values of a variable and put the values in rank order. Ordinal scales rank the values without regard to the distance between values. Ordinal scales report that one case has more or less of the characteristic than another case does. If you can rank values but do not know how far apart they are, you have an ordinal scale. You assign numbers to the values in the same order as the ranking implied by the scale. For example, the value represented by 3 is greater than the value represented by 2, and the value represented by 2 is greater than the value represented by 1. The numbers indicate only that one value is more or less than another; they do not imply that a value represents an amount. Let’s look at how you could assign numbers to respondents’ answers to the statement “The Happy Housing Center staff provided me with accurate information.”
5 = Strongly agree
4 = Agree
3 = Neither agree nor disagree
2 = Disagree
1 = Strongly disagree
You can use other numbering schemes as long as the numbers preserve the rank order of the categories. For example, you could reverse the order and number Strongly agree as 1 and Strongly disagree as 5. Alternatively, you could skip numbers and number the categories 10, 8, 6, 4, and 2. Because you cannot determine the distance between values, you cannot argue that a client who answers “Strongly agree” to all items is five times more satisfied than a client who answers “Strongly disagree” to all items.
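The point that any rank-preserving numbering works can be checked with a short sketch. The four client answers and the helper name `least_to_most_agreement` are hypothetical and serve only as illustration; the three coding schemes are the ones described above.

```python
# Three numbering schemes for the same five ordinal categories.
CATEGORIES = [
    "Strongly disagree",
    "Disagree",
    "Neither agree nor disagree",
    "Agree",
    "Strongly agree",
]
standard = dict(zip(CATEGORIES, [1, 2, 3, 4, 5]))
skipped = dict(zip(CATEGORIES, [2, 4, 6, 8, 10]))
reverse_coded = dict(zip(CATEGORIES, [5, 4, 3, 2, 1]))

# Hypothetical answers from four clients (not data from the chapter).
answers = ["Agree", "Strongly disagree", "Strongly agree", "Disagree"]


def least_to_most_agreement(answers, codes, higher_means_more=True):
    """Sort answers from least to most agreement under a given coding."""
    return sorted(answers, key=lambda a: codes[a], reverse=not higher_means_more)


order_standard = least_to_most_agreement(answers, standard)
order_skipped = least_to_most_agreement(answers, skipped)
order_reversed = least_to_most_agreement(
    answers, reverse_coded, higher_means_more=False
)

# All three codings put the clients in the same order.
print(order_standard == order_skipped == order_reversed)
```

The codes differ, but the ordering of clients does not, which is all an ordinal scale claims to tell you.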
Rankings commonly produce an ordinal scale. For example, a supervisor may rank 10 employees and give the best employee a 10 and the worst a 1. The persons rated 10 and 9 may be exceptionally good, and the supervisor may have a hard time deciding which one is better. The employee rated with 8 may be good, but not nearly as good as the top two. Hence, the difference between employee 10 and employee 9 may be very small and much less than the difference between employee 9 and employee 8.
Interval and ratio scales assign numbers corresponding to the magnitude of the variable being measured. Interval scales do not have an absolute zero; ratio scales do. The most common example of an interval scale is the temperature scale. We know that 40°F is 20° warmer than 20°F, but we cannot say that 40° is twice as warm as 20°. In the Fahrenheit temperature scale, zero is an arbitrary point; heat exists at 0°F.
The numbers you assign to a ratio scale could be the actual number of persons working in an agency, the number of homeless persons in a given year, or the amount of per capita income in a city. You can add or subtract the values in a ratio scale. If the Happy Housing Center had 100 service requests in January and 50 in February, you can say that the number of requests fell by 50 in February. You may also note that the center had half as many requests in February. And at zero, there are no service requests. Table 5.1 summarizes information on these four levels of measurement.
In practice, the boundaries between ordinal and interval scales and interval and ratio scales may be blurred. If an ordinal scale has a large number of values, analysts may assume that it approximates an interval scale. Similarly, the summed values from a set of questions, such as the six questions that measured orientation quality in Chapter 2, may be treated as a ratio scale.
You may mistakenly assume that categories consisting of numerical data form a ratio scale. They do not. Rather, such a scale would be ordinal. For example, the following categories constitute an ordinal scale: under 20 years old, 20 to 29 years old, 30 to 39 years old, and so forth. The exact distance, that is, the age difference between any two people, cannot be determined. While we know a person who checks off 20 to 29 years old is younger than a person who checks off 30 to 39 years old, the age difference between them could be a few days or nearly 20 years.
TABLE 5.1
Levels of Measurement
ENTERING DATA ON A SPREADSHEET
The first step in organizing data is to enter them into a spreadsheet. A spreadsheet may be a piece of paper with rows and columns or a software tool, such as Microsoft Excel. When you open an electronic spreadsheet a workbook with rows and columns appears on the computer screen. Each column can represent a variable and each row can represent a case. You can enter numbers or text into each cell. Entering data is relatively easy. In addition to storing data most spreadsheet programs can calculate basic statistics and create graphs. The graphs can be moved into text documents as part of a report. The data can be readily transferred into a sophisticated statistical software package for more intense analysis.
Before you enter the data you should have a plan for how you will use them. You will want to decide whether to enter words or letters as opposed to numbers. Statistical analysis is easier with numbers, but words and letters are easier to understand. To illustrate other common decisions, consider the data on Robert and Anne, two clients who visited a Happy Housing Center. The counselor who worked with them recorded the following.
Name         Gender    Age    Reason for Requesting Services
F. Robert    Male      54     Needs Help Obtaining Apartment
P. Anne      Female    39     Received Eviction Notice
The names and the values of the other variables, except age, could be entered as text, but statistical analysis may be easier if numbers represent the values for gender and reason for requesting services. A “1” may be entered for male and a “2” for female; a “1” may be assigned to Needs Help Obtaining Apartment, a “2” to Received Eviction Notice, and so on. If numbers were used, the information for Robert and Anne would look like this:
Name         Gender    Age    Reason for Requesting Services
F. Robert    1         54     1
P. Anne      2         39     2
You can decide what numbers to assign to a value. Instead of “1” for males and “2” for females, other numbers could be used. If the values constitute an ordinal scale, the numbers assigned should correspond to a continuum. Responses ranging from Very satisfied to Very dissatisfied could just as easily have been coded “5” for Very dissatisfied and “1” for Very satisfied.
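One way to keep text and numeric versions of the same records in step is to write the coding scheme down once as a codebook and apply it to every case. The sketch below does this for Robert and Anne; the dictionary names and the `encode` helper are assumptions made for illustration, not part of any particular spreadsheet program.

```python
# Codebooks: each text value maps to the number entered in the spreadsheet.
GENDER_CODES = {"Male": 1, "Female": 2}
REASON_CODES = {
    "Needs Help Obtaining Apartment": 1,
    "Received Eviction Notice": 2,
}

# The two client records as the counselor wrote them down.
clients = [
    {"name": "F. Robert", "gender": "Male", "age": 54,
     "reason": "Needs Help Obtaining Apartment"},
    {"name": "P. Anne", "gender": "Female", "age": 39,
     "reason": "Received Eviction Notice"},
]


def encode(client):
    """Return a copy of the record with gender and reason replaced by codes."""
    return {
        "name": client["name"],
        "gender": GENDER_CODES[client["gender"]],
        "age": client["age"],
        "reason": REASON_CODES[client["reason"]],
    }


coded = [encode(c) for c in clients]
for row in coded:
    print(row)
```

Keeping the codebook in one place also documents what each number means, which matters when someone else analyzes the data later.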
Data for a given case may be incomplete. Data in existing records may be missing, a respondent may have refused to answer a question, or some items on a form may not be legible. A missing value may be represented by “missing,” or “not applicable,” or a 9, 99, or another number that is noticeably larger than the other values. A large number may enable you to easily identify and track missing data. It also reduces the risk of including missing data in statistical calculations, because a 9 or 99 will often yield a statistical result that doesn’t make sense. For example, assume that the Housing Center also recorded the number of times a client had been evicted. If the arithmetic average for this turned out as a high number, say 65, then you would know something was wrong. However, if a missing answer for this variable was entered as 0, then you might not notice that this value was included in calculating the mean.
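The effect of the missing-value code on an average can be seen in a small sketch. The six eviction counts are hypothetical, and 99 is assumed to be the code chosen for a missing answer.

```python
from statistics import mean

MISSING = 99  # assumed code for a missing answer

# Hypothetical number of prior evictions for six clients; two did not answer.
evictions = [0, 1, MISSING, 2, 0, MISSING]

# Mistake 1: the missing code is left in the calculation.
mean_with_code = mean(evictions)  # 33.5, obviously implausible

# Mistake 2: missing answers were entered as 0 instead of 99.
entered_as_zero = [0 if v == MISSING else v for v in evictions]
mean_as_zero = mean(entered_as_zero)  # 0.5, plausible but wrong

# Correct: drop the missing cases before averaging.
valid = [v for v in evictions if v != MISSING]
mean_valid = mean(valid)  # 0.75

print(mean_with_code, mean_as_zero, mean_valid)
```

The implausible 33.5 signals a problem immediately, whereas the 0.5 produced by zero-coded missing answers would probably go unnoticed.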
COUNTING THE VALUES: FREQUENCY DISTRIBUTIONS
The first step in any analysis is to obtain a frequency distribution for each variable. A frequency distribution lists the values or categories for the variable and the number of cases with each value. For example, a frequency distribution of gender in the database containing Robert’s and Anne’s information would report the number of males and females. A frequency distribution for age would report the number of respondents who were 22, 23, 24 years old, and so on. More likely you would combine the age categories, for example, less than 20 years old, 20–29 years old, 30–39 years old, and so on. The categories must be set so that you count each case in one and only one category.
Some variables lend themselves to grouping. If the differences between some groups’ values are of little interest, you may combine responses. For example, you may combine “Strongly Agree” with “Agree” and “Strongly Disagree” with “Disagree.” A variable, such as income, may have so many values that you cannot easily discern how it varies without grouping the values in some way. Presenting each value of income separately may result in a large number of values, most of which will have few, if any, cases. Therefore, you need to decide how many categories to create and how wide each category should be. The categories should not be so wide that important differences are overlooked. Nor should they be so narrow that many intervals are required. Using equal intervals to group ages, by decades as shown in Table 5.2, is a common way to create categories, but for some variables equal intervals may mask important differences among cases. A frequency distribution for income will often have many cases in the lower income categories and few in the higher income categories. To give a more accurate picture you may want to create narrower categories at the lower income levels and wider intervals at the higher income levels.
TABLE 5.2
Frequency Distribution of Ages of Happy Housing Center Clients
Age (in Years)    Number of Cases
Less than 20      2
20–29             6
30–39             12
40–49             22
50–59             20
60–69             17
70–79             3
Total             82
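A frequency distribution with grouped categories can be sketched in a few lines. The twelve ages below are hypothetical (the chapter does not list individual client ages), and `age_category` is an illustrative helper that assigns each case to exactly one decade-wide category.

```python
from collections import Counter

# Hypothetical client ages, for illustration only.
ages = [18, 22, 25, 31, 34, 38, 41, 45, 47, 52, 58, 63]


def age_category(age):
    """Place an age in one, and only one, category."""
    if age < 20:
        return "Less than 20"
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


frequencies = Counter(age_category(a) for a in ages)

for category, count in frequencies.items():
    print(f"{category:<14}{count}")
print(f"{'Total':<14}{sum(frequencies.values())}")
```

Because every age maps to a single category, the category counts always add up to the number of cases.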
Relative Frequency Distribution
Relative frequency distributions report the percentage of cases for each value. (A percent is calculated by dividing the frequency of one value of the variable by the total number of cases and multiplying the result by 100.) The percentages allow you to compare the frequency of different values in the same distribution or to compare values in two or more frequency distributions that have different numbers of cases. Almost everyone is familiar with percentages and can quickly interpret them. In our experience politicians and the public are comfortable with findings that are either preceded by a dollar sign or followed by a percent. So don’t hesitate to report percentages and make them the focus of a presentation.
You can report frequency distributions and relative frequency distributions in the same table. Uncluttered tables are easier to read; therefore, you may want to report the relative frequency for each value and include enough information so that a reader can determine the number of cases represented by each percent. You may report the total number of cases as a column total as in Table 5.2, in the table title, or as part of a row or column label as in Table 5.3.
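As a minimal sketch of the percent calculation, the code below converts the counts in Table 5.2 to percentages rounded to one decimal place. The variable names are illustrative.

```python
# Counts from Table 5.2.
counts = {
    "Less than 20": 2,
    "20-29": 6,
    "30-39": 12,
    "40-49": 22,
    "50-59": 20,
    "60-69": 17,
    "70-79": 3,
}

total = sum(counts.values())  # 82 cases

# Percent = (frequency of the category / total number of cases) * 100
percents = {
    category: round(100 * n / total, 1) for category, n in counts.items()
}

for category, pct in percents.items():
    print(f"{category:<14}{pct:>5}")
```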
Cumulative Relative Frequency Distribution
You may want to show more than just the percentage of cases in a specific category; for example, you may want to indicate the percentage of cases below a given value. A cumulative relative frequency distribution provides this information. To obtain the cumulative percentage, add together the percentages up to and including the given category. Table 5.3 includes the relative frequency distribution and cumulative percentage distribution for the data in Table 5.2. The heading of the Percentage of Cases column gives the total number of cases (N = 82), so you can multiply the total number of cases by the percentage to learn how many cases are found in each category. Remember to convert the percentage to a decimal before multiplying. The last column reports the cumulative percentage for the distribution of ages. Note, for example, that 24.4 percent of the clients seeking housing assistance were under the age of 40. Note also that the Number of Cases column of Table 5.2 is omitted from Table 5.3.
TABLE 5.3
Distribution of Client Age in Happy Housing Center
Age (in Years)    Percentage of Cases (N = 82)    Cumulative Percentage
Less than 20      2.4                             2.4
20–29             7.3                             9.7
30–39             14.6                            24.4
40–49             26.8                            51.2
50–59             24.4                            75.6
60–69             20.7                            96.3
70–79             3.7                             100.0
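The cumulative percentages can be sketched from the Table 5.2 counts as follows. This version accumulates the counts first and rounds at the end, which is an assumption about method; a table built by adding already-rounded percentages can differ by 0.1 in places (for example, 2.4 + 7.3 = 9.7 for the 20–29 row, where accumulating counts gives 9.8).

```python
from itertools import accumulate

# Counts from Table 5.2, in age order.
counts = [
    ("Less than 20", 2),
    ("20-29", 6),
    ("30-39", 12),
    ("40-49", 22),
    ("50-59", 20),
    ("60-69", 17),
    ("70-79", 3),
]

total = sum(n for _, n in counts)                    # 82
running = list(accumulate(n for _, n in counts))     # 2, 8, 20, 42, ...
cumulative = [round(100 * r / total, 1) for r in running]

for (category, _), cum in zip(counts, cumulative):
    print(f"{category:<14}{cum:>6}")

# Recovering a count from a reported percentage: convert the percentage
# to a decimal, multiply by N, and round to a whole number of cases.
cases_40_to_49 = round(total * 26.8 / 100)  # 22
```

The third value, 24.4, is the share of clients under the age of 40.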
Presenting Data Visually
Once you know the frequencies, you need to decide how to present them. Visual presentations of data often illustrate points more clearly than do verbal descriptions. Some people who are not comfortable with tables and statistics seem to have no trouble interpreting a well-done graph. Visual displays¹ should
■ serve a clear purpose: to describe, explore, or elaborate;
■ show the data and make them coherent;
■ encourage the eye to compare different pieces of data;
■ entice the reader or listener to think about the information;
■ avoid distorting what the data have to say; and
■ enhance the statistical and verbal descriptions of the data.
A graph should be able to stand on its own. You should give each graph a descriptive title and label its variables and their values. Use footnotes to clarify any terms that need an explanation to be interpreted correctly and to identify the source of the data. You will find it useful to indicate the date when a graph was produced because, as you analyze the data, you may create multiple versions as you correct errors and make changes in how you combine values. If graphs are not dated you can lose time trying to determine which version is the most recent.
Pie Charts
A pie chart consists of a complete circle, or pie, with wedges. The circle represents 100 percent of the values of the variable displayed. The size of each wedge or “slice of the pie” corresponds to each value’s percentage of the total. Figure 5.1 depicts a pie chart showing the primary reason why people requested services from the Happy Housing Center.
A common convention is to place the largest slice of the pie at the 12 o’clock position. The other slices should follow in a clockwise direction according to size, with the smallest slice last. This rule, however, does not apply if you create two pie charts to make a comparison, for example, if you want to compare the reasons for requesting services from the Happy Housing Center last year and 10 years ago.
Pie charts work well for oral presentations and short written reports, because an audience can quickly understand the depicted information. They may effectively illustrate differences among organizations, locations, and dates. You should avoid using a pie chart that requires a large number of slices. An audience may find it difficult to differentiate between the slices, especially if it is comparing two charts. Furthermore, with many slices you may have trouble finding distinctive colors or patterns to distinguish one slice from another.
FIGURE 5.1 Reason for Visiting Homeless Office – Pie Chart
Bar Graphs and Histograms
Bar graphs are alternatives to pie charts. You place the value of the variable along one axis and the frequency or percentage of cases along the other. The length of the bar indicates the number or percentage of cases possessing each value of the variable. Figure 5.2 depicts a bar graph that includes the same data as the Figure 5.1 pie chart.
Whether the bars are vertical or horizontal depends on which arrangement communicates more effectively and clearly. All bars in a graph should have the same width. If you use different widths for the bars, you risk implying that some values are more important than others, which is misleading. You can also use bar charts to compare organizations, locations, or dates. For example, if you had historical data you could compare this year’s data in Figure 5.2 to data from 10 years ago. To do this you would place a bar representing the percentage laid off 10 years ago next to the first column. The percentage whose rental house needs repairs would go next to the second column and so on.
A histogram represents ratio variables. Figure 5.3 shows an example of a histogram. Each column of the histogram represents a range of values; for example, in Figure 5.3 the second column represents the range 20–24 years old, and the third column the range 25–29 years. The columns adjoin one another because the range of values is continuous. The variable and its values are displayed along the horizontal axis, and the frequency or percentage of cases is displayed along the vertical axis. A histogram is similar to a bar graph, but unlike the bars of a bar graph, its columns can vary in width. For example, the widths would vary if the age groupings for the columns were as follows: less than 20, 20–24, 25–29, 30–44, 45–59, and 60 and over.
FIGURE 5.2 Reason for Visiting Homeless Office – Bar Chart
FIGURE 5.3 Age of Clients of Homeless Office
Time Series Graphs
To track performance over time you will want to use time series graphs. They are valuable to monitor performance, show changes, and demonstrate the impact of a policy. Users can easily and quickly discern changes from one time period to another, trends over time, and the frequency and extent of irregular fluctuations. (Recall that Chapter 4 discussed the changes over time you should look for.) To create a time series graph you put time, whether it is days, months, or years, on the horizontal axis, and the values of the variable on the vertical axis. For each time period place a dot at the intersection of the time period and the variable’s value, and draw a line to connect the dots. Figure 5.4 shows an example of a time series graph.
FIGURE 5.4 Number of Clients by Month
RATES AND PERCENTAGE CHANGE
We were tempted to title this section “putting your elementary school math to use.” Calculation of rates and percentage changes requires only basic math skills. Both rates and percentage changes contain valuable information that policy makers and the public can understand and react to.
Rates
Rates report the number of cases experiencing an event as a proportion of the number of cases that could have experienced that event over a specific time period. Commonly reported rates include the unemployment rate and the crime rate. The unemployment rate reports the number of unemployed individuals as a percentage of the number who could have been employed (employed + unemployed). (The discerning reader will note that the number who “could have been employed” must be carefully defined. One common definition includes the “employable population actively looking for work.” This would exclude those who are not seeking employment.) You may report rates as percents or use a base number other than 100. For example, cities, states, and nations report the annual rates of violent crimes as the number of occurrences for every 1,000 residents.
Assume that a county agency wants to compare the extent of homelessness in its community with that of other jurisdictions. Knowing the number of homeless may be valuable, but knowing how the problem in large cities compares with that in small cities or suburbs is also important. Rates allow such comparisons even though the cities may vary greatly in size.
An important decision is what to put in the denominator, that is, the number who could have been homeless. For many rates, the denominator is population size, but not always. As our definition of rates implies, the denominator for the unemployment rate excludes certain population groups, such as the very young. The selection of the denominator may appear to be somewhat arbitrary. Take, for example, contraceptive use. To compare data on contraceptive use by putting the entire population (which also includes men and children) in the denominator would not give an accurate picture. A far better method would be to use either the number of women of childbearing age or the number of married women. Either denominator more accurately estimates the number at risk; at risk is another way of thinking about the number of possible occurrences. Deciding between the number of women of childbearing age and the number of married women may largely depend on the availability of data.
Consider two counties, Moburg and Robus, and the number of infant deaths in each. In one year Moburg had 104 infant deaths and Robus had 20. Which community had the greater problem? Directly comparing the number of deaths would be misleading because Moburg has 511,400 inhabitants and Robus has 106,000. Dividing the frequency of infant deaths in each county by the county population produces a more useful comparison. For Moburg we divided 104 by 511,400, which equals about 0.0002034. For Robus, we divided 20 by 106,000 and obtained about 0.0001887. The decimal values are so small that they might be ignored or interpreted incorrectly. Multiplying each decimal by 10,000 converts the data into figures that are more easily understood. We would report that Moburg has 2.034 infant deaths per 10,000 inhabitants and Robus has 1.887 infant deaths per 10,000 inhabitants.
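The equation to compute a rate is
\[ \text{Rate} = \frac{N_1}{N_2} \times \text{base number} \]
where
\(N_1\) = count for variable of interest
\(N_2\) = population or another indicator of number of cases at risk
Base number = a power of 10 (such as 100, 1,000, or 10,000)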
The following conventions apply in selecting a base number. Remember that these are conventions, not absolute rules.
■ Be consistent and report rates in common use for a specific variable. Crime rates, for example, are usually reported as crimes per 1,000 of the population. Homicide rates, however, may be reported per 100,000 of the population.
■ Select a base number that produces rates whose whole-number part has at least one digit and no more than four digits.
■ Use the same base number when calculating rates for comparison. For instance, in the example just mentioned, you should not use a base number of 1,000 for Moburg and a base number of 10,000 for Robus.
■ Note that a rate is meaningful only if it is specified for a particular time period, usually a year.
As noted earlier, you should also consider whether the entire population is the appropriate denominator for a rate.
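The rate calculation for the two counties can be sketched as a small function. The function name `rate` and its parameter names are illustrative; the counts and populations are those given for Moburg and Robus, and both rates use the same base number so that they can be compared.

```python
def rate(count, at_risk, base=10_000):
    """Cases experiencing the event per `base` cases at risk."""
    return count / at_risk * base


# Infant deaths and total population for each county, one year.
moburg = rate(104, 511_400)
robus = rate(20, 106_000)

print(f"Moburg: {moburg:.3f} per 10,000 inhabitants")
print(f"Robus:  {robus:.3f} per 10,000 inhabitants")
```

Changing the `at_risk` argument is all that is needed to try a different denominator, such as a narrower at-risk group in place of the whole population.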
Percentage Change
The percentage change measures the amount of change between two points in time. For example, organizations in Moburg County have a campaign to reduce homelessness every year over the next decade. If they know the number of homeless people in any two years, they can report the change as a percent.
The formula for percentage change is
\[ \text{Percentage change} = \frac{N_2 - N_1}{N_1} \times 100 \]
where
\(N_1\) = value of the variable at time 1
\(N_2\) = value of the variable at time 2
For example, assume that 5,000 individuals were homeless in the first year (time 1) and 4,325 the second year (time 2). The calculations to determine the percentage change would be as follows:
\[ \frac{4{,}325 - 5{,}000}{5{,}000} \times 100 = \frac{-675}{5{,}000} \times 100 = -13.5\% \]
The homeless population fell by 13.5 percent.
The percentage change can be positive or negative. If the number of homeless in the second year was 5,075, the percentage change would be 1.5%, indicating a 1.5 percent increase in the homeless population.
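The same arithmetic can be written as a short function. The name `percentage_change` is illustrative; the inputs are the homeless counts from the example.

```python
def percentage_change(n1, n2):
    """Percentage change from the value at time 1 (n1) to time 2 (n2).

    Multiplying by 100 before dividing is algebraically the same as
    ((n2 - n1) / n1) * 100 and keeps these examples free of
    floating-point rounding noise.
    """
    return (n2 - n1) * 100 / n1


decrease = percentage_change(5000, 4325)  # -13.5
increase = percentage_change(5000, 5075)  # 1.5

print(decrease, increase)
```

A negative result indicates a decrease from time 1 and a positive result an increase.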
CHARACTERISTICS OF A DISTRIBUTION
While a frequency distribution includes useful information, you may want a simple statistic to summarize its content. Measures of central tendency reduce the distribution to a value that represents a typical case, the center of the distribution, or both. Measures of central tendency give an incomplete picture of how typical the typical case is. Focusing only on a typical case may be misleading, since cases may be widely different from one another. Measures of variability fill in this gap and add to your knowledge about the distribution. Measures of variability show how representative the typical case is by giving you information on how spread out or dispersed all of the cases are and how far they are from a central point. Several statistics are used to measure central tendency and variation. The choice of a statistic depends on the level of measurement of the variable and what information you will find most valuable. Note that measures of central tendency and variability are the same for interval and ratio scales, so to simplify our discussion we will refer only to ratio scales.
Measures of Central Tendency
Measures of central tendency indicate the value that is representative, most typical, or central in the distribution. The most common measures are the mode, median, and arithmetic mean.
Mode: The simplest summary of a variable’s frequency distribution is to indicate which category or value is the most common. The mode is the value or category of a variable that occurs most often. In a frequency distribution it is the value with the highest frequency. Table 5.4 shows the distribution of one Happy Housing Center variable, Reason for Requesting Services. The mode is Needs Help Finding an Apartment. More clients came for that reason than for any other. Of course, that reason has the highest relative frequency as well.
The mode can be determined for all measurement scales: nominal, ordinal, interval, and ratio. If two values have the same frequency, the variable has two modes and is said to be bimodal. A common mistake is to confuse the frequency of the modal category with the mode. For instance, the mode for the Reason for Requesting Services in Table 5.4 is Needs Help Finding an Apartment. It is not 13. For nominal and many ordinal variables, the category that occurs most often will have a name; for ratio variables, the value of the mode will be a number.
TABLE 5.4
Frequency and Percentage Distributions for Reason For Requesting Services From Housing Center.
Reason for Requesting Services     Number of Clients (N = 50)    Percent
Has Been Evicted                   7                             14
Needs Help Finding an Apartment    13                            26
Rent Has Been Increased            8                             16
Lost Job                           12                            24
Rental House Needs Repairs         10                            20
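Finding the mode from a frequency distribution can be sketched as follows, using the counts in Table 5.4. The code returns a list so that a distribution with two tied categories would report both modes; the variable names are illustrative.

```python
# Counts from Table 5.4.
counts = {
    "Has Been Evicted": 7,
    "Needs Help Finding an Apartment": 13,
    "Rent Has Been Increased": 8,
    "Lost Job": 12,
    "Rental House Needs Repairs": 10,
}

highest_frequency = max(counts.values())

# The mode is the category (or categories), not the frequency itself.
modes = [
    category for category, n in counts.items() if n == highest_frequency
]

print("Mode:", modes)
print("Frequency of the modal category:", highest_frequency)
```

Printing the two results separately keeps the modal category distinct from its frequency of 13.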
Median: The median is the value or category of the case that is in the middle of a distribution in which the cases have been ordered along a continuum. It is the value of the case that divides the distribution in two; one-half of the cases have values less than the median and one-half of them have values greater than the median. The median requires that variables be measured at the ordinal or ratio level. To find the median, as mentioned, you must order the case values along a continuum. It makes no sense to find the middle case if the cases have not been ordered according to their values on the variable of interest.
To find the median, you locate the middle case in a distribution. If the number of cases is odd, then the median is the value of a specific case. If the number of cases is even, then the median is estimated as the value halfway between two cases. For example, with 11 cases, the median is the value of the 6th case. If there are 12 cases the median is halfway between the value of case number 6 and case number 7. The formula for finding the middle case is (N + 1)/2.
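The rule for locating the middle case can be sketched for a numeric variable as follows. The ages are hypothetical, and `median_position` and `median_of` are illustrative names.

```python
def median_position(n):
    """Position of the middle case among n ordered cases: (N + 1) / 2."""
    return (n + 1) / 2


def median_of(values):
    """Median of numeric values: order the cases, then take the middle one,
    or the point halfway between the two middle ones when N is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd N: the position is a whole number; lists are indexed from 0.
        return ordered[int(median_position(n)) - 1]
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


# Hypothetical ages of 11 clients, deliberately entered out of order.
ages_11 = [41, 23, 67, 35, 52, 31, 44, 60, 38, 56, 47]
ages_12 = ages_11 + [70]

print(median_position(11), median_of(ages_11))  # 6th case
print(median_position(12), median_of(ages_12))  # between 6th and 7th cases
```

With 11 cases the median is the age of the 6th ordered case; with 12 it falls halfway between the 6th and 7th. For an ordinal variable with an even number of cases, averaging two categories may not be meaningful, so the sketch is limited to numeric values.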
Two examples follow. Table 5.5 shows the distribution of an ordinal variable. The table reports 11 clients’ ratings of a Happy Housing Center transportation program.