BUS 308 Week 3 Lecture 3
This lecture focuses on the Chi Square test: how to set up the data tables and how to conduct Chi Square tests on distributions. All of the Chi Square functions are found in the fx or Formulas list; none of them are found in the Data Analysis tab.
The chi square test compares the actual or observed count distribution across groups (such as how many in each grade) against an expected distribution. We will see that different ways exist to define what this expected distribution is.
Chi Square Tests
With the Chi Square tests, we move from looking at population parameters, such as means and standard deviations, to looking at patterns or distributions. Generally, when looking at distributions and patterns, we will create groups within our variable of interest. For example, the variable grades is already divided into 6 groups. The compa-ratio range could be divided into quartiles (4 groups), and so on. Most variables can be subdivided this way.
The Chi Square test then examines the differences between what we see (actual counts per group) and what we expect in each group. Once we have these two counts, the actual calculation of the Chi Square statistic is:
∑ [(Observed count – Expected count)^2 / Expected count]
This is simply the sum (∑), across all groups, of the squared difference between what we saw and what we expected, with each squared difference divided by the expected count for that group. The expected values are critical to the outcome of this test, and they can be developed in several different ways if they are not already known. These approaches depend upon the complexity of the situation and will be discussed below.
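The calculation just described can also be sketched outside of Excel. The short Python function below is only an illustration, not part of the assignment; the function name and the four-group counts are made up for the example.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected, taken group by group."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one count per group")
    total = 0.0
    for obs, exp in zip(observed, expected):
        if exp <= 0:
            raise ValueError("expected counts must be positive")
        # Each squared difference is divided by its own expected count
        # before being added to the running sum.
        total += (obs - exp) ** 2 / exp
    return total


# Made-up counts for four groups, for illustration only.
observed_counts = [8, 12, 14, 6]
expected_counts = [10, 10, 10, 10]
statistic = chi_square_statistic(observed_counts, expected_counts)
print(f"Chi Square statistic: {statistic:.2f}")
```

Note that the division happens inside the sum, once per group, which matters whenever the expected counts differ from group to group.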
First, we will determine whether the compa-ratios are evenly distributed across the quartiles; in theory, a compa-ratio generally ranges from 0.8 to 1.2, the most typical range found in companies. A second example will be closer to the question asked in the assignment. We will look at whether the males and females are distributed in a pyramid shape (most in the lower grades, and fewer and fewer in the higher grades), typical of a hierarchical company.
Both of these tests will use counts (how many) rather than the measurements (how much) we have been using to date.
The Chi Square tests use the difference between an actual distribution of counts and an expected distribution to reach decisions on the similarity or difference in patterns. One of the simplest examples of when to use this test is in testing the “fairness” of a single 6-sided die (half of a pair of dice). Over the long run, if we tossed it a “lot of times” we would expect to see each of the 6 numbered faces show up the same number of times. Of course, over a somewhat smaller number of tosses, say 60, we would not expect to see exactly the same number for each face, but would expect the counts to be close. The expected count for each face is 10 (the 60 tosses in our sample divided by the 6 possible outcomes, or groups). Comparing the actual count of how many times each face showed against this expected count would give us our answer on whether the die was fair or biased.
This lecture will look at two related Chi Square tests. The first, called the Goodness of Fit Test, involves a single row of counts, such as with the die example above. The second, called the Contingency Table analysis, involves multiple rows in the table, such as we might have if we looked at how males and females were distributed for some measure. Both are calculated the same way.
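As a rough sketch of the die example, the Python lines below work out the expected count per face and each face's contribution to the Chi Square statistic. The observed tallies are hypothetical numbers chosen for illustration, not results from an actual die.

```python
# Hypothetical tallies for faces 1 through 6 over 60 tosses (not real data).
observed = [8, 11, 9, 12, 13, 7]

tosses = sum(observed)               # total number of tosses in the sample
faces = len(observed)                # number of possible outcomes (groups)
expected_per_face = tosses / faces   # a fair die gives each face an equal share

# Contribution of each face: (observed - expected)^2 / expected
contributions = [
    (count - expected_per_face) ** 2 / expected_per_face for count in observed
]
chi_square = sum(contributions)

for face, (count, part) in enumerate(zip(observed, contributions), start=1):
    print(f"face {face}: observed {count}, "
          f"expected {expected_per_face:.0f}, contribution {part:.2f}")
print(f"Chi Square statistic: {chi_square:.2f}")
```

The faces whose counts sit furthest from 10 contribute the most to the statistic; counts close to the expected value add very little.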
Chi Square
Two input tables are required for all Chi Square test set-ups. The first table is the “actual” or “observed” counts, a table showing how many items fit into each group we care about. The second is a table showing the expected counts.
Example
The assignment does not ask for a Goodness of Fit test (a simple one-row table of counts), but we will start with this simple example. In the goodness of f