Minitab And Excel
Assignment 1
Create frequency and summary tables for categorical data
Background: The data provided are from a larger study on college students which also included students’ pulse rate before and after running, height, weight, and exercise level. Since we are only dealing with categorical variables here, I deleted all the other data. Still, the data regarding the smoking status can be seen as a sample taken from the population of college students, since the subjects were enrolled randomly.
Raw data are in the posted Excel file: Class 6 Assignment raw data - smoking status among college students.xls
Use Excel and Minitab to count the raw data for each combination of category values.
Combine the frequency counts into a 2-way frequency table.
Calculate the respective proportions and create a 2-way summary table.
Note: This task is very common and very basic, so you should know how to do this in Excel. You may not have Minitab available all the time. Also, the Minitab worksheet isn’t the best format for working with tables (see below).
Part A. Using Excel
I created a YouTube tutorial to walk you through all the steps of this assignment.
http://www.youtube.com/watch?v=RrXRoJDO90U
Watch it until 30 min, 30 seconds. Skip the last 10 minutes! I screwed up a little with my wording for the row and column proportions. Better use the wording I put below.
So here are the steps (as shown in the video as well):
1. Create a frequency table and a summary table for the raw data, in the same worksheet, next to the raw data (as shown in the video).
2. For the frequency table, use the Excel formula COUNTIFS to get the counts automatically inserted into your table fields.
3. Then calculate the marginal totals and the grand total.
4. For the summary table, use the frequencies to calculate the joint proportions.
5. Next, calculate the row proportions and the column proportions and list them in rows underneath the table (not within the table, so these proportions won’t get confused with the joint proportions of the table).
Row proportions in this example:
Proportion of all males that smoke / don’t smoke (the two add up to 100%)
Proportion of all females that smoke / don’t smoke (the two add up to 100%)
Column proportions in this example:
Proportion of all smokers that are male / female (the two add up to 100%)
Proportion of all non-smokers that are male / female (the two add up to 100%)
[You see, there are a lot of ways to calculate proportions. Be sure to keep them apart and when you see a number for a proportion, ask or verify what that proportion actually is.]
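As a cross-check outside Excel, the steps above can be sketched in Python. This is only a minimal sketch, not part of the assignment: the eight rows in `raw` are made-up illustration values (not the posted data), and the names `raw`, `countifs`, `genders` and `statuses` are hypothetical. The helper `countifs` mimics what a two-criteria COUNTIFS formula does: it counts the rows in which both columns match.

```python
# Made-up (smokes, gender) rows for illustration only; NOT the assignment data.
raw = [
    ("Yes", "Male"), ("No", "Male"), ("No", "Male"), ("Yes", "Female"),
    ("No", "Female"), ("No", "Female"), ("No", "Female"), ("Yes", "Male"),
]

genders = ["Male", "Female"]   # row categories
statuses = ["Yes", "No"]       # column categories


def countifs(rows, smokes, gender):
    """Count rows matching both criteria, like a two-criteria COUNTIFS."""
    return sum(1 for s, g in rows if s == smokes and g == gender)


# Steps 2-3: frequency table, marginal totals, grand total.
freq = {g: {s: countifs(raw, s, g) for s in statuses} for g in genders}
row_totals = {g: sum(freq[g].values()) for g in genders}
col_totals = {s: sum(freq[g][s] for g in genders) for s in statuses}
grand_total = sum(row_totals.values())

# Step 4: joint proportions = cell count / grand total.
joint = {g: {s: freq[g][s] / grand_total for s in statuses} for g in genders}

# Step 5: row proportions = cell count / row total (each row sums to 1);
# column proportions = cell count / column total (each column sums to 1).
row_prop = {g: {s: freq[g][s] / row_totals[g] for s in statuses} for g in genders}
col_prop = {g: {s: freq[g][s] / col_totals[s] for s in statuses} for g in genders}

print("Frequency table:", freq, "grand total:", grand_total)
print("Joint proportions:", joint)
print("Row proportions:", row_prop)
print("Column proportions:", col_prop)
```

The same cell count yields three different proportions depending on the denominator (grand total, row total, or column total), which is why it matters to check which proportion a number actually is.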
When you are done, save the file twice (as I mention in the video): Once with the original title, plus your name, to upload as homework. Then save it again, calling it “Template to create frequency and summary tables for categorical data.xls” or something like that.
So now each time you have categorical data, you just paste them into the columns of this template and your tables are generated automatically. That may come in handy for the midterm.
(Tip: since you don’t want to lose the template, just always save it under a new name as soon as you enter new data.)
Part B. Generating tables from categorical data with Minitab
1. Copy the 2 columns of raw data into a Minitab worksheet.
2. Then go to Stat > Tables > Descriptive Statistics
3. Fill out the Minitab popup menu (it has already found your columns with data):
In your mind, imagine the table you just created in Excel. Or draw a quick sketch of a table.
You want the values for the variable “gender” in separate rows, so gender is your row variable. You want the values for the smoking status in separate columns, so smoking status is your column variable. (The convention is to put the variable that predicts an outcome in rows and the variable that measures the outcome in columns. In this case, gender might or might not predict smoking status, but definitely not the other way round!)
4. Put your cursor in the field “For rows”.
5. Click on C2 Gender (it will turn blue).
6. Click the Select button. C2 Gender will appear in the field “For rows”.
7. Repeat for the field “For columns” with the variable C1 Smokes.
Note: If you had more columns with variables, these would be treated as layers. Here you don’t have any.
8. Click on “Display summaries for Categorical Variables”
9. In the new menu that appears, check the box for “Counts” (and nothing else!)
10. Click OK.
Note: You may be tempted to check all the boxes, since in the end we want all of these data. But if you do, the output will be a mess of numbers that are hard to sort out (I’ll show you further below). So it pays off to do this one by one.
11. Click OK again on the menu.
12. Your frequency table will appear in the results box for the session. It should look like this:
Notice that Minitab calculated the marginal totals as well, so we didn’t need to ask for them. This is done by default. (It can be switched off in the options box.)
13. Mouse over the field with the table to have a little arrow appear. Click on it and select “copy”.
14. Paste the table into an Excel worksheet (maybe in another tab of the same file that you used for the first part of this homework). Note: Tables are much easier to deal with in an Excel worksheet than in a Minitab worksheet.
15. Call the table “Frequency table”.
[Optional: To confirm what I just said, paste the same table into the Minitab worksheet. It will probably look like this:]
Minitab doesn’t keep the column headings straight, but shifts them to the left.
16. Repeat steps 2-14, this time selecting “Total percents”. These correspond to the joint proportions you calculated in Excel. Copy the output table into the same Excel worksheet.
17. Call the table “Summary table”.
18. Repeat steps 2-14, selecting “Row percents”. Copy result into Excel. Make sure to label the table as “Row percents”.
19. Repeat steps 2-14, selecting “Column percents”. Copy result into Excel.
At this point you will most likely have noticed that Minitab reports percentages rather than proportions.
A percentage is just the proportion multiplied by 100, and percentages may be a bit more user-friendly. For doing statistics, however, they need to be converted back to proportions (stats packages do this in the background, so you don’t need to worry).
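The conversion itself is just a division by 100. A minimal sketch in Python, using made-up percentages (not actual Minitab output from this assignment) and hypothetical variable names:

```python
# Made-up row percents for illustration only; NOT actual Minitab output.
row_percents = {"Yes": 25.0, "No": 75.0}

# proportion = percent / 100
row_proportions = {status: pct / 100 for status, pct in row_percents.items()}

print(row_proportions)
```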
To show you what I meant, this is what happens when you select the counts and all the percentages to be calculated at once. Here’s the output in the Minitab Session result field:
And this is what it looks like when pasted into a Minitab worksheet.
There are probably ways to fix this and make it look nicer, but I am not a Minitab expert.
Additional Resources:
EXCEL:
Help file for COUNTIFS function and the COUNTIF function (PDF posted on Blackboard)
(also check out what the COUNTIF function can do, even though it can’t be used here).
Microsoft video tutorial for COUNTIFS function
https://support.office.com/client/en-us/videoplayer/embed/RWeqXB?pid=ocpVideo0-innerdiv-oneplayer&jsapi=true&postJsllMsg=true&maskLevel=20&market=en-us
The COUNTIFS function is explained in the first 2 min 30 seconds; after that, the SUMIFS function is explained.
Minitab:
https://support.minitab.com/en-us/minitab/18/help-and-how-to/statistics/tables/supporting-topics/basics/table-analyses-in-minitab/
This opens the first page for the help section on table analysis. You can then click your way through the menu on the left.
Also note: Minitab has a function that tallies (counts) the values of a variable in a single column. Just like the Excel COUNTIF function, it can’t be used to count across multiple values or criteria in different columns.
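The difference between a one-column tally and a count over combinations of two columns can be sketched in Python. This is an illustration only: the rows are made up (not the assignment data), the variable names are hypothetical, and `collections.Counter` simply stands in for the counting; it is not how Minitab or Excel work internally.

```python
from collections import Counter

# Made-up (smokes, gender) rows for illustration only; NOT the assignment data.
raw = [
    ("Yes", "Male"), ("No", "Male"), ("No", "Female"),
    ("Yes", "Female"), ("No", "Female"),
]

# One-column tally: counts each value of a single variable
# (comparable to a tally or a single-criterion COUNTIF).
smokes_tally = Counter(smokes for smokes, _ in raw)

# Two-criteria count: counts each combination of values across both columns
# (the situation where COUNTIFS is needed).
pair_counts = Counter(raw)

print(smokes_tally)
print(pair_counts[("No", "Female")])
```

The one-column tally gives only the marginal totals for smoking status, while the count over pairs fills the individual cells of the 2-way table.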