Data Analysis Project Instruction: Part 1
Due in tutorials during the week of Oct 1st to 5th, or on Quercus by Friday, Oct 5th, 11:59pm
Purpose:
The objective of this project is to give you the opportunity to use some of the statistical techniques that
you have learned in this course to explore a real data set.
Submission Format:
You are required to submit a concise written (typed) report (no more than two pages) of your data analysis
with a font size of no smaller than 12 points. Your PSPP outputs may be incorporated into the body of your
written report or placed in an appendix. You may work individually or in
groups of no more than three students. Your group members can be from different tutorial sections. If you
are working in a group, please create a team name, which you will use to rename a variable in this data set;
if you are working individually, you will use your last name instead. If you are working
in a group, please submit only one copy of your work, to one course TA. You can either submit your report in your
tutorial (to one course TA) or submit it on Quercus (refer to the Assignment section:
Analysis 1). The assessment criteria are described on page 7 of this document.
Context of Data:
The Organisation for Economic Co-operation and Development (OECD) gathers a wide range of information
on OECD countries and their partners in order to promote policies that aim to improve the economic
and social well-being of people around the world (http://www.oecd.org/about/).
This agency collects quantitative information on many domains and makes the collected data available for
public use (e.g., by researchers) so that interested individuals can further investigate relationships among
variables. One such domain, “Social Protection and Well-being”, includes a yearly data collection called the
“Better Life Index”. This information can be retrieved from: http://stats.oecd.org
From the “Better Life Index 2017” (BLI, 2017), the most recent data collected in this domain, we will
analyze a quantitative variable named “Social Network Support”. Information regarding this variable can
be retrieved from: http://www.oecd.org/statistics/OECD-Better-Life-Index-2017-definitions.pdf
(note that this document is also posted on our Quercus page, “Data Analysis Project” module). This
variable is a sub-component of the Social connections/Community component in (BLI, 2017). It
reflects the percentage of males and females aged 15 years and over in 35 OECD countries who perceive their
social network as having relatives or friends they can count on to help them in times of need and
trouble. OECD indicates that this information was obtained and calculated from the Gallup World Poll.
• Let us recap the variables of interest in our data analysis:
1. Percentage of people (15 years of age and older) having social network support
2. Sex of the respondents identified as Male or Female
• I recommend that you read about this data here:
http://www.oecdbetterlifeindex.org/#/11111111111
Also, click on “Community” in the right-hand side menu to be directed to another web page:
http://www.oecdbetterlifeindex.org/topics/community/
Scroll down that page to read about social network support in each country.
PSPP Activity: Task A and Task B
A. Describing the distribution of percentages of adults who reported having a Social Network Support in OECD countries.
B. Understanding and comparing distributions of percentages of males and females who reported having a Social Network Support in OECD countries.
Overview of Steps:
1. Save the following two data files on your computer (e.g., in your My Documents folder):
• BLI_Support_Net_2017.csv
• BLI_Support_Net_Gender.csv
2. Open each of the above files and change the column heading of the support variable:
• Replace “lastname_teamname” in the variable name “Support_network_lastname_teamname” with your last name or team name. For example: Support_network_Aslemand
3. Save the modified files, keeping them in .csv format so that PSPP can import them.
4. Refer to the PSPP tasks (A and B) described on the next pages. For each task, produce the associated
PSPP outputs and answer the related questions.
5. Please include your PSPP outputs with your written report upon submission. If you are submitting on
Quercus, you can upload three files: one PDF file for your written report and two PDF files for your PSPP
outputs. Name your files as follows (replacing Lastname with your last name or team name):
o Report 1_Lastname.pdf (e.g., Report 1_Aslemand.pdf)
o PSPP Overall Outputs_Lastname.pdf (e.g., PSPP Overall Outputs_Aslemand.pdf)
o PSPP Gender Outputs_Lastname.pdf (e.g., PSPP Gender Outputs_Aslemand.pdf)
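If you would rather not edit the header row in a spreadsheet program, the renaming in step 2 can also be done with a short script. The sketch below is optional and assumes Python 3 is installed; the function name rename_column is a hypothetical choice, not part of the course materials, and it assumes the placeholder column is named exactly “Support_network_lastname_teamname”.

```python
import csv
from pathlib import Path

# Assumption: the placeholder column is named exactly as in these instructions.
PLACEHOLDER = "Support_network_lastname_teamname"


def rename_column(src: Path, dst: Path, new_name: str,
                  old_name: str = PLACEHOLDER) -> list[str]:
    """Copy the CSV file at src to dst, renaming one column in the header row.

    Returns the new header row so you can check the result.
    """
    # "utf-8-sig" also reads files that a spreadsheet saved with a byte-order mark.
    with src.open(newline="", encoding="utf-8-sig") as f_in:
        rows = list(csv.reader(f_in))
    header = rows[0]
    if old_name not in header:
        raise ValueError(f"column {old_name!r} not found in {src}")
    # Only the matching column name changes; data values are copied unchanged.
    rows[0] = [new_name if name == old_name else name for name in header]
    with dst.open("w", newline="", encoding="utf-8") as f_out:
        csv.writer(f_out).writerows(rows)
    return rows[0]
```

For example, rename_column(Path("BLI_Support_Net_2017.csv"), Path("BLI_Support_Net_2017.csv"), "Support_network_Aslemand") renames the column in place. Using the same path for src and dst is safe here because the whole file is read and closed before it is rewritten.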
Task A. Describe the distribution of perceived social network support.
Open PSPP on your computer.
Step 1. Select Files to Import:
▪ In the menu bar, go to File > Import Data and browse to the folder where you saved the file (e.g., My Documents)
▪ Select the saved data file: “BLI_Support_Net_2017.csv”
▪ Click Next (bottom of the window)
Step 2. Select the Lines to Import: Click Next
Step 3. Select the First Line:
▪ Select Line “1” (move the blue line from line 0 to line 1)
▪ At the bottom of the screen, tick the box: Line Above Selected Line Contains Variable Names
▪ Click Next
Step 4. Choose Separators: Click Next
Step 5. Adjust Variable Formats: Click Apply.
PSPP will open the data in two views, “Data View” and “Variable View”; use the tabs at the bottom of the
window to switch between them. You do not need to switch views for this task.
PSPP Instruction for Task A:
• From the top menu bar in PSPP, go to Analyze > Descriptive Statistics > Explore
• Select your renamed variable (“Support_network_” followed by your last name or team name) from the list and move it into the “Dependent List”.
• Do not close this box yet; click on “Statistics”, select “Descriptive”, “Extreme”, “Percentile”, and click on Continue.
• Do not close the main box yet; click on Paste.
• PSPP will open another window; this is the Syntax Editor window.
• Your code should now look like this (with your own variable name in place of Support_network_lastname_teamname):
EXAMINE
/VARIABLES = Support_network_lastname_teamname
/STATISTICS = DESCRIPTIVES EXTREME
/PERCENTILE
/MISSING=LISTWISE.
• We need to add a /PLOT = BOXPLOT subcommand to request a boxplot.
• Add the line /PLOT = BOXPLOT between /PERCENTILE and /MISSING=LISTWISE, exactly as shown below:
EXAMINE
/VARIABLES = Support_network_lastname_teamname
/STATISTICS = DESCRIPTIVES EXTREME
/PERCENTILE
/PLOT = BOXPLOT
/MISSING=LISTWISE.
• Highlight the entire modified code in your PSPP Syntax Editor and go to “Run” from the tool bar menu and click on “All”.
• You will get an output (PSPP icon blinks/flashes at the bottom of your computer screen).
• Open your PSPP output. This output displays tables of descriptive statistics and a boxplot.
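• Note the case numbers of any points plotted individually on the boxplot and record them; you will look up their country names in the next steps.
• Go back to your PSPP Syntax Editor. Copy the code below and paste it into your syntax editor. The /CASES line below lists case 19 only; if the case number you recorded is different, change both numbers to match it.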
LIST
/VARIABLES = COUNTRY Support_network_lastname_teamname
/CASES = FROM 19 TO 19
/FORMAT = NUMBERED.
• Replace “Support_network_lastname_teamname” in the code above with your own variable name. It must exactly match the name you gave the column earlier.
• In your syntax editor, highlight the LIST command above, go to “Run” in the tool bar menu, and click on “Selection” to run only the selected code.
• PSPP icon blinks/flashes again at the bottom of your computer screen to indicate that something new has been added to your output.
• Open your PSPP output window. The new information is added at the bottom of your output.
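PSPP numbers cases by their position in the active dataset, so when every line is imported and the first line holds the variable names, case n is the n-th data row below the header. As an optional cross-check of the LIST output, the sketch below looks up country names for recorded case numbers directly in the .csv file. It assumes Python 3, a column named COUNTRY (the name used in the LIST command above), and that the rows were not sorted or filtered in PSPP; country_for_cases is a hypothetical helper.

```python
import csv
from pathlib import Path


def country_for_cases(csv_path: Path, case_numbers: list[int],
                      country_col: str = "COUNTRY") -> dict[int, str]:
    """Map each case number to the country in that data row of the CSV file.

    Case n is taken to be the n-th row after the header line (counting from 1),
    which assumes every line was imported and the data were not sorted in PSPP.
    """
    with csv_path.open(newline="", encoding="utf-8-sig") as f:
        rows = list(csv.DictReader(f))
    result = {}
    for n in case_numbers:
        if not 1 <= n <= len(rows):
            raise IndexError(f"case {n} is outside 1..{len(rows)}")
        result[n] = rows[n - 1][country_col]
    return result
```

For example, country_for_cases(Path("BLI_Support_Net_2017.csv"), [19]) returns the country in data row 19, which should match the row shown by the LIST command.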
❖ Save/Export your PSPP outputs:
• In the PSPP output window, go to File > Export, give your output a name, and choose the location where you want to save it (e.g., My Documents folder).
• At the bottom of the dialog box, from the drop-down menu, select the format for your PSPP output (e.g., PDF (*.pdf)) and click “Save”.
• Check your computer folder to make sure that your PSPP output is exported to your desired folder.
❖ Close PSPP on your computer.
❖ Refer to your PSPP outputs to answer the following questions.
Note: Unit of measurement is “Percentage of people aged 15 and over”
1. Refer to the descriptive statistics table for the distribution of percentages of adults who reported having
social network support. Report the mean and standard deviation for this distribution. Interpret these
values within the context of this study.
2. Refer to the boxplot and the tables of descriptive statistics, percentiles, and extreme values for the
distribution of percentages of adults who reported having social network support. Describe the shape,
centre, and spread of this distribution within the context of this study. Note whether any points are plotted
individually on the plot, and specify the country name(s) for the individually plotted points on the boxplot.
3. Use the 1.5 × IQR rule to determine whether the individually plotted point(s) is/are suspect outlier(s).
4. Find the z-score for the minimum data value. Give a brief interpretation of this value (z-score) within
the context of this study.
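The arithmetic for questions 3 and 4 can be sketched as follows: IQR = Q3 − Q1; a value is a suspect outlier if it falls below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR; and z = (x − mean) / SD. The functions below are a minimal sketch assuming Python 3; their names are hypothetical, and the inputs are the Q1, Q3, mean, standard deviation and data values you read from your own Percentiles and Descriptives tables.

```python
def iqr_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_suspect_outlier(x: float, q1: float, q3: float) -> bool:
    """True if x lies strictly outside the 1.5 x IQR fences."""
    low, high = iqr_fences(q1, q3, k=1.5)
    return x < low or x > high


def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd
```

For instance, iqr_fences(10, 20) returns (-5.0, 35.0), and z_score(70, 80, 5) returns -2.0, meaning 70 lies two standard deviations below a mean of 80. These numbers are illustrative only, not values from the BLI data.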
Task B. Compare percentages of perceived social network support between males and females.
Open PSPP on your computer.
Step 1. Select Files to Import:
▪ In the menu bar, go to File > Import Data and browse to the folder where you saved the file (e.g., My Documents)
▪ Select the saved data file: “BLI_Support_Net_Gender_2017.csv”
▪ Click Next (bottom of the window)
Step 2. Select the Lines to Import: Click Next
Step 3. Select the First Line:
▪ Select Line “1” (move the blue line from line 0 to line 1)
▪ At the bottom of the screen, tick the box: Line Above Selected Line Contains Variable Names
▪ Click Next
Step 4. Choose Separators: Click Next
Step 5. Adjust Variable Formats: Click Apply.
PSPP will open “Data View” and “Variable View”.
PSPP Instruction for Task B:
• From the top menu bar in PSPP, go to Analyze > Descriptive Statistics > Explore
• Select your renamed variable (“Support_network_” followed by your last name or team name) from the list and move it into the “Dependent List”.
• Select the variable “Gender” from the list and put it in the “Factor List”.
• Do not close this box yet; click on “Statistics”, select “Descriptive”, “Extreme”, “Percentile”, and click on Continue.
• Do not close the main box yet; click on Paste.
• PSPP will open another window; this is the Syntax Editor window.
• Your code should now look like this (with your own variable name in place of Support_network_lastname_teamname):
EXAMINE
/VARIABLES = Support_network_lastname_teamname
BY Gender
/STATISTICS = DESCRIPTIVES EXTREME
/PERCENTILE
/MISSING=LISTWISE.
• We need to add a /PLOT = BOXPLOT subcommand. Add it between /PERCENTILE and /MISSING=LISTWISE, as shown below:
EXAMINE
/VARIABLES = Support_network_lastname_teamname
BY Gender
/STATISTICS = DESCRIPTIVES EXTREME
/PERCENTILE
/PLOT = BOXPLOT
/MISSING=LISTWISE.
• Highlight the entire modified code in your PSPP Syntax Editor and go to “Run” from the tool bar menu and click on “All”.
• You will get an output (PSPP icon blinks/flashes at the bottom of your computer screen).
• Open your PSPP output. This PSPP output displays tables of descriptive statistics by gender and side-by-side boxplots.
• Note the case numbers of the points plotted individually on the side-by-side boxplots and record them (Females: 54, 44, 66; Males: 33, 59, 65, 53, 43). You will look up their country names in the next steps.
• Go back to your PSPP Syntax Editor. Copy the code below and paste it into your syntax editor.
• The code below displays information for case #54 only. To display each of the other individually plotted cases, change both numbers in the /CASES line and run the code again; you will need to do this seven more times.
LIST
/VARIABLES = COUNTRY Support_network_lastname_teamname
/CASES = FROM 54 TO 54
/FORMAT = NUMBERED.
• Replace “Support_network_lastname_teamname” in the code above with your own variable name. It must exactly match the name you gave the column earlier.
• In your syntax editor, highlight the LIST command above, go to “Run” in the tool bar menu, and click on “Selection” to run only the selected code.
• PSPP icon blinks/flashes again at the bottom of your computer screen to indicate that something new has been added to your output.
• Open your PSPP output window. New information is added to the bottom of your PSPP output.
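Rather than editing the /CASES line eight times, you can generate one LIST command per recorded case number and paste them all into the Syntax Editor at once. This is an optional sketch assuming Python 3; list_commands is a hypothetical helper, and Support_network_Aslemand must be replaced with your own variable name. The subcommand lines are indented so that PSPP reads them as part of the same command.

```python
def list_commands(case_numbers: list[int], variable: str) -> str:
    """Return one PSPP LIST command per case number, separated by blank lines."""
    commands = []
    for n in case_numbers:
        commands.append(
            "LIST\n"
            f"  /VARIABLES = COUNTRY {variable}\n"
            f"  /CASES = FROM {n} TO {n}\n"
            "  /FORMAT = NUMBERED."
        )
    return "\n\n".join(commands)


# Case numbers recorded from the side-by-side boxplots (Females, then Males).
# Replace Support_network_Aslemand with your own variable name.
print(list_commands([54, 44, 66, 33, 59, 65, 53, 43], "Support_network_Aslemand"))
```

Paste the printed commands into the Syntax Editor and choose Run > All; each command adds one small table to your output.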
❖ Save/Export your PSPP outputs:
• In the PSPP output window, go to File > Export, give your output a name, and choose the location where you want to save it (e.g., My Documents folder).
• At the bottom of the dialog box, from the drop-down menu, select the format for your PSPP output (e.g., PDF (*.pdf)) and click “Save”.
• Check your computer folder to make sure that your PSPP output is exported to your desired folder.
❖ Close PSPP on your computer.
❖ Refer to your PSPP outputs to answer the following questions.
Note: Unit of measurement is “Percentage of people aged 15 and over”
1. Refer to the descriptive statistics table for the distributions of percentages of perceived social network
support by gender. Report and compare the means and standard deviations for distributions of females and
males. Interpret these values within the context of this study.
2. Refer to the side-by-side boxplots, and tables of descriptive statistics, percentiles, and extreme values for
the distribution of percentages of perceived social network support by gender. Describe and compare the
shapes, centres, and spreads of these plots within the context of this study. Note whether any points are
plotted individually on each boxplot. Specify the country name(s) for the individually plotted points on
each boxplot.
3. Use the 1.5 × IQR rule (and the 3 × IQR rule) to determine whether the individually plotted point(s) is/are suspect
outlier(s) or rare/unlikely cases. Check that your conclusions agree with the symbols (“O” or “*”) used for these cases on the boxplots.
4. Find the z-scores for the individually plotted data value(s) on each boxplot (females, males). Give a
brief interpretation of these values (z-scores) within the context of this study.
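For question 3, the 3 × IQR fences extend the 1.5 × IQR rule: here a value beyond Q1 − 3 × IQR or Q3 + 3 × IQR is treated as a rare/unlikely case, and a value beyond only the 1.5 × IQR fences as a suspect outlier. Use each gender's own quartiles, mean and standard deviation. The sketch below assumes Python 3; classify, GroupSummary and report are hypothetical names, and the summary values must be copied from your own PSPP output.

```python
from dataclasses import dataclass


@dataclass
class GroupSummary:
    """Summary values copied from the PSPP output for one gender group."""
    q1: float
    q3: float
    mean: float
    sd: float


def classify(x: float, q1: float, q3: float) -> str:
    """Label a value using the 1.5 x IQR and 3 x IQR fences."""
    iqr = q3 - q1
    if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
        return "beyond 3 x IQR (rare/unlikely case)"
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
        return "beyond 1.5 x IQR (suspect outlier)"
    return "within the 1.5 x IQR fences"


def report(points: dict[int, float], s: GroupSummary) -> list[str]:
    """One line per individually plotted case: value, fence label and z-score."""
    lines = []
    for case, x in points.items():
        z = (x - s.mean) / s.sd  # z-score within this group only
        lines.append(f"case {case}: {x} -> {classify(x, s.q1, s.q3)}, z = {z:.2f}")
    return lines
```

For example, with hypothetical summary values Q1 = 10, Q3 = 20, mean = 15 and SD = 5, report({7: 36.0}, GroupSummary(10, 20, 15, 5)) returns ['case 7: 36.0 -> beyond 1.5 x IQR (suspect outlier), z = 4.20']. Call report once for females and once for males.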
Assessment of Data Analysis Project: Part 1
Last Name of Students or Team Name
1.________________________________________
2. ________________________________________
3. ________________________________________
Task A: Describe the distribution of perceived social network support
Criterion | Possible Points | Point(s) Received
1: PSPP Outputs (with modified last name/team name) | 10 |
2: Interpretation of Descriptive Statistics: Mean and Standard Deviation | 4 |
3: Interpretation of Boxplot: Describe shape, centre, spread, outliers | 8 |
4: Investigation of suspect outliers | 2 |
5: Interpretation of Z-score(s) | 2 |
Total | 26 |

Task B: Describe the distribution of perceived social network support by gender
Criterion | Possible Points | Point(s) Received
1: PSPP Outputs (with modified last name/team name) | 10 |
2: Interpretation of Descriptive Statistics: Means and Standard Deviations | 6 |
3: Interpretation of Side-by-Side Boxplots: Describe shape, centre, spread, outliers | 10 |
4: Investigation of suspect outliers | 6 |
5: Interpretation of Z-score(s) | 2 |
Total | 34 |

Total Points | 60 |
Marked by TA: ______________________________________
Comments (if any):