1).
Describe the data, using summary statistics and graphs, as appropriate.
In the present analysis, the collected data are used to describe the infant
mortality rate, which has fallen since the mid-twentieth century. The hypothesis
proposed in the project seeks to explain this trend, which is unprecedented when
viewed against the broad sweep of history. The data cover the infant mortality
rate (IMR), real gross domestic product per capita in rupees (GDPPC), and real
educational and health expenditures per capita in rupees (EDUCPC and HEXPPC) for
Sri Lanka over the 31 years from 1951 to 1981. The quantitative summary of each
variable includes its mean, standard deviation, skewness, and range. For IMR, the
mean, standard deviation, and skewness are 53.76, 12.85, and 0.39, and the values
range from 29.5 to 82 (a range of 52.5). For GDPPC, they are 742.13, 123.38, and
0.87, with values from 617.59 to 1034.6. For HEXPPC, they are 13.71, 2.01, and
-0.68, with values from 8.8 to 16.54, while for EDUCPC they are 26.64, 6.34, and
-0.65, with values from 14.6 to 35.53. The full statistical summaries are given
below in Tables 1, 2, 3, and 4. Of the four variables, GDPPC has by far the
largest mean, reflecting that it is measured on a much larger scale than the
other variables (statisticshowto, 2019).
Table 1: Statistical summary of IMR

| Statistic | IMR |
|---|---|
| Mean | 53.75806 |
| Standard Error | 2.307406 |
| Median | 53 |
| Mode | 53 |
| Standard Deviation | 12.84709 |
| Sample Variance | 165.0478 |
| Kurtosis | -0.27218 |
| Skewness | 0.394624 |
| Range | 52.5 |
| Minimum | 29.5 |
| Maximum | 82 |
| Sum | 1666.5 |
| Count | 31 |
| Confidence Level (95.0%) | 4.712352 |
Table 2: Statistical summary of GDPPC

| Statistic | GDPPC |
|---|---|
| Mean | 742.1323 |
| Standard Error | 22.1594 |
| Median | 697.83 |
| Mode | N/A (no repeated value) |
| Standard Deviation | 123.3783 |
| Sample Variance | 15222.21 |
| Kurtosis | -0.23112 |
| Skewness | 0.872891 |
| Range | 417.01 |
| Minimum | 617.59 |
| Maximum | 1034.6 |
| Sum | 23006.1 |
| Count | 31 |
| Confidence Level (95.0%) | 45.25553 |
Table 3: Statistical summary of HEXPPC

| Statistic | HEXPPC |
|---|---|
| Mean | 13.70548 |
| Standard Error | 0.361904 |
| Median | 14.28 |
| Mode | 14.28 |
| Standard Deviation | 2.014994 |
| Sample Variance | 4.060199 |
| Kurtosis | -0.36343 |
| Skewness | -0.68038 |
| Range | 7.74 |
| Minimum | 8.8 |
| Maximum | 16.54 |
| Sum | 424.87 |
| Count | 31 |
| Confidence Level (95.0%) | 0.739106 |
Table 4: Statistical summary of EDUCPC

| Statistic | EDUCPC |
|---|---|
| Mean | 26.6387097 |
| Standard Error | 1.13782019 |
| Median | 29.64 |
| Mode | N/A (no repeated value) |
| Standard Deviation | 6.3351147 |
| Sample Variance | 40.1336783 |
| Kurtosis | -0.9113174 |
| Skewness | -0.6506272 |
| Range | 20.93 |
| Minimum | 14.6 |
| Maximum | 35.53 |
| Sum | 825.8 |
| Count | 31 |
| Confidence Level (95.0%) | 2.32373883 |
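The summary statistics in Tables 1 to 4 can be recomputed from the Appendix 1 data. The following is a minimal sketch in Python using only the standard library; the tables themselves appear to come from a spreadsheet's descriptive-statistics output, and the names `DATA`, `describe`, and `SUMMARY` are hypothetical. The skewness line assumes the adjusted sample formula n/((n-1)(n-2)) Σ((x - mean)/s)³, which is the definition used by Excel's SKEW function.

```python
import statistics

# Annual data for Sri Lanka, 1951-1981, transcribed from Appendix 1
DATA = {
    "IMR": [82, 78, 71, 72, 71, 67, 68, 64, 58, 57, 52, 53, 56, 55, 53, 54,
            48, 50, 53, 47, 45, 46, 46, 51, 45, 44, 42, 37, 38, 34, 29.5],
    "GDPPC": [617.59, 629.63, 619.4, 623.65, 648.85, 634.94, 622.72, 619.15,
              617.71, 641.41, 646.34, 649.81, 655.75, 697.83, 694.04, 688.95,
              705.56, 744.75, 767.87, 786.16, 780.63, 786.43, 802.6, 825.15,
              883.11, 842.48, 860.07, 924.96, 956.14, 997.82, 1034.6],
    "HEXPPC": [8.8, 10.51, 10.78, 10.84, 10.67, 11.4, 11.87, 12.9, 14.72,
               14.28, 15.86, 14.56, 14.77, 13.82, 13.98, 14.45, 14.38, 14.74,
               15.99, 16.1, 16.16, 15.62, 13.8, 11.63, 12.63, 14.28, 13.45,
               15.42, 16.54, 15.84, 14.08],
    "EDUCPC": [14.6, 16.47, 16.96, 15.96, 15.51, 17.84, 19.88, 22.09, 24.84,
               24.14, 29.64, 30.31, 31.44, 32.27, 33.38, 32.49, 30.53, 30.13,
               32.26, 35.53, 32.42, 33.67, 31.03, 24.44, 25.89, 28.66, 25.17,
               25.25, 29.78, 32.29, 30.93],
}


def describe(values):
    """Summary statistics of one series (sample versions, as in Tables 1-4)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, n - 1 denominator
    # Adjusted sample skewness (assumed to match the spreadsheet SKEW formula)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "skewness": skew,
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


SUMMARY = {name: describe(values) for name, values in DATA.items()}

for name, stats in SUMMARY.items():
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

Each printed row can be compared line by line with the corresponding table above.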
2).
Calculate the pair-wise correlation coefficients between IMR and each of the
other variables. Test the statistical significance of each correlation
coefficient.
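The pairwise correlation coefficient (Pearson's r) measures the strength and
direction of the linear relationship between two variables; here it is used to
measure the correlation of IMR with GDPPC, HEXPPC, and EDUCPC. The correlation
coefficient between IMR and GDPPC is -0.86, between IMR and HEXPPC it is -0.78,
and between IMR and EDUCPC it is -0.77. All three correlations are negative:
years with higher income and higher health and education spending per capita
had lower infant mortality. The strongest correlation (largest in absolute value)
is observed for GDPPC, so the ordering from weakest to strongest is
EDUCPC < HEXPPC < GDPPC. The significance of each coefficient is tested against
the null hypothesis of zero correlation using $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with
n - 2 = 29 degrees of freedom. This gives t = -9.19, -6.74, and -6.50 for GDPPC,
HEXPPC, and EDUCPC respectively (equal to the slope t-statistics of the
corresponding simple regressions in Tables 6 and 7 and the IMR v/s EDUCPC
regression), with p-values of 4.3 × 10⁻¹⁰, 2.2 × 10⁻⁷, and 4.1 × 10⁻⁷. All three
are far beyond the 5% critical value of about 2.05, so each correlation is
statistically significant. The values of the pairwise correlation coefficients
are given below in Table 5 (sciencedirect, 2019; djsresearch, 2019).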
Correlation coefficient between IMR and GDPPC

| | IMR | GDPPC |
|---|---|---|
| IMR | 1 | |
| GDPPC | -0.86285 | 1 |
Correlation coefficient between IMR and HEXPPC

| | IMR | HEXPPC |
|---|---|---|
| IMR | 1 | |
| HEXPPC | -0.78113 | 1 |
Correlation coefficient between IMR and EDUCPC

| | IMR | EDUCPC |
|---|---|---|
| IMR | 1 | |
| EDUCPC | -0.77001 | 1 |
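The correlations and their significance tests can be reproduced from the Appendix 1 data. Below is a minimal standard-library Python sketch; the names `pearson`, `t_stat`, `RESULTS`, and `T_CRIT_5PCT_29DF` are hypothetical. It computes Pearson's r for IMR against each variable and the t statistic for the null hypothesis of zero correlation. The critical value of about 2.045 for 29 degrees of freedom is taken from standard t tables rather than computed.

```python
import math

# Annual data for Sri Lanka, 1951-1981, transcribed from Appendix 1
IMR = [82, 78, 71, 72, 71, 67, 68, 64, 58, 57, 52, 53, 56, 55, 53, 54,
       48, 50, 53, 47, 45, 46, 46, 51, 45, 44, 42, 37, 38, 34, 29.5]
OTHERS = {
    "GDPPC": [617.59, 629.63, 619.4, 623.65, 648.85, 634.94, 622.72, 619.15,
              617.71, 641.41, 646.34, 649.81, 655.75, 697.83, 694.04, 688.95,
              705.56, 744.75, 767.87, 786.16, 780.63, 786.43, 802.6, 825.15,
              883.11, 842.48, 860.07, 924.96, 956.14, 997.82, 1034.6],
    "HEXPPC": [8.8, 10.51, 10.78, 10.84, 10.67, 11.4, 11.87, 12.9, 14.72,
               14.28, 15.86, 14.56, 14.77, 13.82, 13.98, 14.45, 14.38, 14.74,
               15.99, 16.1, 16.16, 15.62, 13.8, 11.63, 12.63, 14.28, 13.45,
               15.42, 16.54, 15.84, 14.08],
    "EDUCPC": [14.6, 16.47, 16.96, 15.96, 15.51, 17.84, 19.88, 22.09, 24.84,
               24.14, 29.64, 30.31, 31.44, 32.27, 33.38, 32.49, 30.53, 30.13,
               32.26, 35.53, 32.42, 33.67, 31.03, 24.44, 25.89, 28.66, 25.17,
               25.25, 29.78, 32.29, 30.93],
}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_stat(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


T_CRIT_5PCT_29DF = 2.045  # two-tailed 5% critical value from t tables, df = 29

RESULTS = {}
for name, series in OTHERS.items():
    r = pearson(IMR, series)
    t = t_stat(r, len(IMR))
    RESULTS[name] = (r, t)
    verdict = "significant" if abs(t) > T_CRIT_5PCT_29DF else "not significant"
    print(f"IMR vs {name}: r = {r:.5f}, t = {t:.3f} ({verdict} at 5%)")
```

Because $t^2 = F$ for a one-regressor model, these t values coincide with the slope t statistics of the simple regressions.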
3).
Estimate a regression model of the form:
$$\mathrm{IMR}_t = \alpha + \beta_1 \mathrm{GDPPC}_t + \beta_2 \mathrm{HEXPPC}_t + u_t$$
where the subscript t corresponds to year t. Interpret the coefficients
that you obtain, and comment on their economic and statistical significance.
The regression model is estimated in three forms: a simple regression of IMR on
GDPPC, a simple regression of IMR on HEXPPC, and the multiple regression of IMR
on both variables specified above. All three use the 31 annual observations. In
the regression of IMR on GDPPC, R² is 0.744 and adjusted R² is 0.736. The slope
coefficient is -0.0898, with a t-statistic of -9.19 and a p-value of
4.3 × 10⁻¹⁰, so it is significantly different from zero at any conventional
level. In the regression of IMR on HEXPPC, multiple R and R² are 0.78 and 0.61,
adjusted R² is 0.60, and the standard error is 8.16. The slope coefficient is
-4.98, with a t-statistic of -6.74 and a p-value (equal to Significance F) of
2.2 × 10⁻⁷, so it is also highly significant.

In the multiple regression, the estimated intercept is α = 145.06, the GDPPC
coefficient is β1 = -0.0663, and the HEXPPC coefficient is β2 = -3.07. Holding
health expenditure constant, an increase of 100 rupees in real GDP per capita is
associated with about 6.6 fewer infant deaths per 1000 live births; holding GDP
per capita constant, each additional rupee of real health expenditure per capita
is associated with about 3.1 fewer infant deaths per 1000 live births. In
economic terms, a one-standard-deviation increase in GDPPC (123.4 rupees) is
associated with a fall in IMR of about 8.2, and a one-standard-deviation increase
in HEXPPC (2.01 rupees) with a fall of about 6.2, against a standard deviation of
IMR of 12.8. Both slopes have the expected negative sign and are highly
significant statistically: their t-statistics (-10.90 and -8.26) far exceed the
5% two-tailed critical value of about 2.05 with 28 degrees of freedom, and their
p-values are 1.4 × 10⁻¹¹ and 5.5 × 10⁻⁹. The intercept (t = 28.91) has no direct
economic meaning, since it corresponds to zero income and zero health spending,
far outside the range of the data. The regression summaries of IMR v/s GDPPC,
IMR v/s HEXPPC and the multiple regression are tabulated below in Tables 6, 7,
and 8 (stat, 2019; statsoft, 2019).
IMR v/s GDPPC
Table 6: Regression statistics of IMR v/s GDPPC

| Regression Statistics | |
|---|---|
| Multiple R | 0.862847105 |
| R Square | 0.744505126 |
| Adjusted R Square | 0.735694958 |
| Standard Error | 6.604769393 |
| Observations | 31 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 3686.369101 | 3686.369 | 84.50521279 | 4.29948E-10 |
| Residual | 29 | 1265.066383 | 43.62298 | | |
| Total | 30 | 4951.435484 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 120.4358667 | 7.349727562 | 16.38644 | 3.36241E-16 |
| X Variable 1 (GDPPC) | -0.089846252 | 0.009773682 | -9.19267 | 4.29948E-10 |
IMR v/s HEXPPC
Table 7: Regression statistics of IMR v/s HEXPPC

| Regression Statistics | |
|---|---|
| Multiple R | 0.781129842 |
| R Square | 0.610163831 |
| Adjusted R Square | 0.596721204 |
| Standard Error | 8.158449484 |
| Observations | 31 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 3021.186843 | 3021.187 | 45.39022412 | 2.1553E-07 |
| Residual | 29 | 1930.248641 | 66.5603 | | |
| Total | 30 | 4951.435484 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 122.0153294 | 10.2367743 | 11.91931 | 1.06725E-12 |
| X Variable 1 (HEXPPC) | -4.980288587 | 0.739219382 | -6.73723 | 2.1553E-07 |
Multiple Regression
Table 8: Multiple regression statistics

| Regression Statistics | |
|---|---|
| Multiple R | 0.962099449 |
| R Square | 0.92563535 |
| Adjusted R Square | 0.920323589 |
| Standard Error | 3.626350826 |
| Observations | 31 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 4583.223715 | 2291.612 | 174.2614915 | 1.58173E-16 |
| Residual | 28 | 368.2117687 | 13.15042 | | |
| Total | 30 | 4951.435484 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 145.0590517 | 5.017399882 | 28.9112 | 2.14377E-22 |
| X Variable 1 (GDPPC) | -0.066255587 | 0.006079204 | -10.8987 | 1.39275E-11 |
| X Variable 2 (HEXPPC) | -3.073994243 | 0.372230404 | -8.25831 | 5.48523E-09 |
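The multiple-regression coefficients in Table 8 can be reproduced by ordinary least squares. The sketch below is an illustration, not the original procedure (the tables appear to be spreadsheet regression output). It solves the normal equations $X'Xb = X'y$ by Gauss-Jordan elimination in plain Python, with hypothetical helper names `ols` and `r_squared`. For anything beyond a small check like this one, a numerically stabler routine such as `numpy.linalg.lstsq` would normally be preferred.

```python
# Annual data for Sri Lanka, 1951-1981, transcribed from Appendix 1
IMR = [82, 78, 71, 72, 71, 67, 68, 64, 58, 57, 52, 53, 56, 55, 53, 54,
       48, 50, 53, 47, 45, 46, 46, 51, 45, 44, 42, 37, 38, 34, 29.5]
GDPPC = [617.59, 629.63, 619.4, 623.65, 648.85, 634.94, 622.72, 619.15,
         617.71, 641.41, 646.34, 649.81, 655.75, 697.83, 694.04, 688.95,
         705.56, 744.75, 767.87, 786.16, 780.63, 786.43, 802.6, 825.15,
         883.11, 842.48, 860.07, 924.96, 956.14, 997.82, 1034.6]
HEXPPC = [8.8, 10.51, 10.78, 10.84, 10.67, 11.4, 11.87, 12.9, 14.72,
          14.28, 15.86, 14.56, 14.77, 13.82, 13.98, 14.45, 14.38, 14.74,
          15.99, 16.1, 16.16, 15.62, 13.8, 11.63, 12.63, 14.28, 13.45,
          15.42, 16.54, 15.84, 14.08]


def ols(y, *columns):
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*columns)]  # design matrix X with a constant
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           + [sum(row[i] * yi for row, yi in zip(rows, y))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def r_squared(y, coefs, *columns):
    """Share of the variation of y around its mean explained by the fitted values."""
    fitted = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], vals))
              for vals in zip(*columns)]
    y_mean = sum(y) / len(y)
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    sst = sum((obs - y_mean) ** 2 for obs in y)
    return 1 - sse / sst


COEFS = ols(IMR, GDPPC, HEXPPC)
R2 = r_squared(IMR, COEFS, GDPPC, HEXPPC)
print("alpha = %.4f, beta1 (GDPPC) = %.6f, beta2 (HEXPPC) = %.4f" % tuple(COEFS))
print("R^2 = %.5f" % R2)
```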
4).
Interpret the R2 statistic from the regression and test whether it
is statistically significant.
The R² statistic measures the proportion of the variation in IMR around its
mean that is explained by the regressors in the estimated equation:

$$\mathrm{IMR}_t = \alpha + \beta_1 \mathrm{GDPPC}_t + \beta_2 \mathrm{HEXPPC}_t + u_t$$

In this model, IMR is expressed as a linear function of GDPPC and HEXPPC plus an
error term. For the multiple regression, R² = 0.926, so about 92.6% of the
variation in the infant mortality rate over 1951 to 1981 is explained by GDP per
capita and health expenditure per capita together; the adjusted R², which
penalises the number of regressors, is 0.920. Whether R² is statistically
significant is tested with the F-test of the null hypothesis that all slope
coefficients are zero (β1 = β2 = 0), using

$$F = \frac{R^2/k}{(1-R^2)/(n-k-1)}$$

with k = 2 regressors and n = 31 observations. This gives F = 174.26 with (2, 28)
degrees of freedom and a p-value (Significance F) of 1.6 × 10⁻¹⁶, so the null
hypothesis is rejected and R² is highly significant. For comparison, the analysis
also includes the simple regressions of IMR v/s GDPPC and IMR v/s HEXPPC. For
IMR v/s GDPPC, the Significance F and adjusted R² are 4.3 × 10⁻¹⁰ and 0.74; for
IMR v/s HEXPPC they are 2.2 × 10⁻⁷ and 0.60; and for the multiple regression
they are 1.6 × 10⁻¹⁶ and 0.92. Ranked from lowest to highest adjusted R², the
models are IMR v/s HEXPPC < IMR v/s GDPPC < multiple regression.
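The overall F statistic can be recovered from R² alone. Here is a minimal sketch using the R² values reported in Tables 6, 7, and 8; the helper name `f_statistic` and the `MODELS` dictionary are hypothetical. Converting F to a p-value requires the F distribution, for example `scipy.stats.f.sf(F, k, n - k - 1)` in SciPy, which is left out here so that the sketch needs only the standard library.

```python
def f_statistic(r2, n, k):
    """Overall F statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


N = 31  # annual observations, 1951-1981
# R-squared and number of regressors, copied from Tables 6, 7 and 8
MODELS = {
    "IMR v/s GDPPC": (0.744505126, 1),
    "IMR v/s HEXPPC": (0.610163831, 1),
    "Multiple regression": (0.92563535, 2),
}

for name, (r2, k) in MODELS.items():
    print(f"{name}: F({k}, {N - k - 1}) = {f_statistic(r2, N, k):.2f}")
```

The results should match the F column of the three ANOVA tables (84.51, 45.39, and 174.26).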
5).
Predict the IMR for Sri Lanka at a GDP per capita level of 750 rupees, assuming
HEXPPC is at its mean value.
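R² measures how closely the data fit the regression line, that is, how much of
the variability of IMR around its mean the model explains. Because the multiple
regression in Table 8 explains about 93% of that variability, it is used for the
prediction. The mean value of HEXPPC is 13.70548 rupees (Table 3), so the
predicted IMR at a GDP per capita level of 750 rupees is

$$\widehat{\mathrm{IMR}} = 145.0591 - 0.066256 \times 750 - 3.073994 \times 13.70548 \approx 53.24$$

that is, about 53.2 infant deaths per 1000 live births. This is slightly below
the sample mean IMR of 53.76, as expected, because 750 rupees is a little above
the mean GDPPC of 742.13 rupees.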
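The prediction is a direct substitution into the fitted equation. A minimal sketch with a hypothetical `predict_imr` helper; the coefficients are copied from Table 8, and HEXPPC defaults to its sample mean from Table 3.

```python
# Coefficients of the multiple regression in Table 8
ALPHA = 145.0590517
BETA_GDPPC = -0.066255587
BETA_HEXPPC = -3.073994243
HEXPPC_MEAN = 13.70548  # sample mean of HEXPPC, Table 3


def predict_imr(gdppc, hexppc=HEXPPC_MEAN):
    """Fitted infant mortality rate (per 1000 live births) from the Table 8 model."""
    return ALPHA + BETA_GDPPC * gdppc + BETA_HEXPPC * hexppc


print(round(predict_imr(750), 2))  # prints 53.24
```

As a sanity check on the transcribed coefficients: an OLS fit with an intercept passes through the point of means, so `predict_imr(742.1323)` returns approximately the sample mean IMR of 53.76.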
6).
Re-estimate the model including the EDUCPC variable and comment on any changes
to the results and goodness of fit:
$$\mathrm{IMR}_t = \alpha + \beta_1 \mathrm{GDPPC}_t + \beta_2 \mathrm{HEXPPC}_t + \beta_3 \mathrm{EDUCPC}_t + u_t$$
Explain how the omission of EDUCPC in part
3 may have biased the results. (Note: it is sufficient to discuss the changes,
without explicitly showing the testing procedure).
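The model is re-estimated with EDUCPC as an additional regressor. As a first
step, the simple regression of IMR on EDUCPC gives R² = 0.593 and adjusted
R² = 0.579, with a slope of -1.56 (t = -6.50, p = 4.1 × 10⁻⁷). The multiple
regression with all three variables again uses 31 observations. The
t-statistics for GDPPC (variable 1), HEXPPC (variable 2), and EDUCPC
(variable 3) are -11.12, -3.03, and -2.08, with p-values of 1.4 × 10⁻¹¹, 0.0053,
and 0.047 respectively, so all three coefficients are significant at the 5%
level (GDPPC and HEXPPC also at the 1% level). Goodness of fit improves slightly:
R² rises from 0.926 to 0.936, adjusted R² from 0.920 to 0.929, and the standard
error of the regression falls from 3.63 to 3.43. The main change is in the
health-expenditure coefficient, whose magnitude falls from -3.07 to -1.95, while
the GDPPC coefficient changes only a little (from -0.0663 to -0.0645) and the
intercept falls from 145.06 to 139.77. This suggests that omitting EDUCPC in
part 3 biased the HEXPPC coefficient away from zero. EDUCPC has its own negative
association with IMR (β3 = -0.43), and, by the omitted-variable-bias formula, the
change in the HEXPPC coefficient implies that EDUCPC is positively related to
HEXPPC once GDPPC is held constant. When EDUCPC is left out, HEXPPC therefore
picks up part of the effect of education spending, overstating the effect of
health expenditure, while the GDPPC estimate is affected much less.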
Linear Regression
IMR v/s EDUCPC

| Regression Statistics | |
|---|---|
| Multiple R | 0.770014756 |
| R Square | 0.592922724 |
| Adjusted R Square | 0.578885577 |
| Standard Error | 8.336907693 |
| Observations | 31 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 2935.818617 | 2935.819 | 42.23955 | 4.08928E-07 |
| Residual | 29 | 2015.616867 | 69.50403 | | |
| Total | 30 | 4951.435484 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 95.35512915 | 6.573159199 | 14.50674 | 7.93E-15 |
| X Variable 1 (EDUCPC) | -1.56152701 | 0.240264653 | -6.4992 | 4.09E-07 |
Multiple Regression

| Regression Statistics | |
|---|---|
| Multiple R | 0.967438284 |
| R Square | 0.935936833 |
| Adjusted R Square | 0.928818703 |
| Standard Error | 3.427582234 |
| Observations | 31 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 4634.230845 | 1544.744 | 131.4863418 | 3.20077E-16 |
| Residual | 27 | 317.2046392 | 11.74832 | | |
| Total | 30 | 4951.435484 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 139.7691319 | 5.379173862 | 25.98338 | 1.22146E-20 |
| X Variable 1 (GDPPC) | -0.064535877 | 0.005804959 | -11.1174 | 1.39592E-11 |
| X Variable 2 (HEXPPC) | -1.951526011 | 0.643412555 | -3.03309 | 0.005297232 |
| X Variable 3 (EDUCPC) | -0.426833846 | 0.204847794 | -2.08366 | 0.046785864 |
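The omitted-variable-bias argument in part 6 can be made numerical using the reported coefficients. For OLS on the same sample, the short-regression estimates (Table 8) equal the long-regression estimates (the regression that includes EDUCPC) plus β3 times δ, where δ holds the coefficients of an auxiliary regression of EDUCPC on a constant, GDPPC, and HEXPPC. The sketch below (hypothetical names `SHORT`, `LONG`, `BIAS`, `DELTA`) backs out the implied δ from those two sets of reported coefficients; it does not re-estimate the auxiliary regression from the data.

```python
# Short model (Table 8) and long model (multiple regression including EDUCPC)
SHORT = {"Intercept": 145.0590517, "GDPPC": -0.066255587, "HEXPPC": -3.073994243}
LONG = {"Intercept": 139.7691319, "GDPPC": -0.064535877, "HEXPPC": -1.951526011}
B_EDUCPC = -0.426833846  # beta3 in the long model

# OLS identity: short = long + B_EDUCPC * delta, so delta = (short - long) / B_EDUCPC
BIAS = {name: SHORT[name] - LONG[name] for name in SHORT}
DELTA = {name: BIAS[name] / B_EDUCPC for name in SHORT}

for name in SHORT:
    print(f"{name}: short = {SHORT[name]:.4f}, long = {LONG[name]:.4f}, "
          f"bias = {BIAS[name]:.4f}, implied delta = {DELTA[name]:.5f}")
```

A positive implied δ for HEXPPC (about 2.63) combined with a negative β3 gives a negative bias, which is the direction described in the answer to part 6.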
7). What conclusions do you draw
from your analysis?
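The present report examined the decline in infant mortality in Sri Lanka
between 1951 and 1981, when IMR fell from 82 to 29.5 deaths per 1000 live
births. It related this decline to real GDP per capita and to real health and
educational expenditures per capita, using correlation analysis and simple and
multiple linear regression. IMR is strongly and negatively correlated with all
three variables (r = -0.86 for GDPPC, -0.78 for HEXPPC, and -0.77 for EDUCPC),
and each correlation is statistically significant. In the multiple regressions,
higher income and higher health spending are both associated with significantly
lower infant mortality, and together they explain about 93% of the variation in
IMR. Adding education expenditure improves the fit only slightly (adjusted R²
rises from 0.920 to 0.929), but its coefficient is significant at the 5% level,
and including it reduces the estimated effect of health expenditure from -3.07
to -1.95. Comparing the R² values and the estimated coefficients across
specifications therefore highlights the limitations of the simpler models:
leaving out a relevant variable such as EDUCPC overstates the contribution of
the public expenditure variable that remains. Overall, the results indicate that
rising income, together with public spending on health and education, is
associated with the fall in infant mortality in Sri Lanka over this period.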
Appendix 1: Data for analysis
| Year | IMR | GDPPC | HEXPPC | EDUCPC |
|---|---|---|---|---|
| 1951 | 82 | 617.59 | 8.8 | 14.6 |
| 1952 | 78 | 629.63 | 10.51 | 16.47 |
| 1953 | 71 | 619.4 | 10.78 | 16.96 |
| 1954 | 72 | 623.65 | 10.84 | 15.96 |
| 1955 | 71 | 648.85 | 10.67 | 15.51 |
| 1956 | 67 | 634.94 | 11.4 | 17.84 |
| 1957 | 68 | 622.72 | 11.87 | 19.88 |
| 1958 | 64 | 619.15 | 12.9 | 22.09 |
| 1959 | 58 | 617.71 | 14.72 | 24.84 |
| 1960 | 57 | 641.41 | 14.28 | 24.14 |
| 1961 | 52 | 646.34 | 15.86 | 29.64 |
| 1962 | 53 | 649.81 | 14.56 | 30.31 |
| 1963 | 56 | 655.75 | 14.77 | 31.44 |
| 1964 | 55 | 697.83 | 13.82 | 32.27 |
| 1965 | 53 | 694.04 | 13.98 | 33.38 |
| 1966 | 54 | 688.95 | 14.45 | 32.49 |
| 1967 | 48 | 705.56 | 14.38 | 30.53 |
| 1968 | 50 | 744.75 | 14.74 | 30.13 |
| 1969 | 53 | 767.87 | 15.99 | 32.26 |
| 1970 | 47 | 786.16 | 16.1 | 35.53 |
| 1971 | 45 | 780.63 | 16.16 | 32.42 |
| 1972 | 46 | 786.43 | 15.62 | 33.67 |
| 1973 | 46 | 802.6 | 13.8 | 31.03 |
| 1974 | 51 | 825.15 | 11.63 | 24.44 |
| 1975 | 45 | 883.11 | 12.63 | 25.89 |
| 1976 | 44 | 842.48 | 14.28 | 28.66 |
| 1977 | 42 | 860.07 | 13.45 | 25.17 |
| 1978 | 37 | 924.96 | 15.42 | 25.25 |
| 1979 | 38 | 956.14 | 16.54 | 29.78 |
| 1980 | 34 | 997.82 | 15.84 | 32.29 |
| 1981 | 29.5 | 1034.6 | 14.08 | 30.93 |
where:
Year = the year of observation;
IMR = Infant Mortality Rate per 1000 live births;
GDPPC = Real GDP per capita in rupees;
EDUCPC = Real Educational Expenditures per capita in rupees;
HEXPPC = Real Health Expenditures per capita in rupees.
References: The Relationship between Infant Mortality, Income and Public Expenditures in Sri Lanka, 1951 to 1981

djsresearch. (2019). Correlation analysis: market research. Retrieved from https://www.djsresearch.co.uk/glossary/item/correlation-analysis-market-research

sciencedirect. (2019). Correlation analysis. Retrieved from https://www.sciencedirect.com/topics/medicine-and-dentistry/correlation-analysis

stat. (2019). Linear Regression. Retrieved from http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm

statisticshowto. (2019). Descriptive Statistics: Definition & charts and graphs. Retrieved from https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/descriptive-statistics/

statsoft. (2019). Multiple Regression. Retrieved from http://www.statsoft.com/Textbook/Multiple-Regression