Case Study on Real Estate
This case study uses the Excel file Real_Estate.xlsx, which contains data on 100 homes purchased in 2018. It includes variables for the number of bedrooms, the number of bathrooms, whether the house has a pool or a garage, the age, size, and price of the home, what the house is constructed from, and the appraisals in 2016 and 2017.
Assigned Problem 1: Real estate professionals have said that swimming pools do not increase the value of a home. Conduct a hypothesis test for two independent samples to determine whether the mean sale prices differ for homes with and without a pool. Use a .05 significance level. Describe your findings using proper statistical language. Note that this problem assumes that homes with and without a pool are otherwise similar. If the homes with pools were in better locations, built of better materials, newer, and larger, they would be worth more for reasons that have nothing to do with the pool. For the sake of simplicity, we will also assume that homes with and without a pool have similar variability in price.
Assigned Problem 2: We would like to find out how much two real estate agents can differ in their appraisals of the same property. Conduct a paired-samples hypothesis test to determine whether there is a difference in the mean appraisal prices these agents give for the same homes. Use a .05 significance level. Describe your findings using proper statistical language.
Assigned Problem 3: If people are going to invest in their homes by building them out of brick, are they also going to take the plunge and install a swimming pool? Conduct a hypothesis test of proportions to determine whether homes made of brick are more likely to have a swimming pool than homes made of other materials. Use a .05 significance level. Describe your findings using proper statistical language.
Assigned Problem 4: You might expect homes with more bedrooms to be worth more because they are probably larger, but is there more to the value, such as location, construction, and age? Using the sample of 100 homes in the data file, conduct an Analysis of Variance (ANOVA) hypothesis test to determine whether the mean sale price differs among homes with two, three, four, or five bedrooms. Use a .05 significance level. Because this sample includes homes of varying sizes, at different locations, and made of different materials, it is reasonable to assume that location and construction are not factors in this test. Describe your findings as in the other problems.
Due at the end of Module 5
Course Learning Outcomes: 1, 2, 3 & 4
Excel Instructions
We will create an answer Excel spreadsheet for the solutions to these problems. Write your name and "Module 5" in the first two rows of the left-hand column, skip a row, and label cell A4 as "Problem 1"; add a similar label before each of the remaining problems. Always start a hypothesis test by stating the null and alternative hypotheses, in this case: Ho: μp = μnp; Ha: μp > μnp; α = .05. Find the critical value (c.v.), a one-tailed t-value with 99 degrees of freedom: 1.66. Find the sample mean and sample standard deviation of the sale prices of homes with pools and of homes without pools, and write these on your answer spreadsheet. Sorting column E will give you the raw information you need. We do not know the population standard deviations, so we must use a t-distribution to find the standardized test statistic (S.T.S.). The standardized test statistic is
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{\text{observed difference} - \text{hypothesized difference}}{\text{standard error}}$$
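This formula does not pool the two sample variances, so it does not require them to be equal; it has the same form as the two-sample z-statistic used with a normal distribution. Compare the S.T.S. with the c.v., or calculate the p-value, and state your conclusion.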
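To check the Problem 1 arithmetic outside Excel, here is a minimal Python sketch of the two-sample t statistic above, using only the standard library. It assumes you have already split the sale prices into a list for homes with a pool and a list for homes without; the function name `two_sample_t` and the numbers in the lists are hypothetical placeholders, not values from Real_Estate.xlsx.

```python
import math
import statistics


def two_sample_t(sample1, sample2, hypothesized_diff=0.0):
    """Two-sample t statistic with the unpooled standard error sqrt(s1^2/n1 + s2^2/n2)."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    # statistics.stdev divides by n - 1, like Excel's STDEV.S
    s1, s2 = statistics.stdev(sample1), statistics.stdev(sample2)
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    # (observed difference - hypothesized difference) / standard error
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


# Hypothetical sale prices, placeholders only (not values from Real_Estate.xlsx)
pool = [250_000, 265_000, 240_000, 280_000, 255_000]
no_pool = [230_000, 245_000, 220_000, 250_000, 235_000]

t = two_sample_t(pool, no_pool)
print(f"t = {t:.3f}")
```

With the real data, compare the printed t with the c.v. of 1.66. The standard library has no t-distribution, so for a p-value you could use Excel's =T.DIST.RT(t, 99) instead.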
For Problem 2, create a new column containing the difference between columns J and K (J − K or K − J; the direction only changes the sign of t). Find the mean and standard deviation of the difference column and write these on your answer spreadsheet right after stating the null and alternative hypotheses (Ho: μd = 0; Ha: μd ≠ 0; α = .05). We have only sample data; therefore, we must use a t-distribution. Find the c.v. for this two-tailed test; with 99 degrees of freedom, I get ±1.984. Determine the standardized test statistic, $$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}},$$ where d̄ and s_d are the mean and standard deviation of the difference column and μd = 0 is the hypothesized mean difference. Compare it with the c.v., reject or fail to reject the null hypothesis, and then state your conclusion.
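The paired test reduces to a one-sample t-test on the difference column. The following standard-library Python sketch shows that calculation; the function name `paired_t` and the appraisal values are hypothetical placeholders, assumed only for illustration.

```python
import math
import statistics


def paired_t(first, second, hypothesized_mean_diff=0.0):
    """Paired-sample t statistic: t = (d_bar - mu_d) / (s_d / sqrt(n))."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # The "difference column": one difference per home
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return (d_bar - hypothesized_mean_diff) / (s_d / math.sqrt(n))


# Hypothetical appraisals of the same four homes by two agents (placeholders only)
agent_a = [200_000, 215_000, 190_000, 230_000]
agent_b = [195_000, 212_000, 191_000, 222_000]

t = paired_t(agent_a, agent_b)
print(f"t = {t:.3f}")
```

For the two-tailed decision, compare |t| with 1.984; in Excel, =T.DIST.2T(ABS(t), 99) would give the two-tailed p-value.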
In Problem 3, after stating the null and alternative hypotheses, Ho: pb = po; Ha: pb > po; α = .05, you may first sort column E and then sort column F. Calculate p̂ for each group: the proportion of brick homes with pools (p̂1) and the proportion of the remaining homes (frame + other) with pools (p̂2). Find the c.v. for a z-distribution; i.e., z = 1.645. The test statistic is p̂1 − p̂2, and the standardized test statistic is
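$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where p1 − p2 = 0 under Ho, x1 and x2 are the numbers of homes with pools in each group, and the weighted (pooled) estimate is
$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}, \qquad \bar{q} = 1 - \bar{p}.$$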
Reject or fail-to-reject the null hypothesis and state your conclusion.
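The pooled two-proportion z-test can be sketched in a few lines of standard-library Python (3.8 or later, for `statistics.NormalDist`). The function name `two_proportion_z` and the counts below are hypothetical placeholders; substitute the counts you get after sorting columns E and F.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for Ho: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # weighted (pooled) estimate
    q_bar = 1 - p_bar
    standard_error = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    # Hypothesized difference p1 - p2 is 0 under Ho
    return (p1_hat - p2_hat) / standard_error


# Hypothetical counts (placeholders only): homes with pools among brick and other homes
z = two_proportion_z(x1=12, n1=30, x2=14, n2=70)
p_value = 1 - NormalDist().cdf(z)  # right-tailed, matching Ha: pb > po
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("Reject Ho" if z > 1.645 else "Fail to reject Ho")
```

Because Ha is right-tailed, the p-value is the area to the right of z; rejecting when z > 1.645 and rejecting when the p-value < .05 lead to the same decision.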
For Problem 4, we need to add the Data Analysis tool, which will appear under the "Data" tab. In Windows Excel, go to "File", then "Options" at the very bottom, then highlight "Add-ins" on the left. With "Excel Add-ins" selected in the Manage box, click "Go" (do not press Enter), check "Analysis ToolPak" in the pop-up window, and click OK. "Data Analysis" will now appear as the rightmost entry on the "Data" tab. On a Mac, start by going to "Tools" and then "Add-ins".

On your answer spreadsheet, label four columns: two-bedroom, three-bedroom, four-bedroom, and five-bedroom. Sort column B on the data sheet. Copy (Ctrl+C) the column I values for all the two-bedroom homes and paste them into the two-bedroom column of your answer spreadsheet. Do likewise for the three-, four-, and five-bedroom homes, keeping the columns side by side in ascending order of bedrooms, from two on the left to five on the right. State the null and alternative hypotheses: Ho: all the means are equal; Ha: at least one mean is different; α = .05. With the ToolPak installed, click "Data Analysis", choose "Anova: Single Factor", and fill in the Input Range: enter the first cell of your array, a colon, and the last cell (first:last), so that the range sweeps all four columns. If the range includes the column labels, check "Labels in First Row". The Output Range is the cell where you want Excel to place its results. In the output, the Between Groups row shows the degrees of freedom (the number of groups minus one), the F standardized test statistic, the F critical value, and the p-value. State your conclusion by comparing the p-value with α.
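To see what "Anova: Single Factor" computes for the Between Groups row, here is a minimal standard-library Python sketch of a one-way ANOVA F statistic. The function name `one_way_anova` and the price lists (in $1,000s) are hypothetical placeholders, not data from Real_Estate.xlsx.

```python
import statistics


def one_way_anova(*groups):
    """Single-factor ANOVA: return (F, df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.mean(g) for g in groups]
    # Between groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: how far each value sits from its own group mean
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical sale prices in $1,000s for 2-, 3-, 4- and 5-bedroom homes (placeholders only)
two_bed = [180, 190, 200]
three_bed = [210, 220, 230]
four_bed = [240, 250, 260]
five_bed = [270, 280, 290]

f_stat, df_b, df_w = one_way_anova(two_bed, three_bed, four_bed, five_bed)
print(f"F = {f_stat:.2f} with df = ({df_b}, {df_w})")
```

The standard library has no F-distribution, so this sketch stops at F and its degrees of freedom. For these placeholder lists, Excel's Between Groups row should show df = 3 and F = 45; the F critical value and p-value can be read from Excel's output or computed with =F.INV.RT(0.05, 3, 8) and =F.DIST.RT(45, 3, 8).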