Deliverable 1 - Descriptive Statistics
Instructions
Scenario
(Information repeated for deliverable 01, 03, and 04)
A major client of your company is interested in the salary distributions of jobs in the state of Minnesota that range from $30,000 to $200,000 per year. As a Business Analyst, your boss asks you to research and analyze the salary distributions. You are given a spreadsheet that contains the following information:
· A listing of the jobs by title
· The salary (in dollars) for each job
The client needs the preliminary findings by the end of the day, and your boss asks you to first compute some basic statistics.
Background information on the Data
The data set in the spreadsheet consists of 364 records from the Bureau of Labor Statistics that you will be analyzing. The data set contains a listing of several job titles with yearly salaries ranging from approximately $30,000 to $200,000 for the state of Minnesota.
What to Submit
Your boss wants you to submit the spreadsheet with the completed calculations. Your research and analysis should be present within the answers provided on the worksheet.
I have already completed the worksheet with the answers. The only thing left to do is transfer the completed answers from the worksheet to the Excel document.
1. Introduce your scenario and data set.
· Provide a brief overview of the scenario you are given and describe the data set.
· Describe how you will be analyzing the data set.
· Classify the variables in your data set.
· Which variables are quantitative/qualitative?
· If it is a quantitative variable, is it discrete or continuous?
· Describe the level of measurement for each variable included in the data set (nominal, ordinal, interval, ratio).
Answer and Explanation:
The data being analyzed come from the Bureau of Labor Statistics and consist of 364 records listing jobs in Minnesota with salaries that range from approximately $30,000 to $200,000 per year. I have been asked to research and analyze the salary distributions for one of our major clients.
The data will be analyzed by computing basic descriptive statistics: the measures of center (mean, median, mode, and midrange) and the measures of variation (range, variance, and standard deviation).
There are two variables in this scenario. The first, job title, is a qualitative variable because its values are names rather than numbers. The second, salary, is a quantitative variable because its values are numeric. Salary is also treated as continuous: although it is recorded to the nearest dollar or cent, an annual salary can take essentially any value within the range.
Identifying which variables are qualitative and which are quantitative makes it easier to determine the level of measurement for each. Because job titles are qualitative labels with no numerical significance or natural order, they are measured at the nominal level. Salaries are measured at the ratio level because there is a true zero: a salary of $0 means no pay at all, a salary cannot be less than zero, and ratios are meaningful (an $80,000 salary is twice a $40,000 salary).
2. Discuss the importance of the Measures of Center.
· Name and describe each measure of center.
· Discuss the advantages and/or disadvantages of each.
Answer and Explanation:
Measures of center are widely used to summarize a data set with a single representative value. They include the mean, median, mode, and midrange, each of which locates the center of the data in a different way. “The mean is generally the most important of all numerical measurements used to describe data, and it is what most people call an average” (Triola, 2018, p. 82). Advantages of the mean are that it is familiar to most people, every data set has exactly one mean, and it is easy to use for comparison. Disadvantages are that it is affected by extreme values, is hard to calculate for large data sets, and cannot be calculated for grouped data with open-ended classes (Rasmussen, 2019).
The median of a data set is the measure of center that is the middle value when the data are arranged in increasing or decreasing order (Triola, 2018). An advantage of the median is that it is not affected by extreme values. A disadvantage is that it is not as useful in statistical testing. Next is the mode: “The mode of a data set is the value(s) that occur(s) with the greatest frequency” (Triola, 2018, p. 85). Important properties of the mode are that it can be found for qualitative data and that a data set can have one mode, more than one mode, or no mode (Triola, 2018). The mode also has disadvantages: it might not be informative, and it can vary from sample to sample.
The final measure of center is the midrange. “The midrange of a data set is the measure of center that is the value midway between the maximum and minimum values in the original data set” and “because the midrange uses only the maximum and minimum values, it is very sensitive to those extremes, so the midrange is not resistant” (Triola, 2018, p. 86). The midrange is rarely used and is not always a value that appears in the data set; however, it is easy to compute and provides a midpoint of the data.
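The four measures of center described above can be sketched in a few lines of Python. This is only an illustration: the salary values are hypothetical (they are not the Minnesota records), and `midrange` is a helper written for this example, since Python's standard `statistics` module does not provide one.

```python
from statistics import mean, median, multimode

# Hypothetical salaries in dollars (illustrative only, not the Minnesota data)
salaries = [32000, 45000, 45000, 58000, 61000, 61000, 150000]

def midrange(values):
    """Value midway between the minimum and the maximum."""
    return (min(values) + max(values)) / 2

center = {
    "mean": mean(salaries),        # pulled upward by the 150000 salary
    "median": median(salaries),    # middle value of the sorted data
    "modes": multimode(salaries),  # every value tied for the highest count
    "midrange": midrange(salaries),
}
print(center)
```

In this made-up sample the median is $58,000 while the midrange is $91,000, which shows how strongly a single extreme salary moves the midrange. The sample also has two modes ($45,000 and $61,000).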
3. Discuss the importance of the Measures of Variation.
· Name and describe each measure of variation.
· Discuss the advantages and/or disadvantages of each.
Answer and Explanation:
“Measures of variation are the measurements that tell us how spread out the data points are,” and include the range, variance, and standard deviation (Rasmussen, 2019). The range is the difference between the maximum value and the minimum value (Triola, 2018). The advantage of the range is that it is easy to compute. Its disadvantages are that it uses only the two extreme values, is rarely used, and provides no information about the data inside the range.
“Variance is the expectation of the squared deviation of a random variable from its mean. Informally, how spread out the data is. The advantages of variance are that it is used to calculate the standard deviation and that each value in the data set is used in calculation” (Rasmussen, 2019). The disadvantages of variance are that it is not easily calculated manually, that extreme values can inflate it, and that its units are the square of the original units (here, dollars squared).
Finally, we have the standard deviation. “Standard deviation is the measure of how much data values deviate away from the mean. If individual observations vary greatly from the group mean, the standard deviation is big; and vice versa” (Rasmussen, 2019). Advantages of the standard deviation are that its value is never negative and that it is useful in statistical methods and theoretical work. Its disadvantages are that it is hard to calculate manually and that it is not useful for nominal or ordinal data.
Used together, measures of center and measures of variation describe both the typical value of a data set and how widely the values are spread around it.
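As a rough sketch, the three measures of variation can be computed with Python's standard `statistics` module. The salaries below are hypothetical, and the scenario does not say whether the 364 records should be treated as a sample or as a population, so both versions are shown; that choice is an assumption to settle before reporting results.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical salaries in dollars (illustrative only, not the Minnesota data)
salaries = [40000, 50000, 60000, 70000, 80000]

data_range = max(salaries) - min(salaries)  # dollars
sample_var = variance(salaries)             # dollars squared, divides by n - 1
sample_sd = stdev(salaries)                 # dollars, square root of sample_var
population_var = pvariance(salaries)        # dollars squared, divides by n
population_sd = pstdev(salaries)            # dollars

print(data_range, sample_var, population_var)
print(round(sample_sd, 2), round(population_sd, 2))
```

The sample variance is larger than the population variance because it divides the same sum of squared deviations by n - 1 instead of n. In both cases the standard deviation is the square root of the variance, which returns the measure to dollars.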
4. Calculate the measures of center and measures of variation from the data set and list them below. Be sure to include (a) an interpretation of each measure in context of the scenario (for example, if the median is larger than the mean, what does it mean? What does the value of standard deviation tell you?) and (b) correct units of measurement. Show your calculations in your spreadsheet. You do not need to include Excel functions in your written answer below.
· Mean
· Median
· Mode
· Midrange
· Range
· Variance
· Standard deviation
Answer and Explanation:
The mean (average) salary across all 364 records is $71,879. The median, or middle value, of the salaries is $66,525. Because the mean is greater than the median, the distribution appears to be positively skewed (skewed to the right): a relatively small number of high salaries pulls the mean above the median.
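The comparison of the mean and the median can be written as a simple rule of thumb. The function below is a hypothetical helper for illustration; comparing the two values is a heuristic for the direction of skew, not a formal test, and a histogram of the salaries would be a sensible check.

```python
def skew_direction(mean_value, median_value):
    """Rule-of-thumb skew direction from mean vs. median (a heuristic only)."""
    if mean_value > median_value:
        return "right (positive) skew"
    if mean_value < median_value:
        return "left (negative) skew"
    return "roughly symmetric"

# Mean and median as reported in the answer (dollars)
reported_mean = 71879
reported_median = 66525

print(skew_direction(reported_mean, reported_median))
print(reported_mean - reported_median)  # gap between mean and median, in dollars
```

With the reported figures the mean exceeds the median by $5,354, which is what points to a right skew.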
There are five modes: $71,420, $35,750, $72,850, $64,880, and $65,290. Because more than one salary ties for the greatest frequency, the data set is multimodal.
To find the midrange, the maximum and minimum first had to be determined. The maximum is $199,980 and the minimum is $32,220. Averaging these two values gives a midrange of $116,100. The midrange is well above both the mean and the median because it depends only on the two extreme salaries and is pulled upward by the $199,980 maximum. The large gap between the minimum and the maximum also indicates a wide spread in the salaries.
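The midrange arithmetic can be checked directly from the reported maximum and minimum. The variable names are chosen only for this illustration.

```python
# Maximum and minimum salaries as reported in the answer (dollars)
maximum = 199980
minimum = 32220

# Midrange: the value midway between the two extremes
midrange = (maximum + minimum) / 2
print(midrange)  # 116100.0
```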
The variance is 5.46E+08 (about 546,000,000) dollars squared. Because it is expressed in squared dollars, the variance is hard to interpret directly; its main use here is as the basis for the standard deviation. The range, the maximum minus the minimum, is $167,760. This wide range indicates great variability in the salaries. Lastly, the standard deviation is $23,367.36, which means that salaries typically differ from the mean of $71,879 by roughly $23,367.
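As a consistency check on these figures, the range follows from the reported maximum and minimum, and squaring the reported standard deviation should reproduce the reported variance. This is a minimal sketch that uses only numbers stated in the answer; the variable names are illustrative.

```python
# Values as reported in the answer (dollars)
maximum = 199980
minimum = 32220
reported_sd = 23367.36

salary_range = maximum - minimum    # dollars
implied_variance = reported_sd ** 2  # dollars squared

print(salary_range)                 # 167760
print(f"{implied_variance:.2E}")    # 5.46E+08
```

The squared standard deviation agrees with the reported variance of 5.46E+08, and the difference between the maximum and minimum works out to $167,760.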
References
Rasmussen (2019). Module 01: Review of Basic Statistics. In STA3215CBE: Inferential Statistics and Analytics: Fall 2018 [Live Lecture replay]. Retrieved from https://rasmussen.mycourselabs.com/labs/mod/forum/discuss.php?d=5175
Triola, M. F. (2018). Elementary Statistics, 13th Edition. [Bookshelf Ambassadored]. Retrieved from https://ambassadored.vitalsource.com/#/books/9780134464244/