How to find outliers in JMP

Case 1 - Medical Malpractice: Descriptive Statistics, Graphics, and Exploratory Data Analysis

Marlene Smith, University of Colorado Denver Business School

Background

According to a recent study published in the US News and World Report the cost of medical malpractice in the United States is $55.6 billion a year, which is 2.4 percent of annual health-care spending. Another 2011 study published in the New England Journal of Medicine revealed that annually, during the period 1991 to 2005, 7.4% of all physicians licensed in the US had a malpractice claim. These staggering numbers not only contribute to the high cost of health care, but the size of successful malpractice claims also contributes to high premiums for medical malpractice insurance.

An insurance company wants to develop a better understanding of its claims paid out for medical malpractice lawsuits. Its records show claim payment amounts, as well as information about the presiding physician and the claimant for a number of recently adjudicated or settled lawsuits.

The Task

Using descriptive statistics and graphical displays, explore claim payment amounts, and identify factors that appear to influence the amount of the payment.

The Data MedicalMalpractice.jmp

The data set contains information about the last 118 claim payments made, covering a six-month period. The eight variables in the data table are described below:

Amount: Amount of the claim payment in dollars
Severity: The severity rating of damage to the patient, from 1 (emotional trauma) to 9 (death)
Age: Age of the claimant in years
Private Attorney: Whether the claimant was represented by a private attorney
Marital Status: Marital status of the claimant
Specialty: Specialty of the physician involved in the lawsuit
Insurance: Type of medical insurance carried by the patient
Gender: Patient gender

The variables are coded in JMP with a Continuous, Ordinal or Nominal modeling type. This coding helps to make sure that JMP performs the correct analysis and produces appropriate graphs.

A first step in any analysis is to ensure that your variables have the correct Modeling Type:

• Continuous variables, like Amount, have numeric values (e.g., 2, 5, 3.35, 159.667,…).
• Ordinal variables, such as Severity, have either numeric or character values which represent ordered categories (e.g., small, medium and large; 1-9 severity rating scales,…).
• Nominal variables, like Gender, can also have either numeric or character values, and represent unordered categories or labels (e.g., the names of states, colors of M&Ms, machine numbers,…).

Analysis

We begin by looking at the key variable of interest, the amount of claim payment. Exhibit 1 displays a histogram and summary statistics for Amount.

Exhibit 1 Distribution of Amount

(Analyze > Distribution; select Amount as Y, Columns, and click OK. For a horizontal layout select Stack under the top red triangle.)

From Exhibit 1 we see that the histogram of Amount is skewed right, meaning that there is a long tail, with several very high payments. The mean (average) payment is $91,045, while the median (middle) is $22,750. When a histogram is right skewed, as is the case here, the mean will exceed the median. This is because the mean is influenced by extreme values – the high payments that we observe in the histogram inflate the mean.
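The pull of extreme values on the mean can be illustrated with a minimal sketch. The seven payments below are hypothetical and are not taken from MedicalMalpractice.jmp; the example only shows how dropping one very large value moves the mean far more than the median.

```python
from statistics import mean, median

# Hypothetical payments in dollars (not taken from MedicalMalpractice.jmp).
payments = [5_000, 7_500, 20_000, 25_000, 40_000, 90_000, 900_000]

# The same sample with the single largest payment dropped.
without_largest = sorted(payments)[:-1]

print(f"all payments:    mean = {mean(payments):,.0f}, median = {median(payments):,.0f}")
print(f"without largest: mean = {mean(without_largest):,.0f}, median = {median(without_largest):,.0f}")
```

In this made-up sample the mean falls from roughly 155,000 to 31,250 when the largest payment is dropped, while the median only moves from 25,000 to 22,500.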

A measure of the spread of the data is the standard deviation (StdDev in Exhibit 1). The higher the standard deviation, the larger the spread, or variation, in the data. When the data are skewed, the standard deviation, like the mean, will be inflated.

Other useful summary statistics are the quartiles. The first quartile (next to 25.0% in Exhibit 1) is $7,500 and the third quartile (next to 75.0%) is $92,670. The interquartile range, defined as Q3 – Q1, is a measure of the amount of spread or variability in the middle 50% of the data. This value is displayed graphically in the outlier box plot (above the histogram). A larger version of this plot is displayed below.

The left edge of the box is the first quartile, the center line is the median or second quartile, and the right edge of the box is the third quartile. Hence, the width of the box is the interquartile range, or IQR.

(Notes: The center of the diamond is the mean. We will discuss this in a few moments. The red bracket at the top, which we won’t discuss further, denotes the “densest” region of the data.)

The outlier box plot helps us to visually identify potential outliers. The rule of thumb used to distinguish outliers from non-outliers is this: if the histogram is approximately normal, or bell-shaped, outliers are those points that lie more than 1.5 IQRs beyond either edge of the box. The line extending from the right edge of the box, called a whisker, is roughly 1.5 IQRs in length (we say “roughly”, because it is actually drawn to the furthest point within that range, so it may not be quite 1.5 IQRs).
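As a worked example of the 1.5 IQR rule, the sketch below computes the cutoffs from the quartiles reported in Exhibit 1 (Q1 = $7,500, Q3 = $92,670). The helper name `iqr_fences` is made up for this illustration; it is not a JMP function.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) cutoffs lying k IQRs beyond the box edges."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles of Amount as reported in Exhibit 1.
q1, q3 = 7_500, 92_670
lower, upper = iqr_fences(q1, q3)

print(f"IQR = {q3 - q1:,}")
print(f"lower fence = {lower:,.0f}, upper fence = {upper:,.0f}")
```

With these quartiles the IQR is 85,170, so the upper cutoff is 220,425 and the lower cutoff is negative. Under this rule no low payment can be flagged, and only payments above about $220,425 count as potential outliers.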

Let’s ignore, for sake of illustration, the fact that our data are right skewed. There are 16 points beyond the whisker, which we will consider to be outliers. In this case, the outliers are those points that are much larger than the rest.

Having identified several outliers, what should we do about them? Let’s consider removing them from the analysis. To do so, we will hide and exclude the points (rather than simply deleting them). Hide removes points from graphs, while Exclude removes them from future calculations.

Exhibit 2 is the new histogram for Amount after excluding and hiding the 16 outliers.

Exhibit 2 Amount after excluding and hiding 16 outliers

(To exclude and hide, draw a box around the points in the boxplot to select them. Then, select Rows > Hide and Exclude. Return to Analyze > Distribution and regenerate the histogram.)

Note that there are now seven (7) new outliers! We might as well get rid of those seven outliers as well. The result is shown in Exhibit 3.

Exhibit 3 Amount after excluding and hiding a total of 23 outliers

OK, so now we have six more outliers. How long can this game go on? You’re welcome to continue excluding and hiding outliers as you see fit. Or perhaps you’ve gotten the message: discarding outliers from a skewed distribution is an exercise in futility, since observations that didn’t stand out at first will appear to be outliers after excluding the most extreme observations. Removing observations in this situation just forces other observations to take their place.
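The "game" described above can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions: the sample is a hypothetical right-skewed set of 118 values (evenly spaced quantiles of an exponential distribution), not the malpractice payments, and `statistics.quantiles` with its default method may compute quartiles slightly differently from JMP.

```python
import math
from statistics import quantiles

def upper_fence(values, k=1.5):
    """Upper outlier cutoff: Q3 + k * IQR."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 + k * (q3 - q1)

# Hypothetical right-skewed sample: 118 evenly spaced quantiles of an
# exponential distribution (not the malpractice data).
n = 118
sample = [-math.log(1 - (i + 0.5) / n) for i in range(n)]

# Repeatedly drop everything above the upper fence, then recompute the
# fence on what is left, until no point is flagged.
removed_per_round = []
while True:
    fence = upper_fence(sample)
    kept = [x for x in sample if x <= fence]
    dropped = len(sample) - len(kept)
    if dropped == 0:
        break
    removed_per_round.append(dropped)
    sample = kept

print("points removed in each round:", removed_per_round)
```

Each round shrinks the quartiles and the IQR, which pulls the fence inward and flags values that were inside the whisker a round earlier. That is why more than one round is needed before the list stops growing.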

There’s an even more important reason not to exclude outliers from the analysis. There’s nothing wrong with those “outliers”; they’re just bigger than most of the other payments. By excluding the 23 outliers, we have removed the really high claim payments made by the insurance company. The average calculated on the remaining observations is $28,306, a number less than one-third the original average. Imagine that the company uses the average and range of the truncated data set to forecast future payments. Upper management will be unpleasantly surprised to find many year-end actual payments greatly exceeding the predicted payments and you, as the firm statistician, may well be out of a job.

In other words, why discard data points just because they’re unusual or inconvenient? There is great danger in the knee-jerk exclusion of outliers. We’ll see some examples in future cases in which excluding outliers might make sense. The message here is to avoid doing so without good reason.

Let’s now turn to other variables in the data set.

First, we make sure none of the observations are hidden or excluded. The distribution of Age is shown in Exhibit 4.

Exhibit 4 Distribution of Age

(Use Rows > Clear Row States to unhide and unexclude.)

The oldest patient in the data set is 87, the youngest a newborn. The average age is 42.8 and the median age is 41.5 years.

The shape of this histogram is quite different from that of Amount, which was highly skewed right. Age doesn’t appear overly skewed, and the histogram is nearly symmetric. A symmetric distribution looks about the same on the right side as the left.

Now, we’ll examine the outlier box plot of Age. Once again, we’ve reproduced the box plot below. Recall that the peak of the diamond is the position of the mean. This outlier box plot tells us that the mean and median are quite close and, therefore, that the distribution is nearly symmetric. Because no points are shown beyond the whiskers, this outlier box plot also indicates an absence of potential outliers.

We will next examine the distribution of Gender. Recall that for Amount and Age, which are continuous variables, we used histograms and summary statistics to characterize the shape, center and spread of the distributions. Since Gender is Nominal, we use a bar chart and a frequency distribution (Exhibit 5).

Exhibit 5 Distribution of Gender

(Analyze > Distribution)

From the bar chart and its accompanying frequency table we see that 71 of the 118 (60.2%) patients in this sample are female and 39.8% are male.
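The frequency table behind the bar chart is simple arithmetic. The sketch below rebuilds it from the counts in the text: 71 females is stated directly, and 47 males is derived here as 118 - 71.

```python
from collections import Counter

# 71 females is from the text; 47 males is derived as 118 - 71.
genders = ["Female"] * 71 + ["Male"] * 47

counts = Counter(genders)
total = sum(counts.values())

# One row per level: name, count, percent of all rows.
for level, count in counts.most_common():
    print(f"{level:<8}{count:>5}{100 * count / total:>8.1f}%")
```

The percentages come out as 60.2% and 39.8%, matching the figures quoted from Exhibit 5.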

Along with bar charts, Pareto plots and pie charts can be used to display information about nominal (categorical) variables. Exhibit 6 shows a Pareto plot and pie chart for Insurance type.

Exhibit 6 Pareto Plot (Left) and Pie Chart (Right) of Insurance

(Analyze > Quality and Process > Pareto Plot, use Insurance as Y, Cause. Pie Chart is an option under the red triangle.)

Both plots sort the categories of the variable in descending order of frequency. Patients with private insurance coverage are the largest group in this sample, although apparently the type of insurance held by many patients is unknown. Workers compensation patients comprise the smallest group in this sample.
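The ordering used by a Pareto plot can be sketched as a descending sort with a running cumulative percentage. The counts below are hypothetical, as is the "Other (hypothetical)" category; only the facts that Private is the largest group and Workers Compensation the smallest come from the text.

```python
# Hypothetical counts; only the positions of Private (largest) and
# Workers Compensation (smallest) follow the text.
insurance_counts = {
    "Workers Compensation": 8,
    "Unknown": 36,
    "Private": 50,
    "Other (hypothetical)": 24,
}

# Sort categories from most to least frequent.
ordered = sorted(insurance_counts.items(), key=lambda item: item[1], reverse=True)
total = sum(insurance_counts.values())

# Accumulate counts so each row carries its cumulative percentage.
running = 0
pareto_rows = []
for category, count in ordered:
    running += count
    pareto_rows.append((category, count, 100 * running / total))

for category, count, cum_pct in pareto_rows:
    print(f"{category:<22}{count:>4}{cum_pct:>8.1f}%")
```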

Now, we turn our attention to the key question being asked by management: Do any of the variables appear to influence the size of the claim payment? Or, asked another way, are any of the variables related to payment amount? For example, do payments tend to be higher when the claimant is married? Or, are they higher for female claimants than for males?

A number of tools are available for exploring potential relationships between variables. At the end of the day, many graphical and analytic techniques may be used to explore relationships, depending on the data, the business problem, and the preferences of the analyst. In this section, we’ll use:

a. Dynamic plot-linking
b. Data Filter
c. Side-by-Side (Comparative) Box Plots
d. Graph Builder

Dynamic plot-linking

If we select observations in a data table, those observations are also selected in all open graphs. Likewise, if we select observations in a plot, those observations are also selected in other plots and in the data table.

This dynamic linking can help us explore how different variables relate to one another. Consider the histogram of Amount and the bar graph of Gender in Exhibit 7 below. By clicking on the bar for Females, those same observations are highlighted in the histogram of Amount. Click on the bar for Males, and the observations for males are selected.

Exhibit 7 Distributions of Amount and Gender, Females

Are males and females distributed in a similar manner across the payment amounts? If so, we would conclude that Amount and Gender are not related, since males and females received roughly the same number of low, medium and high payment amounts. We explore this question further using the Data Filter.

The Data Filter

The Data Filter provides another method for exploring the distribution of one variable across the levels of another variable. For example, we can use the Data Filter to show the distribution of Amount for each Gender. In Exhibit 8 we see the Data Filter and results for females only.

Exhibit 8 Amount with Data Filter, Gender, Females

(Rows > Data Filter; select Gender and click Add. Then, select Female to select the values for the females in the histogram. To update the Distribution output with the Amount values for females only, check the Include box in the data filter. Then, in the Distribution window select Automatic Recalc under the top red triangle > Script.)

When we select males in the Data Filter, the Distribution window will show only the amounts paid for males.

Exhibit 9 Amount with Data Filter, Gender, Males

Compare the output for females and males. The histograms for females and males look similar, with the possible exception of a few more extreme points for males (note that the scales are different). What about the summary statistics? The mean for males ($107,466) is much higher than for females ($80,175). But, recall that Amount is highly skewed, and extreme observations will have a large influence on the mean.

Does the information under Quantiles provide any additional insights (Exhibit 10)? Do females and males have roughly the same minimum and maximum values? What about the median and the first and third quartiles? Are they similar? In the same ball park?

Exhibit 10 Quantiles of Amount for Females (left) and Males (right)

From this analysis, there does not seem to be a notable difference in the distribution of Amount for males and females. Both distributions are right skewed, and the bulk of claim payments fall below $400,000 for both genders. We will examine this again in another case that uses more formal statistical methods, and will revisit this analysis in an exercise.
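The comparison by gender above amounts to splitting Amount by group and summarizing each part. The sketch below uses hypothetical (gender, amount) rows, not the values in MedicalMalpractice.jmp, and `statistics.quantiles` may compute quartiles slightly differently from JMP.

```python
from statistics import mean, median, quantiles

# Hypothetical (gender, amount) rows; not the values in MedicalMalpractice.jmp.
rows = [
    ("Female", 4_000), ("Female", 12_000), ("Female", 25_000),
    ("Female", 60_000), ("Female", 150_000), ("Female", 300_000),
    ("Male", 5_000), ("Male", 10_000), ("Male", 24_000),
    ("Male", 70_000), ("Male", 140_000), ("Male", 700_000),
]

# Split the amounts by gender, as the Data Filter does interactively.
by_gender = {}
for gender, amount in rows:
    by_gender.setdefault(gender, []).append(amount)

# Summarize each group with both mean-based and quantile-based statistics.
summary = {}
for gender, amounts in by_gender.items():
    q1, _, q3 = quantiles(amounts, n=4)
    summary[gender] = {
        "mean": mean(amounts),
        "median": median(amounts),
        "q1": q1,
        "q3": q3,
    }

for gender, stats in summary.items():
    print(
        f"{gender:<7} mean = {stats['mean']:>9,.0f}  median = {stats['median']:>7,.0f}  "
        f"Q1 = {stats['q1']:>7,.0f}  Q3 = {stats['q3']:>9,.0f}"
    )
```

In this made-up sample a single 700,000 payment lifts the male mean well above the female mean, while the two medians stay close. This is the same pattern the case warns about when comparing means of a skewed variable.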

Side-by-Side (Comparative) Box Plots

Let’s now consider other variables. We’ll investigate whether payment amounts are related to whether or not a private attorney represented the claimant. In a complete analysis, we would start by exploring distributions of all variables. We’ll jump ahead and introduce a third method for comparing distribu

Apple retail iphone upgrade program - Pope innocent iv and the mongols - Introduction to intellectual disability - Philosophy of love syllabus - How can homework be harmful - 6 pages due by 24 hours and 48 hrs - The hollow men context - 5 moment hand hygiene - David brooks one nation slightly divisible - Ethics - Social media marketing a strategic approach edition - Basic needs of ancient communities - Teamwork - Organizational Economics DQ - Method of drawing the line in ethics - Glasgow enterprises started the period - NEED IN 8 HOURS or LESS (NO EXCEPTION) - The giver and gattaca - Poe trial of ascendancy locked door - Article - Pension data for barry financial services inc - Igcse travel and tourism revision notes - South australia stamp duty - Hays addressing model - Reaction of amide with lialh4 - Cys -d-12 - Young endeavour captains log - Watch eye of the storm documentary - Dual currency deposit explained - Paul and donna decker are married taxpayers - Ford motor company mission statement - Assignment 1 lenscrafters case study - 2016 carolina biological supply company worksheet answers - Discussion - Dermeze ointment side effects - Nyu paul mcghee division - Blank silk road map - Kg m3 to kn m3 - A surgeon is using material from a donated heart - Independent living skills assessment pdf - Kew and stredwick 2010 - Discussion 1 - Dramatic tenor vocal range