
How to find outliers in jmp


Case 1 - Medical Malpractice: Descriptive Statistics, Graphics, and Exploratory Data Analysis

Marlene Smith, University of Colorado Denver Business School


Background

According to a recent study published in US News and World Report, the cost of medical malpractice in the United States is $55.6 billion a year, which is 2.4 percent of annual health-care spending. Another 2011 study published in the New England Journal of Medicine revealed that annually, during the period 1991 to 2005, 7.4% of all physicians licensed in the US had a malpractice claim. These staggering numbers not only contribute to the high cost of health care, but the size of successful malpractice claims also contributes to high premiums for medical malpractice insurance.

An insurance company wants to develop a better understanding of its claims paid out for medical malpractice lawsuits. Its records show claim payment amounts, as well as information about the presiding physician and the claimant for a number of recently adjudicated or settled lawsuits.

The Task

Using descriptive statistics and graphical displays, explore claim payment amounts, and identify factors that appear to influence the amount of the payment.

The Data MedicalMalpractice.jmp

The data set contains information about the last 118 claim payments made, covering a six-month period. The eight variables in the data table are described below:

• Amount: Amount of the claim payment in dollars
• Severity: The severity rating of damage to the patient, from 1 (emotional trauma) to 9 (death)
• Age: Age of the claimant in years
• Private Attorney: Whether the claimant was represented by a private attorney
• Marital Status: Marital status of the claimant
• Specialty: Specialty of the physician involved in the lawsuit
• Insurance: Type of medical insurance carried by the patient
• Gender: Patient gender

The variables are coded in JMP with a Continuous, Ordinal or Nominal modeling type. This coding helps to make sure that JMP performs the correct analysis and produces appropriate graphs.

A first step in any analysis is to ensure that your variables have the correct Modeling Type:

• Continuous variables, like Amount, have numeric values (e.g., 2, 5, 3.35, 159.667, …).
• Ordinal variables, such as Severity, have either numeric or character values which represent ordered categories (e.g., small, medium and large; 1-9 severity rating scales, …).
• Nominal variables, like Gender, can also have either numeric or character values, and represent unordered categories or labels (e.g., the names of states, colors of M&Ms, machine numbers, …).


Analysis

We begin by looking at the key variable of interest, the amount of claim payment. Exhibit 1 displays a histogram and summary statistics for Amount.

Exhibit 1 Distribution of Amount

(Analyze > Distribution; select Amount as Y, Columns, and click OK. For a horizontal layout select Stack under the top red triangle.)

From Exhibit 1 we see that the histogram of Amount is skewed right, meaning that there is a long tail, with several very high payments. The mean (average) payment is $91,045, while the median (middle) is $22,750. When a histogram is right skewed, as is the case here, the mean will exceed the median. This is because the mean is influenced by extreme values – the high payments that we observe in the histogram inflate the mean.
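To make the pull of extreme values on the mean concrete, here is a minimal Python sketch. The payment values are hypothetical and are not taken from MedicalMalpractice.jmp; only the standard library is assumed.

```python
from statistics import mean, median

# Hypothetical payments (NOT from the case data): mostly small, two very large.
payments = [5_000, 7_500, 12_000, 20_000, 25_000, 40_000, 90_000, 350_000, 600_000]

avg = mean(payments)
mid = median(payments)
print(f"mean = {avg:,.0f}, median = {mid:,.0f}")

# Dropping the two largest payments moves the mean far more than the median.
trimmed = payments[:-2]
trimmed_avg = mean(trimmed)
trimmed_mid = median(trimmed)
print(f"trimmed mean = {trimmed_avg:,.0f}, trimmed median = {trimmed_mid:,.0f}")
```

In this made-up sample the two large payments pull the mean well above the median, which is the same pattern seen for Amount.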

A measure of the spread of the data is the standard deviation (StdDev in Exhibit 1). The higher the standard deviation, the larger the spread, or variation, in the data. When the data are skewed, the standard deviation, like the mean, will be inflated.

Other useful summary statistics are the quartiles. The first quartile (next to 25.0% in Exhibit 1) is $7,500 and the third quartile (next to 75.0%) is $92,670. The interquartile range, defined as Q3 – Q1, is a measure of the amount of spread or variability in the middle 50% of the data. This value is displayed graphically in the outlier box plot (above the histogram). A larger version of this plot is displayed below.

The left edge of the box is the first quartile, the center line is the median or second quartile, and the right edge of the box is the third quartile. Hence, the width of the box is the interquartile range, or IQR.


(Notes: The center of the diamond is the mean. We will discuss this in a few moments. The red bracket at the top, which we won’t discuss further, denotes the “densest” region of the data.)

The outlier box plot helps us to visually identify potential outliers. The rule of thumb used to distinguish outliers from non-outliers is this: if the histogram is approximately normal, or bell-shaped, outliers are those points that lie more than 1.5 IQRs beyond either edge of the box. The line extending from the right edge of the box, called a whisker, is roughly 1.5 IQRs in length (we say “roughly”, because it is actually drawn to the furthest point within that range, so it may not be quite 1.5 IQRs).
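As a rough sketch of the 1.5 IQR rule, the Python snippet below computes the IQR and the two cutoffs (often called fences) from the quartiles of Amount reported in Exhibit 1 (Q1 = $7,500, Q3 = $92,670). The helper names iqr_fences and flag_outliers are illustrative assumptions, not JMP functions.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (iqr, lower_fence, upper_fence) for the k * IQR rule."""
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, q1, q3, k=1.5):
    """Return the values that fall outside the fences."""
    _, lower, upper = iqr_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# Quartiles of Amount as reported in Exhibit 1.
iqr, lower, upper = iqr_fences(7_500, 92_670)
print(iqr, lower, upper)  # 85170 -120255.0 220425.0

# Hypothetical payments, used only to exercise the helper.
flagged = flag_outliers([10_000, 220_425, 250_000], 7_500, 92_670)
print(flagged)  # [250000]
```

Under these quartiles the lower fence is negative, so no payment can be flagged on the low side; only payments above roughly $220,000 would be candidates.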

Let’s ignore, for the sake of illustration, the fact that our data are right skewed. There are 16 points beyond the whisker, which we will consider to be outliers. In this case, the outliers are those points that are much larger than the rest.

Having identified several outliers, what should we do about them? Let’s consider removing them from the analysis. To do so, we will hide and exclude the points (rather than simply deleting them). Hide removes points from graphs, while Exclude removes them from future calculations.

Exhibit 2 is the new histogram for Amount after excluding and hiding the 16 outliers.

Exhibit 2 Amount after excluding and hiding 16 outliers

(To exclude and hide, draw a box around the points in the boxplot to select them. Then, select Rows > Hide and Exclude. Return to Analyze > Distribution and re-generate the histogram.)

Note that there are now seven (7) new outliers! We might as well get rid of those seven outliers as well. The result is shown in Exhibit 3.

Exhibit 3 Amount after excluding and hiding a total of 23 outliers

OK, so now we have six more outliers. How long can this game go on? You’re welcome to continue excluding and hiding outliers as you see fit. Or perhaps you’ve gotten the message: discarding outliers from a skewed distribution is an exercise in futility, since observations that didn’t stand out at first will appear to be outliers after excluding the most extreme observations. Removing observations in this situation just forces other observations to take their place.
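The repeated appearance of new outliers can be reproduced with a short Python sketch. The data here are synthetic (118 values, each 5% larger than the previous one), not the case data, and statistics.quantiles may compute quartiles slightly differently from JMP, so the counts will not match the 16, 7 and 6 seen in the exhibits.

```python
from statistics import quantiles


def upper_fence(values, k=1.5):
    """Return Q3 + k * IQR for the given values."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 + k * (q3 - q1)


# Synthetic right-skewed sample (NOT the case data).
data = [1_000 * 1.05 ** i for i in range(118)]

removed_per_round = []
for _ in range(3):
    fence = upper_fence(data)
    flagged = [v for v in data if v > fence]
    removed_per_round.append(len(flagged))
    # Drop the flagged points, then recompute the fence on what remains.
    data = [v for v in data if v <= fence]

print(removed_per_round)
```

Each round recomputes the quartiles on the reduced sample, so the fence moves inward and points that were previously unremarkable are flagged next.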

There’s an even more important reason not to exclude outliers from the analysis. There’s nothing wrong with those “outliers”; they’re just bigger than most of the other payments. By excluding the 23 outliers, we have removed the really high claim payments made by the insurance company. The average calculated on the remaining observations is $28,306, a number less than one-third the original average. Imagine that the company uses the average and range of the truncated data set to forecast future payments. Upper management will be unpleasantly surprised to find many year-end actual payments greatly exceeding the predicted payments and you, as the firm statistician, may well be out of a job.

In other words, why discard data points just because they’re unusual or inconvenient? There is great danger in the knee-jerk exclusion of outliers. We’ll see some examples in future cases in which excluding outliers might make sense. The message here is to avoid doing so without good reason.

Let’s now turn to other variables in the data set.

First, we make sure none of the observations are hidden or excluded. The distribution of Age is shown in Exhibit 4.

Exhibit 4 Distribution of Age

(Use Rows > Clear Row States to unhide and unexclude.)

The oldest patient in the data set is 87, the youngest a newborn. The average age is 42.8 and the median age is 41.5 years.

The shape of this histogram is quite different from that of Amount, which was highly skewed right. Age doesn’t appear overly skewed, and the histogram is nearly symmetric. A symmetric distribution looks about the same on the right side as the left.

Now, we’ll examine the outlier box plot of Age. Once again, we’ve reproduced the box plot below. Recall that the center of the diamond is the position of the mean. This outlier box plot tells us that the mean and median are quite close and, therefore, that the distribution is nearly symmetric. Because no points are shown beyond the whiskers, this outlier box plot also indicates an absence of potential outliers.


We will next examine the distribution of Gender. Recall that for Amount and Age, which are continuous variables, we used histograms and summary statistics to characterize the shape, center and spread of the distributions. Since Gender is Nominal, we use a bar chart and a frequency distribution (Exhibit 5).

Exhibit 5 Distribution of Gender

(Analyze > Distribution)

From the bar chart and its accompanying frequency table we see that 71 of the 118 (60.2%) patients in this sample are female and 39.8% are male.
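The frequency table behind Exhibit 5 can be rebuilt from the two numbers stated above (71 females out of 118 patients). The following Python sketch reconstructs a stand-in Gender column from those counts only; it does not read the JMP file.

```python
from collections import Counter

# Stand-in column built from the reported counts: 71 females out of 118 rows.
gender = ["Female"] * 71 + ["Male"] * (118 - 71)

counts = Counter(gender)
total = sum(counts.values())

# One row per level: name, count, share of the total.
for level, n in counts.most_common():
    print(f"{level:<8}{n:>4}{n / total:>8.1%}")
```

The implied male count is 47, and 47 / 118 rounds to the 39.8% quoted in the text.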

Along with bar charts, Pareto plots and pie charts can be used to display information about nominal (categorical) variables. Exhibit 6 shows a Pareto plot and pie chart for Insurance type.

Exhibit 6 Pareto Plot (Left) and Pie Chart (Right) of Insurance

(Analyze > Quality and Process > Pareto Plot, use Insurance as Y, Cause. Pie Chart is an option under the red triangle.)


Both plots sort the categories of the variable in descending order of frequency. Patients with private insurance coverage are the largest group in this sample, although apparently the type of insurance held by many patients is unknown. Workers compensation patients comprise the smallest group in this sample.
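The descending sort that both plots rely on is easy to sketch in Python. The level names and counts below are placeholders (the case does not list the Insurance counts), and the cumulative percentage column is the curve a Pareto plot conventionally overlays on the bars.

```python
def pareto_table(counts):
    """Sort levels by descending count and attach cumulative percentages."""
    total = sum(counts.values())
    rows, running = [], 0
    for level, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        rows.append((level, n, round(100 * running / total, 1)))
    return rows


# Placeholder counts, NOT the Insurance frequencies from the case.
hypothetical_counts = {"Level A": 12, "Level B": 30, "Level C": 5, "Level D": 3}

table = pareto_table(hypothetical_counts)
for row in table:
    print(row)
```

The largest level comes first and the smallest last, matching the ordering described for Exhibit 6.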

Now, we turn our attention to the key question being asked by management: Do any of the variables appear to influence the size of the claim payment? Or, asked another way, are any of the variables related to payment amount? For example, do payments tend to be higher when the claimant is married? Or, are they higher for female claimants than for males?

A number of tools are available for exploring potential relationships between variables. At the end of the day, many graphical and analytic techniques may be used to explore relationships, depending on the data, the business problem, and the preferences of the analyst. In this section, we’ll use:

a. Dynamic plot-linking
b. Data Filter
c. Side-by-Side (Comparative) Box Plots
d. Graph Builder

Dynamic plot-linking

If we select observations in a data table, those observations are also selected in all open graphs. Likewise, if we select observations in a plot, those observations are also selected in other plots and in the data table.

This dynamic linking can help us explore how different variables relate to one another. Consider the histogram of Amount and the bar graph of Gender in Exhibit 7 below. By clicking on the bar for Females, those same observations are highlighted in the histogram of Amount. Click on the bar for Males, and the observations for males are selected.

Exhibit 7 Distributions of Amount and Gender, Females

Are males and females distributed in a similar manner across the payment amounts? If so, we would conclude that Amount and Gender are not related, since males and females received roughly the same number of low, medium and high payment amounts. We explore this question further using the Data Filter.


The Data Filter

The Data Filter provides another method for exploring the distribution of one variable across the levels of another variable. For example, we can use the Data Filter to show the distribution of Amount for each Gender. In Exhibit 8 we see the Data Filter and results for females only.

Exhibit 8 Amount with Data Filter, Gender, Females

(Rows > Data Filter; select Gender and click Add. Then, select Female to select the values for the females in the histogram. To update the Distribution output with the Amount values for females only, check the Include box in the data filter. Then, in the Distribution window select Automatic Recalc under the top red triangle > Script.)

When we select males in the Data Filter, the Distribution window will show only the amounts paid for males.

Exhibit 9 Amount with Data Filter, Gender, Males

Compare the output for females and males. The histograms for females and males look similar, with the possible exception of a few more extreme points for males (note that the scales are different). What about the summary statistics? The mean for males ($107,466) is much higher than for females ($80,175). But, recall that Amount is highly skewed, and extreme observations will have a large influence on the mean.
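Outside JMP, the same by-group comparison amounts to splitting Amount by Gender and summarizing each group. The Python sketch below uses six hypothetical rows (not the case data) and an illustrative helper, summarize_by, to show how a single extreme payment separates the group means while leaving the medians equal.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by(rows, group_key, value_key):
    """Return {group: {"n", "mean", "median"}} for value_key within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        g: {"n": len(v), "mean": mean(v), "median": median(v)}
        for g, v in groups.items()
    }


# Hypothetical rows (NOT from MedicalMalpractice.jmp).
rows = [
    {"Gender": "Female", "Amount": 10_000},
    {"Gender": "Female", "Amount": 20_000},
    {"Gender": "Female", "Amount": 30_000},
    {"Gender": "Male", "Amount": 10_000},
    {"Gender": "Male", "Amount": 20_000},
    {"Gender": "Male", "Amount": 300_000},
]

summary = summarize_by(rows, "Gender", "Amount")
for group, stats in summary.items():
    print(group, stats)
```

Comparing medians and quartiles alongside the means, as the text goes on to do, guards against reading too much into a gap driven by a few large payments.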

Does the information under Quantiles provide any additional insights (Exhibit 10)? Do females and males have roughly the same minimum and maximum values? What about the median and the first and third quartiles? Are they similar? In the same ballpark?


Exhibit 10 Quantiles of Amount for Females (left) and Males (right)

From this analysis, there does not seem to be a notable difference in the distribution of Amount for males and females. Both distributions are right skewed, and the bulk of claim payments fall below $400,000 for both genders. We will examine this again in another case that uses more formal statistical methods, and will revisit this analysis in an exercise.

Side-by-Side (Comparative) Box Plots

Let’s now consider other variables. We’ll investigate whether payment amounts are related to whether or not a private attorney represented the claimant. In a complete analysis, we would start by exploring distributions of all variables. We’ll jump ahead and introduce a third method for comparing distribu
