6.2 4 practice modeling fitting linear models to data answers

22/11/2021 Client: muhammad11 Deadline: 2 Day

Data Science and Big Data Analytics

Chapter 6: Advanced Analytical Theory and Methods: Regression

1

Chapter Sections

6.1 Linear Regression

6.2 Logistic Regression

6.3 Reasons to Choose and Cautions

6.4 Additional Regression Models

Summary

2

6 Regression

Regression analysis attempts to explain the influence that input (independent) variables have on the outcome (dependent) variable

Questions regression might answer

What is a person’s expected income?

What is the probability that an applicant will default on a loan?

Regression can find the input variables having the greatest statistical influence on the outcome

Then, one can try to change those input variables to produce a better outcome

E.g., if reading level at age 10 predicts students’ later success, then try to improve early-age reading levels

3

6.1 Linear Regression

Models the relationship between several input variables and a continuous outcome variable

Assumption is that the relationship is linear

Various transformations can be used to achieve a linear relationship

Linear regression models are probabilistic

Involves randomness and uncertainty

Not deterministic like Ohm’s Law (V=IR)

4

6.1.1 Use Cases

Real estate example

Predict residential home prices

Possible inputs – living area, #bathrooms, #bedrooms, lot size, property taxes

Demand forecasting example

Restaurant predicts quantity of food needed

Possible inputs – weather, day of week, etc.

Medical example

Analyze effect of proposed radiation treatment

Possible inputs – radiation treatment duration, frequency

5

6.1.2 Model Description
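
In standard notation (the symbol names are an assumption here), the linear regression model with p − 1 input variables is usually written as:

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_{p-1} x_{p-1} + \varepsilon
```

Here y is the outcome variable, x_j are the input variables, \beta_0 is the intercept, \beta_j are the coefficients, and \varepsilon is the error term that accounts for variation the linear part does not explain.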

6

6.1.2 Model Description Example

Predict person’s annual income as a function of age and education

Ordinary Least Squares (OLS) is a common technique to estimate the parameters

7

6.1.2 Model Description Example

OLS
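
A minimal sketch of what OLS does, assuming simulated data (the variable names mirror the income example, but every value below is made up). OLS picks the coefficients that minimize the sum of squared residuals; for a full-rank design matrix this has a closed-form solution through the normal equations, which should agree with what lm() returns.

```r
# Simulated data: NOT the book's income.csv, just illustrative values.
set.seed(1)
n <- 200
Age <- runif(n, 18, 70)
Education <- sample(10:20, n, replace = TRUE)
Income <- 7 + 1.0 * Age + 1.8 * Education + rnorm(n, sd = 12)

# Design matrix with a column of ones for the intercept.
X <- cbind(Intercept = 1, Age = Age, Education = Education)

# Normal equations: solve (X'X) beta = X'y for beta.
beta_hat <- solve(crossprod(X), crossprod(X, Income))

# Same model fitted with lm() for comparison.
fit <- lm(Income ~ Age + Education)

print(cbind(normal_equations = as.numeric(beta_hat), lm = coef(fit)))
```

In practice lm() is preferable because it uses a QR decomposition, which is numerically more stable than forming X'X directly.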

8

6.1.2 Model Description Example

9

6.1.2 Model Description With Normally Distributed Errors

Making additional assumptions on the error term provides further capabilities

It is common to assume the error term is a normally distributed random variable

Mean zero and constant variance

That is

10

With this assumption, the expected value is

And the variance is
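
Written out in standard notation (the symbols are assumptions, following the usual textbook form, with \beta_j the model coefficients and x_j the input variables), the normality assumption and its two consequences are:

```latex
\varepsilon \sim N(0, \sigma^2)

E(y) = \beta_0 + \beta_1 x_1 + \dots + \beta_{p-1} x_{p-1}

V(y) = \sigma^2
```

So, for fixed inputs, y is normally distributed around the regression line with the same variance \sigma^2 everywhere.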

6.1.2 Model Description With Normally Distributed Errors

11

Normality assumption with one input variable

E.g., for x=8, E(y)~20 but varies 15-25

6.1.2 Model Description With Normally Distributed Errors

12

6.1.2 Model Description Example in R

Be sure to get publisher's R downloads: http://www.wiley.com/WileyCDA/WileyTitle/productCd-111887613X.html

> income_input = as.data.frame(read.csv("c:/data/income.csv"))

> income_input[1:10,]

> summary(income_input)

> library(lattice)

> splom(~income_input[c(2:5)], groups=NULL, data=income_input,

axis.line.tck=0, axis.text.alpha=0)

13

Scatterplot

Examine the bottom row of the scatterplot matrix

Income vs. Age: strong positive trend

Income vs. Education: slight positive trend

Income vs. Gender: no trend

6.1.2 Model Description Example in R

14

Quantify the linear relationship trends

> results <- lm(Income~Age+Education+Gender,income_input)

> summary(results)

Intercept: income of $7,263 for a newborn female

Age coef: ~1, each additional year of age -> ~$1k increase in income

Educ coef: ~1.76, each additional year of education -> ~$1.76k increase in income

Gender coef: ~-0.93, being male -> ~$930 decrease in income

Residuals – assumed to be normally distributed – vary from about -37 to +37 (more information coming)

6.1.2 Model Description Example in R

15

Examine residuals – uncertainty or sampling error

Small p-values indicate statistically significant results

Age and Education highly significant, p<2e-16

Gender p = 0.13 is large: not significant even at the 90% confidence level

Therefore, drop variable gender from linear model

> results2 <- lm(Income~Age+Education,income_input)

> summary(results2) # results about same as before

Residual standard error: residual standard deviation

R-squared (R2): proportion of the variation in the data explained by the model

Here ~64% (R2 = 1 means model explains data perfectly)

F-statistic: tests entire model – here p value is small
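
The quantities above can also be pulled out of the summary object programmatically. The sketch below assumes simulated data (not the book's income.csv) and checks the reported R-squared against its definition, 1 − RSS/TSS.

```r
# Simulated stand-in for the income data; values are illustrative only.
set.seed(10)
n <- 150
Age <- round(runif(n, 18, 70))
Education <- sample(10:20, n, replace = TRUE)
Income <- 7 + 1.0 * Age + 1.8 * Education + rnorm(n, sd = 12)
sim <- data.frame(Income, Age, Education)

fit <- lm(Income ~ Age + Education, data = sim)
s <- summary(fit)

rse    <- s$sigma          # residual standard error
r2     <- s$r.squared      # R-squared
adj_r2 <- s$adj.r.squared  # adjusted R-squared
f      <- s$fstatistic     # named vector: value, numdf, dendf

# p-value of the overall F-test (summary() prints it but does not store it).
f_p <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# R-squared by hand: 1 - residual sum of squares / total sum of squares.
rss <- sum(residuals(fit)^2)
tss <- sum((sim$Income - mean(sim$Income))^2)
r2_manual <- 1 - rss / tss

print(c(rse = rse, r2 = r2, r2_manual = r2_manual, adj_r2 = adj_r2))
```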

6.1.2 Model Description Example in R

16

6.1.2 Model Description Categorical Variables

In the example in R, Gender is a binary variable

Variables like Gender are categorical variables in contrast to numeric variables where numeric differences are meaningful

The book section discusses how income by state could be implemented
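
As a hedged illustration of how a multi-valued categorical input such as state enters a linear model, the sketch below uses a hypothetical six-row data set. R expands a factor with k levels into k − 1 indicator (dummy) columns; the omitted level (here the alphabetically first, CA) becomes the reference case absorbed by the intercept.

```r
# Hypothetical toy data: incomes (in $1000s) for people in three states.
toy <- data.frame(
  Income = c(52, 61, 70, 55, 73, 60),
  State  = factor(c("WA", "OR", "CA", "WA", "CA", "OR"))
)

# Design matrix R builds for a factor: intercept plus one 0/1 column
# for each non-reference level.
X <- model.matrix(~ State, data = toy)
print(X)

# Intercept = mean income in the reference state (CA);
# each State coefficient = difference from that reference mean.
fit <- lm(Income ~ State, data = toy)
print(coef(fit))
```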

17

6.1.2 Model Description Confidence Intervals on the Parameters

Once an acceptable linear model is developed, it is often useful to draw some inferences

R provides confidence intervals using confint() function

> confint(results2, level = .95)

For example, the Education coefficient was 1.76, and the corresponding 95% confidence interval is (1.53, 1.99)

18

6.1.2 Model Description Confidence Interval on Expected Outcome

In the income example, the regression line provides the expected income for a given Age and Education

Using the predict() function in R, a confidence interval on the expected outcome can be obtained

> Age <- 41

> Education <- 12

> new_pt <- data.frame(Age, Education)

> conf_int_pt <- predict(results2,new_pt,level=.95,

interval="confidence")

> conf_int_pt

Expected income = $68699, conf interval ($67831,$69567)

19

6.1.2 Model Description Prediction Interval on a Particular Outcome

The predict() function in R also provides upper/lower bounds on a particular outcome, prediction intervals

> pred_int_pt <- predict(results2,new_pt,level=.95,

interval="prediction")

> pred_int_pt

Expected income = $68699, pred interval ($44988,$92409)

This is a much wider interval because the confidence interval applies to the expected outcome that falls on the regression line, but the prediction interval applies to an outcome that may appear anywhere within the normal distribution
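
The difference between the two intervals can be reproduced on simulated data (an assumption; the book's income.csv is not used here). Both calls return the same fitted value, but the prediction interval also includes the variance of an individual outcome around the regression line, so it is always the wider of the two.

```r
# Simulated stand-in for the income data; values are illustrative only.
set.seed(20)
n <- 200
Age <- round(runif(n, 18, 70))
Education <- sample(10:20, n, replace = TRUE)
Income <- 7 + 1.0 * Age + 1.8 * Education + rnorm(n, sd = 12)
sim <- data.frame(Income, Age, Education)

fit <- lm(Income ~ Age + Education, data = sim)
new_pt <- data.frame(Age = 41, Education = 12)

# Interval for the expected (mean) income at this Age and Education.
conf_int <- predict(fit, new_pt, level = 0.95, interval = "confidence")
# Interval for one individual's income at this Age and Education.
pred_int <- predict(fit, new_pt, level = 0.95, interval = "prediction")

conf_width <- conf_int[, "upr"] - conf_int[, "lwr"]
pred_width <- pred_int[, "upr"] - pred_int[, "lwr"]
print(rbind(confidence = conf_int[1, ], prediction = pred_int[1, ]))
```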

20

6.1.3 Diagnostics Evaluating the Linearity Assumption

A major assumption in linear regression modeling is that the relationship between the input and output variables is linear

The most fundamental way to evaluate this is to plot the outcome variable against each input variable

In the following figure a linear model would not apply

In such cases, a transformation might allow a linear model to apply

21

6.1.3 Diagnostics Evaluating the Linearity Assumption

Income as a quadratic function of Age
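
A sketch of such a transformation, assuming simulated data in which income rises and then flattens with age. Adding a squared term keeps the model linear in its parameters, so lm() still applies; I() is needed so that ^ is treated as arithmetic rather than formula syntax.

```r
# Simulated data with a curved Income-Age relationship (illustrative only).
set.seed(30)
n <- 200
Age <- runif(n, 20, 65)
Income <- -50 + 5 * Age - 0.05 * Age^2 + rnorm(n, sd = 3)
sim <- data.frame(Income, Age)

# Straight-line fit versus a fit with an added quadratic term.
linear_fit <- lm(Income ~ Age, data = sim)
quad_fit   <- lm(Income ~ Age + I(Age^2), data = sim)

r2_linear <- summary(linear_fit)$r.squared
r2_quad   <- summary(quad_fit)$r.squared
print(c(linear = r2_linear, quadratic = r2_quad))
print(coef(quad_fit))
```

A residual plot of linear_fit would show the curved pattern that the quadratic term removes.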

22

6.1.3 Diagnostics Evaluating the Residuals

The error term was assumed to be normally distributed with zero mean and constant variance

> with(results2,{plot(fitted.values,residuals,ylim=c(-40,40)) })

23

6.1.3 Diagnostics Evaluating the Residuals

The next four figures violate the zero-mean, constant-variance assumption

Nonlinear trend in residuals

Residuals not centered on zero

24

6.1.3 Diagnostics Evaluating the Residuals

Variance not constant

Residuals not centered on zero

25

6.1.3 Diagnostics Evaluating the Normality Assumption

The normality assumption still has to be validated

> hist(results2$residuals)

Residuals centered on zero and appear normally distributed

26

6.1.3 Diagnostics Evaluating the Normality Assumption

Another option is to examine a Q-Q plot comparing the observed data against the quantiles (Q) of the assumed distribution

> qqnorm(results2$residuals)

> qqline(results2$residuals)

27

6.1.3 Diagnostics Evaluating the Normality Assumption

Normally distributed residuals

Non-normally distributed residuals

28

6.1.3 Diagnostics N-Fold Cross-Validation

To prevent overfitting, a common practice splits the dataset into training and test sets, develops the model on the training set and evaluates it on the test set

If the quantity of the dataset is insufficient for this, an N-fold cross-validation technique can be used

Dataset is randomly split into N subsets of approximately equal size

Model trained on N-1 of the sets, tested on remaining one

Process repeated N times

Average the N model errors over the N folds

Note: if N = size of dataset, this is leave-one-out procedure
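
The steps above can be sketched in base R as follows. The data are simulated and N = 5 is an arbitrary choice; mean squared error is used as the model error, which is one common option rather than the only one.

```r
# Simulated data (illustrative only).
set.seed(42)
n <- 100
d <- data.frame(x = runif(n, 0, 10))
d$y <- 3 + 2 * d$x + rnorm(n, sd = 1.5)

N <- 5
# Randomly assign each row to one of N folds of equal size.
fold <- sample(rep(1:N, length.out = n))

# For each fold: train on the other N-1 folds, test on the held-out fold.
fold_mse <- sapply(1:N, function(k) {
  train <- d[fold != k, ]
  test  <- d[fold == k, ]
  fit   <- lm(y ~ x, data = train)
  mean((test$y - predict(fit, newdata = test))^2)
})

# Average the N held-out errors.
cv_mse <- mean(fold_mse)
print(fold_mse)
print(cv_mse)
```

Setting N equal to n in this sketch gives the leave-one-out procedure.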

29

6.1.3 Diagnostics Other Diagnostic Considerations

The model might be improved by including additional input variables

However, the adjusted R2 applies a penalty as the number of parameters increases

Residual plots should be examined for outliers

Points markedly different from the majority of points

They result from bad data, data processing errors, or actual rare occurrences

Finally, the magnitude and signs of the estimated parameters should be examined to see if they make sense

30

6.2 Logistic Regression Introduction

In linear regression modeling, the outcome variable is continuous – e.g., income ~ age and education

In logistic regression, the outcome variable is categorical, and this chapter focuses on two-valued outcomes like true/false, pass/fail, or yes/no

31

6.2.1 Logistic Regression Use Cases

Medical

Probability of a patient’s successful response to a specific medical treatment – input could include age, weight, etc.

Finance

Probability an applicant defaults on a loan

Marketing

Probability a wireless customer switches carriers (churns)

Engineering

Probability a mechanical part malfunctions or fails

32

6.2.2 Logistic Regression Model Description

Logistic regression is based on the logistic function

As y -> infinity, f(y)->1; and as y->-infinity, f(y)->0
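
A short sketch of the logistic function and its limits. It is written here as 1 / (1 + e^(-y)), which is algebraically the same as e^y / (1 + e^y) but avoids Inf/Inf at large y; base R's plogis() computes the same function.

```r
# Logistic function: maps any real y into the interval (0, 1).
logistic <- function(y) 1 / (1 + exp(-y))

y <- c(-Inf, -5, 0, 5, Inf)
f <- logistic(y)
print(data.frame(y = y, f = f))

# Base R's plogis() is the same function (the logistic distribution's CDF).
print(all.equal(logistic(c(-5, 5)), plogis(c(-5, 5))))
```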

33

6.2.2 Logistic Regression Model Description

With the range of f(y) as (0,1), the logistic function models the probability of an outcome occurring

In contrast to linear regression, the values of y are not directly observed; only the values of f(y) in terms of success or failure are observed.

The quantity ln(p/(1-p)), where p = f(y), equals y and is called the log odds ratio, or logit of p.

Maximum Likelihood Estimation (MLE) is used to estimate model parameters. MLE is beyond the scope of this book.
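
For readers who want a feel for what MLE does here, the following is a minimal sketch on simulated data with one input variable. It maximizes the Bernoulli log-likelihood numerically with optim() and compares the result with glm(); glm() itself uses a different algorithm (iteratively reweighted least squares), so this illustrates the idea rather than glm's implementation.

```r
# Simulated binary outcomes with true intercept -0.5 and slope 1.2.
set.seed(7)
n <- 500
x <- rnorm(n)
p <- 1 / (1 + exp(-(-0.5 + 1.2 * x)))
y <- rbinom(n, 1, p)

# Negative log-likelihood of the logistic model.
# With eta = b0 + b1*x, log L = sum(y*eta - log(1 + exp(eta))).
negloglik <- function(beta) {
  eta <- beta[1] + beta[2] * x
  -sum(y * eta - log(1 + exp(eta)))
}

# Minimize the negative log-likelihood, i.e. maximize the likelihood.
mle <- optim(c(0, 0), negloglik, method = "BFGS")

fit <- glm(y ~ x, family = binomial(link = "logit"))
print(rbind(optim = mle$par, glm = coef(fit)))
```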

34

6.2.2 Logistic Regression Model Description: customer churn example

A wireless telecom company estimates probability of a customer churning (switching companies)

Variables collected for each customer: age (years), married (y/n), duration as customer (years), churned contacts (count), churned (true/false)

After analyzing the data and fitting a logistic regression model, age and churned contacts were selected as the best predictor variables

35

6.2.2 Logistic Regression Model Description: customer churn example

36

6.2.3 Diagnostics Model Description: customer churn example

> head(churn_input) # Churned = 1 if cust churned

> sum(churn_input$Churned) # 1743/8000 churned

Use the Generalized Linear Model function glm()

> Churn_logistic1<-glm(Churned~Age+Married+Cust_years+Churned_contacts,data=churn_input,family=binomial(link="logit"))

> summary(Churn_logistic1) # Age + Churned_contacts best

> Churn_logistic3<-glm(Churned~Age+Churned_contacts,data=churn_input,family=binomial(link="logit"))

> summary(Churn_logistic3) # Age + Churned_contacts

37

6.2.3 Diagnostics Deviance and the Pseudo-R2

In logistic regression, deviance = -2logL

where L is the maximized value of the likelihood function used to obtain the parameter estimates

Two deviance values are provided

Null deviance = deviance based on only the y-intercept term

Residual deviance = deviance based on all parameters

Pseudo-R2 measures how well the fitted model explains the data compared with the null model

Value near 1 indicates a good fit over the null model
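
Both deviance values are stored on the fitted glm object, so the pseudo-R2 can be computed directly as 1 − (residual deviance / null deviance). The sketch below assumes simulated 0/1 data rather than the churn data set; for such ungrouped binary outcomes the residual deviance equals −2 log L exactly.

```r
# Simulated binary outcome with one input variable (illustrative only).
set.seed(8)
n <- 400
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 1.5 * x))

fit <- glm(y ~ x, family = binomial(link = "logit"))

null_dev  <- fit$null.deviance   # intercept-only model
resid_dev <- fit$deviance        # model with all parameters
pseudo_r2 <- 1 - resid_dev / null_dev

# Check the definition: deviance = -2 * maximized log-likelihood.
minus2logL <- as.numeric(-2 * logLik(fit))

print(c(null = null_dev, residual = resid_dev,
        minus2logL = minus2logL, pseudo_r2 = pseudo_r2))
```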

38

6.2.3 Diagnostics Receiver Operating Characteristic (ROC) Curve

Logistic regression is often used to classify

In the Churn example, a customer can be classified as Churn if the model predicts high probability of churning

Although 0.5 is often used as the probability threshold, other values can be used based on desired error tradeoff

For two classes, C and nC, we have

True Positive: predict C, when actually C

True Negative: predict nC, when actually nC

False Positive: predict C, when actually nC

False Negative: predict nC, when actually C
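
The four counts can be tallied directly from predicted probabilities and actual labels. The ten observations below are hypothetical, and 0.5 is used as the threshold only because it is the common default.

```r
# Hypothetical labels (1 = class C, 0 = class nC) and predicted probabilities.
actual <- c(1, 1, 1, 0, 0, 0, 0, 1, 0, 0)
prob   <- c(0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.3, 0.8, 0.55, 0.05)

threshold <- 0.5
predicted <- as.integer(prob >= threshold)   # predict C when prob >= threshold

TP <- sum(predicted == 1 & actual == 1)   # predict C,  actually C
TN <- sum(predicted == 0 & actual == 0)   # predict nC, actually nC
FP <- sum(predicted == 1 & actual == 0)   # predict C,  actually nC
FN <- sum(predicted == 0 & actual == 1)   # predict nC, actually C

print(c(TP = TP, TN = TN, FP = FP, FN = FN))
```

Raising the threshold trades false positives for false negatives, and lowering it does the opposite.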

39

6.2.3 Diagnostics Receiver Operating Characteristic (ROC) Curve

The Receiver Operating Characteristic (ROC) curve

Plots TPR against FPR
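
Here TPR = TP / (TP + FN) and FPR = FP / (FP + TN). A base-R sketch of how the curve is traced, assuming ten hypothetical observations: each candidate threshold yields one (FPR, TPR) point, and the area under the curve (AUC) is approximated with the trapezoid rule. The helper names roc_points and auc_trapezoid are made up for this illustration.

```r
# Hypothetical labels (1 = class C, 0 = class nC) and predicted probabilities.
actual <- c(1, 1, 1, 0, 0, 0, 0, 1, 0, 0)
prob   <- c(0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.3, 0.8, 0.55, 0.05)

# One (FPR, TPR) point per threshold, from strictest to most lenient.
roc_points <- function(prob, actual) {
  thresholds <- c(Inf, sort(unique(prob), decreasing = TRUE), -Inf)
  tpr <- sapply(thresholds, function(t) sum(prob >= t & actual == 1) / sum(actual == 1))
  fpr <- sapply(thresholds, function(t) sum(prob >= t & actual == 0) / sum(actual == 0))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Area under the ROC curve by the trapezoid rule.
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

roc <- roc_points(prob, actual)
auc <- auc_trapezoid(roc)
print(roc)
print(auc)
# plot(roc$fpr, roc$tpr, type = "s", xlab = "FPR", ylab = "TPR")
```

An AUC of 0.5 corresponds to random guessing and 1 to perfect separation of the two classes.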

40

6.2.3 Diagnostics Receiver Operating Characteristic (ROC) Curve

> library(ROCR)

> Pred = predict(Churn_logistic3, type="response")

41

6.2.3 Diagnostics Receiver Operating Characteristic (ROC) Curve

42

6.2.3 Diagnostics Histogram of the Probabilities

It is interesting to visualize the counts of the customers who churned and who didn’t churn against the estimated churn probability.

43

6.3 Reasons to Choose and Cautions

Linear regression – outcome variable continuous

Logistic regression – outcome variable categorical

Both models assume a linear additive function of the input variables

If this is not true, the models perform poorly

In linear regression, the further assumption of normally distributed error terms is important for many statistical inferences

Although a set of input variables may be a good predictor of an output variable, “correlation does not imply causation”

44

6.4 Additional Regression Models

Multicollinearity is the condition when several input variables are highly correlated

This can lead to inappropriately large coefficients

To mitigate this problem

Ridge regression applies a penalty based on the size of the coefficients (proportional to the sum of their squares)

Lasso regression applies a penalty proportional to the sum of the absolute values of the coefficients
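
A minimal base-R sketch of the ridge idea, assuming two simulated, nearly identical inputs. With centered data the ridge estimate has the closed form (X'X + lambda I)^(-1) X'y; lambda = 1 is an arbitrary illustrative value. Lasso has no closed form and is not shown; in practice both are usually fitted with a dedicated package rather than by hand.

```r
# Two almost perfectly correlated inputs (illustrative only).
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)
y  <- 2 * x1 + 3 * x2 + rnorm(n)

# Center inputs and outcome so no intercept term is needed.
X  <- scale(cbind(x1, x2), center = TRUE, scale = FALSE)
yc <- y - mean(y)

# OLS: coefficients can be inflated and unstable under multicollinearity.
ols <- solve(crossprod(X), crossprod(X, yc))

# Ridge: adding lambda to the diagonal shrinks the coefficients.
lambda <- 1   # hypothetical penalty weight
ridge  <- solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, yc))

print(cbind(ols = as.numeric(ols), ridge = as.numeric(ridge)))
```

The individual OLS coefficients are poorly determined here, while their sum (about 5) is well determined; ridge keeps the sum and pulls the two coefficients toward each other.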

Multinomial logistic regression – used for a more-than-two-state categorical outcome variable

45
