#This assignment is a tutorial so have a bit of fun with it

#If you would like to explore some additional options give it a try

#Goal is to provide some meaningful info to the restaurant owner

#Some notes below

#Both PCA and FA provide useful summary info for multivariate data, but

#all of the original variables are needed for their calculation, so

#the big question is can we use them to find a subset of variables to

#predict overall score?

#Also, trying to give meaningful labels to components is really hard.

#When the variables are on different scales you need to work with the

#correlation matrix. For this assignment they are on same scale so

#we will work with the raw data.
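#Optional side note (an addition to the original comments): if the variables
#had been on different scales, standardizing them first is equivalent to
#working with the correlation matrix. A hypothetical sketch, not used below:
#
#    from sklearn.preprocessing import StandardScaler
#    scaled = StandardScaler().fit_transform(mixed_scale_df)  # mixed_scale_df is a placeholder
#    PCA().fit(scaled)  # PCA on standardized data = correlation-matrix PCA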

#PCA only helps if the original variables are correlated; if they
#are independent, PCA will not help.

#Approach takes two steps

#First step find the dimensionality of the data, that is the

#number of original variables to be retained

#Second step find which ones, more on this below

# import packages for this example

import pandas as pd

import numpy as np # arrays and math functions

import matplotlib.pyplot as plt # static plotting

from sklearn.decomposition import PCA, FactorAnalysis

import statsmodels.formula.api as smf # R-like model specification

#Set some display options

pd.set_option('display.notebook_repr_html', False)

pd.set_option('display.max_columns', 40)

pd.set_option('display.max_rows', 10)

pd.set_option('display.width', 120)

#Read in the restaurant dataset

food_df = pd.read_csv('C:/Users/Jahee Koo/Desktop/MSPA/2018_Winter_410_regression/HW03 PCA/FACTOR1.csv')

#A good step to take is to convert all variable names to lower case

food_df.columns = [s.lower() for s in food_df.columns]

print(food_df)

print('')

print('----- Summary of Input Data -----')

print('')

# show the object is a DataFrame

print('Object type: ', type(food_df))

# show number of observations in the DataFrame

print('Number of observations: ', len(food_df))

# show variable names

print('Variable names: ', food_df.columns)

# show descriptive statistics

print(food_df.describe())

# show a portion of the beginning of the DataFrame

print(food_df.head())

#look at correlation structure

cdata = food_df.loc[:,['overall','taste','temp','freshness','wait','clean','friend','location','parking','view']]

corr = cdata[cdata.columns].corr()

print(corr)

#Use the correlation matrix to help provide advice to the restaurant owner
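#Optional (an added line, not in the original script): rank the nine
#attributes by the absolute size of their correlation with overall,
#which is one quick way to read the matrix for the owner.
print(corr['overall'].drop('overall').abs().sort_values(ascending=False))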

#Look at five different models and compare them

#Which model do you think is best and why?

#Model 1 full regression model

#Model 2 my choice of a reduced regression model: taste, wait and location

#Model 3 Full PCA model

#Model 4 Reduced PCA model with parking, taste and clean

#Model 5 FA model

#First find the PCA

#Second find the FA

#Run the models

#Compare the models and show VIF for each model

#PCA

print('')

print('----- Principal Component Analysis -----')

print('')

pca_data = food_df.loc[:,['taste','temp','freshness','wait','clean','friend','location','parking','view']]

pca = PCA()

P = pca.fit(pca_data)

print(pca_data)

np.set_printoptions(threshold=np.inf)

# print the component loadings (rows = components, columns = variables)
print(np.around(pca.components_, decimals=3))

#Note per Everett p209 pick the three variables with the largest

#absolute coefficient on the component not already picked

#So, choose parking, taste and clean for the PCA variable reduction model
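#Optional sketch (an addition): automate the selection rule above - for each
#of the first three components, take the variable with the largest absolute
#loading that has not already been chosen. With the same data this should
#reproduce the manual choice noted above.
selected_vars = []
for comp in pca.components_[:3]:
    for idx in np.argsort(-np.abs(comp)):
        var = pca_data.columns[idx]
        if var not in selected_vars:
            selected_vars.append(var)
            break
print('Variables picked from the first three components:', selected_vars)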

# show summary of pca solution

pca_explained_variance = pca.explained_variance_ratio_

print('Proportion of variance explained:', pca_explained_variance)
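#Optional added line: the cumulative proportion is often easier to read when
#deciding how many components to retain.
print('Cumulative proportion of variance explained:', np.cumsum(pca_explained_variance))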

# note that principal components analysis corresponds

# to finding eigenvalues and eigenvectors of the pca_data

pca_data_cormat = np.corrcoef(pca_data.T)

eigenvalues, eigenvectors = np.linalg.eig(pca_data_cormat)

# np.linalg.eig returns the eigenvalues in no particular order, so sort
# them (and the matching eigenvectors) largest first before comparing
eig_order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[eig_order]
eigenvectors = eigenvectors[:, eig_order]

print(np.around(eigenvalues, decimals=3))
print('Linear algebra demonstration: Proportion of variance explained: ',
      eigenvalues / eigenvalues.sum())
print(np.around(eigenvectors, decimals=3))
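#Optional added check: the Kaiser rule keeps components whose
#correlation-matrix eigenvalue exceeds 1.
print('Components with eigenvalue > 1 (Kaiser rule):', int(np.sum(eigenvalues > 1)))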

# show the plot for the principal component analysis

plt.bar(np.arange(len(pca_explained_variance)), pca_explained_variance,
        color='grey', alpha=0.5, align='center')

plt.title('PCA Proportion of Total Variance')

plt.show()

# show a scree plot

d = {'eigenvalues': eigenvalues}
df1 = pd.DataFrame(data=d)
df2 = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9], name='factors')

# merge eigenvalues with # of factors
# (join_axes was removed from pandas.concat; the shared default index is enough here)
result = pd.concat([df1, df2], axis=1)
print(result)

def scat(dataframe, var1, var2):
    dataframe.plot(x=var1, y=var2, legend=False)
    plt.title('Scree Plot')
    plt.xlabel('# of factors')
    plt.ylabel('Eigenvalues')

scat(result, 'factors', 'eigenvalues')
plt.show()
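#Optional alternative (an addition, not part of the original assignment):
#a more compact scree plot drawn straight from the sorted eigenvalues,
#with a reference line at 1.0 for the Kaiser criterion.
plt.plot(np.arange(1, len(eigenvalues) + 1), np.sort(eigenvalues)[::-1], marker='o')
plt.axhline(y=1.0, color='grey', linestyle='--')
plt.title('Scree Plot (alternative)')
plt.xlabel('Component number')
plt.ylabel('Eigenvalue')
plt.show()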

# provide partial listing of variable loadings on principal components

# transpose for variables by components listing

pca_loadings = pca.components_.T

# provide full formatted listing of loadings for first three components

# print loadings while rounding to three digits

# and suppress printing of very small numbers

# but do not suppress printing of zeroes

np.set_printoptions(precision=3, suppress=True,
                    formatter={'float': '{: 0.3f}'.format})

print(pca_loadings[:,0:3])

# compute full set of principal components (scores)

C = pca.transform(pca_data)

print(C)

# add first three principal component scores to the original data frame

pca_data['pca1'] = C[:,0]

pca_data['pca2'] = C[:,1]

pca_data['pca3'] = C[:,2]

print(pca_data)

# add first three principal component scores to the food_df

food_df['pca1'] = C[:,0]

food_df['pca2'] = C[:,1]

food_df['pca3'] = C[:,2]

print(food_df)

# explore relationships between pairs of principal components

# working with the first three components only

pca_scores = pca_data.loc[:,['pca1','pca2', 'pca3']]

# .as_matrix() was removed from pandas; use .to_numpy() instead
pca_model_cormat = np.corrcoef(pca_scores.to_numpy().T).round(decimals=3)

print(pca_model_cormat)

#Looks like that worked

#Factor Analysis

print('')

print('----- Factor Analysis (Unrotated) -----')

print('')

# assume three factors will be sufficient

# this is an unrotated orthogonal solution

# maximum likelihood estimation is employed

# for best results set tolerance low and max iterations high

fa = FactorAnalysis(n_components = 3, tol=1e-8, max_iter=1000000)

#the unrotated solution
#fit on the nine original attributes only; pca_data was given the pca score
#columns above, and those should not enter the factor model
fa_data = food_df.loc[:,['taste','temp','freshness','wait','clean','friend','location','parking','view']]
fa.fit(fa_data)

# retrieve the factor loadings as an array of arrays

# transpose for variables by factors listing of loadings

fa_loadings = fa.components_.T

print(fa_loadings)

# show the loadings of the variables on the factors

# for the unrotated maximum likelihood solution

# print loadings while rounding to three digits

# and suppress printing of very small numbers

# but do not suppress printing of zeroes

np.set_printoptions(precision=3, suppress=True,
                    formatter={'float': '{: 0.3f}'.format})

print(fa_loadings)
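#Optional sketch (an assumption about your environment): scikit-learn 0.24+
#accepts a rotation argument, so a varimax-rotated solution can be compared
#with the unrotated loadings above. Skip this if an older version is installed.
fa_varimax = FactorAnalysis(n_components=3, rotation='varimax', tol=1e-8, max_iter=1000000)
fa_varimax.fit(fa_data)
print('Varimax-rotated loadings (variables by factors):')
print(fa_varimax.components_.T)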

# compute full set of factor scores

F = fa.transform(fa_data)

print(F)

# add factor scores to the original data frame

food_df['fa1'] = F[:,0]

food_df['fa2'] = F[:,1]

food_df['fa3'] = F[:,2]

print(food_df)

#Look at five different models and compare them

#Which model do you think is best and why?

#Model 1 full regression model

#Model 2 my choice of a reduced regression model: taste, wait and location

#Model 3 Full PCA model

#Model 4 Reduced PCA model with parking, taste and clean

#Model 5 FA model

#Run the Models

#Model 1 full model

regress_model_fit = smf.ols(formula = 'overall~taste+temp+freshness+wait+clean+friend+location+parking+view', data = food_df).fit()

# summary of model fit

print(regress_model_fit.summary())

#Model 2

#Note, Model 2 is a choice from looking at the correlation, you may choose a

#different selection for this if you like, just explain why

regress_model_fit = smf.ols(formula = 'overall~taste+wait+location', data = food_df).fit()

# summary of model fit

print(regress_model_fit.summary())

#Model 3

#regress the response overall on principal components

pca_model_fit = smf.ols(formula = 'overall~pca1+pca2+pca3', data = food_df).fit()

# summary of model fit

print(pca_model_fit.summary())

#Model 4

#regress the response overall on principal components

pca_model_fit = smf.ols(formula = 'overall~parking+taste+clean', data = food_df).fit()

# summary of model fit

print(pca_model_fit.summary())

#Model 5

#regress the response overall on factor scores

fa_model_fit = smf.ols(formula = 'overall~fa1+fa2+fa3', data = food_df).fit()

# summary of model fit

print(fa_model_fit.summary())
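#Optional comparison sketch (an addition, not part of the original assignment):
#collect adjusted R-squared and AIC for the five models side by side. The
#models are refit here from a dictionary of formulas because the script
#above reuses the names regress_model_fit and pca_model_fit.
model_formulas = {
    'Model 1 (full regression)': 'overall~taste+temp+freshness+wait+clean+friend+location+parking+view',
    'Model 2 (choice)': 'overall~taste+wait+location',
    'Model 3 (full PCA)': 'overall~pca1+pca2+pca3',
    'Model 4 (reduced PCA)': 'overall~parking+taste+clean',
    'Model 5 (FA)': 'overall~fa1+fa2+fa3',
}
rows = []
for name, formula in model_formulas.items():
    fit = smf.ols(formula=formula, data=food_df).fit()
    rows.append((name, fit.rsquared_adj, fit.aic))
print(pd.DataFrame(rows, columns=['model', 'adj r-squared', 'aic']))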

#next look at VIF to see what the full, choice, PCA and FA models did

# Break into left and right hand side; y and X then find VIF for each model


from patsy import dmatrices

from statsmodels.stats.outliers_influence import variance_inflation_factor

y = food_df.loc[:,['overall']]

X = food_df.loc[:,['taste','temp','freshness','wait','clean','friend','location','parking','view']]

y, X = dmatrices('overall ~ taste+temp+freshness+wait+clean+friend+location+parking+view ', data=food_df, return_type="dataframe")

# For each Xi, calculate VIF

vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print('')

print('----- VIF for Full Regression Model -----')

print('')

print(vif)
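#Optional helper (an addition): the raw VIF list above prints without
#variable names, so a small function that pairs each VIF with its column
#makes the model comparisons easier to read.
def vif_table(design_matrix):
    return pd.Series(
        [variance_inflation_factor(design_matrix.values, i) for i in range(design_matrix.shape[1])],
        index=design_matrix.columns, name='VIF')

print(vif_table(X))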

#VIF for choice model

y = food_df.loc[:,['overall']]

X = food_df.loc[:,['taste','wait','location']]

y, X = dmatrices('overall ~ taste+wait+location', data=food_df, return_type="dataframe")

vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print('')

print('----- VIF for Choice Model -----')

print('')

print(vif)

#VIF for PCA

y = food_df.loc[:,['overall']]

X = food_df.loc[:,['pca1','pca2','pca3']]

y, X = dmatrices('overall ~ pca1+pca2+pca3 ', data=food_df, return_type="dataframe")

vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print('')

print('----- VIF for PCA Model -----')

print('')

print(vif)

#VIF for FA

y = food_df.loc[:,['overall']]

X = food_df.loc[:,['fa1','fa2','fa3']]

y, X = dmatrices('overall ~ fa1+fa2+fa3 ', data=food_df, return_type="dataframe")

vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print('')

print('----- VIF for FA Model -----')

print('')

print(vif)

#Which model do you like best and why?

#For the full regression model, sum the coefficients for each three-variable
#grouping: taste, temp, freshness = group 1;
#wait, clean, friend = group 2;
#location, parking, view = group 3

#How do you interpret this info?

#Compare with the choice model
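#Optional sketch for the grouping question above (an addition): refit the
#full model under its own name and sum the coefficients within each
#three-variable group.
full_fit = smf.ols(formula = 'overall~taste+temp+freshness+wait+clean+friend+location+parking+view', data = food_df).fit()
groups = {'group 1': ['taste', 'temp', 'freshness'],
          'group 2': ['wait', 'clean', 'friend'],
          'group 3': ['location', 'parking', 'view']}
for name, cols in groups.items():
    print(name, 'coefficient sum:', round(full_fit.params[cols].sum(), 3))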
