Loading...

Messages

Proposals

Stuck in your homework and missing deadline? Get urgent help in $10/Page with 24 hours deadline

Get Urgent Writing Help In Your Essays, Assignments, Homeworks, Dissertation, Thesis Or Coursework & Achieve A+ Grades.

Privacy Guaranteed - 100% Plagiarism Free Writing - Free Turnitin Report - Professional And Experienced Writers - 24/7 Online Support

Scree plot python

07/11/2020 Client: papadok01 Deadline: 10 Days

#This assignment is a tutorial so have a bit of fun with it

#If you would like to explore some additional options give it a try

#Goal is to provide some meaningful info to the restaurant owner

#Some notes below

#Both PCA and FA provide useful summary info for multivariate data, but

#all of the original variables are needed for their calculation, so

#the big question is can we use them to find a subset of variables to

#predict overall score?

#Also,trying to give meaningful labels to components is really hard.

#When the variables are on different scales you need to work with the

#correlation matrix. For this assignment they are on same scale so

#we will work with the raw data.

#PCA only helps if the original variables are correlated, if they

#are independent PCA will not help.

#Approach takes two steps

#First step find the dimensionality of the data, that is the

#number of original variables to be retained

#Second step find which ones, more on this below

# import packages for this example

import pandas as pd

import numpy as np # arrays and math functions

import matplotlib.pyplot as plt # static plotting

from sklearn.decomposition import PCA, FactorAnalysis

import statsmodels.formula.api as smf # R-like model specification

#Set some display options

pd.set_option('display.notebook_repr_html', False)

pd.set_option('display.max_columns', 40)

pd.set_option('display.max_rows', 10)

pd.set_option('display.width', 120)

#Read in the restaurant dataset

food_df = pd.read_csv('C:/Users/Jahee Koo/Desktop/MSPA/2018_Winter_410_regression/HW03 PCA/FACTOR1.csv')

#A good step to take is to convert all variable names to lower case

food_df.columns = [s.lower() for s in food_df.columns]

print(food_df)

print('')

print('----- Summary of Input Data -----')

print('')

# show the object is a DataFrame

print('Object type: ', type(food_df))

# show number of observations in the DataFrame

print('Number of observations: ', len(food_df))

# show variable names

print('Variable names: ', food_df.columns)

# show descriptive statistics

print(food_df.describe())

# show a portion of the beginning of the DataFrame

print(food_df.head())

#look at correlation structure

cdata = food_df.loc[:,['overall','taste','temp','freshness','wait','clean','friend','location','parking','view']]

corr = cdata[cdata.columns].corr()

print(corr)

#Use the correlation matrix to help provide advice to the restaurant owner

#Look at four different models and compare them

#Which model do you think is best and why?

#Model 1 full regression model

#Model 2 select my reduced regression model taste, wait and location

#Model 3 Full PCA model

#Model 4 Reduced PCA model with parking, taste and clean

#Model 5 FA model

#First find the PCA

#Second find the FA

#Run the models

#Compare the models and show VIF for each model

#PCA

print('')

print('----- Principal Component Analysis -----')

print('')

pca_data = food_df.loc[:,['taste','temp','freshness','wait','clean','friend','location','parking','view']]

pca = PCA()

P = pca.fit(pca_data)

print(pca_data)

np.set_printoptions(threshold=np.inf)

np.around([pca.components_], decimals=3)

#Note per Everett p209 pick the three variables with the largest

#absolute coefficient on the component not already picked

#So, choose parking, taste and clean for the PCA variable reduction model

# show summary of pca solution

pca_explained_variance = pca.explained_variance_ratio_

print('Proportion of variance explained:', pca_explained_variance)

# note that principal components analysis corresponds

# to finding eigenvalues and eigenvectors of the pca_data

pca_data_cormat = np.corrcoef(pca_data.T)

eigenvalues, eigenvectors = np.linalg.eig(pca_data_cormat)

np.around([eigenvalues], decimals=3)

print('Linear algebra demonstration: Proportion of variance explained: ',

eigenvalues/eigenvalues.sum())

np.around([eigenvectors], decimals=3)

# show the plot for the pricipal component analysis

plt.bar(np.arange(len(pca_explained_variance)), pca_explained_variance,

color = 'grey', alpha = 0.5, align = 'center')

plt.title('PCA Proportion of Total Variance')

plt.show()

# show a scree plot

d = {'eigenvalues': eigenvalues }

df1 = pd.DataFrame(data=d)

df2 =pd.Series([1,2,3,4,5,6,7,8,9])

#df2 = {'factors': factors}

# merge eigenvalues with # of factors

result = pd.concat([df1, df2], axis=1, join_axes=[df2.index])

print (result)

def scat(dataframe,var1,var2):

dataframe[var2].plot()

plt.title('Scree Plot')

plt.xlabel('# of factors')

plt.ylabel('Eigenvalues')

scat(result,'0','eigenvalues')

plt.show()

# provide partial listing of variable loadings on principal components

# transpose for variables by components listing

pca_loadings = pca.components_.T

# provide full formatted listing of loadings for first three components

# print loadings while rounding to three digits

# and suppress printing of very small numbers

# but do not suppress printing of zeroes

np.set_printoptions(precision = 3, suppress = True,

formatter={'float': '{: 0.3f}'.format})

print(pca_loadings[:,0:3])

# compute full set of principal components (scores)

C = pca.transform(pca_data)

print(C)

# add first three principal component scores to the original data frame

pca_data['pca1'] = C[:,0]

pca_data['pca2'] = C[:,1]

pca_data['pca3'] = C[:,2]

print(pca_data)

# add first three principal component scores to the food_df

food_df['pca1'] = C[:,0]

food_df['pca2'] = C[:,1]

food_df['pca3'] = C[:,2]

print(food_df)

# explore relationships between pairs of principal components

# working with the first three components only

pca_scores = pca_data.loc[:,['pca1','pca2', 'pca3']]

pca_model_cormat = \

np.corrcoef(pca_scores.as_matrix().transpose()).round(decimals=3)

print(pca_model_cormat)

#Looks like that worked

#Factor Analysis

print('')

print('----- Factor Analysis (Unrotated) -----')

print('')

# assume three factors will be sufficient

# this is an unrotated orthogonal solution

# maximum likelihood estimation is employed

# for best results set tolerance low and max iterations high

fa = FactorAnalysis(n_components = 3, tol=1e-8, max_iter=1000000)

#the unrotated solution

fa.fit(pca_data)

# retrieve the factor loadings as an array of arrays

# transpose for variables by factors listing of loadings

fa_loadings = fa.components_.T

print(fa_loadings)

# show the loadings of the variables on the factors

# for the unrotated maximum likelihood solution

# print loadings while rounding to three digits

# and suppress printing of very small numbers

# but do not suppress printing of zeroes

np.set_printoptions(precision = 3, suppress = True,

formatter={'float': '{: 0.3f}'.format})

print(fa_loadings)

# compute full set of factor scores

F = fa.transform(pca_data)

print(F)

# add factor scores to the original data frame

food_df['fa1'] = F[:,0]

food_df['fa2'] = F[:,1]

food_df['fa3'] = F[:,2]

print(food_df)

#Look at five different models and compare them

#Which model do you think is best and why?

#Model 1 full regression model

#Model 2 select my reduced regression model taste, wait and location

#Model 3 Full PCA model

#Model 4 Reduced PCA model with parking, taste and clean

#Model 5 FA model

#Run the Models

#Model 1 full model

regress_model_fit = smf.ols(formula = 'overall~taste+temp+freshness+wait+clean+friend+location+parking+view', data = food_df).fit()

# summary of model fit

print(regress_model_fit.summary())

#Model 2

#Note, Model 2 is a choice from looking at the correlation, you may choose a

#different selection for this if you like, just explain why

regress_model_fit = smf.ols(formula = 'overall~taste+wait+location', data = food_df).fit()

# summary of model fit

print(regress_model_fit.summary())

#Model 3

#regress the response overall on principal components

pca_model_fit = smf.ols(formula = 'overall~pca1+pca2+pca3', data = food_df).fit()

# summary of model fit

print(pca_model_fit.summary())

#Model 4

#regress the response overall on principal components

pca_model_fit = smf.ols(formula = 'overall~parking+taste+clean', data = food_df).fit()

# summary of model fit

print(pca_model_fit.summary())

#Model 5

#regress the response overall on factor scores

fa_model_fit = smf.ols(formula = 'overall~fa1+fa2+fa3', data = food_df).fit()

# summary of model fit

print(fa_model_fit.summary())

#next look at VIF to see what the full, choice, PCA and FA models did

# Break into left and right hand side; y and X then find VIF for each model

import statsmodels.formula.api as sm

from patsy import dmatrices

from statsmodels.stats.outliers_influence import variance_inflation_factor

y = food_df.loc[:,['overall']]

X = food_df.loc[:,['taste','temp','freshness','wait','clean','friend','location','parking','view']]

y, X = dmatrices('overall ~ taste+temp+freshness+wait+clean+friend+location+parking+view ', data=food_df, return_type="dataframe")

# For each Xi, calculate VIF

vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print('')

print('----- VIF for Full Regression Model -----')

print('')

print(vif)

#VIF for choice model

y = food_df.loc[:,['overall']]

X = food_df.loc[:,['taste','clean','location']]

y, X = dmatrices('overall ~ taste+clean+location ', data=food_df, return_type="dataframe")

vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print('')

print('----- VIF for Choice Model -----')

print('')

print(vif)

#VIF for PCA

y = food_df.loc[:,['overall']]

X = food_df.loc[:,['pca1','pca2','pca3']]

y, X = dmatrices('overall ~ pca1+pca2+pca3 ', data=food_df, return_type="dataframe")

vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print('')

print('----- VIF for PCA Model -----')

print('')

print(vif)

#VIF for FA

y = food_df.loc[:,['overall']]

X = food_df.loc[:,['fa1','fa2','fa3']]

y, X = dmatrices('overall ~ fa1+fa2+fa3 ', data=food_df, return_type="dataframe")

vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print('')

print('----- VIF for FA Model -----')

print('')

print(vif)

#Which model do you like best and why?

#For the full regression model sum the coefficients for each three variable

#grouping, taste, temp freshness group 1

#wait, clean, friend group 2

# location, parking, view group 3

#How do you interpret this info?

#Compare with the choice model

Homework is Completed By:

Writer Writer Name Amount Client Comments & Rating
Instant Homework Helper

ONLINE

Instant Homework Helper

$36

She helped me in last minute in a very reasonable price. She is a lifesaver, I got A+ grade in my homework, I will surely hire her again for my next assignments, Thumbs Up!

Order & Get This Solution Within 3 Hours in $25/Page

Custom Original Solution And Get A+ Grades

  • 100% Plagiarism Free
  • Proper APA/MLA/Harvard Referencing
  • Delivery in 3 Hours After Placing Order
  • Free Turnitin Report
  • Unlimited Revisions
  • Privacy Guaranteed

Order & Get This Solution Within 6 Hours in $20/Page

Custom Original Solution And Get A+ Grades

  • 100% Plagiarism Free
  • Proper APA/MLA/Harvard Referencing
  • Delivery in 6 Hours After Placing Order
  • Free Turnitin Report
  • Unlimited Revisions
  • Privacy Guaranteed

Order & Get This Solution Within 12 Hours in $15/Page

Custom Original Solution And Get A+ Grades

  • 100% Plagiarism Free
  • Proper APA/MLA/Harvard Referencing
  • Delivery in 12 Hours After Placing Order
  • Free Turnitin Report
  • Unlimited Revisions
  • Privacy Guaranteed

6 writers have sent their proposals to do this homework:

Essay Writing Help
Buy Coursework Help
Quality Homework Helper
Top Grade Essay
Writer Writer Name Offer Chat
Essay Writing Help

ONLINE

Essay Writing Help

I am a qualified and experienced Writer, Researcher, Tutor, analyst and Consultant. I hold MBA (Strategic Management) (Finance and Marketing) & CPA.K (Accounting and Finance.)

$162 Chat With Writer
Buy Coursework Help

ONLINE

Buy Coursework Help

Hi dear, I am ready to do your homework in a reasonable price.

$162 Chat With Writer
Quality Homework Helper

ONLINE

Quality Homework Helper

Hi dear, I am ready to do your homework in a reasonable price.

$162 Chat With Writer
Top Grade Essay

ONLINE

Top Grade Essay

Working on this platform from a couple of time with exposure of dynamic writing skills gathered with years experience on different other websites.

$162 Chat With Writer

Let our expert academic writers to help you in achieving a+ grades in your homework, assignment, quiz or exam.

Similar Homework Questions

Principles of modern software management - Kelly coe net worth - Write a program to calculate compound interest in c++ - Determining the chemical formula for copper gluconate - National hurricane center projected path - Dynamically continuous innovation - English speaking exam topics - Translational Research And Population Health Management - Api 6d vs api 598 - Analyzing a Qualitative Research Report - Cis 3 methyl 4 octene - Hey - Case 8 - Geometric shapes real life - Moe's southwest grill stock price - My old kentucky home roger sterling - International Trade - What happens in the cask of amontillado - Cardiac drugs for nurses - Ibis cardiff city centre churchill way - Why is foopets not free anymore - Strategies for addressing behavior problems in the classroom 6th edition - NEEDS HISTORY 1000 EXPERT!!! Early Western HISTORY - How does a hovercraft skirt work - Volkswagen ethical issues case study - BI A - Use case diagram for dental clinic system - Disability rights movement in canada timeline - An elevator mass is to be designed - Atkins or fadkins by karen e bledsoe answer key - Student with coding homework for short nyt crossword - Peaceful end of life theory - Dare to dream performance horses - Music analysis questions - Dyson core competencies - Hazell and jefferies benson - Bob bonner leaving magic 98 - Laker company reported the following january purchases - Nike supply and demand curve - Usa today innovation and evolution in a troubled industry - Bora ring by judith wright analysis - How to do ideal gas law problems - How to make an ecosystem in a 2 liter bottle - Discussion 1 10/28/2020 - Kinetics of dye fading lab answers - Walker and avant 2005 concept analysis - Measurements in biology lab answers - Mixed methods research question - EXEMPLIFICATION ESSAY DUE IN 48 HOURS - The aliens have landed at our school - Types of quadrilaterals project - Cardiff maths past papers - WEEK 1 DISCUSSION QUESTIONS - Words that rhyme with hayden - Bumping into mr ravioli analysis - Tp castt poetry analysis answers - Haywood academy show my homework - Amoeba sisters video recap alleles and genes answers - Multiplicative inverse of 3 2i - Is digital communication good or bad - Aghast used in a sentence - Weinstein and fantini model of curriculum development - Associative meaning in semantics - Map grid of australia definition - Interview Paper - Data governance committee charter - Dimplex heater timer instructions - YOUR OWN WORDS - Organizational behavior chapter 7 motivation concepts - How to write a naturalistic observation paper - Medical surgical test bank questions - The influence of the IOM report on Nursing practice - The ones who walk away from omelas discussion questions - 7 forms of energy worksheet - Uts library journal articles - Which statement is most likely correct about retirement planning - Speech Analysis - C program to print 1 2 4 8 16 - Market drayton infant school - 3820 assignment 2 - Babies r us endless earnings settlement - I need someone write 3 typed (w/1.5 spacing) pages. - Billy blue college of design review - Abelia mosanensis korean spring agm - Forensic accounting case study series deloitte - Ted talk about addiction - Theasourses - Math Quiz - What is a partial balance sheet - Maya angelou poems for elementary students - Hovells creek estate lara - Four categories of etl technologies - The equity multiplier is equal to - Sylvia plath on death - Grade 12 creative writing module - Emory sports fitness camp - West coast hifi osborne park - Mafs 912 g co 3.11 - Http www rachaelrayshow com club rr giveaways - Young vic directors programme