Principal component analysis python pandas

20/10/2021 Client: muhammad11 Deadline: 2 Day

Principal Components And Factor Analysis Using Python Code

#This assignment is a tutorial so have a bit of fun with it

#If you would like to explore some additional options give it a try

#Goal is to provide some meaningful info to the restaurant owner

#Some notes below

#Both PCA and FA provide useful summary info for multivariate data, but

#all of the original variables are needed for their calculation, so

#the big question is can we use them to find a subset of variables to

#predict overall score?

#Also, trying to give meaningful labels to components is really hard.

#When the variables are on different scales you need to work with the

#correlation matrix. For this assignment they are on the same scale, so

#we will work with the raw data.

#PCA only helps if the original variables are correlated; if they

#are independent, PCA will not help.
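
The note above can be checked on made-up data. The sketch below is not part of the assignment: `independent` and `correlated` are synthetic arrays invented for illustration. It only shows that PCA concentrates variance in the first component when the columns share a common driver, and spreads it roughly evenly when they do not.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n = 500

# Independent columns: no shared structure for PCA to exploit
independent = rng.normal(size=(n, 3))

# Correlated columns: one hidden driver plus a little noise (made-up data)
driver = rng.normal(size=(n, 1))
correlated = driver + 0.3 * rng.normal(size=(n, 3))

ratio_independent = PCA().fit(independent).explained_variance_ratio_
ratio_correlated = PCA().fit(correlated).explained_variance_ratio_

print('independent:', np.round(ratio_independent, 3))
print('correlated: ', np.round(ratio_correlated, 3))
```

With independent columns each component carries about a third of the variance, so dropping components discards information; with correlated columns the first component carries most of it.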

#The approach takes two steps

#First step: find the dimensionality of the data, that is, the

#number of original variables to be retained

#Second step: find which ones; more on this below

# import packages for this example

import pandas as pd

import numpy as np # arrays and math functions

import matplotlib.pyplot as plt # static plotting

from sklearn.decomposition import PCA, FactorAnalysis

import statsmodels.formula.api as smf # R-like model specification

#Set some display options

pd.set_option('display.notebook_repr_html', False)

pd.set_option('display.max_columns', 40)

pd.set_option('display.max_rows', 10)

pd.set_option('display.width', 120)

#Read in the restaurant dataset

food_df = pd.read_csv('C:/Users/Jahee Koo/Desktop/MSPA/2018_Winter_410_regression/HW03 PCA/FACTOR1.csv')

#A good step to take is to convert all variable names to lower case

food_df.columns = [s.lower() for s in food_df.columns]

print(food_df)

print('')

print('----- Summary of Input Data -----')

print('')

# show the object is a DataFrame

print('Object type: ', type(food_df))

# show number of observations in the DataFrame

print('Number of observations: ', len(food_df))

# show variable names

print('Variable names: ', food_df.columns)

# show descriptive statistics

print(food_df.describe())

# show a portion of the beginning of the DataFrame

print(food_df.head())

#look at correlation structure

cdata = food_df.loc[:,['overall','taste','temp','freshness','wait','clean','friend','location','parking','view']]

corr = cdata.corr()

print(corr)
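
One way to read a correlation matrix like this is to rank the predictors by the absolute size of their correlation with the response. The sketch below uses a small synthetic frame (`demo`, with invented coefficients), not the restaurant data, so the ranking it prints says nothing about the real survey.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200

# Made-up ratings: 'overall' depends strongly on taste, weakly on wait, not on view
demo = pd.DataFrame({
    'taste': rng.normal(size=n),
    'wait': rng.normal(size=n),
    'view': rng.normal(size=n),
})
demo['overall'] = (0.8 * demo['taste'] + 0.4 * demo['wait']
                   + rng.normal(scale=0.5, size=n))

# Correlation of each predictor with the response, largest magnitude first
ranked = demo.corr()['overall'].drop('overall').abs().sort_values(ascending=False)
print(ranked)
```

Applied to the script's own data, the same expression on `corr` would give a starting point for choosing the reduced regression model.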

#Use the correlation matrix to help provide advice to the restaurant owner

#Look at five different models and compare them

#Which model do you think is best and why?

#Model 1 full regression model

#Model 2 select my reduced regression model taste, wait and location

#Model 3 Full PCA model

#Model 4 Reduced PCA model with parking, taste and clean

#Model 5 FA model

#First find the PCA

#Second find the FA

#Run the models

#Compare the models and show VIF for each model

#PCA

print('')

print('----- Principal Component Analysis -----')

print('')

pca_data = food_df.loc[:,['taste','temp','freshness','wait','clean','friend','location','parking','view']]

pca = PCA()

P = pca.fit(pca_data)

print(pca_data)

np.set_printoptions(threshold=np.inf)

# print the component coefficients (rows are components, columns are variables)

print(np.around(pca.components_, decimals=3))

#Note per Everett p209: for each of the first three components, pick the

#variable with the largest absolute coefficient that has not already been picked

#So, choose parking, taste and clean for the PCA variable reduction model
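
The selection rule in the note above can be written as a small helper. This is a sketch of one reading of that rule: `pick_variables`, `toy_components` and `toy_names` are hypothetical names, and the loadings are invented so the "already picked" case is visible.

```python
import numpy as np

def pick_variables(components, names, n_pick=3):
    """For each of the first n_pick components, pick the not-yet-chosen
    variable with the largest absolute coefficient."""
    chosen = []
    for row in np.abs(np.asarray(components))[:n_pick]:
        # Walk the variables from largest to smallest absolute coefficient
        for idx in np.argsort(row)[::-1]:
            if names[idx] not in chosen:
                chosen.append(names[idx])
                break
    return chosen

# Hypothetical loadings: rows are components, columns are variables
toy_components = np.array([
    [0.70, 0.60, 0.10, 0.20],
    [0.65, -0.10, 0.20, 0.05],  # 'a' is largest but already picked, so 'c' is next
    [0.10, 0.30, -0.20, 0.90],
])
toy_names = ['a', 'b', 'c', 'd']

print(pick_variables(toy_components, toy_names))
```

In the script this would be called as `pick_variables(pca.components_, list(pca_data.columns))`, since each row of `pca.components_` holds one component's coefficients.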

# show summary of pca solution

pca_explained_variance = pca.explained_variance_ratio_

print('Proportion of variance explained:', pca_explained_variance)

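# note that principal components analysis corresponds

# to finding eigenvalues and eigenvectors of the covariance matrix of

# pca_data; sklearn's PCA centers the data but does not rescale it, so the

# covariance matrix (not the correlation matrix) reproduces its results

pca_data_covmat = np.cov(pca_data.T)

# eigh suits symmetric matrices and returns eigenvalues in ascending order,

# so reverse both outputs to put the largest eigenvalue first

eigenvalues, eigenvectors = np.linalg.eigh(pca_data_covmat)

eigenvalues, eigenvectors = eigenvalues[::-1], eigenvectors[:, ::-1]

print(np.around(eigenvalues, decimals=3))

print('Linear algebra demonstration: Proportion of variance explained: ',

      eigenvalues/eigenvalues.sum())

print(np.around(eigenvectors, decimals=3))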
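
The link between PCA and an eigen-decomposition can be verified numerically. The sketch below uses synthetic data (`demo` is invented, with deliberately unequal column variances) and suggests that the eigenvalues of the covariance matrix match what scikit-learn's `PCA` reports on raw data, while the eigenvalues of the correlation matrix generally do not.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

# Made-up data: a shared latent signal plus noise of different sizes per column
latent = rng.normal(size=(300, 1))
demo = latent + rng.normal(scale=[0.5, 1.0, 1.5, 2.0], size=(300, 4))

demo_pca = PCA().fit(demo)

# eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse them
cov_eigenvalues = np.linalg.eigvalsh(np.cov(demo.T))[::-1]
corr_eigenvalues = np.linalg.eigvalsh(np.corrcoef(demo.T))[::-1]

print('sklearn explained variance:', np.round(demo_pca.explained_variance_, 3))
print('covariance eigenvalues:    ', np.round(cov_eigenvalues, 3))
print('correlation eigenvalues:   ', np.round(corr_eigenvalues, 3))
```

To get the correlation-matrix version out of `PCA`, the columns would have to be standardized first, which the notes at the top say is unnecessary here because the ratings share a scale.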

# show the plot for the principal component analysis

plt.bar(np.arange(len(pca_explained_variance)), pca_explained_variance,

        color = 'grey', alpha = 0.5, align = 'center')

plt.title('PCA Proportion of Total Variance')

plt.show()

# show a scree plot

# pair each eigenvalue with its factor number (1, 2, ..., number of variables)

result = pd.DataFrame({'factors': np.arange(1, len(eigenvalues) + 1),

                       'eigenvalues': eigenvalues})

print(result)

def scat(dataframe, var1, var2):

    # plot var2 (eigenvalues) against var1 (factor number)

    dataframe.plot(x=var1, y=var2, marker='o', legend=False)

    plt.title('Scree Plot')

    plt.xlabel('# of factors')

    plt.ylabel('Eigenvalues')

scat(result, 'factors', 'eigenvalues')

plt.show()
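
A scree plot is usually read by eye, but the first step of the approach (finding the dimensionality) can also be sketched with a cumulative-variance cutoff. The eigenvalues below are invented and the 80% threshold is an assumption chosen for illustration, not a value taken from the assignment.

```python
import numpy as np

# Hypothetical eigenvalues for nine variables, largest first
toy_eigenvalues = np.array([4.5, 2.0, 1.0, 0.6, 0.4, 0.3, 0.1, 0.06, 0.04])

# Running share of total variance explained by the first k components
cumulative = np.cumsum(toy_eigenvalues) / toy_eigenvalues.sum()

# Smallest k whose cumulative share reaches the (assumed) 80% threshold
threshold = 0.80
n_keep = int(np.searchsorted(cumulative, threshold) + 1)

print(np.round(cumulative, 3))
print('components to keep:', n_keep)
```

Swapping `toy_eigenvalues` for the script's `eigenvalues` array gives the same calculation on the restaurant data; a different threshold may lead to a different answer than the elbow of the plot.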

# provide partial listing of variable loadings on principal components

# transpose for variables by components listing

pca_loadings = pca.components_.T

# provide full formatted listing of loadings for first three components

# print loadings while rounding to three digits

# and suppress printing of very small numbers

# but do not suppress printing of zeroes

np.set_printoptions(precision = 3, suppress = True,

formatter={'float': '{: 0.3f}'.format})

print(pca_loadings[:,0:3])

# compute full set of principal components (scores)

C = pca.transform(pca_data)

print(C)

# add first three principal component scores to the food_df

# (pca_data is left with only the nine rating variables so that it can be

# reused unchanged as the input to the factor analysis below)

food_df['pca1'] = C[:,0]

food_df['pca2'] = C[:,1]

food_df['pca3'] = C[:,2]

print(food_df)

# explore relationships between pairs of principal components

# working with the first three components only

pca_scores = food_df.loc[:,['pca1','pca2', 'pca3']]

pca_model_cormat = np.corrcoef(pca_scores.to_numpy().transpose()).round(decimals=3)

print(pca_model_cormat)

#Looks like that worked

#Factor Analysis

print('')

print('----- Factor Analysis (Unrotated) -----')

print('')

# assume three factors will be sufficient

# this is an unrotated orthogonal solution

# maximum likelihood estimation is employed

# for best results set tolerance low and max iterations high

fa = FactorAnalysis(n_components = 3, tol=1e-8, max_iter=1000000)

#the unrotated solution

fa.fit(pca_data)

# retrieve the factor loadings as an array of arrays

# transpose for variables by factors listing of loadings

fa_loadings = fa.components_.T

print(fa_loadings)

# show the loadings of the variables on the factors

# for the unrotated maximum likelihood solution

# print loadings while rounding to three digits

# and suppress printing of very small numbers

# but do not suppress printing of zeroes

np.set_printoptions(precision = 3, suppress = True,

formatter={'float': '{: 0.3f}'.format})

print(fa_loadings)

# compute full set of factor scores

F = fa.transform(pca_data)

print(F)

# add factor scores to the original data frame

food_df['fa1'] = F[:,0]

food_df['fa2'] = F[:,1]

food_df['fa3'] = F[:,2]

print(food_df)
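
One common way to summarize factor loadings is the communality of each variable, the sum of its squared loadings, alongside the noise variance that the model leaves unexplained. The sketch below runs on synthetic data with an invented two-factor structure (`true_loadings` is hypothetical); it is meant only to show the calculation, not to describe the restaurant survey.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)

# Made-up data: two hidden factors drive the first four columns, the fifth is noise
factors = rng.normal(size=(400, 2))
true_loadings = np.array([[0.9, 0.0],
                          [0.8, 0.1],
                          [0.0, 0.9],
                          [0.1, 0.8],
                          [0.0, 0.0]])
demo = factors @ true_loadings.T + rng.normal(scale=0.4, size=(400, 5))

demo_fa = FactorAnalysis(n_components=2).fit(demo)
demo_loadings = demo_fa.components_.T          # variables by factors

# Communality: variance of each variable attributed to the common factors
communality = (demo_loadings ** 2).sum(axis=1)
# Uniqueness: per-variable noise variance estimated by the model
uniqueness = demo_fa.noise_variance_

print('communality:', np.round(communality, 3))
print('uniqueness: ', np.round(uniqueness, 3))
```

For the script's fit, the same two lines applied to `fa_loadings` and `fa.noise_variance_` would show which ratings the three factors capture well and which they mostly miss.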

#Look at five different models and compare them

#Which model do you think is best and why?

#Model 1 full regression model

#Model 2 select my reduced regression model taste, wait and location

#Model 3 Full PCA model

#Model 4 Reduced PCA model with parking, taste and clean

#Model 5 FA model

#Run the Models

#Model 1 full model

regress_model_fit = smf.ols(formula = 'overall~taste+temp+freshness+wait+clean+friend+location+parking+view', data = food_df).fit()

# summary of model fit

print(regress_model_fit.summary())

#Model 2

#Note, Model 2 is a choice from looking at the correlation, you may choose a

#different selection for this if you like, just explain why

regress_model_fit = smf.ols(formula = 'overall~taste+wait+location', data = food_df).fit()

# summary of model fit

print(regress_model_fit.summary())

#Model 3

#regress the response overall on principal components

pca_model_fit = smf.ols(formula = 'overall~pca1+pca2+pca3', data = food_df).fit()

# summary of model fit

print(pca_model_fit.summary())

#Model 4

#regress the response overall on the three variables picked from the PCA loadings

pca_model_fit = smf.ols(formula = 'overall~parking+taste+clean', data = food_df).fit()

# summary of model fit

print(pca_model_fit.summary())

#Model 5

#regress the response overall on factor scores

fa_model_fit = smf.ols(formula = 'overall~fa1+fa2+fa3', data = food_df).fit()

# summary of model fit

print(fa_model_fit.summary())

#next look at VIF to see what the full, choice, PCA and FA models did

# Build the design matrix X for each model with dmatrices, then find VIF

# for each column of X (the first value is for the intercept)

from patsy import dmatrices

from statsmodels.stats.outliers_influence import variance_inflation_factor

y, X = dmatrices('overall ~ taste+temp+freshness+wait+clean+friend+location+parking+view ', data=food_df, return_type="dataframe")

# For each Xi, calculate VIF

vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print('')

print('----- VIF for Full Regression Model -----')

print('')

print(vif)

#VIF for choice model (Model 2: taste, wait and location)

y, X = dmatrices('overall ~ taste+wait+location ', data=food_df, return_type="dataframe")

vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print('')

print('----- VIF for Choice Model -----')

print('')

print(vif)

#VIF for PCA

y, X = dmatrices('overall ~ pca1+pca2+pca3 ', data=food_df, return_type="dataframe")

vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print('')

print('----- VIF for PCA Model -----')

print('')

print(vif)

#VIF for FA

y, X = dmatrices('overall ~ fa1+fa2+fa3 ', data=food_df, return_type="dataframe")

vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print('')

print('----- VIF for FA Model -----')

print('')

print(vif)
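
The VIF lists printed above are unlabeled, so it helps to see what each number means. The sketch below computes VIF from its definition, 1 / (1 - R²), where R² comes from regressing one predictor on the others plus an intercept. It uses only NumPy and pandas on synthetic data; `vif_table` and `demo` are hypothetical names, and the result should agree with `variance_inflation_factor` for the non-intercept columns of a design matrix that includes an intercept.

```python
import numpy as np
import pandas as pd

def vif_table(frame):
    """Return VIF_j = 1 / (1 - R_j^2) for every column of frame, where R_j^2
    comes from regressing column j on the other columns plus an intercept."""
    values = {}
    for col in frame.columns:
        y = frame[col].to_numpy(dtype=float)
        others = frame.drop(columns=col).to_numpy(dtype=float)
        X = np.column_stack([np.ones(len(frame)), others])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r_squared = 1.0 - resid.var() / y.var()
        values[col] = 1.0 / (1.0 - r_squared)
    return pd.Series(values, name='VIF')

rng = np.random.default_rng(4)
a = rng.normal(size=300)

# Made-up predictors: 'b' nearly duplicates 'a', 'c' is unrelated to both
demo = pd.DataFrame({
    'a': a,
    'b': a + 0.2 * rng.normal(size=300),
    'c': rng.normal(size=300),
})

print(vif_table(demo).round(2))
```

Near-duplicate predictors get large VIFs, unrelated ones stay near 1. Principal component scores are uncorrelated by construction, which is why the PCA model's VIFs should come out close to 1.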

#Which model do you like best and why?

#For the full regression model, sum the coefficients within each

#three-variable grouping:

#group 1: taste, temp, freshness

#group 2: wait, clean, friend

#group 3: location, parking, view

#How do you interpret this info?

#Compare with the choice model
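
The group sums asked for above take only a few lines once the coefficients are in a pandas Series. The numbers in `toy_params` are invented placeholders, not results from the restaurant data; only the grouping of variables comes from the notes.

```python
import pandas as pd

# Hypothetical coefficients standing in for the full model's fitted parameters
toy_params = pd.Series({
    'Intercept': 1.00,
    'taste': 0.30, 'temp': 0.10, 'freshness': 0.20,
    'wait': 0.15, 'clean': 0.05, 'friend': 0.10,
    'location': 0.02, 'parking': 0.03, 'view': 0.05,
})

groups = {
    'group 1': ['taste', 'temp', 'freshness'],
    'group 2': ['wait', 'clean', 'friend'],
    'group 3': ['location', 'parking', 'view'],
}

# Sum the coefficients inside each three-variable group
group_sums = pd.Series({name: toy_params[cols].sum() for name, cols in groups.items()})
print(group_sums)
```

With the real fit, `toy_params` would be replaced by the `.params` of the Model 1 result, captured before the name `regress_model_fit` is reused for Model 2.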
