Loading...

Messages

Proposals

Stuck in your homework and missing deadline? Get urgent help in $10/Page with 24 hours deadline

Get Urgent Writing Help In Your Essays, Assignments, Homeworks, Dissertation, Thesis Or Coursework & Achieve A+ Grades.

Privacy Guaranteed - 100% Plagiarism Free Writing - Free Turnitin Report - Professional And Experienced Writers - 24/7 Online Support

Principal component analysis python pandas

20/10/2021 Client: muhammad11 Deadline: 2 Day

Principal Components And Factor Analysis Using Python Code

#This assignment is a tutorial so have a bit of fun with it

#If you would like to explore some additional options give it a try

#Goal is to provide some meaningful info to the restaurant owner

#Some notes below

#Both PCA and FA provide useful summary info for multivariate data, but

#all of the original variables are needed for their calculation, so

#the big question is can we use them to find a subset of variables to

#predict overall score?

#Also,trying to give meaningful labels to components is really hard.

#When the variables are on different scales you need to work with the

#correlation matrix. For this assignment they are on same scale so

#we will work with the raw data.

#PCA only helps if the original variables are correlated, if they

#are independent PCA will not help.

#Approach takes two steps

#First step find the dimensionality of the data, that is the

#number of original variables to be retained

#Second step find which ones, more on this below

# import packages for this example

import pandas as pd

import numpy as np # arrays and math functions

import matplotlib.pyplot as plt # static plotting

from sklearn.decomposition import PCA, FactorAnalysis

import statsmodels.formula.api as smf # R-like model specification

#Set some display options

pd.set_option('display.notebook_repr_html', False)

pd.set_option('display.max_columns', 40)

pd.set_option('display.max_rows', 10)

pd.set_option('display.width', 120)

#Read in the restaurant dataset

food_df = pd.read_csv('C:/Users/Jahee Koo/Desktop/MSPA/2018_Winter_410_regression/HW03 PCA/FACTOR1.csv')

#A good step to take is to convert all variable names to lower case

food_df.columns = [s.lower() for s in food_df.columns]

print(food_df)

print('')

print('----- Summary of Input Data -----')

print('')

# show the object is a DataFrame

print('Object type: ', type(food_df))

# show number of observations in the DataFrame

print('Number of observations: ', len(food_df))

# show variable names

print('Variable names: ', food_df.columns)

# show descriptive statistics

print(food_df.describe())

# show a portion of the beginning of the DataFrame

print(food_df.head())

#look at correlation structure

cdata = food_df.loc[:,['overall','taste','temp','freshness','wait','clean','friend','location','parking','view']]

corr = cdata[cdata.columns].corr()

print(corr)

#Use the correlation matrix to help provide advice to the restaurant owner

#Look at four different models and compare them

#Which model do you think is best and why?

#Model 1 full regression model

#Model 2 select my reduced regression model taste, wait and location

#Model 3 Full PCA model

#Model 4 Reduced PCA model with parking, taste and clean

#Model 5 FA model

#First find the PCA

#Second find the FA

#Run the models

#Compare the models and show VIF for each model

#PCA

print('')

print('----- Principal Component Analysis -----')

print('')

pca_data = food_df.loc[:,['taste','temp','freshness','wait','clean','friend','location','parking','view']]

pca = PCA()

P = pca.fit(pca_data)

print(pca_data)

np.set_printoptions(threshold=np.inf)

np.around([pca.components_], decimals=3)

#Note per Everett p209 pick the three variables with the largest

#absolute coefficient on the component not already picked

#So, choose parking, taste and clean for the PCA variable reduction model

# show summary of pca solution

pca_explained_variance = pca.explained_variance_ratio_

print('Proportion of variance explained:', pca_explained_variance)

# note that principal components analysis corresponds

# to finding eigenvalues and eigenvectors of the pca_data

pca_data_cormat = np.corrcoef(pca_data.T)

eigenvalues, eigenvectors = np.linalg.eig(pca_data_cormat)

np.around([eigenvalues], decimals=3)

print('Linear algebra demonstration: Proportion of variance explained: ',

eigenvalues/eigenvalues.sum())

np.around([eigenvectors], decimals=3)

# show the plot for the pricipal component analysis

plt.bar(np.arange(len(pca_explained_variance)), pca_explained_variance,

color = 'grey', alpha = 0.5, align = 'center')

plt.title('PCA Proportion of Total Variance')

plt.show()

# show a scree plot

d = {'eigenvalues': eigenvalues }

df1 = pd.DataFrame(data=d)

df2 =pd.Series([1,2,3,4,5,6,7,8,9])

#df2 = {'factors': factors}

# merge eigenvalues with # of factors

result = pd.concat([df1, df2], axis=1, join_axes=[df2.index])

print (result)

def scat(dataframe,var1,var2):

dataframe[var2].plot()

plt.title('Scree Plot')

plt.xlabel('# of factors')

plt.ylabel('Eigenvalues')

scat(result,'0','eigenvalues')

plt.show()

# provide partial listing of variable loadings on principal components

# transpose for variables by components listing

pca_loadings = pca.components_.T

# provide full formatted listing of loadings for first three components

# print loadings while rounding to three digits

# and suppress printing of very small numbers

# but do not suppress printing of zeroes

np.set_printoptions(precision = 3, suppress = True,

formatter={'float': '{: 0.3f}'.format})

print(pca_loadings[:,0:3])

# compute full set of principal components (scores)

C = pca.transform(pca_data)

print(C)

# add first three principal component scores to the original data frame

pca_data['pca1'] = C[:,0]

pca_data['pca2'] = C[:,1]

pca_data['pca3'] = C[:,2]

print(pca_data)

# add first three principal component scores to the food_df

food_df['pca1'] = C[:,0]

food_df['pca2'] = C[:,1]

food_df['pca3'] = C[:,2]

print(food_df)

# explore relationships between pairs of principal components

# working with the first three components only

pca_scores = pca_data.loc[:,['pca1','pca2', 'pca3']]

pca_model_cormat = \

np.corrcoef(pca_scores.as_matrix().transpose()).round(decimals=3)

print(pca_model_cormat)

#Looks like that worked

#Factor Analysis

print('')

print('----- Factor Analysis (Unrotated) -----')

print('')

# assume three factors will be sufficient

# this is an unrotated orthogonal solution

# maximum likelihood estimation is employed

# for best results set tolerance low and max iterations high

fa = FactorAnalysis(n_components = 3, tol=1e-8, max_iter=1000000)

#the unrotated solution

fa.fit(pca_data)

# retrieve the factor loadings as an array of arrays

# transpose for variables by factors listing of loadings

fa_loadings = fa.components_.T

print(fa_loadings)

# show the loadings of the variables on the factors

# for the unrotated maximum likelihood solution

# print loadings while rounding to three digits

# and suppress printing of very small numbers

# but do not suppress printing of zeroes

np.set_printoptions(precision = 3, suppress = True,

formatter={'float': '{: 0.3f}'.format})

print(fa_loadings)

# compute full set of factor scores

F = fa.transform(pca_data)

print(F)

# add factor scores to the original data frame

food_df['fa1'] = F[:,0]

food_df['fa2'] = F[:,1]

food_df['fa3'] = F[:,2]

print(food_df)

#Look at five different models and compare them

#Which model do you think is best and why?

#Model 1 full regression model

#Model 2 select my reduced regression model taste, wait and location

#Model 3 Full PCA model

#Model 4 Reduced PCA model with parking, taste and clean

#Model 5 FA model

#Run the Models

#Model 1 full model

regress_model_fit = smf.ols(formula = 'overall~taste+temp+freshness+wait+clean+friend+location+parking+view', data = food_df).fit()

# summary of model fit

print(regress_model_fit.summary())

#Model 2

#Note, Model 2 is a choice from looking at the correlation, you may choose a

#different selection for this if you like, just explain why

regress_model_fit = smf.ols(formula = 'overall~taste+wait+location', data = food_df).fit()

# summary of model fit

print(regress_model_fit.summary())

#Model 3

#regress the response overall on principal components

pca_model_fit = smf.ols(formula = 'overall~pca1+pca2+pca3', data = food_df).fit()

# summary of model fit

print(pca_model_fit.summary())

#Model 4

#regress the response overall on principal components

pca_model_fit = smf.ols(formula = 'overall~parking+taste+clean', data = food_df).fit()

# summary of model fit

print(pca_model_fit.summary())

#Model 5

#regress the response overall on factor scores

fa_model_fit = smf.ols(formula = 'overall~fa1+fa2+fa3', data = food_df).fit()

# summary of model fit

print(fa_model_fit.summary())

#next look at VIF to see what the full, choice, PCA and FA models did

# Break into left and right hand side; y and X then find VIF for each model

import statsmodels.formula.api as sm

from patsy import dmatrices

from statsmodels.stats.outliers_influence import variance_inflation_factor

y = food_df.loc[:,['overall']]

X = food_df.loc[:,['taste','temp','freshness','wait','clean','friend','location','parking','view']]

y, X = dmatrices('overall ~ taste+temp+freshness+wait+clean+friend+location+parking+view ', data=food_df, return_type="dataframe")

# For each Xi, calculate VIF

vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print('')

print('----- VIF for Full Regression Model -----')

print('')

print(vif)

#VIF for choice model

y = food_df.loc[:,['overall']]

X = food_df.loc[:,['taste','clean','location']]

y, X = dmatrices('overall ~ taste+clean+location ', data=food_df, return_type="dataframe")

vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print('')

print('----- VIF for Choice Model -----')

print('')

print(vif)

#VIF for PCA

y = food_df.loc[:,['overall']]

X = food_df.loc[:,['pca1','pca2','pca3']]

y, X = dmatrices('overall ~ pca1+pca2+pca3 ', data=food_df, return_type="dataframe")

vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print('')

print('----- VIF for PCA Model -----')

print('')

print(vif)

#VIF for FA

y = food_df.loc[:,['overall']]

X = food_df.loc[:,['fa1','fa2','fa3']]

y, X = dmatrices('overall ~ fa1+fa2+fa3 ', data=food_df, return_type="dataframe")

vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print('')

print('----- VIF for FA Model -----')

print('')

print(vif)

#Which model do you like best and why?

#For the full regression model sum the coefficients for each three variable

#grouping, taste, temp freshness group 1

#wait, clean, friend group 2

# location, parking, view group 3

#How do you interpret this info?

#Compare with the choice model

Homework is Completed By:

Writer Writer Name Amount Client Comments & Rating
Instant Homework Helper

ONLINE

Instant Homework Helper

$36

She helped me in last minute in a very reasonable price. She is a lifesaver, I got A+ grade in my homework, I will surely hire her again for my next assignments, Thumbs Up!

Order & Get This Solution Within 3 Hours in $25/Page

Custom Original Solution And Get A+ Grades

  • 100% Plagiarism Free
  • Proper APA/MLA/Harvard Referencing
  • Delivery in 3 Hours After Placing Order
  • Free Turnitin Report
  • Unlimited Revisions
  • Privacy Guaranteed

Order & Get This Solution Within 6 Hours in $20/Page

Custom Original Solution And Get A+ Grades

  • 100% Plagiarism Free
  • Proper APA/MLA/Harvard Referencing
  • Delivery in 6 Hours After Placing Order
  • Free Turnitin Report
  • Unlimited Revisions
  • Privacy Guaranteed

Order & Get This Solution Within 12 Hours in $15/Page

Custom Original Solution And Get A+ Grades

  • 100% Plagiarism Free
  • Proper APA/MLA/Harvard Referencing
  • Delivery in 12 Hours After Placing Order
  • Free Turnitin Report
  • Unlimited Revisions
  • Privacy Guaranteed

6 writers have sent their proposals to do this homework:

George M.
Finance Homework Help
Professional Coursework Help
Engineering Help
Engineering Mentor
Financial Hub
Writer Writer Name Offer Chat
George M.

ONLINE

George M.

I have read your project details and I can provide you QUALITY WORK within your given timeline and budget.

$30 Chat With Writer
Finance Homework Help

ONLINE

Finance Homework Help

I am an elite class writer with more than 6 years of experience as an academic writer. I will provide you the 100 percent original and plagiarism-free content.

$20 Chat With Writer
Professional Coursework Help

ONLINE

Professional Coursework Help

As an experienced writer, I have extensive experience in business writing, report writing, business profile writing, writing business reports and business plans for my clients.

$50 Chat With Writer
Engineering Help

ONLINE

Engineering Help

I will provide you with the well organized and well research papers from different primary and secondary sources will write the content that will support your points.

$18 Chat With Writer
Engineering Mentor

ONLINE

Engineering Mentor

Being a Ph.D. in the Business field, I have been doing academic writing for the past 7 years and have a good command over writing research papers, essay, dissertations and all kinds of academic writing and proofreading.

$46 Chat With Writer
Financial Hub

ONLINE

Financial Hub

I have assisted scholars, business persons, startups, entrepreneurs, marketers, managers etc in their, pitches, presentations, market research, business plans etc.

$16 Chat With Writer

Let our expert academic writers to help you in achieving a+ grades in your homework, assignment, quiz or exam.

Similar Homework Questions

Unitary executive theory john yoo - Coles kellyville plaza - Informatics and nursing opportunities and challenges 5th edition test bank - Student parking salford university - Martinez company's relevant range of production - Shark tank pitch script - Strategic management text and cases 7th edition ebook - Security architecture 10 - Wk 3, IOP 480: Cultural Assessment - Show and tell clean - Four generic architectural components of a public communications network - Willington hall riding centre - Sci 256 environmental science and human population worksheet - Engages in some pregame banter crossword - Case study - Mass media ethics case studies - A place for zero activities - Caravan and camping show nambour - Ge tetra contour installation instructions - In search of respect selling crack in el barrio ebook - The espresso lane to global markets cage analysis - Djd brick & blocklaying - Discussion - Tv1 on demand emmerdale - Mk wiring accessories catalogue - Methods for discussing ethical disagreements productively - **Due Oct 12th** by midnight (EST) NO MORE THAN 2100 words APA Format - Gpo box 9898 in your capital city - Introduction to the microscope lab activity answers - Muslim flight attendant refuses to serve alcohol - Unit 5 Discussion - What substance in msa confers selectivity? why? - Worley parsons melbourne office - Microsoft security development lifecycle - What role did lehman's executives play in the company's collapse - Using Online Class Services to Upskill: Finding the Right Courses for Professional Development - Vocabulary rubric for words - Er diagram for restaurant management system with explanation - Restaurant pro forma income statement - 10 ml volumetric flask uncertainty - St peter and paul northfields - Pneumatic vs mechanical cistern - American university computer science - 4 3 milestone one final project part i - Qqq - Example of a character sketch bible study - Problem solving - The no show consultant case study - How to calculate investment in a closed economy - Acara australian curriculum science - Coca cola company strategy implementation - Geographic Crime Mapping - Cba everyday offset account - Uranium has two isotopes of masses - As nzs 3000 2018 pdf free download - Saltwater light bulb experiment - Hello love florence warner guitar chords - Jamie eason live fit meal plan - Morphology questions and answers - Elliot water by the spoonful - Kks power plant classification system pdf - Butl_ Learnign Feamework - Quality concepts course - Accounting Principle Case Study - Give up and settle nigahiga - Penny from heaven chapter summary - Bathroom floor slope to drain in australia - Fun home alison bechdel pdf free - Best Practices - Angela bulloch betaville - Va fiduciary hub columbia sc phone number - How to multiply algebraic fractions - What effect does the contrasedative have on mildred - NIST CSF compliant. - Bromination of acetanilide limiting reagent - Qualitative analysis of group 1 cations conclusion - Instinct theory of motivation pdf - Scm globe simulation - Terrorism and bitcoin - Com 100 exam 1 answers - Construction drawing grid lines - Five myths and realities about zero based budgeting - 971 52 country code - The norm of voluntary participation threatens the social research goal of generalizability. - Swinburne online teaching periods 2021 - Portfolio Essay - A steel strip emerges from the hot roll - Application Security - ADEL - Physiology mcq with answers - Post mortem care documentation - Dob in a tax cheat ato - How to cite nist special publication in apa - Difference matters communicating social identity - Latex registered trademark symbols - Pentaho advantages and disadvantages - Boeing turnover rate - Epic mycenturahealth mch billing guest pay - The three mrs wright's - Complementary and Alternative Health Care