The objective of this Portfolio Project is to mine data from a data warehouse built from the Northwind database that was created during your installation of PostgreSQL.
The tasks for this Portfolio Project are summarized below.
Data Warehouse:
Create a data warehouse database, including the fact and dimension tables (star schema).
Create the schema for each table.
Populate the tables using either ETL (Pentaho) or SQL (PostgreSQL).
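As a rough illustration of the star schema described above, the following minimal sketch creates one fact table and three dimension tables, loads a single row into each, and runs a typical fact-to-dimension query. It is not the required design. The table names (dim_customer, dim_product, dim_date, fact_sales), their columns, and the sample row are all hypothetical, and an in-memory SQLite database stands in for PostgreSQL only so the example runs without a server. Your own warehouse should be built in PostgreSQL from the actual Northwind tables.

```python
import sqlite3

# ASSUMPTION: SQLite stands in for PostgreSQL so this sketch runs anywhere.
# ASSUMPTION: every table and column name below is illustrative only.
DDL = """
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    customer_id  TEXT NOT NULL,
    country      TEXT
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_id   INTEGER NOT NULL,
    product_name TEXT
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT NOT NULL,
    year      INTEGER,
    month     INTEGER
);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    quantity     INTEGER,
    unit_price   REAL,
    discount     REAL
);
"""


def build_warehouse():
    """Create the star schema and load one made-up row per table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    # Made-up sample rows, present only to show the load pattern.
    conn.execute("INSERT INTO dim_customer VALUES (1, 'CUST1', 'Germany')")
    conn.execute("INSERT INTO dim_product VALUES (1, 101, 'Sample product')")
    conn.execute("INSERT INTO dim_date VALUES (19960704, '1996-07-04', 1996, 7)")
    conn.execute("INSERT INTO fact_sales VALUES (1, 1, 19960704, 12, 14.0, 0.0)")
    conn.commit()
    return conn


def revenue_by_country(conn):
    """Join the fact table to the customer dimension and aggregate revenue."""
    sql = """
        SELECT c.country,
               SUM(f.quantity * f.unit_price * (1 - f.discount)) AS revenue
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        GROUP BY c.country
    """
    return conn.execute(sql).fetchall()
```

The fact table holds only foreign keys and numeric measures, while descriptive attributes live in the dimension tables. That separation is what makes the schema a star.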
Preprocessing for SAS:
Extract data from the data warehouse and write it to a file for input into SAS. You may choose the file format, but confirm that SAS University Edition can import it.
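One possible way to carry out this extraction step is to write a query result to CSV with a header row, as in the minimal sketch below. CSV is only one choice of format. The helper names query_to_csv and demo, the fact_sales table, and its sample row are hypothetical, and SQLite again stands in for PostgreSQL so the example runs on its own.

```python
import csv
import io
import sqlite3


def query_to_csv(conn, sql, out):
    """Write the result of `sql` to the text stream `out` as CSV with a header row."""
    cur = conn.execute(sql)
    writer = csv.writer(out, lineterminator="\n")
    # cursor.description holds one tuple per column; element 0 is the column name.
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)


def demo():
    """Build a tiny made-up table and return its CSV export as a string."""
    # ASSUMPTION: SQLite stands in for the PostgreSQL data warehouse.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_sales (country TEXT, quantity INTEGER, unit_price REAL)"
    )
    conn.execute("INSERT INTO fact_sales VALUES ('Germany', 12, 14.0)")
    buf = io.StringIO()
    query_to_csv(conn, "SELECT country, quantity, unit_price FROM fact_sales", buf)
    return buf.getvalue()
```

To produce a file instead of a string, pass a stream opened with open("sales.csv", "w", newline="") in place of the StringIO buffer (the file name is arbitrary). When working directly in PostgreSQL, the psql \copy command with the CSV HEADER options is an alternative way to get a comparable file.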
Statistical Analysis Using SAS:
Import data created in the preprocessing step.
Conduct statistical analysis using the appropriate statistics from each category:
Summary statistics
Classification
Clustering
Association
Prepare an analysis report.
Using the plan you prepared in Module 3 (Milestone 1) and the data warehouse and preprocessing steps from Module 6 (Milestone 2), complete the tasks under Statistical Analysis Using SAS.
Your analysis report must include:
An analysis of each variable in the data set
An analysis to determine which variables could serve as appropriate classifier variables
An analysis to determine if any variables are candidates for clustering
An analysis to determine if any variables have associations
Any tables, histograms, or scatterplot graphs necessary to support your analyses
A recommendation on whether this data set is suitable for meeting your organization’s business goal