Deliverables: Two files. (1) Submit this report with answers to all questions, including
output screenshots. (2) Submit an R script that contains all commands, with comments that briefly describe each command's purpose.
Run an exercise on the credit approval dataset, completing this report and providing the commands, output screenshots, and discussion/interpretation as requested. Ensure that all commands are saved in this report and in an R script.
a. Introduction: Describe the expected output and behavior of the Apriori method and
what it will accomplish for the credit approval data. Use the knowledge gained
through the tutorial and lectures. Provide a one-paragraph, masters-level
response. (80-120 words)
b. Data Pre-Processing: Load the credit approval data into RStudio using the read.csv command.
i. What data pre-processing does the Apriori method require for the credit
approval data? Include the commands you ran and the output screenshots.
Command: >
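As a rough illustration of the kind of pre-processing involved, the sketch below uses a small made-up data frame rather than the actual credit approval file (the file name and real column layout are not given here, so the columns `A1`, `A2`, and `class` and all values are assumptions). It shows the two steps that the arules implementation of Apriori generally needs: every column must be categorical (a factor), and the data frame is then coerced to a transactions object.

```r
library(arules)

# Hypothetical toy stand-in for the credit approval data (values are invented).
# With the real file, this data frame would come from read.csv() instead.
credit <- data.frame(
  A1    = c("a", "b", "b", "a", "b", "a"),
  A2    = c(30.8, 58.7, 24.5, 27.8, 20.2, 32.1),
  class = c("+", "+", "-", "-", "-", "+"),
  stringsAsFactors = TRUE
)

# Numeric columns must be binned into factor levels first.
# The cut points here are arbitrary and chosen only for illustration.
credit$A2 <- cut(credit$A2, breaks = c(0, 25, 35, Inf),
                 labels = c("low", "mid", "high"))

# Coerce the all-factor data frame to a transactions object;
# each attribute/value pair becomes an item such as "class=+".
credit.trans <- as(credit, "transactions")
summary(credit.trans)
```

Each row becomes one transaction, and `itemLabels(credit.trans)` lists the generated items.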
c. Apriori Method – Default Parameters:
i. Run the Apriori method with the default parameters and store the generated
rules in a variable called ‘rules’. Include the command, the output
screenshot, and provide a one-paragraph, masters-level discussion of the
returned rules and the default arguments. (80-120 words)
Command: >
ii. Run the inspect command to display the first 10 rules. Include the command,
the output screenshot, and interpretation of the returned rules and metrics.
Command: >
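For orientation only, here is a minimal sketch of a default run on made-up data (the columns `A9`, `A10`, and `class` and their values are assumptions, not the real dataset). In arules, calling `apriori()` without a `parameter` list uses a minimum support of 0.1, a minimum confidence of 0.8, a minimum rule length of 1, and a maximum rule length of 10.

```r
library(arules)

# Hypothetical toy data (invented values), already categorical.
credit <- data.frame(
  A9    = c("t", "t", "t", "t", "f", "f", "f", "f"),
  A10   = c("t", "t", "f", "t", "f", "f", "t", "f"),
  class = c("+", "+", "+", "+", "-", "-", "-", "-"),
  stringsAsFactors = TRUE
)
credit.trans <- as(credit, "transactions")

# Default parameters: support = 0.1, confidence = 0.8, minlen = 1, maxlen = 10.
rules <- apriori(credit.trans)

# head() avoids an index error when fewer than 10 rules are returned.
inspect(head(rules, n = 10))
```

The `inspect()` output lists each rule's left-hand side, right-hand side, and quality measures such as support, confidence, and lift.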
d. Apriori Method – Two Runs with Non-Default Parameters: Perform two runs using different
combinations of confidence, support, and minimum length values. For each run,
specify the input parameters used (include the command, output screenshot, and
a discussion of the rules returned), then run the inspect command to preview the first 10
rules (include the command, output screenshot, and a discussion of which returned
rule is the strongest and why):
i. 1st Non-Default Parameter Run:
Command: >
Inspect Command: >
ii. 2nd Non-Default Parameter Run:
Command: >
Inspect Command: >
iii. How does changing the confidence, support, and minimum length values
affect the returned rules? Provide a one-paragraph, masters-level response. (80-120 words)
iv. What are the differences between the confidence, support, and lift metrics
for identifying the strongest rules? (80-120 words)
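The following sketch, again on invented toy data, shows one way non-default thresholds can be passed through the `parameter` list, and recomputes support, confidence, and lift by hand for a single rule so the definitions are concrete. The threshold values and the chosen rule `{A10=t} => {class=+}` are arbitrary assumptions for illustration.

```r
library(arules)

# Hypothetical toy data (invented values).
credit <- data.frame(
  A9    = c("t", "t", "t", "t", "f", "f", "f", "f"),
  A10   = c("t", "t", "f", "t", "f", "f", "t", "f"),
  class = c("+", "+", "+", "+", "-", "-", "-", "-"),
  stringsAsFactors = TRUE
)
credit.trans <- as(credit, "transactions")

# Stricter thresholds than the defaults; minlen = 2 excludes empty-LHS rules.
rules.strict <- apriori(credit.trans,
                        parameter = list(support = 0.3,
                                         confidence = 0.9,
                                         minlen = 2))
inspect(head(rules.strict, n = 10, by = "lift"))

# Manual metrics for {A10=t} => {class=+}.
n      <- nrow(credit)
n.lhs  <- sum(credit$A10 == "t")                        # transactions with the LHS
n.rhs  <- sum(credit$class == "+")                      # transactions with the RHS
n.both <- sum(credit$A10 == "t" & credit$class == "+")  # transactions with both

support.manual    <- n.both / n                       # 3/8 = 0.375
confidence.manual <- n.both / n.lhs                   # 3/4 = 0.75
lift.manual       <- confidence.manual / (n.rhs / n)  # 0.75 / 0.5 = 1.5
```

Support measures how often the whole rule occurs, confidence measures how often the right-hand side follows given the left-hand side, and lift compares that confidence with the baseline frequency of the right-hand side. A lift above 1 indicates a positive association. Because this rule's confidence of 0.75 is below the 0.9 threshold, it does not appear in `rules.strict`.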
e. Apriori Method – Include only rules that have class=’+’ or class=’-’ on the right-hand
side. Store the output in the variable ‘rules’. (Hint: See the section on generating rules
for specified item sets in the week 3 tutorial. You may need to adjust the
confidence and support values.) Include the command and output screenshot:
Command: >
i. Run the inspect command to preview the first 10 rules. Include the
command, output screenshot, and a discussion identifying the strongest rules. (60-100 words)
Command: >
ii. What do the returned rules suggest about the credit approval decision? Even
though the attribute names are abstracted, discuss which attribute values,
as identified by the strongest rules and any other factors you wish to
include, positively or negatively influence the credit approval decision
(i.e., class=’+’ and class=’-’). Provide a one-paragraph,
masters-level response. (80-120 words)
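One possible way to restrict the right-hand side is the `appearance` argument of `apriori()`, sketched below on invented toy data. The item labels `class=+` and `class=-` assume that the outcome column is named `class`. The thresholds are arbitrary and would likely need tuning on the real data.

```r
library(arules)

# Hypothetical toy data (invented values).
credit <- data.frame(
  A9    = c("t", "t", "t", "t", "f", "f", "f", "f"),
  A10   = c("t", "t", "f", "t", "f", "f", "t", "f"),
  class = c("+", "+", "+", "+", "-", "-", "-", "-"),
  stringsAsFactors = TRUE
)
credit.trans <- as(credit, "transactions")

# Only the two class items may appear on the RHS; all other items go to the LHS.
rules <- apriori(credit.trans,
                 parameter  = list(support = 0.1, confidence = 0.6, minlen = 2),
                 appearance = list(rhs = c("class=+", "class=-"),
                                   default = "lhs"))

# Preview up to 10 rules, strongest lift first.
inspect(head(rules, n = 10, by = "lift"))
```

Setting `minlen = 2` keeps out rules with an empty left-hand side, which say nothing about which attributes drive the decision.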
f. Apriori Method – Prune the Returned Rules:
i. Why do we as data analysts prune the rules returned by the Apriori method? (80-120 words)
ii. Run the following commands on the variable ‘rules’ (generated above) to find the redundant rules. Include the output screenshot from the
commands and provide a discussion about the rules that were removed.
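> rules.sorted <- sort(rules, by = "lift")  # strongest rules first
> inspect(rules.sorted)
> # sparse = FALSE returns a dense logical matrix, which the next line requires
> subset.matrix <- is.subset(rules.sorted, rules.sorted, sparse = FALSE)
> subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA  # compare each rule only with higher-ranked rules
> redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1  # TRUE if a higher-ranked rule is a subset of this rule
> which(redundant)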
iii. Run the following commands to remove the redundant rules and display the
remaining rules. Include the commands, the output screenshot, and a
discussion on which rules remain.
> rules.pruned<-rules.sorted[!redundant]
> inspect(rules.pruned)
Output:
Discussion:
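As a point of comparison rather than a replacement for the commands above, arules also provides `is.redundant()`. By default it flags a rule when a more general rule with the same right-hand side has equal or higher confidence. This criterion differs from the subset-matrix approach used in this exercise, so the two may not remove exactly the same rules. The sketch below uses invented toy data and arbitrary thresholds.

```r
library(arules)

# Hypothetical toy data (invented values).
credit <- data.frame(
  A9    = c("t", "t", "t", "t", "f", "f", "f", "f"),
  A10   = c("t", "t", "f", "t", "f", "f", "t", "f"),
  class = c("+", "+", "+", "+", "-", "-", "-", "-"),
  stringsAsFactors = TRUE
)
credit.trans <- as(credit, "transactions")

rules <- apriori(credit.trans,
                 parameter  = list(support = 0.1, confidence = 0.6, minlen = 2),
                 appearance = list(rhs = c("class=+", "class=-"),
                                   default = "lhs"))

rules.sorted <- sort(rules, by = "lift")

# Logical vector, one entry per rule: TRUE where a more general rule
# already achieves the same or better confidence.
redundant <- is.redundant(rules.sorted)

rules.pruned <- rules.sorted[!redundant]
inspect(rules.pruned)
```

Comparing `length(rules.sorted)` with `length(rules.pruned)` shows how many rules were dropped.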
g. Rules Visualization – Choose any visualization method discussed in the tutorial to
visualize the pruned rules from the previous step. Provide the command, output
screenshot of the visualization, and a discussion of how effectively the plot
represents the rules and the metrics for ranking rules.
Command: >
i. Why do we consider more than one metric to identify the strongest rules?
Provide a one-paragraph, masters-level response. (80-120 words)
ii. Which part of this exercise did you find the most challenging
and why? What approach did you take to resolve the challenge?
iii. An example on pages 13-14 of the week 3 tutorial shows how to
display additional rule interest measures, including leverage, conviction,
and coverage. What additional information do these metrics provide about
the rules, and why would we consider them? See this link for more
information: Additional Metrics for Association Rules
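A minimal sketch of computing these additional measures with the arules function `interestMeasure()` follows. It uses invented toy data and may differ from the tutorial's example. For a rule X => Y, coverage is supp(X), leverage is supp(X and Y) minus supp(X) times supp(Y), and conviction is (1 - supp(Y)) / (1 - conf(X => Y)).

```r
library(arules)

# Hypothetical toy data (invented values).
credit <- data.frame(
  A9    = c("t", "t", "t", "t", "f", "f", "f", "f"),
  A10   = c("t", "t", "f", "t", "f", "f", "t", "f"),
  class = c("+", "+", "+", "+", "-", "-", "-", "-"),
  stringsAsFactors = TRUE
)
credit.trans <- as(credit, "transactions")
rules <- apriori(credit.trans)

# One row per rule, one column per requested measure.
extra <- interestMeasure(rules,
                         measure = c("leverage", "conviction", "coverage"),
                         transactions = credit.trans)

# Show the rules next to the extra measures.
print(cbind(rule = labels(rules), extra))
```

Conviction is infinite for rules with a confidence of exactly 1, because the denominator is then zero.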