Apriori property

17/12/2020 Client: saad24vbs Deadline: 7 Days

Data Science and Big Data Analytics


Chapter 5: Advanced Analytical Theory and Methods: Association Rules




Chapter Sections


5.1 Overview


5.2 Apriori Algorithm


5.3 Evaluation of Candidate Rules


5.4 Applications of Association Rules


5.5 Example: Transactions in a Grocery Store


5.6 Validation and Testing


5.7 Diagnostics




5.1 Overview


Association rules method


Unsupervised learning method


Descriptive (not predictive) method


Used to find hidden relationships in data


The relationships are represented as rules


Questions association rules might answer


Which products tend to be purchased together


What products do similar customers tend to buy




Example – general logic of association rules




Rules have the form X -> Y


When X is observed, Y is also observed


Itemset


Collection of items or entities


k-itemset = {item 1, item 2,…,item k}


Examples


Items purchased in one transaction


Set of hyperlinks clicked by a user in one session




5.1 Overview – Apriori Algorithm


Apriori is one of the earliest and most fundamental algorithms for generating association rules


Given itemset L, support of L is the percent of transactions that contain L


Frequent itemset – items appear together “often enough”


Minimum support defines “often enough” (% transactions)


If an itemset is frequent, then every subset of it is also frequent (the Apriori property)




If {B,C,D} is frequent, then all of its subsets are frequent: {B,C}, {B,D}, {C,D}, {B}, {C}, and {D}
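The support definition and the Apriori property can be checked on a small example. The sketch below uses a made-up list of five transactions (not data from the slides) and hypothetical helper names `support` and `proper_subsets`; it only illustrates that no subset of {B,C,D} can have lower support than {B,C,D} itself.

```python
from itertools import combinations

# Hypothetical toy transactions; each transaction is a set of items.
transactions = [
    {"A", "B", "C", "D"},
    {"B", "C", "D"},
    {"B", "C"},
    {"A", "D"},
    {"B", "C", "D"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def proper_subsets(itemset):
    """Yield every non-empty proper subset of `itemset` as a frozenset."""
    items = sorted(itemset)
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            yield frozenset(combo)

full = frozenset({"B", "C", "D"})
full_support = support(full, transactions)
subset_supports = {s: support(s, transactions) for s in proper_subsets(full)}

print("support of {B,C,D}:", full_support)
for subset, value in subset_supports.items():
    # Every subset appears in at least as many transactions as the full itemset.
    print(sorted(subset), value)
```

Because any transaction containing {B,C,D} also contains each of its subsets, the subset supports can only be equal or larger.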




5.2 Apriori Algorithm: Frequent = meets minimum support


Bottom-up iterative algorithm


Identify the frequent (min support) 1-itemsets


Frequent 1-itemsets are paired into 2-itemsets, and the frequent 2-itemsets are identified, etc.


Definitions used by the algorithm


D = transaction database


d = minimum support threshold


N = maximum length of itemset (optional parameter)


Ck = set of candidate k-itemsets


Lk = set of k-itemsets with minimum support
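The bottom-up iteration described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pseudocode from the original slide: the function name `apriori`, the parameter name `min_support` (standing in for d), and the toy transactions are all hypothetical, and support is counted by a plain scan with none of the usual efficiency improvements.

```python
from itertools import combinations

def apriori(D, min_support, N=None):
    """Return {k: set of frequent k-itemsets} for transaction database D.

    D           -- list of transactions, each a set of items
    min_support -- minimum support threshold as a fraction (d in the slides)
    N           -- optional maximum itemset length
    """
    n = len(D)

    def is_frequent(candidate):
        return sum(candidate <= t for t in D) / n >= min_support

    items = {item for t in D for item in t}
    # L[1]: frequent 1-itemsets.
    L = {1: {frozenset([i]) for i in items if is_frequent(frozenset([i]))}}
    k = 1
    while L[k] and (N is None or k < N):
        # Join: combine frequent k-itemsets into candidate (k+1)-itemsets (C_k+1).
        candidates = {a | b for a in L[k] for b in L[k] if len(a | b) == k + 1}
        # Prune: drop candidates having any infrequent k-subset (Apriori property).
        candidates = {c for c in candidates
                      if all(frozenset(s) in L[k] for s in combinations(c, k))}
        # Count: keep candidates that meet minimum support (L_k+1).
        L[k + 1] = {c for c in candidates if is_frequent(c)}
        k += 1
    return {size: sets for size, sets in L.items() if sets}

# Hypothetical toy database of five transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(transactions, min_support=0.6)
for size, itemsets in frequent.items():
    print(size, sorted(sorted(s) for s in itemsets))
```

The prune step is where the Apriori property pays off: a candidate such as {bread, diapers, beer} is discarded without scanning the data because its subset {bread, beer} is already known to be infrequent.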




5.3 Evaluation of Candidate Rules: Confidence


Frequent itemsets can form candidate rules


Confidence measures the certainty of a rule


Minimum confidence – predefined threshold


Problem with confidence


Given a rule X->Y, confidence considers only the antecedent (X) and the co-occurrence of X and Y


Cannot tell if a rule contains true implication
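As a small worked example, confidence can be computed as Confidence(X->Y) = Support(X and Y) / Support(X). The sketch below assumes illustrative counts out of 1000 transactions ({milk, eggs} in 300, {milk} in 500, {eggs} in 400); the function name `confidence` is hypothetical.

```python
def confidence(count_xy, count_x):
    """Confidence(X -> Y): share of transactions containing X that also contain Y."""
    if count_x == 0:
        raise ValueError("the antecedent X never occurs")
    return count_xy / count_x

# Illustrative counts out of 1000 transactions.
count_milk, count_eggs, count_milk_eggs = 500, 400, 300

conf_milk_eggs = confidence(count_milk_eggs, count_milk)
conf_eggs_milk = confidence(count_milk_eggs, count_eggs)
print(conf_milk_eggs, conf_eggs_milk)  # prints 0.6 0.75
```

Note that the support of the consequent never enters the calculation, which is why confidence alone cannot separate a real association from a consequent that is simply common.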




5.3 Evaluation of Candidate Rules: Lift


Lift measures how much more often X and Y occur together than expected if statistically independent


Lift = 1 if X and Y are statistically independent


Lift > 1 indicates the degree of usefulness of the rule


Example – in 1000 transactions,


If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Lift(milk->eggs) = 0.3/(0.5*0.4) = 1.5


If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Lift(milk->bread) = 0.4/(0.5*0.4) = 2.0
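The two lift values above can be reproduced directly from the transaction counts. This is a minimal sketch; the function name `lift` is hypothetical, and working from raw counts rather than fractions is just a choice that avoids floating-point rounding.

```python
def lift(count_xy, count_x, count_y, n):
    """Lift(X -> Y) = Support(X and Y) / (Support(X) * Support(Y)), from raw counts."""
    return (count_xy * n) / (count_x * count_y)

n = 1000  # total number of transactions in the example

lift_milk_eggs = lift(300, 500, 400, n)
lift_milk_bread = lift(400, 500, 400, n)
print(lift_milk_eggs, lift_milk_bread)  # prints 1.5 2.0

# If X and Y were independent, the joint count would be 500 * 400 / 1000 = 200.
lift_independent = lift(200, 500, 400, n)
print(lift_independent)  # prints 1.0
```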




5.3 Evaluation of Candidate Rules: Leverage


Leverage measures the difference in the probability of X and Y appearing together compared to statistical independence


Leverage = 0 if X and Y are statistically independent


Leverage > 0 indicates degree of usefulness of rule


Example – in 1000 transactions,


If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Leverage(milk->eggs) = 0.3 - 0.5*0.4 = 0.1


If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Leverage (milk->bread) = 0.4 - 0.5*0.4 = 0.2
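The leverage values above follow the same pattern as lift, with a difference in place of a ratio. A minimal sketch, with the hypothetical function name `leverage`:

```python
def leverage(count_xy, count_x, count_y, n):
    """Leverage(X -> Y) = Support(X and Y) - Support(X) * Support(Y)."""
    return count_xy / n - (count_x / n) * (count_y / n)

n = 1000  # total number of transactions in the example

lev_milk_eggs = leverage(300, 500, 400, n)
lev_milk_bread = leverage(400, 500, 400, n)
# Rounded for display, since the subtraction is done in floating point.
print(round(lev_milk_eggs, 3), round(lev_milk_bread, 3))  # prints 0.1 0.2
```

Unlike lift, leverage is bounded by the supports involved, so it tends to favor rules whose items are common in the dataset.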




5.4 Applications of Association Rules


The term market basket analysis refers to a specific implementation of association rules


For better merchandising – products to include/exclude from inventory each month


Placement of products near related products


Association rules also used for


Recommender systems – Amazon, Netflix


Clickstream analysis from web usage log files


Website visitors to page X click on links A,B,C more than on links D,E,F




5.5 Example: Grocery Store Transactions


5.5.1 The Groceries Dataset


In RStudio: Packages -> Install -> arules, arulesViz # the next line then appears on the console; don’t enter it yourself


> install.packages(c("arules", "arulesViz")) # appears on console


> library('arules')


> library('arulesViz')


> data(Groceries)


> summary(Groceries) # indicates 9835 rows


The Groceries dataset is of class transactions, which contains 3 slots


transactionInfo # data frame with vectors having length of transactions


itemInfo # data frame storing item labels


data # binary incidence matrix indicating which item labels appear in each transaction


> Groceries@itemInfo[1:10,]


> apply(Groceries@data[,10:20],2,function(r) paste(Groceries@itemInfo[r,"labels"],collapse=", "))




5.5.2 Frequent Itemset Generation


To illustrate the Apriori algorithm, the code below does each iteration separately.


Assume a minimum support threshold of 0.02 (0.02 * 9835 ≈ 197 transactions); the three passes below find 122 frequent itemsets in total (59 + 61 + 2)


First, get itemsets of length 1


> itemsets<-apriori(Groceries,parameter=list(minlen=1,maxlen=1,support=0.02,target="frequent itemsets"))


> summary(itemsets) # found 59 itemsets


> inspect(head(sort(itemsets,by="support"),10)) # lists top 10


Second, get itemsets of length 2


> itemsets<-apriori(Groceries,parameter=list(minlen=2,maxlen=2,support=0.02,target="frequent itemsets"))


> summary(itemsets) # found 61 itemsets


> inspect(head(sort(itemsets,by="support"),10)) # lists top 10


Third, get itemsets of length 3


> itemsets<-apriori(Groceries,parameter=list(minlen=3,maxlen=3,support=0.02,target="frequent itemsets"))


> summary(itemsets) # found 2 itemsets


> inspect(head(sort(itemsets,by="support"),10)) # lists top 10




5.5.3 Rule Generation and Visualization


The Apriori algorithm will now generate rules.


Set minimum support threshold to 0.001 (allows more rules, presumably for the scatterplot) and minimum confidence threshold to 0.6 to generate 2,918 rules.


> rules <- apriori(Groceries,parameter=list(support=0.001,confidence=0.6,target="rules"))


> summary(rules) # finds 2918 rules


> plot(rules) # displays scatterplot


The scatterplot shows that the highest lift occurs at a low support and a low confidence.




Get scatterplot matrix to compare the support, confidence, and lift of the 2918 rules


> plot(rules@quality) # displays scatterplot matrix


Lift is proportional to confidence with several linear groupings.


Note that Lift = Confidence/Support(Y), so when support of Y remains the same, lift is proportional to confidence and the slope of the linear trend is the reciprocal of Support(Y).
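The relation Lift = Confidence/Support(Y) follows from the definitions: Lift = Support(X and Y) / (Support(X) * Support(Y)) and Confidence = Support(X and Y) / Support(X). The sketch below checks it numerically on made-up counts for three hypothetical rules that share the same consequent Y; none of these numbers come from the Groceries data.

```python
# Hypothetical counts: n transactions in total, consequent Y appears in count_y of them.
n = 1000
count_y = 250

# antecedent label -> (count of X, count of X and Y together); all values are made up
rules = {
    "X1": (200, 150),
    "X2": (400, 180),
    "X3": (100, 90),
}

slopes = {}
for name, (count_x, count_xy) in rules.items():
    conf = count_xy / count_x
    lift = (count_xy * n) / (count_x * count_y)
    slopes[name] = lift / conf  # should equal 1 / Support(Y) for every rule

expected_slope = n / count_y  # 1 / Support(Y)
for name, slope in slopes.items():
    print(name, round(slope, 6), "expected", expected_slope)
```

Rules with the same consequent therefore fall on one line through the origin in the lift versus confidence panel, and each distinct consequent support produces its own line.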




Compute 1/Support(Y), which is the slope


> slope<-sort(round(rules@quality$lift/rules@quality$confidence,2))


Display the number of times each slope appears in the dataset


> unlist(lapply(split(slope,f=slope),length))


Display the top 10 rules sorted by lift


> inspect(head(sort(rules,by="lift"),10))


Rule {Instant food products, soda} -> {hamburger meat} has the highest lift of 19 (page 154)




Find the rules with confidence above 0.9


> confidentRules<-rules[quality(rules)$confidence>0.9]


> confidentRules # set of 127 rules


Plot a matrix-based visualization of the LHS vs. RHS of the rules


> plot(confidentRules,method="matrix",measure=c("lift","confidence"),control=list(reorder=TRUE))


The legend on the right is a color matrix indicating the lift and the confidence to which each square in the main matrix corresponds




Visualize the top 5 rules with the highest lift.


> highLiftRules<-head(sort(rules,by="lift"),5)


> plot(highLiftRules,method="graph",control=list(type="items"))


In the graph, the arrow always points from an item on the LHS to an item on the RHS.


For example, the arrows that connect ham, processed cheese, and white bread suggest the rule


{ham, processed cheese} -> {white bread}


Size of circle indicates support and shade represents lift




5.6 Validation and Testing


Frequent itemsets and high-confidence rules are found using pre-specified minimum support and minimum confidence levels


Measures like lift and/or leverage then ensure that interesting rules are identified rather than coincidental ones


However, some of the remaining rules may be considered subjectively uninteresting because they don’t yield unexpected profitable actions


E.g., rules like {paper} -> {pencil} are not interesting/meaningful


Incorporating subjective knowledge requires domain experts


Good rules provide valuable insights for institutions to improve their business operations




5.7 Diagnostics


Although the minimum support is pre-specified in phases 3 and 4, it can be adjusted to target a desired range for the number of rules. Variants and improvements of Apriori are also available.


For large datasets the Apriori algorithm can be computationally expensive. Efficiency improvements include:


Partitioning


Sampling


Transaction reduction


Hash-based itemset counting


Dynamic itemset counting

