Apriori property


Data Science and Big Data Analytics


Chapter 5: Advanced Analytical Theory and Methods: Association Rules




Chapter Sections


5.1 Overview


5.2 Apriori Algorithm


5.3 Evaluation of Candidate Rules


5.4 Applications of Association Rules


5.5 Example: Transactions in a Grocery Store


5.6 Validation and Testing


5.7 Diagnostics


5.1 Overview


Association rules method


Unsupervised learning method


Descriptive (not predictive) method


Used to find hidden relationships in data


The relationships are represented as rules


Questions association rules might answer


Which products tend to be purchased together


What products do similar customers tend to buy




Example – general logic of association rules




Rules have the form X -> Y


When X is observed, Y is also observed


Itemset


Collection of items or entities


k-itemset = {item 1, item 2,…,item k}


Examples


Items purchased in one transaction


Set of hyperlinks clicked by a user in one session




5.1 Overview – Apriori Algorithm


Apriori is one of the earliest and most fundamental algorithms for generating association rules


Given itemset L, support of L is the percent of transactions that contain L


Frequent itemset – items appear together “often enough”


Minimum support defines “often enough” (% transactions)


If an itemset is frequent, then any subset of it is also frequent (the Apriori property)
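The support and minimum-support definitions above can be sketched in a few lines of plain Python. This is only an illustration: the toy transactions and the helper names `support` and `is_frequent` are assumptions and do not come from the slides or from the R arules package.

```python
# Toy transactions (hypothetical data, not from the slides).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"milk", "eggs"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def is_frequent(itemset, transactions, min_support):
    """An itemset is frequent when its support meets the minimum support."""
    return support(itemset, transactions) >= min_support

# {bread, milk} appears in 3 of the 5 toy transactions.
print(support({"bread", "milk"}, transactions))
print(is_frequent({"bread", "milk"}, transactions, min_support=0.5))
```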




If {B,C,D} frequent, then all subsets frequent
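A small sketch of why this holds, assuming hypothetical transactions over items A to D: every transaction that contains {B,C,D} also contains each of its subsets, so no subset can have lower support. The helper names are illustrative only.

```python
from itertools import combinations

def nonempty_subsets(itemset):
    """All non-empty subsets of `itemset`, smallest first."""
    items = sorted(itemset)
    return [set(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k)]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

# Hypothetical transactions (not from the slides).
transactions = [
    {"A", "B", "C", "D"},
    {"B", "C", "D"},
    {"B", "C"},
    {"A", "D"},
    {"B", "C", "D"},
]

base = support({"B", "C", "D"}, transactions)
for s in nonempty_subsets({"B", "C", "D"}):
    # Any transaction containing {B,C,D} also contains s,
    # so support(s) can never be smaller than support({B,C,D}).
    assert support(s, transactions) >= base
    print(sorted(s), support(s, transactions))
```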


5.2 Apriori Algorithm – Frequent = minimum support


Bottom-up iterative algorithm


Identify the frequent (min support) 1-itemsets


Frequent 1-itemsets are paired into 2-itemsets, and the frequent 2-itemsets are identified, etc.


Definitions for next slide


D = transaction database


d = minimum support threshold


N = maximum length of itemset (optional parameter)


Ck = set of candidate k-itemsets


Lk = set of k-itemsets with minimum support
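Using the definitions above (D, d, N, Ck, Lk), the bottom-up iteration might be sketched as follows. This is a minimal, unoptimized illustration in plain Python, not the arules implementation; the function name `apriori` and the toy database are assumptions made for the example.

```python
from itertools import combinations

def apriori(D, d, N=None):
    """Return {itemset: support} for every itemset with support >= d.

    D: list of transactions (sets), d: minimum support threshold,
    N: optional maximum itemset length.
    """
    n = len(D)

    def support(itemset):
        return sum(1 for t in D if itemset <= t) / n

    # L1: frequent 1-itemsets.
    items = {item for t in D for item in t}
    Lk = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= d:
            Lk[frozenset([item])] = s
    frequent = dict(Lk)

    k = 2
    while Lk and (N is None or k <= N):
        prev = set(Lk)
        # Ck: join frequent (k-1)-itemsets whose union has exactly k items,
        # then prune candidates that have an infrequent (k-1)-subset.
        Ck = {a | b for a in prev for b in prev if len(a | b) == k}
        Ck = {c for c in Ck
              if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Lk: candidates that meet the minimum support.
        Lk = {}
        for c in Ck:
            s = support(c)
            if s >= d:
                Lk[c] = s
        frequent.update(Lk)
        k += 1
    return frequent

# Hypothetical transaction database.
D = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"milk", "eggs"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
]
for itemset, s in sorted(apriori(D, d=0.4).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

The pruning step is where the Apriori property pays off: a candidate is counted against the database only if all of its (k-1)-subsets were already found frequent.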




5.3 Evaluation of Candidate Rules – Confidence


Frequent itemsets can form candidate rules


Confidence measures the certainty of a rule: Confidence(X->Y) = Support(X and Y) / Support(X)


Minimum confidence – predefined threshold


Problem with confidence


Given a rule X->Y, confidence considers only the antecedent (X) and the co-occurrence of X and Y


Because it ignores how common the consequent (Y) is on its own, confidence cannot tell whether a rule reflects a true implication
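The problem can be illustrated with hypothetical counts (these numbers are assumptions, not from the slides): when Y appears in almost every transaction, a rule X->Y gets high confidence even if X carries no information about Y.

```python
def confidence(n_xy, n_x):
    """Confidence(X -> Y) = count(X and Y) / count(X)."""
    return n_xy / n_x

# Hypothetical counts: 1000 transactions, Y is in 900 of them,
# X is in 100, and X and Y appear together in 90.
n, n_x, n_y, n_xy = 1000, 100, 900, 90

conf = confidence(n_xy, n_x)   # share of X-transactions that contain Y
baseline = n_y / n             # share of all transactions that contain Y

# The rule looks strong, yet Y is just as likely without X.
print(conf, baseline)
```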


5.3 Evaluation of Candidate Rules – Lift


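Lift measures how much more often X and Y occur together than expected if they were statistically independent: Lift(X->Y) = Support(X and Y) / (Support(X) * Support(Y))


Lift = 1 if X and Y are statistically independent


Lift > 1 means X and Y occur together more often than expected; the larger the lift, the more useful the rule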


Example – in 1000 transactions,


If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Lift(milk->eggs) = 0.3/(0.5*0.4) = 1.5


If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Lift(milk->bread) = 0.4/(0.5*0.4) = 2.0
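The two examples above can be reproduced with a short Python sketch. The function name `lift` and its count-based signature are illustrative choices; working from raw counts avoids floating-point rounding for these particular numbers.

```python
def lift(n_xy, n_x, n_y, n):
    """Lift(X -> Y) = Support(X and Y) / (Support(X) * Support(Y)).

    Expressed with counts:
    (n_xy / n) / ((n_x / n) * (n_y / n)) == n_xy * n / (n_x * n_y).
    """
    return (n_xy * n) / (n_x * n_y)

print(lift(300, 500, 400, 1000))  # milk -> eggs: 1.5
print(lift(400, 500, 400, 1000))  # milk -> bread: 2.0
```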


5.3 Evaluation of Candidate Rules – Leverage


Leverage measures the difference in the probability of X and Y appearing together compared to statistical independence


Leverage = 0 if X and Y are statistically independent


Leverage > 0 indicates degree of usefulness of rule


Example – in 1000 transactions,


If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Leverage(milk->eggs) = 0.3 - 0.5*0.4 = 0.1


If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Leverage(milk->bread) = 0.4 - 0.5*0.4 = 0.2
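The same counts give the leverage values above. In this sketch the function name `leverage` is an illustrative choice, and `fractions.Fraction` is used only so that the subtraction is exact.

```python
from fractions import Fraction

def leverage(n_xy, n_x, n_y, n):
    """Leverage(X -> Y) = Support(X and Y) - Support(X) * Support(Y)."""
    return Fraction(n_xy, n) - Fraction(n_x, n) * Fraction(n_y, n)

print(float(leverage(300, 500, 400, 1000)))  # milk -> eggs: 0.1
print(float(leverage(400, 500, 400, 1000)))  # milk -> bread: 0.2
```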




5.4 Applications of Association Rules


The term market basket analysis refers to a specific implementation of association rules


For better merchandising – products to include/exclude from inventory each month


Placement of products near related products


Association rules also used for


Recommender systems – Amazon, Netflix


Clickstream analysis from web usage log files


Website visitors to page X click on links A,B,C more than on links D,E,F


5.5 Example: Grocery Store Transactions – 5.5.1 The Groceries Dataset


From the menu: Packages -> Install -> arules, arulesViz # the next line then appears on the console by itself; don’t type it


> install.packages(c("arules", "arulesViz")) # appears on console


> library('arules')


> library('arulesViz')


> data(Groceries)


> summary(Groceries) # indicates 9835 rows


Class of dataset Groceries is transactions, containing 3 slots


transactionInfo # data frame with vectors having length of transactions


itemInfo # data frame storing item labels


data # binary incidence matrix indicating which item labels appear in each transaction


> Groceries@itemInfo[1:10,]


> apply(Groceries@data[,10:20],2,function(r) paste(Groceries@itemInfo[r,"labels"],collapse=", "))


5.5 Example: Grocery Store Transactions – 5.5.2 Frequent Itemset Generation


To illustrate the Apriori algorithm, the code below does each iteration separately.


Assume minimum support threshold = 0.02 (0.02 * 9835 = 196.7, so an itemset must appear in at least 197 transactions); this yields 122 frequent itemsets in total


First, get itemsets of length 1


> itemsets<-apriori(Groceries,parameter=list(minlen=1,maxlen=1,support=0.02,target="frequent itemsets"))


> summary(itemsets) # found 59 itemsets


> inspect(head(sort(itemsets,by="support"),10)) # lists top 10


Second, get itemsets of length 2


> itemsets<-apriori(Groceries,parameter=list(minlen=2,maxlen=2,support=0.02,target="frequent itemsets"))


> summary(itemsets) # found 61 itemsets


> inspect(head(sort(itemsets,by="support"),10)) # lists top 10


Third, get itemsets of length 3


> itemsets<-apriori(Groceries,parameter=list(minlen=3,maxlen=3,support=0.02,target="frequent itemsets"))


> summary(itemsets) # found 2 itemsets


> inspect(head(sort(itemsets,by="support"),10)) # lists top 10




5.5 Example: Grocery Store Transactions – 5.5.3 Rule Generation and Visualization


The Apriori algorithm will now generate rules.


Set minimum support threshold to 0.001 (allows more rules, presumably for the scatterplot) and minimum confidence threshold to 0.6 to generate 2,918 rules.


> rules <- apriori(Groceries,parameter=list(support=0.001,confidence=0.6,target="rules"))


> summary(rules) # finds 2918 rules


> plot(rules) # displays scatterplot


The scatterplot shows that the highest lift occurs at a low support and a low confidence.




Get scatterplot matrix to compare the support, confidence, and lift of the 2918 rules


> plot(rules@quality) # displays scatterplot matrix


Lift is proportional to confidence with several linear groupings.


Note that Lift = Confidence/Support(Y), so when support of Y remains the same, lift is proportional to confidence and the slope of the linear trend is the reciprocal of Support(Y).
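As a quick check of this relationship, the following sketch reuses the milk and eggs counts from the lift example (300, 500, and 400 out of 1000 transactions) and confirms that lift divided by confidence equals 1/Support(Y). The variable names are illustrative, and `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

n = 1000
supp_x = Fraction(500, n)    # Support({milk})
supp_y = Fraction(400, n)    # Support({eggs})
supp_xy = Fraction(300, n)   # Support({milk, eggs})

confidence = supp_xy / supp_x        # 3/5
lift = supp_xy / (supp_x * supp_y)   # 3/2
slope = lift / confidence            # 5/2

# The slope of lift against confidence is the reciprocal of Support(Y).
assert slope == 1 / supp_y
print(float(confidence), float(lift), float(slope))
```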




Compute 1/Support(Y), which is the slope


> slope<-sort(round(rules@quality$lift/rules@quality$confidence,2))


Display the number of times each slope appears among the rules


> unlist(lapply(split(slope,f=slope),length))


Display the top 10 rules sorted by lift


> inspect(head(sort(rules,by="lift"),10))


Rule {Instant food products, soda} -> {hamburger meat} has the highest lift, about 19 (page 154)




Find the rules with confidence above 0.9


> confidentRules<-rules[quality(rules)$confidence>0.9]


> confidentRules # set of 127 rules


Plot a matrix-based visualization of the LHS vs. the RHS of the rules


> plot(confidentRules,method="matrix",measure=c("lift","confidence"),control=list(reorder=TRUE))


The legend on the right is a color matrix indicating the lift and the confidence to which each square in the main matrix corresponds




Visualize the top 5 rules with the highest lift.


> highLiftRules<-head(sort(rules,by="lift"),5)


> plot(highLiftRules,method="graph",control=list(type="items"))


In the graph, the arrow always points from an item on the LHS to an item on the RHS.


For example, the arrows that connect ham, processed cheese, and white bread suggest the rule


{ham, processed cheese} -> {white bread}


Size of circle indicates support and shade represents lift




5.6 Validation and Testing


Frequent itemsets and high-confidence rules are found using pre-specified minimum support and minimum confidence thresholds


Measures like lift and/or leverage then help ensure that interesting rules are identified rather than coincidental ones


However, some of the remaining rules may be considered subjectively uninteresting because they don’t yield unexpected profitable actions


E.g., rules like {paper} -> {pencil} are not interesting/meaningful


Incorporating subjective knowledge requires domain experts


Good rules provide valuable insights for institutions to improve their business operations




5.7 Diagnostics


Although minimum support is pre-specified in phases 3 and 4, it can be adjusted to bring the number of rules into a target range


For large datasets the Apriori algorithm can be computationally expensive; variants of Apriori offer the following efficiency improvements:


Partitioning


Sampling


Transaction reduction


Hash-based itemset counting


Dynamic itemset counting

