Apriori property


Data Science and Big Data Analytics


Chapter 5: Advanced Analytical Theory and Methods: Association Rules




Chapter Sections


5.1 Overview


5.2 Apriori Algorithm


5.3 Evaluation of Candidate Rules


5.4 Applications of Association Rules


5.5 Example: Transactions in a Grocery Store


5.6 Validation and Testing


5.7 Diagnostics




5.1 Overview


Association rules method


Unsupervised learning method


Descriptive (not predictive) method


Used to find hidden relationships in data


The relationships are represented as rules


Questions association rules might answer


Which products tend to be purchased together


What products do similar customers tend to buy




Example – general logic of association rules




Rules have the form X -> Y


When X is observed, Y is also observed


Itemset


Collection of items or entities


k-itemset = {item 1, item 2,…,item k}


Examples


Items purchased in one transaction


Set of hyperlinks clicked by a user in one session




5.1 Overview – Apriori Algorithm


Apriori is one of the earliest and most fundamental algorithms for generating association rules


Given itemset L, support of L is the percent of transactions that contain L


Frequent itemset – items appear together “often enough”


Minimum support defines “often enough” (% transactions)


Apriori property: if an itemset is frequent, then every subset of it is also frequent




If {B,C,D} frequent, then all subsets frequent
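
The support definition and the Apriori property can be checked on a small example. The following is a minimal base-R sketch; the five transactions and the helper name `support` are hypothetical and chosen only for illustration.

```r
# Toy transaction database (hypothetical data, for illustration only)
transactions <- list(
  c("A", "B", "C", "D"),
  c("B", "C", "D"),
  c("B", "C"),
  c("A", "D"),
  c("B", "C", "D")
)

# Support of an itemset = fraction of transactions containing every item in it
support <- function(itemset, db) {
  mean(vapply(db, function(t) all(itemset %in% t), logical(1)))
}

full_set <- c("B", "C", "D")
supp_full <- support(full_set, transactions)   # 3 of 5 transactions

# Enumerate every non-empty proper subset of {B, C, D}
subsets <- unlist(
  lapply(1:(length(full_set) - 1),
         function(k) combn(full_set, k, simplify = FALSE)),
  recursive = FALSE
)
supp_subsets <- vapply(subsets, support, numeric(1), db = transactions)
names(supp_subsets) <- vapply(subsets, paste, character(1), collapse = ",")

print(supp_full)
print(supp_subsets)

# Apriori property: no subset can be less frequent than its superset
stopifnot(all(supp_subsets >= supp_full))
```

Every transaction that contains {B,C,D} also contains each of its subsets, so a subset's support can never fall below that of the full itemset.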




5.2 Apriori Algorithm: Frequent = Meets Minimum Support


Bottom-up iterative algorithm


Identify the frequent (min support) 1-itemsets


Frequent 1-itemsets are paired into 2-itemsets, and the frequent 2-itemsets are identified, etc.


Definitions used by the algorithm


D = transaction database


δ = minimum support threshold


N = maximum length of itemset (optional parameter)


Ck = set of candidate k-itemsets


Lk = set of k-itemsets with minimum support




5.2 Apriori Algorithm
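
The bottom-up iteration described above might be sketched in base R as follows. This is an illustrative sketch under stated assumptions, not the implementation used by the arules package: the function name `apriori_sketch`, the argument name `delta` (standing in for the minimum support threshold), and the toy database are all hypothetical. The names `D`, `N`, `Ck`, and `Lk` follow the definitions listed above.

```r
# Minimal base-R sketch of Apriori (illustrative; not the arules implementation).
# D: list of character vectors (transactions); delta: minimum support; N: max length.
apriori_sketch <- function(D, delta, N = Inf) {
  support <- function(itemset) {
    mean(vapply(D, function(t) all(itemset %in% t), logical(1)))
  }
  # L1: frequent 1-itemsets
  items <- sort(unique(unlist(D)))
  Lk <- Filter(function(s) support(s) >= delta, as.list(items))
  frequent <- list()
  k <- 1
  while (length(Lk) > 0 && k <= N) {
    frequent <- c(frequent, Lk)
    k <- k + 1
    if (k > N) break
    # Ck: join pairs of frequent (k-1)-itemsets whose union has exactly k items
    Ck <- list()
    if (length(Lk) >= 2) {
      for (pair in combn(length(Lk), 2, simplify = FALSE)) {
        cand <- sort(union(Lk[[pair[1]]], Lk[[pair[2]]]))
        if (length(cand) == k) Ck[[paste(cand, collapse = ",")]] <- cand
      }
    }
    # Prune: keep candidates whose (k-1)-subsets are all frequent,
    # then keep only those that meet the minimum support
    keys <- vapply(Lk, paste, character(1), collapse = ",")
    Lk <- Filter(function(cand) {
      subs <- combn(cand, k - 1, simplify = FALSE)
      all(vapply(subs, paste, character(1), collapse = ",") %in% keys) &&
        support(cand) >= delta
    }, unname(Ck))
  }
  frequent
}

# Hypothetical toy database
D <- list(
  c("milk", "eggs", "bread"),
  c("milk", "eggs"),
  c("milk", "bread"),
  c("eggs", "bread"),
  c("milk", "eggs", "bread")
)
result <- apriori_sketch(D, delta = 0.5)
print(vapply(result, paste, character(1), collapse = ","))
```

With `delta = 0.5`, each single item and each pair qualifies, while the 3-itemset appears in only 2 of 5 transactions and is dropped. The pruning step relies on the Apriori property: a candidate with any infrequent subset cannot itself be frequent, so its support never needs to be counted.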




5.3 Evaluation of Candidate Rules: Confidence


Frequent itemsets can form candidate rules


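Confidence measures the certainty of a rule: Confidence(X->Y) = Support(X and Y) / Support(X), the percent of transactions containing X that also contain Y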


Minimum confidence – predefined threshold


Problem with confidence


Given a rule X->Y, confidence considers only the antecedent (X) and the co-occurrence of X and Y


Confidence alone cannot tell whether a rule reflects a true implication, because it ignores how common Y is overall
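
A small calculation may make the limitation concrete. The sketch below assumes the usual definition Confidence(X->Y) = Support(X and Y) / Support(X); all counts are hypothetical, and the function name `confidence` is introduced only for this example.

```r
# Confidence(X -> Y) = Support(X and Y) / Support(X)
# Counts are used directly because the total n cancels out of the ratio.
confidence <- function(n_xy, n_x) n_xy / n_x

# Hypothetical counts out of 1000 transactions
n <- 1000
conf_milk_eggs <- confidence(n_xy = 300, n_x = 500)

# A rule can look strong only because Y is common:
# X in 200 transactions, Y in 900, X and Y together in 170
conf_xy <- confidence(n_xy = 170, n_x = 200)
base_rate_y <- 900 / n

print(conf_milk_eggs)
print(c(confidence = conf_xy, support_y = base_rate_y))
```

In the second case the confidence is high, yet it is lower than the overall share of transactions containing Y, so observing X does not make Y more likely. Confidence by itself does not reveal this.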




5.3 Evaluation of Candidate Rules: Lift


Lift measures how much more often X and Y occur together than expected if statistically independent


Lift = 1 if X and Y are statistically independent


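Lift > 1 indicates that X and Y occur together more often than expected under independence; the larger the lift, the stronger the association and the more useful the rule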


Example – in 1000 transactions,


If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Lift(milk->eggs) = 0.3/(0.5*0.4) = 1.5


If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Lift(milk->bread) = 0.4/(0.5*0.4) = 2.0
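
The two lift values above can be reproduced with a few lines of R. This is a minimal sketch; the function name `lift` and its argument names are assumptions made for this example, and the third call uses a hypothetical count to show the independent case.

```r
# Lift(X -> Y) = Support(X and Y) / (Support(X) * Support(Y))
# n_xy, n_x, n_y are transaction counts; n is the total number of transactions
lift <- function(n_xy, n_x, n_y, n) {
  (n_xy / n) / ((n_x / n) * (n_y / n))
}

lift_milk_eggs  <- lift(n_xy = 300, n_x = 500, n_y = 400, n = 1000)
lift_milk_bread <- lift(n_xy = 400, n_x = 500, n_y = 400, n = 1000)

# Hypothetical independent case: 0.5 * 0.4 * 1000 = 200 joint occurrences
lift_independent <- lift(n_xy = 200, n_x = 500, n_y = 400, n = 1000)

print(c(milk_eggs = lift_milk_eggs,
        milk_bread = lift_milk_bread,
        independent = lift_independent))
```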




5.3 Evaluation of Candidate Rules: Leverage


Leverage measures the difference in the probability of X and Y appearing together compared to statistical independence


Leverage = 0 if X and Y are statistically independent


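Leverage > 0 indicates that X and Y occur together more often than expected under independence; the larger the leverage, the stronger the association and the more useful the rule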


Example – in 1000 transactions,


If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Leverage(milk->eggs) = 0.3 - 0.5*0.4 = 0.1


If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Leverage(milk->bread) = 0.4 - 0.5*0.4 = 0.2
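
The same counts give the leverage values directly. As before, this is only a sketch; the function name `leverage` and its argument names are assumptions made for this example.

```r
# Leverage(X -> Y) = Support(X and Y) - Support(X) * Support(Y)
# n_xy, n_x, n_y are transaction counts; n is the total number of transactions
leverage <- function(n_xy, n_x, n_y, n) {
  (n_xy / n) - (n_x / n) * (n_y / n)
}

lev_milk_eggs  <- leverage(n_xy = 300, n_x = 500, n_y = 400, n = 1000)
lev_milk_bread <- leverage(n_xy = 400, n_x = 500, n_y = 400, n = 1000)

print(c(milk_eggs = lev_milk_eggs, milk_bread = lev_milk_bread))
```

Lift is a ratio and leverage is a difference of the same two quantities, so both rank {milk, bread} above {milk, eggs} here.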




5.4 Applications of Association Rules


The term market basket analysis refers to a specific implementation of association rules


For better merchandising – products to include/exclude from inventory each month


Placement of products near related products


Association rules also used for


Recommender systems – Amazon, Netflix


Clickstream analysis from web usage log files


Website visitors to page X click on links A,B,C more than on links D,E,F




5.5 Example: Grocery Store Transactions


5.5.1 The Groceries Dataset


Packages -> Install -> arules, arulesViz # RStudio menu; this issues the next line for you, so don’t enter it


> install.packages(c("arules", "arulesViz")) # appears on console


> library('arules')


> library('arulesViz')


> data(Groceries)


> summary(Groceries) # indicates 9835 rows


Class of dataset Groceries is transactions, containing 3 slots


transactionInfo # data frame with vectors having length of transactions


itemInfo # data frame storing item labels


data # binary incidence matrix indicating which item labels appear in each transaction


> Groceries@itemInfo[1:10,]


> apply(Groceries@data[,10:20],2,function(r) paste(Groceries@itemInfo[r,"labels"],collapse=", "))




5.5.2 Frequent Itemset Generation


To illustrate the Apriori algorithm, the code below does each iteration separately.


Assume a minimum support threshold of 0.02 (0.02 * 9835 = 196.7, so an itemset must appear in at least 197 transactions); this yields 122 frequent itemsets in total


First, get itemsets of length 1


> itemsets<-apriori(Groceries,parameter=list(minlen=1,maxlen=1,support=0.02,target="frequent itemsets"))


> summary(itemsets) # found 59 itemsets


> inspect(head(sort(itemsets,by="support"),10)) # lists top 10


Second, get itemsets of length 2


> itemsets<-apriori(Groceries,parameter=list(minlen=2,maxlen=2,support=0.02,target="frequent itemsets"))


> summary(itemsets) # found 61 itemsets


> inspect(head(sort(itemsets,by="support"),10)) # lists top 10


Third, get itemsets of length 3


> itemsets<-apriori(Groceries,parameter=list(minlen=3,maxlen=3,support=0.02,target="frequent itemsets"))


> summary(itemsets) # found 2 itemsets


> inspect(head(sort(itemsets,by="support"),10)) # lists top 10






5.5.3 Rule Generation and Visualization


The Apriori algorithm will now generate rules.


Set minimum support threshold to 0.001 (allows more rules, presumably for the scatterplot) and minimum confidence threshold to 0.6 to generate 2,918 rules.


> rules <- apriori(Groceries,parameter=list(support=0.001,confidence=0.6,target="rules"))


> summary(rules) # finds 2918 rules


> plot(rules) # displays scatterplot


The scatterplot shows that the highest lift occurs at a low support and a low confidence.






Get scatterplot matrix to compare the support, confidence, and lift of the 2918 rules


> plot(rules@quality) # displays scatterplot matrix


Lift is proportional to confidence with several linear groupings.


Note that Lift = Confidence/Support(Y), so when support of Y remains the same, lift is proportional to confidence and the slope of the linear trend is the reciprocal of Support(Y).






Compute the 1/Support(Y) which is the slope


> slope<-sort(round(rules@quality$lift/rules@quality$confidence,2))


Display the number of times each slope appears in dataset


> unlist(lapply(split(slope,f=slope),length))
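
To see what the slope computation and the counting step produce without running the full rule mining, here is a minimal sketch on a hand-made quality table. The data frame `quality` and its values are hypothetical; `table()` is shown as a base-R alternative that gives the same counts as the `split`/`lapply` idiom above.

```r
# Hypothetical rule-quality table standing in for rules@quality
quality <- data.frame(
  support    = c(0.010, 0.020, 0.015, 0.012),
  confidence = c(0.60, 0.80, 0.70, 0.90),
  lift       = c(2.40, 3.20, 3.50, 4.50)
)

# lift / confidence = 1 / Support(Y), the slope of each linear grouping
slope <- sort(round(quality$lift / quality$confidence, 2))

# Count how often each slope value appears (same idiom as above)
counts_split <- unlist(lapply(split(slope, f = slope), length))

# Equivalent counts with table()
counts_table <- table(slope)

print(counts_split)
print(counts_table)
```

Rules that share a slope share the same Support(Y), which is why they line up along one linear grouping in the scatterplot matrix.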


Display the top 10 rules sorted by lift


> inspect(head(sort(rules,by="lift"),10))


Rule {Instant food products, soda} -> {hamburger meat} has the highest lift of 19 (page 154)




Find the rules with confidence above 0.9


> confidentRules<-rules[quality(rules)$confidence>0.9]


> confidentRules # set of 127 rules


Plot a matrix-based visualization of the LHS versus the RHS of the rules


> plot(confidentRules,method="matrix",measure=c("lift","confidence"),control=list(reorder=TRUE))


The legend on the right is a color matrix indicating the lift and the confidence to which each square in the main matrix corresponds






Visualize the top 5 rules with the highest lift.


> highLiftRules<-head(sort(rules,by="lift"),5)


> plot(highLiftRules,method="graph",control=list(type="items"))


In the graph, the arrow always points from an item on the LHS to an item on the RHS.


For example, the arrows that connect ham, processed cheese, and white bread suggest the rule


{ham, processed cheese} -> {white bread}


Size of circle indicates support and shade represents lift






5.6 Validation and Testing


Frequent itemsets and high-confidence rules are identified using pre-specified minimum support and minimum confidence levels


Measures like lift and/or leverage then ensure that interesting rules are identified rather than coincidental ones


However, some of the remaining rules may be considered subjectively uninteresting because they don’t yield unexpected profitable actions


E.g., rules like {paper} -> {pencil} are not interesting/meaningful


Incorporating subjective knowledge requires domain experts


Good rules provide valuable insights for institutions to improve their business operations




5.7 Diagnostics


Although minimum support is pre-specified in phases 3 and 4, the threshold can be adjusted to target a desired range for the number of rules; variants and improvements of Apriori are also available


For large datasets the Apriori algorithm can be computationally expensive; efficiency improvements include:


Partitioning


Sampling


Transaction reduction


Hash-based itemset counting


Dynamic itemset counting

