Data Science and Big Data Analytics


Chapter 5: Advanced Analytical Theory and Methods: Association Rules




Chapter Sections


5.1 Overview


5.2 Apriori Algorithm


5.3 Evaluation of Candidate Rules


5.4 Applications of Association Rules


5.5 Example: Grocery Store Transactions


5.6 Validation and Testing


5.7 Diagnostics




5.1 Overview


Association rules method


Unsupervised learning method


Descriptive (not predictive) method


Used to find hidden relationships in data


The relationships are represented as rules


Questions association rules might answer


Which products tend to be purchased together


What products do similar customers tend to buy




5.1 Overview


Example – general logic of association rules




5.1 Overview


Rules have the form X -> Y


When X is observed, Y is also observed


Itemset


Collection of items or entities


k-itemset = {item 1, item 2,…,item k}


Examples


Items purchased in one transaction


Set of hyperlinks clicked by a user in one session




5.1 Overview – Apriori Algorithm


Apriori is one of the earliest and most fundamental algorithms for generating association rules


Given itemset L, support of L is the percent of transactions that contain L


Frequent itemset – items appear together “often enough”


Minimum support defines “often enough” (% transactions)


If an itemset is frequent, then every subset of it is also frequent (the Apriori property)
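As a minimal illustration, the support definition and the Apriori property can be checked on a small transaction list in base R. The toy transactions and the helper name `support` are hypothetical and are not part of the arules package.

```r
# Hypothetical toy database: each element is the set of items in one transaction
D <- list(
  c("bread", "milk"),
  c("bread", "diapers", "beer", "eggs"),
  c("milk", "diapers", "beer", "cola"),
  c("bread", "milk", "diapers", "beer"),
  c("bread", "milk", "diapers", "cola")
)

# Support of an itemset: fraction of transactions that contain all of its items
support <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

s1 <- support("beer", D)                        # 3 of 5 transactions
s2 <- support(c("beer", "diapers"), D)          # 3 of 5 transactions
s3 <- support(c("beer", "diapers", "milk"), D)  # 2 of 5 transactions

# Apriori property: adding items can never increase support,
# so every subset of a frequent itemset is at least as frequent
stopifnot(s3 <= s2, s2 <= s1)
print(c(beer = s1, beer_diapers = s2, beer_diapers_milk = s3))
```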




5.1 Overview – Apriori Algorithm


If {B,C,D} frequent, then all subsets frequent




5.2 Apriori Algorithm – Frequent = minimum support


Bottom-up iterative algorithm


Identify the frequent (min support) 1-itemsets


Frequent 1-itemsets are paired into 2-itemsets, and the frequent 2-itemsets are identified, etc.


Definitions used by the algorithm


D = transaction database


d = minimum support threshold


N = maximum length of itemset (optional parameter)


Ck = set of candidate k-itemsets


Lk = set of k-itemsets with minimum support
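The bottom-up procedure can be sketched in base R with the names defined above (D, d, N, Ck, Lk). This is a hypothetical, unoptimized illustration, not the arules implementation: the function name `apriori_sketch` and the toy data are assumptions, and candidates are formed simply by joining any two frequent k-itemsets whose union has k + 1 items. Real analyses should use `apriori()` from arules, as in Section 5.5.

```r
# Hypothetical toy transaction database D
D <- list(
  c("bread", "milk"),
  c("bread", "diapers", "beer", "eggs"),
  c("milk", "diapers", "beer", "cola"),
  c("bread", "milk", "diapers", "beer"),
  c("bread", "milk", "diapers", "cola")
)

# d = minimum support (fraction); N = optional maximum itemset length (NULL = no limit)
apriori_sketch <- function(D, d, N = NULL) {
  support <- function(s) mean(vapply(D, function(t) all(s %in% t), logical(1)))
  key <- function(s) paste(s, collapse = ",")

  # L1: frequent 1-itemsets
  Lk <- Filter(function(s) support(s) >= d, as.list(sort(unique(unlist(D)))))
  frequent <- Lk
  k <- 1
  while (length(Lk) > 1 && (is.null(N) || k < N)) {
    # C(k+1): unions of two frequent k-itemsets that contain exactly k + 1 items
    Ck <- list()
    for (i in seq_len(length(Lk) - 1)) {
      for (j in (i + 1):length(Lk)) {
        cand <- sort(union(Lk[[i]], Lk[[j]]))
        if (length(cand) == k + 1) Ck[[length(Ck) + 1]] <- cand
      }
    }
    Ck <- unique(Ck)
    # Prune: every k-subset of a candidate must itself be frequent (Apriori property)
    Lkeys <- vapply(Lk, key, character(1))
    Ck <- Filter(function(cand) {
      all(vapply(seq_along(cand), function(m) key(cand[-m]) %in% Lkeys, logical(1)))
    }, Ck)
    # L(k+1): candidates that meet the minimum support d
    Lk <- Filter(function(s) support(s) >= d, Ck)
    frequent <- c(frequent, Lk)
    k <- k + 1
  }
  frequent
}

frequent <- apriori_sketch(D, d = 0.5)
print(vapply(frequent, paste, character(1), collapse = ", "))
```

With d = 0.5 the sketch keeps four 1-itemsets and four 2-itemsets; the only 3-item candidate that survives pruning, {bread, diapers, milk}, appears in just 2 of the 5 transactions and is dropped.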


Figure: pseudocode of the Apriori algorithm


5.3 Evaluation of Candidate Rules – Confidence


Frequent itemsets can form candidate rules


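Confidence measures the certainty of a rule: Confidence(X->Y) = Support(X ∧ Y) / Support(X), the fraction of transactions containing X that also contain Y


Minimum confidence – predefined threshold


Problem with confidence


Given a rule X->Y, confidence considers only the antecedent (X) and the co-occurrence of X and Y


It cannot tell whether a rule reflects a true implication or merely that Y is common across all transactions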
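A short base-R sketch of the confidence calculation, using the milk/eggs counts from the lift and leverage examples in this section (1000 transactions, milk in 500, eggs in 400, both in 300). The function name `confidence` and the alternative eggs count of 600 are hypothetical, chosen only to show the weakness described above.

```r
# Confidence(X -> Y) = Support(X and Y) / Support(X).
# With raw counts the total n cancels, leaving n_xy / n_x.
confidence <- function(n_xy, n_x) n_xy / n_x

conf_milk_eggs <- confidence(300, 500)  # 60% of milk baskets also contain eggs
conf_eggs_milk <- confidence(300, 400)  # the reverse rule has a different confidence

# The weakness: confidence ignores how common Y is on its own.
# Hypothetically, if eggs appeared in 600 of the 1000 transactions,
# milk -> eggs would still have confidence 0.6, yet buying milk would
# not raise the chance of buying eggs above the 0.6 baseline.
baseline_eggs <- 600 / 1000
print(c(confidence = conf_milk_eggs, baseline = baseline_eggs))
```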




5.3 Evaluation of Candidate Rules – Lift


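Lift measures how much more often X and Y occur together than expected if they were statistically independent: Lift(X->Y) = Support(X ∧ Y) / (Support(X) * Support(Y))


Lift = 1 if X and Y are statistically independent


Lift > 1 indicates that X and Y occur together more often than chance; the larger the lift, the more useful the rule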


Example – in 1000 transactions,


If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Lift(milk->eggs) = 0.3/(0.5*0.4) = 1.5


If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Lift(milk->bread) = 0.4/(0.5*0.4) = 2.0
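The two lift examples can be reproduced in base R. Working from counts avoids floating-point rounding in the check; the function name `lift` is illustrative, and the third call uses a hypothetical co-occurrence count of 200, which is exactly 0.5 * 0.4 * 1000, i.e. what independence predicts.

```r
# Lift(X -> Y) = Support(X and Y) / (Support(X) * Support(Y)).
# Using counts, this equals (n_xy * n) / (n_x * n_y).
lift <- function(n_xy, n_x, n_y, n) (n_xy * n) / (n_x * n_y)

lift_milk_eggs   <- lift(300, 500, 400, 1000)  # 1.5
lift_milk_bread  <- lift(400, 500, 400, 1000)  # 2.0
lift_independent <- lift(200, 500, 400, 1000)  # 1: co-occurrence exactly as expected by chance
print(c(lift_milk_eggs, lift_milk_bread, lift_independent))
```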




5.3 Evaluation of Candidate Rules – Leverage


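Leverage measures the difference between the probability of X and Y appearing together and the probability expected under statistical independence: Leverage(X->Y) = Support(X ∧ Y) - Support(X) * Support(Y)


Leverage = 0 if X and Y are statistically independent


Leverage > 0 indicates that X and Y co-occur more often than chance; the larger the leverage, the more useful the rule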


Example – in 1000 transactions,


If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Leverage(milk->eggs) = 0.3 - 0.5*0.4 = 0.1


If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Leverage(milk->bread) = 0.4 - 0.5*0.4 = 0.2
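The leverage examples can be checked the same way. This base-R sketch (the function name `leverage` is illustrative) writes the formula over the common denominator n^2 so the example values come out cleanly; the third call again uses the hypothetical independence count of 200.

```r
# Leverage(X -> Y) = Support(X and Y) - Support(X) * Support(Y),
# rewritten as (n_xy * n - n_x * n_y) / n^2 using raw counts
leverage <- function(n_xy, n_x, n_y, n) (n_xy * n - n_x * n_y) / n^2

lev_milk_eggs   <- leverage(300, 500, 400, 1000)  # 0.1
lev_milk_bread  <- leverage(400, 500, 400, 1000)  # 0.2
lev_independent <- leverage(200, 500, 400, 1000)  # 0
print(c(lev_milk_eggs, lev_milk_bread, lev_independent))
```

Leverage is an absolute difference on the probability scale, whereas lift is a ratio, so a rule with rare items can have a high lift but a leverage close to 0.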




5.4 Applications of Association Rules


The term market basket analysis refers to a specific implementation of association rules


For better merchandising – products to include/exclude from inventory each month


Placement of products near related products


Association rules also used for


Recommender systems – Amazon, Netflix


Clickstream analysis from web usage log files


Website visitors to page X click on links A,B,C more than on links D,E,F




5.5 Example: Grocery Store Transactions – 5.5.1 The Groceries Dataset


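Install the packages from the RStudio menu (Packages -> Install -> arules, arulesViz); the menu issues the next line on the console for you, so do not type it as well


> install.packages(c("arules", "arulesViz")) # appears on console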


> library('arules')


> library('arulesViz')


> data(Groceries)


> summary(Groceries) # indicates 9835 rows


The Groceries dataset is of class transactions, which contains 3 slots


transactionInfo # data frame whose vectors have the same length as the number of transactions


itemInfo # data frame storing item labels


data # binary incidence matrix indicating which item labels appear in which transactions


> Groceries@itemInfo[1:10,] # first 10 rows of the item label table


> apply(Groceries@data[,10:20],2,function(r) paste(Groceries@itemInfo[r,"labels"],collapse=", ")) # items in transactions 10 through 20




5.5 Example: Grocery Store Transactions – 5.5.2 Frequent Itemset Generation


To illustrate the Apriori algorithm, the code below does each iteration separately.


Assume a minimum support threshold of 0.02 (0.02 * 9835 = 196.7, so an itemset must appear in at least 197 transactions); this yields 122 frequent itemsets in total


First, get itemsets of length 1


> itemsets<-apriori(Groceries,parameter=list(minlen=1,maxlen=1,support=0.02,target="frequent itemsets"))


> summary(itemsets) # found 59 itemsets


> inspect(head(sort(itemsets,by="support"),10)) # lists top 10


Second, get itemsets of length 2


> itemsets<-apriori(Groceries,parameter=list(minlen=2,maxlen=2,support=0.02,target="frequent itemsets"))


> summary(itemsets) # found 61 itemsets


> inspect(head(sort(itemsets,by="support"),10)) # lists top 10


Third, get itemsets of length 3


> itemsets<-apriori(Groceries,parameter=list(minlen=3,maxlen=3,support=0.02,target="frequent itemsets"))


> summary(itemsets) # found 2 itemsets


> inspect(head(sort(itemsets,by="support"),10)) # lists top 10
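The threshold arithmetic and the total of 122 itemsets can be checked with a few lines of base R; the counts 59, 61, and 2 are the ones reported by the summary() calls above, and the variable names are illustrative.

```r
n_transactions <- 9835   # rows reported by summary(Groceries)
min_support    <- 0.02

# An itemset is frequent when count / n_transactions >= min_support,
# i.e. when it appears in at least ceiling(0.02 * 9835) transactions
min_count <- ceiling(min_support * n_transactions)   # 197

# Frequent itemsets found at each length with support = 0.02
found <- c(length1 = 59, length2 = 61, length3 = 2)
total <- sum(found)                                   # 122
print(c(min_count = min_count, total = total))
```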




5.5 Example: Grocery Store Transactions – 5.5.3 Rule Generation and Visualization


The Apriori algorithm will now generate rules.


Set the minimum support threshold to 0.001 (a low threshold admits more rules, which fills out the scatterplot) and the minimum confidence threshold to 0.6; this generates 2,918 rules.


> rules <- apriori(Groceries,parameter=list(support=0.001,confidence=0.6,target="rules"))


> summary(rules) # finds 2918 rules


> plot(rules) # displays scatterplot


The scatterplot shows that the highest lift occurs at a low support and a low confidence.
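Conceptually, rules come from frequent itemsets: items are moved to the right-hand side, and the splits whose confidence meets the threshold are kept. The base-R sketch below does this for one itemset of a hypothetical toy database. To stay short it only builds rules with a single item on the right-hand side, and `rules_from_itemset` is an illustrative name, not an arules function.

```r
D <- list(
  c("bread", "milk"),
  c("bread", "diapers", "beer", "eggs"),
  c("milk", "diapers", "beer", "cola"),
  c("bread", "milk", "diapers", "beer"),
  c("bread", "milk", "diapers", "cola")
)
support <- function(s) mean(vapply(D, function(t) all(s %in% t), logical(1)))

# For each item y in the itemset, form the rule (itemset minus y) -> y
# and keep it if Support(itemset) / Support(LHS) >= min_conf
rules_from_itemset <- function(itemset, min_conf) {
  lhs <- character(0); rhs <- character(0); conf <- numeric(0)
  for (y in itemset) {
    x <- setdiff(itemset, y)
    cf <- support(itemset) / support(x)
    if (cf >= min_conf) {
      lhs <- c(lhs, paste(x, collapse = ", "))
      rhs <- c(rhs, y)
      conf <- c(conf, cf)
    }
  }
  data.frame(lhs = lhs, rhs = rhs, confidence = conf, stringsAsFactors = FALSE)
}

# {beer, diapers} has support 0.6: beer -> diapers has confidence 0.6 / 0.6 = 1,
# while diapers -> beer has 0.6 / 0.8 = 0.75 and is dropped at min_conf = 0.8
r <- rules_from_itemset(c("beer", "diapers"), min_conf = 0.8)
print(r)
```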


Figure: scatterplot of the 2,918 rules, support vs. confidence shaded by lift (> plot(rules))


5.5 Example: Grocery Store Transactions – 5.5.3 Rule Generation and Visualization


Get a scatterplot matrix to compare the support, confidence, and lift of the 2918 rules


> plot(rules@quality) # displays scatterplot matrix


Lift is proportional to confidence with several linear groupings.


Note that Lift = Confidence/Support(Y), so when support of Y remains the same, lift is proportional to confidence and the slope of the linear trend is the reciprocal of Support(Y).
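A quick numeric check of Lift = Confidence / Support(Y): rules that share the same right-hand side Y all have lift equal to confidence times the constant 1/Support(Y), so they fall on one line in the scatterplot matrix. The confidences below are hypothetical; Support(Y) = 0.4 is the eggs support from the earlier lift example.

```r
support_y  <- 400 / 1000                # Support(eggs) from the lift example
confidence <- c(0.5, 0.6, 0.75, 0.9)    # hypothetical rules X1 -> eggs, X2 -> eggs, ...
lift       <- confidence / support_y    # Lift = Confidence / Support(Y)

# Every rule with RHS = eggs has the same slope lift / confidence = 1 / Support(Y) = 2.5
slope <- lift / confidence
print(round(slope, 2))
```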


Figure: scatterplot matrix of the support, confidence, and lift of the 2918 rules (> plot(rules@quality))


5.5 Example: Grocery Store Transactions – 5.5.3 Rule Generation and Visualization


Compute 1/Support(Y), which is the slope (lift divided by confidence), rounded to 2 decimals


> slope<-sort(round(rules@quality$lift/rules@quality$confidence,2))


Display the number of times each slope appears in the dataset


> unlist(lapply(split(slope,f=slope),length))
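The split/lapply/length idiom above counts how many rules share each rounded slope. On a small hypothetical vector of slopes it gives the same counts as base R's table(), which may be easier to read.

```r
slope <- sort(c(2.5, 4, 2.5, 10, 4, 2.5))   # hypothetical rounded slopes

by_split <- unlist(lapply(split(slope, f = slope), length))  # idiom from the slide
by_table <- table(slope)                                      # equivalent base R call

print(by_split)
print(by_table)
```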


Display the top 10 rules sorted by lift


> inspect(head(sort(rules,by="lift"),10))


The rule {Instant food products, soda} -> {hamburger meat} has the highest lift, about 19 (page 154)




5.5 Example: Grocery Store Transactions – 5.5.3 Rule Generation and Visualization


Find the rules with confidence above 0.9


> confidentRules<-rules[quality(rules)$confidence>0.9]


> confidentRules # set of 127 rules


Plot a matrix-based visualization of the LHS vs. RHS of the rules


> plot(confidentRules,method="matrix",measure=c("lift","confidence"),control=list(reorder=TRUE))


The legend on the right is a color matrix indicating the lift and the confidence to which each square in the main matrix corresponds


Figure: matrix-based visualization of the LHS vs. RHS of the 127 high-confidence rules


5.5 Example: Grocery Store Transactions – 5.5.3 Rule Generation and Visualization


Visualize the top 5 rules with the highest lift.


> highLiftRules<-head(sort(rules,by="lift"),5)


> plot(highLiftRules,method="graph",control=list(type="items"))


In the graph, the arrow always points from an item on the LHS to an item on the RHS.


For example, the arrows that connect ham, processed cheese, and white bread suggest the rule


{ham, processed cheese} -> {white bread}


The size of each circle indicates support, and its shade represents lift


Figure: graph-based visualization of the top 5 rules by lift


5.6 Validation and Testing


Frequent itemsets and high-confidence rules are found using pre-specified minimum support and minimum confidence levels


Measures like lift and/or leverage then help ensure that interesting rules are identified rather than coincidental ones


However, some of the remaining rules may be considered subjectively uninteresting because they don’t yield unexpected profitable actions


E.g., a rule like {paper} -> {pencil} is obvious and therefore not interesting or meaningful


Incorporating subjective knowledge requires domain experts


Good rules provide valuable insights for institutions to improve their business operations




5.7 Diagnostics


Although the minimum support is pre-specified in phases 3 and 4, this level can be adjusted to target a desired range for the number of rules; variants and improvements of Apriori are also available


For large datasets, the Apriori algorithm can be computationally expensive; efficiency improvements include


Partitioning


Sampling


Transaction reduction


Hash-based itemset counting


Dynamic itemset counting
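As one example of these improvements, transaction reduction relies on the fact that a transaction containing no frequent k-itemset cannot contain any frequent (k+1)-itemset, so it can be skipped in later passes. The base-R sketch below applies that rule once; the transactions, the frequent 2-itemsets L2, and the function name `reduce_transactions` are all hypothetical.

```r
# Hypothetical transactions and a hypothetical set of frequent 2-itemsets L2
D <- list(
  c("bread", "milk"),
  c("beer", "chips"),
  c("bread", "milk", "eggs"),
  c("eggs", "cola")
)
L2 <- list(c("bread", "milk"), c("eggs", "milk"))

# Keep only transactions that contain at least one frequent k-itemset;
# the others cannot contribute to the support of any (k+1)-candidate
reduce_transactions <- function(D, Lk) {
  keep <- vapply(D, function(t) {
    any(vapply(Lk, function(s) all(s %in% t), logical(1)))
  }, logical(1))
  D[keep]
}

D3 <- reduce_transactions(D, L2)
print(length(D3))  # transactions left to scan for 3-itemsets
```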

