Apriori property

17/12/2020 Client: saad24vbs Deadline: 7 Days

Data Science and Big Data Analytics


Chapter 5: Advanced Analytical Theory and Methods: Association Rules




Chapter Sections


5.1 Overview


5.2 Apriori Algorithm


5.3 Evaluation of Candidate Rules


5.4 Applications of Association Rules


5.5 Example: Transactions in a Grocery Store


5.6 Validation and Testing


5.7 Diagnostics




5.1 Overview


Association rules method


Unsupervised learning method


Descriptive (not predictive) method


Used to find hidden relationships in data


The relationships are represented as rules


Questions association rules might answer


Which products tend to be purchased together


What products do similar customers tend to buy




Example – general logic of association rules




Rules have the form X -> Y


When X is observed, Y is also observed


Itemset


Collection of items or entities


k-itemset = {item 1, item 2,…,item k}


Examples


Items purchased in one transaction


Set of hyperlinks clicked by a user in one session




5.1 Overview – Apriori Algorithm


Apriori is one of the earliest and most fundamental algorithms for generating association rules


Given itemset L, support of L is the percent of transactions that contain L


Frequent itemset – items appear together “often enough”


Minimum support defines “often enough” (% transactions)


If an itemset is frequent, then any subset is frequent




If {B,C,D} frequent, then all subsets frequent
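The support definition and the Apriori property can be checked on a small example. The following is a minimal base-R sketch; the five transactions and the helper name `support` are hypothetical and are not taken from the Groceries data.

```r
# Toy transaction database (hypothetical items, for illustration only)
transactions <- list(
  c("A", "B", "C", "D"),
  c("B", "C", "D"),
  c("B", "C"),
  c("A", "D"),
  c("B", "C", "D", "E")
)

# Support of an itemset = fraction of transactions that contain every item in it
support <- function(itemset, db) {
  mean(vapply(db, function(tr) all(itemset %in% tr), logical(1)))
}

s_bcd <- support(c("B", "C", "D"), transactions)  # 3 of 5 transactions
s_bc  <- support(c("B", "C"), transactions)       # 4 of 5 transactions
s_b   <- support("B", transactions)               # 4 of 5 transactions

# Apriori property: a subset is always at least as frequent as its superset
print(c(BCD = s_bcd, BC = s_bc, B = s_b))
```

Because every transaction that contains {B,C,D} also contains {B,C} and {B}, the support of a subset can never be lower than the support of the full itemset.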




5.2 Apriori Algorithm: Frequent = Meets Minimum Support


Bottom-up iterative algorithm


Identify the frequent (min support) 1-itemsets


Frequent 1-itemsets are paired into 2-itemsets, and the frequent 2-itemsets are identified, etc.


Definitions used by the algorithm


D = transaction database


d = minimum support threshold


N = maximum length of itemset (optional parameter)


Ck = set of candidate k-itemsets


Lk = set of k-itemsets with minimum support
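The bottom-up search can be sketched in base R using the names defined above (D, d, N, Ck, Lk). This is an illustrative sketch under simple assumptions (transactions stored as a list of character vectors, support recomputed by a full scan), not the implementation used by the arules package; the function name `apriori_sketch` and the toy data are hypothetical.

```r
apriori_sketch <- function(D, d, N = Inf) {
  # Support = fraction of transactions in D containing all items of the itemset
  support <- function(itemset) {
    mean(vapply(D, function(tr) all(itemset %in% tr), logical(1)))
  }
  # L1: frequent 1-itemsets
  items <- sort(unique(unlist(D)))
  Lk <- Filter(function(s) support(s) >= d, as.list(items))
  frequent <- Lk
  k <- 1
  while (length(Lk) > 0 && k < N) {
    k <- k + 1
    # Ck: join pairs of frequent (k-1)-itemsets whose union has exactly k items
    Ck <- list()
    n <- length(Lk)
    if (n >= 2) {
      for (i in 1:(n - 1)) {
        for (j in (i + 1):n) {
          cand <- sort(union(Lk[[i]], Lk[[j]]))
          if (length(cand) == k) Ck[[paste(cand, collapse = ",")]] <- cand
        }
      }
    }
    # Prune: keep only candidates whose (k-1)-subsets are all frequent
    prev <- vapply(Lk, paste, character(1), collapse = ",")
    Ck <- Filter(function(cand) {
      subs <- combn(cand, k - 1, simplify = FALSE)
      all(vapply(subs, paste, character(1), collapse = ",") %in% prev)
    }, Ck)
    # Lk: candidates that meet the minimum support threshold
    Lk <- unname(Filter(function(s) support(s) >= d, Ck))
    frequent <- c(frequent, Lk)
  }
  frequent
}

# Hypothetical toy database
D <- list(
  c("A", "B", "C", "D"),
  c("B", "C", "D"),
  c("B", "C"),
  c("A", "D"),
  c("B", "C", "D", "E")
)
found <- vapply(apriori_sketch(D, d = 0.5), paste, character(1), collapse = ",")
print(found)
```

With a threshold of 0.5, items A and E are dropped in the first pass, so no candidate containing them is ever generated or counted; this is where the Apriori property saves work.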




5.3 Evaluation of Candidate Rules: Confidence


Frequent itemsets can form candidate rules


Confidence measures the certainty of a rule


Minimum confidence – predefined threshold


Problem with confidence


Given a rule X->Y, confidence considers only the antecedent (X) and the co-occurrence of X and Y


Cannot tell if a rule contains true implication
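Confidence is commonly defined as Support(X and Y) / Support(X). The base-R sketch below computes it on a hypothetical five-transaction database (the data and the helper names are assumptions for illustration) and shows the limitation noted above: the measure never looks at how common Y is on its own.

```r
# Hypothetical toy transactions
transactions <- list(
  c("milk", "eggs", "bread"),
  c("milk", "eggs"),
  c("milk", "bread"),
  c("eggs"),
  c("milk")
)

# Fraction of transactions containing every item in the itemset
support <- function(itemset, db) {
  mean(vapply(db, function(tr) all(itemset %in% tr), logical(1)))
}

# Confidence(X -> Y) = Support(X and Y) / Support(X)
confidence <- function(X, Y, db) {
  support(union(X, Y), db) / support(X, db)
}

conf_milk_eggs <- confidence("milk", "eggs", transactions)  # (2/5) / (4/5)
supp_eggs      <- support("eggs", transactions)             # 3/5

print(c(confidence = conf_milk_eggs, support_eggs = supp_eggs))
```

Here the rule milk -> eggs has confidence 0.5, yet eggs appear in 60% of all transactions, so observing milk does not make eggs more likely. Confidence alone cannot reveal this.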




5.3 Evaluation of Candidate Rules: Lift


Lift measures how much more often X and Y occur together than expected if statistically independent


Lift = 1 if X and Y are statistically independent


Lift > 1 indicates that the rule is useful; a larger lift suggests a stronger association


Example – in 1000 transactions,


If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Lift(milk->eggs) = 0.3/(0.5*0.4) = 1.5


If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Lift(milk->bread) = 0.4/(0.5*0.4) = 2.0
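The two lift values above can be reproduced directly from the transaction counts. This is a small base-R sketch; the function name `lift` and its argument names are illustrative choices.

```r
# Lift(X -> Y) = Support(X and Y) / (Support(X) * Support(Y)), computed from counts
lift <- function(n_xy, n_x, n_y, n) {
  (n_xy / n) / ((n_x / n) * (n_y / n))
}

n <- 1000
lift_milk_eggs  <- lift(n_xy = 300, n_x = 500, n_y = 400, n = n)  # 0.3 / (0.5 * 0.4)
lift_milk_bread <- lift(n_xy = 400, n_x = 500, n_y = 400, n = n)  # 0.4 / (0.5 * 0.4)

print(c(milk_eggs = lift_milk_eggs, milk_bread = lift_milk_bread))
```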




5.3 Evaluation of Candidate Rules: Leverage


Leverage measures the difference in the probability of X and Y appearing together compared to statistical independence


Leverage = 0 if X and Y are statistically independent


Leverage > 0 indicates that the rule is useful; a larger leverage suggests a stronger association


Example – in 1000 transactions,


If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Leverage(milk->eggs) = 0.3 - 0.5*0.4 = 0.1


If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Leverage(milk->bread) = 0.4 - 0.5*0.4 = 0.2
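Leverage uses the same three supports as lift but takes a difference instead of a ratio. A base-R sketch with the counts from the example (the function name `leverage` is an illustrative choice):

```r
# Leverage(X -> Y) = Support(X and Y) - Support(X) * Support(Y), computed from counts
leverage <- function(n_xy, n_x, n_y, n) {
  n_xy / n - (n_x / n) * (n_y / n)
}

lev_milk_eggs  <- leverage(n_xy = 300, n_x = 500, n_y = 400, n = 1000)  # 0.3 - 0.2
lev_milk_bread <- leverage(n_xy = 400, n_x = 500, n_y = 400, n = 1000)  # 0.4 - 0.2

print(c(milk_eggs = lev_milk_eggs, milk_bread = lev_milk_bread))
```

Both measures rank milk -> bread above milk -> eggs in this example.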




5.4 Applications of Association Rules


The term market basket analysis refers to a specific implementation of association rules


For better merchandising – products to include/exclude from inventory each month


Placement of products near related products


Association rules also used for


Recommender systems – Amazon, Netflix


Clickstream analysis from web usage log files


Website visitors to page X click on links A,B,C more than on links D,E,F




5.5 Example: Grocery Store Transactions


5.5.1 The Groceries Dataset


Packages -> Install -> arules, arulesViz # don’t enter next line


> install.packages(c("arules", "arulesViz")) # appears on console


> library('arules')


> library('arulesViz')


> data(Groceries)


> summary(Groceries) # indicates 9835 rows


Class of dataset Groceries is transactions, containing 3 slots


transactionInfo # data frame with vectors having length of transactions


itemInfo # data frame storing item labels


data # binary incidence matrix showing which item labels appear in each transaction


> Groceries@itemInfo[1:10,]


> apply(Groceries@data[,10:20],2,function(r) paste(Groceries@itemInfo[r,"labels"],collapse=", "))




5.5.2 Frequent Itemset Generation


To illustrate the Apriori algorithm, the code below does each iteration separately.


Assume minimum support threshold = 0.02 (0.02 * 9835 ≈ 197 transactions); this yields 122 frequent itemsets in total (59 + 61 + 2)


First, get itemsets of length 1


> itemsets<-apriori(Groceries,parameter=list(minlen=1,maxlen=1,support=0.02,target="frequent itemsets"))


> summary(itemsets) # found 59 itemsets


> inspect(head(sort(itemsets,by="support"),10)) # lists top 10


Second, get itemsets of length 2


> itemsets<-apriori(Groceries,parameter=list(minlen=2,maxlen=2,support=0.02,target="frequent itemsets"))


> summary(itemsets) # found 61 itemsets


> inspect(head(sort(itemsets,by="support"),10)) # lists top 10


Third, get itemsets of length 3


> itemsets<-apriori(Groceries,parameter=list(minlen=3,maxlen=3,support=0.02,target="frequent itemsets"))


> summary(itemsets) # found 2 itemsets


> inspect(head(sort(itemsets,by="support"),10)) # lists top 10






5.5.3 Rule Generation and Visualization


The Apriori algorithm will now generate rules.


Set minimum support threshold to 0.001 (allows more rules, presumably for the scatterplot) and minimum confidence threshold to 0.6 to generate 2,918 rules.


> rules <- apriori(Groceries,parameter=list(support=0.001,confidence=0.6,target="rules"))


> summary(rules) # finds 2918 rules


> plot(rules) # displays scatterplot


The scatterplot shows that the highest lift occurs at a low support and a low confidence.


[Figure: scatterplot of the 2,918 rules drawn by plot(rules)]


Get scatterplot matrix to compare the support, confidence, and lift of the 2918 rules


> plot(rules@quality) # displays scatterplot matrix


Lift is proportional to confidence with several linear groupings.


Note that Lift = Confidence/Support(Y), so when support of Y remains the same, lift is proportional to confidence and the slope of the linear trend is the reciprocal of Support(Y).
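The identity follows from the definitions: Lift = Support(X and Y) / (Support(X) * Support(Y)) and Confidence = Support(X and Y) / Support(X), so dividing confidence by Support(Y) gives lift. A quick numeric check in base R, reusing the milk and eggs supports from section 5.3 (the confidence grid is a hypothetical set of values chosen only to show the slope):

```r
s_x  <- 0.5   # Support(X), e.g. milk
s_y  <- 0.4   # Support(Y), e.g. eggs
s_xy <- 0.3   # Support(X and Y)

conf           <- s_xy / s_x            # confidence of X -> Y
lift_def       <- s_xy / (s_x * s_y)    # lift from its definition
lift_from_conf <- conf / s_y            # lift via Confidence / Support(Y)

# For a fixed Support(Y), lift is linear in confidence with slope 1 / Support(Y)
conf_grid <- c(0.6, 0.7, 0.8, 0.9)
lift_grid <- conf_grid / s_y
slopes    <- diff(lift_grid) / diff(conf_grid)

print(c(lift_def = lift_def, lift_from_conf = lift_from_conf, slope = slopes[1]))
```

Rules that share the same right-hand side therefore fall on one line in the lift versus confidence panel, which explains the several linear groupings.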


[Figure: scatterplot matrix of support, confidence, and lift drawn by plot(rules@quality)]


Compute 1/Support(Y), which is the slope


> slope<-sort(round(rules@quality$lift/rules@quality$confidence,2))


Display the number of times each slope appears among the rules


> unlist(lapply(split(slope,f=slope),length))
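The split/lapply/length idiom counts how often each distinct value occurs; base R's table() gives the same counts more directly. A small sketch on a hypothetical slope vector (these numbers are made up for illustration and are not output from the Groceries rules):

```r
# Hypothetical slope values standing in for round(lift / confidence, 2)
slope <- sort(round(c(3.91, 5.17, 3.91, 9.17, 5.17, 3.91), 2))

# Idiom used above: split into groups of equal values, then count each group
counts_split <- unlist(lapply(split(slope, f = slope), length))

# Equivalent one-step tabulation
counts_table <- table(slope)

print(counts_split)
print(counts_table)
```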


Display the top 10 rules sorted by lift


> inspect(head(sort(rules,by="lift"),10))


Rule {Instant food products, soda} -> {hamburger meat} has the highest lift of about 19 (page 154)




Find the rules with confidence above 0.9


> confidentRules<-rules[quality(rules)$confidence>0.9]


> confidentRules # set of 127 rules


Plot a matrix-based visualization of the LHS vs. RHS of the rules


> plot(confidentRules,method="matrix",measure=c("lift","confidence"),control=list(reorder=TRUE))


The legend on the right is a color matrix indicating the lift and the confidence to which each square in the main matrix corresponds


[Figure: matrix-based visualization of the 127 rules with confidence above 0.9]


Visualize the top 5 rules with the highest lift.


> highLiftRules<-head(sort(rules,by="lift"),5)


> plot(highLiftRules,method="graph",control=list(type="items"))


In the graph, the arrow always points from an item on the LHS to an item on the RHS.


For example, the arrows that connect ham, processed cheese, and white bread suggest the rule


{ham, processed cheese} -> {white bread}


Size of circle indicates support and shade represents lift


[Figure: graph visualization of the five rules with the highest lift]


5.6 Validation and Testing


Frequent itemsets and high-confidence rules are found using pre-specified minimum support and minimum confidence levels


Measures like lift and/or leverage then ensure that interesting rules are identified rather than coincidental ones


However, some of the remaining rules may be considered subjectively uninteresting because they don’t yield unexpected profitable actions


E.g., rules like {paper} -> {pencil} are not interesting/meaningful


Incorporating subjective knowledge requires domain experts


Good rules provide valuable insights for institutions to improve their business operations




5.7 Diagnostics


Although minimum support is pre-specified in phases 3 and 4 of the Data Analytics Lifecycle, this level can be adjusted to target a desired range for the number of rules; variants and improvements of Apriori are available


For large datasets the Apriori algorithm can be computationally expensive; approaches to improve efficiency include


Partitioning


Sampling


Transaction reduction


Hash-based itemset counting


Dynamic itemset counting

