Loading...

Messages

Proposals

Stuck in your homework and missing deadline? Get urgent help in $10/Page with 24 hours deadline

Get Urgent Writing Help In Your Essays, Assignments, Homeworks, Dissertation, Thesis Or Coursework & Achieve A+ Grades.

Privacy Guaranteed - 100% Plagiarism Free Writing - Free Turnitin Report - Professional And Experienced Writers - 24/7 Online Support

Rule generation in apriori algorithm example

27/10/2021 Client: muhammad11 Deadline: 2 Day

Apriori Property,Interesting Ruse Of A Categorical Variable ,Rules Distinguished From Coincidental Rules,

1. What is the Apriori property?

2. Following is a list of five transactions that include items A, B, C, and D:

• Tl: {A, B, C}

• T2: {A, B}

• T3: {B}

• T4: {A, C}

• TS: {A, C, D}

Which itemsets satisfy the minimum support of 0.5?

(Hint: An item set may include more than one item.)

3. How are interesting rules distinguished from coincidental rules?

4. A local retailer has a database that stores 10,000 transactions of last summer. After analyzing the data, a data science team has identified the following statistics:

• {battery} appears in 4,000 transactions.

• {sunscreen} appears in 3,000 transactions.

• {sandals} appears in 4,000 transactions.

• {bowls} appears in 1,000 transactions.

• {battery, sunscreen} appears in 1,500 transactions.

• {battery, sandals} appears in 1,000 transactions.

• {battery, bowls} appears in 1250 transactions.

• {battery, sunscreen, sandals} appears in 600 transactions.

Answer the following questions:

a. What are the support values of the preceding itemsets?

b. Assuming the minimum support is 0.05, which itemsets are considered frequent?

c. What are the confidence values of {battery}->{ sunscreen} and {battery, sunscreen}->{ sandals} ?

d. Which of the two rules is more interesting?

5. In the use of a categorical variable with n possible values, explain the following:

a. Why only n - 1 binary variables are necessary

b. Why using n variables would be problematic

6. If the probability of an event occurring is 0.4, then

a. What is the odds ratio?

b. What is the log odds ratio?

Data Science and Big Data Analytics

Chapter 5: Advanced Analytical Theory and Methods: Association Rules

1

Chapter Sections

5.1 Overview

5.2 Apriori Algorithm

5.3 Evaluation of Candidate Rules

5.4 Example: Transactions in a Grocery Store

5.5 Validation and Testing

5.6 Diagnostics

2

5.1 Overview

Association rules method

Unsupervised learning method

Descriptive (not predictive) method

Used to find hidden relationships in data

The relationships are represented as rules

Questions association rules might answer

Which products tend to be purchased together

What products do similar customers tend to buy

3

5.1 Overview

Example – general logic of association rules

4

5.1 Overview

Rules have the form X -> Y

When X is observed, Y is also observed

Itemset

Collection of items or entities

k-itemset = {item 1, item 2,…,item k}

Examples

Items purchased in one transaction

Set of hyperlinks clicked by a user in one session

5

5.1 Overview – Apriori Algorithm

Apriori is the most fundamental algorithm

Given itemset L, support of L is the percent of transactions that contain L

Frequent itemset – items appear together “often enough”

Minimum support defines “often enough” (% transactions)

If an itemset is frequent, then any subset is frequent

6

5.1 Overview – Apriori Algorithm

If {B,C,D} frequent, then all subsets frequent

7

5.2 Apriori Algorithm Frequent = minimum support

Bottom-up iterative algorithm

Identify the frequent (min support) 1-itemsets

Frequent 1-itemsets are paired into 2-itemsets, and the frequent 2-itemsets are identified, etc.

Definitions for next slide

D = transaction database

d = minimum support threshold

N = maximum length of itemset (optional parameter)

Ck = set of candidate k-itemsets

Lk = set of k-itemsets with minimum support

8

5.2 Apriori Algorithm

9

5.3 Evaluation of Candidate Rules Confidence

Frequent itemsets can form candidate rules

Confidence measures the certainty of a rule

Minimum confidence – predefined threshold

Problem with confidence

Given a rule X->Y, confidence considers only the antecedent (X) and the co-occurrence of X and Y

Cannot tell if a rule contains true implication

10

5.3 Evaluation of Candidate Rules Lift

Lift measures how much more often X and Y occur together than expected if statistically independent

Lift = 1 if X and Y are statistically independent

Lift > 1 indicates the degree of usefulness of the rule

Example – in 1000 transactions,

If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Lift(milk->eggs) = 0.3/(0.5*0.4) = 1.5

If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Lift(milk->bread) = 0.4/(0.5*0.4) = 2.0

11

5.3 Evaluation of Candidate Rules Leverage

Leverage measures the difference in the probability of X and Y appearing together compared to statistical independence

Leverage = 0 if X and Y are statistically independent

Leverage > 0 indicates degree of usefulness of rule

Example – in 1000 transactions,

If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Leverage(milk->eggs) = 0.3 - 0.5*0.4 = 0.1

If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Leverage (milk->bread) = 0.4 - 0.5*0.4 = 0.2

12

5.4 Applications of Association Rules

The term market basket analysis refers to a specific implementation of association rules

For better merchandising – products to include/exclude from inventory each month

Placement of products within related products

Association rules also used for

Recommender systems – Amazon, Netflix

Clickstream analysis from web usage log files

Website visitors to page X click on links A,B,C more than on links D,E,F

13

5.5 Example: Grocery Store Transactions 5.5.1 The Groceries Dataset

Packages -> Install -> arules, arulesViz # don’t enter next line

> install.packages(c("arules", "arulesViz")) # appears on console

> library('arules')

> library('arulesViz')

> data(Groceries)

> summary(Groceries) # indicates 9835 rows

Class of dataset Groceries is transactions, containing 3 slots

transactionInfo # data frame with vectors having length of transactions

itemInfo # data frame storing item labels

data # binary evidence matrix of labels in transactions

> Groceries@itemInfo[1:10,]

> apply(Groceries@data[,10:20],2,function(r) paste(Groceries@itemInfo[r,"labels"],collapse=", "))

14

5.5 Example: Grocery Store Transactions 5.5.2 Frequent Itemset Generation

To illustrate the Apriori algorithm, the code below does each iteration separately.

Assume minimum support threshold = 0.02 (0.02 * 9853 = 198 items), get 122 itemsets total

First, get itemsets of length 1

> itemsets<-apriori(Groceries,parameter=list(minlen=1,maxlen=1,support=0.02,target="frequent itemsets"))

> summary(itemsets) # found 59 itemsets

> inspect(head(sort(itemsets,by="support"),10)) # lists top 10

Second, get itemsets of length 2

> itemsets<-apriori(Groceries,parameter=list(minlen=2,maxlen=2,support=0.02,target="frequent itemsets"))

> summary(itemsets) # found 61 itemsets

> inspect(head(sort(itemsets,by="support"),10)) # lists top 10

Third, get itemsets of length 3

> itemsets<-apriori(Groceries,parameter=list(minlen=3,maxlen=3,support=0.02,target="frequent itemsets"))

> summary(itemsets) # found 2 itemsets

> inspect(head(sort(itemsets,by="support"),10)) # lists top 10

> summary(itemsets) # found 59 itemsets> inspect(head(sort(itemsets,by="support"),10)) # lists top 10 supported items

15

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

The Apriori algorithm will now generate rules.

Set minimum support threshold to 0.001 (allows more rules, presumably for the scatterplot) and minimum confidence threshold to 0.6 to generate 2,918 rules.

> rules <- apriori(Groceries,parameter=list(support=0.001,confidence=0.6,target="rules"))

> summary(rules) # finds 2918 rules

> plot(rules) # displays scatterplot

The scatterplot shows that the highest lift occurs at a low support and a low confidence.

16

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

> plot(rules)

17

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

Get scatterplot matrix to compare the support, confidence, and lift of the 2918 rules

> plot(rules@quality) # displays scatterplot matrix

Lift is proportional to confidence with several linear groupings.

Note that Lift = Confidence/Support(Y), so when support of Y remains the same, lift is proportional to confidence and the slope of the linear trend is the reciprocal of Support(Y).

18

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

> plot(rules)

19

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

Compute the 1/Support(Y) which is the slope

> slope<-sort(round(rules@quality$lift/rules@quality$confidence,2))

Display the number of times each slope appears in dataset

> unlist(lapply(split(slope,f=slope),length))

Display the top 10 rules sorted by lift

> inspect(head(sort(rules,by="lift"),10))

Rule {Instant food products, soda} -> {hamburger meat}

has the highest lift of 19 (page 154)

20

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

Find the rules with confidence above 0.9

> confidentRules<-rules[quality(rules)$confidence>0.9]

> confidentRules # set of 127 rules

Plot a matrix-based visualization of the LHS v RHS of rules

> plot(confidentRules,method="matrix",measure=c("lift","confidence"),control=list(reorder=TRUE))

The legend on the right is a color matrix indicating the lift and the confidence to which each square in the main matrix corresponds

21

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

> plot(rules)

22

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

Visualize the top 5 rules with the highest lift.

> highLiftRules<-head(sort(rules,by="lift"),5)

> plot(highLiftRules,method="graph",control=list(type="items"))

In the graph, the arrow always points from an item on the LHS to an item on the RHS.

For example, the arrows that connects ham, processed cheese, and white bread suggest the rule

{ham, processed cheese} -> {white bread}

Size of circle indicates support and shade represents lift

23

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

24

5.6 Validation and Testing

The frequent and high confidence itemsets are found by pre-specified minimum support and minimum confidence levels

Measures like lift and/or leverage then ensure that interesting rules are identified rather than coincidental ones

However, some of the remaining rules may be considered subjectively uninteresting because they don’t yield unexpected profitable actions

E.g., rules like {paper} -> {pencil} are not interesting/meaningful

Incorporating subjective knowledge requires domain experts

Good rules provide valuable insights for institutions to improve their business operations

25

5.7 Diagnostics

Although minimum support is pre-specified in phases 3&4, this level can be adjusted to target the range of the number of rules – variants/improvements of Apriori are available

For large datasets the Apriori algorithm can be computationally expensive – efficiency improvements

Partitioning

Sampling

Transaction reduction

Hash-based itemset counting

Dynamic itemset counting

26

Homework is Completed By:

Writer Writer Name Amount Client Comments & Rating
Instant Homework Helper

ONLINE

Instant Homework Helper

$36

She helped me in last minute in a very reasonable price. She is a lifesaver, I got A+ grade in my homework, I will surely hire her again for my next assignments, Thumbs Up!

Order & Get This Solution Within 3 Hours in $25/Page

Custom Original Solution And Get A+ Grades

  • 100% Plagiarism Free
  • Proper APA/MLA/Harvard Referencing
  • Delivery in 3 Hours After Placing Order
  • Free Turnitin Report
  • Unlimited Revisions
  • Privacy Guaranteed

Order & Get This Solution Within 6 Hours in $20/Page

Custom Original Solution And Get A+ Grades

  • 100% Plagiarism Free
  • Proper APA/MLA/Harvard Referencing
  • Delivery in 6 Hours After Placing Order
  • Free Turnitin Report
  • Unlimited Revisions
  • Privacy Guaranteed

Order & Get This Solution Within 12 Hours in $15/Page

Custom Original Solution And Get A+ Grades

  • 100% Plagiarism Free
  • Proper APA/MLA/Harvard Referencing
  • Delivery in 12 Hours After Placing Order
  • Free Turnitin Report
  • Unlimited Revisions
  • Privacy Guaranteed

6 writers have sent their proposals to do this homework:

Assignment Hub
A+GRADE HELPER
Fatimah Syeda
Top Grade Tutor
High Quality Assignments
Pro Writer
Writer Writer Name Offer Chat
Assignment Hub

ONLINE

Assignment Hub

I have read your project details and I can provide you QUALITY WORK within your given timeline and budget.

$49 Chat With Writer
A+GRADE HELPER

ONLINE

A+GRADE HELPER

I am a PhD writer with 10 years of experience. I will be delivering high-quality, plagiarism-free work to you in the minimum amount of time. Waiting for your message.

$47 Chat With Writer
Fatimah Syeda

ONLINE

Fatimah Syeda

Being a Ph.D. in the Business field, I have been doing academic writing for the past 7 years and have a good command over writing research papers, essay, dissertations and all kinds of academic writing and proofreading.

$44 Chat With Writer
Top Grade Tutor

ONLINE

Top Grade Tutor

I will be delighted to work on your project. As an experienced writer, I can provide you top quality, well researched, concise and error-free work within your provided deadline at very reasonable prices.

$25 Chat With Writer
High Quality Assignments

ONLINE

High Quality Assignments

I am an experienced researcher here with master education. After reading your posting, I feel, you need an expert research writer to complete your project.Thank You

$38 Chat With Writer
Pro Writer

ONLINE

Pro Writer

I will provide you with the well organized and well research papers from different primary and secondary sources will write the content that will support your points.

$41 Chat With Writer

Let our expert academic writers to help you in achieving a+ grades in your homework, assignment, quiz or exam.

Similar Homework Questions

Media ethics issues and cases 9th edition - Healthcare Ethics - When i was a young warthog song - Prefatory parts of a report - Atrium asset management software - Research paper - PSC1515 Miami Florida is considered ground zero for climate change, in particular rising seas will not only drown coastal sections of the city but will disrupt our local supply of drinking water. - Full block style application letter - Section 47 enquiry flowchart - Practical applications in sports nutrition 5th edition pdf - Jensen jarrah furniture australia - Vocational development - Migi-hidari artist of the floating world - What is the symbol for giga - Schizophrenia safety plan - Internal factor evaluation matrix template - 2620 kj in calories - Impact of television on students essay - Actron air reset controller - DISCUSSION POST 5 - 11th grade chemistry worksheets - Greek numerical prefixes in chemistry - Periodic table color coded metals nonmetals metalloids - Christianity presentation - Telecommunications and Networking Essay - Vlsm exercises with answers - What is difference threshold in psychology - How business performance can be evaluated cipd - American red cross financial report - Penner medical products case study solution - Www statdisk org - PSY4500/Week 4 - Assignment: Analyze Experiments - Self care eating healthy and maintaining a healthy weight ati - Medical referral letter template - Juran's quality planning roadmap - Westwood publishing case study answer - Saint jerome in his study van eyck - New mathematics could neutralize pathogens that resist antibiotics sat answers - The heat of solution of nh4cl - Bjt internal capacitance and high frequency model - Three classes of levers - The girl who can by ama ata aidoo answers - Don't have employment separation certificate - Deliverable 4 - Electricity, Magnetism, and Light Compare/Contrast Paper - Fair day's pay for a fair day's work taylor - Brave new world exposition - Prizm is a segmentation scheme - Tintern abbey poem shmoop - Unintended acceleration toyota's recall crisis - Logger pro mac free download - Is 0 odd or even - Costco membership application form - Cubic foot method real estate - Penn foster writing skills part 2 answers - What is a numeric pattern - Polypipe thermostat installation instructions - EDU651 Assignment 1 week 5 social networking a letter of request - Writing Assignment - Which bread molds the fastest conclusion - Cinematography - Nabertherm more than heat 30 3000 c manual - Blim vs netflix 2018 - Http eo ucar edu webweather hurricane2 html - Starbucks barista espresso machine manual pdf - Gaps in nursing practice - Facilities policies and procedures - Freeman tilden interpreting our heritage - Hp z230 sff workstation specs - What is matlab coder - Klaus voormann bass gear - Is weetabix good for weight loss - Nurs495 prompt w6 - What are the disadvantages of buying a franchise comparing to setting up a new business ? - Assignment(Digital forensics) - Volts per hertz generator protection - Need tomorrow - Examples of random error - African music and dance unimelb - He's lying to you sis - 145 mm to inches - Institute of certified public accountants of singapore - Assignment on marketing plan for a new product ppt - Experiment 1: effects of ph on radish seed germination - Ucf vpn cisco - Uc riverside average gpa - Read the article and answer the questions - Instantaneous rate of reaction chemistry - The entire busyteacher library pdf free download - A mis padres les gusta ir a yucatán porque - Nur495journal w1 - Flannery o connor revelation text - Name and describe the major capabilities of database management systems - The role of capitalism - Social gains in crisis communications - Muhammad ali vietnam war primary sources - Stonewall kitchen balsamic fig dressing - Cavendish estate agents mold - Bio rad ih 1000 - Dollars to mexican pesos 2012 - Discussion Board Questions