Rule generation in apriori algorithm example

27/10/2021 Client: muhammad11 Deadline: 2 Day

Apriori Property, Use of a Categorical Variable, Interesting Rules Distinguished from Coincidental Rules

1. What is the Apriori property?

2. Following is a list of five transactions that include items A, B, C, and D:

• T1: {A, B, C}

• T2: {A, B}

• T3: {B}

• T4: {A, C}

• T5: {A, C, D}

Which itemsets satisfy the minimum support of 0.5?

(Hint: An item set may include more than one item.)

3. How are interesting rules distinguished from coincidental rules?

4. A local retailer has a database that stores 10,000 transactions of last summer. After analyzing the data, a data science team has identified the following statistics:

• {battery} appears in 4,000 transactions.

• {sunscreen} appears in 3,000 transactions.

• {sandals} appears in 4,000 transactions.

• {bowls} appears in 1,000 transactions.

• {battery, sunscreen} appears in 1,500 transactions.

• {battery, sandals} appears in 1,000 transactions.

• {battery, bowls} appears in 1250 transactions.

• {battery, sunscreen, sandals} appears in 600 transactions.

Answer the following questions:

a. What are the support values of the preceding itemsets?

b. Assuming the minimum support is 0.05, which itemsets are considered frequent?

c. What are the confidence values of {battery} -> {sunscreen} and {battery, sunscreen} -> {sandals}?

d. Which of the two rules is more interesting?

5. In the use of a categorical variable with n possible values, explain the following:

a. Why only n - 1 binary variables are necessary

b. Why using n variables would be problematic

6. If the probability of an event occurring is 0.4, then

a. What is the odds ratio?

b. What is the log odds ratio?

Data Science and Big Data Analytics

Chapter 5: Advanced Analytical Theory and Methods: Association Rules

Chapter Sections

5.1 Overview

5.2 Apriori Algorithm

5.3 Evaluation of Candidate Rules

5.4 Applications of Association Rules

5.5 Example: Grocery Store Transactions

5.6 Validation and Testing

5.7 Diagnostics

5.1 Overview

Association rules method

Unsupervised learning method

Descriptive (not predictive) method

Used to find hidden relationships in data

The relationships are represented as rules

Questions association rules might answer

Which products tend to be purchased together

What products do similar customers tend to buy

Example – general logic of association rules

Rules have the form X -> Y

When X is observed, Y is also observed

Itemset

Collection of items or entities

k-itemset = {item 1, item 2,…,item k}

Examples

Items purchased in one transaction

Set of hyperlinks clicked by a user in one session

5.1 Overview – Apriori Algorithm

Apriori is one of the earliest and most fundamental algorithms for generating association rules

Given itemset L, support of L is the percent of transactions that contain L

Frequent itemset – items appear together “often enough”

Minimum support defines “often enough” (% transactions)

Apriori property: if an itemset is frequent, then every subset of it is also frequent

If {B,C,D} frequent, then all subsets frequent
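
The support definition and the subset property can be checked by hand on a small dataset. The sketch below uses the five transactions listed in question 2 of the assignment; the helper name `support` is hypothetical and is not an arules function.

```r
# The five transactions from question 2, stored as character vectors.
transactions <- list(
  T1 = c("A", "B", "C"),
  T2 = c("A", "B"),
  T3 = c("B"),
  T4 = c("A", "C"),
  T5 = c("A", "C", "D")
)

# Support of an itemset = fraction of transactions that contain every item in it.
# `support` is a hypothetical helper written for this illustration.
support <- function(itemset, transactions) {
  contains <- vapply(transactions, function(tx) all(itemset %in% tx), logical(1))
  mean(contains)
}

s_ac <- support(c("A", "C"), transactions)  # {A, C} occurs in T1, T4, T5
s_a  <- support("A", transactions)
s_c  <- support("C", transactions)

# Subset property: each subset is at least as frequent as the itemset itself.
stopifnot(s_a >= s_ac, s_c >= s_ac)
print(c(A = s_a, C = s_c, AC = s_ac))
```

Because every transaction that contains {A, C} also contains {A} and {C}, the support of a subset can never be lower than the support of the full itemset.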

5.2 Apriori Algorithm

Frequent = satisfies minimum support

Bottom-up iterative algorithm

Identify the frequent (min support) 1-itemsets

Frequent 1-itemsets are paired into 2-itemsets, and the frequent 2-itemsets are identified, etc.

Definitions used by the algorithm

D = transaction database

d = minimum support threshold

N = maximum length of itemset (optional parameter)

Ck = set of candidate k-itemsets

Lk = set of k-itemsets with minimum support
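
The bottom-up iteration can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the arules implementation: the function name `apriori_sketch` and the toy transactions are hypothetical, candidates are built by joining pairs of frequent k-itemsets, and the usual subset-pruning step is skipped because every candidate is checked against the minimum support anyway.

```r
# Level-wise frequent itemset search in base R.
# D = transaction database, d = minimum support, N = optional maximum length.
# `apriori_sketch` is a hypothetical name chosen for this illustration.
apriori_sketch <- function(D, d, N = Inf) {
  support <- function(s) mean(vapply(D, function(tx) all(s %in% tx), logical(1)))
  keep_frequent <- function(C) Filter(function(s) support(s) >= d, C)

  # L1: frequent 1-itemsets
  Lk <- keep_frequent(as.list(sort(unique(unlist(D)))))
  frequent <- Lk
  k <- 1
  while (length(Lk) > 1 && k < N) {
    # Candidate (k+1)-itemsets: unions of two frequent k-itemsets with k+1 items
    Ck <- list()
    for (i in seq_len(length(Lk) - 1)) {
      for (j in (i + 1):length(Lk)) {
        cand <- sort(union(Lk[[i]], Lk[[j]]))
        if (length(cand) == k + 1) Ck[[paste(cand, collapse = ",")]] <- cand
      }
    }
    # Next level: candidates that meet the minimum support
    Lk <- keep_frequent(unname(Ck))
    frequent <- c(frequent, Lk)
    k <- k + 1
  }
  frequent
}

# Hypothetical toy database of five transactions.
D <- list(
  c("milk", "eggs", "bread"),
  c("milk", "eggs"),
  c("milk", "bread"),
  c("eggs"),
  c("milk", "eggs", "bread")
)
frequent <- apriori_sketch(D, d = 0.5)
print(vapply(frequent, paste, character(1), collapse = ","))
```

With a minimum support of 0.5, the search stops at level 3 because the only 3-itemset candidate appears in just two of the five transactions.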

5.3 Evaluation of Candidate Rules: Confidence

Frequent itemsets can form candidate rules

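Confidence measures the certainty of a rule: Confidence(X->Y) = Support(X and Y) / Support(X), the fraction of transactions containing X that also contain Y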

Minimum confidence – predefined threshold

Problem with confidence

Given a rule X->Y, confidence considers only the antecedent (X) and the co-occurrence of X and Y; it ignores how common the consequent (Y) is on its own

Confidence alone cannot tell whether a rule reflects a true implication or a coincidence
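
As a small worked example, confidence can be computed directly from transaction counts. The sketch below reuses the milk and eggs counts from the lift example in this chapter (in 1000 transactions, {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400); the helper name `confidence` is hypothetical.

```r
# Counts taken from the milk/eggs example (out of 1000 transactions).
count_milk      <- 500
count_eggs      <- 400
count_milk_eggs <- 300

# Confidence(X -> Y) = count(X and Y) / count(X).
# `confidence` is a hypothetical helper written for this illustration.
confidence <- function(count_xy, count_x) count_xy / count_x

conf_milk_eggs <- confidence(count_milk_eggs, count_milk)  # milk -> eggs
conf_eggs_milk <- confidence(count_milk_eggs, count_eggs)  # eggs -> milk
print(c(milk_to_eggs = conf_milk_eggs, eggs_to_milk = conf_eggs_milk))
```

The two directions give different values because each divides by a different antecedent count, and neither calculation uses the support of the consequent.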

5.3 Evaluation of Candidate Rules: Lift

Lift measures how much more often X and Y occur together than expected if statistically independent

Lift = 1 if X and Y are statistically independent

Lift > 1 indicates the degree of usefulness of the rule

Example – in 1000 transactions,

If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Lift(milk->eggs) = 0.3/(0.5*0.4) = 1.5

If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Lift(milk->bread) = 0.4/(0.5*0.4) = 2.0
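
The two lift values above can be reproduced with a few lines of base R. The helper name `lift` is hypothetical; the counts are the ones given in the example.

```r
# Lift(X -> Y) = Support(X and Y) / (Support(X) * Support(Y)).
# `lift` is a hypothetical helper; n is the total number of transactions.
lift <- function(n_xy, n_x, n_y, n) {
  (n_xy / n) / ((n_x / n) * (n_y / n))
}

lift_milk_eggs  <- lift(n_xy = 300, n_x = 500, n_y = 400, n = 1000)
lift_milk_bread <- lift(n_xy = 400, n_x = 500, n_y = 400, n = 1000)
print(c(milk_eggs = lift_milk_eggs, milk_bread = lift_milk_bread))
```

Both values are above 1, so in each case the pair occurs together more often than independence would predict, with milk and bread showing the stronger association.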

5.3 Evaluation of Candidate Rules: Leverage

Leverage measures the difference in the probability of X and Y appearing together compared to statistical independence

Leverage = 0 if X and Y are statistically independent

Leverage > 0 indicates degree of usefulness of rule

Example – in 1000 transactions,

If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Leverage(milk->eggs) = 0.3 - 0.5*0.4 = 0.1

If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Leverage (milk->bread) = 0.4 - 0.5*0.4 = 0.2
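
The leverage values above follow the same pattern as lift but use a difference instead of a ratio. A minimal base R check, with the hypothetical helper name `leverage`:

```r
# Leverage(X -> Y) = Support(X and Y) - Support(X) * Support(Y).
# `leverage` is a hypothetical helper; n is the total number of transactions.
leverage <- function(n_xy, n_x, n_y, n) {
  (n_xy / n) - (n_x / n) * (n_y / n)
}

lev_milk_eggs  <- leverage(n_xy = 300, n_x = 500, n_y = 400, n = 1000)
lev_milk_bread <- leverage(n_xy = 400, n_x = 500, n_y = 400, n = 1000)
print(c(milk_eggs = lev_milk_eggs, milk_bread = lev_milk_bread))
```

A pair that co-occurs exactly as often as independence predicts (for example 200 joint occurrences with these marginal counts) would give a leverage of 0.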

5.4 Applications of Association Rules

The term market basket analysis refers to a specific implementation of association rules

For better merchandising – products to include/exclude from inventory each month

Placement of products near related products

Association rules also used for

Recommender systems – Amazon, Netflix

Clickstream analysis from web usage log files

Website visitors to page X click on links A,B,C more than on links D,E,F

5.5 Example: Grocery Store Transactions

5.5.1 The Groceries Dataset

Packages -> Install -> arules, arulesViz # don’t enter next line

> install.packages(c("arules", "arulesViz")) # appears on console

> library('arules')

> library('arulesViz')

> data(Groceries)

> summary(Groceries) # indicates 9835 rows

Class of dataset Groceries is transactions, containing 3 slots

transactionInfo # data frame with vectors having length of transactions

itemInfo # data frame storing item labels

data # binary incidence matrix showing which item labels appear in each transaction

> Groceries@itemInfo[1:10,]

> apply(Groceries@data[,10:20],2,function(r) paste(Groceries@itemInfo[r,"labels"],collapse=", "))

5.5.2 Frequent Itemset Generation

To illustrate the Apriori algorithm, the code below does each iteration separately.

Assume minimum support threshold = 0.02 (0.02 * 9835 = 196.7, so an itemset must appear in at least 197 transactions), get 122 itemsets total

First, get itemsets of length 1

> itemsets<-apriori(Groceries,parameter=list(minlen=1,maxlen=1,support=0.02,target="frequent itemsets"))

> summary(itemsets) # found 59 itemsets

> inspect(head(sort(itemsets,by="support"),10)) # lists top 10

Second, get itemsets of length 2

> itemsets<-apriori(Groceries,parameter=list(minlen=2,maxlen=2,support=0.02,target="frequent itemsets"))

> summary(itemsets) # found 61 itemsets

> inspect(head(sort(itemsets,by="support"),10)) # lists top 10

Third, get itemsets of length 3

> itemsets<-apriori(Groceries,parameter=list(minlen=3,maxlen=3,support=0.02,target="frequent itemsets"))

> summary(itemsets) # found 2 itemsets

> inspect(head(sort(itemsets,by="support"),10)) # lists top 10

5.5.3 Rule Generation and Visualization

The Apriori algorithm will now generate rules.

Set minimum support threshold to 0.001 (allows more rules, presumably for the scatterplot) and minimum confidence threshold to 0.6 to generate 2,918 rules.

> rules <- apriori(Groceries,parameter=list(support=0.001,confidence=0.6,target="rules"))

> summary(rules) # finds 2918 rules

> plot(rules) # displays scatterplot

The scatterplot shows that the highest lift occurs at a low support and a low confidence.

Get scatterplot matrix to compare the support, confidence, and lift of the 2918 rules

> plot(rules@quality) # displays scatterplot matrix

Lift is proportional to confidence with several linear groupings.

Note that Lift = Confidence/Support(Y), so when support of Y remains the same, lift is proportional to confidence and the slope of the linear trend is the reciprocal of Support(Y).

Compute 1/Support(Y), which is the slope

> slope<-sort(round(rules@quality$lift/rules@quality$confidence,2))

Display the number of times each slope appears in the rule set

> unlist(lapply(split(slope,f=slope),length))
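
The slope calculation can be tried without arules on a small made-up table. In the sketch below the data frame `q` stands in for the quality table of the rules, its values are hypothetical, and the column `supp_y` (support of each rule's consequent) is added only to construct the example. Base R's `table()` is used here as a shorter way to count how often each slope occurs.

```r
# Hypothetical rule-quality values; supp_y is included only to build the example.
q <- data.frame(
  confidence = c(0.60, 0.75, 0.90, 0.65, 0.80),
  supp_y     = c(0.25, 0.25, 0.25, 0.20, 0.20)
)
q$lift <- q$confidence / q$supp_y          # Lift = Confidence / Support(Y)

# Dividing lift by confidence recovers 1 / Support(Y), the slope of each group.
slope <- sort(round(q$lift / q$confidence, 2))

# Count the rules that share each slope (one linear grouping per consequent support).
slope_counts <- table(slope)
print(slope_counts)
```

Rules whose consequents have the same support share one slope, which is why the lift versus confidence panel shows a few distinct straight lines.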

Display the top 10 rules sorted by lift

> inspect(head(sort(rules,by="lift"),10))

Rule {Instant food products, soda} -> {hamburger meat} has the highest lift, about 19 (page 154)

Find the rules with confidence above 0.9

> confidentRules<-rules[quality(rules)$confidence>0.9]

> confidentRules # set of 127 rules

Plot a matrix-based visualization of the LHS versus the RHS of the rules

> plot(confidentRules,method="matrix",measure=c("lift","confidence"),control=list(reorder=TRUE))

The legend on the right is a color matrix indicating the lift and the confidence to which each square in the main matrix corresponds

Visualize the top 5 rules with the highest lift.

> highLiftRules<-head(sort(rules,by="lift"),5)

> plot(highLiftRules,method="graph",control=list(type="items"))

In the graph, the arrow always points from an item on the LHS to an item on the RHS.

For example, the arrows that connect ham, processed cheese, and white bread suggest the rule

{ham, processed cheese} -> {white bread}

Size of circle indicates support and shade represents lift

5.6 Validation and Testing

Frequent itemsets and high-confidence rules are found using pre-specified minimum support and minimum confidence thresholds

Measures like lift and/or leverage then ensure that interesting rules are identified rather than coincidental ones

However, some of the remaining rules may be considered subjectively uninteresting because they don’t yield unexpected profitable actions

E.g., rules like {paper} -> {pencil} are not interesting/meaningful

Incorporating subjective knowledge requires domain experts

Good rules provide valuable insights for institutions to improve their business operations

5.7 Diagnostics

Although minimum support is pre-specified in phases 3 and 4, this level can be adjusted to target a desired range for the number of rules. Variants and improvements of Apriori are available.

For large datasets, the Apriori algorithm can be computationally expensive. Approaches that improve efficiency include:

Partitioning

Sampling

Transaction reduction

Hash-based itemset counting

Dynamic itemset counting
