Loading...

Messages

Proposals

Stuck in your homework and missing deadline? Get urgent help in $10/Page with 24 hours deadline

Get Urgent Writing Help In Your Essays, Assignments, Homeworks, Dissertation, Thesis Or Coursework & Achieve A+ Grades.

Privacy Guaranteed - 100% Plagiarism Free Writing - Free Turnitin Report - Professional And Experienced Writers - 24/7 Online Support

Rule generation in apriori algorithm example

27/10/2021 Client: muhammad11 Deadline: 2 Day

Apriori Property,Interesting Ruse Of A Categorical Variable ,Rules Distinguished From Coincidental Rules,

1. What is the Apriori property?

2. Following is a list of five transactions that include items A, B, C, and D:

• Tl: {A, B, C}

• T2: {A, B}

• T3: {B}

• T4: {A, C}

• TS: {A, C, D}

Which itemsets satisfy the minimum support of 0.5?

(Hint: An item set may include more than one item.)

3. How are interesting rules distinguished from coincidental rules?

4. A local retailer has a database that stores 10,000 transactions of last summer. After analyzing the data, a data science team has identified the following statistics:

• {battery} appears in 4,000 transactions.

• {sunscreen} appears in 3,000 transactions.

• {sandals} appears in 4,000 transactions.

• {bowls} appears in 1,000 transactions.

• {battery, sunscreen} appears in 1,500 transactions.

• {battery, sandals} appears in 1,000 transactions.

• {battery, bowls} appears in 1250 transactions.

• {battery, sunscreen, sandals} appears in 600 transactions.

Answer the following questions:

a. What are the support values of the preceding itemsets?

b. Assuming the minimum support is 0.05, which itemsets are considered frequent?

c. What are the confidence values of {battery}->{ sunscreen} and {battery, sunscreen}->{ sandals} ?

d. Which of the two rules is more interesting?

5. In the use of a categorical variable with n possible values, explain the following:

a. Why only n - 1 binary variables are necessary

b. Why using n variables would be problematic

6. If the probability of an event occurring is 0.4, then

a. What is the odds ratio?

b. What is the log odds ratio?

Data Science and Big Data Analytics

Chapter 5: Advanced Analytical Theory and Methods: Association Rules

1

Chapter Sections

5.1 Overview

5.2 Apriori Algorithm

5.3 Evaluation of Candidate Rules

5.4 Example: Transactions in a Grocery Store

5.5 Validation and Testing

5.6 Diagnostics

2

5.1 Overview

Association rules method

Unsupervised learning method

Descriptive (not predictive) method

Used to find hidden relationships in data

The relationships are represented as rules

Questions association rules might answer

Which products tend to be purchased together

What products do similar customers tend to buy

3

5.1 Overview

Example – general logic of association rules

4

5.1 Overview

Rules have the form X -> Y

When X is observed, Y is also observed

Itemset

Collection of items or entities

k-itemset = {item 1, item 2,…,item k}

Examples

Items purchased in one transaction

Set of hyperlinks clicked by a user in one session

5

5.1 Overview – Apriori Algorithm

Apriori is the most fundamental algorithm

Given itemset L, support of L is the percent of transactions that contain L

Frequent itemset – items appear together “often enough”

Minimum support defines “often enough” (% transactions)

If an itemset is frequent, then any subset is frequent

6

5.1 Overview – Apriori Algorithm

If {B,C,D} frequent, then all subsets frequent

7

5.2 Apriori Algorithm Frequent = minimum support

Bottom-up iterative algorithm

Identify the frequent (min support) 1-itemsets

Frequent 1-itemsets are paired into 2-itemsets, and the frequent 2-itemsets are identified, etc.

Definitions for next slide

D = transaction database

d = minimum support threshold

N = maximum length of itemset (optional parameter)

Ck = set of candidate k-itemsets

Lk = set of k-itemsets with minimum support

8

5.2 Apriori Algorithm

9

5.3 Evaluation of Candidate Rules Confidence

Frequent itemsets can form candidate rules

Confidence measures the certainty of a rule

Minimum confidence – predefined threshold

Problem with confidence

Given a rule X->Y, confidence considers only the antecedent (X) and the co-occurrence of X and Y

Cannot tell if a rule contains true implication

10

5.3 Evaluation of Candidate Rules Lift

Lift measures how much more often X and Y occur together than expected if statistically independent

Lift = 1 if X and Y are statistically independent

Lift > 1 indicates the degree of usefulness of the rule

Example – in 1000 transactions,

If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Lift(milk->eggs) = 0.3/(0.5*0.4) = 1.5

If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Lift(milk->bread) = 0.4/(0.5*0.4) = 2.0

11

5.3 Evaluation of Candidate Rules Leverage

Leverage measures the difference in the probability of X and Y appearing together compared to statistical independence

Leverage = 0 if X and Y are statistically independent

Leverage > 0 indicates degree of usefulness of rule

Example – in 1000 transactions,

If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Leverage(milk->eggs) = 0.3 - 0.5*0.4 = 0.1

If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Leverage (milk->bread) = 0.4 - 0.5*0.4 = 0.2

12

5.4 Applications of Association Rules

The term market basket analysis refers to a specific implementation of association rules

For better merchandising – products to include/exclude from inventory each month

Placement of products within related products

Association rules also used for

Recommender systems – Amazon, Netflix

Clickstream analysis from web usage log files

Website visitors to page X click on links A,B,C more than on links D,E,F

13

5.5 Example: Grocery Store Transactions 5.5.1 The Groceries Dataset

Packages -> Install -> arules, arulesViz # don’t enter next line

> install.packages(c("arules", "arulesViz")) # appears on console

> library('arules')

> library('arulesViz')

> data(Groceries)

> summary(Groceries) # indicates 9835 rows

Class of dataset Groceries is transactions, containing 3 slots

transactionInfo # data frame with vectors having length of transactions

itemInfo # data frame storing item labels

data # binary evidence matrix of labels in transactions

> Groceries@itemInfo[1:10,]

> apply(Groceries@data[,10:20],2,function(r) paste(Groceries@itemInfo[r,"labels"],collapse=", "))

14

5.5 Example: Grocery Store Transactions 5.5.2 Frequent Itemset Generation

To illustrate the Apriori algorithm, the code below does each iteration separately.

Assume minimum support threshold = 0.02 (0.02 * 9853 = 198 items), get 122 itemsets total

First, get itemsets of length 1

> itemsets<-apriori(Groceries,parameter=list(minlen=1,maxlen=1,support=0.02,target="frequent itemsets"))

> summary(itemsets) # found 59 itemsets

> inspect(head(sort(itemsets,by="support"),10)) # lists top 10

Second, get itemsets of length 2

> itemsets<-apriori(Groceries,parameter=list(minlen=2,maxlen=2,support=0.02,target="frequent itemsets"))

> summary(itemsets) # found 61 itemsets

> inspect(head(sort(itemsets,by="support"),10)) # lists top 10

Third, get itemsets of length 3

> itemsets<-apriori(Groceries,parameter=list(minlen=3,maxlen=3,support=0.02,target="frequent itemsets"))

> summary(itemsets) # found 2 itemsets

> inspect(head(sort(itemsets,by="support"),10)) # lists top 10

> summary(itemsets) # found 59 itemsets> inspect(head(sort(itemsets,by="support"),10)) # lists top 10 supported items

15

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

The Apriori algorithm will now generate rules.

Set minimum support threshold to 0.001 (allows more rules, presumably for the scatterplot) and minimum confidence threshold to 0.6 to generate 2,918 rules.

> rules <- apriori(Groceries,parameter=list(support=0.001,confidence=0.6,target="rules"))

> summary(rules) # finds 2918 rules

> plot(rules) # displays scatterplot

The scatterplot shows that the highest lift occurs at a low support and a low confidence.

16

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

> plot(rules)

17

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

Get scatterplot matrix to compare the support, confidence, and lift of the 2918 rules

> plot(rules@quality) # displays scatterplot matrix

Lift is proportional to confidence with several linear groupings.

Note that Lift = Confidence/Support(Y), so when support of Y remains the same, lift is proportional to confidence and the slope of the linear trend is the reciprocal of Support(Y).

18

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

> plot(rules)

19

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

Compute the 1/Support(Y) which is the slope

> slope<-sort(round(rules@quality$lift/rules@quality$confidence,2))

Display the number of times each slope appears in dataset

> unlist(lapply(split(slope,f=slope),length))

Display the top 10 rules sorted by lift

> inspect(head(sort(rules,by="lift"),10))

Rule {Instant food products, soda} -> {hamburger meat}

has the highest lift of 19 (page 154)

20

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

Find the rules with confidence above 0.9

> confidentRules<-rules[quality(rules)$confidence>0.9]

> confidentRules # set of 127 rules

Plot a matrix-based visualization of the LHS v RHS of rules

> plot(confidentRules,method="matrix",measure=c("lift","confidence"),control=list(reorder=TRUE))

The legend on the right is a color matrix indicating the lift and the confidence to which each square in the main matrix corresponds

21

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

> plot(rules)

22

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

Visualize the top 5 rules with the highest lift.

> highLiftRules<-head(sort(rules,by="lift"),5)

> plot(highLiftRules,method="graph",control=list(type="items"))

In the graph, the arrow always points from an item on the LHS to an item on the RHS.

For example, the arrows that connects ham, processed cheese, and white bread suggest the rule

{ham, processed cheese} -> {white bread}

Size of circle indicates support and shade represents lift

23

5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

24

5.6 Validation and Testing

The frequent and high confidence itemsets are found by pre-specified minimum support and minimum confidence levels

Measures like lift and/or leverage then ensure that interesting rules are identified rather than coincidental ones

However, some of the remaining rules may be considered subjectively uninteresting because they don’t yield unexpected profitable actions

E.g., rules like {paper} -> {pencil} are not interesting/meaningful

Incorporating subjective knowledge requires domain experts

Good rules provide valuable insights for institutions to improve their business operations

25

5.7 Diagnostics

Although minimum support is pre-specified in phases 3&4, this level can be adjusted to target the range of the number of rules – variants/improvements of Apriori are available

For large datasets the Apriori algorithm can be computationally expensive – efficiency improvements

Partitioning

Sampling

Transaction reduction

Hash-based itemset counting

Dynamic itemset counting

26

Homework is Completed By:

Writer Writer Name Amount Client Comments & Rating
Instant Homework Helper

ONLINE

Instant Homework Helper

$36

She helped me in last minute in a very reasonable price. She is a lifesaver, I got A+ grade in my homework, I will surely hire her again for my next assignments, Thumbs Up!

Order & Get This Solution Within 3 Hours in $25/Page

Custom Original Solution And Get A+ Grades

  • 100% Plagiarism Free
  • Proper APA/MLA/Harvard Referencing
  • Delivery in 3 Hours After Placing Order
  • Free Turnitin Report
  • Unlimited Revisions
  • Privacy Guaranteed

Order & Get This Solution Within 6 Hours in $20/Page

Custom Original Solution And Get A+ Grades

  • 100% Plagiarism Free
  • Proper APA/MLA/Harvard Referencing
  • Delivery in 6 Hours After Placing Order
  • Free Turnitin Report
  • Unlimited Revisions
  • Privacy Guaranteed

Order & Get This Solution Within 12 Hours in $15/Page

Custom Original Solution And Get A+ Grades

  • 100% Plagiarism Free
  • Proper APA/MLA/Harvard Referencing
  • Delivery in 12 Hours After Placing Order
  • Free Turnitin Report
  • Unlimited Revisions
  • Privacy Guaranteed

6 writers have sent their proposals to do this homework:

Assignment Hub
A+GRADE HELPER
Fatimah Syeda
Top Grade Tutor
High Quality Assignments
Pro Writer
Writer Writer Name Offer Chat
Assignment Hub

ONLINE

Assignment Hub

I have read your project details and I can provide you QUALITY WORK within your given timeline and budget.

$49 Chat With Writer
A+GRADE HELPER

ONLINE

A+GRADE HELPER

I am a PhD writer with 10 years of experience. I will be delivering high-quality, plagiarism-free work to you in the minimum amount of time. Waiting for your message.

$47 Chat With Writer
Fatimah Syeda

ONLINE

Fatimah Syeda

Being a Ph.D. in the Business field, I have been doing academic writing for the past 7 years and have a good command over writing research papers, essay, dissertations and all kinds of academic writing and proofreading.

$44 Chat With Writer
Top Grade Tutor

ONLINE

Top Grade Tutor

I will be delighted to work on your project. As an experienced writer, I can provide you top quality, well researched, concise and error-free work within your provided deadline at very reasonable prices.

$25 Chat With Writer
High Quality Assignments

ONLINE

High Quality Assignments

I am an experienced researcher here with master education. After reading your posting, I feel, you need an expert research writer to complete your project.Thank You

$38 Chat With Writer
Pro Writer

ONLINE

Pro Writer

I will provide you with the well organized and well research papers from different primary and secondary sources will write the content that will support your points.

$41 Chat With Writer

Let our expert academic writers to help you in achieving a+ grades in your homework, assignment, quiz or exam.

Similar Homework Questions

HRM Discussion - Do not go gentle into that good night worksheet answers - Organizational behavior and Leadership ppt - Dan murphy melbourne bitter - Budgeted manufacturing overhead rate - Examine china's projected economic activity on the below table - Risk assessment for trips and outings template - How to make a bacterial lawn - Kids shoe size 31 conversion - MGT 498 week 1 - American History - Math. - Afi strategy framework - Mw petroleum corporation case solution - 15 simplify the expression below show your work 7y 12 - Harvard marketing simulation minnesota micromotors solution - Stanzas to augusta analysis - 在線購買氯胺酮,購買MXM粉末,購買1P-LSD粉末,購買美沙酮,在線購買MDPV購買DMT粉末。 - Villanova lean six sigma certification exam - Pore size distribution bjh - What is cause and effect persuasive technique - Excel qm in excel 2016 - Oberon wall mount bracket - Evidence based practice project paper on diabetes - What is an operationalised independent variable - National centre for vocational education research - (traer, adelgazar, compartir) tú - A liquid mixture of acetone and water contains - Groups in action evolution and challenges workbook answer key - Why students should be able to use cellphones in school - The 3 branches of science - Sonnet 43 elizabeth barrett browning context - It strategy and innovation mckeen pdf - Software configuration management activities - Parametric analysis behavior analysis - Designing team and team identity in team management - Can technology save sears case study answers - An adiabatic air compressor is to be powered - Online examination system project documentation - Sobe adrenaline rush energy drink ingredients - Math 144 major assignment 2 - Maroubra bay public schoolmaroubra bay public school - Figurative language worksheets with answers - Www medcohealth com medco consumer home jsp - Flood frequency analysis excel - Powerware 9120 dc bus fault - Profile assignment sample - Armstrong and miller dance - Practice b ratio in similar polygons - Chapter 4: Discussion - Nursing diagnosis for community assessment - Candy lips pump before and after - Who updates the hcpcs level ii codes and how often - Hsrp configuration example with diagram - How to open bartender - Global marketing pearson 9th edition pdf - What is upcoding and downcoding - Office 365 leeds trinity university - How much time did johnny cash spend in prison - Journal Article - Nick and the candlestick - Wileyplus brief - Push and pull factors of rio de janeiro - Https connected mcgraw hill com answer key - Mgt 230 management planning presentation - Biostatistics - Counter argument essay structure - The mystery of the whistling building answer key - Chapter 21 accounting for leases solutions - M-7 - The power of framing fairhurst pdf - Charisma wrinkle shield sheets costco - Discussion needed by wed @ 6pm - The weight in lbf of a 25.0 lbm object - Discussion - International Preparation - Barcarolle trinity grade 4 - Section 248b corporations act - Catch 22 sparknotes chapter 1 - Grimoldby primary school louth - Vrhn analysis in strategic management - Cherokee inc is a merchandiser that provided the following information - Each community dental clinic ringwood east - Bowie alcoholics anonymous - Lions skillet sheet music - Disaster Recovery Plan - Root mean square velocity - Assignment #019 - The time machine hg wells sparknotes - Cwu early childhood education - Advertising, Ethics, Values, and Diversity - Inquiry letter for product - Describe athena's changes to odysseus's appearance - I need help with writing a Management class paper. - Nsbm computer science degree - The most stable measure of central tendency is - Assign - Film theory an introduction through the senses pdf - Tangram triangle with 7 pieces - Los estudiantes de la universidad virtual estudian - Hrm 300 total rewards plan worksheet