Rule generation in apriori algorithm example

27/10/2021 Client: muhammad11 Deadline: 2 Day

Apriori Property, Interesting Rules Distinguished from Coincidental Rules, Use of a Categorical Variable

1. What is the Apriori property?

2. Following is a list of five transactions that include items A, B, C, and D:

• T1: {A, B, C}

• T2: {A, B}

• T3: {B}

• T4: {A, C}

• T5: {A, C, D}

Which itemsets satisfy the minimum support of 0.5?

(Hint: An item set may include more than one item.)
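A brute-force check of question 2, written as a minimal Python sketch that is not part of the original material. It counts, for every possible itemset, the transactions that contain it and keeps those whose support is at least 0.5. Enumerating every subset is only practical for a handful of items; the Apriori algorithm avoids it.

```python
from itertools import combinations

# The five transactions from question 2
transactions = [
    {"A", "B", "C"},  # T1
    {"A", "B"},       # T2
    {"B"},            # T3
    {"A", "C"},       # T4
    {"A", "C", "D"},  # T5
]
min_support = 0.5
items = sorted(set().union(*transactions))

frequent = {}
for k in range(1, len(items) + 1):                 # itemset sizes 1..4
    for combo in combinations(items, k):
        # support = fraction of transactions that contain every item in combo
        n_containing = sum(1 for t in transactions if set(combo) <= t)
        support = n_containing / len(transactions)
        if support >= min_support:
            frequent[combo] = support

for itemset, support in frequent.items():
    print(itemset, support)
```

With these five transactions only {A}, {B}, {C}, and {A, C} reach 0.5; {A, B} stops at 0.4.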

3. How are interesting rules distinguished from coincidental rules?

4. A local retailer has a database that stores 10,000 transactions from last summer. After analyzing the data, a data science team has identified the following statistics:

• {battery} appears in 4,000 transactions.

• {sunscreen} appears in 3,000 transactions.

• {sandals} appears in 4,000 transactions.

• {bowls} appears in 1,000 transactions.

• {battery, sunscreen} appears in 1,500 transactions.

• {battery, sandals} appears in 1,000 transactions.

• {battery, bowls} appears in 1,250 transactions.

• {battery, sunscreen, sandals} appears in 600 transactions.

Answer the following questions:

a. What are the support values of the preceding itemsets?

b. Assuming the minimum support is 0.05, which itemsets are considered frequent?

c. What are the confidence values of {battery} -> {sunscreen} and {battery, sunscreen} -> {sandals}?

d. Which of the two rules is more interesting?
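The arithmetic for question 4 can be sketched in Python. Support is count/10,000, confidence is count(X ∪ Y)/count(X), and lift is Support(X ∪ Y)/(Support(X) × Support(Y)), rewritten with counts. The counts are copied from the question as given. Note that {battery, bowls} (1,250) is reported as more frequent than {bowls} alone (1,000), which the Apriori property rules out, so one of those two figures is presumably a typo. The sketch flags that pair rather than guessing the intended value.

```python
N = 10_000  # total number of transactions

# Transaction counts copied from the question
counts = {
    frozenset({"battery"}): 4000,
    frozenset({"sunscreen"}): 3000,
    frozenset({"sandals"}): 4000,
    frozenset({"bowls"}): 1000,
    frozenset({"battery", "sunscreen"}): 1500,
    frozenset({"battery", "sandals"}): 1000,
    frozenset({"battery", "bowls"}): 1250,
    frozenset({"battery", "sunscreen", "sandals"}): 600,
}

def count(items):
    return counts[frozenset(items)]

def confidence(lhs, rhs):
    # Confidence(X -> Y) = count(X ∪ Y) / count(X)
    return count(set(lhs) | set(rhs)) / count(lhs)

def lift(lhs, rhs):
    # Lift(X -> Y) = Support(X ∪ Y) / (Support(X) * Support(Y)), rewritten with counts
    return count(set(lhs) | set(rhs)) * N / (count(lhs) * count(rhs))

# (a) support of every itemset; (b) itemsets meeting the minimum support of 0.05
support = {tuple(sorted(s)): c / N for s, c in counts.items()}
frequent = [s for s, v in support.items() if v >= 0.05]

# (c) confidence and (d) lift of the two rules
rules = [({"battery"}, {"sunscreen"}), ({"battery", "sunscreen"}, {"sandals"})]
for lhs, rhs in rules:
    print(sorted(lhs), "->", sorted(rhs),
          "confidence:", confidence(lhs, rhs), "lift:", lift(lhs, rhs))

# Apriori property check: a superset can never appear in more transactions than its subset
inconsistent = [(sorted(s), sorted(sub)) for s in counts for sub in counts
                if sub < s and counts[s] > counts[sub]]
print(inconsistent)  # [(['battery', 'bowls'], ['bowls'])]
```

With the counts as given, all eight itemsets meet the 0.05 threshold. By lift, {battery} -> {sunscreen} (1.25) is the more interesting rule. {battery, sunscreen} -> {sandals} has a lift of 1.0, exactly what independence predicts, even though its confidence (0.4) is slightly higher than 0.375.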

5. For a categorical variable with n possible values, explain the following:

a. Why only n - 1 binary variables are necessary

b. Why using n variables would be problematic
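The following small Python illustration of question 5 uses a hypothetical three-level variable (n = 3; the level names are made up). In a model with an intercept, the n indicator columns always add up to the intercept column of ones. One column is therefore a linear combination of the others, and the coefficients are not uniquely determined. Dropping one level, so that the reference level is encoded as all zeros, leaves n - 1 columns that still distinguish every level.

```python
levels = ["red", "green", "blue"]                 # hypothetical categorical variable, n = 3
observations = ["red", "green", "blue", "green", "red"]

def one_hot(value):
    # n indicator columns: one per level
    return [1 if value == lvl else 0 for lvl in levels]

def dummy(value, reference="red"):
    # n - 1 indicator columns: the reference level is encoded as all zeros
    return [1 if value == lvl else 0 for lvl in levels if lvl != reference]

full = [one_hot(v) for v in observations]
reduced = [dummy(v) for v in observations]

# Every row of the n-column encoding sums to 1, i.e. the indicator columns
# add up to the intercept column: perfect multicollinearity.
print([sum(row) for row in full])              # [1, 1, 1, 1, 1]

# The n - 1 columns still give each level a distinct code.
print({lvl: dummy(lvl) for lvl in levels})     # {'red': [0, 0], 'green': [1, 0], 'blue': [0, 1]}
```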

6. If the probability of an event occurring is 0.4, then

a. What is the odds ratio?

b. What is the log odds ratio?
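The arithmetic for question 6 in Python: the odds are p / (1 - p), the quantity the question calls the odds ratio. The log odds use the natural logarithm.

```python
import math

p = 0.4
odds = p / (1 - p)           # 0.4 / 0.6 = 2/3
log_odds = math.log(odds)    # natural log of 2/3

print(round(odds, 4), round(log_odds, 4))   # 0.6667 -0.4055
```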

Data Science and Big Data Analytics

Chapter 5: Advanced Analytical Theory and Methods: Association Rules


Chapter Sections

5.1 Overview

5.2 Apriori Algorithm

5.3 Evaluation of Candidate Rules

5.4 Applications of Association Rules

5.5 Example: Transactions in a Grocery Store

5.6 Validation and Testing

5.7 Diagnostics


5.1 Overview

Association rules method

Unsupervised learning method

Descriptive (not predictive) method

Used to find hidden relationships in data

The relationships are represented as rules

Questions association rules might answer

Which products tend to be purchased together

What products do similar customers tend to buy


[Figure: Example – general logic of association rules]

Rules have the form X -> Y

When X (the left-hand side, LHS) is observed, Y (the right-hand side, RHS) is also observed

Itemset

Collection of items or entities

k-itemset = {item 1, item 2,…,item k}

Examples

Items purchased in one transaction

Set of hyperlinks clicked by a user in one session


5.1 Overview – Apriori Algorithm

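Apriori is one of the earliest and most fundamental algorithms for generating association rules

Given itemset L, the support of L is the percentage of transactions that contain L

Frequent itemset – items appear together “often enough”

Minimum support defines “often enough” (% of transactions)

If an itemset is frequent, then every subset of it is also frequent (the Apriori property); equivalently, if an itemset is infrequent, none of its supersets can be frequent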

If {B,C,D} is frequent, then all of its subsets ({B,C}, {B,D}, {C,D}, {B}, {C}, {D}) are also frequent

5.2 Apriori Algorithm – Frequent = Meets Minimum Support

Bottom-up iterative algorithm

Identify the frequent (min support) 1-itemsets

Frequent 1-itemsets are paired into candidate 2-itemsets, the frequent 2-itemsets are identified, and so on

Definitions used by the algorithm

D = transaction database

d = minimum support threshold

N = maximum length of itemset (optional parameter)

Ck = set of candidate k-itemsets

Lk = set of k-itemsets with minimum support
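The level-wise loop can be sketched in Python using the names defined above (D, d, N, Ck, Lk). This is a minimal sketch, and `apriori_sketch` is a hypothetical function, not the arules implementation. The candidate-generation step is an assumption, since the definitions do not spell it out: Ck+1 is built by joining pairs of frequent k-itemsets, and any candidate with an infrequent k-subset is pruned (the Apriori property). Each level costs one pass over D.

```python
from itertools import combinations

def apriori_sketch(D, d, N=None):
    """Hypothetical helper: return {itemset: support} for all itemsets with support >= d.

    D: list of transactions (each a set of items)
    d: minimum support threshold, as a fraction of len(D)
    N: optional maximum itemset length
    """
    n = len(D)
    transactions = [frozenset(t) for t in D]

    def support_of(candidates):
        # One pass over the database: count the transactions containing each candidate
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: counts[c] / n for c in candidates}

    # L1: frequent 1-itemsets
    items = {frozenset([i]) for t in transactions for i in t}
    Lk = {c: s for c, s in support_of(items).items() if s >= d}
    frequent = dict(Lk)
    k = 1
    while Lk and (N is None or k < N):
        # Ck+1: unions of two frequent k-itemsets that have exactly k + 1 items ...
        Ck1 = set()
        for a, b in combinations(list(Lk), 2):
            union = a | b
            if len(union) == k + 1:
                # ... pruned if any k-subset is infrequent (Apriori property)
                if all(frozenset(sub) in Lk for sub in combinations(union, k)):
                    Ck1.add(union)
        Lk = {c: s for c, s in support_of(Ck1).items() if s >= d}
        frequent.update(Lk)
        k += 1
    return frequent

# Usage: the five transactions from question 2 with d = 0.4
D = [{"A", "B", "C"}, {"A", "B"}, {"B"}, {"A", "C"}, {"A", "C", "D"}]
result = apriori_sketch(D, d=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this run, {A, B, C} is generated from {A, B} and {A, C} but pruned before counting, because its subset {B, C} is not frequent.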

5.3 Evaluation of Candidate Rules – Confidence

Frequent itemsets can form candidate rules

Confidence measures the certainty of a rule: Confidence(X -> Y) = Support(X ∧ Y) / Support(X)

Minimum confidence – predefined threshold

Problem with confidence

Given a rule X->Y, confidence considers only the antecedent (X) and the co-occurrence of X and Y

Cannot tell if a rule contains true implication
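The following small numeric sketch uses hypothetical counts to illustrate the problem. When Y is common, Confidence(X -> Y) can be high even though X and Y are independent. Here bread appears in 90% of the transactions, so the rule {milk} -> {bread} reaches 0.9 confidence without milk telling us anything about bread.

```python
def confidence(n_xy, n_x):
    """Confidence(X -> Y) = Support(X ∧ Y) / Support(X) = count(X ∧ Y) / count(X)."""
    return n_xy / n_x

# Hypothetical counts chosen so that milk and bread are statistically independent:
# P(milk) = 10/20, P(bread) = 18/20, P(milk ∧ bread) = 9/20 = P(milk) * P(bread)
n, n_milk, n_bread, n_both = 20, 10, 18, 9

conf = confidence(n_both, n_milk)   # 0.9, which looks like a strong rule
baseline = n_bread / n              # 0.9, how often bread is bought anyway
print(conf, baseline)               # 0.9 0.9
```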

5.3 Evaluation of Candidate Rules – Lift

Lift measures how much more often X and Y occur together than expected if statistically independent

Lift = 1 if X and Y are statistically independent

Lift > 1 indicates the degree of usefulness of the rule

Example – in 1000 transactions,

If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Lift(milk->eggs) = 0.3/(0.5*0.4) = 1.5

If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Lift(milk->bread) = 0.4/(0.5*0.4) = 2.0
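The two lift examples can be checked with a short Python function that works from raw transaction counts. Multiplying through by the number of transactions n avoids floating-point noise. The third call uses hypothetical counts at which milk and eggs are independent, giving a lift of exactly 1.

```python
def lift(n_xy, n_x, n_y, n):
    """Lift(X -> Y) = Support(X ∧ Y) / (Support(X) * Support(Y)).

    (n_xy/n) / ((n_x/n) * (n_y/n)) simplifies to n_xy * n / (n_x * n_y).
    """
    return n_xy * n / (n_x * n_y)

print(lift(300, 500, 400, 1000))   # milk -> eggs: 1.5
print(lift(400, 500, 400, 1000))   # milk -> bread: 2.0
print(lift(200, 500, 400, 1000))   # hypothetical independent case: 1.0
```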

5.3 Evaluation of Candidate Rules – Leverage

Leverage measures the difference in the probability of X and Y appearing together compared to statistical independence

Leverage = 0 if X and Y are statistically independent

Leverage > 0 indicates degree of usefulness of rule

Example – in 1000 transactions,

If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in 400, then Leverage(milk->eggs) = 0.3 - 0.5*0.4 = 0.1

If {milk, bread} appears in 400, {milk} in 500, and {bread} in 400, then Leverage (milk->bread) = 0.4 - 0.5*0.4 = 0.2
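The leverage examples can also be checked from raw counts in Python. The formula is written over the common denominator n² to avoid the rounding error from subtracting two floating-point products. The third call uses hypothetical counts at which the items are independent, giving a leverage of 0.

```python
def leverage(n_xy, n_x, n_y, n):
    """Leverage(X -> Y) = Support(X ∧ Y) - Support(X) * Support(Y).

    (n_xy/n) - (n_x/n) * (n_y/n) is rewritten as (n_xy * n - n_x * n_y) / n**2.
    """
    return (n_xy * n - n_x * n_y) / n**2

print(leverage(300, 500, 400, 1000))   # milk -> eggs: 0.1
print(leverage(400, 500, 400, 1000))   # milk -> bread: 0.2
print(leverage(200, 500, 400, 1000))   # hypothetical independent case: 0.0
```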


5.4 Applications of Association Rules

The term market basket analysis refers to a specific implementation of association rules

For better merchandising – products to include/exclude from inventory each month

Placement of products near related products

Association rules also used for

Recommender systems – Amazon, Netflix

Clickstream analysis from web usage log files

Website visitors to page X click on links A,B,C more than on links D,E,F


5.5 Example: Grocery Store Transactions 5.5.1 The Groceries Dataset

Install the arules and arulesViz packages (in RStudio: Packages -> Install); doing so runs the following command, so there is no need to type it:

> install.packages(c("arules", "arulesViz")) # appears on console

> library('arules')

> library('arulesViz')

> data(Groceries)

> summary(Groceries) # indicates 9835 rows

Groceries is an object of class transactions with three slots:

transactionInfo # data frame with vectors having length of transactions

itemInfo # data frame storing item labels

data # binary evidence matrix of labels in transactions

> Groceries@itemInfo[1:10,]

> apply(Groceries@data[,10:20],2,function(r) paste(Groceries@itemInfo[r,"labels"],collapse=", ")) # items in transactions 10 through 20


5.5 Example: Grocery Store Transactions 5.5.2 Frequent Itemset Generation

To illustrate the Apriori algorithm, the code below does each iteration separately.

Assume a minimum support threshold of 0.02 (0.02 * 9835 ≈ 197, so an itemset must appear in at least 197 transactions); this yields 122 frequent itemsets in total (59 + 61 + 2)

First, get itemsets of length 1

> itemsets<-apriori(Groceries,parameter=list(minlen=1,maxlen=1,support=0.02,target="frequent itemsets"))

> summary(itemsets) # found 59 itemsets

> inspect(head(sort(itemsets,by="support"),10)) # lists top 10

Second, get itemsets of length 2

> itemsets<-apriori(Groceries,parameter=list(minlen=2,maxlen=2,support=0.02,target="frequent itemsets"))

> summary(itemsets) # found 61 itemsets

> inspect(head(sort(itemsets,by="support"),10)) # lists top 10

Third, get itemsets of length 3

> itemsets<-apriori(Groceries,parameter=list(minlen=3,maxlen=3,support=0.02,target="frequent itemsets"))

> summary(itemsets) # found 2 itemsets

> inspect(head(sort(itemsets,by="support"),10)) # lists top 10


5.5 Example: Grocery Store Transactions 5.5.3 Rule Generation and Visualization

The Apriori algorithm will now generate rules.

Set the minimum support threshold to 0.001 (lower than before, so that more rules are available for the scatterplot) and the minimum confidence threshold to 0.6; this generates 2,918 rules.

> rules <- apriori(Groceries,parameter=list(support=0.001,confidence=0.6,target="rules"))

> summary(rules) # finds 2918 rules

> plot(rules) # displays scatterplot

The scatterplot shows that the highest lift occurs at a low support and a low confidence.
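Rule generation itself can be sketched in Python. This is a simplified stand-in for what apriori() does with target="rules", not the arules implementation, and `generate_rules` is a hypothetical helper. Every frequent itemset I with at least two items is split into X -> I - X for each non-empty proper subset X. The rule is kept if count(I)/count(X) meets the minimum confidence. Because X is a subset of a frequent itemset, the Apriori property guarantees X is frequent too, so its count is already known. To keep the example short, the frequent itemsets are found by brute force over the five transactions from question 2.

```python
from itertools import combinations

def generate_rules(transactions, min_support, min_confidence):
    """Hypothetical helper: return (lhs, rhs, support, confidence) for every rule
    that meets both thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def count(itemset):
        # number of transactions containing every item in itemset
        return sum(1 for t in transactions if itemset <= t)

    # Step 1: frequent itemsets with their counts (brute force for this toy example)
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            c = count(frozenset(combo))
            if c / n >= min_support:
                frequent[frozenset(combo)] = c

    # Step 2: split each frequent itemset I into X -> I - X
    rules = []
    for itemset, c_i in frequent.items():
        for r in range(1, len(itemset)):          # empty range for 1-itemsets
            for lhs in combinations(sorted(itemset), r):
                x = frozenset(lhs)
                conf = c_i / frequent[x]          # x is frequent by the Apriori property
                if conf >= min_confidence:
                    rules.append((sorted(x), sorted(itemset - x), c_i / n, conf))
    return rules

transactions = [{"A", "B", "C"}, {"A", "B"}, {"B"}, {"A", "C"}, {"A", "C", "D"}]
rules = generate_rules(transactions, min_support=0.4, min_confidence=0.6)
for lhs, rhs, support, conf in rules:
    print(lhs, "->", rhs, "support:", support, "confidence:", round(conf, 3))
```

With these thresholds, {A} -> {B} is rejected (confidence 0.5), while {B} -> {A}, {A} -> {C}, and {C} -> {A} are kept.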


Get scatterplot matrix to compare the support, confidence, and lift of the 2918 rules

> plot(rules@quality) # displays scatterplot matrix

Lift is proportional to confidence with several linear groupings.

Note that Lift = Confidence/Support(Y), so when support of Y remains the same, lift is proportional to confidence and the slope of the linear trend is the reciprocal of Support(Y).
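The following quick Python check of the slope relationship uses hypothetical counts for four rules in 1,000 transactions. Two rules share the RHS {eggs} (in 400 transactions), and two share {bread} (in 250). Lift/Confidence equals 1/Support(Y) for every rule, so rules with the same RHS fall on the same line, with slope 2.5 for eggs and 4.0 for bread. Values are rounded to two decimals to absorb floating-point noise.

```python
from collections import Counter

n = 1000  # hypothetical number of transactions
# Hypothetical rules: (count of X, count of X and Y together, count of Y)
rules = {
    "{milk} -> {eggs}": (500, 300, 400),
    "{butter} -> {eggs}": (100, 80, 400),
    "{jam} -> {bread}": (50, 40, 250),
    "{cheese} -> {bread}": (200, 120, 250),
}

slopes = {}
for name, (n_x, n_xy, n_y) in rules.items():
    confidence = n_xy / n_x
    lift = n_xy * n / (n_x * n_y)
    # Lift = Confidence / Support(Y), so Lift / Confidence = 1 / Support(Y) = n / n_y
    slopes[name] = round(lift / confidence, 2)

print(slopes)
print(Counter(slopes.values()))   # Counter({2.5: 2, 4.0: 2})
```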


Compute 1/Support(Y), which is the slope (Lift/Confidence), for each rule

> slope<-sort(round(rules@quality$lift/rules@quality$confidence,2))

Display the number of rules with each slope

> unlist(lapply(split(slope,f=slope),length))

Display the top 10 rules sorted by lift

> inspect(head(sort(rules,by="lift"),10))

The rule {Instant food products, soda} -> {hamburger meat} has the highest lift, 19 (page 154)


Find the rules with confidence above 0.9

> confidentRules<-rules[quality(rules)$confidence>0.9]

> confidentRules # set of 127 rules

Plot a matrix-based visualization of the LHS v RHS of rules

> plot(confidentRules,method="matrix",measure=c("lift","confidence"),control=list(reorder=TRUE))

The legend on the right is a color matrix indicating the lift and confidence to which each square in the main matrix corresponds.


Visualize the top 5 rules with the highest lift.

> highLiftRules<-head(sort(rules,by="lift"),5)

> plot(highLiftRules,method="graph",control=list(type="items"))

In the graph, the arrow always points from an item on the LHS to an item on the RHS.

For example, the arrows that connect ham, processed cheese, and white bread suggest the rule

{ham, processed cheese} -> {white bread}

The size of each circle indicates support, and its shade represents lift


5.6 Validation and Testing

The frequent and high confidence itemsets are found by pre-specified minimum support and minimum confidence levels

Measures such as lift and leverage then help ensure that interesting rules, rather than coincidental ones, are identified

However, some of the remaining rules may be considered subjectively uninteresting because they don’t yield unexpected profitable actions

For example, a rule such as {paper} -> {pencil} may not be interesting or meaningful

Incorporating subjective knowledge requires domain experts

Good rules provide valuable insights for institutions to improve their business operations


5.7 Diagnostics

Although the minimum support is pre-specified in phases 3 and 4, it can be adjusted to target a desired range for the number of rules; variants and improvements of Apriori are also available

For large datasets, the Apriori algorithm can be computationally expensive; efficiency improvements include:

Partitioning

Sampling

Transaction reduction

Hash-based itemset counting

Dynamic itemset counting
