INTRODUCTION TO DATA MINING


INTRODUCTION TO DATA MINING SECOND EDITION


PANG-NING TAN


Michigan State University


MICHAEL STEINBACH


University of Minnesota


ANUJ KARPATNE


University of Minnesota


VIPIN KUMAR


University of Minnesota


330 Hudson Street, NY NY 10013


Director, Portfolio Management: Engineering, Computer Science & Global Editions: Julian Partridge


Specialist, Higher Ed Portfolio Management: Matt Goldstein


Portfolio Management Assistant: Meghan Jacoby


Managing Content Producer: Scott Disanno


Content Producer: Carole Snyder


Web Developer: Steve Wright


Rights and Permissions Manager: Ben Ferrini


Manufacturing Buyer, Higher Ed, Lake Side Communications Inc (LSC): Maura Zaldivar-Garcia


Inventory Manager: Ann Lam


Product Marketing Manager: Yvonne Vannatta


Field Marketing Manager: Demetrius Hall


Marketing Assistant: Jon Bryant


Cover Designer: Joyce Wells, jWellsDesign


Full-Service Project Management: Chandrasekar Subramanian, SPi Global


Copyright ©2019 Pearson Education, Inc. All rights reserved. Manufactured in the United States of America. This publication is protected by Copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, request forms and the appropriate contacts within the Pearson Education Global Rights & Permissions department, please visit www.pearsonhighed.com/permissions/.


Many of the designations by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed in initial caps or all caps.


Library of Congress Cataloging-in-Publication Data on File


Names: Tan, Pang-Ning, author. | Steinbach, Michael, author. | Karpatne, Anuj, author. | Kumar, Vipin, 1956- author.


Title: Introduction to Data Mining / Pang-Ning Tan, Michigan State University, Michael Steinbach, University of Minnesota, Anuj Karpatne, University of Minnesota, Vipin Kumar, University of Minnesota.


Description: Second edition. | New York, NY : Pearson Education, [2019] | Includes bibliographical references and index.


Identifiers: LCCN 2017048641 | ISBN 9780133128901 | ISBN 0133128903


Subjects: LCSH: Data mining.


Classification: LCC QA76.9.D343 T35 2019 | DDC 006.3/12–dc23 LC record available at https://lccn.loc.gov/2017048641




ISBN-10: 0133128903


ISBN-13: 9780133128901


To our families …


Preface to the Second Edition

Since the first edition, roughly 12 years ago, much has changed in the field of data analysis. The volume and variety of data being collected continues to increase, as has the rate (velocity) at which it is being collected and used to make decisions. Indeed, the term, Big Data, has been used to refer to the massive and diverse data sets now available. In addition, the term data science has been coined to describe an emerging area that applies tools and techniques from various fields, such as data mining, machine learning, statistics, and many others, to extract actionable insights from data, often big data.


The growth in data has created numerous opportunities for all areas of data analysis. The most dramatic developments have been in the area of predictive modeling, across a wide range of application domains. For instance, recent advances in neural networks, known as deep learning, have shown impressive results in a number of challenging areas, such as image classification, speech recognition, as well as text categorization and understanding. While not as dramatic, other areas, e.g., clustering, association analysis, and anomaly detection have also continued to advance. This new edition is in response to those advances.


Overview

As with the first edition, the second edition of the book provides a comprehensive introduction to data mining and is designed to be accessible and useful to students, instructors, researchers, and professionals. Areas covered include data preprocessing, predictive modeling, association analysis, cluster analysis, anomaly detection, and avoiding false discoveries. The goal is to present fundamental concepts and algorithms for each topic, thus providing the reader with the necessary background for the application of data mining to real problems. As before, classification, association analysis, and cluster analysis are each covered in a pair of chapters. The introductory chapter covers basic concepts, representative algorithms, and evaluation techniques, while the following chapter discusses more advanced concepts and algorithms. As before, our objective is to provide the reader with a sound understanding of the foundations of data mining, while still covering many important advanced topics. Because of this approach, the book is useful both as a learning tool and as a reference.


To help readers better understand the concepts that have been presented, we provide an extensive set of examples, figures, and exercises. The solutions to the original exercises, which are already circulating on the web, will be made public. The exercises are mostly unchanged from the last edition, with the exception of new exercises in the chapter on avoiding false discoveries. New exercises for the other chapters and their solutions will be available to instructors via the web. Bibliographic notes are included at the end of each chapter for readers who are interested in more advanced topics, historically important papers, and recent trends. These have also been significantly updated. The book also contains a comprehensive subject and author index.


What is New in the Second Edition?

Some of the most significant improvements in the text have been in the two chapters on classification. The introductory chapter uses the decision tree classifier for illustration, but the discussion on many topics (those that apply across all classification approaches) has been greatly expanded and clarified, including topics such as overfitting, underfitting, the impact of training size, model complexity, model selection, and common pitfalls in model evaluation. Almost every section of the advanced classification chapter has been significantly updated. The material on Bayesian networks, support vector machines, and artificial neural networks has been significantly expanded. We have added a separate section on deep networks to address the current developments in this area. The discussion of evaluation, which occurs in the section on imbalanced classes, has also been updated and improved.


The changes in association analysis are more localized. We have completely reworked the section on the evaluation of association patterns (introductory chapter), as well as the sections on sequence and graph mining (advanced chapter). Changes to cluster analysis are also localized. The introductory chapter adds the K-means initialization technique and an updated discussion of cluster evaluation. The advanced clustering chapter adds a new section on spectral graph clustering. Anomaly detection has been greatly revised and expanded. Existing approaches (statistical, nearest neighbor/density-based, and clustering-based) have been retained and updated, while new approaches have been added: reconstruction-based, one-class classification, and information-theoretic. The reconstruction-based approach is illustrated using autoencoder networks that are part of the deep learning paradigm. The data chapter has been updated to include discussions of mutual information and kernel-based techniques.


The last chapter, which discusses how to avoid false discoveries and produce valid results, is completely new, and is novel among other contemporary textbooks on data mining. It supplements the discussions in the other chapters with a discussion of the statistical concepts (statistical significance, p-values, false discovery rate, permutation testing, etc.) relevant to avoiding spurious results, and then illustrates these concepts in the context of data mining techniques. This chapter addresses the increasing concern over the validity and reproducibility of results obtained from data analysis. The addition of this last chapter is a recognition of the importance of this topic and an acknowledgment that a deeper understanding of this area is needed for those analyzing data.


The data exploration chapter has been deleted, as have the appendices, from the print edition of the book, but will remain available on the web. A new appendix provides a brief discussion of scalability in the context of big data.


To the Instructor

As a textbook, this book is suitable for a wide range of students at the advanced undergraduate or graduate level. Since students come to this subject with diverse backgrounds that may not include extensive knowledge of statistics or databases, our book requires minimal prerequisites. No database knowledge is needed, and we assume only a modest background in statistics or mathematics, although such a background will make for easier going in some sections. As before, the book, and more specifically, the chapters covering major data mining topics, are designed to be as self-contained as possible. Thus, the order in which topics can be covered is quite flexible. The core material is covered in chapters 2 (data), 3 (classification), 5 (association analysis), 7 (clustering), and 9 (anomaly detection). We recommend at least a cursory coverage of Chapter 10 (Avoiding False Discoveries) to instill in students some caution when interpreting the results of their data analysis. Although the introductory data chapter (2) should be covered first, the basic classification (3), association analysis (5), and clustering chapters (7) can be covered in any order. Because of the relationship of anomaly detection (9) to classification (3) and clustering (7), these chapters should precede Chapter 9. Various topics can be selected from the advanced classification, association analysis, and clustering chapters (4, 6, and 8, respectively) to fit the schedule and interests of the instructor and students. We also advise that the lectures be augmented by projects or practical exercises in data mining. Although they are time consuming, such hands-on assignments greatly enhance the value of the course.


Support Materials

The following support materials are available to all readers of this book at http://www-users.cs.umn.edu/~kumar/dmbook.


PowerPoint lecture slides


Suggestions for student projects


Data mining resources, such as algorithms and data sets


Online tutorials that give step-by-step examples for selected data mining techniques described in the book using actual data sets and data analysis software


Additional support materials, including solutions to exercises, are available only to instructors adopting this textbook for classroom use. The book’s resources will be mirrored at www.pearsonhighered.com/cs-resources. Comments and suggestions, as well as reports of errors, can be sent to the authors through dmbook@cs.umn.edu.


Acknowledgments

Many people contributed to the first and second editions of the book. We begin by acknowledging our families to whom this book is dedicated. Without their patience and support, this project would have been impossible.


We would like to thank the current and former students of our data mining groups at the University of Minnesota and Michigan State for their contributions. Eui-Hong (Sam) Han and Mahesh Joshi helped with the initial data mining classes. Some of the exercises and presentation slides that they created can be found in the book and its accompanying slides. Students in our data mining groups who provided comments on drafts of the book or who contributed in other ways include Shyam Boriah, Haibin Cheng, Varun Chandola, Eric Eilertson, Levent Ertöz, Jing Gao, Rohit Gupta, Sridhar Iyer, Jung-Eun Lee, Benjamin Mayer, Aysel Ozgur, Uygar Oztekin, Gaurav Pandey, Kashif Riaz, Jerry Scripps, Gyorgy Simon, Hui Xiong, Jieping Ye, and Pusheng Zhang. We would also like to thank the students of our data mining classes at the University of Minnesota and Michigan State University who worked with early drafts of the book and provided invaluable feedback. We specifically note the helpful suggestions of Bernardo Craemer, Arifin Ruslim, Jamshid Vayghan, and Yu Wei.


Joydeep Ghosh (University of Texas) and Sanjay Ranka (University of Florida) class-tested early versions of the book. We also received many useful suggestions directly from the following UT students: Pankaj Adhikari, Rajiv Bhatia, Frederic Bosche, Arindam Chakraborty, Meghana Deodhar, Chris Everson, David Gardner, Saad Godil, Todd Hay, Clint Jones, Ajay Joshi, Joonsoo Lee, Yue Luo, Anuj Nanavati, Tyler Olsen, Sunyoung Park, Aashish Phansalkar, Geoff Prewett, Michael Ryoo, Daryl Shannon, and Mei Yang.


Ronald Kostoff (ONR) read an early version of the clustering chapter and offered numerous suggestions. George Karypis provided invaluable LaTeX assistance in creating an author index. Irene Moulitsas also provided assistance with LaTeX and reviewed some of the appendices. Musetta Steinbach was very helpful in finding errors in the figures.


We would like to acknowledge our colleagues at the University of Minnesota and Michigan State who have helped create a positive environment for data mining research. They include Arindam Banerjee, Dan Boley, Joyce Chai, Anil Jain, Ravi Janardan, Rong Jin, George Karypis, Claudia Neuhauser, Haesun Park, William F. Punch, György Simon, Shashi Shekhar, and Jaideep Srivastava. The collaborators on our many data mining projects, who also have our gratitude, include Ramesh Agrawal, Maneesh Bhargava, Steve Cannon, Alok Choudhary, Imme Ebert-Uphoff, Auroop Ganguly, Piet C. de Groen, Fran Hill, Yongdae Kim, Steve Klooster, Kerry Long, Nihar Mahapatra, Rama Nemani, Nikunj Oza, Chris Potter, Lisiane Pruinelli, Nagiza Samatova, Jonathan Shapiro, Kevin Silverstein, Brian Van Ness, Bonnie Westra, Nevin Young, and Zhi-Li Zhang.
