Chapter 4 Knowledge Discovery, Data Mining, and Practice-Based Evidence

Mollie R. Cummins

Ginette A. Pepper

Susan D. Horn

The next step to comparative effectiveness research is to conduct more prospective large-scale observational cohort studies with the rigor described here for knowledge discovery and data mining (KDDM) and practice-based evidence (PBE) studies.

Objectives

At the completion of this chapter the reader will be prepared to:

1. Define the goals and processes employed in knowledge discovery and data mining (KDDM) and practice-based evidence (PBE) designs

2. Analyze the strengths and weaknesses of observational designs in general and of KDDM and PBE specifically

3. Identify the roles and activities of the informatics specialist in KDDM and PBE in healthcare environments

Key Terms

Comparative effectiveness research

Confusion matrix

Data mining

Knowledge discovery and data mining (KDDM)

Machine learning

Natural language processing (NLP)

Practice-based evidence (PBE)

Preprocessing

Abstract

The advent of the electronic health record (EHR) and other large electronic datasets has revolutionized efficient access to comprehensive data across large numbers of patients and the concomitant capacity to detect subtle patterns in these data even with missing or less than optimal data quality. This chapter introduces two approaches to knowledge building from clinical data: (1) knowledge discovery and data mining (KDDM) and (2) practice-based evidence (PBE). The use of machine learning methods in retrospective analysis of routinely collected clinical data characterizes KDDM. KDDM enables us to efficiently and effectively analyze large amounts of data and develop clinical knowledge models for decision support. PBE integrates health information technology (health IT) products with cohort identification, prospective data collection, and extensive front-line clinician and patient input for comparative effectiveness research. PBE can uncover best practices and combinations of treatments for specific types of patients while achieving many of the presumed advantages of randomized controlled trials (RCTs).

Introduction

Leaders need to foster a shared learning culture for improving healthcare. This extends beyond the local department or institution to a value for creating generalizable knowledge to improve care worldwide. Sound, rigorous methods are needed by researchers and health professionals to create this knowledge and address practical questions about risks, benefits, and costs of interventions as they occur in actual clinical practice. Typical questions are as follows:

• Are treatments used in daily practice associated with intended outcomes?

• Can we predict adverse events in time to prevent or ameliorate them?

• What treatments work best for which patients?

• With limited financial resources, what are the best interventions to use for specific types of patients?

• What types of individuals are at risk for certain conditions?

Answers to these questions can help clinicians, patients, researchers, healthcare administrators, and policy-makers to learn from and improve real-world, everyday clinical practice. Two important emerging approaches to knowledge building from clinical data are KDDM and PBE, both of which are described in this chapter.

Research Designs for Knowledge Discovery

The gold standard research design for answering questions about the efficacy of treatments is an experimental design, often referred to as a randomized controlled trial (RCT). An RCT requires random assignment of patients to treatment conditions as well as other design features, such as tightly controlled inclusion criteria, to ensure as much as possible that the only difference between the experimental and control groups is the treatment (or placebo) that each group receives. The strength of the RCT is the degree of confidence it gives in causal inferences, in other words, that the therapeutic intervention caused the clinical effects (or lack of effects). Drawbacks of the RCT include the time and expense required to compare a small number of treatment options and the limited generalizability of the results to patients, settings, intervention procedures, and measures that differ from the specific conditions of the study. Further, RCTs have little value in generating unique hypotheses and possibilities.

Observational research designs can also yield valuable information to characterize disease risk and generate hypotheses about potentially effective treatments. In addition, observational research is essential to determine the effectiveness of treatments or how well treatments work in actual practice. In observational studies the investigator merely records what occurs under naturalistic conditions, such as which individual gets what therapy and what outcomes result or which variables are associated with what outcomes. Of course, with observational studies the patients who receive different treatments generally differ on many other variables (selection bias) since treatments were determined by clinician judgment rather than random assignment and selection. For example, one therapy may be prescribed for sicker patients under natural conditions or may not be accessible to uninsured patients. Since diagnostic approaches vary in clinical practice, patients with the same diagnosis may have considerable differences in the actual condition.

Observational studies can be either prospective (data are generated after the study commences) or retrospective (data were generated before the study). Chart review has traditionally been the most common approach to retrospective observational research. However, chart review previously required tedious and time-consuming data extraction, and the requisite data may be missing, inconsistent, or of poor quality. Prospective studies have the advantage that the measurements can be standardized, but recording both research data and clinical data constitutes a documentation burden for clinicians that cannot be accommodated in typical clinical settings unless the research and clinical data elements are combined to become the standard for documentation.

EHRs and Knowledge Discovery

The advent of the EHR and other large electronic datasets has revolutionized observational studies by increasing the potential for efficient access to comprehensive data, reflecting large numbers of patients and the capacity to detect subtle patterns in the data, even with missing or less than optimal data quality. With very large samples available from EHRs at relatively low cost, it is often possible to compensate with statistical controls for the lack of randomization in the practice setting. With electronic data, standardized data collection is facilitated and data validity can be enhanced, minimizing the documentation burden by using clinical data for research purposes.

Increased adoption of EHRs and other health information systems has resulted in vast amounts of structured and textual data. Stored on servers in a data warehouse (a large data repository integrating data across clinical, administrative, and other systems), the data may be a partial or complete copy of all data collected in the course of care provision. The data can include billing information, physician and nursing notes, laboratory results, radiology images, and numerous other diverse types of data. In some settings data describing individual patients and their characteristics, health issues, treatments, and outcomes have accumulated for years, forming longitudinal records. The clinical record can also be linked to repositories of genetic or familial data.1–3 These data constitute an incredible resource that is underused for scientific research in biomedicine and nursing.

The potential of using these data stores for the advancement of scientific knowledge and patient care is widely acknowledged. However, the lack of availability of tools and technology to adequately manage the data deluge has proven to be an Achilles' heel. Very large data resources, typically on the terabyte scale or larger, require highly specialized approaches to storage, management, extraction, and analysis. Moreover, the data may not be useful. Data quality can be poor and require substantial additional processing prior to use.

Clinical concepts are typically represented in the EHR in a way that supports healthcare delivery but not necessarily research. For example, pain might be qualitatively described in a patient's note and EHR as “mild” or “better.” This may meet the immediate need for documentation and care but it does not allow the researcher to measure differences in pain over time and across patients, as would measurement using a pain scale. Clinical concepts may not be adequately measured or represented in a way that enables scientific analysis. Data quality affects the feasibility of secondary analysis.

TABLE 4-1 Characteristics of Knowledge Discovery and Data Mining (KDDM) and Practice-Based Evidence (PBE)

Characteristic | KDDM | PBE
Description | Application of machine learning and statistical methods for pattern discovery | Participatory research approach requiring documentation of predefined process and outcome data and analysis
Goal | Develop models to predict future events or infer missing information | Determine the effectiveness of multiple interventions on multiple outcomes in actual practice environment
Design classification | Observational (descriptive) | Observational (descriptive)
Temporal aspects | Retrospective | Prospective
Typical sample size | 1,000 to 1,000,000 or more, depending on project and available data | 800 to 2,000+

Knowledge Building Using Health IT

Two observational approaches to knowledge building from health IT can be employed for research and clinical performance improvement. One approach is based on machine learning applied to retrospective analysis of routinely collected clinical data and a second approach is based on increasing integration of health IT with cohort identification, front-line knowledge, and prospective data collection for research and clinical care.

Knowledge discovery and data mining (KDDM), the first approach, uses pattern discovery in large amounts of clinical and biomedical data and entails the use of software tools that facilitate the extraction, sampling, and large-scale cleaning and preprocessing of data. KDDM also makes use of specialized analytic methods, characteristically machine learning methods, to identify patterns in a semiautomated fashion. This level of analysis far exceeds the types of descriptive summaries typically presented by dashboard applications, such as a clinical summary for a patient. Instead, KDDM is used to build tools that support clinical decision making, generate hypotheses for scientific evaluation, and identify links between genotype and phenotype. KDDM can also be used to “patch” weaknesses in clinical data that pose a barrier to research. For example, if poor data quality is a barrier to automatic identification of patients with type II diabetes from diagnostic codes, a machine learning approach could be used to more completely and accurately identify the patients on the basis of text documents and laboratory and medication data.

Practice-based evidence (PBE) is an example of the second approach. PBE studies are observational cohort studies that attempt to mitigate the weaknesses traditionally associated with observational designs. This is accomplished by exhaustive attention to determining patient characteristics that may confound conclusions about the effectiveness of an intervention.

For example, observational studies might indicate that aerobic exercise is superior to nonaerobic exercise in preventing falls. But if the prescribers tend to order nonaerobic exercise for those who are more debilitated, severity of illness is a confounder and should be controlled in the analysis. PBE studies use large samples and diverse sources of patients to improve sample representativeness, power, and external validity. Generally there are 800 or more subjects, which is considerably more than in a typical RCT but far less than in a KDDM study. PBE uses approaches similar to community-based participatory research by including front-line clinicians and patients in the design, execution, and analysis of studies, as well as their data elements, to improve relevance to real-world practice. Finally, PBE uses detailed standardized structured documentation of interventions, which is ideally incorporated into the standard electronic documentation.
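The confounding scenario described above can be made concrete with a small worked calculation. The counts below are invented purely for illustration (they are not from any study), and the function names are hypothetical. The sketch shows how a crude comparison can favor aerobic exercise even when fall rates are identical within each severity stratum.

```python
# Hypothetical counts, invented for illustration only.
# Key: (severity stratum, exercise type) -> (patients, patients who fell)
counts = {
    ("mild", "aerobic"): (400, 20),
    ("mild", "nonaerobic"): (100, 5),
    ("severe", "aerobic"): (100, 30),
    ("severe", "nonaerobic"): (400, 120),
}


def fall_rate(pairs):
    """Return total falls divided by total patients for (patients, falls) pairs."""
    patients = sum(n for n, _ in pairs)
    falls = sum(f for _, f in pairs)
    return falls / patients


def crude_rate(exercise):
    """Fall rate for one exercise type, ignoring severity of illness."""
    return fall_rate([v for (_, ex), v in counts.items() if ex == exercise])


def stratum_rate(severity, exercise):
    """Fall rate for one exercise type within a single severity stratum."""
    return fall_rate([counts[(severity, exercise)]])


for exercise in ("aerobic", "nonaerobic"):
    print("crude", exercise, crude_rate(exercise))
for severity in ("mild", "severe"):
    for exercise in ("aerobic", "nonaerobic"):
        print(severity, exercise, stratum_rate(severity, exercise))
```

In this made-up dataset the crude rates differ only because nonaerobic exercise was prescribed mostly to the sicker group; stratifying by severity is one simple way to control for the confounder.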

This method requires training and quality control checks for reliability of the measures of the actual process of care. Statistical analysis involves determining bivariate and multivariate correlations among patient characteristics, intervention process steps, and outcomes. PBE can uncover best practices and combinations of treatments for specific types of patients while achieving many of the presumed advantages of RCTs, especially the presumed advantage that RCTs control for patient differences through randomization. Front-line clinicians treating the study patients lead the study design and analyses of the data prospectively based on clinical expertise, rather than relying on machines to detect patterns as in KDDM. The characteristics of KDDM and PBE are summarized in Table 4-1. Both techniques are detailed in the following sections.

Knowledge Discovery and Data Mining

KDDM is a process in which machine learning and statistical methods are applied to analyze large amounts of data. Frequently, the goal of analysis is to develop models that predict future events or infer missing information based on available data. Methods of KDDM are preferred for this type of endeavor because they are effective for analyzing very large repositories of clinical data and for analyzing complex, nonlinear relationships. Models developed on the basis of routinely collected clinical data are advantageous for several reasons:

1. KDDM models access and leverage the valuable information contained in large repositories of clinical data

2. Models can be developed from very large sample sizes or entire populations

3. Models based on routinely collected data can be implemented in computerized systems to support decision making for individual patients

4. Models induced directly from data using machine learning methods often perform better than models manually developed by human experts

For example, Walton and colleagues developed a model that forecasts an impending respiratory syncytial virus (RSV) outbreak.4 RSV is a virus that causes bronchiolitis in children, and severe cases warrant hospitalization. RSV outbreaks cause dramatic increases in census at pediatric hospitals, so advance warning of an impending RSV outbreak would allow pediatric hospitals to plan staffing and supplies. Some evidence indicates that weather is related to outbreaks of RSV and RSV outbreaks are known to follow a biennial pattern, information that may be useful for predicting outbreaks in advance. Given these circumstances the authors built a model using historical data that predicts RSV outbreaks up to 3 weeks in advance.

These types of models can be especially effective in designing clinical decision support (CDS) systems. CDS systems are computer applications that assist healthcare providers in making clinical decisions about patients and are explained in detail in Chapter 10. The design of individual CDS systems varies and can be as simple as an alert that warns about potential drug–drug interaction.5 Every CDS system is based on some underlying algorithm or rules and on existing or entered patient data. These rules must be specified in machine-readable code that is compatible with patient data stored in an EHR or other applications.

Historically, clinical practice guidelines have not been expressed as a set of adequately explicit rules and could not be executed by a machine. See, for example, Lyng and Pederson and Isern and Moreno for a detailed discussion of this issue.6,7 While a human being can reason on the basis of conditions such as “moderate improvement” or “diminished level of consciousness,” a machine cannot. CDS models must consist of rules, conditions, and dependencies described in terms of machine-interpretable relationships and specific data values. Moreover, the algorithms and rules must be executable over the data as they are coded in the information system.
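As a rough illustration of a rule expressed in terms of specific data values, the sketch below implements the simplest kind of CDS logic mentioned above, a drug–drug interaction alert. Everything in it is an assumption made for illustration: the drug names are placeholders rather than clinical facts, and `Patient`, `INTERACTING_PAIRS`, and `interaction_alerts` are hypothetical names, not part of any real CDS product.

```python
from dataclasses import dataclass, field

# Hypothetical interaction table. The drug names are placeholders, not
# clinical facts; a real system would draw on a curated knowledge base.
INTERACTING_PAIRS = {
    frozenset({"drug_a", "drug_b"}),
    frozenset({"drug_a", "drug_c"}),
}


@dataclass
class Patient:
    patient_id: str
    active_medications: set = field(default_factory=set)


def interaction_alerts(patient, new_drug):
    """Return active medications that interact with new_drug, sorted by name."""
    return sorted(
        med
        for med in patient.active_medications
        if frozenset({med, new_drug}) in INTERACTING_PAIRS
    )


patient = Patient("p1", {"drug_b", "drug_d"})
print(interaction_alerts(patient, "drug_a"))  # ['drug_b']
```

The rule works only because both the medication list and the interaction table use exact, shared identifiers; a free-text entry such as "the usual blood thinner" could not be matched this way.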

For example, gender may be included in a set of rules. If the rule is based on a gender variable coded with the values male, female, or unknown, it will not work in a system where gender is coded as 0, 1, 2, 3, or null, where 0 = male, 1 = female, 2 = transgender, 3 = unknown, and null = missing. While relatively simple changes could adapt the rule set for use in a system with different coding of gender, other variables pose a greater challenge. Some necessary variables may not exist as coded data in an information system or may be represented in a variety of ways that cannot be resolved as easily as gender can be.
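The "relatively simple changes" needed for the gender example can be sketched as a translation layer between the two coding schemes. The numeric codes come from the text; the function name and the decision to return `None` for categories the rule has no vocabulary for are assumptions made for this illustration, and a real project would settle that policy with clinicians.

```python
# Coding used by the receiving system, as described in the text.
# None stands in for a database null (missing value).
SYSTEM_CODES = {0: "male", 1: "female", 2: "transgender", 3: "unknown", None: "missing"}

# The only values the original rule was written to understand.
RULE_VALUES = {"male", "female", "unknown"}


def to_rule_value(code):
    """Translate a system gender code into a value the rule accepts.

    Returns None when the category has no equivalent in the rule's
    vocabulary, so the caller must handle it explicitly rather than
    have the rule silently misapplied.
    """
    if code not in SYSTEM_CODES:
        raise ValueError(f"unrecognized gender code: {code!r}")
    label = SYSTEM_CODES[code]
    return label if label in RULE_VALUES else None


for code in (0, 1, 2, 3, None):
    print(code, "->", to_rule_value(code))
```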

In recent years there has been a substantial effort to develop computer-interpretable guidelines—guidelines that are expressed as an adequately explicit set of rules—with some success.8 KDDM is also advantageous in this situation because it develops only machine-executable algorithms or rules, based on native data. Every model could potentially be used in a CDS system. Moreover, in situations where there is insufficient evidence to fully specify rules, the rules can be induced from a large sample of real-life examples using KDDM.
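To give a feel for what inducing a rule from examples means, the sketch below learns a single threshold rule of the form "value >= t" from labeled data. This is a deliberately minimal stand-in (a one-variable decision stump), not the method used in any study cited here, and the training values are invented for illustration.

```python
def induce_threshold_rule(examples):
    """Find the threshold t for which the rule value >= t best matches the labels.

    examples: list of (value, label) pairs, where label is True or False.
    Returns (threshold, training_accuracy). Ties keep the lowest threshold.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted({value for value, _ in examples}):
        correct = sum((value >= threshold) == label for value, label in examples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold, best_correct / len(examples)


# Hypothetical training data: (measurement, did the event occur?)
examples = [
    (1.0, False), (2.0, False), (2.5, False), (3.0, False),
    (3.5, True), (4.0, True), (4.5, True), (5.0, True),
]

threshold, accuracy = induce_threshold_rule(examples)
print(f"rule: value >= {threshold} (training accuracy {accuracy:.2f})")
```

The resulting rule is machine-executable by construction, since it is stated directly in terms of the data values it was learned from. Accuracy on the training examples says nothing about performance on new patients, which would need separate evaluation.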

Retrieving a Dataset for Analysis

The process of KDDM, depicted in Figure 4-1, encompasses multiple steps and actions. Data must first be extracted from the clinical data warehouse. To review, a data warehouse is a complex, vast collection of databases, and it is usually necessary to join a number of tables to construct a usable dataset for KDDM purposes. To accomplish this, investigators must collaborate closely with informatics specialists to develop effective queries: queries that select the clinical data relevant to the specific KDDM project with a sufficient but not overwhelming sample size.

To request the appropriate data, investigators and clinicians first need to understand how the concepts of interest are represented (coded) in the health IT product. In many health IT products, for example, laboratory tests are coded according to the standard Logical Observation Identifiers Names and Codes (LOINC) terminology.9 To ensure that the extracted dataset contains urinalysis data, it will be necessary to first determine how and where a urinalysis is coded. In the case of Veterans Health Administration (VHA) data, this may entail identification of the LOINC codes used to represent urinalysis results. For less precise concepts, such as mental health diagnoses, multiple codes may be relevant. Some data may not be structured and may be captured only in text documents such as discharge summaries. Extracting information from these documents is possible and represents an active area of research and development.10
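One way to picture the mapping from a clinical concept to the codes that represent it is a lookup from concept name to a set of codes. The identifiers below are placeholders, not real LOINC or diagnosis codes, and the names `CONCEPT_TO_CODES` and `rows_for_concept` are hypothetical.

```python
# Placeholder identifiers, NOT real terminology codes. A real project would
# look the codes up in the relevant terminology and the local data dictionary.
CONCEPT_TO_CODES = {
    "urinalysis": {"UA-CODE-1", "UA-CODE-2"},
    "mental_health_dx": {"MH-CODE-1", "MH-CODE-2", "MH-CODE-3"},
}


def rows_for_concept(rows, concept):
    """Keep only the rows whose code belongs to the concept's code set."""
    codes = CONCEPT_TO_CODES[concept]
    return [row for row in rows if row["code"] in codes]


lab_rows = [
    {"patient_id": 1, "code": "UA-CODE-1", "value": "negative"},
    {"patient_id": 1, "code": "OTHER-CODE", "value": "7.1"},
    {"patient_id": 2, "code": "UA-CODE-2", "value": "trace"},
]

urinalysis_rows = rows_for_concept(lab_rows, "urinalysis")
print([row["patient_id"] for row in urinalysis_rows])  # [1, 2]
```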

Queries written in a specialized programming language (Structured Query Language or SQL) are used to retrieve data from the data warehouse according to a researcher's specifications. Currently, investigators and healthcare organization IT personnel collaborate to develop effective queries. The code used to execute the query is saved as a file and can be reused later or rerun on a scheduled basis. In some cases healthcare organizations opt to support ongoing investigator data needs by creating separate repositories of aggregated, processed clinical data that relate to a particular clinical domain. In the VHA, investigators in infectious disease have developed procedures to aggregate a specialized set of nationwide patient data related to methicillin-resistant Staphylococcus aureus (MRSA).11 These specialized repositories of data can be more readily analyzed on an ongoing basis to support quality improvement, health services research, and clinical research.
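A minimal sketch of such a query, using Python's built-in `sqlite3` module and an in-memory database, might look like the following. The table names, column names, and test codes are invented for illustration; a real clinical data warehouse will have its own schema and will usually require joins across many more tables.

```python
import sqlite3

# Build a tiny in-memory stand-in for a warehouse (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, birth_year INTEGER);
CREATE TABLE lab_result (
    result_id INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES patient(patient_id),
    test_code TEXT,
    value REAL,
    collected_on TEXT
);
INSERT INTO patient VALUES (1, 1950), (2, 1975);
INSERT INTO lab_result VALUES
    (10, 1, 'CODE-A', 1.2, '2019-01-05'),
    (11, 1, 'CODE-B', 7.0, '2019-01-05'),
    (12, 2, 'CODE-A', 0.9, '2019-02-11');
""")

# The query text can be saved to a file and rerun later or on a schedule.
EXTRACT_QUERY = """
SELECT p.patient_id, p.birth_year, r.test_code, r.value, r.collected_on
FROM patient AS p
JOIN lab_result AS r ON r.patient_id = p.patient_id
WHERE r.test_code = ?
ORDER BY p.patient_id, r.collected_on
"""


def extract_dataset(connection, test_code):
    """Join patients to their lab results and keep one test code."""
    return connection.execute(EXTRACT_QUERY, (test_code,)).fetchall()


rows = extract_dataset(conn, "CODE-A")
for row in rows:
    print(row)
```

Passing the test code as a bound parameter (`?`) rather than pasting it into the query string keeps the saved query reusable for other codes.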

The amount of data retrieved from clinical data warehouses can be enormous, especially when data originate from multiple sites. Investigators will want to define a sampling plan that limits the number of selected records, according to the needs of the study or project. For KDDM, it may not be possible to import the data fully into analytic software as a single flat file. Fortunately, many statistical and machine learning software packages can be used to analyze data contained within an SQL database. For example, SAS Enterprise Miner can be used to analyze data within an SQL database using Open Database Connectivity (ODBC).12 Clinicians or investigators who are new to KDDM should plan to collaborate with statistical and informatics personnel to plan an optimal approach.
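The two ideas in this paragraph, limiting the records selected and analyzing data where it lives instead of exporting a flat file, can be sketched with the same `sqlite3` module. This is only an illustration under assumed names (`encounter`, `length_of_stay`) and synthetic values; it does not show how SAS Enterprise Miner or any other package connects to a database.

```python
import sqlite3


def build_demo_db(n_rows=1000):
    """Create an in-memory table of synthetic encounters (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE encounter (encounter_id INTEGER PRIMARY KEY, length_of_stay REAL)"
    )
    conn.executemany(
        "INSERT INTO encounter VALUES (?, ?)",
        ((i, float(i % 10)) for i in range(n_rows)),
    )
    return conn


def random_sample(conn, sample_size):
    """Draw a simple random sample of rows inside the database."""
    return conn.execute(
        "SELECT encounter_id, length_of_stay FROM encounter "
        "ORDER BY RANDOM() LIMIT ?",
        (sample_size,),
    ).fetchall()


def mean_length_of_stay(conn):
    """Aggregate in the database so the full table never has to be exported."""
    (mean,) = conn.execute("SELECT AVG(length_of_stay) FROM encounter").fetchone()
    return mean


conn = build_demo_db()
sample = random_sample(conn, 50)
print(len(sample), "sampled rows; mean length of stay:", mean_length_of_stay(conn))
```

Sorting by `RANDOM()` is simple but sorts the whole table, so for very large tables a sampling plan worked out with statistical and informatics personnel would likely use a more efficient approach.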
