Discovery phase in data analytics

27/10/2021

Data Analytics Lifecycle

Read Chapter 2 - Data Analytics Lifecycle and answer the following questions.
1. In which phase would the team expect to invest most of the project time? Why? Where would the team expect to spend the least time?

2. What are the benefits of doing a pilot program before a full-scale rollout of a new analytical methodology? Discuss this in the context of the mini case study.

3. What kinds of tools would be used in the following phases, and for which kinds of use scenarios?

a. Phase 2: Data preparation
b. Phase 4: Model building

- Typed in a word document.
- Each question should be answered in not less than 150 - 200 words.
- Follow APA format.
- Please include at least three (3) reputable sources.

Data Analytics Lifecycle

Module 2: Data Analytics Lifecycle

Copyright © 2014 EMC Corporation. All Rights Reserved.

Data Analytics Lifecycle

Upon completion of this lesson, you should be able to:

• Apply the Data Analytics Lifecycle to a case study scenario
• Frame a business problem as an analytics problem
• Identify the four main deliverables in an analytics project


How to Approach Your Analytics Problem

• How do you currently approach your analytics problem?
• Do you follow a methodology or some kind of framework?
• How do you plan for an analytic project?


Your Thoughts?


Value of Using the Data Analytics Lifecycle

• Focus your time
• Ensure rigor and completeness
• Enable better transition to members of the cross-functional analytic teams
  - Repeatable
  - Scale to additional analysts
  - Support validity of findings

"A journey of a thousand miles begins with a single step" (Lao Tzu)

Need for a Process to Guide Data Science Projects

1. Well-defined processes can help guide any analytic project
2. Focus of Data Analytics Lifecycle is on Data Science projects, not business intelligence
3. Data Science projects tend to require a more consultative approach, and differ in a few ways
  - More due diligence in Discovery phase
  - More projects which lack shape or structure
  - Less predictable data

Key Roles for a Successful Analytic Project

Role: Description

Business User: Someone who benefits from the end results and can consult and advise the project team on the value of end results and how these will be operationalized

Project Sponsor: Person responsible for the genesis of the project, providing the impetus for the project and core business problem; generally provides the funding and will gauge the degree of value from the final outputs of the working team

Project Manager: Ensures key milestones and objectives are met on time and at expected quality

Business Intelligence Analyst: Business domain expertise with deep understanding of the data, KPIs, key metrics and business intelligence from a reporting perspective

Data Engineer: Deep technical skills to assist with tuning SQL queries for data management and extraction, and to support data ingest to the analytic sandbox

Database Administrator (DBA): Database Administrator who provisions and configures the database environment to support the analytical needs of the working team

Data Scientist: Provides subject matter expertise for analytical techniques and data modeling, applying valid analytical techniques to given business problems and ensuring overall analytical objectives are met

Data Analytics Lifecycle

Phases (shown on the slide as an iterative cycle): Discovery, Data Preparation, Model Planning, Model Building, Communicate Results, Operationalize

Questions to ask before moving between phases:

• Do I have enough information to draft an analytic plan and share for peer review?
• Do I have enough good quality data to start building the model?
• Do I have a good idea about the type of model to try? Can I refine the analytic plan?
• Is the model robust enough? Have we failed for sure?






Data Analytics Lifecycle Phase 1: Discovery


• Learn the Business Domain
  - Determine amount of domain knowledge needed to orient you to the data and interpret results downstream
  - Determine the general analytic problem type (such as clustering, classification)
  - If you don't know, then conduct initial research to learn about the domain area you'll be analyzing

• Learn from the past
  - Have there been previous attempts in the organization to solve this problem?
  - If so, why did they fail? Why are we trying again? How have things changed?



• Resources
  - Assess available technology
  - Available data: sufficient to meet your needs
  - People for the working team
  - Assess scope of time for the project in calendar time and person-hours
  - Do you have sufficient resources to attempt the project? If not, can you get more?




• Frame the problem: framing is the process of stating the analytics problem to be solved
  - State the analytics problem, why it is important, and to whom
  - Identify key stakeholders and their interests in the project
  - Clearly articulate the current situation and pain points
  - Objectives: identify what needs to be achieved in business terms and what needs to be done to meet the needs
  - What is the goal? What are the criteria for success? What's "good enough"?
  - What is the failure criterion (when do we just stop trying or settle for what we have)?
  - Identify the success criteria, key risks, and stakeholders (such as RACI)



Tips for Interviewing the Analytics Sponsor

• Even if you are "given" an analytic problem, you should work with clients to clarify and frame the problem
  - You're typically handed solutions; you need to identify the problem and their desired outcome

Sponsor Interview Tips

• Prepare for the interview: draft your questions, review with colleague, team
• Use open-ended questions, don't ask leading questions
• Probe for details, follow up
• Don't fill every silence: give them time to think
• Let them express their ideas, don't put words in their mouth, let them share their feelings
• Ask clarifying questions, ask why: is that correct? Am I on target? Is there anything else?
• Use active listening: repeat it back to make sure you heard it correctly
• Don't express your opinions
• Be mindful of your body language and theirs: use eye contact, be attentive
• Minimize distractions
• Document what you heard and review it back with the sponsor


Interview Questions

• What is the business problem you're trying to solve?
• What is your desired outcome?
• Will the focus and scope of the problem change if the following dimensions change:
  • Time: analyzing 1 year or 10 years' worth of data?
  • People: how would this project change this?
  • Risk: conservative to aggressive
  • Resources: none to unlimited (tools, tech, ...)
  • Size and attributes of Data
• What data sources do you have?
• What industry issues may impact the analysis?
• What timelines are you up against?
• Who could provide insight into the project? Consulted?
• Who has final say on the project?


• Formulate Initial Hypotheses
  - IH, H1, H2, H3, ... Hn
  - Gather and assess hypotheses from stakeholders and domain experts
  - Preliminary data exploration to inform discussions with stakeholders during the hypothesis forming stage

• Identify Data Sources: Begin Learning the Data
  - Aggregate sources for previewing the data and provide high-level understanding
  - Review the raw data
  - Determine the structures and tools needed
  - Scope the kind of data needed for this kind of problem
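As a rough illustration of how preliminary data exploration can feed the hypothesis discussion, the sketch below compares churn rates between two customer groups with a two-proportion z-test. The hypothesis itself (for example, "customers who filed a complaint churn more often") and all counts are invented for illustration; the slides do not prescribe any particular test.

```python
import math

def two_proportion_z_test(churned_a, total_a, churned_b, total_b):
    """Compare the churn rates of group A and group B.

    Returns (z, two_sided_p_value) under the normal approximation.
    Assumes both groups are non-empty and the pooled rate is not 0 or 1.
    """
    p_a = churned_a / total_a
    p_b = churned_b / total_b
    pooled = (churned_a + churned_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 90 of 500 complainers churned vs. 50 of 500 others.
z, p = two_proportion_z_test(90, 500, 50, 500)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value here would only suggest that the hypothesis is worth carrying into the later phases; it is not a substitute for the modeling work.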



Using a Sample Case Study to Track the Phases in the Data Analytics Lifecycle

Mini Case Study: Churn Prediction for Yoyodyne Bank

Situation Synopsis

• Retail Bank, Yoyodyne Bank wants to improve the Net Present Value (NPV) and retention rate of customers
• They want to establish an effective marketing campaign targeting customers to reduce the churn rate by at least five percent
• The bank wants to determine whether those customers are worth retaining. In addition, the bank also wants to analyze reasons for customer attrition and what they can do to keep them
• The bank wants to build a data warehouse to support Marketing and other related customer care groups
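The two measures named in the synopsis, Net Present Value and churn rate, can be made concrete with a little arithmetic. The sketch below is illustrative only: the cash flows, discount rate, and customer counts are made-up numbers, and it assumes the "five percent" reduction is relative to the current churn rate (the synopsis does not say whether it is relative or in percentage points).

```python
def npv(rate, cash_flows):
    """Net Present Value of cash_flows, where cash_flows[t] arrives
    t periods from now (t = 0 is today) and rate is the per-period
    discount rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def churn_rate(customers_at_start, customers_lost):
    """Fraction of the starting customer base lost during the period."""
    return customers_lost / customers_at_start

def meets_churn_target(baseline_rate, new_rate, relative_reduction=0.05):
    """True if new_rate is at least `relative_reduction` below baseline_rate.
    Assumption: the target is a relative reduction, not percentage points."""
    return new_rate <= baseline_rate * (1 - relative_reduction)

# Hypothetical customer: costs 100 to retain today, returns 60 in each
# of the next two years, discounted at 10% per year.
print(round(npv(0.10, [-100, 60, 60]), 2))

# Hypothetical campaign: churn falls from 12% to 11%.
baseline = churn_rate(10_000, 1_200)
print(meets_churn_target(baseline, 0.11))
```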


How to Frame an Analytics Problem

Mini Case Study: Churn Prediction for Yoyodyne Bank

Sample Business Problems

• How can we improve on x?
• What's happening real-time? Trends?
• How can we use analytics to differentiate ourselves?
• How can we use analytics to innovate?
• How can we stay ahead of our biggest competitor?

Qualifiers

Will the focus and scope of the problem change if the following dimensions change:

• Time
• People: how would x change this?
• Risk: conservative/aggressive
• Resources: none/unlimited
• Size of Data?

Analytical Approach

Define an analytical approach, including key terms, metrics, and data needed.

Yoyodyne Bank

Business Problem: How can we improve Net Present Value (NPV) and retention rate of the customers?

Qualifiers:

• Time: Trailing 5 months
• People: Working team and business users from the Bank
• Risk: the project will fail if we cannot determine valid predictors of churn
• Resources: EDW, analytic sandbox, OLTP system
• Data: Use 24 months for the training set, then analyze 5 months of historical data for those customers who churned

Analytical Approach: How do we identify churn/no churn for a customer? Pilot study followed by full scale analytical model
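Answering "How do we identify churn/no churn for a customer?" requires picking an explicit rule and applying it consistently. The sketch below assumes a placeholder rule (a customer counts as churned if the account was closed on or before the snapshot date) and hypothetical argument names; the slides do not define churn, so the rule would need to be agreed with the sponsor. The 24-month window comes from the Data qualifier of the case study.

```python
from datetime import date

def months_between(start, end):
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def label_churn(closed_on, snapshot):
    """Assumed rule: churned if the account was closed by the snapshot date.
    closed_on is None for accounts that are still open."""
    return closed_on is not None and closed_on <= snapshot

def in_training_window(event_date, snapshot, window_months=24):
    """True if event_date falls in the window_months ending at snapshot."""
    age = months_between(event_date, snapshot)
    return 0 <= age < window_months

snapshot = date(2013, 12, 31)  # hypothetical analysis date
print(label_churn(date(2013, 5, 1), snapshot))          # closed account
print(label_churn(None, snapshot))                      # still open
print(in_training_window(date(2012, 1, 15), snapshot))  # inside 24 months
```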


Data Analytics Lifecycle Phase 2: Data Preparation


• Prepare Analytic Sandbox
  - Work space for the analytic team
  - 10x+ vs. EDW

• Perform ELT
  - Determine needed transformations
  - Assess data quality and structuring
  - Derive statistically useful measures
  - Determine and establish data connections for raw data
  - Execute Big ELT and/or Big ETL

• Useful Tools for this phase:
  • For Data Transformation & Cleansing: SQL, Hadoop, MapReduce, Alpine Miner
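The difference between ELT and ETL is the order of the steps: with ELT the raw data is loaded into the sandbox first and transformed there, so the untouched records remain available for later analysis. A minimal sketch of that ordering, using Python's built-in sqlite3 module as a stand-in for the sandbox database; the table names, column names, and sample rows are invented for illustration.

```python
import sqlite3

def run_elt(raw_rows):
    """Load raw (customer_id, amount) text rows, then transform in-database.

    Returns (per-customer totals, number of raw rows still stored).
    """
    conn = sqlite3.connect(":memory:")
    # Extract + Load: keep the raw records exactly as received.
    conn.execute("CREATE TABLE raw_txn (customer_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_txn VALUES (?, ?)", raw_rows)
    # Transform: clean and aggregate inside the database, in a new table.
    conn.execute(
        """
        CREATE TABLE customer_totals AS
        SELECT TRIM(customer_id)          AS customer_id,
               SUM(CAST(amount AS REAL))  AS total_amount,
               COUNT(*)                   AS n_txn
        FROM raw_txn
        WHERE amount IS NOT NULL AND TRIM(amount) <> ''
        GROUP BY TRIM(customer_id)
        """
    )
    totals = conn.execute(
        "SELECT customer_id, total_amount, n_txn "
        "FROM customer_totals ORDER BY customer_id"
    ).fetchall()
    raw_count = conn.execute("SELECT COUNT(*) FROM raw_txn").fetchone()[0]
    conn.close()
    return totals, raw_count

rows = [(" c1", "10.5"), ("c1 ", "4.5"), ("c2", ""), ("c2", "3")]
totals, raw_count = run_elt(rows)
print(totals)
print(raw_count)
```

Note that the row with an empty amount is excluded from the transformed table but is still present in raw_txn, which is the point of loading before transforming.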



• Familiarize yourself with the data thoroughly
  - List your data sources
  - What's needed vs. what's available

• Data Conditioning
  - Clean and normalize data
  - Discern what you keep vs. what you discard

• Survey & Visualize
  - Overview, zoom & filter, details-on-demand
  - Descriptive Statistics
  - Data Quality

• Useful Tools for this phase:
  • Descriptive Statistics on candidate variables for diagnostics & quality
  • Visualization: R (base package, ggplot and lattice), GnuPlot, Ggobi/Rggobi, Spotfire,
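Descriptive statistics and normalization of a candidate variable can be sketched with the Python standard library alone. This is one possible approach, not the one the slides prescribe: the sketch assumes missing values are represented as None and uses min-max scaling as the normalization method, and the sample balances are invented.

```python
import statistics

def describe(values):
    """Basic diagnostics for one numeric variable; None marks a missing value.
    Assumes at least two non-missing values (needed for the sample stdev)."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
        "min": min(present),
        "max": max(present),
    }

def min_max_normalize(values):
    """Rescale non-missing values to the range [0, 1]; missing values stay None."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - lo) / (hi - lo) for v in values]

balances = [10, None, 20, 30]  # hypothetical account balances
print(describe(balances))
print(min_max_normalize(balances))
```

The "missing" count is the kind of data-quality signal that helps decide what to keep and what to discard.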




Data Analytics Lifecycle Phase 3: Model Planning


• Determine Methods
  - Select methods based on hypotheses, data structure and volume
  - Ensure techniques and approach will meet business objectives

• Techniques & Workflow
  - Candidate tests and sequence
  - Identify and document modeling assumptions

• Useful Tools for this phase: R/PostgreSQL, SQL Analytics, Alpine Miner, SAS/ACCESS, SPSS/ODBC





• Data Exploration

• Variable Selection
  - Inputs from stakeholders and domain experts
  - Capture essence of the predictors, leverage a technique for dimensionality reduction
  - Iterative testing to confirm the most significant variables

• Model Selection
  - Conversion to SQL or database language for best performance
  - Choose technique based on the end goal
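A very simple first pass at variable selection is to rank candidate predictors by the strength of their linear association with the target. The sketch below uses absolute Pearson correlation as that score. It is a screening filter rather than a dimensionality-reduction technique in the strict sense, and the variable names and values are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.
    Returns 0.0 when either sequence is constant (correlation undefined)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def rank_variables(columns, target):
    """Return (name, |correlation with target|) pairs, strongest first."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical candidate variables for four customers; 1 = churned.
columns = {
    "complaints": [0, 1, 2, 3],
    "tenure_years": [4, 2, 3, 1],
    "branch_id": [1, 1, 1, 1],
}
churned = [0, 0, 1, 1]
for name, score in rank_variables(columns, churned):
    print(f"{name}: {score:.3f}")
```

The ranking would then be reviewed with stakeholders and domain experts and retested iteratively, as the bullets describe, since correlation alone misses non-linear effects and interactions.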



Sample Research: Churn Prediction in Other Verticals

Mini Case Study: Churn Prediction for Yoyodyne Bank

Market Sector: Analytic Techniques/Methods Used

Wireless Telecom: DMEL method (data mining by evolutionary learning)
Retail Business: Logistic regression, ARD (automatic relevance determination), decision tree
Daily Grocery: MLR (multiple linear regression), ARD, and decision tree
Wireless Telecom: Neural network, decision tree, hierarchical neurofuzzy systems, rule evolver
Retail Banking: Multiple regression
Wireless Telecom: Logistic regression, neural network, decision tree

• After conducting research on churn prediction, you have identified many methods for analyzing customer churn across multiple verticals (those in bold are taught in this course)
• At this point, a Data Scientist would assess the methods and select the best model for the situation
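Logistic regression appears in several rows of the table, so it is a natural candidate to try first for churn. The sketch below fits a logistic regression by plain batch gradient descent on a tiny invented data set. It is meant only to show the mechanics: the learning rate, epoch count, and data are assumptions, and a real project would use one of the tools named in the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Estimated probability that the customer with features x churns."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def fit_logistic(X, y, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(weights, bias, xi) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias

# Hypothetical data: one feature (number of complaints), label 1 = churned.
X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = fit_logistic(X, y)
print(predict_proba(weights, bias, [0]) < 0.5)
print(predict_proba(weights, bias, [5]) > 0.5)
```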


Data Analytics Lifecycle Phase 4: Model Building


• Develop data sets for testing, training, and production purposes
  - Need to ensure that the model data is sufficiently robust for the model and analytical techniques
  - Smaller test sets for validating approach, training set for initial experiments
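Separating training data from test data can be sketched as a seeded random split, so that the same rows land in the same set on every run. The 80/20 proportion and the seed are assumptions chosen for illustration; the slides do not specify how the sets should be sized.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows reproducibly and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

customer_ids = range(100)  # hypothetical customer identifiers
train, test = train_test_split(customer_ids)
print(len(train), len(test))
```

Fixing the seed keeps experiments comparable: a change in model performance can then be attributed to the model rather than to a different split.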

