Singapore Management University
Institutional Knowledge at Singapore Management University
Research Collection Lee Kong Chian School Of Business
Lee Kong Chian School of Business
10-2016
Big data and data science methods for management research: From the Editors
Gerard GEORGE, Singapore Management University, ggeorge@smu.edu.sg
Ernst C. OSINGA, Singapore Management University, ecosinga@smu.edu.sg
Dovev LAVIE, Technion
Brent A. SCOTT, Michigan State University
DOI: https://doi.org/10.5465/amj.2016.4005
Citation: GEORGE, Gerard; OSINGA, Ernst C.; LAVIE, Dovev; and SCOTT, Brent A. (2016). Big data and data science methods for management research: From the Editors. Academy of Management Journal, 59(5), 1493-1507. Research Collection Lee Kong Chian School Of Business. Available at: https://ink.library.smu.edu.sg/lkcsb_research/4964
FROM THE EDITORS
BIG DATA AND DATA SCIENCE METHODS FOR MANAGEMENT RESEARCH
Published in Academy of Management Journal, October 2016, 59 (5), pp. 1493-1507.
http://doi.org/10.5465/amj.2016.4005
The recent advent of remote sensing, mobile technologies, novel transaction systems, and
high-performance computing offers opportunities to understand trends, behaviors, and actions in
ways that were not previously possible. Researchers can thus leverage 'big data' generated from
a plurality of sources, including mobile transactions, wearable technologies, social media,
ambient networks, and business transactions. An earlier AMJ editorial explored the potential
implications of data science for management research, highlighting questions for management
scholarship and the attendant challenges of data sharing and privacy (George, Haas & Pentland,
2014). This nascent field is evolving so rapidly that scholars and practitioners alike are still
trying to make sense of the emerging opportunities that big data offer. With the promise of big
data come questions about the analytical value, and thus the relevance, of these data for theory
development, including concerns about their context-specific relevance, reliability, and
validity.
To address this challenge, data science is emerging as an interdisciplinary field that
combines statistics, data mining, machine learning, and analytics to understand and explain how
analytical insights and prediction models can be generated from structured and unstructured big
data. Data science emphasizes the systematic study of the organization, properties, and analysis
of data and their role in inference, including our confidence in the inference (Dhar, 2013).
Although the terms big data and data science are often used interchangeably, big data concerns
collecting and managing large, varied data, whereas data science develops models that capture,
visualize, and
analyze the underlying patterns to develop business applications. In this editorial, we address
both the collection and handling of big data and the analytical tools provided by data science for
management scholars.
Practitioners currently suggest that data science applications tackle the three
core elements of big data: volume, velocity, and variety (McAfee & Brynjolfsson, 2012;
Zikopoulos & Eaton, 2011). Volume represents the sheer size of a dataset, which results from
aggregating a large number of variables and an even larger set of observations for each
variable. Velocity reflects the speed at which data are collected and analyzed, in real time or
near real time, from sensors, sales transactions, social media posts, and sentiment data on
breaking news and social trends. Variety comes from the plurality of structured and
unstructured data sources, such as text, videos, networks, and graphics. Together, volume,
velocity, and variety make it a complex task to generate knowledge from big data, which often
run to millions of observations, and to derive theoretical contributions from such data. In this
editorial, we provide a primer, or a “starter kit,” for potential data science applications in
management research. We do so with the caveat that in emerging fields, methodologies are quickly
improved upon, outdated, and often supplanted by new applications.
Nevertheless, this primer can guide management scholars who wish to use data science
techniques to reach better answers to existing questions or explore completely new research
questions.
BIG DATA, DATA SCIENCE, AND MANAGEMENT THEORY
Big data and data science have potential as new tools for developing management theory,
but because they differ from the data collection and analytical techniques that scholars learn
in doctoral training, adopting these new practices will require additional effort and skill. The
current model of management research is post hoc analysis, wherein scholars analyze data
collected after an event has occurred; a manuscript is drafted months or years after the original
data are collected. Therefore, velocity, or the real-time application that matters for
management practice, is not a critical concern for management scholars in the current research
paradigm. However, data volume and data variety hold potential for scholarly research. In
particular, these two elements of big data can be translated into data scope and data
granularity for management research.
Data Scope. Building on the notion of volume, data scope refers to the
comprehensiveness of the data with which a phenomenon can be examined. Scope could imply a wide
range of variables, entire populations rather than samples, or numerous observations on each
participant. By increasing the number of observations, greater data scope can shift the analysis
from samples to populations. Thus, instead of focusing on sample selection and its biases,
researchers could potentially collect data on the complete population. Within organizations,
many employers collect data on their employees, and more of these data are being digitized and
made accessible. These data include email communication, office entry and exit, RFID tagging,
wearable sociometric sensors, web browsing, and phone calls, which enable researchers to tap into
large databases on employee behavior on a continuous basis. Researchers have begun to examine the
utility and psychometric properties of such data collection methods, which is critical if these
methods are to be incorporated into and tied to existing theories and literatures. For example,
Chaffin et al. (in press) examined the feasibility of using wearable sociometric sensors to detect
structure within a social network; these sensors use Bluetooth to measure physical proximity, an
infrared detector to assess face-to-face positioning between actors, and a microphone to capture
verbal activity. As another example, researchers have begun to analyze large samples of language
(e.g., individuals’ posts on social media) as an unobtrusive way to assess personality (Park et
al., 2015). Given changes in workplace design, communication patterns, and performance feedback
mechanisms, we have called for research on how businesses are harnessing technologies and data
to shape employee experience and talent management systems (Colbert, Yee & George, 2016;
Gruber, Leon, Thompson & George, 2015).
Data Granularity. Following the notion of variety, we define data granularity as the
most theoretically proximal measurement of a phenomenon or unit of analysis. Granularity
implies direct measurement of the constituent characteristics of a construct rather than distal
inferences from data. For example, in a study of employee stress, granular data would capture
emotions through facial recognition or biometrics such as elevated heart rate, measured minute by
minute on the job or task, rather than relying on surveys or respondent interviews. In
experience-sampling studies on well-being, researchers have begun to incorporate portable blood
pressure monitors. For instance, in a 3-week experience-sampling study, Bono, Glomb, Shen, Kim,
and Koch (2013) had employees wear ambulatory blood pressure monitors that recorded measurements
every 30 minutes for two hours in the morning, afternoon, and evening. Similarly, Ilies,
Dimotakis, and DePater (2010) used blood pressure monitors in a field setting to record
employees’ blood pressure at the end of each workday over a two-week period. Haas, Criscuolo and
George (2015) analyzed message posts, deriving meaning from their words, to predict whether
individuals were likely to contribute to problem solving and knowledge sharing across
organizational units. Researchers in other areas can also increase granularity. In network
analysis, for instance, researchers can monitor communication patterns among employees instead of
retrospectively asking employees with whom they interact or from whom they seek advice. Such data
were previously collected using surveys and indirect observation, but with big data the unit of
analysis shifts from individual employees to messages and physical interactions. Although such
efforts already appear in studies of smaller samples of emails or messages posted on a network
(e.g., Haas, Criscuolo & George, 2015), organization-wide efforts are likely to provide clearer
and more holistic representations of networks, communications, friendships, advice giving and
taking, and information flows (van Knippenberg, Dahlander, Haas & George, 2015).
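A minimal sketch of this shift in the unit of analysis, assuming a hypothetical message log in which each record lists a sender and its recipients: messages are aggregated into weighted, directed ties, so that a tie's weight is the number of messages sent from one person to another. The record layout, the function names, and the employee names are illustrative assumptions, not the procedure used in the studies cited above.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Message(NamedTuple):
    """One hypothetical message record; real exports would carry more fields."""
    sender: str
    recipients: tuple[str, ...]


def build_network(messages: Iterable[Message]) -> Counter:
    """Aggregate message-level records into weighted, directed ties.

    Each (sender, recipient) pair is counted once per message, so a tie's
    weight equals the number of messages that flowed along it.
    """
    ties: Counter = Counter()
    for msg in messages:
        for recipient in set(msg.recipients):  # ignore duplicate addressees
            if recipient != msg.sender:        # drop self-addressed copies
                ties[(msg.sender, recipient)] += 1
    return ties


def out_degree(ties: Counter) -> dict[str, int]:
    """Number of distinct colleagues each person sends messages to."""
    return dict(Counter(sender for (sender, _recipient) in ties))


if __name__ == "__main__":
    log = [
        Message("ana", ("ben", "chen")),
        Message("ben", ("ana",)),
        Message("ana", ("ben",)),
        Message("chen", ("chen", "ana")),
    ]
    network = build_network(log)
    print(network[("ana", "ben")])  # 2
    print(out_degree(network))
```

Counting ties per message, rather than per survey response, is what makes tie strength observable here; a real study would also need to decide how to treat broadcast messages and mailing lists.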
Better Answers and New Questions
Together, data scope and data granularity allow management scholars to develop new
questions and new theories, and to potentially generate better answers to established questions.
In Figure 1, we portray a stylized model of how data scope and data granularity could
productively inform management research.
Better Answers to Existing Questions. Data science techniques enable researchers to obtain more
immediate and accurate results when testing existing theories. In doing so, they may yield more
accurate estimates of effect sizes and their contingencies. Over the past decade, management
research has begun to emphasize effect sizes, although this emphasis on precision is more typical
of strategy research than of behavioral studies. With data science techniques, effect sizes will
likely be estimated more precisely, with narrower confidence intervals, which can reveal nuances
in moderating effects that could not previously be identified or estimated effectively.
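One way to see why greater data scope tightens effect-size estimates is the confidence interval for a correlation coefficient. The sketch below uses the standard Fisher z-transformation, whose standard error is 1/sqrt(n - 3), so the interval narrows roughly with the square root of the sample size. The correlation of 0.10 and the sample sizes are illustrative assumptions, not results from any study.

```python
import math
from statistics import NormalDist


def correlation_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Confidence interval for a Pearson correlation via Fisher's z-transformation."""
    if n <= 3:
        raise ValueError("need n > 3 for the Fisher z standard error")
    z = math.atanh(r)                             # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)                   # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.tanh(z - crit * se), math.tanh(z + crit * se)  # back to the r scale


if __name__ == "__main__":
    r = 0.10  # hypothetical small effect
    for n in (200, 20_000, 2_000_000):
        lo, hi = correlation_ci(r, n)
        print(f"n={n:>9,}: 95% CI [{lo:.4f}, {hi:.4f}], width {hi - lo:.4f}")
```

With r = 0.10, the 95% interval at n = 200 still spans zero, whereas at n = 20,000 it excludes zero and is roughly a tenth as wide. This narrows sampling error only; it does not address the reliability and validity concerns raised earlier.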
Better answers could also come from establishing clearer causal mechanisms. For
instance, network studies rely on informant surveys to assess friendship and advice ties, but
these studies lack a temporal dimension, which makes it difficult to determine whether network
structure drives behavior or vice versa. In contrast, continuously collecting email
communications or other forms of exchange would enable researchers to measure networks and
behavior dynamically, and thus to assess cause and effect more systematically.
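To illustrate how time-stamped exchanges support this kind of analysis, the sketch below buckets hypothetical (sender, recipient, day) records into fixed windows, counts each person's distinct contacts per window, and pairs that count with an outcome observed in the following window. The 7-day window, the record layout, the outcome dictionary, and the function names are assumptions for illustration. Ordering measures in time establishes temporal precedence, which is necessary but not by itself sufficient for a causal claim.

```python
from collections import defaultdict


def contacts_by_window(
    events: list[tuple[str, str, int]], window_days: int = 7
) -> dict[tuple[str, int], int]:
    """Count distinct contacts per person in consecutive time windows.

    `events` holds (sender, recipient, day) tuples with an integer day index.
    Ties are treated as undirected within each window.
    """
    contacts: dict[tuple[str, int], set[str]] = defaultdict(set)
    for sender, recipient, day in events:
        window = day // window_days
        contacts[(sender, window)].add(recipient)
        contacts[(recipient, window)].add(sender)
    return {key: len(peers) for key, peers in contacts.items()}


def lagged_pairs(
    degree: dict[tuple[str, int], int], outcome: dict[tuple[str, int], float]
) -> list[tuple[str, int, int, float]]:
    """Pair network size in window t with an outcome observed in window t + 1."""
    pairs = []
    for (person, window), size in sorted(degree.items()):
        later = outcome.get((person, window + 1))
        if later is not None:  # skip people without a next-window outcome
            pairs.append((person, window, size, later))
    return pairs


if __name__ == "__main__":
    events = [("ana", "ben", 0), ("ana", "chen", 2), ("ben", "chen", 8), ("ana", "ben", 9)]
    performance = {("ana", 1): 4.0, ("ben", 1): 3.5, ("ana", 2): 5.0}  # hypothetical ratings
    print(lagged_pairs(contacts_by_window(events), performance))
```

The resulting (person, window, network size, later outcome) rows form a simple panel that could then be analyzed with whatever lagged model a study's design calls for.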
Although rare event modeling is uncommon in management research, data science
techniques could shed more light on, for example, organizational responses to disasters, the
probability of failure, at-risk behavior, and systemic resilience (van der Vegt, Essens &
Wahlstrom, 2015). Research on rare events can use car accident data, for instance, to analyze the
role of driver experience in the seconds leading up to an accident and to model how previous
behaviors predict reaction times and responses. Insurance companies now routinely use such data
to price insurance coverage, but this type of data could also be useful for modeling
individual-level risk propensity, aggressiveness, or even avoidance behaviors. At an aggregate
level, data science approaches, such as using sensors to capture driving behaviors like speeding
and sudden stopping, go beyond observing accidents and can therefore generate a better
understanding of why they occur. Such data allow cities to plan traffic flows, map road rage or
accident hot spots, and avert congestion; they also allow researchers to connect commuting to
timeliness at work and to examine the positive or negative effects of commuting sentiment on
workplace behaviors.
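As a rough illustration of how sensor streams yield behavioral measures, the sketch below derives two indicators from speed readings sampled at a fixed interval: the share of samples above a speed limit and the moments of sudden deceleration. The sampling interval, the 3 m/s² braking threshold, and the function names are hypothetical choices, not values drawn from insurance practice or the cited literature.

```python
def share_over_limit(speeds_kmh: list[float], limit_kmh: float) -> float:
    """Fraction of speed samples above a posted limit (a simple speeding measure)."""
    if not speeds_kmh:
        return 0.0
    return sum(1 for s in speeds_kmh if s > limit_kmh) / len(speeds_kmh)


def hard_braking_events(
    speeds_kmh: list[float], interval_s: float = 1.0, threshold_ms2: float = 3.0
) -> list[int]:
    """Indices of samples where deceleration since the previous sample exceeds a threshold.

    The default threshold is a hypothetical cut-off for "sudden stopping".
    """
    events = []
    for i in range(1, len(speeds_kmh)):
        change_ms = (speeds_kmh[i] - speeds_kmh[i - 1]) / 3.6  # km/h -> m/s
        deceleration = -change_ms / interval_s                  # positive when slowing down
        if deceleration > threshold_ms2:
            events.append(i)
    return events


if __name__ == "__main__":
    trip = [50, 50, 48, 30, 29, 29]  # hypothetical 1 Hz speed readings in km/h
    print(share_over_limit(trip, limit_kmh=45))  # 0.5
    print(hard_braking_events(trip))             # [3]
```

Per-trip indicators like these could then be aggregated by driver or location, which is how the individual-level risk measures and accident hot spots described above would be built up.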
Additionally, data science techniques such as monitoring call center calls can enable
researchers to identify specific triggers of certain behaviors rather than simply measuring
those behaviors, which can help researchers better understand phenomena such as employee
attrition. Studying misbehavior is difficult because of data sensitivity, privacy, and
availability. Yet banks are now introducing tighter behavioral monitoring and compliance systems
that are tracked in real time to predict and deter misbehavior. Scholars already examine
lawsuits, fraud, and collusion, but by using data science techniques, they can search electronic
communications or press data for keywords that characterize misbehavior in order to identify the
likelihood of misbehavior before it occurs. As these techniques become prevalent, it will be
important to tie the new measures, and the constructs they purportedly assess, to existing
theories and knowledge bases; otherwise, we risk the emergence of separate literatures using
“big” and “little” data, even though these literatures have the capacity to inform each other.
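A minimal sketch of keyword screening, assuming a small hypothetical lexicon: each message is normalized to lowercase word tokens and checked for single-word or multi-word terms, and messages with at least a chosen number of hits are flagged. This only flags candidate messages; estimating a likelihood of misbehavior would further require a validated dictionary and labeled outcomes against which to calibrate the flags.

```python
import re
from typing import Iterable

# Hypothetical lexicon for illustration only; terms must be lowercase and
# space-separated so that they match the normalized text below.
RISK_TERMS = frozenset({"backdate", "off the books", "delete this", "keep quiet"})

TOKEN = re.compile(r"[a-z']+")


def risk_hits(text: str, terms: Iterable[str] = RISK_TERMS) -> list[str]:
    """Return the lexicon terms (single words or phrases) found in `text`."""
    normalized = " ".join(TOKEN.findall(text.lower()))
    padded = f" {normalized} "  # pad so terms match only on word boundaries
    return sorted(t for t in terms if f" {t} " in padded)


def flag_messages(messages: dict[str, str], min_hits: int = 1) -> dict[str, list[str]]:
    """Map message IDs to matched terms for messages with at least `min_hits` terms."""
    flagged = {}
    for msg_id, text in messages.items():
        hits = risk_hits(text)
        if len(hits) >= min_hits:
            flagged[msg_id] = hits
    return flagged


if __name__ == "__main__":
    inbox = {
        "m1": "Please delete this after reading.",
        "m2": "We can backdate the invoice and keep it off the books.",
        "m3": "Lunch at noon?",
    }
    print(flag_messages(inbox))
```

Raising `min_hits` trades recall for precision; in practice, flagged messages would be reviewed and linked to documented outcomes before any claim about likelihood is made.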
New Questions. With greater scope and granularity of data, it becomes possible to explore
new questions that have not been examined in the past. This could arise because data science
allows us to introduce new constructs, but it could also arise because data science allows us to
operationalize existing constructs in novel ways. Web scraping and sentiment data from social
media posts now appear in the management literature, but they have yet to push scholars to ask
new questions. Granular data with high scope could open questions in new areas of mobility and
communications, physical space, and collaboration patterns, where we could delve deeper into the
causal mechanisms underlying collaboration and team dynamics, decision making and the physical
environment, and workplace design and virtual collaboration. Tracking phone usage and physical
proximity cues could provide insight into whether individuals spend too much time on
communications technology and how they allocate attention to social situations at work or at
home. Studies suggest that time spent on email increases anger and conflict at work and at home
(Butts, Becker & Boswell, 2015). Such work could be extended to physical and social
contingencies, the nature of work, work performance outcomes, and implications for quality of
life.
Data on customer purchase decisions and social feedback mechanisms can be
complemented with digital payments and transaction data to delve deeper into innovation and
product adoption, as well as into the behavioral dynamics of specific customer segments. The
United Nations’ Global Pulse is harnessing data science for humanitarian action. Digital money
and transactions through mobile platforms provide a window into social and financial inclusion,
such as access to credit, purchases of energy and water through phone credits, and money
transfers for goods and services; such data can also be used to create spending profiles,
identify indebtedness or wealth accumulation, and promote entrepreneurship (Dodgson et al.,
2015). Data science applications support the delivery and coordination of public services, such
as treatment for disease outbreaks, coordination across grassroots agencies for emergency
management, and provision of fundamental services such as energy and transport. Data on carbon
emissions and mobility can be combined to tackle climate change and to optimize transport
services or traffic management systems. Technological advances that promote social well-being
can also prompt scholars to identify new ways of organizing and to ask fundamentally new
questions about organization design, social inclusion, and the delivery of services to
disenfranchised communities.
New questions could emerge from existing theories. For example, once researchers can
observe and analyze email communication or online search data, they can ask questions
concerning the processes by which executives make decisions, as opposed to studying the
individual or top management team (TMT) characteristics that affect managerial decisions. There
is also room for using unstructured data, such as video and graphic data, and facial recognition
of emotions. Together, these data could expand conversations beyond roles, experience, and
homogeneity to political coalitions, public or corporate sentiment, decision dynamics, message
framing, issue selling, negotiations, persuasion, and decision outcomes.
Text mining can help answer questions such as where ideas or innovations come from, as
opposed to testing whether certain conditions generate groundbreaking innovations. Answering
such questions requires mining patent citations to track the sources of knowledge embedded in a
given patent and its relationships with the entire population of patents. In addition, analytics
that infer meaning, rather than relying on word co-occurrence, could help in understanding the
cumulativeness, evolution, and emergence of ideas and knowledge.
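One simple way to trace the sources of knowledge behind a patent is a breadth-first search over backward citations. The sketch assumes a hypothetical mapping from each patent ID to the patents it cites and returns every upstream patent with its shortest citation distance; it does not attempt the meaning-based text analysis mentioned above, and the patent IDs are made up.

```python
from collections import deque
from typing import Optional


def knowledge_sources(
    citations: dict[str, list[str]], patent: str, max_depth: Optional[int] = None
) -> dict[str, int]:
    """Breadth-first search over backward citations.

    Returns each upstream patent mapped to its citation distance from `patent`
    (1 = cited directly, 2 = cited by a directly cited patent, and so on).
    """
    distance: dict[str, int] = {}
    seen = {patent}
    queue = deque([(patent, 0)])
    while queue:
        current, depth = queue.popleft()
        if max_depth is not None and depth == max_depth:
            continue  # do not expand beyond the requested depth
        for cited in citations.get(current, []):
            if cited not in seen:  # ancestors shared by several paths are counted once
                seen.add(cited)
                distance[cited] = depth + 1
                queue.append((cited, depth + 1))
    return distance


if __name__ == "__main__":
    cites = {"P5": ["P3", "P4"], "P4": ["P2"], "P3": ["P1", "P2"], "P2": ["P1"]}
    print(knowledge_sources(cites, "P5"))
    print(knowledge_sources(cites, "P5", max_depth=1))
```

Because breadth-first search visits patents in order of distance, each source is recorded at its shortest citation path, which keeps "direct" and "indirect" knowledge sources cleanly separated.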
A new repertoire of capabilities is required for scholars to explore these questions and to
handle the challenges posed by data scope and granularity. Data are now more easily available
from corporations and “Open Data” warehouses such as the London DataStore. These data
initiatives encourage citizens to access platforms and develop solutions using big data on public
services, mobility, and geophysical mapping, among other data sources. Hence, as new data sources and
analytics become available to researchers, the field of management can evolve by raising
questions that have not received attention as a result of data acce