SECURITY ANALYTICS TOOLS AND IMPLEMENTATION SUCCESS FACTORS:
INSTRUMENT DEVELOPMENT USING DELPHI APPROACH AND EXPLORATORY
FACTOR ANALYSIS
By
Sethuraman K Srinivas
BERNARD J. SHARUM, PhD, Faculty Mentor and Chair
JOHN HERR, PhD, Committee Member
JELENA VUCETIC, PhD, Committee Member
Rhonda Capron, EdD, Dean
School of Business and Technology
A Dissertation Presented in Partial Fulfillment
Of the Requirements for the Degree
Doctor of Philosophy
Capella University
March 2018
ProQuest Number:
All rights reserved
INFORMATION TO ALL USERS
The quality of this reproduction is dependent upon the quality of the copy submitted.
In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.
ProQuest
Published by ProQuest LLC ( ). Copyright of the Dissertation is held by the Author.
All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code
Microform Edition © ProQuest LLC.
ProQuest LLC. 789 East Eisenhower Parkway
P.O. Box 1346 Ann Arbor, MI 48106 - 1346
10807845
2018
© Sethuraman K Srinivas, 2018
Abstract
Over the past two decades, information security scientists have conducted in-depth research in
the area of security analytics to counter cyber-attacks that have challenged the security postures
of corporate networks and data. Learning from this research has immensely benefited security
analytic tools and contributed to their maturity, thereby enabling many organizations to
implement them. While adoption of these tools has increased, understanding of the factors that
impact the successful implementation of these tools has lagged, and such understanding is critical to the
information security practice. The literature review revealed the lack of a validated survey
instrument dedicated to security analytic tools, which can help in extracting implementation
factors that security professionals would consider to be critical for success. The focus of this
research study was to develop a survey instrument and use the developed instrument to identify
factors that impact the successful implementation of any security analytic tool, including big data
based tools. The Delphi method was used to develop the instrument in Phase 1 with the help of
security analytic tool experts located in the United States of America, and the same instrument
was used to collect responses during Phase 2 from practitioners located in North America. The
researcher used the Delphi method to establish content validity, exploratory factor analysis to
establish construct validity, and Cronbach’s alpha to test reliability. The Delphi study
started with 16 experts in the first round and ended with 11 experts reaching consensus in the fourth
round. An exploratory factor analysis study performed during Phase 2, involving a sample size of
206, identified seven factors that impacted the successful implementation of security analytic
tools. These factors are large-scale security event analysis, functional utilization, incident
detection and correlation analysis, governance and chief information security officer metrics, log
source and use case management, threat and operational intelligence, and real-time attack and
anomaly detection. By defining metrics for these factors, practitioners could use them as
key performance indicators to assess the success of security analytic tool implementation.
Exploring causal relationships among identified factors, such as threat intelligence and incident
detection, will help in tuning security analytic tools and products.
Dedication
I primarily dedicate this dissertation to my spiritual master and one of the greatest
humanitarians to walk this earth, Shri Mata Amritanandamayi Devi. Her loving presence greatly
inspired me to pursue this mammoth academic effort. I also dedicate this dissertation to the Non-
Dual Brahman (Creator), the Essence in all living and nonliving beings. My late father Shri K.V.
Srinivasan was alive when I started this pursuit but passed away during this doctoral
journey. An exemplar of a man he was, and he truly deserves this dedication. I am sure he will be
doubly joyful in heaven.
I also dedicate this dissertation to my vibrant mother, Smt Saraswathy Srinivasan, who
takes enormous pride in all my accomplishments. She is a truly inspiring mother, filled with
myriad talents, and her disciplining nature is the main reason that I stand as a successful man in
all walks of life. No effort of a family man is achievable without the strong support of a spouse.
My wife, Sow Subha Sethuraman, was a great and unfailing support to me throughout this
journey. She spared no efforts in providing a conducive environment for my doctoral journey, in
particular, during the intense preparation for my comprehensive exams. My son, Naveen
Sethuraman, sacrificed a lot of weekend getaways to support me and was able to be highly
independent in his homework and assignments, thereby giving me a lot of time to work on my
dissertation.
Acknowledgments
I profusely thank and acknowledge the guidance and mentorship provided by Dr. Bernard
Sharum, mentor and committee chair. His patience with all my questions and concerns that arose
during this journey needs a special mention. Without his guidance, I would not have made good
progress. I also wish to thank Dr. Jelena Vucetic and Dr. John Herr for their timely support and
guidance during this journey. My sincere thanks to both of them for their encouraging approach.
I am also very thankful to Dr. Shardul Pandya, core faculty member at Capella, for his positive
and motivational words. His help in orienting me to Delphi studies and readiness to answer my
calls are greatly appreciated. My sincere thanks to Dr. Steven Brown, core faculty member in the
information assurance department at Capella, for his guidance during the Delphi study and also
during the final stages of data analysis. I hereby express my heartfelt gratitude to Dr. Tsun
Chow, faculty chair of doctoral IT programs, for his stellar guidance during the school review,
which greatly enhanced the quality of this dissertation.
My wife, Sow Subha Sethuraman, despite being an extremely busy information
technology start-up professional, dedicated a lot of her time to this dissertation to support me.
Her help in reviewing my documents and her thoroughness of execution are a rarity in this
fast-paced age. I lovingly acknowledge her unflinching support.
I sincerely thank Mr. Jayakumar Muthukumarsamy (JK), chief architect and fellow at
Shutterfly Inc., who was a source of great support during my dissertation. He provided a lot of
useful ideas during the data analysis stage. My sincere thanks are due to Mr. Srinivas
Tummalapenta, chief architect and distinguished engineer, IBM managed security services. He
provided immense guidance during the pre-Delphi field study and also during the data analysis
phase.
Finally, I also want to thank every expert who helped me during the Delphi study rounds.
Their incisive inputs made a big contribution to the Delphi study and its output.
Table of Contents
Acknowledgments.............................................................................................................. iv
List of Tables ..................................................................................................................... ix
List of Figures ..................................................................................................................... x
CHAPTER 1. INTRODUCTION ................................................................................................... 1
Background of the Problem ................................................................................................ 1
Statement of the Problem .................................................................................................... 2
Purpose of the Study ........................................................................................................... 4
Significance of the Study .................................................................................................... 5
Research Questions ............................................................................................................. 7
Definitions of Terms ........................................................................................................... 7
Research Design.................................................................................................................. 8
Assumptions and Limitations ........................................................................................... 11
Organization of the Remainder of the Study .................................................................... 13
CHAPTER 2. LITERATURE REVIEW ...................................................................................... 14
Overview ........................................................................................................................... 14
Methods of Searching ....................................................................................................... 14
Theoretical Orientation for the Study ............................................................................... 15
Review of the Literature ................................................................................................... 29
Critique of Previous Research Methods ........................................................................... 45
Findings............................................................................................................................. 47
Summary ........................................................................................................................... 51
CHAPTER 3. METHODOLOGY ................................................................................................ 55
Purpose of the Study ......................................................................................................... 55
Research Questions ........................................................................................................... 57
Research Design................................................................................................................ 57
Procedures ......................................................................................................................... 59
Likert Scale Instrument ..................................................................................................... 71
Ethical Considerations ...................................................................................................... 72
Summary ........................................................................................................................... 73
CHAPTER 4. RESULTS .............................................................................................................. 74
Background ....................................................................................................................... 74
Research Questions ........................................................................................................... 74
Description of the Sample ................................................................................................. 75
Delphi Study ..................................................................................................................... 81
EFA ................................................................................................................................... 93
Summary ......................................................................................................................... 102
CHAPTER 5. DISCUSSION, IMPLICATIONS, AND RECOMMENDATIONS ................... 104
Introduction ..................................................................................................................... 104
Summary of the Results .................................................................................................. 104
Discussion of the Results ................................................................................................ 107
Findings........................................................................................................................... 114
Conclusions Based on the Results .................................................................................. 114
Limitations ...................................................................................................................... 118
Implications for Practice ................................................................................................. 119
Recommendations for Further Research ......................................................................... 121
Conclusion ...................................................................................................................... 123
REFERENCES ........................................................................................................................... 124
STATEMENT OF ORIGINAL WORK ..................................................................................... 136
APPENDIX A. RESEARCHER-DESIGNED SECURITY ANALYTICS SURVEY .............. 138
List of Tables
Table 1 Assessment of Security Analytics Tools ......................................................................... 47
Table 2 Delphi Panel Member Experience ................................................................................... 77
Table 3 Participant Domain Experience ....................................................................................... 80
Table 4 Participant Industry .......................................................................................................... 81
Table 5 Results of the Delphi Rounds .......................................................................................... 82
Table 6 Delphi Round 1– Descriptive Statistics ........................................................................... 84
Table 7 Delphi Round 2 – Descriptive Statistics .......................................................................... 86
Table 8 Delphi Round 3 – Descriptive Statistics .......................................................................... 89
Table 9 Delphi Round 4 – Descriptive Statistics .......................................................................... 91
Table 10 Sample Adequacy Test ................................................................................................. 93
Table 11 Total Variance Explained (SPSS) .................................................................................. 95
Table 12 Rotated Factor Matrix – SPSS Output ........................................................................... 97
Table 13 Reliability – Overall Instrument Level ........................................................................ 100
Table 14 Case Processing Summary ........................................................................................... 100
Table 15 Cronbach’s Alpha at Factor Level ............................................................................... 101
Table 16 Factor Names ............................................................................................................... 102
Table 17 Factor Definitions ........................................................................................................ 108
Table 18 Factors and Subfactors ................................................................................................. 116
List of Figures
Figure 1. Security analytics – Theoretical foundation. ................................................................. 17
Figure 2. Security analytics: A functional architecture. ............................................................... 51
Figure 3. High-level research design. ........................................................................................... 58
Figure 4. Delphi study process flow. ............................................................................................ 64
Figure 5. Security analytics research – A waterfall approach to orient the Delphi study
participant. ................................................................................................................. 79
Figure 6. Phase 2 – EFA study...................................................................................................... 94
Figure 7. SPSS scree plot analysis. ............................................................................................... 96
CHAPTER 1. INTRODUCTION
This chapter introduces the topic of assessing the implementation of security analytic
tools. There is an increasing trend towards adoption of security analytic tools among North
American organizations. This trend necessitates the creation and validation of a survey
instrument focused on security analytics, leading to the identification of implementation success
factors. This chapter states the research questions in the context of the problem’s background.
These questions form the basis for further research and identification of factors. This chapter
explains the purpose and significance of this research along with research design, assumptions,
and limitations. A description of the organization of the remainder of this study is provided at the
end of this chapter.
Background of the Problem
Cyber-attacks have become a daily phenomenon over the past few years, with hacker
strategies becoming very sophisticated and vicious. Botnet-based attacks, advanced persistent
threat (APT) attacks, malware attacks, denial of service (DoS) attacks, and insider attacks are
flooding corporate networks all over the world. These attacks and threats form a major part of
daily security incidents in many enterprises. Cardenas, Manadhata, and Rajan (2013) observed
that enterprises collect terabytes of security log data every day, which leads to challenges even in
storage management. Analyzing large volumes of log data to identify security anomalies that
impact the safety of corporate businesses is a greater problem. The need for enterprises to
comply with national legislation such as Sarbanes-Oxley and payment card industry (PCI)
mandates, combined with normal security operations, is contributing to increases in log data.
Van de Moosdijk, Wagenaar, and Final (2015) elaborated on the contribution of log management
and security analytics in fulfilling compliance laws. These laws mandate retention of many years
of log data to satisfy auditing requirements.
As Crespo and Garwood (2014) illustrated in their botnet-related article, identifying
security incidents and correlating large, real-time security data segments to extract actionable
intelligence has created the need for deeper and faster analytics. The size, speed, and precision
of these analytics differ based on the needs of the enterprise. Mahmood and Afzal (2013) presented
a very comprehensive survey of big data security analytic tools in their analysis of security
analytic trends. Their findings portrayed the capability of big data tools in managing security
incidents.
Most assessments in security analytics deal with self-assessment by product inventors or
tool comparisons by third-party critics. Enterprises that implement market-leading security
analytic tools or in-house tools are the real beneficiaries of these tools. The end users from these
organizations are in a better position to assess and rate these tools’ benefits and downsides.
Scholar-practitioners in the security analytic domain also benefit immensely from an unbiased
evaluation of these tools by experts and end users.
Statement of the Problem
The research literature on the security analytics domain indicates that there is an
explosive growth of tools and in-house packages in this domain. Such tools are starting to attain
maturity in terms of ensuring the safety of enterprises. However, there is no clear set of factors
and attributes to evaluate the success of these tools within an enterprise. Nicolett and Kavanagh
(2013) studied the critical capabilities of security analytics tools, and they found a relationship
between tool capabilities and effectiveness. However, they did not clearly state the areas within
these tools that needed an examination to conclude a successful implementation. Mateski et al.
(2012) defined cyber threat metrics as a part of their research work for Sandia National
Laboratories. While metrics measure the effectiveness of the governance aspects of security
analytic tools, we do not have exhaustive feedback from the user community of these security
analytic tools. The first part of the research problem is a lack of structured assessment of security
analytic tools from the user’s perspective in the currently available and surveyed literature. The
second part of the problem is a lack of identified factors that determine a successful
implementation of security analytic tools according to the tool users and experts.
Shackleford (2014) surveyed the existing commercial security analytic tools from the
perspective of product architecture. Shackleford did not assess whether these tools achieved a
successful implementation leading to the safety of an enterprise. Howarth (2014) discussed in
detail the actionable intelligence that was generated by the security analytic tools. For example,
actionable intelligence refers to recommending a list of infected endpoint devices that need to be
quarantined. Howarth’s discussion did not focus on broader factors that influence a successful
implementation of security analytic tools. Mahmood and Afzal (2013) presented a
comprehensive survey of big data security analytic tools in their analysis of security analytic
trends. Even though they focused on big data security analytics as a domain, there was no
discussion on identification of factors to assess the success of an implementation. Hence, the
research problem (i.e., determining the factors that influence a successful implementation of
security analytics tools, leading to the safety of an enterprise) is a fresh problem and a gap in the
information security industry that needs to be addressed.
Use cases are similar to user scenarios and are driven by the risk scenarios that a security
analytic tool vendor is trying to address. Van de Moosdijk, Wagenaar, and Final (2015) asserted
that use cases are fundamental to security analytic tools. Nicolett and Kavanagh (2013), as a
part of their Gartner study, identified three use cases for assessing a security analytic tool:
threat management, compliance, and tool deployment. These use cases were standardized based
on the opinions of a few research analysts and addressed only a subsection of the security
analytic tool assessment problem. Shackleford’s (2014) survey focused on popular
product features such as architecture and performance, in the domain of security analytics.
The above citations point to the fact that commercial surveys are built around popular product
features and architecture. The pressing need, however, is for a common and broad assessment
framework that helps remove the bias in commercial product surveys. Cybenko and Landwehr (2012)
recognized that assessments produced by the above-mentioned commercial surveys are likely to be
biased towards the sponsor of the survey. Another significant gap in the commercial surveys
cited above is the lack of a validated survey instrument produced by a panel of unbiased and
anonymous experts.
Purpose of the Study
The major goal of this study is to identify factors that determine the successful
implementation of security analytic tools that could be used to ensure the security of any
enterprise. The general problem that is addressed in this research is the lack of a structured
method or instrument to assess the implementation of security analytic tools in any given
organization. Development of such an instrument can solve the broader problem of lack of a
structured assessment method in the field of security analytics. However, literature analysis
revealed that the narrower research problem is the lack of factors that can be used to assess an
implementation. Shackleford (2014) developed a commercial survey in the domain of security
analytics which focused more on product features and did not use technology-neutral
implementation factors. Ferketich, Phillips, and Verran (1993) suggested that a researcher
modify an existing instrument or create a new instrument in the absence of a suitable instrument
to answer the research question. The creation of a new survey instrument with a broad focus on
all generations of security analytic tools is the first step that would lead to the extraction of
implementation factors. Since there was no instrument available to survey users of security
analytic tools, a Delphi study to build a questionnaire with the help of experts in the
cybersecurity industry was the immediate first step, and it formed Phase 1 of this research study.
Significance of the Study
The assessment of the implementation of security analytic tools is the broad goal of this
research study. This study is significant due to the recent proliferation of new security analytic
tools, both from popular vendors and independent scientists trying to analyze large volumes of
data to identify cyber-attacks. Security analytic tools, in the context of this research study,
include in-house solutions, intrusion detection and prevention systems (IDPS), commercial
security information and event management (SIEM) tools, and solutions with big data capability.
The main beneficiaries of this study will be both professionals and researchers in the information
security industry.
The ultimate goal of this study is to identify factors that determine the successful
implementation of security analytic tools. The factors are identified after an initial survey
instrument is built. This study would provide many benefits to the information security industry
and its professionals. Some of the significant benefits are (a) identifying the implementation
areas of focus for security analytic tools (for example, certain types of tools may not have a
strong focus on correlation processes to support correlation analysis of incidents); (b)
identifying factors that will help in ensuring the protection of valuable organizational assets and
personally identifiable information; and (c) identifying factors that will help in fulfilling
business stakeholders’ requirements, such as log management and compliance with industry regulations.
A survey instrument whose contents and constructs are validated is a necessary prerequisite
for identifying the implementation factors. This survey instrument will benefit researchers
in the information security domain.
Commercial surveys do not always consider user inputs from hands-on experts to build
the survey. Nicolett and Kavanagh (2013) analyzed the critical capabilities of commercial SIEM
tools as a part of the Gartner study. That study is based on commercial opinion surveys
built using product use cases. Nicolett and Kavanagh did not examine user perceptions of the tools
or the factors that users consider significant for the implementation of these tools. This
research study, however, focuses on conducting a survey with end users and extracting factors by
performing factor analysis on the resulting survey data. During Phase 1 of this research study, a
survey instrument was created with a broad focus on all three generations of security analytic
tools. This researcher recruited security analytic industry experts, who had varied
implementation experience, to be part of a Delphi panel to build the initial instrument. The
experts in the Delphi panel and the four rounds of the Delphi study contributed to the rigor of
the Phase 1 process. The participation of experts ensured that the survey instrument provided
implementation-specific insights to (a) researchers in the area of security analytics, (b) product
vendors in the field of security analytics, and (c) security analytic tool implementers in any
organization.
Future researchers could apply the validated final survey instrument as a starting point for
more focused research. For example, a survey instrument with a specific focus on the healthcare
or finance industry could be built based on this broader instrument. By determining
industry-specific implementation factors, this type of research attains more depth and maturity.
Vendors and implementers of security analytic products will benefit from the instrument and
factors that are identified by this research study. Some of the benefits are (a) these factors could
provide an initial assessment structure for product assessment, leading to improved product
design; (b) metrics that are defined based on these factors could help implementers to define
service level agreements both for internal use and for external use with vendors; and (c) the
survey instrument from this research study could be modified and enhanced by focusing only on
the area of tool adoption. For example, after modification and confirmatory factor analysis, this
instrument could be used to test the conformance of the security analytic family of tools to the
technology acceptance model (TAM-3) defined by Venkatesh and Bala (2008).
Research Questions
Omnibus Research Question (ORQ): What are the factors that determine the successful
implementation of security analytics tools or packages?
Research Subquestion 1 (RSQ1): What are the factors that determine the successful
implementation of non-big data security analytics tools or packages?
Research Subquestion 2 (RSQ2): What are the factors that determine the successful
implementation of big data security analytics tools or packages?
Definitions of Terms
Big data. Big data describes large volumes of high-velocity and variable data that
require advanced technologies for data management (TechAmerica, 2012).
Cyber-attack. A hostile act using a computer or related networks or systems, and
intended to disrupt and/or destroy an adversary’s critical cyber systems, assets, or functions
(Hathaway et al., 2012).
Delphi study. In this research, the Delphi study constitutes Phase 1. The Delphi method is
an iterative process for consensus-building among a panel of experts who are anonymous to each
other (Garson, 2014).
Fraudulent behavior. In this research context, fraudulent behavior refers to fraudulent
hacker behavior in e-banking systems (Malekpour, Khademi, & Minae-Bidgoli, 2014).
Network analytics. A game-theoretic framework for modeling offender-defender
situations in computer networks (Roy et al., 2010).
Non-big data. Any dataset that is not a big dataset. Big data is a term for massive datasets
having large and complex structures with difficulties in storing and analyzing data (Sagiroglu &
Sinanc, 2013).
Probably approximately correct (PAC). The PAC model is the initial standard for
learning programs (Valiant, 1984).
Security analytics. Use of tools, methods, and algorithms that are useful in discovering
security breaches and attacks (Talabis, McPherson, Miyamoto, & Martin, 2014).
User acceptance. User acceptance in this research context refers to four TAM-3
constructs: perceived usefulness (PU), perceived ease of use (PEOU), behavioral intention, and
usage behavior (Ahlan & Ahmad, 2014).
Research Design
The research design for this study used a quantitative, nonexperimental Delphi
approach in Phase 1; Phase 2 used exploratory factor analysis (EFA) and reliability analysis.
The study consisted of two phases. Phase 1 was a Delphi study with the goal
of arriving at a consensus. The major focus in Phase 1 was the development of a survey
questionnaire. In Phase 2, the researcher conducted an EFA survey with the help of the survey
questionnaire and subjected the data to EFA and reliability analysis to extract the factors that
impact the successful implementation of security analytic tools.
Cardenas et al. (2013) argued that security analytics is a highly advanced technological
topic. However, academic research studies and investigations about this area of security analytics
still take place in nascent and developing conditions. A detailed search by this researcher did not
yield broad and expert-validated survey instruments in the field of security analytics, suitable for
the research questions of this study. Pinsonneault and Kraemer (1993), studying survey research
in the early days of IT growth, observed that scholars invariably use exploratory models to
develop concepts and models in newly emerging fields of research. It is not possible to answer the research
question pertaining to this study by setting up an experiment for the following reasons: (a)
setting up a security analytics tool or SIEM for an experiment is an expensive proposition, (b)
even if the researcher set up an experiment, it would not be possible to simulate the typical
security events and load conditions that happen in a real-world setup, and (c) any results for this
type of research from an experimental approach would not be accurate. Pinsonneault and
Kraemer proposed that researchers conduct surveys to collect data to examine relationships
between variables. However, based on the literature review that follows, no pre-existing Likert
scale survey instrument, suitable for the research questions of this study was available. Hence the
researcher used the Delphi method to develop a survey questionnaire with the help of an expert
panel consisting of security analytic experts. The researcher subsequently performed EFA and
reliability analysis.
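To make the planned reliability analysis concrete, the following Python sketch computes Cronbach's alpha, the most common internal-consistency coefficient, for a hypothetical respondents-by-items matrix of Likert scores. The data are illustrative only and are not drawn from this study.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents-by-items matrix of Likert scores."""
    k = item_scores.shape[1]                         # number of items
    item_vars = item_scores.var(axis=0, ddof=1)      # per-item variance
    total_var = item_scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert responses: 6 respondents x 4 survey items.
responses = np.array([
    [4, 5, 4, 5],
    [3, 4, 3, 4],
    [5, 5, 4, 5],
    [2, 3, 2, 3],
    [4, 4, 4, 4],
    [3, 3, 3, 4],
])
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

By convention, an alpha of about 0.7 or higher is taken to indicate acceptable internal consistency for a scale.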
Data Collection (Delphi Study and EFA Study)
Security analytics, big data, and correlation analysis are areas of applied statistics that
researchers have increasingly used in data analytic tools in the past five to seven years. These
areas and related tools have not been probed with the goal of assessing their effectiveness post-
implementation. Based on the research design requirements of this study, a survey instrument in
the field of security analytics with a broad focus on all three generations of security analytic tools
is needed. Secondary security performance data on security analytic tools, available from many
private and nonprofit organizations, will not reveal the true picture, as these surveys are biased
towards the sponsors. Cybenko and Landwehr (2012) commented that such statistics may be
skewed because there is always a vested interest by major tool vendors sponsoring those surveys.
Delphi data collection usually takes place with a panel of experts, and there might be
three to four rounds of Delphi before panel consensus emerges. In this research study, the
researcher performed Phase 1 using the Delphi technique. In his evaluation of the Delphi
technique, Davidson (2013) asserted the importance of the principle of anonymity. Skulmoski et
al. (2007) mandated four major features of any Delphi study: (a) anonymity of Delphi
participants, (b) an iterative approach that allows the participant to refine his or her views on any
subject matter, (c) controlled feedback from the Delphi coordinator (researcher) to the
participants regarding other perspectives in a Delphi study, and (d) statistical analysis and
aggregation of the group response. Based on the above citations, during Phase 1 of this study, the
researcher circulated an initial questionnaire he had designed among a set of five experts and
then he refined it to establish content validity. This refined survey questionnaire was the first
document that went to the chosen panel of Delphi experts through e-mails. Until consensus
emerged, in each round the researcher collected input from the panel using e-mails. The
researcher used the refined survey in Phase 2 to conduct an EFA study through Qualtrics to
collect survey data. The researcher used data from this survey to perform EFA and reliability
analyses to extract the factors that determine the success of any security analytics
implementation.
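A common way to operationalize Delphi consensus between rounds is to compute the interquartile range (IQR) of the panel's Likert ratings for each item, with an IQR of 1 or less frequently used as the consensus threshold. The Python sketch below illustrates this check; the panel size, items, ratings, and threshold are assumptions for illustration only.

```python
import numpy as np

# Hypothetical round-two Delphi ratings: 7 experts x 3 candidate survey items,
# each rated on a 5-point Likert scale.
ratings = np.array([
    [5, 4, 2],
    [4, 4, 3],
    [5, 5, 2],
    [4, 4, 5],
    [5, 4, 1],
    [4, 5, 4],
    [5, 4, 2],
])
q75, q25 = np.percentile(ratings, [75, 25], axis=0)
iqr = q75 - q25                 # spread of panel opinion per item
consensus = iqr <= 1.0          # common heuristic: IQR <= 1 means consensus
print(consensus)
```

Items that fail the check would be reworded or re-rated in a further Delphi round.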
Assumptions and Limitations
Assumptions
According to Orlikowski and Baroudi (1991), ontological assumptions have connections
to the nature of the phenomenon under investigation. The nature of phenomenon under
investigation may be objective, subjective, or a mix of the two. In the case of this security
analytics study, it is both subjective and objective, as the researcher solicited the opinions of
information security personnel. For this study, security analytic experts applied both their
perception of security analytic tools and their objective experience with the tools in answering
the survey questions.
Jupp (2006) clarified that epistemology deals with methods of achieving knowledge
about reality. The epistemological assumption of this study is positivist, which maintains that
human perception of information security-related data, in the form of security analytic tools,
observed over time, will reveal the true picture about these tools. Information security personnel
working with security analytic tools know from their experience the effect of these tools on the
overall safety of the organization and which factors influence the success of any implementation
most. Iivari (1991) asserted that methodological assumptions deal with choices of research
methods. The Delphi method, as explained by Okoli and Pawlowski (2004), uses consensus to
arrive at the final deliverable. It is the most suitable method for this research study. Pathirage,
Amaratunga, and Haigh (2005) mentioned that axiological assumptions relate to the value of any
given research study. In this study, the major stakeholders are information security personnel and
researchers. Apart from the above general assumptions, this researcher has identified a list of
specific assumptions for this study, as explained below.
Experts in the field of security analytics are assumed to have good knowledge of the
implementation specifics and issues of security analytic tools. Participants for the Delphi study
were recruited using the LinkedIn professional network. It is assumed that Delphi participants
actually possess the expertise projected in their LinkedIn profile and can understand the Delphi
study orientation document. It was also assumed that Delphi participants would participate in the
study without bias. This researcher further assumed that Phase 2 participants were not the same
people who participated in Phase 1 and that they would not use proxies.
Limitations
Any restricting factor that limits the scope of this research is a limitation. The researcher
acknowledges a few limitations of this research study. The survey questionnaire validated in the
EFA study (Phase 2) may not be fully representative of the entire population of security analytic
tool users. Pinsonneault and Kraemer (1993) opined that cross-sectional surveys are not fully
representative of the target population. Lack of time prevented the researcher from performing a
longitudinal survey. The Phase 2 survey was conducted only in the United States and Canada;
hence, this study does not fully reflect security analytic usage in other countries, a geographic
limitation. An industry-related limitation also applies because the Phase 2 study focused only on
the retail, finance, healthcare, and government sectors; other industries, such as manufacturing
and energy, were not included.
Organization of the Remainder of the Study
The remainder of the study focuses on the following major areas: (a) a review of extant
literature in security analytics and related areas, (b) an explanation of the research methodology,
(c) the results of the study, and (d) the conclusion of the study.
CHAPTER 2. LITERATURE REVIEW
Overview
The increasing usage of security analytic tools and their impact on the safety of corporate
network and applications necessitated a closer examination of this phenomenon using a thorough
literature review leading to a survey instrument to support the research questions. A high-level
theoretical foundation suitable for this study included five theories. Those five theories are (a)
game theory, (b) MapReduce programming model, (c) TAM-3 theory, (d) computational
learning theory, and (e) Dempster-Shafer theory. The literature review provides an overview of
many independent research tools recently developed in the domain of security analytics.
Cardenas et al. (2013) defined three maturity levels for security analytic products, and this
research study built a survey instrument that contains probing questions applicable to all three
maturity levels. The literature review also evaluates the connection between the theories
mentioned above and the independent security analytic solutions. The literature review concludes
with a synthesis of the findings and gaps, and a critique of security analytic solutions, including
commercial surveys of security analytic products.
Methods of Searching
The search involved seeking out security analytics-related literature in many online
libraries. The researcher used six major sources to search for articles: (a) Capella University’s
online library and subscriptions to book and journal databases, (b) the Association for
Computing Machinery (ACM) online digital library, (c) the Institute of Electrical and Electronic
Engineers (IEEE) online digital library, (d) Google Scholar database, (e) ProQuest dissertation
database, and (f) SAGE journal database. The actual search strings that the researcher used were
(a) security analytics, (b) big data security analytics, (c) theories and security analytics, (d)
security analytics game theory, (e) security analytics computational learning theory, (f) security
analytics TAM-3 theory, (g) security analytics map reduce, (h) security analytics DS theory, and
(i) security analytics Dempster-Shafer theory. Capella University’s Summon search tool was
also extensively used in executing the above searches.
Theoretical Orientation for the Study
Foundational Theories
At the core of this research is the subject of security analytics. As this subject involves
connection to the entire landscape of IT within an organization, many foundational theories can
explain the functionality of security analytic tools. One of the significant components of security
analytics is network analytics. It is easy to model network behavior and to extract intelligence
from these models with the help of game theory and evidence theory. Liang and Xiao (2013)
applied game theoretical concepts to network security by exploiting game models inherent to
game theory. Dempster and Shafer developed DS theory, which Shafer (1976) presented as a
mathematical theory of evidence. Big data-based security analytic products invariably use
MapReduce theoretical concepts in reducing large datasets to smaller datasets, suitable for
extracting actionable insights. Dean and Ghemawat (2010) introduced MapReduce as a
programming model suitable for parallelizing and executing large programs using multiple
computing resources. Venkatesh and Bala (2008) presented technology acceptance model
(TAM-3) to explain the adoption of new technologies by the user community. Application of
TAM-3 constructs to the family of security analytic tools will explain the adoption maturity of
those tools. Machine learning concepts explain the learning component of security analytic tools.
Computational learning theory and its application are also useful to the domain of security
analytics. Goldman (1995) provided an explanation for the assessment of learning algorithms
using computational learning theory. A detailed treatment of the above theories is in the
following section. Figure 1 pictorially depicts the theoretical foundation for security analytics.
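Of the theories above, the MapReduce model is the most directly algorithmic, and its map and reduce phases can be illustrated with a minimal in-memory Python sketch that counts hypothetical security event types in log lines. Real MapReduce frameworks distribute these phases across many machines; the log format and event names here are assumptions for illustration.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map step: emit (key, 1) pairs keyed on the event-type field."""
    event_type = line.split()[1]
    return [(event_type, 1)]

def reduce_phase(pairs):
    """Reduce step: sum the counts for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Hypothetical security log lines: timestamp, event type, host.
logs = [
    "10:01 LOGIN_FAIL host1",
    "10:02 LOGIN_FAIL host2",
    "10:02 PORT_SCAN host3",
    "10:03 LOGIN_FAIL host1",
]
intermediate = chain.from_iterable(map_phase(line) for line in logs)
counts = reduce_phase(intermediate)
print(counts)
```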
Game theory. A game in the simplest case is an interaction between two entities in any
given situation. For example, in a game of chess, there are always two players (entities) in play.
Barron (2013) mathematically described games as involving a number of players (N), a set of
strategies for each player, and a quantitative payoff for each player. Von Neumann and
Morgenstern (2007) first defined game theory, in 1944, as the study of strategic games applied to
solving problems in economics. Turocy and von Stengel (2001) discussed the application of game
theory to bidding in online auctions. Game theory is a mathematical tool that can describe and
solve games (Liang & Xiao, 2013). Liang and Xiao further elaborated on game theory in their
classic introduction, covering the following salient aspects: (a) Category 1 – based on the number of
stages in a game, games are classifiable as static games, dynamic games, or stochastic games, (b)
Category 2 – based on information available on player’s actions, games are classifiable as games
of perfect information and games of imperfect information, (c) Category 3 – based on the
completeness of information available for players, games are classifiable as games of complete
information and games of incomplete information, and (d) equilibrium, which is a result or
combination of players’ strategies.
Figure 1. Security analytics – Theoretical foundation.
Nash (1950/1997) first established the Nash equilibrium, showing that finite
games have an equilibrium point. At this equilibrium, all players choose the best
possible action given the decisions their opponents make. Non-cooperative games are quite
popular in the world of information security, as such games model attacker and defender
situations.
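A pure-strategy Nash equilibrium of a small attacker-defender game can be found by exhaustively checking best responses, as in the following Python sketch. The strategies and payoff numbers are hypothetical and chosen only to illustrate the definition.

```python
import itertools

# Hypothetical 2x2 attacker-defender game; payoff tuples are (defender, attacker).
payoffs = {
    ("monitor", "attack"): (2, -2),
    ("monitor", "wait"):   (1, 0),
    ("ignore",  "attack"): (-3, 3),
    ("ignore",  "wait"):   (0, 0),
}
defender_moves = ["monitor", "ignore"]
attacker_moves = ["attack", "wait"]

def is_nash(d, a):
    """True if neither player can gain by unilaterally changing strategy."""
    d_pay, a_pay = payoffs[(d, a)]
    best_d = all(payoffs[(alt, a)][0] <= d_pay for alt in defender_moves)
    best_a = all(payoffs[(d, alt)][1] <= a_pay for alt in attacker_moves)
    return best_d and best_a

equilibria = [cell for cell in itertools.product(defender_moves, attacker_moves)
              if is_nash(*cell)]
print(equilibria)
```

In this toy game the unique equilibrium is (monitor, wait): when the defender monitors, the attacker's best response is to wait, which is the deterrence intuition behind non-cooperative security games.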
Game theory and security analytics. Liang and Xiao (2013) opined that network
security is a typical game between attackers and defenders. Analyzing network traffic and related
data is a necessary prerequisite for bolstering network security, and it is an integral part of
security analytics. Theoretical inputs play a major role in modeling network situations. Game
theory has a wide variety of applications in network security. Roy et al. (2010) presented a
thorough survey of applications of game theory and argued that game theory-based solutions fill
the lack of a quantitative decision framework in traditional network analytics.
The Nash equilibrium is widely applicable in a multiple player situation. As per this
principle, it is not possible to predict the results of decisions emanating from multiple players in
a game situation by judging players in isolation. Mohammed, Fung, and Debbabi (2011) applied
this principle to solve a data integration problem for very large databases. Usually, an integrated
dataset obtained by joining multiple data sources reveals sensitive data. To solve this problem,
researchers developed a common algorithm based on Nash’s equilibrium principle in which
every data integrator participated. This algorithm isolated malicious participants. This technique
can detect insider attacks. One of the goals of insider attacks in an enterprise is to procure the
sensitive data of the employees. For example, merging of disparate sources of endpoint data can
reveal sensitive data relating to employees. The Nash equilibrium technique can be applied to
thwart such insider attacks. Security analytic tools involve extensive use of correlational
algorithms and data integration. Mohammed et al.’s research project is easily extendable and
applicable to any commercial security analytic product.
Honeypots are traps set for attackers by defenders (Spitzner, 2003). Deceiving attackers
is an evolving strategy based on game theory. Carroll and Grosu (2011) explained their
deception strategy using a comprehensive set of algorithms and dynamic games. Han (2012)
referred to dynamic games as situations in which one player has information about the other
player’s strategy or moves, and multiple moves take place over time. In this deception game, two
sets of players, the defending network and the attacking network, are involved. Camouflaging
normal computers as honeypots and vice versa is a strategy that defenders adopt. Attackers are
usually at a disadvantage in this deception game, as they are not the first movers of the game.
Since logging is a powerful feature of honeypots, security analytic tools will easily be able to
integrate these log files for analysis.
Games in which players do not know the payoffs or preferences of their opponents are
games with incomplete information (Han, 2012). Liu, Zang, and Yu (2005) modeled a defense
system with the aim of capturing a DoS attacker’s intent as opposed to the attack pattern. The
attacker’s incentive lies in the core of his or her design. One of the incentives is to access
classified internal documents. This design used data from attacks to study and understand the
intent of the attacker. The game design of this model used a six-tuple game with two players
(defender and attacker), two strategy spaces (one each for attacker and defender), a Bayesian
game type, and a set of outcomes. A Bayesian game is a game in which there is incomplete
information about the other players in the game. In this scenario, there is less information about
attackers. While this approach was not a perfect design in terms of accuracy, this model inspired
many more research projects that applied game theory to security analytics.
Fielder, Panaousis, Malacaria, Hankin, and Smeraldi (2014) addressed the challenge of
supporting the decisions of security administrators and other personnel with respect to protecting
the information-related assets of any organization. They used game theoretic modeling
techniques to model the situation between the attacker and a team of administrators to establish
the Nash equilibrium. One of the aspects of security analytic tools is to model the attacker-
defender situation. This can lead to the determination of the number of security analysts
necessary to pursue the actionable intelligence the tool provides. Game theoretic techniques are
easily applicable here to solve the challenge of cybersecurity resource allocation.
Chung, Kamhoua, Kwiat, Kalbarczyk, and Iyer (2016) argued that game theory
techniques can combine with learning algorithms like Q-learning to react to adversarial behavior
in a network situation. Their novel technique does not need complete information from the
opponents. Security incidents belonging to the intrusion category are the main focus of this
approach. Q-learning is a model-free reinforcement learning technique, and it can model games
with incomplete information. In this setting, Q-learning can learn from human behavior, for
example, from the type of decisions a security analyst takes, as well as from the data of earlier
iterations.
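The Q-learning technique Chung et al. described can be illustrated with a minimal single-state sketch in Python, in which a defender agent learns from rewards whether to block or merely observe a suspicious host. The reward values, actions, and parameters are hypothetical and not taken from the cited study.

```python
import random

random.seed(0)
actions = ["block", "observe"]
q = {a: 0.0 for a in actions}   # single-state Q-table
alpha, epsilon = 0.1, 0.2       # learning rate and exploration rate

def reward(action):
    """Hypothetical environment: blocking the (mostly malicious) host pays off."""
    return 1.0 if action == "block" else -0.5

for _ in range(500):
    if random.random() < epsilon:
        a = random.choice(actions)      # explore
    else:
        a = max(q, key=q.get)           # exploit the current estimate
    # One-step Q-update (no successor state in this single-state toy problem).
    q[a] += alpha * (reward(a) - q[a])

best = max(q, key=q.get)
print(best)
```

The agent needs no model of its opponent; it converges on the higher-reward action purely from experience, which is the model-free property the text refers to.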
Computational learning theory and machine learning. Learning programs are
computer programs that perform tasks with human-like intelligence and that learn and improve
their performance over time. Goldman (1995), in her elaborate introduction to
computational learning theory, explained the initial work in the early sixties in machine-based
learning in terms of learning theory offering a foundation for assessing learning algorithms.
Leeuwen (2004) introduced machine learning using simple and yet powerful examples. Some of
the typical examples he gave are (a) pattern recognition in an array of images, (b) identifying
words in a handwritten text, (c) discovering and extracting common information from distributed
data, and (d) speech recognition. Most security analytic tools incorporate one or more learning
algorithms that focus on learning about attacks. For example, learning algorithms easily spot
botnet attacks.
Computational learning theory is a branch of theoretical computer science that studies
machine learning with the help of a strong mathematical foundation. Gold (1967) provided the
first formal definition of learning. Gold’s theory of learning focused on the learner guessing a
rule behind a data sequence leading to a convergence of the sequence. However, Gold’s theory
of learning did not attempt to evaluate and assess the efficiency of a learning process, whereas
computational learning theory provided a framework to evaluate the learning process. Goldman
(1995) reiterated that computational learning theory provided a framework to compare and rate
different learning algorithms. From the perspective of assessment, computational learning theory
has a close connection to machine learning.
Probably approximately correct (PAC) learning model. The seminal work of Valiant
(1984) initiated computational learning theory in the form of the PAC model. The PAC model is
the initial standard for learning programs. Leeuwen (2004) provided a convincing mathematical
model for PAC. He explained that the goal of a PAC learning algorithm is to classify samples
generated from a sample space (X) into a concept space (C). Based on the samples, the researcher
forms a hypothesis, and with every sample, the researcher adjusts this hypothesis. A hypothesis
is good when the number of errors in classifying samples remains below a predefined bound. A
problem is PAC-learnable if there is an algorithm (A) that, for any distribution (D) and any
concept (c), will, when given some independently drawn samples, produce with high probability
a near error-free hypothesis (h). Concept learning is a generic term that includes the PAC learning model.
Blum (1994) introduced the two popular phases involved in concept learning. Those two phases
are the training phase and the testing phase. During the training phase, the learning procedure
studies some examples and produces a hypothesis. During the testing phases, it evaluates the
hypothesis.
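The PAC framework can be illustrated with the classic interval-learning example: the hidden concept is an interval on [0, 1], the learner's hypothesis is the tightest interval around the positive samples, and the generalization error is estimated on fresh samples. The sketch below is illustrative and not tied to any of the cited studies; the interval endpoints and sample sizes are assumptions.

```python
import random

random.seed(42)
LO, HI = 0.3, 0.7               # hidden target concept: the interval [0.3, 0.7]

def label(x):
    return LO <= x <= HI

# Training phase: the hypothesis is the tightest interval containing all
# observed positive examples drawn independently from the uniform distribution.
samples = [random.random() for _ in range(2000)]
positives = [x for x in samples if label(x)]
h_lo, h_hi = min(positives), max(positives)

# Testing phase: estimate the error of the hypothesis on fresh samples.
test_points = [random.random() for _ in range(5000)]
errors = sum(1 for x in test_points if (h_lo <= x <= h_hi) != label(x))
error_rate = errors / len(test_points)
print(error_rate < 0.01)
```

With enough samples, the hypothesis is "probably approximately correct": its error falls below the chosen bound with high probability, and the two phases mirror Blum's (1994) training and testing phases.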
Application of PAC to fraud detection. A PAC learning algorithm can be either strong
or weak. Leeuwen (2004) described a weak learning algorithm as one that performs poorly, with
low classification accuracy, often because it lacks a large training dataset. There are a few meta-learning
techniques that can boost the accuracy and performance of a weak learner. One such popular
technique is boosting. Boosting turns a weak learning algorithm into a strong learner. Malekpour
et al. (2014) elaborated a hybrid model approach in predicting fraudulent behavior in e-banking
systems. Security analytic tools ideally incorporate similar models, both supervised and
unsupervised, to identify financial frauds in banking institutions. Log sources feeding into
security analytic tools form the major inputs to these hybrid models. Hybrid models include both
classification and clustering techniques. The ensemble approach employs boosting algorithms to
increase the model accuracy. Malekpour et al. used this approach to improve their accuracy in
detecting frauds, as demonstrated by comparison with earlier techniques; boosting the weak PAC
learners was the key to this improvement.
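Boosting can be made concrete with a minimal AdaBoost sketch on toy one-dimensional data: each weak threshold "stump" misclassifies at least two points, yet three boosting rounds combine them into an ensemble that fits every point. The data, thresholds, and round count are illustrative assumptions, not values from the cited study.

```python
import math

# Toy 1-D training data; +/-1 labels (e.g., fraudulent vs. normal transactions).
X = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
y = [1, 1, 1, -1, -1, -1, 1, 1]

def stump(threshold, polarity):
    """Weak learner: a one-threshold classifier."""
    return lambda x: polarity if x < threshold else -polarity

candidates = [stump(t, p) for t in (0.35, 0.65, 1.0) for p in (1, -1)]
weights = [1 / len(X)] * len(X)
ensemble = []                   # pairs of (vote weight alpha, weak learner)

for _ in range(3):
    def weighted_error(h):
        return sum(w for w, x, t in zip(weights, X, y) if h(x) != t)
    h = min(candidates, key=weighted_error)
    err = max(weighted_error(h), 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)
    ensemble.append((alpha, h))
    # Boost: increase the weight of misclassified points, then renormalize.
    weights = [w * math.exp(-alpha * t * h(x)) for w, x, t in zip(weights, X, y)]
    total = sum(weights)
    weights = [w / total for w in weights]

def predict(x):
    return 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1

print(all(predict(x) == t for x, t in zip(X, y)))
```

No single stump can separate these labels, but the weighted vote of three stumps classifies all eight points correctly, which is the strong-from-weak guarantee that boosting provides.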
Dempster-Shafer (DS) Theory and its applications. Science has succeeded in
modeling many aspects of uncertainty in daily life. Kohlas and Monney (1994) introduced the
theory of evidence with a strong computational flavor with the aim of modeling uncertainty.
Researchers also call this theory DS theory. Shafer (1976) released it as a theory of mathematical
evidence. As per Shafer, the main goals of this theory are to represent uncertainty with the help
of evidence and hints. Statistical modeling of uncertainty was a successful outcome of this
theory. Kohlas and Monney elaborated on the application of evidence theory in the area of
decision analysis. Decision analysis is an important application of evidence theory and is applied
in the area of intrusion detection. Detecting intrusion is one of the major functions of security
analytic tools. Database log files are a major input to security analytic tools. Intrusive activity in
databases is usually detected by a pre-defined set of rules. Panigrahi, Sural, and Majumdar
(2009) explored the application of evidence theory to intrusion detection in databases. Rules
provide evidence of intrusive behavior in database transactions. However, many pieces of
evidence are combined to form an initial belief in any given transaction. This combination of
evidence is accomplished using DS theoretical constructs as explained in the following section.
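The combination of evidence just described is performed by Dempster's rule. The Python sketch below combines two hypothetical mass functions over the frame of discernment {malicious, normal}, standing in for the two detection rules; the mass values are illustrative only.

```python
from itertools import product

M, N = "malicious", "normal"
EITHER = frozenset([M, N])      # the whole frame of discernment

# Hypothetical basic mass assignments from two detection rules.
m1 = {frozenset([M]): 0.6, frozenset([N]): 0.1, EITHER: 0.3}
m2 = {frozenset([M]): 0.5, frozenset([N]): 0.2, EITHER: 0.3}

combined, conflict = {}, 0.0
for (a, wa), (b, wb) in product(m1.items(), m2.items()):
    inter = a & b
    if inter:
        combined[inter] = combined.get(inter, 0.0) + wa * wb
    else:
        conflict += wa * wb     # mass assigned to the empty (conflicting) set
# Dempster's rule renormalizes by the non-conflicting mass.
combined = {k: v / (1 - conflict) for k, v in combined.items()}
belief_malicious = combined[frozenset([M])]
print(round(belief_malicious, 3))
```

Two individually moderate pieces of evidence for "malicious" reinforce each other, yielding a combined belief higher than either rule alone assigns.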
Database intrusion detection, a less popular concept, identifies user behavioral anomalies
with a focus on preventing insider attacks. Panigrahi et al. (2009) elaborated on
a unique algorithm they developed to detect suspicious user activity in databases. Two rules
formed the initial belief function regarding whether a specific user activity is malicious or
normal. The first rule measured the sequence of activities in a transaction and any deviation from
the prebuilt user profile. The second rule detected spatiotemporal outliers in user behavior with
specific reference to location and time of user activity. The belief component of the detection
engine combined output from both the rules that act as evidence to determine malicious
behavior. Panigrahi et al. tested their algorithm using a transaction simulator. Rules in this
experiment are easily extendable to any security analytic tool. Weblogs and database logs are
suitable inputs to the security analytic tool in the application of these rules.
Detecting insider attacks is a high priority for many of the security analytic tools. Mobile
ad hoc network (MANET) is an evolving network model in communication networks, and it is
subject to insider attacks. Ehsan and Khan (2012) presented a detailed analysis of the security
attacks that normally happen in MANETs. Researchers are currently developing many
algorithms to combat MANET attacks. Wei, Tang, Yu, Wang, and
Mason (2014) developed an algorithm based on DS theory to fuse indirect information provided