Formulating Research Questions (RQs): Refer to EXERCISE # 1
Selecting Empirical Methods for SE Research
Despite widespread interest in empirical software engineering, there is little guidance on which research methods are suitable for which research problems.
Software engineering is a multi-disciplinary field
Because of the importance of human activities in software development, research methods are drawn from other disciplines
1. Despite widespread interest in empirical software engineering, there is little guidance on which research methods are suitable for which research problems, and how to choose amongst them. Many researchers select inappropriate methods because they do not understand the goals underlying a method or possess little knowledge about alternatives. As a first step in helping researchers select an appropriate method, this chapter discusses key questions to consider in selecting a method, from philosophical considerations about the nature of knowledge to practical considerations in the application of the method. We characterize key empirical methods applicable to empirical software engineering, and explain the strengths and weaknesses of each.
2. Software engineering is a multi-disciplinary field, crossing many social and technological boundaries. To understand how software engineers construct and maintain complex, evolving software systems, we need to investigate not just the tools and processes they use, but also the social and cognitive processes surrounding them. This requires the study of human activities. We need to understand how individual software engineers develop software, as well as how teams and organizations coordinate their efforts.
3. Because of the importance of human activities in software development, many of the research methods that are appropriate to software engineering are drawn from disciplines that study human behavior, both at the individual level (e.g. psychology) and at the team and organizational levels (e.g. sociology). These methods all have known flaws, and each can only provide limited, qualified evidence about the phenomena being studied. However, each method is flawed differently, so viable research strategies use multiple methods, chosen in such a way that the weaknesses of each method are addressed by use of complementary methods.
Classes of research methods in SE
Controlled experiments (including Quasi-Experiments)
Case Studies
Survey Research
How to select a research method? (Start by FORMULATING RESEARCH QUESTIONS)
To illustrate the steps involved in deciding which method or methods to use, we present two guiding examples. Two fictional software engineering researchers, Joe and Jane, will explore how the various research methods can be applied to their work.
Some of the examples will be used in the remainder of the slides.
The other examples will be used for EXERCISE # 1
What research questions are you asking?
The first step in choosing an appropriate research method is to clarify the research question.
Asking questions + systematic process of obtaining valid answers
Problem Statement
RQ and Hypothesis
One of the first steps in choosing an appropriate research method is to clarify the research question. While Jane and Joe have identified the problems they wish to work on, neither has pinned down a precise question. In each case, they could focus on a number of different research questions, each of which leads to a different direction in developing research strategies.
Research question
Often the most obvious question is not the best choice for a starting point.
Jane’s first attempt to formulate her RQ is:
“Is a fisheye-view file navigator more efficient than the traditional view for file navigation?”
Joe’s first attempt to formulate his RQ is:
“How widely are UML diagrams used as collaborative shared artifacts during design?”
Research questions
Both questions are vague, because they make assumptions about the phenomena to be studied and the kinds of situations in which these phenomena occur.
For example: “Is a fisheye-view file navigator more efficient than the traditional view for file navigation?”
Jane’s question only makes sense if we already know that some people (who?) need to do file navigation (whatever that is?), under some circumstances (which are?), and that efficiency (measured how?) is a relevant goal for these people (how would we know that?)
Research questions
For example: “How widely are UML diagrams used as collaborative shared artifacts during design?”
Joe’s question presupposes that we know what a “collaborative shared artifact” is, and can reliably identify one, and even reliably say which things are UML diagrams
Defining the precise meaning of terms is a crucial part of empirical research, and is closely tied with the idea of developing (or selecting) an appropriate theory
Research questions
In early stages of a research program, we need to ask “exploratory” questions
Suitable research methods for exploratory questions help us to build tentative theories
Unless Jane and Joe are building on existing work, they need to formulate exploratory questions, such as:
Existence questions: “Does X exist?”
Description and Classification questions: “What is X like?”
Descriptive-Comparative questions: “How does X differ from Y?”
1. In the early stages of a research program, we usually need to ask exploratory questions, as we attempt to understand the phenomena, and identify useful distinctions that clarify our understanding.
2. Suitable research methods for exploratory questions tend to be those that offer rich, qualitative data, which help us to build tentative theories.
3. Unless they are building on existing work that already offers clear definitions, both Jane and Joe need to formulate exploratory questions, such as the following:
Research questions: Existence questions
Existence questions of the form, “Does X exist?”
X is a thing, attribute, phenomenon, behavior, ability, condition, state of affairs etc.
Is there a programmer who can write 200k lines per year?
Jane might need to ask,
“Is file navigation something that (certain types of programmers) actually do?” and,
“Is efficiency actually a problem in file navigation?”
What might JOE need to ask?
EXERCISE # 1
Research questions: Description/Classification questions
Description and Classification questions such as,
“What is X like?”,
“What are its properties?”,
“How can it be categorized?”,
“How can we measure it?”,
“What is its purpose?”,
“What are its components?”,
“How do the components relate to one another?”,
“What are all the types of X?”
Jane might ask,
“How can we measure efficiency for file navigation?”
What might JOE ask?
EXERCISE # 1
Research questions: Descriptive-Comparative questions
Descriptive-Comparative questions of the form, “How does X differ from Y?” investigate similarities and differences between two or more phenomena.
Jane might ask,
“How do fisheye views differ from conventional views?”
What might JOE ask?
EXERCISE # 1
Why formulate Exploratory Research Questions?
The answers to these questions result in a clearer understanding of the phenomena, including more precise definitions of the theoretical terms, evidence that we can measure them, and that the measures are valid.
In exploring these questions, Jane and Joe will refine their ideas about the nature of the phenomena they are studying.
It is possible that there are already good answers to these questions in the published literature.
A literature survey, instead of an empirical study, may answer them.
Once we have a clearer understanding of the phenomena, we may need to ask base-rate questions about the normal patterns of occurrence of the phenomena
This helps us evaluate whether a situation is normal or abnormal.
End of the slide discussion:
Once we have a clearer understanding of the phenomena, we may need to ask base-rate questions about the normal patterns of occurrence of the phenomena. If we fail to ask base-rate questions, then we have no basis for saying whether a particular situation is normal or unusual.
Example base-rate questions are on the next slide:
Base-Rate Questions
Example Base-Rate Questions include:
Frequency & Distribution Questions:
How often does X occur?
Descriptive Process Questions:
How does X normally work?
Research questions: Frequency and Distribution questions
Frequency and distribution questions such as,
“How often does X occur?” and,
“What is an average amount of X?”
Joe’s original question appears to be a frequency question, but he can formulate it more precisely.
For example, he might ask,
“How many distinct UML diagrams are created in software development projects in large software companies?”
He might discover the results follow some standard statistical distribution.
What might JANE ask? EXERCISE # 1
Often, these questions can be answered in terms of a standard distribution of a characteristic within a well-defined population.
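A frequency question such as Joe's can be answered by summarizing one characteristic across a sample drawn from a well-defined population. The sketch below is only an illustration: the per-project diagram counts are invented, and `summarize` is a hypothetical helper, not part of any study described here.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample: number of distinct UML diagrams found in each of
# ten projects. These values are invented purely for illustration.
diagrams_per_project = [3, 5, 5, 8, 2, 5, 13, 3, 8, 5]

def summarize(counts):
    """Return the frequency table, mean, and median of a list of counts."""
    frequency = dict(sorted(Counter(counts).items()))
    return frequency, mean(counts), median(counts)

frequency, average, middle = summarize(diagrams_per_project)

# Print a crude text histogram, one '#' per project.
for value, n_projects in frequency.items():
    print(f"{value:>2} diagrams: {'#' * n_projects}")
print(f"mean = {average:.1f}, median = {middle}")
```

With real data, the shape of the histogram is what would suggest whether a standard distribution is a reasonable description of the population.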
Research questions: Descriptive Process questions
Descriptive-Process questions such as,
“How does X normally work?”
“What is the process by which X happens?”
“In what sequence do the events of X occur?”,
“What are the steps X goes through as it evolves?”,
“How does X achieve its purpose?”.
For example, Jane might ask,
“How do programmers navigate files using existing tools?”
What might JOE ask? EXERCISE # 1
Research Questions: Relationship Questions
Often, we are interested in the relationship between two phenomena, specifically whether the occurrence of one is related to the occurrence of the other. Hence we need to formulate:
Relationship questions:
Are X and Y related?
Do occurrences of X correlate with occurrences of Y?
Research questions: Relationship questions
Relationship questions such as,
“Are X and Y related?”
“Do occurrences of X correlate with the occurrences of Y?”
Jane might ask,
“Does efficiency in file navigation correlate with the programmer’s familiarity with the programming environment?”
What might JOE ask?
EXERCISE # 1
Next step
Once we have established that a relationship exists between two phenomena, it is natural to try to explain why the relationship holds by attempting to identify a cause and effect.
It is a common mistake to confuse correlation with causality.
In general it is much harder to demonstrate causality than to show that two variables are correlated.
If high values of X correlate with high values of Y, it may be because X causes Y, or because Y causes X.
But it is also possible that X and Y share some common cause and neither causes the other. Or perhaps they co-evolve in complex ways so that there is no clear cause-and-effect.
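The common-cause case can be made concrete with a small simulation. This is a minimal sketch under stated assumptions: Z is a made-up hidden factor, X and Y are each generated from Z plus independent noise, and `pearson` is a hand-written helper. Neither X nor Y influences the other, yet they correlate strongly.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Z is a hidden common cause (hypothetical, e.g. developer experience).
z = [rng.gauss(0, 1) for _ in range(5000)]
# X and Y each depend on Z plus independent noise; neither affects the other.
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson(x, y)
# Subtracting Z leaves only the independent noise terms.
r_residual = pearson([xi - zi for xi, zi in zip(x, z)],
                     [yi - zi for yi, zi in zip(y, z)])

print(f"corr(X, Y)         = {r_xy:.2f}")
print(f"corr(X - Z, Y - Z) = {r_residual:.2f}")
```

The first correlation is large and the second is close to zero, which is the pattern a shared cause produces. In a real study Z is usually unobserved, which is why correlation alone cannot settle a causality question.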
Research questions: Causality questions
Causality questions such as,
“Does X cause Y?”
“Does X prevent Y?”
Plus the more general forms:
“What causes Y?”,
“What are all the factors that cause Y?”,
“What effect does X have on Y?”
For example, Jane might ask,
“Do fisheye-views cause an improvement in efficiency for file navigation?”
What might JOE ask? EXERCISE # 1
In software engineering we often ask whether using a particular tool or technique causes an improvement in quality, speed, and so on.
Jane’s initial question appears to be of this type: “Do fisheye-views cause an improvement in efficiency for file navigation?”
Research questions: Causality-Comparative questions
Causality-Comparative questions investigate the relationships between different causes:
“Does X cause more Y than does Z?”
“Is X better at preventing Y than is Z?”
Unless Jane has good base-rate data for existing file navigation tools, Jane’s causality question would be better formulated as
“Do fisheye-views cause programmers to be more efficient at file navigation than conventional views?”
What might JOE ask? EXERCISE # 1
Research questions: Causality-Comparative Interaction
Causality-Comparative Interaction questions investigate how context affects a cause-effect relationship:
“Does X or Z cause more Y under one condition but not others?”
If Jane’s initial studies reveal a factor (e.g., distractions) that affects causality, she might ask
“Do fisheye-views cause programmers to be more efficient at file navigation than conventional views when programmers are distracted, but not otherwise?”
What might JOE ask? EXERCISE # 1
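An interaction of the kind in Jane's question can be read as a difference of differences: how much the fisheye view helps under one condition, minus how much it helps under the other. The sketch below assumes a 2x2 design with invented mean task times; the numbers and the `fisheye_gain` helper are hypothetical.

```python
# Hypothetical mean task-completion times in seconds (lower is better).
# All four numbers are invented for illustration only.
mean_time = {
    ("conventional", "distracted"): 62.0,
    ("fisheye", "distracted"): 48.0,
    ("conventional", "undistracted"): 40.0,
    ("fisheye", "undistracted"): 39.0,
}

def fisheye_gain(condition):
    """Seconds saved by the fisheye view under the given condition."""
    return (mean_time[("conventional", condition)]
            - mean_time[("fisheye", condition)])

gain_distracted = fisheye_gain("distracted")
gain_undistracted = fisheye_gain("undistracted")
# The interaction is the difference between the two gains.
interaction = gain_distracted - gain_undistracted

print(gain_distracted, gain_undistracted, interaction)
```

With these made-up means the gain is 14 seconds when programmers are distracted and 1 second otherwise, so the interaction is 13 seconds. A real study would also need a statistical test (for example, a two-way analysis of variance) to judge whether such a difference exceeds what noise alone would produce.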
Empirical vs. Non-Empirical Research questions
The classes of research question above are all knowledge questions focused on the way the world is.
In contrast, most non-empirical research in software engineering focuses on a very different type of question concerned with designing better ways to do software engineering
The classes of research question above are all knowledge questions focused on the way the world is. Empirical research in software engineering addresses these types of questions. In contrast, most non-empirical research in software engineering focuses on a very different type of question concerned with designing better ways to do software engineering.
Design questions
Design questions of the form, “What’s an effective way to achieve X?” or, “What strategies help to achieve X?”
For example, Joe’s research might lead him to ask,
“What is an effective way for teams to represent design knowledge to improve coordination?”
These types of question are necessary when the goal is to design better procedures and tools for carrying out some activity
Such questions presuppose that the associated knowledge questions have already been addressed so that we have enough information about the nature of the design problem to be solved.
In practice, a long-term software engineering research program involves a mix of design questions and knowledge questions
• Design questions of the form, “What’s an effective way to achieve X?” or, “What strategies help to achieve X?” For example, Joe’s research might lead him to ask, “What is an effective way for teams to represent design knowledge to improve coordination?”
These types of question are necessary when the goal is to design better procedures and tools for carrying out some activity or to design suitable social or regulatory policies. Such questions presuppose that the associated knowledge questions have already been addressed so that we have enough information about the nature of the design problem to be solved. In practice, a long-term software engineering research program involves a mix of design questions and knowledge questions as the researchers investigate specific problems, how best to solve them, and which solutions work best.