Thinking Skills and Creativity 21 (2016) 75–84
Assessment of creativity evaluation skills: A psychometric investigation in prospective teachers

Mathias Benedek a,*, Nora Nordtvedt a, Emanuel Jauk a, Corinna Koschmieder a, Jürgen Pretsch a, Georg Krammer a,b, Aljoscha C. Neubauer a

a Department of Psychology, University of Graz, Austria
b University College of Teacher Education Styria, Austria

Article history: Received 1 December 2015; Received in revised form 3 May 2016; Accepted 22 May 2016; Available online 24 May 2016

Keywords: Creativity; Divergent thinking; Discernment; Openness; Intelligence

Abstract
An accurate judgement of the creativity of ideas is seen as an important component underlying creative performance, and also seems relevant to effectively support the creativity of others. In this article we describe the development of a novel test for the assessment of creativity evaluation skills, which was designed to be part of an admission test for teacher education. The final test presents 72 ideas that have to be judged as being common, inappropriate, or creative. Two studies examined the psychometric quality of the test, and explored relationships of creativity evaluation skills with cognitive ability and personality. In the first study, we observed that creativity evaluation skills are positively correlated with divergent thinking creativity and creative achievement, which suggests that evaluation skills are relevant for creative ideation as well as creative accomplishment. Across both studies, people tended to underestimate the creativity of ideas. Openness, intelligence and language competence predicted higher creativity evaluation skills, and this effect was partly mediated by a less negative evaluation bias. These findings contribute to our understanding of why people sometimes fail to recognize the creativity in others.
© 2016 Elsevier Ltd. All rights reserved.
1. Introduction

How well can people evaluate the creativity of ideas? On the one hand, people show reasonable agreement when evaluating the creativity of ideas, which indicates creativity is a quantifiable aspect of ideas. On the other hand, there is also a substantial amount of variability in judgements, suggesting that people differ in how discerning they are. An accurate evaluation of creativity is thought to be conducive to one's own creative performance (Cropley, 2006; Finke, Ward, & Smith, 1992), and should be similarly important for providing selective feedback and fostering creativity in others. In this article, we describe the development of a creativity evaluation test, designed to be part of an admission test for teacher education. We analyzed data from two studies that examined the psychometric quality of the test and explored relationships of creativity evaluation skills with cognitive ability and personality.
∗ Corresponding author at: Department of Psychology, University of Graz, Universitätsplatz 2, 8010 Graz, Austria. E-mail address: mathias.benedek@uni-graz.at (M. Benedek).
http://dx.doi.org/10.1016/j.tsc.2016.05.007
1871-1871/© 2016 Elsevier Ltd. All rights reserved.
1.1. Evaluating creativity
A central challenge in creativity research is the criterion problem (Amabile, 1982; Brown, 1989; Shapiro, 1970): There is no easy way to objectively assess the creativity of an idea or product. Moreover, creativity is not an invariant feature of a product, but depends on the time and socio-cultural environment it is born into (e.g., Glăveanu, 2014; Simonton, 1998). Still, within a certain time and group, people tend to agree on whether an idea can be considered more or less creative. Creativity research capitalizes on this agreement by using a consensual definition of creativity, which defines the creativity of a product as the averaged evaluation across a set of judges (Amabile, 1982). Subjective ratings of creativity show good inter-rater reliability for different kinds of creative products, including drawings (e.g., Dollinger & Shafran, 2005), stories (e.g., Baer, Kaufman, & Gentile, 2004), or ideas in divergent thinking tasks (Benedek, Mühlmann, Jauk, & Neubauer, 2013; Silvia et al., 2008). The agreement across judges indicates that creativity is generally an identifiable and quantifiable characteristic of new ideas and products (Benedek & Jauk, 2014).
Creativity scholars have tried to further define the characteristics that lead to the perception of creativity. While many relevant characteristics have been proposed, there is strong agreement that a creative product above all needs to be novel. If it is not novel, it cannot be creative "no matter what other positive qualities it might possess" (Jackson & Messick, 1967). However, mere novelty is usually not enough: a product is additionally required to meet a criterion of meaningfulness or appropriateness to be considered creative (Barron, 1955; Stein, 1953; see also Runco & Jaeger, 2012). This notion has been confirmed by research showing that creativity evaluations strongly depend on perceived novelty and, to a lesser degree, also on perceived appropriateness (Caroff & Besancon, 2008; Diedrich, Benedek, Jauk, & Neubauer, 2015; Runco & Charles, 1993). It is important to note that novelty and appropriateness are generally inversely related, because highly common ideas are usually also highly appropriate. But within novel ideas, appropriateness predicts perceived creativity, thereby moderating the effect of novelty on creativity (Diedrich et al., 2015).
1.2. Assessment of creativity evaluation skills
Different ways have been proposed to measure the discernment of creativity evaluations (cf. Silvia, 2008). One approach is to measure evaluation accuracy in terms of hit rates, that is, the percentage of correctly identified creative or uncreative ideas (Runco & Dow, 2004; Runco & Smith, 1992). Runco and Smith asked participants to rate lists of ideas from others as well as their own ideas for creativity on a 1–7 scale. A judgement was defined as correct when an idea was unique (i.e., statistically infrequent) and given a rating of 6 or 7, or when the idea was common (i.e., given by more than 10%) and rated as 1 or 2. Accuracy rates were generally moderate (20–50%). Interestingly, divergent thinking ability predicted higher evaluation accuracy for own ideas but not for the evaluation of others' ideas. A potential problem with this way of scoring is that it uses different criteria for individual judgements and criterion values. Moreover, it separately scores the evaluation accuracy related to creative and common ideas, which can be differently affected by response biases: judging most ideas as creative will lead to high hit rates for creative ideas, but low hit rates for common ideas, thus reflecting high sensitivity but low specificity in separate scores. Finally, low intercorrelations of scores across tasks indicate low reliability of this scoring.
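To make this scoring concrete, here is a minimal sketch of hit-rate accuracy in the spirit of Runco and Smith; the cutoffs follow the description above, while the function name and toy data are ours:

```python
import numpy as np

def hit_rate(ratings, is_unique):
    """Hit-rate accuracy: a judgement counts as correct if a unique idea
    is rated 6-7, or a common idea is rated 1-2 (cf. Runco & Smith, 1992)."""
    ratings = np.asarray(ratings)
    is_unique = np.asarray(is_unique)
    hits = np.where(is_unique, ratings >= 6, ratings <= 2)
    return hits.mean()

# Made-up example: four ideas, two unique and two common.
print(hit_rate([7, 3, 1, 5], [True, True, False, False]))  # 0.5
```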
Another approach to assess accuracy is to compute the discrepancy of evaluations from criterion scores measured on the same scale. Grohman, Wodniecka, and Kłusak (2006) employed this approach and separately measured accuracy for rated originality and uniqueness when judging own ideas and ideas from others. Criterion values were based on the ratings of three trained raters and the relative frequency of ideas within the sample. They found that people generally overestimate the originality of ideas, an overestimation that was more pronounced for their own ideas than for ideas from others. Divergent thinking ability, however, was not consistently related to better evaluation accuracy. While this approach aims at a more differentiated measurement of discernment compared to hit rates, its actual precision seems to depend strongly on the reliability of the established criterion scores.
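A discrepancy score in this vein is simply the signed difference between a person's ratings and the criterion values on the same scale; a positive mean indicates overestimation. A sketch with invented data:

```python
import numpy as np

def discrepancy(person_ratings, criterion_ratings):
    """Mean signed discrepancy between own ratings and criterion scores
    (same scale); > 0 means overestimating originality/creativity."""
    return np.mean(np.asarray(person_ratings) - np.asarray(criterion_ratings))

print(discrepancy([5, 6, 4], [3.5, 5.0, 4.5]))  # 0.67: overestimation
```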
Finally, discernment can also be measured in terms of the covariation of evaluations with criterion values. This method does not require the presumption that criterion values reflect the true, absolute level of creativity and hence reflects accuracy in terms of relative rather than absolute agreement. For example, Silvia (2008) asked people to select their two most creative ideas and analyzed to what extent top-2 choices predict the ratings of judges by means of a multi-level approach. He found that people are generally discerning when evaluating their own ideas, but people high in openness were more discerning than others. Since this method is based on covariation, it reflects whether people are able to recognize relative differences in creativity, but is not affected by judgement biases such as general leniency or strictness. However, this method does not necessarily indicate whether people agree on whether a particular idea is creative or not, because this requires a judgement of the absolute level of creativity. Accuracy in the absolute level of creativity is not needed when people are asked to select the best from a set of given ideas, but it should be relevant in contexts that require the judgement of individual ideas, which is common in many applied settings such as those of teachers, curators, or investors (Cropley, 2001; Sternberg & Lubart, 1992).
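In the same spirit, a minimal covariation-based discernment score can be sketched as a per-person rank correlation between a person's ratings and the criterion ratings of the same ideas; this is our simplification of the multi-level approach, with made-up data:

```python
from scipy.stats import spearmanr

def discernment(person_ratings, criterion_ratings):
    """Relative-agreement discernment: rank correlation between a person's
    creativity ratings and criterion ratings of the same ideas.
    Insensitive to general leniency/strictness biases, as discussed above."""
    rho, _ = spearmanr(person_ratings, criterion_ratings)
    return rho

# Made-up example: six ideas rated 1-7 by a person vs. judge-averaged criteria.
print(discernment([2, 5, 3, 7, 1, 4], [1.5, 4.0, 3.5, 6.5, 2.0, 3.0]))
```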
1.3. The present research
The main goal of this project was the development and psychometric examination of a creativity evaluation test (CET). The CET was designed to be included in an admission test for teacher education in Austria, because creativity evaluation
Table 1. Descriptive statistics and Spearman correlations for Study 1.

   N M SD | 1 2 3 4 5 6 7 8 9 10 11 12
1 CET Informedness 214 0.68 0.19
2 CET Bias 214 −0.11 0.17 | 0.54**
3 CET Sensitivity 214 0.80 0.15 | 0.83** 0.89**
4 CET Specificity 214 0.88 0.10 | 0.52** −0.34 0.04
5 DT creativity 147 4.12 1.07 | 0.18* 0.14 0.18* 0.04
6 DT fluency 147 16.35 6.66 | 0.15+ 0.18* 0.19* 0.02 0.53**
7 CA 75 1.89 0.66 | 0.27* 0.17 0.31** 0.06 0.27* 0.10
8 Honesty 213 3.69 0.61 | 0.02 −0.07 −0.03 0.00 −0.02 −0.10 0.00
9 Emotionality 213 3.20 0.71 | 0.04 −0.01 0.00 0.02 −0.13 0.08 −0.23+ −0.13
10 eXtraversion 213 3.82 0.62 | 0.05 0.03 0.04 −0.02 0.04 0.01 0.05 0.20** −0.36**
11 Agreeableness 213 3.33 0.60 | −0.02 0.06 0.02 −0.07 −0.06 −0.07 −0.05 0.21** −0.21** 0.23**
12 Conscientiousness 213 3.81 0.71 | −0.13+ −0.12+ −0.16* −0.05 −0.06 0.01 −0.28* 0.28** −0.09 0.23** 0.07
13 Openness 213 3.80 0.58 | 0.08 0.01 0.06 0.06 0.21* 0.11 0.14 0.34** −0.30** 0.28** 0.11 0.31**
Notes: CET = creativity evaluation test; DT = Divergent thinking; CA = Creative achievement.
* p < 0.05. ** p < 0.01. + p < 0.10.
skills are seen as an important prerequisite for being able to foster creativity in education (Cropley, 2001; Finke et al., 1992; Urban & Cropley, 2000). As in previous research, we decided to use ideas from divergent thinking tasks as item material to ensure that the test does not require any domain-specific knowledge or depend on aesthetic preferences (Kaufman, Baer, Cropley, Reiter-Palmon, & Sinnett, 2013). The CET asks for the evaluation of a prespecified set of ideas, rather than for the evaluation of one's own, self-generated ideas. Being able to accurately evaluate one's own ideas is assumed to be a critical skill underlying creative potential (Cropley, 2006; Finke et al., 1992; Groborz & Nęcka, 2003). It seems likely that this skill reflects a more general evaluation skill, which is not restricted to own ideas. Moreover, the evaluation of own ideas implies serious methodological issues, because the assessment is biased by the person's creative ability. More creative people produce more ideas and more creative ideas, which may make it easier to differentiate between them. This may also explain why previous research found that divergent thinking ability is related to evaluation accuracy for own ideas but not for others' ideas (Runco & Smith, 1992).

Finally, we aimed to assess creativity evaluation skills in a way that reflects discernment on relevant dimensions that contribute to an overall perception of creativity. Standard definitions of creativity (Runco & Jaeger, 2012) emphasize that creative ideas require both originality/novelty and usefulness/appropriateness. Educators sometimes confuse creativity with originality, thereby ignoring that creative value arises from meeting relevant constraints placed on originality (Beghetto, 2010). However, when they do not recognize the necessary role of constraints for creativity, they may easily associate creativity with negative forms of deviance rather than with a desirable trait (Plucker, Beghetto, & Dow, 2004). Following the two-dimensional definition of creativity, we initially considered asking people to assign ideas to one of four categories resulting from the combinations of low vs. high novelty and low vs. high appropriateness. An initial pilot test (n = 15), however, showed that people hardly ever evaluate ideas as being low on both novelty and appropriateness. Highly common (i.e., not novel) ideas are typically proved and tested and hence appropriate (Diedrich et al., 2015). Therefore, we decided to use three response categories: creative (novel and appropriate), common (not novel but appropriate), and inappropriate (novel but not appropriate).

We compiled an initial test version consisting of 180 candidate items. In Study 1, we performed a thorough psychometric analysis of this initial test version. The findings led to the construction of a final test version consisting of 72 items, which was employed in the teacher admission test of 2014 (Study 2).
2. Study 1: test development

2.1. Methods

2.1.1. Participants
A total of 214 people participated in Study 1. This study included a large number of newly devised tests as well as established validation tests. Therefore, it was not possible to assign all tests to all participants, and the actual sample size varied across measures as described in Table 1. Participants were university students (77.6% females) aged between 18 and 49 years (M = 22.54, SD = 3.79), majoring mostly in psychology (44%) or teacher education (43%). The study was approved by the local ethics committee.

2.1.2. Tests and measures

2.1.2.1. Creativity evaluation test (CET). The test instructions explained that this test presents lists of ideas that were collected from various creative idea generation tasks, but that not all ideas are really creative. The task hence is to decide which of those ideas are common, inappropriate or actually creative. A common idea was described as obvious and typical; it is appropriate,
but not novel and hence not creative (e.g., “Using a hat for collecting donations”). An inappropriate idea was described as one that is not appropriate for the task at hand; it is often novel, but not actually creative (e.g., “Using a hat as cooking pot”). A creative idea was described as being both novel and appropriate; it can be clever, humorous and imaginative (e.g., “Using a hat as a Frisbee”).
The initial version of the CET included 180 items, which represented actual responses collected in previous research on creative idea generation (Benedek et al., 2013, 2014; Diedrich et al., 2015; Jauk, Benedek, & Neubauer, 2014). The responses referred to twelve different divergent thinking (DT) tasks, eight alternate uses tasks (car tire, knife, can, bucket, glass bottle, hairdryer, paper clip, and funnel) and four instances tasks (faster locomotion, noise, flexible, and round) with 15 items per task. Items were presented in blocks for each DT task. Each block was headed with a brief task description (e.g., “What can a car tire be used for?”) followed by 15 ideas related to this task (e.g., “It can be burned”). Participants were asked to assign each idea to one out of three response categories (i.e., common, inappropriate, or creative) by marking the respective box. The ideas within each block were distributed equally across these three categories. The full item list cannot be disclosed because it is part of an admission test.
For the scoring of the CET we computed the informedness of judgements, a standard index from signal detection theory that equally accounts for sensitivity and specificity of judgements (Powers, 2011). Sensitivity reflects the ratio of correctly identified creative ideas; specificity reflects the ratio of correctly identified non-creative ideas (i.e., common or inappropriate). The informedness of judgements is defined as Informedness = Sensitivity + Specificity − 1. A perfect informedness of 1 hence is achieved when sensitivity and specificity are both maximal. We can further assess the bias of judgments, which is defined as Bias = Sensitivity − Specificity. A positive bias occurs when sensitivity is higher than specificity and hence reflects a tendency to overrate the creativity of responses.
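For concreteness, a minimal sketch of this scoring in Python; only the formulas come from the text above, while the function name and the toy data are ours:

```python
import numpy as np

def score_cet(responses, targets):
    """Score CET judgements via signal detection indices.

    responses, targets: arrays of category labels per item
    ('creative', 'common', or 'inappropriate')."""
    responses = np.asarray(responses)
    targets = np.asarray(targets)
    creative = targets == "creative"
    # Sensitivity: proportion of creative items judged as creative.
    sensitivity = np.mean(responses[creative] == "creative")
    # Specificity: proportion of non-creative items judged as non-creative.
    specificity = np.mean(responses[~creative] != "creative")
    informedness = sensitivity + specificity - 1
    bias = sensitivity - specificity  # negative = underrating creativity
    return informedness, bias, sensitivity, specificity

# Toy example with four items (made-up data):
targets = ["creative", "common", "inappropriate", "creative"]
responses = ["creative", "common", "creative", "common"]
print(score_cet(responses, targets))  # (0.0, 0.0, 0.5, 0.5)
```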
2.1.2.2. Divergent thinking ability. We assessed divergent thinking (DT) ability with three alternate uses tasks, which were different from those used in the CET (i.e., umbrella, toilet paper, and garden hose). Participants had 2 min per task to produce creative uses and enter them in a text box. The responses were scored for fluency and creativity. The average number of ideas per task was used as an index of DT fluency. For the scoring of DT creativity, we created a list of 1936 non-redundant responses, which were alphabetically sorted and rated for creativity by four independent raters on a scale from 0 (not creative) to 3 (very creative). The inter-rater reliability at idea level ranged from ICC = 0.72 to 0.80 for the three tasks. The average creativity of the three most creative ideas per task (according to the ratings averaged across raters) was used as an index of DT creativity (for a similar procedure see Benedek, Jauk, Sommer, Arendasy, & Neubauer, 2014). Total scores of DT fluency and DT creativity were computed by averaging across the three tasks. The internal consistency was good for DT fluency (α = 0.84) and satisfactory for DT creativity (α = 0.71).
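The top-3 aggregation described above can be sketched as follows; a simplified illustration under our own naming, not the authors' scoring code:

```python
import numpy as np

def dt_scores(ratings_per_task, ideas_per_task):
    """Aggregate DT fluency and DT creativity as described above.

    ratings_per_task: list of arrays, each holding the rater-averaged
    creativity ratings (0-3) of all ideas a person gave on one task.
    ideas_per_task: list of idea counts per task."""
    fluency = np.mean(ideas_per_task)  # average number of ideas per task
    # Top-3 scoring: mean creativity of the three best-rated ideas per task.
    top3 = [np.mean(np.sort(r)[-3:]) for r in ratings_per_task]
    creativity = np.mean(top3)
    return fluency, creativity

print(dt_scores([np.array([0.5, 2.0, 1.5, 2.5]), np.array([1.0, 0.5, 3.0])],
                [4, 3]))  # (3.5, 1.75)
```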
2.1.2.3. Creative achievement. The individual level of creative achievement was assessed by asking participants to report the three most creative achievements of their life (a brief screening measure included in the Inventory of Creative Activities and Achievements, ICAA; https://osf.io/zjrn6/). The responses were rated by four raters on a 6-point scale ranging from 0 (not creative) to 5 (ingenious), with each level being briefly described in the rater instructions. The inter-rater reliability was acceptable (ICC = 0.78).
2.1.2.4. Personality. The structure of personality was assessed with the 60-item version of the HEXACO personality inventory (Ashton & Lee, 2009), which measures the six dimensions honesty-humility, emotionality, extraversion, agreeableness, conscientiousness, and openness.
A number of other newly devised tests were also piloted in this study, but were not analyzed for this article.
2.1.3. Procedure Participants were tested in groups in university computer classes. All tests were individually administered with the
assessment software Questionmark Perception (Questionmark; London, UK). The order of tests was varied between test sessions. The total test session took up to 2 h.
2.2. Results
2.2.1. Test analysis The goal of this first study was to construct a shortened, reliable test based on the initial item pool of 180 items. In a
first step, we removed items that were ambiguous in terms of low consensual agreement regarding the target category (i.e., the correct response category: common, inappropriate, or creative). Items were considered unambiguous if the target category was selected at a rate at least 50% higher than either of the two distractor categories. This criterion was met by 109 items, whereas 71 items had to be excluded. In a next step, we removed items that showed low discriminatory power (rit < 0.10), which led to the exclusion of another 13 items. The remaining 96 items included only 24 items from the response category "creative". Therefore, we then removed further "inappropriate" and "common" items based on item difficulty and discriminatory power until we reached a balanced distribution of items across response categories, both in total and within task blocks. Finally, one "common" item had to be removed because it was the only item left within its task block.
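Read literally, the unambiguity criterion is a simple predicate on the three response-category rates; in this sketch, interpreting "50% higher" as a factor of 1.5 is our assumption:

```python
def unambiguous(p_target, p_distractor1, p_distractor2):
    """Item-selection rule from Study 1 (our reading): the target category
    must be chosen at a rate at least 50% higher than either distractor."""
    return p_target >= 1.5 * max(p_distractor1, p_distractor2)

print(unambiguous(0.60, 0.25, 0.15))  # True:  0.60 >= 1.5 * 0.25
print(unambiguous(0.45, 0.35, 0.20))  # False: 0.45 <  1.5 * 0.35
```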
The shortened test version hence consisted of 71 items, including 23 common, 24 inappropriate and 24 creative responses. Due to the selection of items with high consensual agreement, the remaining items showed high solution rates (M = 0.85, SD = 0.08, range = 0.68–0.99) and thus generally low item difficulty. The average discriminatory power of items was 0.21 (SD = 0.08; range = 0.06–0.44) and the internal consistency of the test was acceptable (α = 0.78).
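For reference, the internal-consistency coefficient reported here (Cronbach's α) can be computed from a person-by-item score matrix as follows; a generic sketch, not the authors' code:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_persons, n_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Tiny made-up example: four persons, four dichotomous items.
scores = np.array([[1, 1, 0, 1],
                   [0, 1, 0, 0],
                   [1, 1, 1, 1],
                   [0, 0, 0, 1]])
print(cronbach_alpha(scores))  # ~0.69
```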
2.2.2. Validity analysis
Table 1 shows the descriptive statistics and inter-correlations of all measures, including the informedness and bias of creativity judgements, divergent thinking ability, creative achievement, and personality. We examined whether a normal distribution can be assumed for the informedness and bias measures with tests of skewness and kurtosis (alpha level = 0.01; Tabachnick & Fidell, 2007). The informedness measure was substantially negatively skewed and leptokurtic (skewness = −1.66, z = 9.76, p < 0.01; kurtosis = 4.92, z = 14.91, p < 0.01); the bias measure was not skewed but leptokurtic (skewness = −0.02, z = 0.41, ns.; kurtosis = 1.42, z = 4.30, p < 0.01). We hence employed non-parametric tests throughout this study.
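Such skewness and kurtosis z-tests divide each statistic by its large-sample standard error. A minimal sketch, assuming the conventional SE formulas sqrt(6/N) and sqrt(24/N), which is our inference rather than a detail stated in the paper:

```python
import numpy as np
from scipy.stats import skew, kurtosis

def shape_z_tests(x):
    """Approximate z-tests for skewness and excess kurtosis
    (cf. Tabachnick & Fidell, 2007). A sketch, not the authors' code."""
    x = np.asarray(x)
    n = len(x)
    z_skew = skew(x) / np.sqrt(6 / n)
    z_kurt = kurtosis(x) / np.sqrt(24 / n)  # Fisher definition: normal => 0
    return z_skew, z_kurt

print(shape_z_tests(np.random.default_rng(0).normal(size=214)))
```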
Higher creativity evaluation skills (i.e., informedness of creativity judgements in the CET) were associated with higher DT creativity and higher creative achievement, but they were not correlated with any of the personality measures (see Table 1). Interestingly, individual differences in HEXACO openness were not significantly associated with CET informedness or creative achievement in this study, but only with DT creativity and some other HEXACO traits. The average judgement bias was negative (Wilcoxon W = 3791, z = −8.45, p < 0.001), which indicates that people tend to underestimate the creativity of ideas. A smaller (i.e., less negative) judgement bias was correlated with higher creativity evaluation skills (CET informedness) and higher DT fluency, but it was not significantly correlated with other creativity measures or the HEXACO personality traits. It should be noted that significant correlations were generally rather small.
3. Study 2: admission test

3.1. Methods

3.1.1. Participants
A total of 1119 people participated in the admission test for secondary teacher education of 2014. One participant did not complete the creativity evaluation test. We hence report all findings for a sample of 1118 participants, which consisted of 675 females (60.4%) and 443 males (39.6%) aged between 17 and 49 years (M = 20.83, SD = 4.51).

3.1.2. Tests and measures

3.1.2.1. Creativity evaluation test (CET). The item analysis of the original 180 CET items in Study 1 led to a shortened test version, which encompasses 71 items with unambiguous solutions. In Study 2 we used the shortened test version and included one new "common" item to ensure an equal distribution of 24 items per solution category (i.e., common, inappropriate, and creative). The final test hence consisted of 72 items that were grouped into 10 task blocks with five to ten items each (the task blocks knife and faster locomotion were entirely removed from the original test, since no items survived the criteria of item analysis in Study 1). The task was administered and scored as described in Study 1.
3.1.2.2. Intelligence. Intelligence was assessed with four tests of the Intelligence Structure Battery (INSBAT; Arendasy et al., 2009). Tests of figural-inductive thinking, arithmetic flexibility, visual short-term memory, and verbal fluency were administered as computerized adaptive tests (CAT; van der Linden & Glas, 2000) with the target reliability set to an equivalent of Cronbach's α = 0.80. The adaptive testing algorithm terminated each test as soon as the target reliability was reached. The four tests were used to compute a total IQ score reflecting general cognitive ability. The total test duration varied due to the adaptive testing method, but was on average roughly one hour.
3.1.2.3. Language competence. Individual differences in language competence were assessed with tests of orthography (38 items), grammar (23 items), and reading comprehension (11 items). The total number of correct responses was used as an index of language competence. Reliability was α = 0.71, and the test duration was on average 30 min.
3.1.2.4. Personality. Personality structure was measured with the Big Five Inventory (BFI; Lang, Lüdtke, & Asendorpf, 2001). The self-report test included 42 items that had to be answered on a 5-point scale. Additional tests were included in the admission test but are not relevant to the topic of creativity evaluation and, therefore, were not further considered in this article.
3.1.3. Procedure
Participants enrolled for the admission test were tested in groups in university computer classes. All tests, except for the intelligence tests, were individually administered with the assessment software Questionmark Perception (Questionmark; London, UK). The assessment started with the intelligence tests, followed by tests of language competence, emotional competence, creativity evaluation, and personality. The total assessment took on average 3 h.
Table 2. Relative frequency of responses across items with different target responses.

Target response   Actual response (%)
                  Common   Inappropriate   Creative
Common            86.3     5.5             8.1
Inappropriate     2.4      90.4            7.2
Creative          7.8      16.9            75.3
Table 3. Descriptive statistics and Pearson correlations for Study 2.

   M SD | 1 2 3 4 5 6 7 8 9 10
1 CET Informedness 0.62 0.16
2 CET Bias −0.18 0.14 | 0.64
3 CET Sensitivity 0.74 0.13 | 0.89 0.91
4 CET Specificity 0.88 0.07 | 0.59 −0.21 0.16
5 Intelligence (IQ) 110.32 11.11 | 0.26 0.10 0.19 0.23
6 Language 38.77 5.14 | 0.33 0.16 0.26 0.26 0.29
7 Neuroticism 2.05 0.52 | 0.00 −0.01 0.01 0.00 −0.04 0.00
8 Extraversion 4.18 0.48 | 0.02 0.06 0.03 −0.02 −0.06 −0.03 −0.51
9 Openness 4.09 0.47 | 0.14 0.16 0.16 0.01 0.01 0.13 −0.26 0.38
10 Agreeableness 4.14 0.45 | 0.01 0.05 0.04 −0.04 −0.05 0.03 −0.41 0.29 0.24
11 Conscientiousness 4.20 0.47 | −0.06 −0.03 −0.05 −0.04 −0.04 0.09 −0.38 0.36 0.30 0.48
Notes: Given N = 1118, p < 0.05 for r ≥ 0.06, and p < 0.01 for r ≥ 0.08. Language = language competence.
3.2. Results
3.2.1. Test analysis
In a first step, the dimensionality of the CET was assessed with a confirmatory factor analysis approach. Since the informedness of creativity evaluation reflects sensitivity (i.e., the percentage of correctly identified creative ideas) and specificity (i.e., the percentage of correctly identified common and inappropriate ideas), the 24 creative items were assumed to form one factor, the 48 common and inappropriate items were assumed to form another factor, and both factors were assumed to correlate. For the resulting model, items were parceled into eight indicators per factor: every third creative item was aggregated into item parcels for the sensitivity factor; every sixth common or inappropriate item was aggregated into item parcels for the specificity factor. To account for the essentially dichotomous nature of the items, a WLSMV estimator was used. The resulting model fitted the data well (χ2[103] = 244.166, p < 0.001, RMSEA = 0.035, CFI = 0.948, SRMR = 0.045), and showed sensitivity and specificity to be correlated (r = 0.31, p < 0.001). The construct reliability (Hancock & Mueller, 2001) was H = 0.69 and 0.73 for the sensitivity and specificity factors, respectively.
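One plausible reading of the parceling scheme, sketched with placeholder 0/1 item scores; the systematic every-kth assignment of items to parcels is our assumption:

```python
import numpy as np

# 24 creative items -> 8 parcels of 3 items (sensitivity factor);
# 48 common/inappropriate items -> 8 parcels of 6 items (specificity factor).
rng = np.random.default_rng(0)
creative = rng.integers(0, 2, size=(1118, 24))     # placeholder 0/1 item scores
noncreative = rng.integers(0, 2, size=(1118, 48))

# Assign every 8th item to the same parcel, then average within parcels,
# yielding 8 indicators per factor for the CFA.
sens_parcels = np.stack([creative[:, i::8].mean(axis=1) for i in range(8)], axis=1)
spec_parcels = np.stack([noncreative[:, i::8].mean(axis=1) for i in range(8)], axis=1)
print(sens_parcels.shape, spec_parcels.shape)  # (1118, 8) (1118, 8)
```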
The solution rate of the CET items ranged from p = 0.48 to 0.99, with an average solution rate of 84%, representing relatively low average item difficulty. The target response (i.e., the correct response category) was selected most frequently for all 72 items. Moreover, the selection rate of the target response was at least 50% higher than for either of the two incorrect response categories in 88% of items. The average discriminatory power of items was M = 0.15 (SD = 0.07); 15 items (21%), however, showed low discriminatory power (rit < 0.10). Table 2 shows the average distribution of responses across response categories for common, inappropriate and creative items. Creative ideas tended to be mistaken more often for inappropriate ideas than for common ideas, and inappropriate ideas were more often misjudged as creative than as common. For common ideas, false responses appeared balanced across the inappropriate and creative categories.
The average informedness was high and the average bias was again slightly negative (t[1117] = −42.92, p < 0.001; see Table 3). A negative response bias indicates a tendency to underrate the creativity of ideas by judging creative ideas as common or inappropriate. The distribution of informedness scores was negatively skewed and leptokurtic (skewness = −0.81, z = 11.12, p < 0.01; kurtosis = 1.58, z = 10.82, p < 0.01), whereas the bias score was normally distributed (skewness = −0.18, z = −2.01, ns.; kurtosis = 0.09, z = 0.62, ns.). Given the large sample size of Study 2, we employed parametric tests throughout this study.
Age was negatively correlated with creativity evaluation skills in terms of CET informedness (r = −0.08, p < 0.01), but not with bias (r = −0.04, ns.). Analyses of sex differences revealed that women showed slightly higher creativity evaluation skills (M = 0.64, SD = 0.16) than men (M = 0.60, SD = 0.16; t[1116] = −3.57, p < 0.01, d = 0.21), as well as a less negative creativity evaluation bias (M = −0.17, SD = 0.14) than men (M = −0.20, SD = 0.16; t[1116] = −2.81, p < 0.01, d = 0.17), reflecting a lower underestimation of creativity.
3.2.2. Creativity evaluation, cognitive ability and personality
Table 3 shows the correlations of creativity evaluation skills (i.e., CET informedness) and evaluation bias with intelligence, language competence and the Big 5 personality traits. Higher creativity evaluation skills were associated with higher intelligence, higher language competence and, to a lesser degree, also with higher openness and lower conscientiousness.
Fig. 1. Path analysis testing direct and indirect effects of intelligence (IQ), language competence (Lang), and Openness (Open), via the mediator evaluation bias (Bias), on creativity evaluation skills (Inform = CET informedness). Positive associations with bias reflect a lower underestimation of creativity for higher trait scores.
Similarly, a lower underestimation of creativity (i.e., a smaller, less negative evaluation bias in the CET) was correlated with higher intelligence, language competence and openness. In order to determine the independent contributions of cognitive ability and personality to creativity evaluation skills (i.e., CET informedness), we computed a hierarchical regression analysis. In a first block, we entered age and sex to account for general variability in these basic demographic variables. In the second block, we entered intelligence, language competence and the Big 5 personality traits. This regression analysis revealed that all variables that showed significant zero-order correlations with CET informedness actually explained unique variance in the informedness of creativity evaluations (F[9, 1105] = 27.63, p < 0.01; R2 = 0.18). Independent predictors of CET informedness hence include intelligence (β = 0.20, p < 0.01), language competence (β = 0.26, p < 0.01), openness (β = 0.13, p < 0.01), and conscientiousness (β = −0.17, p < 0.01), which together explained ΔR2 = 17% of variance in CET informedness above and beyond the control variables. Neuroticism, extraversion and agreeableness did not significantly predict CET informedness (all ps > 0.10).

Considering that evaluation bias was also correlated with intelligence, language competence, and openness, as well as with CET informedness, it seems possible that the effects of cognitive ability and personality on CET informedness are to some extent mediated by evaluation bias. To test this mediation hypothesis, we computed a path analysis with Mplus 7 using maximum likelihood (ML) estimation. In this mediation model, informedness is directly predicted by intelligence, language competence, and openness, as well as indirectly predicted by these variables via the potential mediator evaluation bias (see Fig. 1). The statistical significance of indirect effects was determined by means of bias-corrected bootstrap (1000 iterations; MacKinnon, Lockwood, & Williams, 2004). This path analysis revealed that evaluation bias significantly mediates the effects of all three predictors on creativity evaluation informedness. Intelligence and language competence showed significant indirect effects (β = 0.04, p < 0.05; β = 0.07, p < 0.01, respectively) and significant direct effects, suggesting a partial mediation by evaluation bias (see Fig. 1). Openness showed a significant indirect effect (β = 0.09, p < 0.01), but no significant direct effect on CET informedness, and hence was fully mediated by evaluation bias. Direct and indirect effects on informedness are generally small but statistically significant due to the large sample size.
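The mediation itself was estimated in Mplus; purely to illustrate the bootstrap logic, here is a sketch of a bootstrapped indirect effect for a single predictor on simulated stand-in data (the paper used a bias-corrected bootstrap and three simultaneous predictors; this simplified version uses percentile intervals):

```python
import numpy as np

rng = np.random.default_rng(1)

def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a*b for x -> m -> y,
    with each path estimated by least squares (simple sketch)."""
    a = np.polyfit(x, m, 1)[0]                      # x -> mediator
    b = np.linalg.lstsq(np.column_stack([m, x, np.ones_like(x)]),
                        y, rcond=None)[0][0]        # m -> y, controlling x
    return a * b

# Simulated stand-in data (e.g., openness -> bias -> informedness).
n = 1118
x = rng.normal(size=n)
m = 0.2 * x + rng.normal(size=n)
y = 0.5 * m + rng.normal(size=n)

# Percentile bootstrap of the indirect effect (1000 resamples).
boot = []
for _ in range(1000):
    idx = rng.integers(0, n, n)                    # resample cases with replacement
    boot.append(indirect_effect(x[idx], m[idx], y[idx]))
print(indirect_effect(x, m, y), np.percentile(boot, [2.5, 97.5]))
```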
4. Discussion

We examined the psychometric properties of a newly devised creativity evaluation test (CET) for measuring creativity evaluation skills in two consecutive studies. Study 1 describes the test analysis leading to the final test version and examines its validity with respect to common criteria of creativity. Study 2 analyzed the relationship of creativity evaluation skills with individual differences in cognitive ability and personality, as well as the role of evaluation bias.
In Study 1 we observed positive correlations between creativity evaluation skills (i.e., informedness in the CET) and divergent thinking creativity, as well as a tendency towards a positive correlation with divergent thinking fluency, two common indicators of creative cognitive potential. People who produce more creative ideas thus tend to be more discerning in the evaluation of ideas. This finding extends previous research, which had only provided evidence that creative people are better at identifying their own best ideas (Karwowski, 2015; Runco & Smith, 1992; Silvia, 2008), but not necessarily better at evaluating the ideas of others (Grohman et al., 2006; Runco & Smith, 1992). Moreover, it was unclear whether the association between creativity and intrapersonal evaluation skills is biased by the fact that the assessment of creative potential and evaluation skills relies on the same response data. Our findings substantiate the association between creativity evaluation skills and creative potential on the basis of independent assessments of these constructs. The finding corroborates the notion that evaluative processes play an important role in creative thought, as claimed by creativity theory (Beaty, Silvia, Nusbaum, Jauk, & Benedek, 2014; Campbell, 1960; Cropley, 2006; Finke et al., 1992) and as also suggested by recent neuroscience evidence (Beaty, Benedek, Kaufman, & Silvia, 2015; Beaty, Benedek, Silvia, & Schacter, 2016). Similar arguments apply to the finding that creativity evaluation skills were also correlated with creative achievement. Higher evaluation skills may help to identify high-potential ideas that have good chances of becoming acknowledged and hence are worth the investment required for their realization (Sternberg & Lubart, 1992). Taken together, these findings provide initial support for the validity of the CET. Future research will also examine the predictive validity of the CET. For example, it would be highly interesting to see whether individual differences in the creativity evaluation skills of prospective teachers actually predict later job performance in terms of teaching and the fostering of students' creativity.