Atkins v. Virginia: Implications and recommendations for forensic practice
BY GILBERT S. MACVAUGH III, PSY.D. AND MARK D. CUNNINGHAM, PH.D., ABPP
In 2002, the United States Supreme Court held in the landmark case of Atkins v. Virginia that the execution of individuals who have mental retardation is unconstitutional. Following the Atkins holding, courts in death penalty jurisdictions have relied heavily upon mental health professionals in making a determination of whether or not capital offenders have mental retardation. The determination of mental retardation in death penalty cases, however, presents complex challenges for both courts and mental health professionals. In addition, there is variability in how death penalty states define mental retardation and in the assessment methods used by mental health professionals to diagnose mental retardation in such cases. The purpose of this article is to (a) describe how statutes in death penalty jurisdictions have operationalized the various clinical definitions of mental retardation, (b) discuss issues confronting examiners in assessing and diagnosing mental retardation in Atkins cases, and (c) provide recommendations for forensic practice.
KEY WORDS: Mental retardation, Atkins, death penalty, capital punishment, intelligence, adaptive functioning, malingering.
© 2010 by Federal Legal Publications, Inc.
132 ATKINS V. VIRGINIA
In 2002, the United States Supreme Court held in Atkins v. Virginia that the execution of individuals who have mental retardation is unconstitutional because it violates the Eighth Amendment’s prohibition against cruel and unusual punishments. Bonnie (2004) has observed that one of the “striking aspects” of the Court’s decision in Atkins is that this prohibition is framed in the language of a clinical diagnosis. No other class of individuals is constitutionally exempt from the death penalty solely on the basis of a psychological diagnosis (DeMatteo, Marczyk, & Pich, 2007). Equally striking, the Atkins decision elevated psychodiagnostic assessment to an unprecedented position in criminal law. For the first time, a score on a psychological test(s) and an associated diagnostic finding became dispositive. Mental health professionals, by necessity, have become primary sources of information and expertise regarding these assessment and diagnostic determinations.
The scholarly literature has lagged in grappling with the complex issues surrounding professional practice in performing these assessments. Similarly, the fields of psychology and psychiatry are only just beginning to develop formal standards or guidelines for professional practice in Atkins cases. This is surprising, as there is no other type of psychodiagnostic evaluation in which the stakes are higher and the consequences of misdiagnosis are greater. The necessity of developing standards for evaluations in Atkins cases is also demonstrated by the limited specialized training of professionals undertaking these evaluations. As Olley (2006b) points out, few psychologists have extensive specialized training in the areas of forensic evaluation and mental retardation. In an unpublished survey by Macvaugh and Grisso (2006) of 20 forensic clinicians’ practices in post-conviction Atkins cases, 40% reported formal training in mental retardation, and 45% reported at least some formal training in forensic evaluation. Only one of the forensic clinicians surveyed (5%) reported significant formal training and experience in both the fields of mental retardation and forensic
evaluation. This is particularly problematic in light of the observation of Keyes, Edwards, and Derning (1998): “Training in traditional mental health graduate programs includes little, if any, information about mental retardation” (p. 535).
Professional standards for Atkins evaluations would promote greater uniformity of these evaluations, a characteristic that is not currently present. Results of informal surveys of psychologists’ professional practices in Atkins cases suggest that there is much variability in the assessment methods used to assess and diagnose mental retardation (Everington & Olley, 2004; Macvaugh & Grisso, 2006). Further, the articulation of such standards would illuminate what is generally accepted in the field, one of the factors governing the admissibility of scientific evidence (Daubert v. Merrell Dow Pharmaceuticals, Inc., 1993).
In 2005, Division 33 of the American Psychological Association (Mental Retardation and Developmental Disabilities) formed an Ad Hoc Committee (Olley, Greenspan, & Switzky, 2006) to identify issues related to mental retardation and the death penalty and to clarify psychologists’ role in Atkins proceedings. In August of 2008, the Ad Hoc Committee held a meeting at the American Psychological Association’s annual convention in Boston, Massachusetts to address the issue of standards of practice in Atkins cases. At this meeting, a panel of experts in the fields of mental retardation, forensic psychology, law, psychometrics, and others, convened to begin working on determining areas of consensus in the field regarding the assessment of mental retardation in Atkins proceedings. The panel interpreted the results of several recent unpublished surveys regarding professional practice in Atkins cases and began developing position statements regarding best practice. The results of the surveys reviewed by the panel are expected to be published in the near future. The work of the Ad Hoc Committee and position statements regarding the issues described above also are pending.
This article seeks to inform the discussion on professional standards of practice for evaluations of mental retardation in capital cases by considering how this landmark decision has been variously operationalized by statutes across death penalty jurisdictions, the commonalities and differences in “clinical” definitions of mental retardation, and issues encountered by mental health professionals who conduct evaluations of mental retardation in capital cases. The associated “practice recommendations” are those of the authors alone.
Operationalizing Atkins
The Atkins Court made reference to definitions of mental retardation both by the American Association on Mental Retardation (AAMR, 1992) and the diagnostic criteria for mental retardation in the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR) (American Psychiatric Association, 2000). These will be detailed subsequently. The Court, however, left to the individual states the task of how to define mental retardation, as well as the procedures for making these determinations. This lack of specificity would seem to be a prudent way of allowing for the inevitable evolution of the diagnostic criteria of mental retardation as this intellectual and behavioral deficiency is understood by the mental health professions, as well as providing individual states some discretion in selecting from the various professionally-accepted diagnostic criteria. An unsurprising expression of this ambiguity is the variability across death penalty jurisdictions regarding which definition of mental retardation is used (DeMatteo et al., 2007) and the procedures for assessments and determination of mental retardation in such cases (Duvall & Morris, 2006).
A wrinkle of some moment, however, is introduced by the rather cryptic language of the majority opinion:
In this case, for instance, the Commonwealth of Virginia disputes that Atkins suffers from mental retardation. Not all people who claim to be mentally retarded will be so impaired as to fall within the range of mentally retarded offenders about whom there is a national consensus. (Atkins v. Virginia, 2002, p. 317)
This language can be interpreted as standing for the proposition that some offenders will attempt to assert mental retardation who do not meet the nationally-accepted diagnostic criteria to be classified as “mentally retarded.” Alternatively, this language could reflect an expectation that not all persons with mental retardation will be “retarded enough” to qualify for an exemption from the death penalty. In this latter interpretation, the diagnosis of mental retardation is a necessary, but not sufficient condition. Instead of a national consensus regarding diagnostic classification (i.e., substantially a professional/clinical determination), this latter interpretation invokes a “community values” determination not unlike competency and sanity considerations. A “community values” approach to restricting death penalty exemption to a subcategory of capital offenders with mental retardation has been asserted by the Texas Court of Criminal Appeals in Ex parte Briseno (2004).
It is thus understandable that those in the mental health profession should define mental retardation broadly to provide an adequate safety net for those who are at the margin and might well become mentally-unimpaired citizens if given additional social services support. We, however, must define that level and degree of mental retardation at which a consensus of Texas citizens would agree that a person should be exempted from the death penalty. Most Texas citizens might agree that Steinbeck’s Lennie [Footnote: See John Steinbeck, Of Mice and Men (1937)] should, by virtue of his lack of reasoning ability and adaptive skills, be exempt. But, does a consensus of Texas citizens agree that all persons who might legitimately qualify for assistance under the social services definition of mental retardation be exempt from an otherwise constitutional penalty? Put another way, is there a national or Texas consensus that all of those persons whom the mental health profession might diagnose as meeting the criteria for mental retardation are automatically less morally culpable than those who just barely miss meeting those criteria? Is there, and should there be, a “mental retardation” bright-line exemption from our state’s maximum statutory punishment? As a court dealing with individual cases and litigants, we decline to answer that normative question without significantly greater assistance from the citizenry acting through its Legislature...Some might question whether the same definition of mental retardation that is used for providing psychological assistance, social services, and financial aid is appropriate for use in criminal trials to decide whether execution of a particular person would be constitutionally excessive punishment. (Ex parte Briseno, 2-11-04)
Two aspects of this Texas Court of Criminal Appeals decision are notable. First, a Texas consensus is substituted for a national consensus as specified by the Atkins Court. Second, the seven criteria specified by the Texas Court of Criminal Appeals to identify the subcategory of capital offenders with mental retardation who would be exempted from the death penalty reflect a level of impairment that is consistent with Moderate Mental Retardation (IQ = 40-55) or Severe Mental Retardation (IQ = 25-40), rather than the Mild Mental Retardation category (IQ = 55-70), which constitutes virtually all capital offenders who have mental retardation. The seven criteria of the Briseno opinion operationalize an Atkins interpretation that only exempts a subcategory of persons with mental retardation from execution. That said, the authors are unaware of a case, in Texas or elsewhere, where a capital defendant was identified as having mental retardation by clinical/professional standards, but then found not retarded enough to be exempted from the death penalty.
There are obviously grave problems with mental health professionals idiosyncratically parsing a subcategory of offenders who are sufficiently mentally retarded to meet a community consensus of death penalty ineligibility. Accordingly, it is our position that mental health professionals in Atkins proceedings are tasked with making what is essentially a psychodiagnostic assessment, in this case of mental retardation, albeit in a forensic context. This is in sharp contrast to the psycholegal assessments that are undertaken in evaluations of competency to stand trial and criminal responsibility.
Practice recommendation 1
Because restricting death penalty ineligibility to a subcategory of particularly impaired offenders with mental retardation has not yet been tested by the U.S. Supreme Court and because mental health professionals possess no special expertise in identifying community values, it is recommended that an Atkins assessment of a capital defendant specify the clinical/professional definition of mental retardation being employed and how the offender in question comports with that standard, in addition to illuminating more restricted jurisdictionally-specific criteria.
Definitions of mental retardation
Regardless of whether the Court envisioned a diagnostic or diagnostic + community values determination, the definition of mental retardation (operationalized in diagnostic criteria) holds a critical position. Mental retardation has been defined by several professional organizations in the field. The American Association on Intellectual and Developmental Disabilities (AAIDD) (formerly the AAMR) and the American Psychiatric Association (APA) have provided the two most widely accepted definitions. Ellis (2003) has observed that many state legislatures enacted statutes based on the definition provided by the American Association on Mental Deficiency (1983), the former name for the AAMR: “Mental retardation refers to significantly subaverage general intellectual functioning existing concurrently with deficits in adaptive behavior and manifested during the developmental period” (p. 11). Nine years later, the AAMR (1992) revised its definition, with an emphasis on refining the adaptive functioning component of the previous version:
Mental retardation refers to substantial limitations in present functioning. It is characterized by significantly subaverage intellectual functioning, existing concurrently with related limitations in two or more of the following applicable adaptive skill areas: communication, self-care, home living, social skills, community use, self-direction, health and safety, functional academics, leisure, and work. Mental retardation manifests before age 18. (p. 1)
The AAMR (1992) definition was cited in Atkins and adopted by several state legislatures in the 1990s (Ellis, 2003). However, it has been criticized for lacking theoretical grounding and empirical research support (Greenspan, 1997). In addition, Olley et al. (2006) have raised the question as to whether or not a consensus exists in the field regarding the meaning of the 10 domains of adaptive behavior when applied in a forensic context.
The American Psychiatric Association’s current definition in the DSM-IV-TR (APA, 2000) contains language similar to the definition by the AAMR (1992) and was also one of the definitions cited by the Court in Atkins:
A. Significantly subaverage intellectual functioning: an IQ of approximately 70 or below on an individually administered IQ test (for infants, a clinical judgment of significantly subaverage intellectual functioning). B. Concurrent deficits or impairments in adaptive functioning (i.e., the person’s effectiveness in meeting the standards expected for his or her age by his or her cultural group) in at least two of the following areas: communication, self-care, home living, social/interpersonal skills, use of community resources, self-direction, functional academic skills, work, leisure, health, and safety. C. The onset is before age 18 years. (p. 49)
Five days before the Court’s decision in Atkins, the AAMR (2002) again revised its definition, primarily by modifying the description of adaptive functioning: “Mental retardation is a disability characterized by significant limitations both in intellectual functioning and in adaptive behavior as expressed in conceptual, social, and practical adaptive skills. This disability originates before age 18” (p. 1).
The most recent AAMR (2002) definition also does not specify a particular IQ score in its description of significant limitations in intellectual functioning. Instead, this prong of the definition is operationalized as an IQ score of “approximately two standard deviations below the mean, considering the standard error of the measurement for specific assessment instruments used and the instruments’ strengths and limitations” (p. 14). Similarly, the AAMR (2002) definition further defines significant limitations in adaptive behavior as: “performance that is at least two standard deviations below the mean of either (a) one of the following three types of adaptive behavior: conceptual, social, or practical, or (b) an overall score on a standardized measure of conceptual, social, and practical skills” (p. 14).
In the authors’ experience, the issue of which definition should be used by experts in forming an opinion regarding mental retardation is routinely debated in Atkins proceedings. This also has been raised as a controversial issue in the professional literature (Olley et al., 2006). Ellis (2003), who argued Atkins before the United States Supreme Court, has suggested that the AAMR (2002) definition is the most appropriate, because it contains the three essential components of all definitions cited in the Atkins decision. The current AAMR definition also has been described as being more consistent with contemporary thinking and research related to the assessment of adaptive behavior (Everington & Olley, 2008). Because of its tripartite model of conceptualizing adaptive behavior (i.e., conceptual, social, and practical), the 2002 AAMR definition better addresses the issue of impaired social intelligence, which has been described as a key characteristic of those with mental retardation (Greenspan, Switzky, & Granfield, 1996), and particularly those who become involved in the criminal justice system (Greenspan, Loughlin, & Black, 2001). However, Olley et al. (2006) have questioned whether a new definition, at least in terms of measuring deficits in adaptive behavior, is needed for the purpose of forensic cases.
Definitions of mental retardation in death penalty statutes
A recent review by DeMatteo et al. (2007) of state legislation defining mental retardation reflects a general acceptance of professional/clinical definitions of mental retardation, though endorsing different definitions or only a portion of the diagnostic criteria. More specifically, four death penalty states (i.e., Delaware, Idaho, North Carolina, and Oklahoma) use the DSM-IV-TR definition. Six death penalty states (i.e., Connecticut, Florida, Oregon, Texas, Virginia, and
Washington) have adopted either the 1992 or the 2002 AAMR definition. Only one state, Maryland, has adopted the definition provided by the American Psychological Association, which consists of significant limitations in general intellectual functioning, significant concurrent limitations in adaptive functioning, and onset prior to age 22 (Jacobson & Mulick, 1996). The remaining states that currently permit the death penalty have statutes that define mental retardation in ways that diverge somewhat from the DSM-IV-TR, AAMR, and American Psychological Association definitions (DeMatteo et al., 2007).
The differences between definitions across statutes exist primarily in terms of whether or not all three prongs of the definition are required (i.e., significantly subaverage intellectual functioning, limitations in adaptive functioning, and age of onset) and whether any or all of the three prongs are specifically operationalized in the definition (e.g., IQ score of 70 or below, deficits in two out of ten areas of adaptive behavior). Eight states’ statutes (Alabama, Colorado, Georgia, Nevada, New Hampshire, New Jersey, Ohio, and South Carolina) use the three prongs common to widely accepted definitions in the field, but do not operationalize any of these three criteria by identifying a specific IQ score, the required number of adaptive deficits, or a particular age of onset (DeMatteo et al., 2007). Twelve states (i.e., Arizona, California, Indiana, Kentucky, Louisiana, Missouri, Mississippi, Pennsylvania, South Dakota, Tennessee, Utah, and Wyoming) that currently permit the use of the death penalty have statutes containing all three prongs common to most definitions; but these statutes operationalize only one or two of the three clinical criteria common to all definitions (DeMatteo et al., 2007). Four states (i.e., Arkansas, Illinois, Nebraska, and New Mexico) allow IQ scores that are below a specified cutoff to constitute presumptive evidence of mental retardation, regardless of whether an individual has demonstrated deficits in adaptive functioning and onset during the developmental period (DeMatteo et al., 2007).
This appears to focus the determination on the more “objective” data provided by intelligence testing, even if broadening the classification of eligible offenders.
Many states with statutory definitions of mental retardation have not revised their statutes post-Atkins. As of September of 2008, 12 death penalty states have yet to develop statutes for determining mental retardation in Atkins cases. These include: Alabama, Mississippi, Montana, New Hampshire, New Jersey, Ohio, Oklahoma, Oregon, Pennsylvania, South Carolina, Texas, and Wyoming (DPIC, 2008). Although most of these states have statutes that define mental retardation (DeMatteo et al., 2007), it is unclear how these statutes apply in Atkins proceedings. Some death penalty states, such as Mississippi and Texas, which do not yet have statutes to define mental retardation specifically for the purpose of Atkins proceedings, have adopted the Atkins decision in case law (i.e., Chase v. State, 2004; Ex parte Briseno, 2004). In the face of definitional differences between individual states, it is likely that the Atkins decision is applied inconsistently across death penalty jurisdictions. As DeMatteo et al. (2007) have observed:
Given the differing definitions of mental retardation among the states . . . an offender diagnosed as mentally retarded in one state may not qualify for that diagnosis in a neighboring state due to definitional differences. As such, after Atkins, where a capital crime is committed has a large effect on whether an offender can be sentenced to death. (p. 791)
Beyond these types of definitional issues, the Atkins Court also did not specify how, when, or by whom the issue of mental retardation is to be decided in capital cases. In most states, the judge makes the determination of mental retardation in Atkins cases (Ellis, 2003). But, procedures vary across jurisdictions with regard to when the issue must be raised, who has the burden of persuasion, and the burden of proof that is required (DPIC, 2008). Such procedural differences increase the likelihood that the Atkins decision will be applied inconsistently across death penalty states (DeMatteo et al., 2007; Duvall & Morris, 2006; Orpen, 2003).
Diagnosis and misdiagnosis of mental retardation in Atkins cases
Because the population of individuals with mental retardation consists mostly (i.e., approximately 85%) of those who function in the mild range of impairment (APA, 2000), and because their impairments are often not immediately observable, accurate diagnosis for this subpopulation can be particularly difficult (Everington & Olley, 2008). This issue is of no less concern among capital offenders who have mental retardation, as virtually all are within the mild category of mental retardation.
Some commentators (Baroff, 1991; Keyes et al., 1998) have suggested that misdiagnosis may, in part, be due to a lack of understanding of the definition of mental retardation and failure to properly assess each diagnostic criterion. Misdiagnosis also may stem from inaccurate and stereotyped notions regarding the characteristics of those with mental retardation (Everington & Olley, 2008; Keyes et al., 1998; Olvera, Dever, & Earnest, 2000). For example, those with mild mental retardation who become involved in the criminal justice system typically do not exhibit stereotypical physical or behavioral characteristics commonly associated with severe mental retardation. As a result, they are often misperceived as having a “normal” appearance (Keyes et al., 1998). Basing a diagnostic finding on first impression is additionally problematic, as persons with mental retardation often attempt to compensate for their limitations through behaviors that mask their disability (Keyes et al., 1998). Though there are variations in the course and behavioral expression of mild mental retardation, a particularly cogent description of mild mental retardation was provided by the Editorial Board of the APA Division 33 in the Manual of Diagnosis and Professional Practice in Mental Retardation (1996):
People classified with mild MR evidence small delays in the preschool years but often are not identified until after school entry, when assessment is undertaken following academic failure or emergence of behavior problems. Modest expressive language delays are evident during early primary school years, with the use of 2- to 3-word sentences common. During the later primary school years, these children develop considerable expressive speaking skills, engage with peers in spontaneous interactive play, and can be guided into play with larger groups. During middle school, they develop complex sentence structure, and their speech is clearly intelligible. The ability to use simple number concepts is also present, but practical understanding of the use of money may be limited. By adolescence, normal language fluency may be evident. Reading and number skills will range from 1st to 6th-grade level, and social interests, community activities, and self-direction will be typical of peers, albeit as affected by pragmatic academic skill attainments. Baroff (1986) ascribed a mental age range of 8 to 11 years to adults in this group. This designation implies variation in academic skills, and for a large proportion of these adults, persistent low academic skill attainment limits their vocational opportunities. However, these people are generally able to fulfill all expected adult roles. Consequently, their involvement in adult services and participation in therapeutic activities following completion of education preparation is relatively uncommon, is often time-limited or periodic, and may be associated with issues of adjustment or disability conditions not closely related to MR. (pp. 17-18)
Practice recommendation 2
Regardless of the definition (e.g., DSM-IV-TR, AAMR) used to diagnose mental retardation in death penalty cases, evaluators should address all three of the clinical components of the widely accepted definitions in the field (Everington & Olley, 2008). In addition, because of the high stakes nature of these cases, it is essential that forensic assessment methods are consistent with standards of professional practice and psychological testing (Ellis, 2003; Ellis & Luckasson, 1985; Olvera et al., 2000; see also American Psychological Association, 2002; Committee on Ethical Guidelines for Forensic Psychologists, 1991).
Despite the minor differences between the various clinical definitions of mental retardation, all have three common components: (1) significant deficits in intellectual functioning; (2) related or concurrent deficits in adaptive functioning; and (3) onset during the developmental period. In the following sections, we discuss the assessment of these three prongs, with additional emphasis on the topic of assessment of malingered mental retardation and other controversial issues related to the evaluation of mental retardation in death penalty cases.
Assessment of intellectual functioning
Instruments for measuring intelligence
According to the current and most widely accepted definitions of mental retardation, intellectual functioning must be assessed using standardized, individually administered measures of intelligence (AAMR, 2002; APA, 2000). In addition, only global measures of intelligence are acceptable for making a diagnosis of mental retardation (AAMR, 1992; Sattler, 2002).
There are three intelligence tests that are generally accepted measures of mental retardation for adults (Everington & Olley, 2008). The most current editions of these instruments include the Wechsler Adult Intelligence Scale—Fourth Edition (WAIS-IV) (Wechsler, 2008); the Stanford-Binet Intelligence Scale—Fifth Edition (SB-5) (Roid, 2003); and the Kaufman Adolescent and Adult Intelligence Test (KAIT) (Kaufman & Kaufman, 1993). The WAIS-III (and now WAIS-IV) and the SB-5 are considered by many practitioners as the “gold standard” in assessments of mental retardation in death penalty cases (Macvaugh & Grisso, 2006). Studies have shown that historically, the Wechsler scales have been the most emphasized in graduate level psychological assessment courses (Oakland & Zimmerman, 1986) and also have tended to be the most frequently used by clinical psychologists in practice (Kaufman, 1990).
The WAIS-IV contains ten Core Subtests and five Supplemental Subtests, producing scores on four scales: verbal comprehension, perceptual reasoning, working memory, and processing speed. These replace the verbal and performance (non-verbal) scales that had characterized earlier editions of the WAIS. The WAIS-IV yields a General Ability Index and Full Scale IQ score. IQ scores on the WAIS-IV have a mean of 100 and a standard deviation of 15. Therefore, an overall IQ score of 70 on the WAIS-IV is two standard deviations below the mean and represents the bottom 2.2% of the standardization sample.
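The relationship between an IQ score, its distance from the mean in standard deviation units, and the proportion of the population at or below that score can be illustrated with a short calculation. The following is a minimal sketch that assumes scores follow an idealized normal distribution with a mean of 100 and a standard deviation of 15; the function names are hypothetical and are not drawn from any test manual. Under this assumption a score of 70 sits two standard deviations below the mean, with roughly 2.3% of the distribution at or below it, which is close to the 2.2% figure cited for the standardization sample.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def iq_to_z(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Express an IQ score as a number of standard deviations from the mean."""
    return (score - mean) / sd


def percent_at_or_below(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Percentage of an idealized normal population scoring at or below `score`."""
    return 100.0 * normal_cdf(iq_to_z(score, mean, sd))


# Illustrative values only: a Full Scale IQ of 70 on a scale with mean 100, SD 15.
z = iq_to_z(70)
pct = percent_at_or_below(70)
print(f"z = {z:.1f}; about {pct:.1f}% of scores fall at or below 70")
```

An actual standardization sample will not match the theoretical curve exactly, so percentages taken from a test manual should take precedence over this approximation.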
Not uncommonly, mental health professionals will encounter group-administered intelligence test scores in the records of capital offenders. For example, the Revised Beta Examination (Kellogg & Morton, 1978) has been widely used as a screening test for inmates who are entering into correctional facilities (Baroff, 1991; 2003). The Revised Beta Examination is a nonverbal, group-administered intelligence test that was originally developed during World War I for assessments of draftees who were unable to read English. It should not be given the same weight as the Wechsler or Stanford-Binet scales and should not be used to diagnose mental retardation (Baroff, 1991; Everington & Olley, 2008). Because independent effort cannot be assured, mental health professionals are also cautioned about relying on scores from group-administered tests, particularly when administered in a correctional setting, to rule out mental retardation.
In addition, scores from short forms and/or abbreviated tests of intelligence such as the Wechsler Abbreviated Scale of Intelligence (WASI) (Wechsler, 1999), Slosson Intelligence Test—Revised (Slosson, 1991), and the Kaufman Brief Intelligence Test (K-BIT) (Kaufman & Kaufman, 1990) also are occasionally encountered in the records of capital defendants or utilized in Atkins evaluations (Everington & Olley, 2008). These, however, should be considered supplemental and should not be given the same weight as the more comprehensive, global measures of intelligence (e.g., WAIS-IV, SB-5), which are required for diagnosing mental retardation (Keyes et al., 1998).
Practice recommendation 3
In evaluating the intellectual prong in Atkins evaluations, mental health professionals should place primary reliance on scores from global, individually administered, comprehensive, multisubtest, standardized measures of intelligence.
Factors affecting interpretation of intelligence test scores
Even when global, individually administered, standardized tests of intelligence are used in accordance with standards of professional practice, there are a number of factors that affect the interpretation of IQ scores, all of which can greatly impact the diagnosis of mental retardation in Atkins cases. These include: (a) standard error of measurement, (b) practice effects, (c) the Flynn Effect, (d) active symptoms of mental illness, (e) cultural and linguistic factors, and (f) verbal and performance IQ score discrepancies. In addition to discussing factors that affect IQ score interpretation, we will further consider the related issues of the examiner’s clinical judgment and the imprecision of IQ scores.
Standard error of measurement
A fundamental assumption in the field of psychological assessment is that all tests have error. Error invariably exists in intelligence testing because of factors related to test construction. Test error is defined in psychometric terms as the standard error of measurement (SEM), which provides an estimate of the amount of error in a person’s observed test score. The SEM is simply another way of expressing the reliability of a test; as the reliability of the instrument increases, the SEM decreases, which gives the examiner more confidence in the accuracy of an observed score. The SEM is calculated based on the reliability coefficient and standard deviation of the instrument. The SEM varies across instruments, age ranges, and even between individual IQ scores due to the statistical concept of regression to the mean (Kaufman & Lichtenberger, 1999). The key point here is that a particular obtained IQ score should be interpreted as existing within the range of error for the test instrument (i.e., the “confidence interval”), as an obtained score is only an estimate of a person’s “true” IQ score. For example, if a 32-year-old male capital murder defendant obtained a Full Scale IQ score of 72 on the WAIS-III, then, because of the SEM, there is a 95% chance that his “true” Full Scale IQ score falls somewhere between approximately 67 and 76 (because the 95% confidence interval is 72 ± 1.96 × the SEM of 2.32, or about ± 4.5). Because of the measurement error associated with all intelligence test scores, it is possible to diagnose mental retardation based on an IQ score of 75 or below, as long as there is evidence of related deficits in adaptive behavior (AAMR, 1992; APA, 2000).
Error in intellectual assessment is not solely a function of psychometric statistics. Other sources of error or assessment imprecision may involve the examinee, the examiner, and/or the testing situation on the particular day on which the test is administered. Such factors include the mental and physical health, mood, effort, and motivation of the examinee during testing; subtle examiner mistakes in administration and scoring; and other events that occur unexpectedly in the testing environment that create a less than optimal testing situation (e.g., poor lighting, noise distractions in the testing room).
Practice recommendation 4
Reports of IQ scores obtained by a capital defendant should include a description of these scores in light of the SEM at an identified confidence interval. Efforts should be made to minimize other sources of error by strict adherence to test instructions and rechecking scoring. When additional error is introduced, such as through suboptimal testing conditions or examiner mistakes in test administration or scoring, these should be candidly and proactively acknowledged.
Practice effects
Gain scores, also called “practice effects,” can be caused by repeated administrations of the same intelligence test in a short period of time. This may be problematic in Atkins cases should multiple experts administer the same intelligence test to offenders within a relatively brief timeframe. Practice effects tend to be larger on performance (nonverbal) subtests, most likely because these types of tasks are novel only during their first administration and become more familiar on subsequent administrations, as an examinee may recall the strategy used to solve the problems measured by the test items (Kaufman & Lichtenberger, 1999).
Estimates of practice effects based on test-retest administrations over an interval of several weeks or months amounted to approximately two to three points for Verbal IQ, nine to ten points for Performance IQ, and six to seven points for Full Scale IQ (Kaufman, 1990; 1994), although this tends to vary by age (Kaufman & Lichtenberger, 1999). As noted in the WAIS-III and WMS-III Technical Manual (The Psychological Corporation, 1997), in one study involving 394 subjects in the standardization sample of the WAIS-III who were tested and retested at a mean interval of 34.6 days, mean test scores were two to three points higher on Verbal IQ scores, three to eight points higher on Performance IQ scores, and two to three points higher on Full Scale IQ scores; this was attributable “mainly to practice” (p. 57). These gains reflect only exposure to the test, not valid improvements in intellectual ability. Accordingly, the impact of such gains can have critical implications in Atkins evaluations.
Practice recommendation 5
Avoid administration of the same intellectual assessment within 12 months. Testing protocols should reflect verbatim responses from the examinee, allowing other professionals to reasonably scrutinize the findings and reduce the necessity of redundant assessments. Further, mental health experts should be prepared to analyze test scores in light of practice effects and carefully explain these considerations to legal professionals.
Flynn effect
The Flynn Effect is a well-established finding that IQ scores are inflating (becoming increasingly large overestimates) by approximately 0.31 points per year from the date of test standardization to the date of test administration (AAMR, 2005; Flynn, 1984a, 1984b, 1987, 1998, 2000, 2006; Kanaya, Scullin, & Ceci, 2003). Thus, an individual’s IQ score becomes artificially increased as a function of when the intelligence test was administered relative to the date on which it was standardized. The Flynn Effect is more pronounced for performance (i.e., nonverbal or fluid) intelligence.
Although the Flynn Effect is a well-established statistical phenomenon of intelligence tests and has gained general acceptance in the scientific community (Neisser, 1998), the practice of adjusting individual IQ scores downward in capital cases to correct for the Flynn Effect is an issue of some debate in the post-Atkins era. Lack of widespread adoption of Flynn Effect score corrections in Atkins evaluations may be a function of limited familiarity of examiners with this concept. Instruction regarding the modification of individual IQ scores to account for the Flynn Effect has not traditionally been a component of psychology graduate school training in intelligence testing. Not surprisingly, then, correcting IQ scores for the Flynn Effect in clinical practice has also lagged behind the scientific acceptance of this statistical phenomenon.
The implications of the Flynn Effect are not limited to Atkins evaluations or even the forensic arena. In a large-scale study designed to explore the impact of the Flynn Effect on special education placement recommendations, Kanaya et al. (2003) reviewed archived special education records for 8,944 school-age children from nine sites around the United States who had been tested and retested for special education programs and had IQ scores that fell in the borderline and the mild range of mental retardation. By comparing students’ Full Scale IQ scores on the older Wechsler Intelligence Scale for Children—Revised (WISC-R; Wechsler, 1974) to their scores on the newer Wechsler Intelligence Scale for Children—Third Edition (WISC-III; Wechsler, 1991), Kanaya et al. (2003) found that students in both groups lost an average of 5.6 points when retested with the newer version of the test. Stated differently, these students’ scores on the outdated WISC-R were on average 5.6 points higher compared to their scores when tested on the renormed WISC-III, and these students also were more likely to be classified as mentally retarded compared to their peers who were retested on the same test (Kanaya et al., 2003).
Flynn (2006, 2007) and Greenspan (2006, 2007), as well as Schalock et al. (2007), have advocated that it is appropriate to adjust individual test scores to account for the Flynn Effect in Atkins cases (see also Kanaya et al., 2003). Specifically, Flynn (2006, 2007) proposed that individual IQ scores should be lowered by 0.3 points per year to cover the period of time between the year in which the test was normed and the year in which a person was administered the test. Flynn (2006, 2007) further proposed that an additional 2.34 points should be deducted from IQ scores obtained on the WAIS-III because of a sampling error in its standardization. In an attempt to correct the “tree stump” phenomenon, whereby a subject was able to obtain an IQ score in the 40s without giving a single correct answer, The Psychological Corporation, the publisher of the WAIS-III, apparently did too good a job in stratifying for low ability, in that the sample contained too many low-scoring subjects, which produced norms that overstated IQ by 2.34 points (Greenspan, 2007). According to Flynn (2007), for example, an IQ score of 81 on the WAIS-III obtained in 2007 should be reduced by 3.6 points to account for 12 years of obsolescence, and then further reduced by 2.34 points to account for the sampling error unique to the WAIS-III, yielding a total IQ score reduction of 5.94 points. Using Flynn’s (2006, 2007) proposed score reductions, an IQ score of 81 (after subtracting approximately six points), therefore, becomes a corrected IQ score of 75, which is the upper limit for mild mental retardation when considering the SEM. However, this recommendation is not without disagreement (see Moore, 2006). Further, the publisher of the Wechsler tests does not endorse the recommendation to modify WAIS-III scores to correct for the Flynn Effect (Weiss, 2007).
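The arithmetic behind the two worked examples in this section (the 95% confidence interval around an obtained score of 72, and Flynn's proposed reduction of a WAIS-III score of 81) can be reproduced with a short Python sketch. This is an illustration only, not a scoring procedure: the function names and parameters are hypothetical, the norming year of 1995 is inferred from the "12 years of obsolescence" cited for a 2007 administration, and, as noted above, the adjustment itself remains contested.

```python
# Illustrative sketch only; function and parameter names are hypothetical.

Z_95 = 1.96  # two-sided 95% critical value of the standard normal distribution


def confidence_interval(observed_iq, sem, z=Z_95):
    """Return (lower, upper) bounds of observed_iq +/- z * SEM."""
    margin = z * sem
    return observed_iq - margin, observed_iq + margin


def flynn_adjusted_iq(observed_iq, year_normed, year_tested,
                      points_per_year=0.3, norming_correction=0.0):
    """Apply Flynn's proposed deduction for norm obsolescence.

    points_per_year: the 0.3 points per year proposed by Flynn (2006, 2007).
    norming_correction: extra deduction, e.g. the 2.34 points Flynn proposed
    for the WAIS-III. Both values are taken from the text, not from a manual.
    """
    years_obsolete = max(0, year_tested - year_normed)
    return observed_iq - points_per_year * years_obsolete - norming_correction


# Obtained Full Scale IQ of 72 with an SEM of 2.32 (values from the text).
low, high = confidence_interval(72, 2.32)
print(f"95% interval: {low:.1f} to {high:.1f}")  # 95% interval: 67.5 to 76.5

# WAIS-III score of 81 obtained in 2007; 1995 norming year is an assumption
# inferred from the 12 years of obsolescence stated in the text.
adjusted = flynn_adjusted_iq(81, 1995, 2007, norming_correction=2.34)
print(f"Flynn-adjusted score: {adjusted:.2f}")  # Flynn-adjusted score: 75.06
```

The unrounded interval of roughly 67.5 to 76.5 corresponds to the approximate range of 67 to 76 cited in the discussion of the standard error of measurement, and the adjusted score of 75.06 corresponds to the corrected score of 75 described above. Keeping the per-year rate and the norming correction as explicit parameters makes clear that both are assumptions an examiner would need to disclose and defend.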
Practice recommendation 6
Mental illness and IQ scores
Although the practice of adjusting individual IQ scores in capital cases to account for the Flynn Effect has been argued in a number of Atkins cases at both the trial and appellate court levels, the courts’ willingness to accept the Flynn Effect has varied. For example, in the California case of People v. Vidal (2007), the trial court accepted the Flynn Effect and noted that it must be considered in the determination of the defendant’s IQ. Some courts have ruled that the Flynn Effect should be considered on a case-by-case basis (e.g., Walker v. True, 2005), whereas others have explicitly rejected the Flynn Effect. In Ledford v. Head (2008), for example, the Federal Court for the Northern District of Georgia noted, “The Court is not impressed by the evidence concerning the Flynn effect...The Court is hesitant to apply a theory that is used solely for the purpose of lowering IQ scores in a death penalty context” (p. 7). (Note, however, the discussion of Kanaya et al. (2003) regarding applications to special education and mental retardation classifications of children.) To date, no state statute addresses the Flynn Effect (Duvall & Morris, 2006).