Psychological Assessment - Vol 29, Iss 6

Psychological Assessment publishes mainly empirical articles concerning clinical assessment. Papers that fall within the domain of the journal include research on the development, validation, application, and evaluation of psychological assessment instrum

Last Build Date: Fri, 23 Jun 2017 18:00:35 EST

Copyright: Copyright 2017 American Psychological Association

Taking forensic mental health assessment “out of the lab” and into “the real world”: Introduction to the special issue on the field utility of forensic assessment instruments and procedures.


The last several decades have seen a major upswing in the development and use of psychological assessment instruments in forensic and correctional settings. At the same time, admissibility standards increasingly have stressed the importance of the reliability and validity of evidence in legal proceedings. Recent research has, however, raised serious concerns about (a) the reliability of forensic science evidence in general, (b) the replicability of psychological research findings in general and in field settings especially, and (c) the interrater reliability and predictive validity of forensic psychological assessment evidence in particular. In this introduction to the special issue of Psychological Assessment on the field utility of forensic assessment instruments and procedures, we provide an overview of key issues bearing on field studies, focusing on why such research is critically important to improving the quality of the practice of forensic mental health assessments. We also identify various methodological issues and constraints relevant to conducting research outside of controlled settings. We conclude with recommendations for how future field research can improve upon the current state of the discipline in forensic mental health assessment. (PsycINFO Database Record (c) 2017 APA, all rights reserved)

Field validity of Static-99/R scores in a statewide sample of 34,687 convicted sexual offenders.


The Static-99 (and revision, the Static-99R) reflect the most researched and widely used approach to sex offender risk assessment. Because the measure is so widely applied in jurisdictions beyond those on which it was developed, it becomes crucial to examine its field validity and the degree to which published norms and recidivism rates apply to other jurisdictions. We present a new and greatly expanded field study of the predictive validity (M = 5.23 years follow-up) of the Static-99 as applied system-wide in Texas (N = 34,687). Results revealed stronger predictive validity than a prior Texas field study, especially among offenders scored after the release of an updated scoring manual in 2003 (AUC = .66 to .67, d = .65 to .69), when field reliability was also stronger. But calibration analyses revealed that the Static-99R routine sample norms led to a significant overestimation of risk in Texas, especially for offenders with scores ranging from 1 to 5. We used logistic regression to develop local Texas recidivism norms (with confidence intervals) for Static-99R scores. Overall, findings highlight the importance of revisiting and updating field study findings, and the potential benefits of using statewide data to develop local norms. (PsycINFO Database Record (c) 2017 APA, all rights reserved)
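
Several abstracts in this issue report predictive validity as an AUC. As a reading aid, here is a minimal Python sketch of what that statistic measures: the probability that a randomly chosen recidivist has a higher risk score than a randomly chosen non-recidivist, with ties counted as one half. The function name auc and all scores below are hypothetical and invented for illustration; they are not data from any study in this issue, and the sketch does not claim to reproduce the authors' analysis.

```python
def auc(scores_pos, scores_neg):
    """Share of (positive, negative) pairs in which the positive case
    has the higher score; tied pairs count as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical risk scores, invented for illustration only.
recidivists = [4, 6, 3, 7]
non_recidivists = [2, 3, 5, 1, 4]

print(auc(recidivists, non_recidivists))
```

An AUC of .50 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups, so values in the .66 to .74 range reported in this issue sit between the two.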

Actuarial risk assessment of sexual offenders: The psychometric properties of the Sex Offender Risk Appraisal Guide (SORAG).


The Sex Offender Risk Appraisal Guide (SORAG) is one of the most commonly used actuarial risk assessment instruments for sexual offenders. The aims of the present field study were to examine the predictive validity of the German version of the SORAG and its individual items for different offender subgroups and recidivism criteria in sexual offenders released from the Austrian Prison System (N = 1,104; average follow-up period M = 6.48 years) within a prospective-longitudinal research design. For the prediction of violent recidivism the German version of the SORAG yielded an effect size of AUC = .74 (p < .001, 95% CI = .70–.78). The predictive accuracy for general and violent recidivism was slightly higher than for general sexual and sexual hands-on recidivism. The effect sizes were found to be higher for the child molester sample than for rapists. However, the differences were significant only for general recidivism (z = 2.48, p = .001). Further analyses exhibited the SORAG to have incremental predictive validity beyond the VRAG and the PCL-R, and to remain the only significant predictor for violent recidivism once all 3 instruments were forced into a combined regression model. Twelve out of the 14 SORAG items were found to have a significant positive relationship with violent recidivism. The comparison of the relative and absolute risk indices between the Austrian and the Canadian samples showed that the normative data distribution yielded more (absolute risk indices) or less (relative risk indices) meaningful differences between the 2 countries. (PsycINFO Database Record (c) 2017 APA, all rights reserved)

"Psychometric properties of the PTSD Checklist for Diagnostic and Statistical Manual of Mental Disorders–Fifth Edition (PCL-5) in veterans": Correction to Bovin et al. (2016).


Reports an error in "Psychometric properties of the PTSD Checklist for Diagnostic and Statistical Manual of Mental Disorders–Fifth Edition (PCL-5) in veterans" by Michelle J. Bovin, Brian P. Marx, Frank W. Weathers, Matthew W. Gallagher, Paola Rodriguez, Paula P. Schnurr and Terence M. Keane (Psychological Assessment, 2016[Nov], Vol 28[11], 1379-1391). In the article, the departments and affiliations were incorrectly listed for authors Michelle J. Bovin, Brian P. Marx, Matthew W. Gallagher, Paola Rodriguez, Paula P. Schnurr, and Terence M. Keane. The first department and affiliation for authors Michelle J. Bovin, Brian P. Marx, Matthew W. Gallagher, Paola Rodriguez, and Terence M. Keane should have read “National Center for PTSD at VA Boston Healthcare System, Boston, Massachusetts.” The first department and affiliation for author Paula P. Schnurr should have read “National Center for PTSD, White River Junction, Vermont.” The online version of this article has been corrected. (The following abstract of the original article appeared in record 2015-55809-001.) This study examined the psychometric properties of the posttraumatic stress disorder (PTSD) Checklist for Diagnostic and Statistical Manual of Mental Disorders-Fifth Edition (PCL-5; Weathers, Litz, et al., 2013b) in 2 independent samples of veterans receiving care at a Veterans Affairs Medical Center (N = 468). A subsample of these participants (n = 140) was used to define a valid diagnostic cutoff score for the instrument using the Clinician-Administered PTSD Scale for DSM–5 (CAPS-5; Weathers, Blake, et al., 2013) as the reference standard. The PCL-5 test scores demonstrated good internal consistency (α = .96), test–retest reliability (r = .84), and convergent and discriminant validity. Consistent with previous studies (Armour et al., 2015; Liu et al., 2014), confirmatory factor analysis revealed that the data were best explained by a 6-factor anhedonia model and a 7-factor hybrid model. Signal detection analyses using the CAPS-5 revealed that PCL-5 scores of 31 to 33 were optimally efficient for diagnosing PTSD (κ(.5) = .58). Overall, the findings suggest that the PCL-5 is a psychometrically sound instrument that can be used effectively with veterans. Further, by determining a valid cutoff score using the CAPS-5, the PCL-5 can now be used to identify veterans with probable PTSD. However, findings also suggest the need for research to evaluate cluster structure of DSM–5. (PsycINFO Database Record (c) 2017 APA, all rights reserved)

Field measures of psychopathy and sexual deviance as predictors of recidivism among sexual offenders.


Offenders with high levels of both psychopathy and deviant sexual interests are often described as being more prone to recidivate than other sexual offenders, and many forensic evaluators report considering this psychopathy and sexual deviance interaction when coming to conclusions about sex offender risk. However, empirical support for the interaction comes from studies using sexual deviance measures that are rarely used in the field. We examined the ability of Psychopathy Checklist-Revised (PCL-R) field scores and possible field measures of sexual deviance (e.g., paraphilia diagnosis, offense characteristics) to predict sexual recidivism among 687 offenders released after being evaluated for postrelease civil commitment (M follow-up = 10.5 years). PCL-R total scores and antisocial personality diagnoses were predictive of a combined category of violent or sexual recidivism, but not sexual recidivism. Paraphilia diagnoses and offense characteristics were not associated with an increased likelihood of reoffending. There was no evidence that those with high levels of both psychopathy and sexual deviance were more likely than others to reoffend. Although the psychopathy and sexual deviance interaction findings from prior studies are large and compelling, our findings highlight the need for research examining the best ways to translate those findings into routine practice. (PsycINFO Database Record (c) 2017 APA, all rights reserved)

Use of structured professional judgment by probation officers to assess risk for recidivism in adolescent offenders.


The current study tested a method of risk assessment for adolescent offenders that relies on structured professional judgment: the Structured Assessment of Violence Risk for Youth (SAVRY; Borum, Bartel, & Forth, 2006). Trained probation officers in 3 jurisdictions administered the SAVRY to 505 adjudicated adolescents (M age = 15.43 years, SD = 1.62). The results supported the validity of the SAVRY administered in this juvenile justice context. Specifically, scores from the SAVRY differentiated violent from nonviolent offenders and predicted both violent and nonviolent recidivism over a 12-month follow-up period. Violent offenders showed more historical and individual risk factors than nonviolent offenders, and violent sex offenders were rated as more deficient in empathy and remorse. The anger control item was a particularly important indicator of risk for reoffending in the violent offender group. The implications of these findings for weighting risk factors in individual cases when using structured professional judgment are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved)

Are risk assessments racially biased?: Field study of the SAVRY and YLS/CMI in probation.


Risk assessment instruments are widely used by juvenile probation officers (JPOs) to make case management decisions; however, few studies have investigated whether these instruments maintain their predictive validity when completed by JPOs in the field. Moreover, the validity of these instruments for use with minority groups has been called into question. This field study examined the predictive validity of both the Structured Assessment of Violence Risk in Youth (SAVRY; n = 383) and the Youth Level of Service/Case Management Inventory (YLS/CMI; n = 359) for reoffending when completed by JPOs. The study also compared Black and White youth to examine the presence of test bias. The SAVRY and YLS/CMI significantly predicted reoffending at the test level, with most of the variance in reoffending accounted for by dynamic risk scales not static scales. The instruments did not differentially predict reoffending as a function of race but Black youth scored higher than White youth on the YLS/CMI scale related to official juvenile history. The implications for use of risk assessments in the field are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved)

How well do juvenile risk assessments measure factors to target in treatment? Examining construct validity.


There has been a surge of interest in using 1 type of risk assessment instrument to tailor treatment to juveniles to reduce recidivism. Unlike prediction-oriented instruments, these reduction-oriented instruments explicitly measure variable risk factors as “needs” to be addressed in treatment. There is little evidence, however, that the instruments accurately measure specific risk factors. Based on a sample of 237 serious juvenile offenders (M age = 18, SD = 1.6), we tested whether California Youth Assessment Inventory (CA-YASI) scores validly assess the risk factors they purport to assess. Youth were assessed by practitioners with good interrater reliability on the CA-YASI, and by research staff on a battery of validated, multimethod criterion measures of target constructs. We meta-analytically tested whether each CA-YASI risk domain score (e.g., Attitudes) related more strongly to scores on convergent measures of theoretically similar constructs (e.g., criminal thinking styles) than to scores on discriminant measures of theoretically distinct constructs (e.g., intelligence, somatization, and pubertal status). CA-YASI risk domain scores with the strongest validity support were those that assess criminal history. The only variable CA-YASI risk domain score that correlated more strongly with convergent (Zr = .35) than discriminant (Zr = .07) measures was Substance Use. There was little support for the construct validity of the remaining 6 variable CA-YASI risk domains—including those that ostensibly assess strong risk factors (e.g., “Attitudes,” “Social Influence”). Our findings emphasize the need to test the construct validity of reduction-oriented instruments—and refine instruments to precisely measure their targets so they can truly inform risk reduction. (PsycINFO Database Record (c) 2017 APA, all rights reserved)

Diagnostic field reliability in forensic mental health evaluations.


How likely are multiple forensic evaluators to agree on defendants’ diagnoses in routine forensic mental health evaluations? A total of 720 evaluation reports were examined from 240 cases in which 3 evaluators, working independently, provided diagnoses for the same defendant. Results revealed perfect agreement across 6 independent diagnostic categories in 18.3% of cases. Agreement for individual diagnostic categories was higher, with all 3 evaluators agreeing on the separate presence of psychotic, mood, or substance disorders in more than 64.7% of cases and agreeing on the presence of cognitive or developmental disorders in more than 89.7% of cases. However, evaluators agreed about the combination of psychotic and substance-related diagnoses in only 46.5% of cases. Agreement was enhanced by diagnoses with low base rates, and it was suppressed in evaluations conducted in jails. Psychiatrists and contracted evaluators were more likely to provide dissenting diagnostic categories than psychologists and state-employed evaluators. These results are among the first to document diagnostic agreement among nonpartisan practitioners in forensic evaluations conducted in the field, and they allow for practice and policy recommendations for evaluators in routine forensic practice to be made. (PsycINFO Database Record (c) 2017 APA, all rights reserved)

Assessing “credible fear”: A psychometric examination of the Trauma Symptom Inventory-2 in the context of immigration court evaluations.


Recent immigration trends indicate that the United States is home to a remarkably diverse and rapidly growing population of displaced persons. Many of these individuals have survived exceptional trauma and are thus particularly vulnerable to trauma-related behavioral health disorders. Mental health professionals are commonly asked to assess immigrants within this population in the service of immigration court decision making. These assessments present a variety of challenges for clinicians, including the assessment and documentation of trauma-related symptoms across cultural bounds. The Trauma Symptom Inventory-2 (TSI-2) may be uniquely suited to the demands of immigration court assessments, but it has not been previously examined in a culturally diverse sample. The current study provided an examination of the TSI-2 within a sample of immigrants with histories of trauma. De-identified TSI-2 data were drawn from several clinicians’ existing immigration assessment files. Reliability and standardization sample comparison results indicated that the TSI-2 exhibits sufficient internal consistency within this population, and that immigrants with histories of trauma generally respond similarly to individuals in trauma-specific clinical samples (with several notable exceptions). Specific clinical implications are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved)

The influence of veteran race and psychometric testing on veterans affairs posttraumatic stress disorder (PTSD) disability exam outcomes.


This study examined the influence of veterans’ race and examiners’ use of psychometric testing during a Department of Veterans Affairs posttraumatic stress disorder (PTSD) disability examination on diagnostic and service connection status outcomes. Participants were 764 veterans enrolled in a national longitudinal registry. Current and lifetime PTSD diagnostic status was determined with the Structured Clinical Interview for DSM–IV (SCID) and was compared with PTSD diagnosis conferred upon veterans by their compensation and pension (C&P) examiners as well as with ultimate Veterans Affairs (VA) PTSD service connected status. The concordance rate between independent SCID current PTSD diagnosis and PTSD disability examination diagnosis was 70.4%, and between SCID lifetime PTSD diagnosis and PTSD disability examination diagnosis was 77.7%. Among veterans with current SCID diagnosed PTSD, Black veterans were significantly less likely than White veterans to receive a PTSD diagnosis from their C&P examiner (odds ratio [OR] = .39, p = .003, confidence interval [CI] = .20–.73). Among veterans without current SCID diagnosed PTSD, White veterans were significantly more likely than Black veterans to receive a PTSD diagnosis from their C&P examiner (OR = 4.07, p = .005, CI = 1.51–10.92). Splitting the sample by use of psychometric testing revealed that examinations that did not include psychometric testing demonstrated the same relation between veteran race and diagnostic concordance. However, for examinations in which psychometric testing was used, the racial disparity between SCID PTSD status and disability exam PTSD status was no longer significant. Results suggest that psychometric testing may reduce disparities in VA PTSD disability exam outcomes. (PsycINFO Database Record (c) 2017 APA, all rights reserved)
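
For readers less familiar with odds ratios, the following Python sketch shows one common way to compute an odds ratio and a Wald-type 95% confidence interval from a 2 x 2 table. The abstract does not state how its intervals were obtained, so the Wald method is an assumption here. The function name odds_ratio_ci and the cell counts are hypothetical and invented for illustration; they are not the study's data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2 x 2 table.

    a, b: group 1 with / without the outcome
    c, d: group 2 with / without the outcome
    All four cells must be nonzero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, invented for illustration only.
or_value, lower, upper = odds_ratio_ci(a=20, b=30, c=40, d=10)
print(round(or_value, 2), round(lower, 2), round(upper, 2))
```

An odds ratio below 1 means lower odds of the outcome in group 1 than in group 2, and an interval that excludes 1 is what a statistically significant difference at the corresponding level looks like.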

Identifying mental health issues in detained youth: Testing the structure and invariance of the Massachusetts Youth Screening Inventory–Version 2 (MAYSI-2).


This study examined the factor structure of the Massachusetts Youth Screening Instrument–Version 2 (MAYSI-2), a brief self-report measure designed to flag clinically significant mental health needs among youth entering the juvenile justice system. Participants were 981 detained youth in the southeastern United States (mean age = 14.58 years; SD = 1.28 years; 67.5% male; 71.5% African American). Confirmatory factor analyses showed that a seven-factor model represented a satisfactory solution for the data, similar to previous research. The factor structure fit well across gender, age group, race (Black/White), and offense type (violent/nonviolent). Given the widespread use of the MAYSI-2 in juvenile justice settings, examining its psychometric properties is of key importance. Implications and limitations of the study are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved)

Using the MAYSI-2 to identify mental disorder among Latino juvenile offenders.


Many juvenile justice agencies have adopted the Massachusetts Youth Screening Inventory–Version 2 (MAYSI-2; Grisso & Barnum, 2006) to facilitate appropriate programming for young offenders with mental illness. Although Latinos are the fastest-growing ethnic group in the criminal justice system, there is scant research on the utility of the MAYSI-2 among Latino adolescents. The present study examined the utility of the MAYSI-2 in detecting diagnosable mental illness among 398 Latino and 60 European American adolescents in a juvenile justice agency. In addition to testing the scoring configuration used by the agency to identify adolescents in need of further attention, we tested 2 additional scoring configurations of the MAYSI-2. We found that the MAYSI-2 had similar utility at identifying serious mood and anxiety disorders for both ethnic groups, but was less sensitive to behavioral and substance use disorders among Latinos than it was among European Americans. In addition, the MAYSI-2 overall was less sensitive to mental illness among Latino boys compared with Latina girls. We discuss these findings within the context of best practices for identifying adolescents with mental illness in juvenile justice agencies. (PsycINFO Database Record (c) 2017 APA, all rights reserved)

Psychometric properties and prognostic usefulness of the Youth Psychopathic Traits Inventory (YPI) as a component of a clinical protocol for detained youth: A multiethnic examination.


Prior studies have shown that the Youth Psychopathic Traits Inventory (YPI) holds promise as a self-report tool for assessing psychopathic traits in detained adolescents. However, these studies have been conducted in a research context where anonymity and confidentiality are provided. Few studies have examined the usefulness of the YPI in clinical settings. To address this research gap, the present study examined data from 1,559 detained boys who completed the YPI as part of a clinical protocol. Official criminal records were available for a subsample (n = 848), allowing us to test the prognostic usefulness of the YPI. Results of confirmatory factor analyses, overall, support the proposed 3-factor structure, though model fit indices were not as good in Dutch boys compared to boys from other ethnic groups. Measurement invariance tests showed that the YPI scores are manifested in the same way across all 4 ethnic groups and suggest that mean scores between the 4 ethnic groups are comparable. The YPI scores were internally consistent, and correlations with external variables, including aggression and conduct problems, support the convergent validity of the interpretation of YPI scores. Finally, results demonstrated that YPI scores were not significantly positively related to future criminality. In conclusion, this study suggests that the YPI may hold promise as a self-report tool for assessing psychopathic traits in detained male adolescents during a clinical protocol. However, the finding that the YPI did not predict future offending suggests that this tool should not yet be used for risk assessment purposes in forensic settings. (PsycINFO Database Record (c) 2017 APA, all rights reserved)

Can probation officers identify remorse among male adolescent offenders?


Judgments about a youth’s level of remorse are frequently used to make important decisions in the juvenile justice system that can have serious consequences to the person. Unfortunately, little is known about these ratings and what factors may influence them. In a sample of 325 1st-time youth offenders who were arrested for offenses of moderate severity, we tested whether probation officers’ ratings of an adolescent’s remorse soon after arrest were associated with the youth’s self-report of showing a callous and unemotional interpersonal style, being arrested for a violent offense, and several demographic and background characteristics (e.g., age, race, socioeconomic status [SES], and intelligence). Our analyses indicated that both arrest for a violent offense and the adolescent’s self-reported level of callous–unemotional (CU) traits were associated with probation officers’ ratings of remorse. Further, youth age, SES, and intelligence neither were associated with these judgments nor moderated the association between CU traits and probation officers’ ratings of remorse. However, youth race or ethnicity did moderate the association between CU traits and judgments of remorse, such that Latino youth who were high on CU traits showed a very low probability of being rated as remorseful. (PsycINFO Database Record (c) 2017 APA, all rights reserved)

Examining the interrater reliability of the Hare Psychopathy Checklist—Revised across a large sample of trained raters.


The goal of the current study was to assess the interrater reliability of the Psychopathy Checklist—Revised (PCL–R) among a large sample of trained raters (N = 280). All raters completed PCL–R training at some point between 1989 and 2012 and subsequently provided complete coding for the same 6 practice cases. Overall, 3 major conclusions can be drawn from the results: (a) reliability of individual PCL–R items largely fell below any appropriate standards while the estimates for Total PCL–R scores and factor scores were good (but not excellent); (b) the cases representing individuals with high psychopathy scores showed better reliability than did the cases of individuals in the moderate to low PCL–R score range; and (c) there was a high degree of variability among raters; however, rater specific differences had no consistent effect on scoring the PCL–R. Therefore, despite low reliability estimates for individual items, Total scores and factor scores can be reliably scored among trained raters. We temper these conclusions by noting that scoring standardized videotaped case studies does not allow the rater to interact directly with the offender. Real-world PCL–R assessments typically involve a face-to-face interview and much more extensive collateral information. We offer recommendations for new web-based training procedures. (PsycINFO Database Record (c) 2017 APA, all rights reserved)

The MacArthur Competence Assessment Tool—Criminal Adjudication: Factor structure, interrater reliability, and association with clinician opinion of competence in a forensic inpatient sample.


Adjudicative competence is the most frequently referred evaluation in the forensic context, and it is because of this that periodic evaluation of competence assessment instruments is imperative. Among those instruments, the MacArthur Competence Assessment Tool—Criminal Adjudication (MacCAT-CA) has demonstrated adequate psychometric properties suggesting its utility in informing the forensic inquiry. The purpose of the current study was to further investigate the psychometric properties and ultimate utility of subscale scores using archival data from a sample of 103 male and female forensic patients who were hospitalized for competence restoration treatment. Results of the present study suggested adequate internal consistency and good model fit for the factor structure. Interrater reliability was evaluated by comparing the absolute agreement of scores derived from 2 independent research assistants for each of the subscales; 2 of the 3 subscales fell within the acceptable range given established interpretative benchmarks for forensic assessment. Of particular interest was that the Appreciation subscale, while heralding the lowest intraclass correlation coefficient, explained the largest proportion of variance in clinician opinion relative to the other 2 subscales. In other words, the most subjective subscale (as evidenced by the lowest intraclass correlation), explained the largest proportion of variance in ultimate opinion. The authors argue that, although these results are an important consideration in these assessments, they are neither surprising nor entirely problematic when considering the case-specific nature of the inquiries on the subscale, as well as the subjectivity of scoring criteria for each of the Appreciation items. (PsycINFO Database Record (c) 2017 APA, all rights reserved)

Field reliability influences field validity: Risk assessments of individuals found not guilty by reason of insanity.


Individuals acquitted as not guilty by reason of insanity (NGRI) are usually committed to psychiatric hospitals for treatment until they are considered suitable for conditional release back to the community. The clinical evaluations that inform conditional release decisions have rarely been studied but provide an ideal opportunity to examine the reliability and validity of complex evaluations in the field. For example, to what extent do forensic evaluators agree about an acquittee’s readiness for conditional release? And how accurate are their opinions? We reviewed 175 evaluation reports across 62 cases from Hawaii, which requires 3 separate evaluations from independent clinicians for each felony NGRI acquittee referred for conditional release evaluation. Evaluators agreed about an NGRI acquittee’s readiness for conditional release in only 53.2% of evaluations (κ = .35). Courts followed the majority evaluator opinion in 79.3% of all cases but ruled in an opposite direction from the majority evaluator opinion in more than a third of cases in which evaluators disagreed. Evaluators accurately differentiated those conditionally released acquittees who remained in the community from those who were rehospitalized in 62.4% of cases. Among the 43 insanity acquittees who were ultimately released, evaluator agreement was significantly associated with rehospitalization within 3 years. When the evaluators unanimously agreed that conditional release was appropriate, only 34.5% were rehospitalized. When the evaluators disagreed, 71.4% were rehospitalized. Overall, results reveal poor agreement among independent evaluators in routine practice but suggest that opinions may be more accurate when evaluators agree than when they disagree. (PsycINFO Database Record (c) 2017 APA, all rights reserved)
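
Several abstracts in this issue report both a raw agreement rate and a kappa for panels of three evaluators. The Python sketch below illustrates why the two can differ, using Fleiss' kappa, one common chance-corrected agreement index for more than two raters. The abstracts do not say which kappa variant was used, so this choice is an assumption. The function name fleiss_kappa and the ratings are hypothetical and invented for illustration; they are not data from any study in this issue.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who placed case i in category j. Every case must be rated
    by the same number of raters."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Observed agreement: share of agreeing rater pairs within each case.
    per_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Chance agreement from the overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings, invented for illustration only: 4 cases,
# 3 evaluators each, columns = [ready for release, not ready].
ratings = [[3, 0], [0, 3], [2, 1], [1, 2]]

unanimous = sum(1 for row in ratings if max(row) == sum(row)) / len(ratings)
print(unanimous)
print(fleiss_kappa(ratings))
```

In this toy table the evaluators are unanimous in half of the cases, yet kappa is only about one third, because kappa discounts the agreement that would be expected by chance given how often each opinion is offered.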

Field reliability of competency and sanity opinions: A systematic review and meta-analysis.


We know surprisingly little about the interrater reliability of forensic psychological opinions, even though courts and other authorities have long called for known error rates for scientific procedures admitted as courtroom testimony. This is particularly true for opinions produced during routine practice in the field, even for some of the most common types of forensic evaluations—evaluations of adjudicative competency and legal sanity. To address this gap, we used meta-analytic procedures and study space methodology to systematically review studies that examined the interrater reliability—particularly the field reliability—of competency and sanity opinions. Of 59 identified studies, 9 addressed the field reliability of competency opinions and 8 addressed the field reliability of sanity opinions. These studies presented a wide range of reliability estimates; pairwise percentage agreements ranged from 57% to 100% and kappas ranged from .28 to 1.0. Meta-analytic combinations of reliability estimates obtained by independent evaluators returned estimates of κ = .49 (95% CI: .40–.58) for competency opinions and κ = .41 (95% CI: .29–.53) for sanity opinions. This wide range of reliability estimates underscores the extent to which different evaluation contexts tend to produce different reliability rates. Unfortunately, our study space analysis illustrates that available field reliability studies typically provide little information about contextual variables crucial to understanding their findings. Given these concerns, we offer suggestions for improving research on the field reliability of competency and sanity opinions, as well as suggestions for improving reliability rates themselves. (PsycINFO Database Record (c) 2017 APA, all rights reserved)

Can credibility criteria be assessed reliably? A meta-analysis of criteria-based content analysis.


This meta-analysis synthesizes research on interrater reliability of Criteria-Based Content Analysis (CBCA). CBCA is an important component of Statement Validity Assessment (SVA), a forensic procedure used in many countries to evaluate whether statements (e.g., of sexual abuse) are based on experienced or fabricated events. CBCA contains 19 verbal content criteria, which are frequently adapted for research on detecting deception. A total of k = 82 hypothesis tests revealed acceptable interrater reliabilities for most CBCA criteria, as measured with various indices (except Cohen’s kappa). However, results were largely heterogeneous, necessitating moderator analyses. Blocking analyses and meta-regression analyses on Pearson’s r resulted in significant moderators for research paradigm, intensity of rater training, type of rating scale used, and the frequency of occurrence (base rates) for some CBCA criteria. The use of CBCA summary scores is discouraged. Implications for research vs. field settings, for future research and for forensic practice in the United States and Europe are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved)