Psychological Assessment publishes mainly empirical articles concerning clinical assessment. Papers that fall within the domain of the journal include research on the development, validation, application, and evaluation of psychological assessment instrum

Development of an itemwise efficiency scoring method: Concurrent, convergent, discriminant, and neuroimaging-based predictive validity assessed in a large community sample.


Traditional “paper-and-pencil” testing is imprecise in measuring speed and hence limited in assessing performance efficiency, but computerized testing permits precision in measuring itemwise response time. We present a method of scoring performance efficiency (combining information from accuracy and speed) at the item level. Using a community sample of 9,498 youths age 8–21, we calculated item-level efficiency scores on 4 neurocognitive tests, and compared the concurrent, convergent, discriminant, and predictive validity of these scores with simple averaging of standardized speed and accuracy-summed scores. Concurrent validity was measured by the scores’ abilities to distinguish men from women and their correlations with age; convergent and discriminant validity were measured by correlations with other scores inside and outside of their neurocognitive domains; predictive validity was measured by correlations with brain volume in regions associated with the specific neurocognitive abilities. Results provide support for the ability of itemwise efficiency scoring to detect signals as strong as those detected by standard efficiency scoring methods. We find no evidence of superior validity of the itemwise scores over traditional scores, but point out several advantages of the former. The itemwise efficiency scoring method shows promise as an alternative to standard efficiency scoring methods, with overall moderate support from tests of 4 different types of validity. This method allows the use of existing item analysis methods and provides the convenient ability to adjust the overall emphasis of accuracy versus speed in the efficiency score, thus adjusting the scoring to the real-world demands the test is aiming to fulfill. (PsycINFO Database Record (c) 2016 APA, all rights reserved)
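
The abstract above does not give the scoring formula, so the following is only a rough sketch of the general idea of weighting accuracy against speed at the item level. The function name `itemwise_efficiency`, the weight `w_acc`, the log transform of response time, and the z-scoring across examinees are all assumptions made for illustration, not the authors' method.

```python
import math
from statistics import mean, pstdev


def itemwise_efficiency(correct, rt_ms, w_acc=0.5):
    """Hypothetical efficiency scores for ONE item across examinees.

    correct: 0/1 correctness per examinee (assumed coding)
    rt_ms:   response time in milliseconds per examinee
    w_acc:   assumed weight on accuracy; (1 - w_acc) goes to speed
    """
    if len(correct) != len(rt_ms) or not correct:
        raise ValueError("correct and rt_ms must be non-empty and equal length")
    if not 0.0 <= w_acc <= 1.0:
        raise ValueError("w_acc must lie in [0, 1]")

    def z(values):
        # Standardize; a constant column contributes 0 for everyone.
        m, s = mean(values), pstdev(values)
        return [0.0 if s == 0 else (v - m) / s for v in values]

    z_acc = z([float(c) for c in correct])
    # Negative log time so that faster responses get larger values.
    z_speed = z([-math.log(t) for t in rt_ms])
    return [w_acc * a + (1.0 - w_acc) * s for a, s in zip(z_acc, z_speed)]


# Toy data (made up): two correct and two incorrect responses.
scores = itemwise_efficiency([1, 1, 0, 0], [500, 1000, 500, 1000], w_acc=0.5)
```

Raising `w_acc` toward 1 makes the score track accuracy alone, and lowering it toward 0 makes it track speed alone, which is one simple way to picture the adjustable emphasis the abstract mentions.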

Interrater reliability of Violence Risk Appraisal Guide scores provided in Canadian criminal proceedings.


Published research suggests that most violence risk assessment tools have relatively high levels of interrater reliability, but recent evidence of inconsistent scores among forensic examiners in adversarial settings raises concerns about the “field reliability” of such measures. This study specifically examined the reliability of Violence Risk Appraisal Guide (VRAG) scores in Canadian criminal cases identified in the legal database, LexisNexis. Over 250 reported cases were located that made mention of the VRAG, with 42 of these cases containing 2 or more scores that could be submitted to interrater reliability analyses. Overall, scores were skewed toward higher risk categories. The intraclass correlation (ICC[A,1]) was .66, with pairs of forensic examiners placing defendants into the same VRAG risk “bin” in 68% of the cases. For categorical risk statements (i.e., low, moderate, high), examiners provided converging assessment results in most instances (86%). In terms of potential predictors of rater disagreement, there was no evidence for adversarial allegiance in our sample. Rater disagreement in the scoring of 1 VRAG item (Psychopathy Checklist–Revised; Hare, 2003), however, strongly predicted rater disagreement in the scoring of the VRAG (r = .58). (PsycINFO Database Record (c) 2016 APA, all rights reserved)
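
For readers unfamiliar with the intraclass correlation reported above, here is a minimal sketch of the single-rater, absolute-agreement form computed from two-way ANOVA mean squares. The abstract does not describe how the coefficient was computed, so the data layout (one row per case, one column per rater slot, no missing scores) and the function name `icc_a1` are assumptions.

```python
def icc_a1(scores):
    """Absolute-agreement, single-rater ICC from a cases x raters table.

    scores: list of rows, one per case; each row holds one score per
            rater slot (assumed complete, same length for every row).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least 2 cases")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 raters and no missing scores")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: cases, raters, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denom


# Toy data (made up): rater 2 always scores one point higher than rater 1.
offset_example = icc_a1([[1, 2], [2, 3], [3, 4]])
```

On this toy table the two raters rank the cases identically, yet the coefficient is about .67 rather than 1, because an absolute-agreement ICC penalizes a systematic offset between raters.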

Validation of the Narcissistic Grandiosity Scale and creation of reduced item variants.


The Narcissistic Grandiosity Scale (NGS) is a short adjective-based measure of narcissistic grandiosity (Rosenthal, Hooley, & Steshenko, 2007). The NGS has already shown promise as a measure of grandiose narcissism, but it has never been the subject of a formal validation study. In the current study (N = 870 across 3 samples), the factor structure of the NGS was examined and item response theory analyses were used to generate abbreviated versions of the scale. The NGS scales’ relations to measures of grandiose and vulnerable narcissism, the five-factor model (FFM), the interpersonal circumplex, self-esteem, and the Personality Inventory for the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM–5; PID-5) were assessed. The correlation profile of the NGS was also correlated with expert ratings of prototypical cases of narcissistic personality disorder using both the FFM and PID-5 trait profiles. Overall, the NGS was found to be a unidimensional measure of narcissistic grandiosity with good convergent, discriminant, and criterion validity. The abbreviated versions of the NGS manifested strong reliability and associations entirely consistent with the full version. (PsycINFO Database Record (c) 2016 APA, all rights reserved)

Convergent and discriminant validity of alternative measures of maladaptive personality traits.


The purpose of the current study was to test empirically the convergent and discriminant validity of 3 recently developed, alternative measures of maladaptive personality traits: the Personality Inventory for the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM–5; PID-5; Krueger, Derringer, Markon, Watson, & Skodol, 2012), the Computerized Adaptive Test-Personality Disorder Static Form (CAT-PD-SF; Simms et al., 2011), and Five Factor Model Personality Disorder scales (FFMPD; Widiger, Lynam, Miller, & Oltmanns, 2012). These measures were constructed with different rationales and methods, yet the result was highly congruent. The PID-5 and CAT-PD-SF were administered to 286 community adults with current or past mental health treatment; the CAT-PD-SF and FFMPD scales to 262 such adults; and the PID-5 and FFMPD scales to 266. The results indicated good to excellent internal consistency, as well as good to excellent convergent and discriminant validity for most scales with a few notable exceptions. Suggestions for future research are provided, including the potential benefits of scales that are unique to a respective instrument, replication of a dependency factor, and exploration as to the basis for instances of questionable convergent or discriminant validity. (PsycINFO Database Record (c) 2016 APA, all rights reserved)

An approach for estimating item sensitivity to within-person change over time: An illustration using the Alzheimer’s Disease Assessment Scale–Cognitive subscale (ADAS-Cog).


When assessments are primarily used to measure change over time, it is important to evaluate items according to their sensitivity to change, specifically. Items that demonstrate good sensitivity to between-person differences at baseline may not show good sensitivity to change over time, and vice versa. In this study, we applied a longitudinal factor model of change to a widely used cognitive test designed to assess global cognitive status in dementia, and contrasted the relative sensitivity of items to change. Statistically nested models were estimated introducing distinct latent factors related to initial status differences between test-takers and within-person latent change across successive time points of measurement. Models were estimated using all available longitudinal item-level data from the Alzheimer’s Disease Assessment Scale–Cognitive subscale, including participants representing the full-spectrum of disease status who were enrolled in the multisite Alzheimer’s Disease Neuroimaging Initiative. Five of the 13 Alzheimer’s Disease Assessment Scale–Cognitive items demonstrated noticeably higher loadings with respect to sensitivity to change. Attending to performance change on only these 5 items yielded a clearer picture of cognitive decline more consistent with theoretical expectations in comparison to the full 13-item scale. Items that show good psychometric properties in cross-sectional studies are not necessarily the best items at measuring change over time, such as cognitive decline. Applications of the methodological approach described and illustrated in this study can advance our understanding regarding the types of items that best detect fine-grained early pathological changes in cognition. (PsycINFO Database Record (c) 2016 APA, all rights reserved)

Likelihood of obtaining Structured Interview of Reported Symptoms (SIRS) and SIRS-2 elevations among forensic psychiatric inpatients with screening elevations on the Miller Forensic Assessment of Symptoms Test.


The Miller Forensic Assessment of Symptoms Test (M-FAST) was designed as a screening measure for feigned psychiatric symptoms. When M-FAST Total Scores are elevated (raw score ≥6), the test manual recommends follow-up with a more comprehensive measure of feigning, such as the widely used and researched Structured Interview of Reported Symptoms (SIRS) or the revised version of the test (SIRS-2). The purpose of the current study was to evaluate how often M-FAST screening elevations are associated with subsequent elevations on the SIRS or SIRS-2. The sample included archival data from 100 forensic psychiatric inpatients who obtained M-FAST Total Score elevations ≥6 during screening and were subsequently administered the SIRS (which was also rescored using SIRS-2 criteria). Among examinees who elevated the M-FAST over the recommended cutoff, 66.0% met standard SIRS feigning criteria, 42% met SIRS-2 criteria for feigning, and 81.0% obtained at least 1 SIRS/SIRS-2 elevation in the Probable Feigning range or higher. These results are consistent with the M-FAST manual guidelines, which support the use of the ≥6 M-FAST cutoff score to screen for potential feigning (but not as an independent marker of feigning). A higher M-FAST cutoff score of ≥16 was associated with subsequently meeting full SIRS criteria for feigning in 100.0% of protocols. Because the SIRS criteria were designed to have very low false positive rates, these findings indicate that more confident assertions about feigning can be made when elevations reach this level on the M-FAST. (PsycINFO Database Record (c) 2016 APA, all rights reserved)

Are fearless dominance traits superfluous in operationalizing psychopathy? Incremental validity and sex differences.


Researchers are vigorously debating whether psychopathic personality includes seemingly adaptive traits, especially social and physical boldness. In a large sample (N = 1,565) of adult offenders, we examined the incremental validity of 2 operationalizations of boldness (Fearless Dominance traits in the Psychopathic Personality Inventory [Lilienfeld & Andrews, 1996]; Boldness traits in the triarchic model of psychopathy [Patrick, Fowles, & Krueger, 2009]), above and beyond other characteristics of psychopathy, in statistically predicting scores on 4 psychopathy-related measures, including the Psychopathy Checklist–Revised (PCL–R). The incremental validity added by boldness traits in predicting the PCL–R’s representation of psychopathy was especially pronounced for interpersonal traits (e.g., superficial charm, deceitfulness). Our analyses, however, revealed unexpected sex differences in the relevance of these traits to psychopathy, with boldness traits exhibiting reduced importance for psychopathy in women. We discuss the implications of these findings for measurement models of psychopathy. (PsycINFO Database Record (c) 2016 APA, all rights reserved)

Correspondence between correctional staff and offender ratings of adaptive behavior.


Although several experts have raised concerns about using correctional officers as informants for adaptive behavior assessments, no studies have compared ratings from correctional officers to those from other informants. We compared Adaptive Behavior Assessment System–Second Edition (ABAS–II; Harrison & Oakland, 2003) scores assigned by correctional staff to those assigned by probationers (N = 56) residing in a community corrections facility. Correctional staff assigned markedly lower scores than did probationers on many ABAS–II scales (d = .59 to 1.41 for ABAS–II composite scores). Although none of the probationers qualified for a diagnosis of intellectual disability, 29% received a staff-report ABAS–II composite score that was more than 2 SDs below the normative sample mean, suggesting significant impairment. Correlations between ABAS–II and intelligence measure scores were lower than expected for both types of informants, although they were somewhat stronger for self-report. Lower staff-report scores were associated with higher levels of probationer-reported psychopathology and need for treatment. Overall, these findings highlight limitations of using correctional staff as informants for adaptive behavior assessments. (PsycINFO Database Record (c) 2016 APA, all rights reserved)

Does the self-report Inventory of Callous-Unemotional Traits predict recidivism?


Callous-unemotional (CU) traits, that is, a lack of guilt or empathy and poverty of emotion, are believed to be the developmental precursor to psychopathy in adulthood, capturing its emotional detachment dimension. Research shows that, like psychopathic adults, children and adolescents with high CU traits represent an important population at heightened risk for criminal behavior. The present study is the first to examine whether a self-report measure of CU traits, the Inventory of Callous-Unemotional Traits (ICU), predicts general and violent recidivism postinstitutional release among a sample of 227 juvenile justice-involved adolescent boys (M age = 15.73, SD = 1.27). Results indicated that boys high on CU traits were faster to reoffend postrelease both nonviolently (Hazard Ratio [HR] = 1.27, p < .01) and violently (HR = 1.54, p < .05). Further, the Uncaring subscale of the ICU predicted faster time to general recidivism (HR = 1.21, p < .05), whereas the Callousness subscale (e.g., “I do not care who I hurt to get what I want”) predicted faster time to violent recidivism (HR = 1.39, p < .05). The present study provides preliminary support for the predictive validity of a brief, yet comprehensive self-report measure of CU traits. Findings inform youth risk assessment by offering possibilities within the domain of self-report for screening high-risk youth in need of intensive, comprehensive, and individualized intervention. (PsycINFO Database Record (c) 2016 APA, all rights reserved)

The application of item response theory in developing and validating a shortened version of the Emirate Marital Satisfaction Scale.


The aim of this study was to determine the feasibility of generating a shorter version of the Emirati Marital Satisfaction Scale (EMSS) using item response theory (IRT)-based methodology. The EMSS is the first national scale used to provide an understanding of the family function and level of marital satisfaction within the cultural context of the United Arab Emirates. A sample of 1,049 Emirati married individuals of different ages, genders, places of residence, and monthly incomes participated in this study. The IRT model was calibrated using X-Calibre 4.2 and the graded response model. The analysis yielded a short form of the EMSS (7 items), which constitutes a promising alternative to the original scale for practitioners and researchers. This short version is reliable and valid, and it gives results very similar to those of the original scale. The results of this study confirmed the usefulness of IRT-based methodology for developing psychological and counseling scales. (PsycINFO Database Record (c) 2016 APA, all rights reserved)

Norms for developmental milestones using VABS-II and association with anthropometric measures among apparently healthy urban Indian preschool children.


Assessment of developmental milestones based on locally developed norms is critical for accurate estimation of a child’s cognitive, behavioral, social, and emotional development. A cross-sectional study was done to develop age-specific norms for developmental milestones using the Vineland Adaptive Behavior Scales (VABS-II) (Sparrow, Cicchetti, & Balla, 2005) for apparently healthy children from 2 to 5 years from urban Bangalore, India, and to examine their association with anthropometric measures. Mothers (or caregivers) of 412 children participated in the study. Age-specific norms using the inferential norming method and adaptive levels for all domains and subdomains were derived. Low adaptive level, also called delayed developmental milestone, was observed in 2.3% of the children, specifically 2.7% in motor and daily living skills and 2.4% in communication skills. When these children were assessed on the existing U.S. norms, there was a significant overestimation of delayed development in socialization and motor skills, whereas delays in communication and daily living skills were underestimated (all p < .01). Multiple linear regression revealed that stunted and underweight children had significantly lower developmental scores for communication and motor skills compared with normal children (β coefficients ranged from 2.6 to 5.3; all p < .01). In the absence of Indian normative data for VABS-II in preschool children, the prevalence of developmental delay could either be under- or overestimated using Western norms. Thus, locally referenced norms are critical for reliable assessments of development in children. Stunted and underweight children are more likely to have poorer developmental scores compared with healthy children. (PsycINFO Database Record (c) 2016 APA, all rights reserved)

Measurement invariance and child temperament: An evaluation of sex and informant differences on the Child Behavior Questionnaire.


Parent reports of temperament are used to study many important topics in child development, such as whether boys and girls differ in their levels of emotional reactivity and self-regulation. However, questions regarding measurement equivalence in parental reports of temperament are largely unexplored, despite the fact that this issue is critical for drawing the correct conclusions from mean-level comparisons. In the current study, measurement invariance across boys and girls (as targets), and mothers and fathers (as informants), was investigated in the Child Behavior Questionnaire (CBQ; Rothbart et al., 2001) using a sample of children ranging in age from 3 to 7 years (N = 605). Several instances of noninvariance were identified across both girls and boys, and mothers and fathers. An evaluation of effect size indices suggests that the practical impact of this noninvariance ranges from negligible to moderate. All told, this study illustrates the importance of taking a psychometrically informed approach to the use of parent reports of child temperament. (PsycINFO Database Record (c) 2016 APA, all rights reserved)

Dimensional latent structure of PTSD-symptoms reporting: Is it adding by subtracting?


Although posttraumatic stress disorder (PTSD) is used as a distinct diagnosis in clinical practice, its symptoms were characterized as a dimensional structure in several taxometric analyses. However, a categorical latent structure of PTSD could be superimposed by using indistinct PTSD symptoms that can appear within the framework of other trauma-induced syndromes (e.g., depression, anxiety disorders). For that reason, in revising the International Classification of Diseases (ICD-11), a core set of cardinal symptoms that determine the presence of PTSD as selectively as possible will be used. To determine whether the latent status of a recommended core set of PTSD symptoms is dimensional, the authors analyzed the latent status of PTSD symptoms reported by participants who had experienced at least 1 traumatic event during their lifetime in 2 nationwide surveys of the German population (N = 1,212). Using the Posttraumatic Diagnostic Scale (PDS), they applied 3 popular taxometric methods: maximum eigenvalue, mean above minus below a cut, and latent mode factor analysis, using the core set and PTSD symptom clusters of previous taxometric studies. Although the analysis replicated findings of previous taxometric analyses using symptom clusters, the item core-set approach indicated a categorical solution of PTSD cardinal symptoms. These results seem to support the procedure used by the ICD-11 expert group. (PsycINFO Database Record (c) 2016 APA, all rights reserved)

Factor structure of the ARIC-NCS Neuropsychological Battery: An evaluation of invariance across vascular factors and demographic characteristics.


Neuropsychological test batteries are designed to assess cognition in detail by measuring cognitive performance in multiple domains. This study examines the factor structure of tests from the ARIC-NCS battery overall and across informative subgroups defined by demographic and vascular risk factors in a population of older adults. We analyzed neuropsychological test scores from 6,413 participants in the Atherosclerosis Risk in Communities Neurocognitive Study (ARIC-NCS) examined in 2011–2013. Confirmatory factor analysis (CFA) was used to assess the fit of an a priori hypothesized 3-domain model, and fit statistics were calculated and compared to 1- and 2-domain models. Additionally, we tested for stability (invariance) of factor structures among different subgroups defined by diabetes, hypertension, age, sex, race, and education. Mean age of participants was 76 years, 76% were White, and 60% were female. CFA on the a priori hypothesized 3-domain structure, including memory, sustained attention and processing speed, and language, fit the data better (comparative fit index [CFI] = 0.973, root mean square error of approximation [RMSEA] = 0.059) than the 2-domain (CFI = 0.960, RMSEA = 0.070) and 1-domain (CFI = 0.947, RMSEA = 0.080) models. Bayesian information criterion value was lowest, and quantile–quantile plots indicated better fit, for the 3-domain model. Additionally, multiple-group CFA supported a common structure across the tested demographic subgroups, and indicated strict invariance by diabetes and hypertension status. In this community-based population of older adults with varying levels of cognitive performance, the a priori hypothesized 3-domain structure fit the data well. The identified factors were configurally invariant by age, sex, race, and education, and strictly invariant by diabetes and hypertension status. (PsycINFO Database Record (c) 2016 APA, all rights reserved)
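
As background for the CFI and RMSEA values quoted above, the sketch below shows the commonly used textbook definitions of the two indices in terms of model and baseline chi-square statistics. The abstract does not report chi-square values or say which software variant was used, so the function names and the numbers in the usage lines are purely illustrative assumptions.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation.

    Uses the (n - 1) denominator; some programs use n instead,
    which changes the value only slightly in large samples.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index relative to a baseline (null) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    # If neither model shows misfit beyond its df, fit is perfect.
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


# Made-up chi-square values, NOT taken from the study.
example_rmsea = rmsea(chi2=100.0, df=50, n=501)
example_cfi = cfi(chi2_model=100.0, df_model=50, chi2_base=1060.0, df_base=60)
```

Both indices are functions of how far the chi-square exceeds its degrees of freedom, which is why a better-fitting model shows a higher CFI and a lower RMSEA at the same time, as in the 3-domain versus 1- and 2-domain comparison.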

Evaluating the psychometric properties of the Interpersonal Needs Questionnaire and the Acquired Capability for Suicide Scale in military veterans.


Joiner’s (2005) interpersonal-psychological theory of suicide (IPTS) has become one of the most frequently studied theories in the field. Currently there are 2 primary measures designed to assess the 3 main constructs of the theory: the Interpersonal Needs Questionnaire (INQ; Van Orden, Witte, Gordon, Bender, & Joiner, 2008) and the Acquired Capability for Suicide Scale (ACSS; Van Orden et al., 2008). The psychometric properties of these 2 measures were evaluated in a sample of 477 U.S. military veterans. It was determined that the factor structure for both measures is consistent with the underlying theory and that all internal consistency reliability estimates are good. Acceptable convergent validity was found for the INQ, but not for the ACSS. Recommendations for refining the ACSS based on the results of the current analyses are provided. Comparisons of scale performance were made with data from participants with and without a history of 1 or more suicide attempts. Burdensomeness alone and the interaction between thwarted belongingness and burdensomeness were associated with prior suicide attempts. In conclusion, although some refinement may improve performance of the ACSS, both measures are appropriate and psychometrically sound for use in research and clinical applications with veterans of the U.S. military. (PsycINFO Database Record (c) 2016 APA, all rights reserved)

Establishing an Information Avoidance Scale.


People differ in their openness to different types of information and some information may evoke greater avoidance than does other information. We developed an 8-item measure of people’s tendency to avoid learning information. The flexible instrument can function as both a predictor and outcome measure. The results from 4 studies involving 7 samples and 4,393 participants reveal that scores on the measure are generally internally consistent, remain relatively stable across time, and correlate modestly with measures of similar constructs and with avoidance behavior. The measure is adaptable to a variety of types of information (e.g., health outcomes, attractiveness feedback) and is internally consistent in several distinct populations (e.g., high school students, college students, U.S. adults, low-socioeconomic-status adults). Discussion centers on potential uses for the scale and an online supplement discusses a 2-item version of the scale. (PsycINFO Database Record (c) 2016 APA, all rights reserved)

Norm comparisons of the Spanish-language and English-language WAIS-III: Implications for clinical assessment and test adaptation.


This study provides a systematic comparison of the norms of 3 Spanish-language Wechsler Adult Intelligence Scales (WAIS–III) batteries from Mexico, Spain, and Puerto Rico, and the U.S. English-language WAIS–III battery. Specifically, we examined the performance of the 4 normative samples on 2 identical subtests (Digit Span and Digit Symbol-Coding) and 1 nearly identical subtest (Block Design). We found that across most age groups the means associated with the Spanish-language versions of the 3 subtests were lower than the means of the U.S. English-language version. In addition, we found that for most age ranges the Mexican subsamples scored lower than the Spanish subsamples. Lower educational levels of Mexicans and Spaniards compared to U.S. residents are consistent with the general pattern of findings. These results suggest that because of the different norms, applying any of the 3 Spanish-language versions of the WAIS–III generally risks underestimating deficits, and that applying the English-language WAIS–III norms risks overestimating deficits of Spanish-speaking adults. There were a few exceptions to these general patterns. For example, the Mexican subsample ages 70 years and above performed significantly better on the Digit Symbol and Block Design than did the U.S. and Spanish subsamples. Implications for the clinical assessment of U.S. Spanish-speaking Latinos and test adaptation are discussed with an eye toward improving the clinical care for this community. (PsycINFO Database Record (c) 2016 APA, all rights reserved)