Psychometric analysis of the Generalized Anxiety Disorder Scale and the Patient Health Questionnaire using Mokken scaling and confirmatory factor analysis

The Generalized Anxiety Disorder Scale (GAD-7) and the Patient Health Questionnaire Depression Scale (PHQ-9) are widely used in primary health care to assess anxiety and depression. While there is no doubt about their reliability and validity, there is some question over their factor structure. Many authors have suggested that both are one-factor scales. However, there is some evidence that each can also be seen as having two subscales, corresponding to a Cognitive/Affective aspect and a Somatic aspect. In this study the dimensional structure of both scales is examined using Mokken analysis, which is an item response theory approach, and confirmatory factor analysis. The relationship of the scales to each other and to the Work and Social Adjustment Scale (WSAS) is also investigated. There is evidence to support the idea of a Cognitive/Affective and a Somatic subscale in both measures. While both measures are positively correlated with the WSAS, the PHQ-9, and indeed the PHQ-2, have significantly stronger relationships with it than the GAD-7.
*Correspondence to: Steven Muncer, Clinical Psychology, Psychology Department, University of Teesside, Middlesbrough, TS1 3BA, UK, E-mail: S.Muncer@tees.ac.uk
Received: July 16, 2018; Accepted: July 24, 2018; Published: July 27, 2018
Boothroyd L (2018) Psychometric analysis of the Generalized Anxiety Disorder Scale and the Patient Health Questionnaire using Mokken scaling and confirmatory factor analysis. Health Prim Car, 2018, Volume 2(4). doi: 10.15761/HPC.1000145


Introduction
Generalized anxiety disorder and depression are among the most frequent disorders in primary care, with a prevalence rate of about 8-10% [1], and also show a high degree of comorbidity [2]. The seven-item Generalized Anxiety Disorder Scale (GAD-7) [3] is a widely used measure of anxiety which has been shown to have good psychometric properties on numerous occasions. Most recently Jordan, et al. [4] investigated the properties of the GAD-7 using item response theory as well as classical test theory methods. Similarly, the Patient Health Questionnaire Depression Scale (PHQ-9) [5,6] has also been shown to have good psychometric properties. Ryan, et al. [7], for example, have shown that the factor structure of the PHQ-9 was not affected by the method of data collection, whether face-to-face or telephone interview. They are both, therefore, widely used tests with a fair amount of evidence attesting to their reliability and validity.
There are, however, still some areas which are open to discussion or on which contradictory positions are held. For example, there is disagreement over the factor structure of the PHQ-9. Many authors have suggested that the scale is best seen as a one-factor scale. Ryan, et al. [7], for example, found that a one-factor model provided a good fit in their sample of 23,672 patients from the UK's Improving Access to Psychological Therapies (IAPT) programme, as long as some of the error terms were allowed to correlate. The PHQ-9, however, has not always been found to fit a single-factor model. Beard, et al. [8] studied 1,023 psychiatric participants who completed the PHQ-9 at admission and discharge from an outpatient programme. Confirmatory factor analysis (CFA) suggested a two-factor solution; the first factor represented cognitive and affective symptoms whilst the second factor reflected somatic symptoms. Furthermore, Elhai, et al.'s [9] study of 2,615 Army National Guard soldiers in Ohio, USA used CFA to evaluate three two-factor models previously established in the literature. A two-factor model (χ² = 210.35, p < 0.001, CFI = .96, TLI = .94, RMSEA = .05) fitted the data better than a single-factor model (χ² = 317.71, p < 0.001, CFI = .94, TLI = .91, RMSEA = .06). The preferred two-factor model reflected a somatic factor and a cognitive-affective factor of depressive symptoms. The cognitive-affective items loading on to factor 1 were items 1 (Anhedonia), 2 (Depressed mood), 6 (Feelings of worthlessness) and 9 (Suicidal ideation). Items 3 (Sleep difficulties), 4 (Fatigue), 5 (Appetite changes), 7 (Concentration difficulties) and 8 (Psychomotor agitation) loaded on to the somatic factor.
Whilst Spitzer, et al.'s [3] criterion-standard study has been supported by several studies in different populations, the GAD-7 has also been found to have a different factor structure to that discovered originally [10,11]. Within an acute psychiatric population (N = 232) in the US, Kertz, et al. [10] found that although the GAD-7 showed excellent internal consistency (Cronbach's ⍺ = 0.91), confirmatory factor analysis failed to support a unidimensional factor structure. The sample included patients with a diagnosis of social anxiety disorder (n = 42), panic disorder (n = 27), obsessive-compulsive disorder (OCD, n = 25) and PTSD (n = 19). Kertz, et al. [10] found that items 5 ('Being so restless that it is hard to sit still') and 6 ('Becoming easily annoyed or irritable') loaded only moderately (0.52 and 0.53 respectively) on to the latent factor in comparison to all other items (0.64-0.81). A unidimensional factor structure was only found to be a good fit if items 4 ('Trouble relaxing'), 5 and 6 were allowed to co-vary. Whilst the sample for each anxiety disorder was relatively small, it is suggested that the GAD-7 may perform differently in anxiety disorders other than GAD.
A larger-scale study of patients receiving brief intensive CBT at a partial hospital program (N = 1,082) in the US by Beard and Bjorgvinsson [11] found the GAD-7 to have psychometric properties like those found in Kertz, et al.'s [10] study. Of the 1,082 patients, 108 (11.7%) had a primary diagnosis of panic disorder, 96 (10.4%) had a primary diagnosis of PTSD and 89 (9.8%) had a primary diagnosis of OCD. The GAD-7 demonstrated good internal consistency across the total sample (Cronbach's ⍺ = 0.88). A rotated 2-factor structure was found to account for 70% of the variance. Within the 2-factor structure, the first factor included items 1 (Feeling nervous, anxious or on edge), 2 (Not being able to stop or control worrying), 3 (Worrying too much about different things) and 7 (Feeling afraid as if something awful might happen). The second factor included the remaining items 4, 5 and 6. This 2-factor structure supports the findings of Kertz, et al. [10]. It could suggest that the GAD-7 measures a separate cognitive-affective factor and a somatic or behavioural factor, a distinction which has also been highlighted in studies of the PHQ-9.
In summary, the GAD-7 was found to have a single-factor structure in predominantly primary care samples. This was not supported in a psychiatric sample or in samples that included a range of anxiety disorders. In those samples a two-factor structure was found which appeared to separate GAD-7 items reflecting the cognitive and emotional experiences of anxiety (items 1, 2, 3 and 7) from items that reflected more physical, behavioural manifestations of anxiety (items 4, 5 and 6).
In 2009 Kroenke, et al. [6] provided evidence for an ultra-brief screening scale called the PHQ-4, which was based on a combination of the PHQ-2 ('Feeling down, depressed or hopeless' and 'Little interest or pleasure in doing things') and the GAD-2 ('Feeling nervous, anxious or on edge' and 'Not being able to stop or control worrying'), taking the first two items from each of the scales, which have already been shown to be good for screening. Interestingly, the analysis of the GAD-7 by Jordan, et al. [4] also found that the first item pair was better than almost all the others, with the possible exception of the second and third item pair, but so far there has been no item response theory analysis of the PHQ-9.
Another area of interest with these scales is the possible overlap of symptoms and items. There is clearly often comorbidity between depression and anxiety, with the two co-occurring as much as 50% of the time [12]. The strong correlation between the two scales is evidence of convergent validity, in that it reflects the comorbidity of the disorders, but it might also indicate redundancy among items. In general, psychometric analysis has treated the two scales as separate and conducted two sets of analyses on the items. It would be useful to analyze the items as if they were one scale to see whether there is redundancy, and also to investigate the number of factors needed to explain all 16 items.
In the current study an item response theory approach, Mokken analysis, will be adopted to investigate the items and the separate scales, in a similar way to that in which Jordan, et al. [4] investigated the GAD-7. Mokken scaling is a non-parametric method of item response theory which can be used to investigate the dimensional structure of scales. Mokken scaling is similar to Rasch scaling techniques but has the advantage of having fewer restrictions in its use [13]. Although based on Guttman scaling, Mokken scaling does not assume error-free data. Nor does it include assumptions about the sigmoid shape of item characteristic curves, which can result in the rejection of many items and so decrease the reliability of the resultant measure. Confirmatory factor analysis will also be used to investigate the factor structure of the items, both as separate scales and together. Lastly, we will look at the relationship of the various scales and subscales to a simple measure of impairment in functioning, the Work and Social Adjustment Scale [14].

Method
Participants
Questionnaire data from 7,763 patients (38% male; 62% female) registered with an IAPT service in the North of England were examined. The data were collected between February 2009 and August 2015.

Data Analysis
Cronbach's alpha and the Molenaar-Sijtsma (MS) statistic were calculated as measures of reliability. Confirmatory factor analysis (CFA) was carried out using the lavaan package in R [15]. Diagonally weighted least squares estimation with correction to means and variances was used as it is considered to be the best estimator for categorical data. The Comparative Fit Index (CFI) and the Root Mean Square Error of Approximation (RMSEA) were used as measures of fit. Mokken analysis was used to further understand the structure of the scales using the mokken package in R [16]. Loevinger's coefficient (H) is the most important calculation in Mokken scale analysis. The basis of Loevinger's coefficient is the extent to which pairs of items conform to Guttman criteria. Scores on pairs of items should be consistently ordered relative to one another. That is, an item that is more or less likely to be endorsed than another should be consistently so across participants. The 'difficulty' of an item refers to how easily an item of a scale is agreed with by respondents; more difficult items have lower mean scores. If the easier-to-endorse item is endorsed less than the more difficult item, then this is a Guttman error. In this case, for a PHQ-9 item, a higher depression level should lead to a higher score on the item. Loevinger
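As an illustration of the coefficient described above, the following Python sketch computes a scale-level H in its covariance-ratio form: the sum of the observed inter-item covariances divided by the sum of the largest covariances that the item marginals would allow. The ratio equals 1 when there are no Guttman errors. This is a hedged illustration only and is not the implementation in the R mokken package used in the study; the function names and the toy responses are hypothetical.

```python
from itertools import combinations


def covariance(x, y):
    """Population covariance of two equally long lists of item scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def max_covariance(x, y):
    """Largest covariance attainable with the same marginal distributions.

    Sorting both score lists pairs the highest scores with each other,
    which is the arrangement that contains no Guttman errors.
    """
    return covariance(sorted(x), sorted(y))


def loevinger_h(items):
    """Scale H: summed observed covariances over summed maximum covariances.

    `items` is a list of columns, one list of scores per item.
    """
    observed = 0.0
    maximum = 0.0
    for x, y in combinations(items, 2):
        observed += covariance(x, y)
        maximum += max_covariance(x, y)
    if maximum == 0:
        raise ValueError("H is undefined when items have no variance")
    return observed / maximum


# Toy dichotomous responses from four hypothetical respondents.
easy_item = [1, 1, 1, 0]
hard_item = [1, 0, 0, 0]
# No respondent endorses the harder item without the easier one.
print(loevinger_h([easy_item, hard_item]))  # 1.0
```

The same function accepts polytomous scores such as the 0 to 3 responses of the GAD-7 and PHQ-9, because the sorted arrangement gives the maximum covariance for any set of marginals.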

Mokken Analysis
Cronbach's alpha for the sixteen items as a scale was 0.91 and the MS statistic was also 0.91, suggesting the scale has good reliability. The individual item H values are presented in Table 1. The items appear in the table in the order in which they entered the scale. As can be seen, six of the GAD-7 items were added to the scale first, with items 2 and 3 ("Not being able to stop worrying" and "Worrying too much") coming out first. The nine items from the PHQ-9 are added to the scale next, with each item having an H value above 0.36, and therefore above the criterion of 0.3. As the items are added to the scale, the H value of the scale decreases from 0.79 to 0.43 when all items are included. It is interesting to note that the "Become easily annoyed" item from the GAD-7 is the last item to enter. A scale which consisted of all sixteen items would have an H value of 0.43, which suggests that this is only a moderate scale (Table 1).
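For readers less familiar with the reliability figure reported above, Cronbach's alpha can be sketched in a few lines of Python. This is the textbook formula, offered as a hedged illustration rather than the code used in the study; the function names and the toy scores are hypothetical.

```python
def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(items)
    # Total score of each respondent across all items.
    totals = [sum(person) for person in zip(*items)]
    item_variance_sum = sum(sample_variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / sample_variance(totals))


# Hypothetical 0-3 responses of five respondents to three items.
toy_items = [
    [0, 1, 2, 3, 3],
    [0, 1, 1, 3, 2],
    [1, 1, 2, 2, 3],
]
print(round(cronbach_alpha(toy_items), 2))
```

Alpha rises with the number of items as well as with their intercorrelation, which is one reason a sixteen-item scale can show high alpha alongside a modest H.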
The scales were then considered separately. For the GAD-7 the order of entry was the same, with "Become easily annoyed" as the last item entered. The pattern of results (rs = 0.87) is similar to that found by Jordan, et al. [4] and supports their contention that using item 2 and item 3 as an alternative to items 1 and 2 in a two-item version of the GAD-7 may be possible. On this occasion the H value of the scale is 0.52, suggesting a strong scale. The Cronbach's alpha of 0.86 and MS of 0.87 are also acceptable values.
Not surprisingly, the order of item entry into the scale is different for the PHQ-9 when analyzed separately. Here "Feeling down, depressed or hopeless" and "Little interest or pleasure in doing things" are the first two items, and these are the two items that Kroenke, et al. [6] suggest for the PHQ-4. The scale overall has a lower H value of 0.47, which puts it into the moderate category. Overall, the results from the Mokken analysis at this stage suggest that it is best to see the GAD-7 and PHQ-9 as two scales measuring related but different constructs.
Mokken analysis was also conducted on the four possible subscales that have been suggested for the PHQ-9 and GAD-7. In both cases these can be seen as somatic/behavioural and cognitive/affective. The

Confirmatory Factor Analysis
The results of the confirmatory factor analysis are presented in Table 2. Not surprisingly, given previous results and the Mokken analysis, a one-factor solution for the combined 16 items from the GAD-7 and PHQ-9 is not a good fit. Both the GAD-7 and PHQ-9 show reasonable fit to a one-factor model when considered separately, but both are significantly better represented by a two-factor model with items identified as Cognitive/Affective and Somatic (Δχ² = 833, p < .001; Δχ² = 660, p < .001). A model with four factors representing each of the subscales is a good fit to the data, with an RMSEA of 0.06 and a CFI of 0.99, which is again significantly better than the one-factor model (Δχ²(6) = 9240, p < .001). Overall, the CFA supports the results of the Mokken analysis in suggesting that while a one-factor solution for each of the scales is reasonable, two factors provide better fit and scale statistics (Table 2).
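The two fit indices used here can be written as simple functions of the model and baseline chi-square statistics. The Python sketch below gives the textbook point-estimate formulas as a hedged illustration. It is not lavaan's code, and with diagonally weighted least squares the software reports scaled versions of these quantities, so the values in Table 2 would not necessarily be reproduced exactly. All function names and input numbers are hypothetical.

```python
import math


def rmsea(chi_square, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1))).

    Some programs divide by n rather than n - 1; with thousands of
    respondents the difference is negligible.
    """
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def cfi(chi_model, df_model, chi_baseline, df_baseline):
    """Comparative Fit Index relative to the baseline (independence) model."""
    misfit_model = max(chi_model - df_model, 0.0)
    misfit_baseline = max(chi_baseline - df_baseline, misfit_model)
    if misfit_baseline == 0:
        return 1.0  # neither model shows misfit beyond its degrees of freedom
    return 1.0 - misfit_model / misfit_baseline


# Hypothetical statistics, not taken from the study's tables.
print(round(rmsea(chi_square=100.0, df=50, n=501), 3))
print(round(cfi(chi_model=100.0, df_model=50,
                chi_baseline=1050.0, df_baseline=66), 3))
```

Because RMSEA divides the excess chi-square by the sample size, a very large sample can yield a small RMSEA even when the chi-square difference between competing models is large and highly significant.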

Relationship with The Work and Social Adjustment Scale (WSAS)
The WSAS has a Cronbach's alpha and an MS of 0.79. The correlations between the various scales and subscales of the GAD-7 and PHQ-9 are presented in Table 3. All of the scales and subscales of the PHQ-9 and GAD-7 correlate significantly with the WSAS. The PHQ-9 is a significantly better predictor of WSAS total score than the GAD-7 (t(7760) = 15.36, p < .001). Interestingly, the PHQ-4 (t(7760) = 11.96, p < .001) and the PHQ-2, which consists of the first two items of the PHQ-9 (r = 0.50; t(7760) = 6.54, p < 0.01), are also significantly more strongly correlated with WSAS scores than the GAD-7 (Table 3).
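The degrees of freedom reported above (7760, which is N - 3 for N = 7,763) are consistent with a t test for the difference between two dependent correlations that share one variable, here the WSAS. The text does not name the test used, so the following Python sketch of Williams's t is an assumption offered for illustration, and the correlation values in the example are hypothetical.

```python
import math


def williams_t(r_jk, r_jh, r_kh, n):
    """Williams's t for two dependent correlations sharing variable j.

    r_jk: correlation of the shared variable with the first measure
    r_jh: correlation of the shared variable with the second measure
    r_kh: correlation between the two measures being compared
    Returns (t, degrees of freedom), where df = n - 3.
    """
    # Determinant of the 3 x 3 correlation matrix.
    det = 1 - r_jk ** 2 - r_jh ** 2 - r_kh ** 2 + 2 * r_jk * r_jh * r_kh
    r_bar = (r_jk + r_jh) / 2
    numerator = (n - 1) * (1 + r_kh)
    denominator = 2 * ((n - 1) / (n - 3)) * det + r_bar ** 2 * (1 - r_kh) ** 3
    t = (r_jk - r_jh) * math.sqrt(numerator / denominator)
    return t, n - 3


# Hypothetical correlations: WSAS with measure A, WSAS with measure B,
# and A with B, for the sample size reported in the study.
t_value, df = williams_t(r_jk=0.55, r_jh=0.45, r_kh=0.70, n=7763)
print(round(t_value, 2), df)
```

With a sample of this size even small differences between correlations produce large t values, so the size of the difference matters as much as its significance.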

Discussion
The pattern of results suggests that although the items from the GAD-7 and PHQ-9 can be considered as one scale, from a Mokken point of view this would be only a moderate scale (H = 0.43), and one with poor fit from a more traditional classical test theory standpoint. There is much stronger evidence from both the Mokken and confirmatory factor analyses that they should be considered as two separate scales. It should be noted, however, that the PHQ-9 is not as strong as the GAD-7 from a Mokken standpoint. There is also good evidence that within the two scales it is possible to find two subscales, and three of these four subscales would be regarded as strong scales. The results from our Mokken analysis of the GAD-7 are similar to those of Jordan, et al. [4], except that we carried the analysis one stage further by examining the possibility of subscales. It should be noted that the subscales do not appear when straightforward item selection procedures are used.
All of the subscales are significantly positively correlated with the WSAS. The strength of the correlation differs significantly between the measures, with the PHQ-9 being more strongly correlated with the WSAS than the GAD-7. Perhaps even more importantly, both of the PHQ-9 subscales, and indeed the PHQ-2, have a significantly higher correlation with the WSAS than the GAD-7.
In sum, the results suggest that both the PHQ-9 and the GAD-7 can be used reliably when considered as two separate scales, but there may also be some use in recognizing the possible cognitive/affective and somatic subscales of each. The subscale information may prove useful for clinical purposes. For example, Elhai, et al. [17] found that the somatic items of a depression measure were significantly more related than the cognitive-affective items to Post Traumatic Stress Disorder (PTSD) factors in Canadian military veterans.