Mapping of the FKSI specific kidney disease measure onto three generic preference-based measures to generate utility values

Chronic kidney disease (CKD) is a progressive and irreversible loss of renal function with different causes (diabetic nephropathy, hypertension, glomerulonephritis, hereditary renal failure, pharmacological poisoning, etc.). In CKD, the kidneys gradually lose their ability to eliminate waste, concentrate urine and preserve the electrolyte balance of the blood, progressing towards total loss of kidney function. At advanced stages, the usual treatments are kidney transplantation, hemodialysis and peritoneal dialysis, all of which have a notable impact on patients' daily life and quality of life.

complementary information for assessing clinical deterioration. Nowadays, HRQOL is accepted as a clinical goal in itself in patients with limited life expectancy, or in therapies that seek to help patients cope with the disease or adapt to its symptoms, which are typical aspects of chronic diseases. PROMs have proven to be very sensitive to variations in health status within a particular pathology, and this sensitivity is one of the reasons why they are usually included in clinical studies [5][6][7].
Researchers commonly prefer pathology-specific questionnaires for patient follow-up, owing to their greater sensitivity to health changes and their better targeting of the pathology under study. However, when the aim is to compare results with those of other pathologies or to perform economic evaluations, generic (non-disease-specific) HRQOL instruments are preferred. This is not without limitations: generic instruments may capture information on patient characteristics (such as age, comorbidities or unwanted treatment effects) that is not relevant to the disease under study, and they may be insensitive to mild health conditions.
The most popular generic instruments (such as the SF-6D, EQ-5D and HUI-3) make it possible to calculate the utility value associated with each health state (according to the profile given by the attributes measured by the instrument). This value reflects the population's preference for each state of health in a choice situation under uncertainty, which allows utilities to be used in the calculation of quality-adjusted life years (QALYs) and, in general, in any health-economic study.
In real-life research, it is usually the case that a disease-specific PROM is preferred to a generic one, and the latter is not administered as well, so as not to overload the patient with self-reported measures. In such cases, the usual strategy is to perform a metric translation (mapping) from the specific measure onto the generic instrument. Mapping is also of interest when we want to compare our results with those obtained using a different generic instrument, or when no generic instrument was administered at all (as in studies of retrospective databases or in meta-analyses).

Objective
The objective of this study is to obtain the mapping algorithms needed to translate the specific HRQoL measurement obtained with the FKSI, a CKD-specific health index, into three of the most popular preference-based generic instruments (SF6D, EQ-5D-3L and HUI-3). We compare two procedures, one based on regression methods and another that obtains profiles by means of cluster analysis.
As a secondary benefit, we can assess which of the generic instruments is most adequate for capturing the deterioration in HRQoL due to CKD.

Study design
The present study was designed as a prospective, cross-sectional observational study. The sample was designed to be large enough to carry out the proposed multivariate analyses, and a minimum target sample size of 150 patients with complete responses was set. Patients were included at random as they attended for treatment at the participating centers. They were recruited by collaborating therapists from the ALCER association, without restrictions regarding geographical origin, and were included as they gave their informed consent. The study protocol was approved by the Research Ethics Committee of the Universidad Autónoma de Madrid (Spain), and the guidelines of the Declaration of Helsinki were followed.

Participants
The following inclusion criteria were applied: either sex, age over 18 years, undergoing treatment for chronic kidney disease, no cognitive impairment, ability to answer the questionnaires unaided, and having given informed consent.
The final sample consisted of 161 patients, 41.6% of whom were women. Mean age was 54.6 years (SD = 15.5) and the average time since diagnosis was 2.82 years (SD = 1.61). A total of 18.6% were obese, 37.3% had a congenital pathology (14.9% polycystic kidney disease, 16.1% glomerulonephritis, 2.5% pyelonephritis) and 44.7% an acquired pathology (28.1% due to diabetes, hypertension or cardiovascular accident, 2.5% due to food or drug intoxication and 1.2% due to trauma). The most common treatments were conservative treatment (13.7%), dialysis (62.7%) and transplantation (23.1%). Of these patients, 34.8% were above the cut-off score for clinical anxiety, while 23.6% could be classified as having a clinical level of depression. The average anger expression score was 32.6 (SD = 9.45). Notably, 88.2% were also being treated for at least one other comorbidity (Table 1).

Instruments
An ad hoc data collection form was designed, including the four questionnaires to be administered: three measuring generic HRQoL (EQ5D-3L, SF6D and HUI3) and one specific instrument measuring the severity of CKD symptoms (FKSI-9). In addition, the Hospital Anxiety and Depression Scale (HADS) and the State-Trait Anger Expression Inventory (STAXI) were administered. Three versions of the data collection form were created, each presenting a different HRQoL questionnaire first, in order to control for possible carryover effects among the quality of life measurements.

Questionnaire EQ5D-3L
EuroQol-5D-3L (EQ5D-3L) [9,10] is a generic HRQOL instrument based on population preferences. It assesses the level of deterioration in 5 attributes (mobility, self-care, daily activities, pain/discomfort and anxiety/depression), using items with 3 response levels (1 = none, 2 = some problems, 3 = many problems). Each combination of levels defines a health profile, giving a total of 243 possible health states, although not all are equally likely. The profile [11111] corresponds to perfect health and the profile [33333] represents the worst possible state of health ("pits"). Based on the ranking of health profiles according to social preferences, each health state is translated into a social utility value, which may be computed from the 5 attribute levels using a multiattribute utility function (MAUF). Different MAUFs are used in different countries, mainly based on estimates from standard gamble, time trade-off and visual analog scale (VAS) procedures. The basic MAUF equation is additive:

$$u_i = 1 - \left( q\,C_i + \sum_{j=1}^{5} \sum_{k=2}^{3} b_{jk} D_{ijk} + b_{N3} N3_i \right)$$

where the utility/preference value u_i for health status i is obtained by subtracting the disutility of that status from 1. The disutility is obtained by weighting with b_jk the level of deterioration k reached in dimension j (dummy variable D_ijk), plus an interaction term (N3_i, weighted by b_N3), which adds a constant when any of the dimensions reaches its maximum level of deterioration, plus a constant (q), which applies to any state other than perfect health (C_i = 1 if profile i differs from [11111], 0 otherwise). It should be noted that the first level on any dimension (k = 1) represents no deterioration in that dimension, D_ijk = 0, and the perfect health profile is anchored at a utility value of 1.

SF6D questionnaire
The Medical Outcomes Study (MOS) Short Form, in its six-dimension utility version (SF6D) [11], is a generic preference-based HRQoL instrument derived from the MOS SF-36 (36 items). It summarizes the level of deterioration in 6 dimensions (physical functioning, role limitations, social functioning, pain, mental health and vitality), coding 11 items into 4 to 6 levels. A total of 18,000 health profiles can be obtained, with profile [111111] corresponding to perfect health and [645655] representing the worst possible state of health. Different MAUFs have been estimated to derive utilities in different countries, with the particularity that no severity (interaction) constant is used. A value of 0 is assigned to the first level of each dimension/attribute [12,13].

HUI3 questionnaire
The Health Utilities Index Mark 3 (HUI-3) [14] covers several aspects of health, intentionally restricted to physical and emotional capacities, and excludes role performance and social interaction. It covers eight attributes (vision, hearing, speech, ambulation, dexterity, emotion, cognition and pain), each with five or six response levels. Each combination of levels defines a unique health status. The MAUF of this instrument is multiplicative, and different functions have been estimated for different countries; the state of perfect health [11111111] has a utility of 1, while the utility of the lowest level in all eight attributes [66566565] is -0.36, a health state considered worse than being dead. The MAUF for deriving the utility of a profile in the Spanish population combines multiplicatively the coefficients b1-b8, expressed in the pits-full health metric and calculated in the Spanish population [15], which correspond to the response level attained in each of the dimensions.

FKSI-DRS questionnaire
The Functional Assessment of Cancer Therapy-Kidney Symptom Index-Disease Related Symptoms (FKSI-DRS-9) [16][17][18] is a self-reported questionnaire composed of the first 9 items of the FKSI-15 instrument, answered on a 0 to 4 point Likert scale that assesses the level of limitation due to kidney disease symptoms. This instrument was used as the disease-specific measure of deterioration. Two dimensions may be distinguished: physical and psychological. The FKSI-15 items used to assess perceived individual change were discarded. Scores range from 0 to 36 points, with higher scores reflecting greater deterioration [19].

Statistical analyses
The criterion score for the specific kidney disease health status was obtained by computing the factor score of the FKSI items, assuming one overall dimension (principal components extraction, regression-method factor scores), which produces a summary score with mean 0 and standard deviation proportional to the eigenvalue of the dimension.
To make interpretation easier, the score was rescaled to a 0-1 metric, which is possible because the attainable minimum and maximum scores are known. The resulting score was taken as the indicator of the reported severity of kidney-specific symptoms.
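A minimal sketch of how such a severity indicator can be computed, assuming a single principal component extracted from the item correlation matrix, regression-method score weights (R⁻¹ times the loadings), and items scored 0 to 4 with higher values meaning worse symptoms. The function name fksi_severity and the synthetic data are hypothetical, and SPSS may differ in details such as standardization, so this is an illustration rather than a reproduction of the authors' computation.

```python
import numpy as np


def fksi_severity(items, lo=0.0, hi=4.0):
    """Summarise item responses (n_patients x n_items, scored lo..hi, higher = worse)
    into one 0-1 severity score using a one-component PCA with regression-method scores."""
    items = np.asarray(items, dtype=float)
    mean = items.mean(axis=0)
    sd = items.std(axis=0, ddof=1)
    corr = np.corrcoef(items, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigenvalues in ascending order
    v, lam = eigvecs[:, -1], eigvals[-1]         # first (largest) component
    if v.sum() < 0:                              # eigenvector sign is arbitrary
        v = -v
    loadings = v * np.sqrt(lam)
    weights = np.linalg.solve(corr, loadings)    # regression method: R^-1 * loadings

    def score(x):
        return ((x - mean) / sd) @ weights

    raw = score(items)
    # Attainable extremes: every item at its best (lo) or worst (hi) response.
    # Assumes all loadings are positive, so these give the minimum and maximum score.
    s_min = score(np.full(items.shape[1], lo))
    s_max = score(np.full(items.shape[1], hi))
    return (raw - s_min) / (s_max - s_min)
```

Because the extremes are scored with the same weights as the patients, the rescaled value is 0 when every item is at its best response and 1 when every item is at its worst.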
Once the specific severity indicator was obtained, its values were mapped (metric translation) onto each of the three generic HRQoL measures used in this study, separately. In this way, predicted utility values were obtained for each generic instrument given a level of kidney symptom severity. Several linear and non-linear regression models were tested and compared using various goodness-of-fit statistics.
In all regression models, disutility values (d_i = 1 - u_i) were used instead of utility values, for the following reasons. First, the data are usually concentrated around the most favorable health states, which have the smallest disutilities; working with disutilities places the points of greatest mass close to the origin of the coordinate axes, measures the independent and dependent variables (severity and disutility) in the same direction, and ensures that the slope of the model is always positive. Second, a model can always be estimated without an intercept term, anchoring the disutility value of 0 (perfect health) at the origin so that it coincides with the minimum FKSI severity value (which is also 0). The predicted utility is then obtained simply by subtracting the predicted disutility from 1.
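The anchoring argument can be sketched as ordinary least squares on disutility with no intercept column, so that the fitted curve passes through (severity 0, disutility 0) and the predicted utility at zero severity is exactly 1. This assumes NumPy and a polynomial in the 0-1 severity score; the function names are hypothetical and the coefficients returned are not those reported in Table 4.

```python
import numpy as np


def fit_disutility_poly(severity, utility, degree=3):
    """Least-squares fit of d = b1*s + ... + b_degree*s**degree with no intercept,
    so that zero severity maps to zero disutility (utility = 1)."""
    s = np.asarray(severity, dtype=float)
    d = 1.0 - np.asarray(utility, dtype=float)         # work on disutility d = 1 - u
    X = np.column_stack([s ** p for p in range(1, degree + 1)])
    coef, *_ = np.linalg.lstsq(X, d, rcond=None)
    return coef


def predict_utility(coef, severity):
    s = np.asarray(severity, dtype=float)
    d_hat = sum(b * s ** (p + 1) for p, b in enumerate(coef))
    return 1.0 - d_hat                                   # back-transform to utility
```

Setting degree to 1, 2 or 3 gives the linear, quadratic and cubic variants compared in the study.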
The following regression models were estimated: linear, quadratic and cubic models, based on the density function values, and a Tobit model, based on cumulative values of the distribution function. To anchor the best possible health states in both instruments, symptom severity scores were scaled to the 0-1 range.
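The text does not spell out the Tobit specification, so the following is only a sketch under an assumed setup: disutility treated as left-censored at 0 (patients reporting perfect health), a normally distributed latent disutility, and maximum likelihood estimation with SciPy. tobit_fit is a hypothetical name, and the design matrix X (for example an intercept plus the severity score, or its powers) is left to the caller.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def tobit_fit(X, d, lower=0.0):
    """Maximum-likelihood Tobit regression of disutility d on X, left-censored at `lower`.
    Returns (coefficients, sigma)."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    censored = d <= lower

    def neg_loglik(params):
        beta, sigma = params[:-1], np.exp(params[-1])   # optimise log(sigma) so sigma > 0
        mu = X @ beta
        # Uncensored observations contribute the normal density...
        ll_obs = norm.logpdf(d[~censored], loc=mu[~censored], scale=sigma)
        # ...censored ones contribute P(latent disutility <= lower).
        ll_cens = norm.logcdf((lower - mu[censored]) / sigma)
        return -(ll_obs.sum() + ll_cens.sum())

    start_beta, *_ = np.linalg.lstsq(X, d, rcond=None)  # OLS starting values
    start = np.append(start_beta, np.log(d.std() + 1e-6))
    res = minimize(neg_loglik, start, method="BFGS")
    return res.x[:-1], float(np.exp(res.x[-1]))
```

Unlike the polynomial models, the Tobit model treats the pile-up of patients at zero disutility as censoring rather than as exact observations of the latent variable.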
Before estimating the different prediction models, patients with evident outlier values in two or more of the generic instruments were discarded, since their scores could reflect peculiarities not typical of the pathology under study. Outliers were identified as values clearly falling outside the 95% individual prediction interval of the linear model (standardized residuals greater than 3 in absolute value; Figure 1).
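A sketch of the outlier rule, assuming a simple linear fit with intercept for each instrument and standardized residuals computed as the residual divided by the root mean squared error. The function names are hypothetical; the threshold of 3 and the two-instrument rule come from the text.

```python
import numpy as np


def outlier_flags(severity, disutilities, z=3.0):
    """Flag |standardized residual| > z from a linear fit of each instrument's disutility
    on severity. `disutilities` is n_patients x n_instruments; returns a boolean matrix."""
    s = np.asarray(severity, dtype=float)
    D = np.asarray(disutilities, dtype=float)
    X = np.column_stack([np.ones_like(s), s])
    coef, *_ = np.linalg.lstsq(X, D, rcond=None)       # one linear fit per column
    resid = D - X @ coef
    dof = len(s) - X.shape[1]
    rmse = np.sqrt((resid ** 2).sum(axis=0) / dof)     # residual SD per instrument
    return np.abs(resid / rmse) > z


def keep_mask(flags, min_instruments=2):
    """Keep patients flagged as outliers on fewer than `min_instruments` instruments."""
    return flags.sum(axis=1) < min_instruments
```

A patient who is extreme on a single instrument is retained, which matches the "two or more" criterion stated above.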
Along with the statistical significance of the regression coefficient estimates, the goodness of fit (GOF) of each model was assessed using the R² statistic, the mean absolute error (MAE) and the mean absolute percentage error (MAPE). MAE and MAPE were computed overall and by quintile groups of the severity scores, in order to assess local GOF at the different levels of severity. MAPE in particular should be interpreted with caution, since very small observed values can inflate it substantially when dividing by quantities close to 0.
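The goodness-of-fit indices can be computed as below. This is a sketch assuming NumPy, quintile cut-points taken from the empirical quantiles of severity, and percentage error relative to the observed value; it works equally on utilities or disutilities, whichever scale is being evaluated. With heavily tied severity scores (the FKSI is discrete), quintile groups can be unequal or even empty, in which case the indices for that group are undefined (NaN).

```python
import numpy as np


def mae_mape(observed, predicted):
    obs = np.asarray(observed, dtype=float)
    err = np.abs(obs - np.asarray(predicted, dtype=float))
    mae = err.mean()
    # Relative error: explodes when observed values are close to 0.
    mape = 100.0 * (err / np.abs(obs)).mean()
    return float(mae), float(mape)


def gof_by_quintile(severity, observed, predicted):
    sev = np.asarray(severity, dtype=float)
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    edges = np.quantile(sev, [0.2, 0.4, 0.6, 0.8])
    group = np.searchsorted(edges, sev, side="right")  # 0 = mildest ... 4 = most severe
    return {q + 1: mae_mape(obs[group == q], pred[group == q]) for q in range(5)}
```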
Covariates (age, time since diagnosis, number of treatments, comorbidities, depression level, etc.) were not included in the regression models, so that only the direct effect of the disease was considered. In addition, including covariates would limit the use of the models in retrospective studies in which those covariates may not have been collected.
As an additional procedure, a latent profile analysis (LPA) was carried out to explore how the health states summarized by the three generic instruments rank patients. Patients could be sorted differently by each generic instrument, or the utility measures might show different sensitivity at different levels of severity. The disutilities of the generic instruments (HUI-3, SF-6D and EQ-5D) and the severity score of the specific instrument (FKSI) were included as active variables in the LPA. Sociodemographic variables and disease descriptors were included as inactive covariates in order to describe the profiles obtained.
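Latent GOLD is proprietary, so the sketch below substitutes a Gaussian mixture with diagonal covariances (scikit-learn's GaussianMixture), which corresponds to the local-independence assumption of a basic latent profile model with continuous indicators. It is not the authors' software or exact model (inactive covariates, for instance, are ignored), and latent_profiles is a hypothetical helper. The classification error is estimated as the mean posterior probability of not belonging to the modal profile, one common definition.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def latent_profiles(X, n_profiles=4, seed=0):
    """X: n_patients x n_variables (e.g. HUI-3, SF-6D, EQ-5D disutilities and FKSI severity).
    Returns centroids ordered from least to most impaired, patient labels and class error."""
    gm = GaussianMixture(n_components=n_profiles, covariance_type="diag",
                         n_init=10, random_state=seed).fit(X)
    post = gm.predict_proba(X)
    # Expected classification error: mean probability of NOT being in the modal profile.
    class_error = float(1.0 - post.max(axis=1).mean())
    # Relabel profiles by the mean of their centroid coordinates (least impaired first).
    order = np.argsort(gm.means_.mean(axis=1))
    rank = np.empty_like(order)
    rank[order] = np.arange(n_profiles)
    labels = rank[post.argmax(axis=1)]
    return gm.means_[order], labels, class_error
```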
All analyses were carried out using IBM SPSS v23 and Latent GOLD v5.0.

Results
Observed direct scores on the FKSI renal symptom severity scale ranged between 0 and 29 points, with a mean M=7. Regarding the sensitivity of the instruments, the EQ-5D was the least sensitive, yielding only 36 of the possible profiles and concentrating 55.9% of the patients in 4 of them (11111, 11112, 11121, 11122), whereas 89 profiles were obtained with the HUI-3 and 146 with the SF-6D (Table 2). Table 3 shows the percentage of patients accumulated at the different response levels of the attributes of each instrument. Patients tend to be located at the less severe health levels, although some patients are found at the higher severity levels of most attributes.
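Counting distinct response profiles and the share of patients in the most frequent ones is a simple way to quantify the concentration described above. A minimal sketch in plain Python, assuming each patient's profile is stored as a string of attribute levels such as "11121"; the helper name is hypothetical.

```python
from collections import Counter


def profile_summary(profiles, top_k=4):
    """profiles: one response string per patient, e.g. '11121' for the EQ-5D-3L.
    Returns (number of distinct profiles, top_k most common profiles, their share)."""
    counts = Counter(profiles)
    top = counts.most_common(top_k)
    share = sum(n for _, n in top) / len(profiles)
    return len(counts), top, share
```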
The cubic model provided the best fit for all three mapping functions, although the differences in fit between models of different shape (linear, quadratic and cubic) were minimal. The cubic form was chosen because it better represents the expected evolution of the utilities, starting at a floor value corresponding to the perfect health state (disutility = 0) and growing towards an asymptotic value at the ceiling of the scale (disutility = 1) (Figure 3). Table 4 shows the coefficients needed to estimate the disutilities for the three instruments. Predicted utilities are obtained by subtracting the predicted disutility from 1.
All models attained a moderate fit, with the SF-6D reaching the best fit (R² = 0.619) and the EQ-5D (R² = 0.548) and HUI-3 (R² = 0.565) slightly lower. However, the relative error of the SF-6D model was much higher (MAPE = 56.9%) than the 20% obtained by the other two models. As expected, residuals stratified by quintiles were especially large in the quintile corresponding to high utility values, that is, in the less serious health conditions.
Although determining the number of clusters is not crucial for this validation test, the LPA identified 4 clusters, with the centroids shown in Table 5. The solution reached a good fit (R² = 0.87) with a classification error rate of 7%. The cluster profiles (Figure 4) show that the means are arranged in parallel (without crossings), implying that the clusters group patients with progressive levels of deterioration both in the disease (FKSI) and in the three generic HRQOL instruments. In the absence of crossings, we can infer that no aspects of health other than those corresponding to CKD itself substantially influence the measurement of HRQoL. Admittedly, if the number of clusters were increased sufficiently, the profiles would eventually show crossings. Inspection of the profiles also shows that the SF-6D tends to assign slightly higher disutility values, while the EQ-5D usually assigns lower ones. The progression of disutility across deterioration strata is rather similar for all three instruments.
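Whether cluster profiles are parallel can be checked mechanically: two profile lines cross only if the corresponding clusters swap their order between variables. The sketch below assumes a centroid matrix with clusters in rows and variables in columns (as in Table 5) and compares the rank order of the clusters column by column; has_crossings is a hypothetical helper, and ties are broken by row order, which a real analysis might want to handle explicitly.

```python
import numpy as np


def has_crossings(centroids):
    """centroids: n_clusters x n_variables. True if any two clusters change order
    between variables, i.e. the profile plot would show crossing lines."""
    c = np.asarray(centroids, dtype=float)
    # Rank of each cluster within each column (double argsort gives ranks).
    ranks = np.argsort(np.argsort(c, axis=0, kind="stable"), axis=0, kind="stable")
    return bool((ranks != ranks[:, [0]]).any())
```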

Discussion
Disease-specific HRQoL instruments are the preferred choice for measuring health, given their high sensitivity to changes in the patient's health state (treatment effectiveness, disease progression, coping with symptoms, etc.). Using generic instruments instead therefore implies losing sensitivity, and it also involves other measurement problems, since it is difficult for patients to isolate the aspects of health related only to the pathology being assessed. Naturally, patients have an overall view of their health state, and it is difficult to filter out the effect of possible comorbidities, adverse events or affective state. However, even if generic instruments are inadvisable for an accurate assessment of the health state and, therefore, for patient follow-up, there are research situations where obtaining generic measures is crucial. We must remember that generic measures reflect the social value of the patient's health state (compared to other possible health states) and not really the patient's actual life situation, which is why they are the measures of choice in pharmacoeconomic evaluations.
A possible strategy to avoid these problems would be to design preference elicitation choice experiments using vignettes based on the health states derived from the specific instrument, but this would not prevent marginal utilities from being inflated by other serious comorbidities. Another possibility would be to determine the generic health profiles that are actually prevalent and meaningful in the particular disease, and to map only those states. This approach could be used when distributions such as the one observed for the EQ-5D-3L are found, where a small number of health states gather the majority of patients. However, obtaining representative results would require very large samples, and the approach could be cumbersome when the number of possible health states is very large, as happened empirically with the SF-6D (127 states) and the HUI-3 (64 states, Table 3).
For the time being, the direct mapping of specific health states onto generic utility values seems to be the most accepted option [20]. Nevertheless, another possible way to determine the mapping between generic instruments anchored by a specific instrument would be to identify empirical profiles of health states shared by groups of patients, using cluster generation procedures such as LPA. This procedure would make it possible to determine as many clusters as considered appropriate and to obtain a table of correspondences between the utility values of different instruments, based on the average utility on each instrument represented by the centroid of each cluster. We have seen that this option is feasible for a small number of clusters (Table 5), but with a high number of strata (clusters) the behavior might not be as uniform as in our case, and inversions between instruments (profile crossings) may appear, which could be difficult to interpret. In fact, in our study, the LPA solution may have been particularly insensitive to comorbidities owing to the removal of extreme cases.
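One hypothetical way to turn cluster centroids into a usable table of correspondences is piecewise-linear interpolation between centroids. This goes beyond what the study reports and only makes sense when profiles do not cross, so that centroid values are monotone across clusters; values outside the centroid range are clamped to the nearest centroid by np.interp. The function name crosswalk is an illustrative assumption.

```python
import numpy as np


def crosswalk(value, source_centroids, target_centroids):
    """Map a disutility on the source instrument to the target instrument by
    interpolating between cluster centroids (assumes no profile crossings)."""
    src = np.asarray(source_centroids, dtype=float)
    tgt = np.asarray(target_centroids, dtype=float)
    order = np.argsort(src)                 # np.interp needs increasing x-values
    return np.interp(value, src[order], tgt[order])
```

With only four clusters the resulting table is coarse, which is consistent with the caution expressed above about using a small number of strata.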
In our study, CKD proved to be a quite disabling pathology, with low average utility values: M_EQ = 0.676, M_SF = 0.514 and M_HU = 0.673. However, we observed a large number of patients whose scores were at the lowest level (no or mild deterioration) in most attributes of the generic instruments (Table 4). It is also true that our sample, although representative of patients with CKD in contact with patient organizations, does not show a high level of deterioration, since 50% of the subjects scored between 0 and 5 points (out of a possible maximum of 36).
Utility scores obtained with the SF-6D and HUI-3 proved more sensitive to CKD severity than those obtained with the EQ-5D-3L. This behavior is well known, and a version of the EQ-5D with five levels per attribute has been developed [21,22]. In addition, the scores of the first two instruments were more dispersed and did not show a gap between perfect health and the next health profile. The observed cumulative distribution functions of SF-6D and HUI-3 disutility scores were more uniform, whereas the EQ-5D-3L showed a steeper function, especially at the mild health states.
In the regression models, using factor scores to summarize CKD severity is technically preferable to using the algebraic sum of FKSI item scores, since each item is weighted according to its individual reliability for optimally sorting patients by CKD severity. Furthermore, it avoids having to decide how to sum up the scores when building the criterion variable (disutilities) from the response levels of each item, and it minimizes the possible impact of covariates on particular response levels.
Although we did not include any covariates in the prediction of disutilities, we did check for the influence of other variables on the mapping functions. The variables that explained additional variability in utility scores were "number of concomitant diseases", "anxiety" and "frequency of anger situations", plus "years since diagnosis" when predicting SF-6D disutilities; these results differ from those of the EPIRCE study [2], which included obesity, age and hypertension.
The cubic model gave the best fit for predicting disutility values. All proposed models shared the same problem: the large dispersion of utility scores observed at the non-severe FKSI health states (Figure 3). This phenomenon should not be understood as anomalous behavior; rather, it reflects the inability of specific instruments themselves to capture the effect of covariates (which may explain the overall level of deterioration), more than a limitation of generic instruments for measuring benign health states. In fact, a non-negligible group of patients obtained very high disutility values (probably due to other aspects of their health deterioration) despite a very low CKD-specific deterioration level. On examining these cases with large residuals and low FKSI scores, we found that they were subjects with notably high levels of anxiety and depression, among other possible confounding factors. Better-fitting models could have been obtained by including covariates not specific to CKD (such as age, psychological health, comorbidities or type of treatment), but this would limit the applicability of the models to other data sets and, consequently, the mapping models would not be generalizable.
Our analysis of utilities in subpopulations of cases stratified the sample by level of severity. The clusters corresponded to strata of patients with progressive levels of deterioration, in which all instruments, both generic and specific, showed a similar progression. Although the technique used is very sensitive to the presence of atypical cases, the solution obtained discriminated levels of deterioration but did not isolate such cases (perhaps owing to the previous filtering of outliers).

Conclusions
The mapping of disease-specific instruments onto generic health-related measures is a common methodological strategy that takes advantage of the high sensitivity of specific instruments and the broad generalizability of generic measures. We have shown that it is possible to map CKD-specific FKSI scores onto generic disutilities (SF-6D, HUI-3 and EQ-5D-3L), achieving adequate goodness-of-fit values and an acceptable amount of explained variance (between 55% and 62%).
The superiority of the cubic model was not very evident, since the MAPE values of the different models were very similar. This similarity is due to the lack of fit of all models at low values of disutility (the best health states). This is an inherent problem of generic instruments, which have been shown to capture health impairments not attributable to the specific deterioration measured by the FKSI.
Our results allow transferring the values of CKD impairment onto the utility attributed by society to those health states, as they are appraised by the three HRQoL instruments most frequently used in research.

Limitations
The present study has been carried out in the Spanish population and it is possible that cultural biases might be present.