Multidimensional population attributable fraction: Independence assumption as a source of bias

Several studies have assessed in detail the combined effects of two risk factors (RFs) on particular outcomes; however, for most situations the relevant information is not available. Several scenarios were simulated by varying the following factors: i) the steepness with which the relative risks (RRs) of the RFs increase with the level of exposure (two levels: “low” slope and “high” slope), ii) the correlation between exposures to the RFs (three levels: “negative”, “zero”, “positive”), and iii) the shape of the functional relationship between RR values and exposure to the RFs (five levels: “high concavity up”, “low concavity up”, “linear”, “low concavity down”, “high concavity down”). For these scenarios, the biases in the population attributable fraction (PAF) were quantified when the (usual, convenient, and possibly wrong) assumption of independence is made. The results of this study could be useful for assessing the quality of PAF estimates in real-life situations where two RFs are considered simultaneously.

*Correspondence to: Armando Humberto Seuc, National Institute of Hygiene, Epidemiology and Infant Microbiology, Cuba, E-mail: armandoseuc@infomed.sld.cu


Introduction
The Population Attributable Fraction (PAF) is defined as

$$\mathrm{PAF} = \frac{P(D) - P(D \mid \bar{E})}{P(D)} \qquad (1)$$

where P(D) is the (unconditional) probability of disease over a specified time period, and P(D | Ē) is the probability of disease over the same time period conditional on non-exposed status (not exposed to the risk factor under study). The PAF is a widely used epidemiologic measure; it is the difference between the overall average risk of the entire population (both exposed and unexposed people) and the average risk in the unexposed, expressed as a fraction of the overall average risk [1,2].
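As a minimal sketch of this definition, the PAF can be computed directly from the two probabilities. The function name `paf` and the numerical values are hypothetical and serve only as an illustration; they are not taken from any study.

```python
def paf(p_disease: float, p_disease_unexposed: float) -> float:
    """PAF = (P(D) - P(D | not exposed)) / P(D)."""
    if not 0.0 < p_disease <= 1.0:
        raise ValueError("P(D) must lie in (0, 1]")
    if not 0.0 <= p_disease_unexposed <= 1.0:
        raise ValueError("P(D | not exposed) must lie in [0, 1]")
    return (p_disease - p_disease_unexposed) / p_disease


# Hypothetical example: overall risk of 10%, risk among the unexposed of 8%.
# About one fifth of the overall risk is then attributed to the exposure.
example_paf = paf(0.10, 0.08)
print(round(example_paf, 3))
```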
The PAF is frequently interpreted as the proportion of disease risk or incidence that could be eliminated from the population if exposure were eliminated. The expectation is that the PAF has practical value for those interested in public health prevention policy, particularly when dealing with a modifiable exposure [2]. Some authors have argued against "relative" measures such as the PAF and in favour of "absolute" measures [3], an issue that has been addressed in other contexts [4].
Some of the problems around the PAF are related to the concept of "causal attribution", and to potential biases when estimating single risk factors' PAFs under different study designs and/or when using different counterfactuals [1,3,5-8]. Significant limitations also remain in current strategies for estimating a PAF in multiple risk factor situations [9]. For example, simplified approaches for PAF calculations under different study design scenarios have been explored [10], but, as far as the authors know, there has been limited discussion of the new sources of potential bias that may arise, particularly the one related to the (usually made) assumption of independence between risk factors (no interaction in a multiplicative model using risk ratios as the effect measure [11,12]). A few comments by Ezzati et al. [13,14] about biases when estimating multiple risk factor PAFs (reproduced below) are the main motivation of this paper.

Potential bias when assuming independence
The contribution of a risk factor to disease or mortality, relative to some alternative exposure distribution, is defined as the proportional reduction in population disease or mortality that would occur if exposure to the risk factor were reduced to that alternative exposure distribution. It is given by the generalized potential impact fraction (PIF) in equation (2). The alternative (counterfactual) exposure distribution used is the one that would result in the lowest population risk, referred to as the theoretical-minimum-risk exposure distribution:

$$\mathrm{PIF} = \frac{\int_{x=0}^{m} RR(x)\,P(x)\,dx - \int_{x=0}^{m} RR(x)\,P'(x)\,dx}{\int_{x=0}^{m} RR(x)\,P(x)\,dx} \qquad (2)$$

where RR(x) is the relative risk at exposure level x; P(x) is the population distribution of exposure; P'(x) is the alternative or counterfactual distribution of exposure; and m is the maximum exposure level.
In equation (2), RR, P, and P' can be joint relative risks and exposure distributions for multiple risk factors (i.e., x can be a vector of risk factors) [13].
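To estimate the joint effects of two RFs with continuous exposure variables, each integral in the PIF relation can be replaced with

$$\int_{x_1=0}^{m_1} \int_{x_2=0}^{m_2} RR_1(x_1)\,RR_2(x_2)\,P(x_1, x_2)\,dx_2\,dx_1 \qquad (3)$$

where subscripts 1 and 2 denote the two RFs, m_1 and m_2 are their maximum exposure levels, and P is the joint distribution of the two exposures [13].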
Equation (3) assumes independence (no interaction in a multiplicative model [11,12]), i.e.,

$$RR(x_1, x_2) = RR_1(x_1)\,RR_2(x_2)$$

Ezzati et al. [13] go on: "Because individual RRs are non-linear functions of exposure, and joint RRs the product of such nonlinear RRs, positive correlation between risk factors would usually imply a larger PAF than zero correlation (for some RRs, submultiplicative effect modification could result in slightly smaller PAF even with positive correlation). In turn, PAF with zero risk-factor correlation would in general be larger than negative correlation. Similarly, for categorical risk factors, positive correlation would in general result in larger PAF."

These statements are relevant for assessing the potential bias introduced when unknown epidemiological scenarios are faced in practice and the corresponding assumptions are made. We therefore consider it useful to explore more comprehensively the analytical relationships between PAF values on the one hand, and the functional relationships between exposure levels and RR values, the degree of correlation between RFs, and the level of independence (multiplicativity [12]) between RRs on the other. We have not seen works in which these potential biases have been assessed in detail.
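The quoted statement about correlation can be illustrated numerically for categorical RFs. The sketch below assumes a discrete analogue of the PIF in which the counterfactual places everybody at the reference level of both RFs (joint RR = 1), and it assumes multiplicative joint RRs. The function name `joint_paf`, the RR values, and the three joint prevalence tables are hypothetical; the tables share the same marginal prevalences and differ only in the sign of the correlation.

```python
import numpy as np


def joint_paf(joint_prev, rr1, rr2):
    """PAF for two categorical RFs with multiplicative joint RRs.

    Assumes the counterfactual puts everyone at the reference level (RR = 1).
    joint_prev[i][j] is the prevalence of level i of RF1 and level j of RF2.
    """
    joint_prev = np.asarray(joint_prev, dtype=float)
    joint_rr = np.outer(rr1, rr2)  # independence: RR_ij = RR1_i * RR2_j
    mean_rr = float(np.sum(joint_prev * joint_rr))
    return (mean_rr - 1.0) / mean_rr


# Hypothetical RRs for three equidistant exposure levels (same for both RFs).
rr = [1.0, 1.5, 2.0]

# Hypothetical joint prevalences; every row and column sums to 1/3.
positive = np.array([[8, 3, 1], [3, 6, 3], [1, 3, 8]]) / 36.0
zero = np.full((3, 3), 1.0 / 9.0)
negative = np.array([[1, 3, 8], [3, 6, 3], [8, 3, 1]]) / 36.0

paf_positive = joint_paf(positive, rr, rr)
paf_zero = joint_paf(zero, rr, rr)
paf_negative = joint_paf(negative, rr, rr)
print(paf_positive, paf_zero, paf_negative)
```

With these illustrative inputs the PAF is largest under positive correlation and smallest under negative correlation, in line with the direction described in the quotation.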
Knowing the magnitude of the biases in PAF estimates under these different scenarios, when independence is assumed, could be useful in practice: it would give researchers an idea of the quality of their PAF estimates once they have identified which scenario is closest to the actual situation they are dealing with.

Scenarios for bias assessment
For simplicity, we considered just two RFs, RF1 and RF2, each with three equidistant exposure levels (e.g. "low", "medium", "high"), and assumed the same functional relationship between exposure levels and RRs for both RFs.
Functional relationships between exposure and RRs have been modelled before [15]; in this study they are represented in terms of (global) slope (always positive) and level of concavity (up, linear, down), computed from the three equidistant exposure levels. Low slope values range from 1 to 0.27 when moving from high concavity up to high concavity down; high slope values range from 1.5 to 0.60 over the same range of concavity. See Table 1 for the particular combinations of slope and concavity values used. The expressions used for calculating the slope and the level of concavity are presented in the Annex.
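The Annex describes the concavity measure as an average of the changes in slope between consecutive pairs of exposure levels. A minimal sketch of that description is given below. It assumes unit spacing between levels and a sign convention in which concavity up is positive; the paper's exact expression may be scaled differently. The function name `average_slope_change` and the RR values are hypothetical. The slope expression is not sketched because its exact form is not reproduced here.

```python
import numpy as np


def average_slope_change(rr_by_level):
    """Average change in slope between consecutive pairs of exposure levels.

    Assumes equidistant levels with unit spacing (x = 1, 2, ..., n).
    Positive values indicate concavity up, negative values concavity down.
    """
    y = np.asarray(rr_by_level, dtype=float)
    if y.size < 3:
        raise ValueError("at least three exposure levels are needed")
    slopes = np.diff(y)  # slope between each consecutive pair of levels
    return float(np.mean(np.diff(slopes)))  # mean change between slopes


# Hypothetical RR profiles over three exposure levels.
linear = average_slope_change([1.0, 1.5, 2.0])
concave_up = average_slope_change([1.0, 1.2, 2.0])
concave_down = average_slope_change([1.0, 1.8, 2.0])
print(linear, concave_up, concave_down)
```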
Finally, the level of independence (no interaction in a multiplicative model [11,12]) between the RRs of the RFs was set at five levels: the reference "100% independence", two over-multiplicative levels ("150% independence" and "200% independence"), and two sub-multiplicative levels ("80% independence" and "60% independence"). The level of independence, IND, is defined as

$$\mathrm{IND} = 100 \times \frac{RR_{i,j}}{RR1_i \times RR2_j}\ \%$$

for i, j ≥ 2, where i and j are the levels of exposure of the RFs. RR_{i,j} is the combined relative risk for the i-th level of RF1 and the j-th level of RF2 with respect to the combined reference level (1,1); RR1_i is the relative risk for RF1 at the i-th level of exposure with respect to the reference level 1, and similarly for RR2_j and RF2. It should be noted that the assumption of independence in this multiplicative model does not necessarily mean lack of biological interaction (see e.g. [11,12]).
Several studies have assessed the combined effects of two specific RFs on particular outcomes (see some references in [14]); however, for many other situations the relevant information is not available. The scenarios considered in this work were designed to facilitate the assessment of bias when the (usual and convenient) assumption of independence is (necessarily) made, controlling for the other factors just mentioned, which are easier to know because they depend not on the combined RRs but on the combined prevalence of the two RFs.
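The bias assessment described above can be sketched as follows. The sketch rests on several assumptions that are not confirmed by the text: the level of independence is taken as the ratio of the combined RR to the product of the individual RRs, expressed as a percentage and applied to cells with i, j ≥ 2; the counterfactual places everybody at the reference level (RR = 1); and percentage bias is taken as the relative difference between the PAF computed under assumed independence and the PAF computed with the actual combined RRs. All function names and input values are hypothetical.

```python
import numpy as np


def joint_rr_at_ind(rr1, rr2, ind_percent):
    """Combined RRs: product of individual RRs, scaled by IND for i, j >= 2."""
    joint = np.outer(rr1, rr2).astype(float)
    joint[1:, 1:] *= ind_percent / 100.0  # reference row and column unchanged
    return joint


def paf_from_joint(joint_prev, joint_rr):
    """PAF with everybody moved to the reference level (RR = 1)."""
    mean_rr = float(np.sum(np.asarray(joint_prev, dtype=float) * joint_rr))
    return (mean_rr - 1.0) / mean_rr


def percent_bias(joint_prev, rr1, rr2, ind_percent):
    """Relative error (%) of the PAF obtained when independence is assumed."""
    true_paf = paf_from_joint(joint_prev, joint_rr_at_ind(rr1, rr2, ind_percent))
    assumed_paf = paf_from_joint(joint_prev, joint_rr_at_ind(rr1, rr2, 100.0))
    return 100.0 * (assumed_paf - true_paf) / true_paf


# Hypothetical scenario: uncorrelated exposures, linear RRs for both RFs.
prev = np.full((3, 3), 1.0 / 9.0)
rr = [1.0, 1.5, 2.0]
biases = {ind: percent_bias(prev, rr, rr, ind) for ind in (60, 80, 100, 150, 200)}
for ind, bias in biases.items():
    print(ind, round(bias, 1))
```

Under these assumptions, assuming independence overestimates the PAF when the actual combined RRs are sub-multiplicative and underestimates it when they are over-multiplicative.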

Results
Figures 1-3 present the PAF values for the three correlation levels between RFs ("positive", "zero", and "negative") and the two slope levels ("low" and "high"). In each figure, the PAF values (vertical axis) are shown for the five independence levels (different lines) and the five concavity levels (horizontal axis).
The impact of the independence assumption on the PAF values decreases from Figure 1 to Figure 3: moving the correlation between RFs from positive to negative decreases the impact of the independence assumption (bias decreases), and moving the slope from low to high also decreases the impact (bias decreases). Within each graph, moving from high up to high down concavity (of the relationship between exposure levels and RR values) increases the impact of the independence assumption (bias is larger when independence is assumed).
In other words, the smallest impact of the independence assumption on PAF values is observed when correlation between RFs is negative, and the functional relationship between exposure levels and RRs values has high up concavity and high slope values; the largest impact is observed when correlation between RFs is positive, and the functional relationship is characterized by a high down concavity with a low slope.
Percentage biases for the six scenarios in Figures 1-3 are presented in Tables 2-4. Table 5a presents the correlation between the RFs "alcohol consumption" ("teetotaler", "light", "moderate" and "heavier") and "smoking" ("currently non-smoking" and "smoking"), and Table 5b presents the combined relative risks of these two RFs for type 2 diabetes [16], both observed and under the assumption of independence (in a multiplicative model).

Illustration
The observed relative risks are close to independence, except the one corresponding to "currently smoking yes" and "alcohol consumption heavier"; the observed/expected ratios are 109%, 115%, and 58% for cells (2,2), (2,3), and (2,4), respectively. Spearman's rank correlation coefficient between the RFs is 0.129, i.e. close to zero. For the RRs of alcohol consumption, the concavity is also close to zero (-0.205), as is the global slope (-0.105).
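Observed/expected ratios of this kind can be computed from a table of combined RRs as sketched below. The sketch assumes that the expected RR under independence for cell (i, j) is the product of the RRs in the reference row and reference column of the same table. The function name `independence_percent` and the combined RRs are hypothetical; they are not the values of Table 5b.

```python
import numpy as np


def independence_percent(joint_rr):
    """Observed combined RR as a percentage of the RR expected under independence.

    Assumes joint_rr[0][0] is the combined reference cell (RR = 1) and that the
    expected RR for cell (i, j) is joint_rr[i][0] * joint_rr[0][j].
    """
    joint_rr = np.asarray(joint_rr, dtype=float)
    expected = np.outer(joint_rr[:, 0], joint_rr[0, :])
    return 100.0 * joint_rr / expected


# Hypothetical combined RRs: 2 levels of one RF (rows) by 4 levels of another.
observed = [
    [1.00, 0.80, 0.60, 0.90],
    [1.50, 1.32, 0.99, 0.81],
]
ratios = independence_percent(observed)
print(np.round(ratios, 1))
```

Cells in the reference row and column are 100% by construction, so only cells with i, j ≥ 2 carry information about departures from independence.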

Discussion
This study made several assumptions that limit the generalizability of its results. It considers only two RFs, each with three levels of exposure, and both sharing the same simple functional relationship between these levels of exposure and the corresponding RRs, described in terms of slope and concavity.
It has been reported that the biases introduced by wrong assumptions when estimating PAF are in general not important ([14], p. 2173); however, our results show that this is not always the case.
Three levels were used for the statistical correlation between exposures to the RFs as a source of bias in the estimation of PAF, although it has been reported that this correlation does not have an important effect on the final estimate ([14], p. 2172). From our results in Tables 2-4, the effect of correlation between exposures to the RFs does seem to be non-negligible.
Sub-multiplicative models are much more frequent than over-multiplicative models [14,17], so the two levels below 100% independence (80% and 60%) should be more useful in practice than the two levels above (150% and 200%).
In summary, for simple practical situations we might find the scenarios studied in this paper useful to assess approximately the range of biases introduced when the (convenient) independence assumption is made.

Annexure
With n = number of exposure levels; x_i = 1, 2, …, n; y_i = RR_1, RR_2, …, RR_n:
a) Formula for the slope:
b) Formula for the level of concavity:
This concavity expression provides an average of the changes in slopes for consecutive pairs of points (levels of exposure to the RF).

For RRs of "alcohol consumption": slope = -0.105; concavity = -0.205

Table 5b. Combined Relative Risks (RRs) for type 2 diabetes: observed and under independence assumption in women [16]

Copyright: ©2019 Seuc AH. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.