Direct comparison of ASGE, EPAGE and alarm-based appropriateness criteria for endoscopic procedures: A retrospective audit

Background and aim: With a heavy referral burden on endoscopic services worldwide, careful selection of patients is needed to optimize limited healthcare resources. This study aimed to directly compare the performance of the ASGE, EPAGE, and alarm-based criteria and to determine the local rate of inappropriate endoscopies. Methods: Consecutive medical records of patients with completed endoscopy at one Australian public hospital (October to December 2014) were retrospectively audited. Indications were categorised by appropriateness using ASGE, EPAGE and alarm-based criteria, and clinical yield was determined. Results: A total of 147 upper gastrointestinal endoscopies (UGIEs; 63% male, 67% outpatients) and 196 colonoscopies (50% male, 88% outpatients) were reviewed. Four percent of UGIEs and 2% of colonoscopies were inappropriate per ASGE, and 7% of UGIEs and 10% of colonoscopies were inappropriate per EPAGE. Custom alarm-based criteria in patients suspected of FGID exhibited greater specificity than ASGE or EPAGE (Z = 3.53, p < 0.001 for each), and were as sensitive as both ASGE and EPAGE (p < 0.001 each) for UGIEs. Similarly, alarm-based criteria had greater specificity than ASGE (53% vs 11%, Z = 2.37, p = 0.018), and comparable specificity to EPAGE (55% vs 20%, p = 0.052) for colonoscopy. Conclusion: A low rate of inappropriate endoscopies was observed. Although ASGE and EPAGE performed similarly, they had different limitations. In patients with suspected functional symptoms, neither ASGE nor EPAGE-I appears to perform adequately. The use of alarm-based criteria in patients with clinically suspected FGIDs may further reduce the rate of unnecessary investigations and warrants larger scale evaluation.

*Correspondence to: Ecushla C. Linedale, The University of Adelaide, North Terrace, SA 5005, Australia, Tel: 61-882-225-207; E-mail: ecushla.linedale@adelaide.edu.au


Introduction
There is a heavy referral burden in endoscopic services worldwide, as referrals continue to increase, at least in part due to colorectal cancer screening programs. It is well recognised that the yield of relevant findings is high for some indications, such as positive faecal occult blood test [1,2], whilst in other scenarios such as likely functional gastrointestinal disorders (FGIDs), there is a low relevant endoscopic yield [3].
Although current recommendations are for minimal use of invasive tests for establishing a diagnosis of an FGID, current practice is at odds with the recommendations [4], with most clinicians adopting an exclusionary approach and continuing to refer for invasive procedures [2,5-11]. While fear of missed pathology is a recognised driving factor for the over-use of endoscopy [12], this approach cannot be endorsed as a sustainable model of service delivery. It is not efficient, necessary or affordable, and carries avoidable risk to otherwise healthy people.
Careful selection of patients for endoscopic procedures is needed to optimise limited healthcare resources [1].
Endoscopic "appropriateness" guidelines have been developed by the American Society for Gastrointestinal Endoscopy (ASGE) [13] and the European Panel on the Appropriateness of Gastrointestinal Endoscopy (EPAGE I and EPAGE II) [14], to better target endoscopic procedures, increase diagnostic yield and improve the quality of patient care. However, both sets of criteria are recommended as monitoring/decision-making rather than screening tools [15-17]. The validity of these guidelines has not been evaluated in randomised controlled trials, but a consistent substantial rate of inappropriate upper gastrointestinal endoscopies (UGIEs) and colonoscopies has been documented in observational studies worldwide [18-20].
A sample size of 139 colonoscopies and 186 UGIEs was powered to detect a prevalence of inappropriate indications of 10% and 14% for colonoscopies and UGIEs respectively, with 5% precision.
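The sample-size figures above are consistent with the usual normal-approximation formula for estimating a single proportion, n = z²p(1 − p)/d². The sketch below assumes that formula and z = 1.96 for 95% confidence. The source does not state which method was used, and the function name is illustrative only.

```python
import math

def sample_size_for_proportion(p, precision, z=1.96):
    """Smallest n giving a confidence-interval half-width of `precision`
    around an expected proportion `p` (normal approximation, assumed)."""
    return math.ceil(z ** 2 * p * (1 - p) / precision ** 2)

# Expected rates of inappropriate indications taken from the text.
colonoscopy_n = sample_size_for_proportion(0.10, 0.05)
ugie_n = sample_size_for_proportion(0.14, 0.05)
print(colonoscopy_n, ugie_n)  # 139 186
```

Under these assumptions the calculation reproduces the 139 colonoscopies and 186 UGIEs quoted in the text.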
A subset of procedures performed in patients judged clinically likely to have FGID were selected for further analysis. Likely FGID was defined as the presence of longstanding (≥ 6 months), non-specific gastrointestinal symptoms (abdominal/epigastric pain/discomfort, with or without accompanying bloating, flatulence, altered bowel habit, nausea or vomiting). Procedures performed in this subset of patients were additionally categorised as appropriate/inappropriate according to locally developed custom alarm-based criteria (Table 1). Procedures were judged as appropriate where one or more clinical alarms were present, and inappropriate in the absence of any alarms, and the subsequent yield of relevant abnormalities was determined.
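The decision rule described above is simple enough to express directly. The following is a minimal sketch, assuming the alarms documented on a referral are available as a list of labels. The function names and the example alarm label are hypothetical, and the actual alarm list is the one in Table 1, which is not reproduced here.

```python
def is_likely_fgid(symptom_months, has_nonspecific_gi_symptoms):
    """Likely FGID: non-specific GI symptoms lasting at least 6 months."""
    return has_nonspecific_gi_symptoms and symptom_months >= 6

def classify_by_alarms(alarms_present):
    """Appropriate if one or more clinical alarms are present,
    inappropriate if none are present."""
    return "appropriate" if len(alarms_present) > 0 else "inappropriate"

# Hypothetical referrals; the alarm label is a placeholder, not taken from Table 1.
print(classify_by_alarms(["placeholder alarm"]))
print(classify_by_alarms([]))
```

Only referrals that first meet the likely-FGID definition would be passed to the alarm rule.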

Data analysis
Data were analysed using SPSS 24, and expressed as frequencies and counts. Confidence limits for the sample proportion of inappropriate indications were calculated using the Wilson method [31]. Z-scores were calculated to test for significant differences between these proportions, with significance set at p < 0.05 (two-sided). Pearson's chi-square test and Fisher's exact test were used to test for associations between appropriateness categories and clinical relevance of findings, with significance set at p < 0.05 (two-sided). Sensitivity (the ability of the criteria to identify those with clinically relevant findings) and specificity (the ability of the criteria to correctly identify those without clinically relevant findings) were calculated for each set of criteria using an online calculator (http://vassarstats.net). All authors had access to the study data and reviewed and approved the final manuscript.
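For readers who want to reproduce this kind of analysis without SPSS, the sketch below shows a Wilson score interval and a two-sided z-test for the difference between two proportions. It assumes the standard textbook forms, including a pooled standard error for the z-test. The source does not specify which variant was used, so results may differ slightly from the published values.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a single proportion."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for two independent proportions (pooled SE, assumed)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Example: 14 of 18 UGIEs judged inappropriate by the alarm-based criteria.
low, high = wilson_interval(14, 18)
print(round(low, 3), round(high, 3))
```

The Wilson interval is generally preferred over the simple normal interval for small samples such as the FGID subset, because it stays within 0 and 1 and behaves better near the extremes.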

Ethics
As this was a clinical audit conducted retrospectively with the purpose of quality assurance/evaluation, ethical review was not necessary.

Sample description
The records of 288 patients who underwent either colonoscopy (n = 141, M 61y, SD 16), UGIE (n = 92, M 61y, SD 18) or both (n = 55, M 60y, SD 18) were reviewed. Full demographics are detailed in Table 2. Patients were mostly outpatients referred by gastroenterologists. Most UGIE and colonoscopy booking forms/medical records (60% and 61% respectively) did not state whether prior endoscopic procedures had been performed. The procedure was specifically noted to be the initial procedure in only 8% and 7% of UGIEs and colonoscopies respectively. At least one prior endoscopic investigation was noted in 32% of UGIEs and colonoscopies. The status of the remaining procedures could not be determined from the medical records.
Only three studies have directly compared ASGE and EPAGE criteria: one in UGIE [27] and two in colonoscopy [28,29], with only one published in full [27]. Bersani et al. [27] found that the diagnostic yield for clinically relevant endoscopic findings was slightly better using ASGE than EPAGE criteria for UGIE. However, these findings have been debated due to significant methodological issues [30]. Using the same methods, Bersani et al. [29] found that the criteria performed similarly to each other for colonoscopy. Adler et al. [28] reported a 5-10% higher yield of relevant colonoscopy findings in ASGE and EPAGE appropriate categories, but full comparative data are not presented in the abstract and cannot be further evaluated.
Although the ASGE and EPAGE criteria agree on colonoscopy appropriateness in 80% of indications, disagreement occurs in a few frequently encountered indications such as uncomplicated abdominal pain and constipation [30]. Such symptoms occur frequently in people with FGID and are, in general, low-yield indications for colonoscopy. Consistent with this, a simple predictive rule based on age, alarm features and family history has been shown to be as effective as ASGE guidelines in identifying appropriate indications for UGIE (n = 8252) [21].
The rate of 'inappropriate' UGIEs and colonoscopies in Australia has yet to be assessed. The aims of this study are therefore to: 1) compare the performance of ASGE, EPAGE and alarm-based criteria, 2) evaluate the rate of unindicated endoscopic procedures, and 3) determine what proportion of these "inappropriate" endoscopic procedures are performed in patients clinically suspected of having an FGID.

Methods
Consecutive medical records of patients with completed diagnostic and therapeutic colonoscopies and endoscopies (Oct-Dec 2014) in one metropolitan Australian public hospital were retrospectively reviewed. Liver-related procedures were excluded. The indications for each procedure as documented on the booking form were judged appropriate/inappropriate according to ASGE [13], and necessary/appropriate or uncertain/inappropriate using EPAGE criteria (www.epage.ch). EPAGE categories were combined and reported as appropriate (including necessary and appropriate procedures) or inappropriate (uncertain or inappropriate procedures). Where a booking form was not found, medical notes, outside referral, or procedure reports were used in lieu, in that order of priority. The clinical relevance of endoscopic findings was assessed by a gastroenterology registrar and senior gastroenterologist, and endoscopic findings were classified as normal, non-contributory abnormality or relevant abnormality. Patient demographics, symptoms, symptom duration, main indications, previous tests, and endoscopic/histological findings were also recorded. Referral demographics included initial source of referral (gastroenterologist, intern, surgeon, primary healthcare provider) and admission status (inpatient/outpatient).
ASGE-inappropriate, 5 ASGE-uncodeable, and 2 EPAGE-uncodeable UGIE indications were judged appropriate. There were no instances where EPAGE-inappropriate indications were subsequently judged appropriate. Summaries of the categorisation of clinical indications for UGIE and colonoscopy according to ASGE and EPAGE criteria are presented in Tables 2-5.
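The merging of EPAGE categories and the fallback order for indication sources described in the Methods can be sketched as follows. This is an illustrative encoding only: the dictionary, the function names and the "uncodeable" fallback label are assumptions made for the example, not part of the EPAGE tool itself.

```python
# Four EPAGE categories collapsed to the two reported in this audit.
EPAGE_COLLAPSE = {
    "necessary": "appropriate",
    "appropriate": "appropriate",
    "uncertain": "inappropriate",
    "inappropriate": "inappropriate",
}

# Documents consulted for the indication, in order of priority.
SOURCE_PRIORITY = ["booking form", "medical notes", "outside referral", "procedure report"]

def collapse_epage(category):
    """Map an EPAGE category to the binary label; unknown input is 'uncodeable'."""
    return EPAGE_COLLAPSE.get(category.lower(), "uncodeable")

def pick_indication_source(available):
    """Return the highest-priority document that is available, or None."""
    for source in SOURCE_PRIORITY:
        if source in available:
            return source
    return None

print(collapse_epage("Necessary"))
print(pick_indication_source({"procedure report", "medical notes"}))
```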

Performance of custom-alarm based criteria, EPAGE and ASGE in clinically suspected FGID
Likely functional GI symptoms were identified on the referral in 12% (18/147) of UGIEs and 11% (22/196) of colonoscopies. All of these procedures could be categorised as appropriate or inappropriate using the locally developed alarm-based criteria. However, ASGE was unable to classify 3 UGIEs (17%) and 10 colonoscopies (45%), and EPAGE was unable to classify 2 UGIEs (11%) and 4 colonoscopies (18%) (Table 4). In this subset of procedures, 14/18 UGIEs and almost half of the colonoscopies (10/22) were judged inappropriate using the locally developed alarm-based criteria (Table 4).
Clinically relevant findings in patients suspected of FGIDs were seen in only 1 UGIE and 3 colonoscopies, occurring in the "appropriate" category of all 3 sets of criteria including the local custom alarm-based ones (Table 4). The alarm-based local criteria applied to UGIEs in patients suspected of FGIDs exhibited greater specificity than ASGE or EPAGE (Z = 3.53, p < 0.001 for each), and were as sensitive as both ASGE and EPAGE (p < 0.001 each). When applied to colonoscopies in patients with clinically suspected FGIDs, alarm-based criteria had greater specificity than ASGE (53% vs 11%, Z = 2.37, p = 0.018), and comparable specificity to EPAGE (55% vs 20%, p = 0.052). Commonly encountered symptoms that are characteristic of FGIDs and yet deemed appropriate for endoscopic tests by ASGE or EPAGE (but not by local alarm-based criteria) were chronic diarrhoea (sampling of tissue or fluid, or suspected malabsorption) and persistent upper abdominal symptoms (following treatment trial, or uncomplicated dyspepsia).
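The sensitivity and specificity of the alarm-based criteria in this subset can be recomputed from the counts given above. The sketch assumes that an "appropriate" rating is treated as a positive test and a clinically relevant finding as the condition of interest, which matches the definitions in the Data analysis section. The 2×2 cells are derived from the reported totals rather than copied from Table 4.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from 2x2 table counts."""
    return tp / (tp + fn), tn / (tn + fp)

# UGIE: 18 procedures, 14 rated inappropriate, 1 relevant finding
# (in the appropriate group), so 4 appropriate = 1 relevant + 3 not relevant.
ugie = sensitivity_specificity(tp=1, fp=3, fn=0, tn=14)

# Colonoscopy: 22 procedures, 10 rated inappropriate, 3 relevant findings
# (all in the appropriate group), so 12 appropriate = 3 relevant + 9 not relevant.
colonoscopy = sensitivity_specificity(tp=3, fp=9, fn=0, tn=10)

print(ugie, colonoscopy)
```

On these derived counts the colonoscopy specificity is 10/19, or about 53%, in line with the 53% quoted in the comparison with ASGE. The UGIE specificity works out to 14/17, a figure inferred here and not stated in the text.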

Local performance
Here we demonstrated a low rate of inappropriate endoscopic procedures according to both ASGE and EPAGE criteria. Our results are on the low end of the spectrum of the published 10-40% rate of inappropriate procedures [19,21-24,32,33], and better than published rates for gastroenterologist-referred colonoscopies (2% vs 10%) and UGIEs (4% vs 14%) using ASGE [25,26]. This study is the first to assess and report the appropriateness of endoscopic procedures in Australia. The low rates of inappropriate procedures may reflect the service pressure to choose wisely [34], and the lack of financial incentives to over-investigate within a publicly funded system. This study was performed in one metropolitan hospital, and further evaluation in the larger Australian context is warranted to establish generalisability.
judged appropriate, they differed in utility. ASGE was broader in its inclusions, covering most clinical scenarios without consideration of time-frames, whilst EPAGE was more stringent and did not address therapeutic procedures (e.g. stricture dilatation, or intervention for Barrett's oesophagus) [35]. According to ASGE, all UGIEs are appropriate in patients over 45 years of age with upper abdominal symptoms irrespective of the presence or absence of clinical alarms [36]. There were, however, several indications that could not be classified by each set of criteria but were clearly appropriate according to current clinical practice, suggesting that these criteria could benefit from updating. The rigid format of EPAGE resulted in more indications being unable to be categorised. Specifically, EPAGE required flexible sigmoidoscopy results to determine the appropriateness of colonoscopy for iron deficiency anaemia. Because sigmoidoscopy is now rarely performed, this indication could not be categorised. Similarly, UGIE for caustic/foreign body ingestion was uncodeable in EPAGE, although it is clearly appropriate based on current data and clinical experience [37-40].
UGIE is regarded as an important diagnostic procedure for patients with upper abdominal and reflux symptoms, although this is driven mainly by a fear of missing significant pathology. However, symptomatology and clinical alarms do not correlate well with the yield of endoscopic procedures. One study (n = 7159) has shown that less than 1% of patients with gastroesophageal reflux symptoms had Barrett's or adenocarcinoma. Similarly, a random population study in Sweden (n = 3000) found that although gastroesophageal reflux symptoms were reported in 40% of the general population, only 16% were found to have erosive oesophagitis upon UGIE, whilst 6 of 20 (30%) patients with gastric ulcer and 2 of 21 (10%) with duodenal ulcer did not have any symptoms. In patients with epigastric or upper abdominal symptoms, it is generally accepted that UGIE is not needed in those with a clinical diagnosis of functional dyspepsia.
A potential limitation of this study is the small sample size. The final number of UGIEs examined was insufficient to estimate the expected 14% rate of inappropriate indications with 5% precision. However, this had negligible effect on the results or subsequent interpretation, as a precision of 6% was achieved. In addition, the number of clinically relevant findings was small, and it is therefore possible that a Type II error has occurred when examining for associations between appropriateness and clinical yield. A larger, prospective comparison of ASGE/EPAGE would be valuable.
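The achieved precision can be checked by inverting the usual sample-size formula for a proportion, d = z·sqrt(p(1 − p)/n). The sketch assumes the normal approximation with z = 1.96 and uses the 147 UGIEs reported in the Results. The function name is illustrative, and the source does not state how the 6% figure was derived.

```python
import math

def achieved_precision(p, n, z=1.96):
    """Confidence-interval half-width for proportion p with sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Expected 14% inappropriate indications, 147 UGIEs actually reviewed.
print(round(achieved_precision(0.14, 147), 3))  # 0.056
```

Under these assumptions the half-width is about 5.6%, which rounds to the 6% stated above.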

Utility of local alarm-based criteria
When applied to patients referred with clinically suspected functional symptoms, the custom alarm-based criteria performed as well as ASGE and EPAGE in terms of sensitivity. In addition, they were more specific than ASGE or EPAGE. Furthermore, the alarm-based criteria enabled categorisation of all indications, unlike ASGE or EPAGE. There were several indication categories under which potentially functional symptoms (such as chronic diarrhoea and persistent symptoms) could be coded in both ASGE/EPAGE. These categories could be viewed as "escape clauses" for over-investigating functional symptoms, resulting in more endoscopic procedures than truly necessary according to current guidelines [2].
The use of our alarm-based approach to determining the appropriateness of endoscopic investigation in patients with symptoms suggestive of functional disease may be useful to reduce the number of unnecessary investigations, freeing up valuable endoscopic resources and reducing unnecessary risk to patients. However, this subset of endoscopic procedures performed in potential FGID patients was small and further large-scale evaluation of our custom alarms-based criteria in patients with likely functional symptoms seems justified on these preliminary data.

Conclusion
The targeting of appropriate endoscopic investigations in this unit is very good, with results at the low end of published rates for inappropriate procedures worldwide. Although the ASGE and EPAGE appropriateness criteria performed similarly, both were limited in patients with possible functional symptoms, and less specific than alarm-based criteria. The use of our alarm-based criteria in patients with suspected functional gastrointestinal disorders may further reduce the rate of unnecessary investigations, and this warrants larger scale evaluation.

Funding
This research was funded by The University of Adelaide's PhD Scholarship.