Efficacy of EEG neurofeedback in psychiatry: A comprehensive overview and meta-analysis

Background: This article provides a comprehensive overview of studies investigating the efficacy of EEG neurofeedback in the treatment of psychiatric disorders. Method: Only studies comparing neurofeedback to a control group (passive/semi-active, placebo, or drug treatment) were included. Effect sizes were calculated for individual studies and when possible combined in meta-analysis (Hedges’s g). Results: We retrieved 30 studies including 1171 participants, evaluating neurofeedback for ADHD, autism, OCD, GAD and depression. For ADHD, combining nineteen trials in meta-analysis yielded small to medium effect sizes for symptoms of inattention, hyperactivity and impulsivity. Subgroup analyses showed that neurofeedback was superior to passive/semi-active treatment (medium effects), while efficacy was similar to placebo (only one study) and drug treatment. For ASD, combining five studies resulted in a superior effect of neurofeedback in reducing general symptomatology; subgroup analyses showed that neurofeedback was more effective than passive/semi-active treatment (four studies) and placebo (based on a single study). Three OCD studies showed varying results, depending on the type of control group used. Two GAD studies found neurofeedback to be similar or inferior to EMG biofeedback. One study on depression showed a large effect for neurofeedback when compared to semi-active treatment. Conclusion: Although 30 studies could be included, our review of the literature reveals serious limitations of the body of research currently performed. Therefore at present, it cannot be concluded that EEG neurofeedback can be regarded as an evidence-based treatment for ADHD, ASD, OCD, GAD and depression. Large, well-designed studies are needed to elucidate whether neurofeedback is a viable treatment option in the field of psychiatry.
Correspondence to: M.J.H. Begemann, MSc, Department of Psychiatry, University Medical Center Utrecht (UMCU), Heidelberglaan 100, 3584 CX Utrecht, Netherlands, Tel: +31887556370; E-mail: M.J.H.Begemann@umcutrecht.nl


Introduction
Neurofeedback was originally described as a method in which specific frequency bands of the electroencephalogram (EEG) are used to train the electrical activity of the brain through biofeedback. This operant conditioning of selected brainwave frequencies is achieved by giving real-time audio and/or visual feedback cues. The general rationale behind neurofeedback is that this conditioning will be related to behavioral improvements.
The interest in EEG neurofeedback over the last 30 years can be understood in the light of accumulating research on the electrophysiological basis of various psychiatric disorders, such as Attention Deficit Hyperactivity Disorder (ADHD), Autism Spectrum Disorder (ASD), schizophrenia, Obsessive Compulsive Disorder (OCD), anxiety, depression, Tourette syndrome and anorexia nervosa [1]. A voluminous literature describes the robustness of EEG abnormalities found in a high proportion of psychiatric patients and the clinical implications [2], depending on the psychiatric disorder targeted. As the technique is non-invasive and side-effects such as headache or fatigue due to the attentional demands are minimal [3], EEG neurofeedback has been discussed as a promising alternative, nonmedical treatment option [4].
Moreover, functional magnetic resonance imaging (fMRI) has rapidly emerged as an alternative technique for neurofeedback protocols [5]. Similar to EEG, fMRI provides an indirect measure of neuronal activity, by recording the hemodynamic response in the brain, known as the blood oxygenation level-dependent (BOLD) signal [5]. While the spatial resolution is higher than EEG, the temporal resolution is much lower. Following the development of fMRI-based neurofeedback protocols, the interest in the methodological and clinical aspects of EEG neurofeedback is now renewed [5].
To evaluate whether EEG neurofeedback training constitutes a viable treatment method in the field of psychiatry, this article provides a comprehensive overview of studies that have investigated its therapeutic efficacy by comparing EEG neurofeedback to a control group. Studies are quantitatively summarised and combined in meta-analysis where possible.
The literature search followed the PRISMA statement (www.prisma-statement.org/statement.htm). A systematic search for studies published in English-language, peer-reviewed journals was performed in PubMed, Embase, PsycINFO, ClinicalTrials.gov, and the Cochrane Database of Systematic Reviews, using combinations of the following basic search terms: "neurofeedback", "EEG biofeedback", "neurotherapy", "Slow Cortical Potential", "SCP", in addition to psychiatric diagnosis: ADHD, ASD, OCD, Generalized Anxiety Disorder (GAD), panic disorder, Post-Traumatic Stress Disorder (PTSD), depression, bipolar disorder, substance abuse, Tourette syndrome, anorexia nervosa and schizophrenia. Reference lists of retrieved articles and relevant review articles were examined for cross-references. The search cut-off date was January 2nd, 2015.
Articles selected for inclusion met the following criteria: 1) Studies using between-subjects or cross-over design, with a passive or semi-active control group (such as waiting list, EMG biofeedback or cognitive training), a placebo condition (sham treatment), or a drug therapy control group.
3) Studies reported sufficient information to compute common effect size statistics or authors could supply these data upon request. 4) Pilot studies that were later continued, resulting in another paper with a larger sample size, were excluded to avoid including the same patient more than once.

Calculation of effect sizes
Two reviewers independently extracted data; disagreements were resolved by consensus. Hedges's g was used to quantify effect sizes (ES) for the mean difference between change scores (end of treatment minus baseline) of the neurofeedback group versus the control group. Change scores were preferred over pre- and post-treatment scores to avoid overestimation of the true effect size because of the pre- and post-treatment correlation. If change scores were not reported, pre- and post-treatment means and standard deviations (SDs), or exact F, t or p values were used. Effect sizes were interpreted according to Cohen [9], with an ES of 0.2 indicating a small effect, 0.5 medium, and >0.8 a large effect. When a study compared neurofeedback to both waiting list and a semi-active treatment, the most stringent (i.e. semi-active) control group was used as a reference. Parent ratings were preferred over teacher ratings. Results were combined in meta-analysis when two or more studies were available using similar outcome measures. To differentiate between various methodological designs we also performed subgroup analyses, grouping studies into: (1) those with a passive/semi-active control group, such as waiting list, EMG biofeedback or cognitive training, (2) those with a placebo condition, i.e. sham treatment, and (3) studies comparing neurofeedback to drug therapy. A random effects model was deemed most appropriate for this research area given the heterogeneity in applied methods [10]. To investigate whether studies could be taken together to share a common population effect size, the homogeneity statistic I² was calculated [11]. Ranging from 0 to 100%, I² indicates the proportion of the observed variance that reflects differences in true effect sizes rather than sampling error. Values of 25%, 50%, and 75% can be interpreted as low, moderate, and high, respectively [11].
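To make the effect size and heterogeneity calculations concrete, the following Python sketch computes Hedges's g from two groups' change scores and pools several effect sizes under a random effects model, returning I². It is an illustration only: the function names are hypothetical, the variance formula and small-sample correction are common textbook approximations, and the DerSimonian-Laird estimator is an assumption rather than a documented detail of the software used for this review.

```python
import math


def hedges_g(mean_nf, sd_nf, n_nf, mean_ctrl, sd_ctrl, n_ctrl):
    """Hedges's g and its approximate variance for two groups' change scores.

    Assumption: inputs are oriented so that larger values mean more
    improvement, making a positive g favour neurofeedback.
    """
    df = n_nf + n_ctrl - 2
    # Pooled standard deviation of the change scores.
    sd_pooled = math.sqrt(((n_nf - 1) * sd_nf**2 + (n_ctrl - 1) * sd_ctrl**2) / df)
    d = (mean_nf - mean_ctrl) / sd_pooled
    # Small-sample correction factor (common approximation).
    j = 1 - 3 / (4 * df - 1)
    var_d = (n_nf + n_ctrl) / (n_nf * n_ctrl) + d**2 / (2 * (n_nf + n_ctrl))
    return j * d, j**2 * var_d


def random_effects(effects, variances):
    """DerSimonian-Laird pooling: returns (pooled ES, SE, tau^2, I^2 in %)."""
    k = len(effects)
    w = [1 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    # Random-effects weights add the between-study variance tau^2.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2, i2
```

With two hypothetical studies of equal precision and effect sizes of 0.0 and 1.0 (variance 0.1 each), the sketch returns a pooled ES of 0.5 with I² = 80%, which would count as high heterogeneity under the thresholds given above.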
Moreover, it is important to investigate potential outlier studies, defined as studies whose effect sizes have standardized residual z-scores exceeding ±1.96 (p<0.05, two-tailed). As the number of feedback sessions was expected to vary between studies, random effects meta-regression analyses were conducted to evaluate this as a moderator variable using the unrestricted maximum likelihood model. When interpreting meta-analytic outcomes, the possibility of an upward bias of the calculated effect sizes due to the omission of unpublished, nonsignificant studies must be taken into account [12]. Potential publication bias was investigated by means of a visual inspection of the funnel plot, with an asymmetrical plot indicating publication bias. Egger's test [13] was evaluated when appropriate (i.e., when the analysis included a range of study sizes, with at least one of 'medium' size; p<0.05, two-tailed). Moreover, the fail-safe number of studies (N_R) was calculated, providing an estimate of how many unpublished null findings would be needed to reduce an observed overall significant result to nonsignificance. As a guideline, the fail-safe number should be 5k+10 or higher (k=number of studies in a meta-analysis) to rule out a file drawer problem [12]. All calculations were executed using Comprehensive Meta-Analysis Version 2.0, Biostat [14,15].
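The fail-safe number and the 5k+10 guideline can be illustrated with a short Python sketch. It assumes the commonly cited form of Rosenthal's formula, based on the sum of per-study z-scores and a one-tailed critical value; the function names are hypothetical and the sketch is not a reproduction of the Comprehensive Meta-Analysis implementation.

```python
from statistics import NormalDist


def failsafe_n(z_scores, alpha=0.05):
    """Rosenthal-style fail-safe N from per-study z-scores.

    Assumption: one-tailed alpha, so the critical z is about 1.645 at 0.05.
    """
    k = len(z_scores)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Number of additional null studies needed to pull the combined
    # z below the critical value.
    return max(0.0, sum(z_scores) ** 2 / z_crit**2 - k)


def meets_guideline(n_fs, k):
    """True when the fail-safe number reaches the 5k+10 guideline."""
    return n_fs >= 5 * k + 10
```

For five hypothetical studies that each have z = 2.0, the sketch gives a fail-safe number of about 32, which falls short of the guideline value of 35 for k = 5.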

Quality check
Evaluating the quality of conducted studies contributes to improved study design, implementation and reporting by researchers. Therefore, randomization procedures, blinded outcome assessments, and indications of sponsoring bias were evaluated. Randomization was qualified as high when all participants were randomly assigned to one of the study groups, and low if (a part of) the participants were not randomly assigned. Furthermore, blinded outcome assessment was qualified as high when raters were blind, compared to a low rating when raters were not blind to treatment allocation. If the acknowledgement section mentioned sponsoring contributions from institutions with connections to neurofeedback materials in general or, in the case of drug-controlled studies, contributions from the pharmacological industry, the qualification was rated as low. If there were no institutions involved that could benefit from the outcome, qualification was rated high.

Design of included studies
Eighteen studies were open-label; six used double-blind ratings. Details on methodological design, number of participants, applied neurofeedback protocol, outcome measures and calculated effect sizes for the individual studies are described in Tables 1 to 3.
As Arns and colleagues [51] did not find differences among different neurofeedback protocols in a previous meta-analysis, EEG protocols were combined (i.e., sensorimotor rhythm [SMR] enhancement, beta enhancement with theta suppression, training of slow cortical potentials, SMR/theta and beta/theta training; Table 1). Duric et al. [32] included two different neurofeedback groups: one receiving neurofeedback only, the other combining neurofeedback with drug therapy. Data from the first group (neurofeedback only) were included, as the majority of neurofeedback-receiving participants in the other included studies were unmedicated. Duric et al. [32] did not report exact SDs; these were calculated from the 95% confidence intervals (SD = √N × [upper limit − lower limit] / 3.92).
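The conversion from a 95% confidence interval to an SD follows from the interval being the mean ± 1.96 standard errors, so that its width equals 3.92 × SD/√N. A minimal Python sketch of this step is given below; the function name and the example values are hypothetical.

```python
import math


def sd_from_ci(lower, upper, n, z=1.96):
    """Recover the SD from a confidence interval around a group mean.

    The CI width equals 2 * z * SD / sqrt(N), hence
    SD = sqrt(N) * (upper - lower) / (2 * z); 2 * 1.96 = 3.92 for a 95% CI.
    """
    return math.sqrt(n) * (upper - lower) / (2 * z)
```

For example, a hypothetical group of 25 participants with a 95% CI of 8.04 to 11.96 yields an SD of 5.0.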
The following outcomes were evaluated (Table 1): 1) Inattention: behavioral rating scales or, if not available, omission errors/attentional performance on a computer task.
3) Impulsivity: commission/impulsivity errors on a computer task; for Drechsler et al. [19] we used rating scale data, as the two groups showed a significant baseline difference on the Go-Nogo task.

ADHD: inattention
Eighteen studies were included, with 850 participants (Table 1). Neurofeedback showed superior efficacy, with a medium ES of 0.38.
Table 1. Summary of studies evaluating the efficacy of neurofeedback in ADHD. Table notes: a Comprehensive Clinical Care and Ritalin as additional therapy for both groups; b Based on additional data provided by author; c 23% of 58 children that completed the diagnostic study procedure were medicated; study is in progress, therefore not all have completed end-of-treatment assessments; d Based on number of patients at baseline (including drop-outs during study). Significant effect sizes are indicated in bold type.

Autism spectrum disorders (ASD)
Five studies [35-39] were retrieved for ASD, including 130 patients (Table 2). Three studies were open-label; only one randomized double-blind placebo-controlled trial could be included. Although neurofeedback protocols differed greatly between studies, meta-analyses were conducted to provide overall effect sizes. Effects on general symptomatology, as rated on a behavioral rating scale, were evaluated (Table 2). Data from the Auti-R as reported by Kouijzer et al. 2009 [36] were insufficient; therefore the Children's Communication Checklist (CCC-2) was used. Total scores were calculated by averaging the ten subscales (each consisting of seven items; SD = ΣSD / √[number of subscales]).

Obsessive compulsive disorder (OCD)
Three studies [40-42] were included on OCD, with 102 patients (Table 3). The randomized single-blind study by Deng et al. [42] investigated neurofeedback combined with medication and cognitive behavioral therapy (CBT), compared to treatment with medication and CBT only. Barzegary et al. [40] compared neurofeedback with waiting list as well as a medication treatment group, in a randomized open-label study. Kopřivová et al. [41] was the only randomized double-blind placebo-controlled study. Effects of neurofeedback on general symptomatology, and obsessions and compulsions separately, were evaluated using behavioral rating scales. NF vs. drug therapy: Barzegary et al. [40] found no differences in efficacy between neurofeedback and drug therapy (ES -0.89, p=.176).

GAD: state anxiety
NF vs. semi-active treatment: Agnihotri et al. 2007 [44] found alpha-enhancement neurofeedback to be inferior to EMG biofeedback in reducing state anxiety, with a large negative effect size of -2.44 (p<.001).

Depression
Only one study [45] was retrieved for depression (Table 3). Choi et al. 2011 [45] randomly assigned participants to neurofeedback (N=12) or a semi-active control group (N=11); patients and investigators were not blind to treatment allocation. Neurofeedback was superior to psychotherapy training, with a large effect size of 0.92 (p=.030).

Meta-Regression
When combining studies for ADHD and autism, significant heterogeneity was detected. A priori, it was assumed that interstudy differences in the number of feedback sessions could explain observed variance between studies. Indeed, the number of applied neurofeedback sessions differed greatly, ranging from 20 to 50 sessions. However, meta-regressions conducted for ADHD (inattention, hyperactivity and impulsivity) did not show a significant association between the number of feedback sessions and obtained effect sizes, neither overall nor in the subgroup analyses where studies were divided into the different types of control groups. Similarly for autism, meta-regressions did not show significant associations between the number of sessions and calculated effect sizes.

Quality check
Assessment of the methodological quality of the included studies can be found in Table 4. Six of the nineteen trials on ADHD did not randomize participants to the different conditions. In only nine studies, raters assessing symptom severity were blind to the subjects' treatment allocation. There were no indications for sponsoring bias in the majority of trials. For the study by Li et al. [22], two authors had competing interests as they had received funding from profit organizations. Three articles did not report acknowledgements.
Of the five studies on autism, three were randomized, and raters were blind to treatment allocation in two. Four articles acknowledged that neurofeedback equipment was donated or shared by an external company; one study did not include an acknowledgement section. All studies on OCD, GAD and depression were randomized, while raters were blind to treatment allocation in only two studies. Quality with regard to sponsoring bias was rated high for three trials, while the remaining articles did not include an acknowledgements section.

Discussion
We included 30 studies with 1171 participants in total, evaluating neurofeedback as a treatment method for ADHD [16-34], autism [35-39], OCD [40-42], GAD [43,44] and depression [45]. Our review of the literature reveals serious limitations of the body of research currently performed on this topic. The large majority of neurofeedback studies have at least one major methodological limitation such as lack of randomization, non-blind designs and use of waiting list control conditions, as evidenced in our quality check. Studies including a sham EEG feedback control group, accounting for the non-specific effects of EEG neurofeedback training, were sparse. Also, sample sizes were too small. To detect a medium effect size of 0.5, a minimal sample size of 64 per group is needed (alpha error 0.05, power of 80%). This criterion was not met by any of the included studies, with median group size being 15 subjects (ranging from 4 to 51 patients). Underpowered studies carry the risk of both false positive and negative findings, and are more likely to be affected by publication bias, selective data analysis and selective reporting of outcomes [52]. These important shortcomings pose a limitation to the results of published studies in this field, making it impossible to draw any conclusions regarding the efficacy of neurofeedback based on the current literature. The results should therefore be interpreted with caution.
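The required sample size of 64 per group can be approximated with a short Python sketch. It assumes a two-sided, two-sample comparison and uses the normal approximation with a small additive correction for the t-test; the function name is hypothetical, and dedicated power software would compute the value from the noncentral t distribution instead.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample t-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    # Normal-approximation sample size for a standardized mean difference.
    n_normal = 2 * (z_alpha + z_beta) ** 2 / effect_size**2
    # Additive correction (assumption) to account for the t distribution.
    return math.ceil(n_normal + z_alpha**2 / 4)
```

Under these assumptions, `n_per_group(0.5)` returns 64, in line with the figure stated above, which is more than four times the median group size of 15 in the included studies.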

ADHD
Nineteen studies were retrieved for ADHD, including 872 patients. Neurofeedback showed small to medium effects on inattention, hyperactivity and impulsivity. Subgroup analyses showed that neurofeedback training was superior to waiting list/semi-active treatment for all symptoms evaluated (medium effect). However, the only placebo-controlled study, by Van Dongen-Boomsma and colleagues [27], showed that the effects of neurofeedback did not differ from sham treatment, either on ratings of inattention or on combined ratings of hyperactivity/impulsivity symptoms (not included in current meta-analysis: ES 0.36, p=.25). Effects of neurofeedback training were similar to drug therapy, currently the gold standard in ADHD treatment. However, given the methodological shortcomings of most included studies, these findings must be interpreted with great caution.
First, as stressed in a recent meta-analysis by Micoulaud-Franchi and colleagues [53] (updating Sonuga-Barke et al. [54]), the evidence supporting EEG neurofeedback for ADHD is influenced by the (probable) blinded status of the assessor. They only included randomized controlled trials, and while positive effects were found on symptoms of inattention in both probably unblinded (parent) and probably blinded (teacher) ratings, the superior effect of neurofeedback on hyperactivity/impulsivity was only significant in the probably unblinded parent assessments. Furthermore, we could retrieve only one randomized double-blind trial [27] that actually included a sham EEG feedback control group, showing no difference between EEG neurofeedback and sham treatment. The other methodologically sound study, by Arnold et al. [50], was not included as reported data were insufficient to calculate ES, but this RCT also failed to show superior effects of neurofeedback. A study by Logemann et al. [55] found similar placebo effects when evaluating ADHD symptoms in a student population. Finally, it must be pointed out that not all of our calculated significant effect sizes were confirmed by a large fail-safe number of studies. According to Rosenthal [12], the fail-safe N_R should be 5k+10 or higher (k=number of included studies). While the effect of neurofeedback treatment on inattention was accompanied by a large fail-safe number (N_R=124), this number was substantially smaller after removal of one outlier study (N_R=35). The fail-safe N_R was also small for the positive effect of neurofeedback on hyperactivity and impulsivity (18 and 11, respectively). Overall, given the major methodological limitations of most included studies in addition to the possible mediating role of nonspecific (i.e., placebo) effects, our findings currently cannot confirm the clinical efficacy of neurofeedback for ADHD.

ASD
Five studies including 130 patients showed a large significant effect on general symptomatology. Importantly however, the fail-safe N_R was only 18. Neurofeedback was superior to passive/semi-active treatment (four studies, although N_R was only 11). The single sham-controlled study [39] also showed a large superior effect. However, the same limitations as noted for the ADHD literature apply to studies on ASD, with the added remark that median sample size was even smaller in this field (10). Our meta-analyses primarily relied on comparison of neurofeedback to waiting list, which is more susceptible to placebo effects, and only two studies were conducted in a randomized double-blind fashion. We therefore conclude that the efficacy of neurofeedback in the treatment of ASD is not sufficiently supported by the trials conducted to date.

Other psychiatric disorders
The few studies on OCD, GAD and depression had very small sample sizes, ranging from 4 to 37 participants per treatment condition. Results for the three studies on OCD depended on the type of treatment used as comparison. Neurofeedback was superior to waiting list in reducing general symptomatology (one study). When rating obsessions and compulsions separately, neurofeedback was superior to waiting list but similar to drug therapy (one open-label study). The only placebo-controlled trial found a large effect for neurofeedback in reducing compulsions but not obsessions. For GAD, alpha-enhancement (two studies) and alpha-suppression training (one study) were similar to EMG biofeedback (two studies) in reducing trait anxiety. Moreover, alpha-enhancement training was inferior to EMG biofeedback when evaluating state anxiety (one study). For depression, the only randomized open-label study included showed a large effect for neurofeedback compared to psychotherapy training.
Taken together, few studies have evaluated the efficacy of neurofeedback in the treatment of OCD, GAD and depression, and those that did had very small sample sizes. Only one randomized double-blind study was included. As the results are inconclusive, future trials are needed to assess the clinical utility of neurofeedback training in the treatment of these three disorders.

Limitations
Although pioneer studies investigating EEG neurofeedback as a treatment for psychiatric disorders were already conducted over 25 years ago, the majority of studies published so far have important methodological shortcomings. The lack of standardization amongst neurofeedback trials is problematic, as also highlighted by Schoenberg & David 2014 [56], with very few trials aiming to replicate previous results. We found that type of control group differed greatly between studies. Generally, neurofeedback was superior to waiting list or a semi-active control group, while efficacy did not differ from sham treatment (although only two placebo-controlled trials could be included). Therapeutic effects were mainly similar to medication therapy. Although the number of applied neurofeedback sessions also varied greatly between studies (ranging from 20 to 50 sessions), meta-regressions did not show significant associations. Studies also used different outcome measures including interviews, rating scales or computerised tests. Furthermore, surprisingly few articles reported the number of responders and non-responders, i.e. which participants gain control over their brain activity and which do not. This information is essential when trying to relate improvements in self-regulated brain activity to clinical outcome [57]. As suggested by Zuberer et al. [57], the treatment process and learning of EEG self-regulation should be carefully analysed when investigating the efficacy and specificity of neurofeedback. Moreover, Arns et al. [58] found that clinical outcome was improved when personalizing neurofeedback training to the individual qEEG. Implementation of this technique as a treatment method for psychiatric symptoms therefore requires good clinical practice, and careful implementation and evaluation of neurofeedback training during treatment sessions is essential.
The large number of studies not meeting our relatively lenient inclusion criteria stresses the fact that systematic, well-designed intervention studies are lacking. Given its mild side effect profile, neurofeedback is widely used to treat psychiatric disorders, in particular in children with ADHD or ASD. Although neurofeedback is non-invasive and side-effects such as headache or fatigue due to the attentional demands are indeed minimal [3], individuals can experience somatic complaints such as nausea, muscle twitches, sleep disturbances, OCD-like symptoms, agitation, or even seizure [59]. Children may skip school hours to attend neurofeedback sessions. In this light, the risks of subjecting individuals to a treatment method that is not yet evidence-based go beyond a waste of time and finances, as it may also extend the time until effective treatment is started. As recently noted by Holtmann et al. [60], placebo-controlled trials could provide strong evidence for the efficacy of neurofeedback treatment. Although several issues have been raised about the use of sham treatment, including ethical concerns and feasibility problems, large studies comparing neurofeedback to an adequate control condition are needed to assess whether observed positive effects on symptomatology are specific to EEG neurofeedback and cannot be attributed to non-specific factors associated with placebo effects.

Conclusion
In sum, the lack of methodologically sound studies prevents evidence-based conclusions on the efficacy of EEG neurofeedback in the treatment of ADHD, ASD, OCD, GAD and depressive disorder. It is paramount that future studies are carefully planned and executed,