Insertion Element IS6110 based characterisation of Nepalese tuberculosis strains into different genetic lineages

Nepal is geographically located between India and China, a region with significant Tuberculosis (TB) and Multi-Drug Resistant TB (MDR-TB) burdens. However, limited information is available on the phylogenetic diversity of Mycobacterium tuberculosis (Mtb) in Nepal. To gain further insight into this diversity, consecutive clinical samples from 176 newly diagnosed pulmonary tuberculosis patients were collected from two hospitals in Nepal. Insertion Sequence IS6110 Fluorescent Amplified Fragment Length Polymorphism (FAFLP) PCR and rpoB sequence analysis were carried out on genomic DNA extracts of cultured strains to assign them to accepted genetic lineages and to identify MDR-TB. The IS6110-based characterisation showed a prevalence of 36.36% Central Asian Strain (CAS), 18.75% Beijing, 7.95% Haarlem, 3.97% X, 2.2% each of Latin American Mediterranean (LAM), T-Uganda and T, and 1.7% S; 24.4% of isolates were unassigned. Further, 3.9% of all M. tuberculosis isolates had rifampicin-resistant genotypes, indicating that the prevalence of MDR could be higher than the country-wide prevalence of MDR among new TB cases (2.2%) reported by the national drug resistance survey carried out in 2011/2012.

Correspondence to: Catherine Arnold, Genomic Services and Development Unit, Public Health England, London, UK, Tel: 0208-327-6068; E-mail: Catherine.arnold@phe.gov.uk


Introduction
TB is ranked sixth among the top 20 causes of death in Nepal. According to the National Tuberculosis Control Programme (NTCP) of Nepal, 37,025 TB cases were registered in 2014, of which 15,947 (43%) were new sputum smear-positive cases. WHO estimated [1] that 4.6 (2.1-7.5) thousand people in Nepal died from TB in 2014. Even though the short-course TB drug treatment regimen could cure around 89% of cases, TB mortality was still unacceptably high in Nepal. Since 2006, the STOP TB strategy has been adopted by the NTCP. However, drug-resistant TB (DR-TB) still threatens national TB control and is a major public health concern. The proportion of MDR-TB was 2.2% among new cases and 15.4% among retreatment cases. Even though the Millennium Development Goal (MDG) to halt and reverse TB incidence has been achieved in all six WHO regions, work remains to be done to prevent deaths from this dreadful disease [1].
The identification of the number and position of Insertion Sequence IS6110 elements in the Mtb genome has been widely used as a genomic tool for the rapid fingerprinting of isolates of Mycobacterium tuberculosis complex (MTBC) [2]. IS6110 based Restriction Fragment Length Polymorphism (RFLP) is considered as the 'gold standard' typing method for strains with more than five copies [3][4][5]. As IS6110 transposition is among the first genetic changes to occur in strains from a transmission chain [6], this marker has also been used for outbreak analysis [5].
Modification of the conventional IS6110 typing method, using differentially labelled primers, has allowed characterisation of Mtb isolates into the key genetic lineages more rapidly than traditional methods [7]. This approach can be automated, which enables the technique to be performed in a high-throughput setting. The fragment patterns generated indicate both the copy number and the insertion sites of IS6110 in the genome [8,9]. The patterns correlate directly with other independent markers and can be used for transmission investigation locally or internationally. Specific fragments are common within genetically related lineages and do not occur in other groups (e.g. spoligotype groups such as Beijing and the Euro-American lineage, which contains the Latin American Mediterranean (LAM), Haarlem, S, T and X spoligotype groups). Principal Genetic Groups (PGGs) can be assigned to Mtb strains based on the combination of polymorphisms located at katG codon 463 and gyrA codon 95 in the respective genomes [10], on spoligotypes [11], or on global phylogeny classification based on whole genome sequences [12].
Limited data are available on the characterisation of Mtb strains and genotypes circulating in Nepal. A key factor is the geographical location of Nepal, situated between China and India, two countries that together account for approximately a third of annual global new cases (11% and 24%, respectively) [1]. A recent study of 261 Nepalese isolates found any drug resistance (defined as resistance to isoniazid, rifampicin, streptomycin, ethambutol, fluoroquinolones and/or aminoglycosides) in 12.8% of Mtb strains from new untreated cases, with the most frequent lineages reported as CAS/Delhi (40.6%), East Asian (including Beijing) (32.2%), Euro-American (15.7%) and Indo-Oceanic (11.5%) [13]. To gain further insight into the characteristics and diversity of Mtb in Nepal, our study aimed first to categorise isolates using IS6110 FAFLP PCR, for the first time in this setting, and to assign them to different genetic lineages. Secondly, the level of MDR in the population was characterised by sequencing the Rifampicin Resistance Determining Region (RRDR) of rpoB (the gene encoding the beta subunit of bacterial RNA polymerase) as a predictive surrogate [14][15][16][17].

Strains
Sputum samples from 176 consecutive new TB patients were collected over one year between 2007 and 2008 and cultured alongside routine diagnostics at two Nepalese tuberculosis reference centres located in the Kathmandu valley: the National Tuberculosis Centre (NTC) and the German Nepal Tuberculosis Project (GENETUP). The patient population represented local and referred cases from across Nepal. Bacterial genomic DNA from isolated strains was extracted by the Cetyltrimethylammonium Bromide (CTAB) method [18] at the Mycobacterial Research Laboratories (MRL) in Anandaban Hospital. Informed consent was not required at the time of this study, as samples were collected as part of routine clinical care and all patient identifiers were anonymised; however, all patients were given an explanation and were included only after providing verbal informed consent. Study procedures were reviewed and approved by NTC and GENETUP. Drug sensitivity test results were unavailable for the entire duration of this study.

IS6110 FAFLP PCR, Fragment Sizing and Analysis
Genomic DNA was digested with the restriction enzymes MseI and TaqI, followed by ligation with double-stranded TaqI restriction site specific adaptors. The adaptor-ligated DNA was amplified following previously published PCR conditions using four fluorescently labelled, adaptor-specific TaqI forward primers (5'-CGATGAGTCCTGACCGA * /C * /T * /G *), each labelled with a single unique selective nucleotide at the 3' end, and an IS6110 sequence-specific reverse primer (5'-CTGACATGACCCCATCCTTT) [9]. In a total volume of 20 µl, 1 µl of the adaptor-ligated DNA was added to a reaction containing 1X reaction buffer, 1.5 mM MgCl2, 0.2 mM dNTPs (Invitrogen, UK), 1 µM of labelled TaqI forward primer, 1 µM of IS6110 reverse primer and 1 U of recombinant Taq polymerase (Invitrogen, UK). The following PCR conditions were carried out in a Veriti thermocycler (Applied Biosystems, UK): 94°C for 15 min followed by 35 cycles of 94°C for 20 s, 66°C for 30 s and 72°C for 2 min, with the 66°C annealing temperature reducing by 1°C every cycle for nine cycles and the last 25 cycles at 56°C. Finally, an extension at 72°C for 60 min was carried out before further manipulations. The fragments were separated on an ABI 3730XL genetic analyser (Applied Biosystems, UK), sized using PeakScanner v1.0 software (Applied Biosystems) and identified by their fluorescent tag (Figure 1). The four-dye FAFLP data collected from the different profiles were then recorded and compared with a reference collection of Mtb isolates [19] using BioNumerics software v6.1 (Applied Maths Inc., Belgium). Fragments common to different lineages (defined as being present in >50% of strains in a particular genetic lineage) were recorded for each Nepalese strain and compared with a fully characterised global collection as detailed by Thorne et al. [20].
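The touchdown cycling described above can be sketched as a per-cycle list of annealing temperatures. This is an illustrative reading of the stated conditions, not the thermocycler programme itself: it assumes the first cycle anneals at 66°C, the next nine cycles each drop by 1°C, and the remaining 25 cycles run at 56°C. The function name and parameters are hypothetical.

```python
def touchdown_annealing_schedule(start_c=66, step_c=1, touchdown_cycles=9,
                                 final_c=56, total_cycles=35):
    """Return the annealing temperature (deg C) for each PCR cycle.

    Assumed interpretation: cycle 1 anneals at start_c, each of the next
    `touchdown_cycles` cycles is `step_c` lower than the previous one, and
    all remaining cycles anneal at final_c.
    """
    temps = [start_c]
    for _ in range(touchdown_cycles):
        temps.append(temps[-1] - step_c)
    # Fill the rest of the programme with the final annealing temperature.
    temps += [final_c] * (total_cycles - len(temps))
    return temps


schedule = touchdown_annealing_schedule()
print(schedule)
```

Under this reading the ten touchdown cycles (66°C down to 57°C) plus 25 cycles at 56°C account for all 35 cycles.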
The fragment data were then used to build a dendrogram, using the Dice similarity coefficient to construct the similarity matrix and Unweighted Pair Group Method with Arithmetic Mean (UPGMA) cluster analysis, with cophenetic correlation as the measure of branch quality.
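As a rough illustration of the clustering step, the sketch below computes Dice similarities between fragment profiles and merges them by average linkage (UPGMA). It is not the BioNumerics implementation; the strain names and all fragment labels other than "78.4 G" are hypothetical, and the cophenetic correlation is omitted for brevity.

```python
from itertools import combinations


def dice(a, b):
    """Dice similarity between two fragment sets: 2|A & B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def upgma(profiles):
    """Average-linkage clustering on (1 - Dice) distances.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of strain names.
    """
    def avg_dist(c1, c2):
        # UPGMA: mean distance over all cross-cluster strain pairs.
        total = sum(1 - dice(profiles[x], profiles[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        merges.append((c1, c2, avg_dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(tuple(sorted(c1 + c2)))
    return merges


# Hypothetical profiles: fragment size plus dye letter (B, R, G or Y).
profiles = {
    "strain_A": {"78.4 G", "112.0 B", "150.3 R"},
    "strain_B": {"78.4 G", "112.0 B", "201.7 Y"},
    "strain_C": {"95.1 R", "201.7 Y"},
}
merges = upgma(profiles)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

Recomputing the mean over all cross-cluster pairs at every step keeps the sketch short; for 176 profiles a cached distance matrix would be the more practical choice.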

rpoB analysis
The 81 bp Rifampicin Resistance Determining Region (RRDR) of the rpoB gene of all strains was sequenced using published primers [21] and analysed in BioEdit software using ClustalW alignment parameters. The PCR was carried out in a total volume of 50 µl, in which 1 µl of the DNA was added to a reaction containing 1X PCR reaction buffer, 1.5 mM MgCl2, 0.2 mM dNTPs (Invitrogen, UK), 20 µM each of the rpoB-RRDR forward (5'-CGATCACACCGCAGACGTTGA) and reverse (5'-GGCACGCTCACGTGACAGACC) primers and 5 U of recombinant Taq polymerase (Invitrogen, UK). The following PCR conditions were carried out using a Veriti thermocycler (Applied Biosystems, UK): 94°C for 2 min followed by 35 cycles of 94°C for 30 s, 60°C for 30 s and 72°C for 1 min. Finally, an extension at 72°C for 10 min was performed before the products were cleaned using AmpureXP magnetic beads (Beckman Coulter, UK) according to the manufacturer's protocol and sequenced using the rpoB-RRDR forward primer.

Analysis of Data using BioNumerics software v6.1
Of the 176 DNA extracts from isolates analysed, the majority, 97 samples (55.1%), belonged to either the spoligotype-defined Central Asian Strain (CAS) lineage (64, i.e. 36.4%) or the Beijing lineage (33, i.e. 18.8%), both grouping under PGG1. The rest of the assigned samples grouped under either PGG2 (1.7% S, 3.97% X, 7.95% Haarlem, 2.27% LAM and 2.27% T-Uganda) or PGG3 (2.27% T) (Table 1). Forty-three samples (24.4%) fell into the "unassigned" group. The common fragments seen were exactly the same as those in the earlier published report by Thorne et al. (2011), except for an additional fragment, 78.4 G, for the CAS lineage. A dendrogram was generated using only the IS6110 FAFLP data (Figure 2), again confirming the above-mentioned lineages in relation to the PGGs.
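The lineage proportions can be checked with simple arithmetic. In the sketch below only the counts for CAS (64), Beijing (33) and unassigned (43) are stated in the text; the remaining counts are back-calculated from the reported percentages of 176 isolates and should be treated as assumptions.

```python
TOTAL_ISOLATES = 176

# Counts for CAS, Beijing and unassigned are given in the text; the others
# are inferred from the reported percentages (assumption).
LINEAGE_COUNTS = {
    "CAS": 64,
    "Beijing": 33,
    "Haarlem": 14,
    "X": 7,
    "LAM": 4,
    "T-Uganda": 4,
    "T": 4,
    "S": 3,
    "unassigned": 43,
}
PGG1_LINEAGES = ("CAS", "Beijing")


def prevalence(count, total=TOTAL_ISOLATES, digits=2):
    """Percentage of isolates, rounded to `digits` decimal places."""
    return round(100 * count / total, digits)


pgg1_count = sum(LINEAGE_COUNTS[name] for name in PGG1_LINEAGES)
other_assigned = TOTAL_ISOLATES - pgg1_count - LINEAGE_COUNTS["unassigned"]

for name, count in LINEAGE_COUNTS.items():
    print(f"{name}: {count} ({prevalence(count)}%)")
print(f"PGG1 (CAS + Beijing): {pgg1_count} ({prevalence(pgg1_count, digits=1)}%)")
print(f"PGG2 + PGG3: {other_assigned} ({prevalence(other_assigned, digits=1)}%)")
```

With these counts the three groups (PGG1, PGG2 plus PGG3, and unassigned) sum to all 176 isolates.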

rpoB Analysis
Of the 176 DNA extracts analysed for rpoB mutations, seven samples (3.9%) had a single non-synonymous base change that would likely confer resistance to rifampicin (Table 2). Six of these seven samples had a mutation at the second base of the affected codon, whereas sample N70 had a mutation at the first base.
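The distinction between first-base and second-base changes, and between synonymous and non-synonymous substitutions, can be illustrated with a minimal sketch. The function name is hypothetical, and the two example codon changes (TCG to TTG, CAC to TAC) are widely reported rpoB substitutions used purely for illustration; they are not taken from Table 2.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_codon_change(ref_codon, alt_codon):
    """Describe a single-base substitution within one codon.

    Returns the 1-based position of the changed base, the reference and
    variant amino acids, and whether the change is synonymous.
    """
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three bases long")
    diffs = [i + 1 for i in range(3) if ref_codon[i] != alt_codon[i]]
    if len(diffs) != 1:
        raise ValueError("expected exactly one base difference")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    return {
        "base_position": diffs[0],
        "ref_aa": ref_aa,
        "alt_aa": alt_aa,
        "synonymous": ref_aa == alt_aa,
    }


second_base = classify_codon_change("TCG", "TTG")  # Ser -> Leu
first_base = classify_codon_change("CAC", "TAC")   # His -> Tyr
print(second_base)
print(first_base)
```

A substitution is flagged as non-synonymous when the translated amino acid differs, which is the property that makes an RRDR change a candidate resistance marker.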

Discussion
It has been demonstrated previously that IS6110 FAFLP PCR can be used to delineate the phylogeny of MTBC, as shared common fragments can identify the different lineages in a geographical location by comparison with a reference database collection [7]. As limited lineage information is available for strains from Nepal, we applied the recently published IS6110 method based on mapping the IS6110 sites in H37Rv [9] and also carried out rpoB sequencing to further characterise strains from this important region.
Fifty-five percent of the 176 Nepalese strains analysed belong to the CAS (36.4%) and Beijing (18.8%) modern spoligotype-defined lineages (PGG1). A further 20.5% of the samples belong to the PGG2 and PGG3 groups (Haarlem, LAM, S, X, T-Uganda and T). However, a limitation of this technique is the difficulty of characterising strains with fewer than 4-5 copies of IS6110, as seen in the unassigned group (24.4%) in Figure 2; this can be overcome by using other typing techniques such as Mycobacterial Interspersed Repetitive Units-Variable Number Tandem Repeats (MIRU-VNTR) [20]. The geographical position of Nepal is likely to have influenced this distribution, with a mixture of predominantly Beijing lineage from the north of the Himalayas and the CAS lineage from the south [

Where B = blue-coloured fragment, R = red-coloured fragment, G = green-coloured fragment and Y = black/yellow-coloured fragment seen in the electropherogram. PGG represents Principal Genetic Groups according to Sreevatsan et al. [10], spoligotypes follow the SpolDB4 classification [4] and sub-lineages are grouped following Gagneux's classification [8].

Conclusions
The IS6110 FAFLP data from our study reiterate that the geographic location of Nepal is key to the circulation of the PGG1 TB lineages CAS and Beijing, which are predominant in India and China respectively. Further, the RRDR analysis agrees with the recent work by Creswell et al., showing that the prevalence of MDR-TB in new untreated TB cases may be marginally higher than the national average. As the monitoring of TB is important in Nepal, this simple and informative PCR-based molecular epidemiological technique would prove useful for studying outbreaks of the disease and for detecting cross-contamination between different strains or isolates in resource-poor settings. The most common mutation site in the RRDR was codon 531, paralleling the findings of earlier studies [21,22].

Authorship
All authors mentioned above made substantial intellectual contributions to this manuscript.