Bacterial whole genome sequencing as a powerful tool for hospital molecular epidemiology: Acinetobacter baumannii as a model

In this study, the power of whole genome sequence data in the characterization of 40 clinical isolates of Acinetobacter baumannii was explored. The aim of this study is to demonstrate how bacterial genomic data can be analyzed using easy, semi-automated bioinformatics tools to answer diagnostic questions in clinical microbiology. These tools can use assembled or unassembled genome data for species identification, prediction of antibiotic resistance mechanisms and genotyping. In the studied sample, genomic data were successfully used to correct species identification and confirm resistance phenotypes. In addition, multilocus sequence types were determined, including three novel sequence types. In conclusion, next generation whole genome sequence data, with minor improvement and customization of currently available bioinformatics tools, will shortly change the shape of clinical microbiology laboratory services.

Correspondence to: Abdalla Ahmed, Department of Microbiology, College of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia, Tel: +966543031577; E-mail: aoaahmed@uqu.edu.sa


Introduction
Recently, Next Generation Sequencing technologies have provided an unprecedented amount of microbial genomic data. The availability of these genome data is rapidly changing our understanding of microbial behavior, interactions, virulence and antibiotic resistance, as well as our approach to genotyping. As the technology has become widely available, many studies published in the last few years have reported whole genome sequences of clinically important bacterial species such as Klebsiella pneumoniae and Acinetobacter baumannii [1][2][3][4]. Whole genome sequence data have been used to study antibiotic resistance [1][2][3][5], molecular epidemiology [4,6] and comparative genomics [7,8]. With few exceptions, these reports were research articles describing bacterial whole genome sequencing with complex data presentation, complex bioinformatics workflows and many bioinformatics terms. Such articles are difficult to understand for general readers, such as clinical microbiology practitioners without special training in bioinformatics. This knowledge barrier gives the wrong impression about whole genome sequence data, whose many applications can in fact be used directly in the routine clinical microbiology laboratory. However, a good number of recently published articles describe powerful, user-friendly and publicly accessible web tools with direct applications in the clinical microbiology laboratory [9][10][11][12]. Only basic knowledge of bioinformatics is needed to run these tools and to interpret the reports they generate. These tools are extremely useful in microbial characterization and genotyping, and with minor further customization they will become part of the routine microbiology workflow [13].
In this study, we describe original whole genome data used to study the molecular epidemiology of Acinetobacter baumannii in a tertiary referral hospital setting in Saudi Arabia. DNA sequencing and data analysis were all done in clinical microbiology departments with no staff specially trained in bioinformatics. The sequencing results and data analysis are presented as simply as possible, avoiding complex terms. The aim of this study is to encourage microbiologists to start using these rapidly evolving tools to uncover the fascinating world of microbial genomics.

Acinetobacter baumannii isolates
During an apparent outbreak of multidrug-resistant A. baumannii in 2013, 40 clinical isolates from two hospitals in Makkah, Saudi Arabia, were studied using next generation whole genome sequencing. The isolates were obtained from both medical and surgical wards, including different intensive care units, at King Abdullah Medical City and Al-Noor Specialized Hospital. Identification and susceptibility testing in the two hospitals were done routinely using the Siemens MicroScan® WalkAway®-96 Plus System (Siemens, Germany). Clinical isolates were stored at -20°C in 10% glycerol peptone water. DNA sequencing and data analysis were done in the Department of Microbiology, College of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia.

DNA extraction and genome sequencing
Bacterial cells from fresh cultures were used for DNA extraction. Cells were harvested from overnight cultures, washed with sterile Tris EDTA buffer (TE) pH 8.0 in 2 mL screw cap tubes and then resuspended in 500 µl TE buffer. The cell wall was disrupted using 0.1 mm glass beads in a BioSpec Mini-Beadbeater-16 (BioSpec Inc., USA) for 5 minutes, and the tubes were then cooled on ice for an additional 5 minutes. The aqueous layer containing DNA was separated from proteins and cell debris using two Phenol/Chloroform (1:24 pH 8.0) extractions. DNA was then precipitated with isopropanol, washed with 70% ethanol, dried at room temperature and resuspended in 35 µl TE buffer pH 8.0. The quantity and quality of the isolated DNA were determined using Qubit® (Invitrogen, Applied Biosystems, USA) and the Agilent Bioanalyzer 2100 with the 1000 DNA Chip (Agilent Inc., USA).

Library preparation for DNA sequencing
A. baumannii DNA libraries for whole genome sequencing were prepared using the Illumina Nextera XT Library Preparation Kit, and samples were barcoded using the Nextera XT Index Kit (Illumina Inc., USA). Libraries were prepared from 1 ng of input genomic DNA, then validated and quantified directly, without normalization, using the Agilent Bioanalyzer 2100 High Sensitivity DNA Chip (Agilent Inc., USA). A. baumannii genomes were sequenced on the Illumina MiSeq using the paired-end protocol and the version 3, 600-cycle kit. The quality of the paired-end sequence reads was checked with FastQC (BaseSpace Labs, Illumina Inc., USA) before sequence assembly.

Genome assembly
De novo assembly of A. baumannii genomes was done using DNASTAR SeqMan NGen 12.3.1 (DNASTAR, Madison, USA) with default settings, which include trimming of low-quality sequence ends.

Species identification and sequence-based typing
Assembled genomes were used for 16S rRNA-based species identification and Multilocus Sequence Typing (MLST). 16S species identification and MLST were done using the SpeciesFinder 1.0 Server and the MLST 1.8 Server from the Center for Genomic Epidemiology [10,14].
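Conceptually, MLST assignment is a table lookup: the allele number called at each locus forms a profile, and the profile maps to a sequence type, or to no known type if the combination is novel. The following is a minimal sketch of that idea only, not the implementation of the MLST 1.8 Server. The locus names are those of the Oxford scheme; the allele numbers, the sequence type labels and the function name assign_sequence_type are hypothetical placeholders.

```python
from typing import Dict, Optional, Tuple

# The seven housekeeping loci of the Oxford MLST scheme for A. baumannii.
LOCI = ("gltA", "gyrB", "gdhB", "recA", "cpn60", "gpi", "rpoD")

Profile = Tuple[int, ...]

# Hypothetical profile table: allele numbers and ST labels are placeholders,
# NOT entries from the real MLST database.
EXAMPLE_SCHEME: Dict[Profile, str] = {
    (1, 2, 3, 4, 5, 6, 7): "ST-A",
    (1, 2, 3, 4, 5, 8, 7): "ST-B",
}


def assign_sequence_type(
    alleles: Dict[str, int], scheme: Dict[Profile, str]
) -> Optional[str]:
    """Return the ST whose profile matches at all seven loci.

    Returns None when the allele combination is absent from the scheme,
    which is how a candidate novel sequence type would show up.
    """
    missing = [locus for locus in LOCI if locus not in alleles]
    if missing:
        raise ValueError(f"missing loci: {missing}")
    # Build the profile in the fixed locus order used by the scheme.
    profile = tuple(alleles[locus] for locus in LOCI)
    return scheme.get(profile)


if __name__ == "__main__":
    known = dict(zip(LOCI, (1, 2, 3, 4, 5, 6, 7)))
    unseen = dict(zip(LOCI, (1, 2, 3, 4, 5, 99, 7)))
    print(assign_sequence_type(known, EXAMPLE_SCHEME))
    print(assign_sequence_type(unseen, EXAMPLE_SCHEME))
```

A profile that returns None in such a lookup is only a candidate: a novel sequence type still has to be submitted to and curated by the MLST database before it receives a number.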

Prediction of antibiotic resistance mechanisms
In this study, antibiotic resistance mechanisms were predicted using multiple available tools. Antibiotic resistance genes were predicted using the ResFinder Server from the Center for Genomic Epidemiology and SRST2 (BaseSpace Labs, Illumina Inc., USA) [11,15]. In selected strains of A. baumannii, resistance genes were also predicted using the Resistance Gene Identifier (RGI), which was designed and developed by the laboratories of Drs. Gerry Wright and Andrew G. McArthur of McMaster University [16,17]. The RGI provides a preliminary annotation of DNA or protein sequence(s) based on the data available in the Comprehensive Antibiotic Resistance Database (CARD). With all of the above tools, only genes with >80% template coverage and a minimum match percentage of 90% were reported.
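The reporting rule above can be sketched as a simple filter over the hits returned by any of these tools. This is an illustrative sketch only: the GeneHit record, its field names and the example values are hypothetical and do not reflect the actual output format of ResFinder, SRST2 or RGI. Only the two thresholds come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GeneHit:
    """Hypothetical, tool-independent record of one predicted resistance gene."""
    gene: str
    coverage_pct: float   # share of the reference template covered by the hit
    identity_pct: float   # match percentage between query and template


def filter_hits(
    hits: Iterable[GeneHit],
    min_coverage: float = 80.0,
    min_identity: float = 90.0,
) -> List[GeneHit]:
    """Keep hits with coverage strictly above 80% and identity of at least 90%."""
    return [
        h for h in hits
        if h.coverage_pct > min_coverage and h.identity_pct >= min_identity
    ]


if __name__ == "__main__":
    # Placeholder hits with made-up gene names and values, for illustration only.
    example = [
        GeneHit("geneA", coverage_pct=100.0, identity_pct=99.8),
        GeneHit("geneB", coverage_pct=75.0, identity_pct=98.0),   # coverage too low
        GeneHit("geneC", coverage_pct=95.0, identity_pct=85.0),   # identity too low
    ]
    for hit in filter_hits(example):
        print(hit.gene)
```

Applying the same thresholds to every tool makes the resulting gene lists comparable, although each tool computes coverage and identity in its own way.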

Results
The MicroScan® WalkAway®-96 Plus System identified all isolates as A. baumannii. Antimicrobial susceptibility testing showed that all isolates were resistant to meropenem and imipenem. Half of the isolates were resistant to colistin, with a minimum inhibitory concentration equal to or higher than 4 μg/ml. Thirty-nine isolates were successfully sequenced and only one isolate failed sequencing. The summary of the de novo assembly is shown in Table 1. Based on the 16S rRNA sequences, 36 isolates were confirmed as A. baumannii using the SpeciesFinder tool [14]. One isolate was found to be Stenotrophomonas maltophilia and another was identified only as Acinetobacter species. A third isolate was identified with low confidence as Serratia marcescens (only 33% of reads were aligned to species level). The three non-A. baumannii isolates were excluded from analysis in this study.
The novel sequence types were designated ST1286, ST1287 and ST1288 (Table 2). ST218 was found in more than half (4/7) of the A. baumannii isolates from Al-Noor Hospital. Of the remaining three isolates from this hospital, two had different novel sequence types and the third was ST195, the most prevalent sequence type overall. No clear correlation was noticed between sequence types and particular wards, specimens or infection dates. No correlation was identified between sequence types and biotypes as determined by the MicroScan® WalkAway®-96 Plus System (Table 2).
Bioinformatics tools predicted the presence of a large number of resistance genes known to confer resistance to a wide range of major antibiotic classes. Genes for resistance to aminoglycosides, beta-lactams, fluoroquinolones, macrolides, lincosamides, phenicols, sulphonamides and tetracyclines were detected in most of the isolates (Tables 3 and 4). An OXA-51-like carbapenemase (OXA-66) was found in all isolates, followed by OXA-23, which was found in more than 90% of the isolates. An OXA-40-like carbapenemase (OXA-72) was found in only 5 strains. No other carbapenem-hydrolyzing OXA-type, NDM-type, VIM-type or IMP-type carbapenemases were detected in our study population (Tables 3 and 4).

Discussion
Good agreement is generally obtained when genomic data are compared with phenotypic results generated by commercially available automated systems for identification and antimicrobial susceptibility testing, such as MicroScan or Vitek 2. These systems provide acceptable and reliable species identification and antibiotic susceptibility profiles for the routine clinical setting and offer excellent assistance in patient management. However, for atypical strains or during hospital outbreaks, these phenotypic data usually lack the resolution required to understand complex resistance phenotypes, outbreak clones, colonization dynamics and the species identity of poorly differentiated organisms. In routine hospital outbreak investigation, many molecular biology techniques are used to understand the genetic basis of resistance to a single antibiotic and/or to determine the genotype(s) responsible for the outbreak. These molecular tools include DNA sequencing of tens to hundreds of target genes in search of resistance and typing markers, in order to resolve the resistance mechanisms and determine the genotype of each outbreak isolate.
With currently available next generation sequencing technology, a huge amount of sequence data becomes available for each clinical isolate, which can provide immediate solutions to microbiology diagnostic problems. In the current study, we demonstrate the power of currently available bioinformatics tools that can analyze whole genome sequence data and provide clinically relevant results within a time frame short enough to influence patient care. Species identification and antimicrobial susceptibility testing, the most important routine duties of the clinical microbiology laboratory, can be addressed as soon as the bacterial genomic data become available from the DNA sequencing platform [11,15]. However, there is still a need for better bioinformatics tools that take DNA sequence data directly, run an automated bioinformatics workflow and follow it with automated data interpretation to generate easily understandable clinical reports.
Whole genome sequence data can also be used to answer questions beyond the daily needs of routine clinical microbiology. This is particularly useful in outbreak investigations, when the genetic relationship between different clinical isolates of the same species needs to be determined. In this study, several tools were used to study the clonal relationships between a relatively large number of clinical isolates of A. baumannii from two closely related hospitals. Using draft genome sequences, i.e., assembled contigs, multilocus sequence types were determined using the free web-based tool from the Center for Genomic Epidemiology [10,14]. A similar tool, SRST2, is also available from Illumina BaseSpace; this application reports the presence of sequence types and/or reference genes from a database of sequences for virulence genes, resistance genes and plasmid replicons.

Table 3. Antibiotic resistance genes in Acinetobacter baumannii. SRST2 version 1.0.0 (Illumina BaseSpace) was used to predict the resistance genes using the ARG-ANNOT database. Only genes detected with >90% coverage are reported.

In this study, 40 clinical isolates were multiplexed in one MiSeq sequencing run using a version 3 paired-end library with 600 sequencing cycles. Only one sample failed sequencing; the remaining samples produced enough sequence data for full characterization of the isolates (Table 1). A larger number of bacterial genomes can be studied in one batch using sequencing platforms with higher data output, such as NextSeq and HiSeq. Therefore, in large clinical microbiology laboratories, larger sample sets can still be sequenced and genomic reports generated within the same time frame. In routine microbiology, no single test produces comparable data with similar power. In this study, only species, antibiotic resistance genes and sequence types were determined. However, many other features can be studied from the same genome sequence data using freely available and user-friendly tools. The advancement of sequencing technology has fostered the development of several tools for immediate detection of virulence genes, plasmid profiling, serotype prediction and much more [19][20][21].
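How many isolates fit in one run is a matter of simple arithmetic: the mean sequencing depth per genome is the run yield divided by the number of pooled samples times the genome size. The sketch below illustrates this calculation. The run yield (about 15 Gb for a version 3, 600-cycle MiSeq run) and the genome size (about 4 Mb for A. baumannii) are nominal assumptions used for illustration and were not measured in this study; the function name expected_depth is likewise hypothetical.

```python
def expected_depth(
    run_yield_bp: float, n_samples: int, genome_size_bp: float
) -> float:
    """Mean fold-coverage per genome, assuming perfectly even pooling."""
    if n_samples <= 0 or genome_size_bp <= 0:
        raise ValueError("n_samples and genome_size_bp must be positive")
    return run_yield_bp / (n_samples * genome_size_bp)


# ASSUMPTIONS (not taken from this study):
ASSUMED_RUN_YIELD_BP = 15e9    # nominal output of one MiSeq v3 600-cycle run
ASSUMED_GENOME_SIZE_BP = 4e6   # approximate A. baumannii genome size

if __name__ == "__main__":
    depth = expected_depth(ASSUMED_RUN_YIELD_BP, 40, ASSUMED_GENOME_SIZE_BP)
    print(f"Expected mean depth for 40 pooled isolates: {depth:.1f}x")
```

Under these assumptions, 40 pooled isolates would each receive roughly 94-fold mean coverage. Real depth is lower and less even, because libraries were pooled without normalization and low-quality read ends are trimmed before assembly.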
The presence of multiple sequence types among our A. baumannii isolates indicates that the apparent outbreak of resistance was not caused by a single clone. The most prevalent sequence type was ST195, accounting for 47% of all A. baumannii isolates. Similar results were recently reported from the same region [22,23]. ST195 and ST557 were reported by Alyamani and his group in isolates collected from the same city [22], while ST195 and ST208 were reported by Zowawi et al. [23] in isolates representing the Arabian Gulf region. ST195, ST208 and ST218 were found to be closely related, differing from each other in only one allele. ST195 in A. baumannii has also been reported from many other regions, such as India, China and Malaysia [24][25][26]. In addition to the known sequence types, three novel sequence types were found among seven isolates collected from both hospitals (Table 2). Zowawi et al. [23] also reported three novel sequence types in their Arabian Gulf region study. In this study, the novel sequence types were curated and assigned new sequence type numbers (ST1286, ST1287 and ST1288) under the Oxford scheme at the A. baumannii MLST database.
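The statement that closely related sequence types differ in only one allele corresponds to the notion of a single-locus variant: two seven-locus profiles that are identical at six loci. A minimal sketch of that comparison follows. The three profiles are hypothetical placeholders and are not the real allele profiles of ST195, ST208 or ST218; the function names are illustrative.

```python
from typing import Sequence


def allele_differences(profile_a: Sequence[int], profile_b: Sequence[int]) -> int:
    """Count the loci at which two MLST allele profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def is_single_locus_variant(
    profile_a: Sequence[int], profile_b: Sequence[int]
) -> bool:
    """True when the profiles differ at exactly one locus."""
    return allele_differences(profile_a, profile_b) == 1


# Hypothetical seven-locus profiles (placeholder allele numbers only).
PROFILE_X = (1, 2, 3, 4, 5, 6, 7)
PROFILE_Y = (1, 2, 3, 4, 5, 8, 7)   # differs from X at one locus
PROFILE_Z = (9, 2, 3, 4, 1, 6, 7)   # differs from X at two loci

if __name__ == "__main__":
    print(is_single_locus_variant(PROFILE_X, PROFILE_Y))
    print(is_single_locus_variant(PROFILE_X, PROFILE_Z))
```

Grouping sequence types that are single-locus variants of one another is a common way to recognize a clonal complex, which helps explain why several related sequence types can circulate in the same hospitals.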
Using genome sequence data, genes conferring resistance to a wide range of major antibiotic classes were predicted, consistent with the phenotypic data obtained by MicroScan. Different carbapenem resistance genes have been reported in different studies in our region [22,23,[27][28][29][30][31]. However, most of these studies used PCR-based detection, which needs careful design to ensure coverage of all carbapenem resistance genes. In addition, PCR-based detection of resistance markers can hardly be called "molecular characterization", even for a single class of antibiotics. Therefore, the power of whole genome sequencing remains unmatched for screening acquired and naturally occurring resistance mechanisms, not only for carbapenems but for all known antibiotic classes. In our study, a large number of resistance mechanisms were identified (Tables 3, 4 and 5). Only whole genome sequence data were used to predict these resistance mechanisms. Several user-friendly bioinformatics tools were tested for this purpose, and their predictions were consistent with each other (Tables 4 and 5). One of these tools is SRST2 on Illumina BaseSpace, which can provide clinically relevant antibiotic resistance data within a time frame short enough to influence patient care.
In classical molecular hospital epidemiology, antibiogram data are usually combined with genotyping results to trace the source of infections and to understand colonization patterns in patients, healthcare workers and the hospital environment. However, when whole genome sequence data become available for clinical isolates, higher-resolution molecular epidemiology data will help in identifying complex outbreak dynamics and evolution. With genome data, a far wider range of features can be studied and proper molecular characterization of microbes can be achieved.
In conclusion, next generation sequencing data are transforming routine clinical microbiology services. In the near future, genomics-based characterization will replace a number of currently used microbiology techniques, such as routine bacterial identification, susceptibility testing and serotyping. Microbial genotyping, which is normally carried out during hospital outbreak investigations or as part of research projects, will become part of routine clinical microbiology reports. In the near future, newly admitted hospital patients will routinely be screened for all known antibiotic resistance mechanisms instead of only being screened for carbapenem or methicillin resistance.