ABSTRACT
Next-generation sequencing (NGS) of bacterial genomes has recently become more accessible and is now available to the routine diagnostic microbiology laboratory. However, questions remain regarding its feasibility, particularly with respect to data analysis in nonspecialist centers. To test the applicability of NGS to outbreak investigations, Ion Torrent sequencing was used to investigate a putative multidrug-resistant Escherichia coli outbreak in the neonatal unit of the Mercy Hospital for Women, Melbourne, Australia. Four suspected outbreak strains and a comparator strain were sequenced. Genome-wide single nucleotide polymorphism (SNP) analysis demonstrated that the four neonatal intensive care unit (NICU) strains were identical and easily differentiated from the comparator strain. Genome sequence data also showed that the NICU strains belonged to multilocus sequence type 131 and carried the blaCTX-M-15 extended-spectrum beta-lactamase gene. Comparison of the outbreak strains with all publicly available complete E. coli genome sequences showed that they clustered with neonatal meningitis and uropathogenic isolates. The turnaround time from a positive culture to the completion of sequencing (prior to data analysis) was 5 days, and the cost was approximately $300 per strain (for reagents only). The main obstacles to mainstream adoption of NGS technologies in diagnostic microbiology laboratories are currently cost (although this is decreasing), a paucity of user-friendly and clinically focused bioinformatics platforms, and a lack of genomics expertise outside the research environment. Despite these hurdles, NGS technologies provide unparalleled high-resolution genotyping in a short time frame and are likely to be widely implemented in diagnostic microbiology in the next few years, particularly for epidemiological investigations (replacing current typing methods) and the characterization of resistance determinants.
Clinical microbiologists need to familiarize themselves with these technologies and their applications.
INTRODUCTION
Research in the field of pathogen biology has been transformed over the past two decades by whole-genome sequencing, beginning with the complete sequencing of the Haemophilus influenzae genome in 1995 (1). This has led to significant advances in the study of molecular epidemiology, virulence, antimicrobial resistance, and vaccinology and in the understanding of complex microbial communities. More recently, the development of high-throughput (or “next-generation”) sequencing technologies has, for the first time, brought these methods within the financial and technical reach of a medium or large diagnostic microbiology laboratory (2).
Next-generation sequencing (NGS) methods are also being used in smaller-scale local projects to determine epidemiology in outbreak settings (2, 3), to examine the development of resistance mutations during antibiotic therapy in a single patient (4, 5), and to identify bacteria in place of 16S rRNA gene sequencing (6). However, these technologies have been slow to enter the diagnostic laboratory. Here, we conducted a pilot project to assess the feasibility and practicability of applying NGS methods to common clinical questions in a diagnostic microbiology laboratory setting.
Outbreak description. An outbreak of extended-spectrum beta-lactamase (ESBL)-producing Escherichia coli was suspected in the neonatal intensive care unit (NICU) of the Mercy Hospital for Women, Melbourne, Australia. Multidrug-resistant Gram-negative strains had not previously been detected in this NICU. A screening program was then implemented in which all babies in the NICU were screened for ESBL E. coli carriage by rectal swab over a 48-h period. The rectal swabs were cultured directly onto selective chromogenic medium (chromID ESBL agar, bioMérieux), and colonies growing on this medium were identified and tested for antibiotic susceptibility on the Vitek 2 Compact system (bioMérieux). Aside from the index case, three of 33 neonates (9%) screened positive for ESBL E. coli carriage.
The first isolate (E. coli BPH0657) was cultured from the blood of the index case, a male twin born at 26 weeks of gestation who developed fatal sepsis and meningitis 16 days after birth. He was treated empirically with intravenous cefotaxime and gentamicin, according to NICU protocol, and died before the culture results became available. The second isolate (E. coli BPH0530) was cultured from an eye swab from the index case's twin brother, who was also positive on rectal swab screening. The twins were born to a paraplegic mother who was managed by a tertiary spinal unit and had a history of recurrent urosepsis.
The two other NICU isolates (E. coli BPH0532 and BPH0658) were detected in neonates by rectal swab screening (asymptomatic colonization). The four isolates had identical biochemical profiles and antibiograms, consistent with an ESBL phenotype: ampicillin resistant, cefoxitin susceptible, ceftriaxone resistant, ceftazidime resistant, ceftazidime-clavulanate susceptible, gentamicin resistant, tobramycin resistant, amikacin susceptible, ciprofloxacin resistant, and cotrimoxazole susceptible.
Our laboratory was asked to determine whether these isolates represented a clonal outbreak. As we did not have established methods for clonality testing of Gram-negative organisms, we explored the utility of NGS for investigating this putative outbreak. All four NICU isolates were analyzed.
Because no other ESBL E. coli had previously been isolated from the NICU, another E. coli strain with the same antibiogram (BPH0659) was selected for sequencing as a comparator. This organism was cultured from a fecal sample from an adult patient in an adjacent intensive care unit (ICU).
MATERIALS AND METHODS
Cultures and DNA extraction. A single colony of each isolate was cultured overnight in tryptone soya broth (Oxoid) at 37°C with shaking. DNA was extracted from the broth cultures using the DNeasy Kit (Qiagen).
Genome sequencing and data analysis. Sequencing was performed on the Ion Torrent Personal Genome Machine (PGM; Life Technologies, Guilford, CT) using 316 chips and 100-bp sequencing chemistry. De novo genome assembly and read mapping were performed with CLC Genomics Workbench v5.1 (CLC bio A/S, Denmark), using the fully sequenced neonatal meningitis E. coli strain S88 (GenBank accession no. NC_011742) as the reference (7). The contig FASTA files of the partially de novo-assembled genomes were then explored in Artemis (8).
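Table 1 summarizes assembly quality, but its individual parameters are not reproduced here. One widely used contig-level metric is N50: the largest length L such that contigs of length L or longer together contain at least half of all assembled bases. The minimal sketch below computes it from contig FASTA text of the kind explored in Artemis; whether N50 is among the Table 1 parameters is an assumption, and the contig lengths in the usage line are hypothetical.

```python
def contig_lengths_from_fasta(fasta_text):
    """Return the sequence length of each record in FASTA-formatted text."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(contig_lengths):
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    # Hypothetical contig lengths (bp), not data from this study.
    print(n50([500, 400, 300, 200, 100]))  # 400
```

Walking the contigs from longest to shortest and stopping once half the total is reached avoids computing the cumulative distribution explicitly.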
Epidemiological analysis. For epidemiological analysis, a separate read-mapping pipeline was used to compare the four outbreak isolates with all publicly available complete E. coli genome sequences (7, 9–26) (see Table S1 in the supplemental material). Reads from all genomes were aligned to the E. coli S88 reference using SHRiMP 2.2 (27). Single nucleotide polymorphisms (SNPs) were identified using Nesoni v0.70 (Victorian Bioinformatics Consortium), which compares the aligned reads of each genome against the reference to build a tally of putative differences at each position, including substitutions, insertions, and deletions. Phylogenetic analyses used a distance method based on pairwise comparisons of conserved SNPs among all strains. Split decomposition analysis was performed using uncorrected pairwise (p) distances with bootstrapping, as implemented in SplitsTree4 (28). Published E. coli multilocus sequence typing (MLST) primer sequences (www.mlst.net) were used to locate the corresponding loci in the partially assembled genomes, and the resulting sequences were analyzed using the online MLST database (www.mlst.net).
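SplitsTree4 computes uncorrected p distances itself; the sketch below only illustrates the metric. It assumes the conserved SNP positions have already been extracted into an alignment with one string per strain, and it skips any site where either strain has a gap or an N (pairwise deletion, a choice made here rather than a description of SplitsTree4's handling). Strain labels and sequences in the usage example are hypothetical.

```python
from itertools import combinations

# Characters treated as missing data and skipped (an assumption of this sketch).
MISSING = set("-N")


def p_distance(seq_a, seq_b):
    """Uncorrected p distance: the proportion of compared sites that differ.

    Sites where either sequence is a gap or an N are skipped. Returns 0.0 if no
    site can be compared (a simplifying choice).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(alignment):
    """Map each unordered pair of strain names to their p distance."""
    return {
        (name_a, name_b): p_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    # Hypothetical core-SNP alignment (one character per SNP site).
    toy = {"isolate_A": "ACGTAC", "isolate_B": "ACGTAC", "isolate_C": "TCGAAN"}
    for pair, dist in distance_matrix(toy).items():
        print(pair, round(dist, 3))
```

Isolates with no SNP differences receive a distance of 0, and the full pairwise matrix is the kind of input from which a split decomposition network is built.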
Resistance determinants. At the time of this study, no freely available automated method could screen NGS data for a wide range of antimicrobial resistance determinants. We therefore manually searched the partially assembled genomes, using the search function in Artemis (8), for signature DNA sequences derived from important Gram-negative resistance determinants (29–43) and plasmid replicon types (44) (see Table S2 in the supplemental material). Primer sequences were compiled from the literature by searching PubMed with the terms “multiplex PCR,” “Escherichia coli,” and “resistance”; papers that published primer sequences were included. This search was extensive but not exhaustive.
When primer sequences were detected in the interrogated genomes, the sequence between the forward and reverse primers was extracted and submitted to a BLAST (Basic Local Alignment Search Tool) nucleotide search against the NCBI database (http://blast.ncbi.nlm.nih.gov/). The BLAST results were used to confirm the presence and exact type of each resistance determinant. In addition to these manual primer searches, a local BLAST database was built in CLC Genomics Workbench (CLC bio A/S, Denmark) to screen the partially assembled genomes for resistance determinants.
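The primer-based step above (find a primer pair in a contig, then take the intervening sequence for BLAST) can be sketched as an in silico PCR. This is a minimal illustration, not the authors' Artemis workflow: it finds exact matches only, ignores primers that straddle contig breaks, and the primer and contig sequences in the usage line are hypothetical. The BLAST submission itself is not shown.

```python
# Complement table for unambiguous bases; other characters pass through unchanged.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def extract_amplicon(contig, fwd_primer, rev_primer, include_primers=False):
    """Return the sequence between a primer pair on a contig, or None.

    The forward primer is matched as written and the reverse primer as its
    reverse complement, downstream on the same strand. Both contig strands are
    searched, exact matches only. Only the first forward-primer hit on each
    strand is considered.
    """
    fwd = fwd_primer.upper()
    rev_rc = reverse_complement(rev_primer)
    for strand in (contig.upper(), reverse_complement(contig)):
        start = strand.find(fwd)
        if start == -1:
            continue
        end = strand.find(rev_rc, start + len(fwd))
        if end == -1:
            continue
        if include_primers:
            return strand[start:end + len(rev_rc)]
        return strand[start + len(fwd):end]
    return None


if __name__ == "__main__":
    # Hypothetical primers and contig, for illustration only.
    contig = "TTTTACGTACGGGTTTCATGCAAAAA"
    print(extract_amplicon(contig, "ACGTAC", "TGCATG"))  # GGGTTT
```

Searching both strands matters because assembled contigs have arbitrary orientation relative to the gene being sought.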
RESULTS
Sequencing and genome assembly results. The turnaround time from bacterial culture to sequence generation and preparation for analysis was 5 days. The cost of consumables alone was approximately $300 per strain. The Ion Torrent PGM produced moderate-quality sequence data for all isolates (Table 1). De novo genome assembly and read mapping against E. coli S88 covered 88 to 89% of the reference genome for the outbreak strains and 84% for the comparator strain. Depth of coverage was between 33- and 67-fold (Table 1). The sequence data have been submitted to the NCBI Sequence Read Archive under accession no. SUB142165.
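Breadth of coverage (the fraction of reference positions covered by reads) and depth of coverage (how many reads cover a position) are both derived from a per-position depth profile. The reported values came from CLC Genomics Workbench; the sketch below is an illustration only. It assumes a tab-separated reference/position/depth layout such as that written by `samtools depth -a` (a tool not used in this study), and because it is not stated whether the reported depth is averaged over all reference positions or only covered ones, averaging over all positions is an assumption here.

```python
def depths_from_tsv(lines):
    """Parse depths from 'reference<TAB>position<TAB>depth' lines (blank lines skipped)."""
    return [int(line.split("\t")[2]) for line in lines if line.strip()]


def coverage_stats(depths, min_depth=1):
    """Return (breadth, mean_depth) for a per-position depth profile of a reference.

    breadth: fraction of positions with depth >= min_depth.
    mean_depth: average depth over all positions, including uncovered ones.
    """
    if not depths:
        raise ValueError("empty depth profile")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths), sum(depths) / len(depths)


if __name__ == "__main__":
    # Hypothetical 10-position profile, not data from this study.
    breadth, depth = coverage_stats([0, 0, 10, 20, 30, 40, 0, 0, 0, 0])
    print(f"breadth {breadth:.0%}, mean depth {depth:.1f}-fold")
```

Averaging only over covered positions would give a higher depth figure for the same data, which is why the convention should be stated alongside reported values.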
Table 1. Sequencing and de novo genome assembly quality parameters (Ion Torrent PGM)
Epidemiologic analysis. All four outbreak strains belonged to multilocus sequence type (ST) 131, whereas the comparator strain was similar to ST-23 strains. Genome-wide SNP analysis detected no SNPs among the four outbreak isolates and clearly distinguished them from the local comparator strain (BPH0659) (Fig. 1). These findings strongly indicate that a single strain spread nosocomially within the NICU and that this strain was distinct from the ESBL E. coli comparator strain from our hospital. When the genome sequences of the outbreak strains were compared with all published fully and partially sequenced E. coli genomes, they were most closely related to uropathogenic E. coli strains (Fig. 1). Of note, the global phylogeny of E. coli genome sequences showed clear clustering by clinical grouping, such as enterohemorrhagic E. coli (EHEC) and laboratory strains.
Fig. 1. Phylogenetic analysis of E. coli strains. The four outbreak E. coli strains (BPH0530, BPH0532, BPH0657 [index case], and BPH0658) and the local comparator strain (BPH0659) are compared with publicly available fully and partially sequenced strains in a network inferred by split decomposition analysis of single nucleotide polymorphisms (SNPs). The four outbreak strains are identical by SNP analysis, cluster with uropathogenic strains of E. coli and a neonatal meningitis strain (S88), and are clearly distinct from the local comparator strain. (See Table S1 in the supplemental material for details of the other strains and references.) EHEC, enterohemorrhagic E. coli; EPEC, enteropathogenic E. coli; ETEC, enterotoxigenic E. coli.
Resistance determinants. All four outbreak strains carried blaCTX-M-15, consistent with their ESBL phenotype, as well as blaTEM-1. Genes conferring aminoglycoside resistance (aadA1), tetracycline resistance (tetA), and low-level trimethoprim resistance (dfrA1) were also detected. All four strains carried point mutations associated with quinolone resistance in gyrA (S83L and D87N) and parC (S80I and E84V). An IncFIA plasmid replicon was detected in the outbreak isolates.
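Amino-acid substitutions such as those in gyrA and parC are conventionally written as the reference residue, its position, and the variant residue (for example, S83L). A minimal sketch of deriving such labels, assuming the query and reference protein fragments (for example, translated quinolone resistance-determining regions) are already aligned without gaps and share a numbering start; the sequences in the usage line are toy fragments, not real GyrA sequence.

```python
def substitutions(ref_protein, query_protein, start=1):
    """List amino-acid substitutions between two gaplessly aligned protein
    sequences in <reference residue><position><variant residue> notation.

    `start` is the residue number of the first character in both sequences.
    """
    if len(ref_protein) != len(query_protein):
        raise ValueError("sequences must have the same length (gapless alignment)")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(ref_protein, query_protein), start=start)
        if ref != alt
    ]


if __name__ == "__main__":
    # Toy five-residue fragments numbered from residue 83; NOT real GyrA sequence.
    print(substitutions("SAAAD", "LAAAN", start=83))  # ['S83L', 'D87N']
```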
Despite having the same antibiogram as the outbreak strains, the comparator strain (BPH0659) carried a substantially different set of antimicrobial resistance genes. Although all strains carried blaCTX-M-15, blaTEM-1, tetA, and dfrA1, the comparator strain also harbored blaOXA-1, qnrS1 (a plasmid-mediated quinolone resistance determinant), aac(6')-Ib-cr, and sul1, none of which were present in the outbreak strains.
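Here the discrimination rests on gene content rather than phenotype. A minimal sketch of that comparison as a set operation, using only the genes named in this paragraph (with the acetyltransferase written in its standard form, aac(6')-Ib-cr); aadA1 is left out because its status in the comparator strain is not stated.

```python
def compare_gene_profiles(genes_a, genes_b):
    """Return (shared, only_in_a, only_in_b) as sorted lists of gene names."""
    a, b = set(genes_a), set(genes_b)
    return sorted(a & b), sorted(a - b), sorted(b - a)


# Genes named in the comparison above (aadA1 omitted: comparator status not stated).
outbreak = ["blaCTX-M-15", "blaTEM-1", "tetA", "dfrA1"]
comparator = ["blaCTX-M-15", "blaTEM-1", "tetA", "dfrA1",
              "blaOXA-1", "qnrS1", "aac(6')-Ib-cr", "sul1"]

if __name__ == "__main__":
    shared, outbreak_only, comparator_only = compare_gene_profiles(outbreak, comparator)
    print("shared:", shared)
    print("comparator only:", comparator_only)
```

Two isolates with identical antibiograms can still differ in the genes underlying that phenotype, which a presence/absence comparison like this exposes directly.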
DISCUSSION
Here, we conducted a pilot project to determine whether NGS technology can be applied to a clinical infection control question, and we identified the challenges that must be overcome before its use in the diagnostic microbiology laboratory becomes routine. Using a single test methodology, we conducted a local outbreak investigation, placed our strains in the context of worldwide epidemiology, and characterized their resistance genes. We estimate the sequencing cost to be approximately $300 per strain (excluding analysis), compared with local costs of approximately $150 per strain for pulsed-field gel electrophoresis (PFGE) and approximately $120 per strain for multilocus sequence typing (MLST). However, with multiplexing, sequencing costs of less than $100 per isolate are possible on platforms such as the MiSeq benchtop sequencer (Illumina Technologies). We also estimate our real-world turnaround time to be as little as 5 days from a positive culture to sequence completion (prior to data analysis, which depends on the clinical question being investigated). This time may fall to as little as 24 h as NGS technologies advance.
Using SNP analysis of the E. coli core genome, we confirmed that our four outbreak isolates were identical, supporting the hypothesis of secondary spread of this ESBL E. coli strain within the NICU. The isolates were also identified as ST-131, a successful E. coli clone commonly associated with multidrug resistance, particularly through carriage of blaCTX-M genes and especially blaCTX-M-15. E. coli ST-131 has recently been described as a worldwide pandemic clone, most commonly causing community-onset antimicrobial-resistant infections, particularly urinary tract infections (45). This clone has only recently been reported in our region, and our findings add to the epidemiologic data on its spread in Australia; rapid recognition of this clone in the NICU further heightened concern about the outbreak.
As our NICU has had a stringent policy of restricted antimicrobial use, this outbreak marked the first time that such a resistant enteric Gram-negative organism had been isolated in the unit. Given the rising prevalence of these organisms, these cases illustrate the growing need to consider maternal risk factors for colonization with resistant Enterobacteriaceae (including medical history, prior antibiotic therapy, and travel history) when managing a septic neonate.
We also characterized the resistance genes of this outbreak strain, identifying the blaCTX-M-15 ESBL gene (the most common ESBL gene worldwide) (46), fluoroquinolone resistance mutations, and genes encoding aminoglycoside, tetracycline, and low-level trimethoprim resistance. Sequencing also discriminated between two strains with identical antibiograms by revealing substantially different sets of antimicrobial resistance genes.
As the costs of NGS continue to fall, as personal (benchtop) genome sequencers (compact, relatively low-cost platforms suited to the diagnostic laboratory setting) become more widely available, and as turnaround times shorten, whole-genome sequencing of pathogens will very soon be within reach of the routine diagnostic laboratory (2). However, as others have noted, data manipulation and interpretation, rather than genome sequencing itself, are likely to be the rate-limiting steps in NGS application (2, 47, 48). Although the situation is improving, few user-friendly bioinformatics software platforms are currently available to diagnostic microbiology scientists and doctors, who may not have extensive knowledge of genomics and bioinformatics (47, 49). There is also an urgent need for scientists and clinical microbiologists to improve their understanding of genomics before these technologies can be applied to clinical questions and before sequencing results and their limitations can be accurately conveyed to clinicians (49) (Table 2).
Table 2. Potential current and future applications of NGS in the diagnostic microbiology laboratory, and limitations to be addressed before widespread NGS implementation
Perhaps the most promising application of NGS technologies is molecular epidemiology, where they offer genome-level resolution for typing. NGS has the potential to provide real-time, portable, digital, and clinically relevant molecular typing of isolates in outbreak investigations, at costs that will very soon approach those of older, more labor-intensive typing methods (2, 50). However, further studies of the rates and modes of genetic evolution in different pathogens are required before NGS can be applied on a large scale in this area. In the medium term, it is important to ensure that sequencing data remain backward compatible with current typing methods, such as MLST in the case of E. coli (51). More sequence data are also needed for less common organisms, which might otherwise be neglected in sequencing studies (2).
Of course, not every multidrug-resistant organism or outbreak will require the use of NGS, especially in the short term before this technology becomes more commonplace in the diagnostic laboratory. However, we have demonstrated here its potential utility in a common clinical scenario and have identified some of the challenges that we face as a community of scientists and clinicians before its widespread implementation. Strong partnerships between experts in the fields of sequencing, genome assembly and annotation, molecular epidemiology, and bioinformatics will be required to create user-friendly, streamlined workflows before NGS can be successfully applied in diagnostic laboratories.
ACKNOWLEDGMENTS
We thank the scientists from the Austin Hospital microbiology laboratory for their assistance and Maree Sommerville for providing epidemiologic data.
B.P.H. is supported by a fellowship from the National Health and Medical Research Council (NHMRC), Australia.
FOOTNOTES
- Received 18 December 2012.
- Returned for modification 18 January 2013.
- Accepted 6 February 2013.
- Accepted manuscript posted online 13 February 2013.
Supplemental material for this article may be found at http://dx.doi.org/10.1128/JCM.03332-12.
- Copyright © 2013, American Society for Microbiology. All Rights Reserved.