ABSTRACT
Whole-genome sequencing has taken a leading role in epidemiologic studies of tuberculosis, but thus far, its real-time clinical utility has been low, in part because of the requirement for culture. In their report in this issue, Votintseva et al. (A. A. Votintseva, P. Bradley, L. Pankhurst, C. del Ojo Elias, M. Loose, K. Nilgiriwala, A. Chatterjee, E. G. Smith, N. Sanderson, T. M. Walker, M. R. Morgan, D. H. Wyllie, A. S. Walker, T. E. A. Peto, D. W. Crook, and Z. Iqbal, J Clin Microbiol 55:1285–1298, 2017, https://doi.org/10.1128/JCM.02483-16 ) present a new method for extracting Mycobacterium tuberculosis DNA directly from smear-positive respiratory samples, making it feasible to generate drug resistance predictions and phylogenetic trees in 44 h with the Illumina MiSeq. They also illustrate the potential for a <24-h turnaround time from DNA extraction to clinically relevant results with Illumina MiniSeq and Oxford Nanopore Technologies MinION. We comment on the promise and limitations of these approaches.
The views expressed in this Commentary do not necessarily reflect the views of the journal or of ASM.
TEXT
Whole-genome sequencing (WGS) is becoming a mainstay in epidemiologic studies of tuberculosis (TB), with numerous studies demonstrating greater resolution of transmission with WGS than with classical molecular typing methods (reviewed in reference 1). While accurate resolution of transmission networks is important for public health, it should be noted that countries with the greatest burden of TB often lack the resources to conduct such epidemiologic investigations. In such high-incidence, resource-limited settings, diagnosis and treatment remain the greatest priority for TB control. A role for WGS in the clinical management of TB has been recently discussed (2, 3). As comparably inexpensive and rapid tests for the detection of Mycobacterium tuberculosis are already available, the application to drug resistance prediction is where WGS may ultimately have the greatest impact. We have therefore focused this commentary on this facet of care.
Inadequate access to drug susceptibility testing (DST) and long delays in reporting of drug resistance pose serious challenges to TB control. Phenotypic DST is currently the “gold standard” for determining resistance to antibiotics but can take weeks to months for results, as specimens must be sent to centralized laboratories for culture. Given these delays, rapid molecular tests such as Xpert MTB/RIF (Cepheid Inc., Sunnyvale, CA, USA) and line probe assays (LPAs) have been endorsed by the World Health Organization to complement culture-based DST. Although Xpert MTB/RIF and LPAs have enhanced the early identification of drug resistance, these tests can miss resistance-conferring mutations that are outside the locus of interest (e.g., the rpoB I491F mutation found to cause rifampin resistance is not detected by Xpert MTB/RIF [4]). They are also unable to differentiate between nonsynonymous mutations and synonymous mutations (5), the latter of which are not thought to cause resistance, resulting in false-positive predictions of resistance.
In contrast to these targeted, rapid molecular tests, WGS can theoretically identify all mutations at once and predict the functional effect of these mutations, i.e., whether or not they would likely cause antibiotic resistance. Numerous mutations in the pncA gene and inconsistent, unreliable DST (6) suggest that WGS is the best approach for diagnosing resistance to the drug pyrazinamide. This drug is not only a cornerstone of first-line regimens but also is critical to novel drug combinations (e.g., BPaZM [bedaquiline, pretomanid, pyrazinamide, and moxifloxacin]) in late-phase trials (7). In addition, WGS offers the potential to detect novel drug mutations; this is of particular interest as new antimicrobials become available and are included in treatment regimens for multidrug-resistant TB (7). With WGS, several resistance-connoting mutations have already been identified for bedaquiline and delamanid (8). With the potential to diagnose and predict drug resistance (as well as delineate transmission networks for public health) all in a single test, WGS therefore has the potential to revolutionize TB control programs. To do so, several obstacles must first be overcome in terms of speed (i.e., reducing the turnaround time [TAT] to results), accuracy of predicting phenotypic resistance, and resource requirements.
First, to have the greatest clinical utility and meet the needs outlined by key stakeholders in the TB community (9), the TAT needs to be short, ideally with results available on the same day as testing. Because of several technical difficulties with performing WGS on clinical samples (reviewed in reference 10), thus far, WGS has been conducted predominantly on M. tuberculosis DNA extracted from culture. By this approach, a head-to-head comparison (11) between WGS and conventional DST found no difference in the TATs when real-life delays (as would be experienced in clinical practice) were considered; the median TAT from positive mycobacterial growth indicator tube (MGIT) to WGS-based resistance reports was 31 days (interquartile range [IQR], 21 to 60), while the median time from a positive MGIT result to a DST report was 25 days for conventional DST (IQR, 14 to 32). To reduce losses to follow-up and potentially have an immediate impact on patient care, WGS would ideally be applied directly to clinical samples, as is done with Xpert MTB/RIF.
Thus far, only two studies that attempted WGS on such samples, with various degrees of success, have been published (12, 13). In a proof-of-concept study published in 2015, Doughty et al. extracted M. tuberculosis DNA directly from clinical samples and sequenced it with Illumina MiSeq (Illumina, San Diego, CA); however, while TB could be diagnosed, the M. tuberculosis DNA obtained was insufficient for drug resistance prediction because of contamination with human DNA (12). A second study published the same year by Brown et al. (13) performed targeted enrichment by using oligonucleotide baits to capture M. tuberculosis DNA prior to sequencing with Illumina MiSeq; this approach yielded a ≥20× depth of coverage (the number of sequencing reads that map to a given position in the reference genome, on average) and >98% genome coverage (the percentage of the reference genome with at least one read mapped to it) for 20/24 smear-positive, culture-positive samples (83%). These quality control parameters are in line with early studies that performed WGS of DNA extracted from culture and were sufficient for identification to the species level and drug resistance prediction. However, this method was expensive and had technical requirements potentially beyond the capacity of most microbiology laboratories.
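The two quality control parameters defined above can be illustrated with a minimal sketch. It assumes a list of per-position read depths is already available (for example, from a mapping tool); the function name and the toy values are hypothetical and are not taken from either study.

```python
from typing import Sequence, Tuple


def coverage_metrics(depths: Sequence[int]) -> Tuple[float, float]:
    """Return (mean depth of coverage, percent genome coverage).

    depths[i] is the number of reads mapped over reference position i.
    Genome coverage counts positions with at least one mapped read,
    matching the definition given in the text.
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    mean_depth = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= 1)
    return mean_depth, 100.0 * covered / len(depths)


# Toy 10-position "genome" (illustrative values only).
toy_depths = [0, 25, 30, 22, 18, 0, 27, 31, 24, 23]
mean_depth, pct_covered = coverage_metrics(toy_depths)
print(f"mean depth: {mean_depth:.1f}x, genome coverage: {pct_covered:.1f}%")
```

Note that a sample can reach a high mean depth while still leaving positions uncovered, which is why both parameters are reported together.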
In this issue of the Journal of Clinical Microbiology, Votintseva et al. (14) present a new approach for extracting M. tuberculosis DNA from direct respiratory specimens without enrichment. This method was faster than that described in reference 13 (the TAT, including MiSeq sequencing, was estimated at 44 h for the method of Votintseva et al. and 50 h for that of Brown et al.) and less expensive (96 versus 203 Great Britain Pounds per sample, respectively, for reagents, extraction, and sequencing). All direct samples were correctly identified as M. tuberculosis complex (95% were identified to the species level as M. tuberculosis). The overall quality control metrics were lower than those in reference 13, with a depth of coverage of >12× and ≥90% genome coverage for only 21/37 (57%) of the smear-positive, culture-positive samples. As direct samples were partitioned for routine clinical use prior to analysis, it is possible that this influenced the quantity of M. tuberculosis DNA available; however, sample volume was not associated with the yield of DNA in univariate and multivariate models.
In addition to Illumina MiSeq, the authors also tested and timed two other sequencing technologies, Illumina MiniSeq and Oxford Nanopore MinION (Oxford Nanopore Technologies, Oxford, United Kingdom). For these platforms, DNA was extracted from cultured Mycobacterium bovis BCG by a previously reported protocol (15). Pure BCG DNA was then sequenced, as well as BCG DNA that was experimentally added (spiked) at specified concentrations (ranging from 5% to 15%) to smear-negative, culture-negative sputum. A PCR amplification step using a new protocol developed by the authors was performed prior to sequencing with MinION. The TAT from DNA extraction to complete resistance prediction (and phylogenetic placement) was 16 h for MiniSeq, while the TAT to these results for a single sample of M. bovis BCG DNA spiked at a 15% concentration was estimated at 12.5 h for a MinION 9.4 flow cell (actual sequencing was performed over 48 h in total). An advantage of MinION is that data can be analyzed in real time; M. bovis BCG DNA was detected and correctly identified at 1 h while sequencing was still ongoing. Somewhat disappointingly, the authors did not test their novel extraction protocol for direct sequencing of M. tuberculosis in combination with either MiniSeq or MinION. While the authors estimated the potential TAT for M. tuberculosis with the MinION R9.4 flow cell based on the single M. bovis BCG run, to properly assess the performance of the direct sample DNA extraction protocol in combination with this technology, future studies are needed that use real (rather than experimentally generated) respiratory samples obtained from patients positive for M. tuberculosis. Nonetheless, this study serves as a valuable proof of concept, demonstrating the potential for WGS as a same-day test.
In addition to the TAT, the accuracy of WGS-based prediction is also a current barrier to implementation; to be financially feasible for most TB control programs, WGS would optimally serve as a replacement test, eliminating the need for culture altogether. To do so, we need sufficient WGS quality to accurately identify (call) single-nucleotide polymorphisms (SNPs; single base pair changes compared to a reference). The frequency and type of sequencing errors vary across platforms because of differences in the underlying sequencing chemistry (e.g., Illumina MiSeq has an error rate of 0.8%, PacBio RS has an error rate of 12.9%, and MinION has an error rate of 5 to 30% [David Dolinger, Foundation for Innovative New Diagnostics (FIND), Geneva, Switzerland, personal communication]). Such errors must be accounted for in the analysis, with only high-quality base calls retained. In the case of MinION, which Votintseva et al. applied, for the first time ever reported, to Mycobacterium (14), the authors identified a systematic A-to-G error bias in 1D reads, which must be accounted for in the bioinformatic analysis.
In addition to such platform-specific considerations, overall quality control parameters such as depth of coverage and genome coverage also matter when assessing sequencing results. For such parameters, the minimum thresholds necessary for clinical use still need to be determined. With less depth of coverage (and/or less genome coverage), resistance-connoting mutations are more likely to be missed. Using lower thresholds for depth may also reduce our power to detect populations with mixed resistance profiles or rule out false-positive SNPs due to sequencing or mapping errors. Previous studies performing WGS of DNA from cultured M. tuberculosis have required a minimum 8× to 20× depth of coverage at the specific locus of interest to confidently call a SNP (with a minimum average of 40× to 50× across the genome currently recommended). Given the technical difficulty of sequencing from direct samples, these thresholds may not be feasible for this approach. In reference 14, Votintseva et al. required a >3× depth to allow resistance predictions to be made, a threshold met by only 24/37 (65%) of the direct samples of M. tuberculosis sequenced by Illumina MiSeq. Of the 96 possible predictions made for first-line drugs, 92 were concordant with phenotypic DST (96%); the 4 discordant predictions (for rifampin and pyrazinamide) were made on the basis of samples from a single patient that had various phenotypes for these drugs. When experimentally spiked M. bovis DNA was sequenced with MinION, the expected pncA H57D mutation was confidently called for all but the lowest concentration of BCG DNA (5%). However, while this resistance mutation was correctly classified by using a 3× threshold, it is important to note that deeper sequencing was required to confidently exclude SNPs at 174 other loci known to be associated with resistance. 
Given the consequences of both false-positive and false-negative resistance calls, before WGS can be implemented in the diagnostic work flow, quality control thresholds used must therefore undergo thorough validation and ideally be standardized for reproducible clinical use (16, 17).
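To make the effect of a depth threshold concrete, the following is a minimal sketch of a per-locus calling rule. It is not the pipeline used in reference 14: the class, the function, the locus labels, and the 90% allele fraction cutoff are all assumptions chosen for illustration; only the idea of requiring a depth above a threshold before making a call comes from the text.

```python
from dataclasses import dataclass


@dataclass
class LocusPileup:
    name: str        # hypothetical locus label
    ref_reads: int   # reads supporting the reference base
    alt_reads: int   # reads supporting the variant base


def call_locus(p: LocusPileup, depth_threshold: int = 3,
               min_fraction: float = 0.9) -> str:
    """Classify one locus; min_fraction is an illustrative assumption."""
    depth = p.ref_reads + p.alt_reads
    if depth <= depth_threshold:      # a call requires depth > threshold
        return "no_call"
    alt_fraction = p.alt_reads / depth
    if alt_fraction >= min_fraction:
        return "variant"
    if alt_fraction <= 1 - min_fraction:
        return "reference"
    return "mixed"                    # possible mixed population or error


locus = LocusPileup("example_locus", ref_reads=0, alt_reads=8)
lenient = call_locus(locus, depth_threshold=3)    # callable at 8 reads
strict = call_locus(locus, depth_threshold=20)    # too shallow to call
print(lenient, strict)
```

The same eight reads yield a variant call under the lenient threshold and no call under the strict one, which is the trade-off that validation studies would need to quantify.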
In addition to being influenced by WGS data quality, our ability to accurately predict phenotypic resistance is also currently limited by our knowledge of resistance-connoting mutations. The potential for interaction between different mutations, both compensatory and resistance connoting, as well as the strain genetic background (i.e., lineage) (18), further complicates this understanding. While several databases have curated mutations from the literature (e.g., TBDReaMDB [19] and TBDR [16]), it is difficult to evaluate the causality of such mutations and assess for potential interactions absent complete WGS and phenotypic data at the isolate level. Studies using WGS to predict drug resistance (20–22) have shown that the sensitivity and specificity of this approach vary substantially by drug, with the lowest accuracy for second-line anti-TB drugs. As phenotypic DST for second-line drugs is typically performed only if first-line drug resistance is detected, the number of isolates with both genotypic and phenotypic data for these drugs is small. To improve our knowledge of genotypic-phenotypic concordance, large WGS databases with corresponding DST results, incorporating both drug-sensitive and drug-resistant isolates, are therefore needed (16). In response to this, several multinational programs have been initiated (e.g., the CRyPTIC Project at http://modmedmicro.nsms.ox.ac.uk/cryptic/ and ReSeqTB [23] at https://platform.reseqtb.org). These programs are facilitating the collection and curation of WGS data with corresponding phenotypic DST, as well as critical clinical outcome data, from both the public and private sectors, which will enhance our understanding of causal resistance-connoting mutations. Until such time, if WGS is used to predict resistance, phenotypic DST should be performed in parallel, a prospect that is likely cost prohibitive for most TB control programs.
In addition to issues with speed and accuracy, the substantial resources required for WGS are a further obstacle to implementation. A large capital investment is necessary to establish the infrastructure for WGS, which may be particularly difficult in developing countries. Sequencing platforms themselves are expensive; as of January 2017, the Illumina MiniSeq and MiSeq “desktop sequencers” cost approximately $50,000 and $100,000, respectively (excluding consumables). Platforms with even higher throughput, such as PacBio RS P6-C4 (Pacific Biosciences, Menlo Park, CA, USA) and Illumina HiSeq X, cost $675,000 and $1 million, respectively (Dolinger, personal communication). Aside from the platforms themselves, laboratories must also have allocated space, continuous, uninterrupted power supplies, and highly trained personnel to calibrate, operate, and support these platforms (24, 25). Oxford Nanopore Technologies' MinION, recently used to sequence Ebola strains in Guinea (24), has been proposed as a potential solution to some of these implementation issues. The Oxford Nanopore MinION platform itself costs only $1,000 (price as of 3 February 2017 [Oxford Nanopore Technologies], including two flow cells and a starter kit of reagents). However, a maximum of 12 samples can be sequenced at once with the MinION 9.4 flow cell, and as these flow cells are not reusable, a new flow cell must be used for each additional test (at a cost of ∼$500 apiece when purchased in bulk). In terms of laboratory requirements, MinION is more easily transportable and requires less lab space (24, 25) but still requires human intervention for sample preparation. Thus, regardless of the next-generation sequencing platform used, most of the required resources are currently available only in a reference laboratory setting, prohibiting the decentralization of WGS for TB diagnostics (26).
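The flow cell figures quoted above imply a simple per-sample calculation, sketched here under stated assumptions: the ∼$500 bulk price and the 12-sample maximum are taken from the text, while the function name is hypothetical and the calculation deliberately ignores reagents, library preparation, labor, and the platform itself.

```python
def flow_cell_cost_per_sample(samples_per_run: int,
                              flow_cell_cost: float = 500.0,
                              max_samples: int = 12) -> float:
    """Flow cell cost attributable to each sample in a single run.

    Defaults reflect the approximate figures quoted in the text; all
    other consumables are excluded (a deliberate simplification).
    """
    if not 1 <= samples_per_run <= max_samples:
        raise ValueError(f"samples_per_run must be between 1 and {max_samples}")
    return flow_cell_cost / samples_per_run


for n in (1, 6, 12):
    print(f"{n} sample(s) per flow cell: ${flow_cell_cost_per_sample(n):.2f} each")
```

The per-sample share falls only when runs are batched, which may conflict with the goal of a same-day result for an individual patient.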
In addition to these infrastructure requirements, the bioinformatic analysis of WGS data has been a major bottleneck in the widespread use of this method. Recently, substantial advances in this area have been made with the development of rapid, user-friendly tools such as PhyResSE (27), TBProfiler (22), TGS-TB (28) (which uses KvarQ [29]), and Mykrobe Predictor (20) to facilitate resistance prediction. These tools can take raw sequencing files from cultured M. tuberculosis DNA all the way through to resistance profiles (though one needs some understanding of bioinformatic concepts for interpretation). Both Mykrobe and KvarQ can be run offline, though, to our knowledge, Mykrobe is the only rapid resistance tool compatible with the readily transportable MinION platform; this is ideal for use in regions with limited internet access (though periodic updates to include novel resistance-connoting mutations from the literature would be needed). To utilize these tools with the direct sequencing protocol proposed in reference 14, human DNA must be removed a priori; as this involved mapping to a human reference genome, future software iterations will need to automate this step for easier use by frontline technicians without bioinformatic expertise.
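The human DNA removal step could, in principle, be automated along the following lines. This is a minimal sketch rather than the procedure used in reference 14: it assumes reads have already been mapped to a human reference by an external aligner that writes SAM-format text, and it keeps only reads whose SAM FLAG has the 0x4 bit set (the read did not align). The function name and toy records are hypothetical, and paired-end handling is omitted for brevity.

```python
from typing import Iterable, List

UNMAPPED = 0x4  # SAM FLAG bit: the read itself did not align


def non_human_read_names(sam_lines: Iterable[str]) -> List[str]:
    """Return names of reads that did NOT map to the human reference."""
    kept = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED:
            kept.append(name)
    return kept


# Toy SAM records (illustrative only): read2 is unmapped and is kept.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "read3\t16\tchr1\t200\t60\t50M\t*\t0\t0\t*\t*",
]
kept = non_human_read_names(toy_sam)
print(kept)
```

The retained read names would then be used to subset the raw sequencing file before it is passed to a resistance prediction tool.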
In conclusion, WGS has the potential to become the future of TB DST (30)—provided these key considerations in terms of speed, accuracy, and resource requirements are addressed. Studies like that described in reference 14 provide the essential methodological advances and proof of concept needed to help move WGS to the clinical arena and ultimately make real-time sequencing of M. tuberculosis a reality. To fully evaluate the different DNA extraction protocols for direct sequencing, in combination with different sequencing platforms, future studies are needed that employ these methods with the same samples in a head-to-head comparison. Additional studies are also needed to validate bioinformatic thresholds for clinical use and assess the performance of direct sequencing on different specimen types (e.g., smear-negative, culture-positive samples and nonrespiratory samples). Given that countries with the greatest burden of TB are those with more limited resources, lower-cost, decentralized, simpler platforms are essential for any meaningful scale-up.
ACKNOWLEDGMENTS
We thank Marcel A. Behr, McGill University, for his critical review of this report and David Dolinger, FIND, for providing current costing data for next-generation sequencing platforms and discussing issues in comparing these technologies.
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
FOOTNOTES
- Accepted manuscript posted online 15 March 2017.
For the article discussed, see https://doi.org/10.1128/JCM.02483-16.
- Copyright © 2017 American Society for Microbiology.