Journal of Clinical Microbiology, June 2003, p. 2417-2427, Vol. 41, No. 6
0095-1137/03/$08.00+0 DOI: 10.1128/JCM.41.6.2417-2427.2003
Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, Livermore, California 94551
Received 7 February 2002/ Returned for modification 14 April 2002/ Accepted 8 February 2003
TaqMan fluorogenic assays have been shown to rapidly (within 2 h) and successfully identify four viruses that can infect humans (hepatitis B and C viruses [6, 9], Puumala hantavirus [1], and West Nile virus [2]), as well as fungal plant pathogens (12). Not only were these TaqMan assays more rapid than other methods, but they also had sensitivities equal to or greater than those of nested PCR (3).
TaqMan is a fluorogenic probe-based PCR assay. Between the two PCR primers lies an internal oligonucleotide probe carrying a fluorescent reporter at its 5' end and, at its 3' end, a quencher that suppresses the reporter's fluorescence. During each PCR cycle, the internal oligonucleotide hybridizes to the template and, as the upstream primer is extended, is digested by the 5'-to-3' exonuclease activity of the Thermus aquaticus (Taq) DNA polymerase. Because the probe is digested only when the polymerase replicates through the region it occupies, the reporter and quencher are separated only when the target is being amplified. PCR products are therefore detected within minutes by monitoring the fluorescence, which increases exponentially with successive amplification cycles. Thus, a TaqMan assay requires the design of three nucleic acid probes: two PCR primers and one internal oligonucleotide located between them.
One might think that these empirical results argue forcefully for the development of TaqMan probes, or signatures, to distinguish any pathogen that poses a threat to humans, livestock, or crops. Such signatures must meet two requirements. First, they must be species specific, to preclude false-positive results. Second, they must be species-wide (capable of detecting all known strains of a given species), to prevent false-negative results. For the purposes of this paper, we therefore define a signature as a set of three probes suitable for a TaqMan assay that uniquely identifies all strains of a given species. Some argue that species-wide signatures are unnecessary and that only common or currently circulating strains that cause human infection need to be detected. Because our primary focus is bioterrorism, however, we require species-wide signatures: a terrorist might release an old strain or engineer something into a strain that does not normally cause human infection.
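The exponential rise in reporter signal can be illustrated with the standard idealized PCR model, in which each cycle multiplies the amplicon count by (1 + E) for an amplification efficiency E, so that N_c = N_0(1 + E)^c. The minimal Python sketch below assumes that reporter fluorescence is proportional to amplicon copies and that detection occurs at the first cycle whose signal crosses a fixed threshold; the function names, threshold, and 45-cycle cap are illustrative assumptions, not parameters of any instrument.

```python
import math
from typing import Optional


def copies_after(cycle: int, initial_copies: float, efficiency: float = 1.0) -> float:
    """Idealized amplicon count after `cycle` cycles: N0 * (1 + E) ** cycle."""
    return initial_copies * (1.0 + efficiency) ** cycle


def threshold_cycle(initial_copies: float, threshold: float,
                    efficiency: float = 1.0, max_cycles: int = 45) -> Optional[int]:
    """First whole cycle at which the reporter signal (assumed proportional to
    amplicon copies) reaches the detection threshold, or None if it never does."""
    for cycle in range(max_cycles + 1):
        if copies_after(cycle, initial_copies, efficiency) >= threshold:
            return cycle
    return None  # not detected within max_cycles: reported as negative


def crossing_point(initial_copies: float, threshold: float,
                   efficiency: float = 1.0) -> float:
    """Fractional cycle solving N0 * (1 + E) ** c = T, i.e. c = log(T / N0) / log(1 + E)."""
    return math.log(threshold / initial_copies) / math.log(1.0 + efficiency)


print(threshold_cycle(1000, 1e6))  # 10
print(threshold_cycle(10, 1e6))    # 17
```

Under this idealized model, a tenfold drop in starting template delays the crossing point by log2(10), about 3.3 cycles, at 100% efficiency.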
Although we can tailor our signatures to whatever level of specificity is required, our focus on defense against bioterrorism demands species-wide signatures in any case.
Although the empirical studies mentioned above developed TaqMan signatures for the strains and species tested, they did not demonstrate that the probes are species-wide, or that they are species specific when compared against the sequences of a wide selection of other organisms. It may also be desirable to develop strain- or serotype-specific signatures for exact identification. Signatures specific to antibiotic resistance, virulence mechanisms, or other features would also be useful for detecting engineered organisms. These topics are not addressed in this paper.
Here, we present results of analyses to determine whether TaqMan signatures that are both species-wide and unique for human immunodeficiency virus (HIV) and hepatitis A, B, C, and E viruses could be developed. The results obtained with these species are presented to illustrate the complexity of the process of extracting high-quality signatures for many widely divergent viruses with high mutation rates (7). Our charter is to develop signatures for a number of viral and bacterial pathogens listed as threats by the Centers for Disease Control and Prevention and other government agencies, but we are prevented from discussing these organisms for obvious reasons of national security.
To briefly preview our results, we found strains of hepatitis A virus to be sufficiently similar that two signatures that were both species-wide and unique could be determined computationally. Thus, TaqMan assays are good candidates for detection of this species, as a single assay could detect all (sequenced) strains. Strains of hepatitis B, C, and E viruses and of each HIV subtype, in contrast, differed enough that multiple TaqMan signatures would be required to span all clusters of strains in a given species. Considering the cost of TaqMan probes, we conclude that for divergent viruses like HIV, the requirement for multiple signatures to recognize all strains makes TaqMan assays economically unfeasible for large-scale use. However, TaqMan assays could still be feasible in a clinical setting, in which laboratory tests routinely cost $10 to $30.
TABLE 1. Genbank sequence identification (genomic identity) numbers used to find species-wide probes
Sequences suitable for TaqMan assays must satisfy a number of additional specifications; for example, the forward and reverse primers and the internal probe must be contained within a stretch of not more than 300 bp (5). Other requirements for the design of primers and probes are detailed at http://www.appliedbiosystems.com/support/techtools/pcropt/. We used the Primer3 program from the Massachusetts Institute of Technology (http://www-genome.wi.mit.edu/genome_software/other/primer3.html) to select our primers and internal oligonucleotides (probes). Considerable effort went into tuning its many parameters to optimize performance for our needs.
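The 300-bp constraint can be expressed as a simple layout check. This is a minimal sketch assuming 0-based coordinates on the forward strand of one sequence, with the reverse primer recorded by the site it binds there; the class and field names are hypothetical, and the other Applied Biosystems and Primer3 criteria (melting temperatures, lengths, base composition) are deliberately left out.

```python
from dataclasses import dataclass

MAX_AMPLICON_BP = 300  # from the text: primers and probe within <= 300 bp


@dataclass(frozen=True)
class CandidateSignature:
    """Hypothetical record of oligo positions (0-based starts and lengths)
    on the forward strand; the reverse primer is stored by its binding site."""
    fwd_start: int
    fwd_len: int
    probe_start: int
    probe_len: int
    rev_site_start: int
    rev_len: int

    @property
    def amplicon_length(self) -> int:
        # From the first base of the forward primer to the last base of the reverse-primer site.
        return self.rev_site_start + self.rev_len - self.fwd_start

    def satisfies_layout(self) -> bool:
        fwd_end = self.fwd_start + self.fwd_len
        probe_end = self.probe_start + self.probe_len
        # The probe must lie between the primers without overlapping either one.
        probe_between = fwd_end <= self.probe_start and probe_end <= self.rev_site_start
        return probe_between and self.amplicon_length <= MAX_AMPLICON_BP


sig = CandidateSignature(0, 20, 25, 25, 100, 20)
print(sig.amplicon_length, sig.satisfies_layout())  # 120 True
```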
We refer to a combination of forward and reverse primers and a probe that satisfy all TaqMan assay specifications as a "candidate signature." On the basis of these guidelines, we developed code to select all acceptable TaqMan candidate signatures from a given gestalt file. We also searched the reverse complement of the gestalt file for probe sequences, since TaqMan assay restrictions on the G:C ratio might be satisfied by the reverse complement of a sequence but not the sequence itself. Some candidate signatures overlapped; for example, one candidate signature could share primers with another signature, but the probe of one would extend 2 bases longer than that of the other. Candidate signatures in which only the probe differed by virtue of being the forward versus the reverse complement of a sequence were also counted as different candidate signatures.
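Searching both strands for probe sequences can be sketched as follows. The strand rule here is an assumption standing in for the G:C restriction mentioned above: it follows the commonly cited Applied Biosystems guidance that a TaqMan probe should contain more C than G and should not begin with G at its 5' end. Function names are hypothetical, and each acceptable orientation is returned separately because the text counts forward and reverse-complement probes as different candidate signatures.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_strand_ok(probe: str) -> bool:
    """Assumed strand rule: more C than G, and no G at the 5' end."""
    p = probe.upper()
    return p.count("C") > p.count("G") and not p.startswith("G")


def probe_variants(site: str) -> list[str]:
    """Acceptable probe sequences for one site: forward, reverse complement, both, or neither."""
    return [s for s in (site.upper(), reverse_complement(site)) if probe_strand_ok(s)]


print(probe_variants("GGGA"))  # ['TCCC']
print(probe_variants("ACCT"))  # ['ACCT']
```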
Once we had identified suitable candidate signatures, i.e., those that were species-wide, we used the Vmatch software developed by S. Kurtz (S. Kurtz, personal communication [http://www.techfak.uni-bielefeld.de/kurtz/]) to determine whether these potential signatures were unique relative to the sequence data available from GenBank. We compiled the GenBank sequence data into two database files, all_virus and all_microbes. The 137-Mb file all_virus contains 549 distinct species or strains, and the 632-Mb file all_microbes includes 194 species or strains drawn from a wide range of taxa. To date, we have not included fungi in our analyses, although we hope to do so in the future.
The Vmatch software first builds an efficient index (an enhanced suffix array, which supports the same queries as a suffix tree) representing all substrings of the sequences contained in all_virus and all_microbes, the files that constitute our present database of a representative wide variety of viral and microbial DNA sequences. We then eliminated every candidate signature whose primer or probe sequence shared an exact match of 18 or more bases with a nontarget sequence in these files. The value of 18 bases was chosen as appropriate for the TaqMan assay. If all_virus and all_microbes completely represented all organisms found in nature, this screen would, in principle, reduce the probability of a false-positive result for a candidate signature to 0. The advantage of Vmatch over BLAST for this analysis is that Vmatch scales well to large amounts of sequence data: it uses a custom virtual memory scheme to avoid exhausting memory on large jobs. Moreover, Vmatch output is easier than BLAST output to parse when searching for all unique stretches of sequence. All signatures that our colleagues aim to use in field surveillance are validated empirically in rigorous screening experiments. Because the signatures presented here will not be used to detect bioterrorist releases, we do not plan to test them empirically.
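For small data sets, the uniqueness screen can be approximated with a hash set of k-mers in place of Vmatch's index. This sketch is not Vmatch and does not use its interface: it stores every 18-base substring of the nontarget sequences (both strands) and rejects a candidate if any of its oligonucleotides shares an exact 18-base match. A hash set of all 18-mers would be far too memory-hungry for the 137-Mb and 632-Mb files described above, which is the scaling problem an indexed tool such as Vmatch addresses. Names and toy sequences are hypothetical.

```python
K = 18  # exact-match length treated as a potential cross-reaction (from the text)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int = K) -> set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    s = seq.upper()
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def build_background_index(nontarget_seqs: list[str], k: int = K) -> set[str]:
    """Every k-mer on either strand of the nontarget sequences."""
    index: set[str] = set()
    for seq in nontarget_seqs:
        index |= kmers(seq, k)
        index |= kmers(reverse_complement(seq), k)
    return index


def is_unique(oligos: list[str], background: set[str], k: int = K) -> bool:
    """True if no oligo shares an exact k-base match with the background.
    Oligos shorter than k pass trivially."""
    return all(kmers(o, k).isdisjoint(background) for o in oligos)


background = build_background_index(["A" * 10 + "ACGTACGTACGTACGTAC" + "T" * 10])
print(is_unique(["ACGTACGTACGTACGTAC"], background))    # False: exact 18-base match
print(is_unique(["GTACGTACGTACGTACGT"], background))    # False: match on the opposite strand
print(is_unique(["GGGGGGGGGGGGGGGGGGGG"], background))  # True
```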
Modifying the pipeline when multiple signatures are required for species-wide identification. (i) Second approach: PHYLIP clustering. In some cases, the method described above yields no species-wide candidate signatures. We therefore modified the process to determine whether we could subdivide the strains of a species into subsets, each of which could be identified by at least one signature. We aimed to find a minimal set of candidate signatures that would represent all the strains of a target species; that is, every strain must be represented by at least one candidate signature, and some strains may be represented by more than one.
Our first attempt at this modification was to construct phylogenetic trees with the PHYLIP software package (http://evolution.genetics.washington.edu/phylip.html), using a number of tree-building criteria and measures of DNA distance (parsimony, maximum likelihood, fraction of shared bases or shared codons between each pair of strains, etc.). We used these trees to identify clustered subsets of strains and then created separate gestalt files of the sequences conserved within each subset. We hoped to identify a set of two or three signatures whose combination would identify all strains of the species under consideration. This phylogenetic clustering approach did not yield acceptable results, so we attempted an alternative solution.
(ii) Third approach: finding all shared candidate signatures, pruning the list, and specifying a minimal set. Our third approach is diagrammed in Fig. 1. For this approach, we again used the gestalt files generated for each pair of strains within a species to find candidate signatures shared by that pair. Then we searched for the primer and probe sequences of each candidate signature in all the other strains of that species. This generated an exhaustive list of candidate signatures and the strains containing each signature.
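The exhaustive list of candidate signatures and the strains containing each can be represented as a mapping from candidate to a set of strain identifiers. The sketch below assumes a candidate is present in a strain when its forward primer, its probe (on either strand), and the reverse complement of its reverse primer all occur as exact substrings of that strain's sequence; as a simplification it does not check their order or the 300-bp span. All names and the toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def contains_signature(genome: str, fwd: str, probe: str, rev: str) -> bool:
    """Exact-match presence test (simplified: ignores oligo order and amplicon span)."""
    g = genome.upper()
    probe_present = probe.upper() in g or reverse_complement(probe) in g
    # The reverse primer binds the complementary strand, so its site on the
    # forward strand appears as its reverse complement.
    return fwd.upper() in g and probe_present and reverse_complement(rev) in g


def coverage_map(candidates: dict[str, tuple[str, str, str]],
                 strains: dict[str, str]) -> dict[str, frozenset[str]]:
    """For each candidate (fwd, probe, rev), the set of strain IDs that contain it."""
    return {name: frozenset(sid for sid, genome in strains.items()
                            if contains_signature(genome, *oligos))
            for name, oligos in candidates.items()}


strains = {"s1": "AAACTTGGTTAACCCA", "s2": "AAACTTGGTTAAGGGG", "s3": "AAACTTAACCTTCCCA"}
candidates = {"sig1": ("AAAC", "GGTT", "TGGG")}
print(sorted(coverage_map(candidates, strains)["sig1"]))  # ['s1', 's3']
```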
FIG. 1. Flowchart outlining our computational approach for finding potential signatures from DNA sequence data.
Next, we searched for a set of these candidate signatures of minimal size (minimal set) such that at least one candidate signature in the set would be present in every strain of the target species. First, we selected the two or three candidate signatures from the pruned list whose union represented the most strains. We then added the candidate signature that most increased the number of strains represented, and so on, until every strain was represented. Although this greedy algorithm is not guaranteed to find the smallest possible set, in most cases it identifies a set close to that size. We verified that this technique did find the smallest possible sets for HIV-A, HIV-B, and HIV-C by examining all possible combinations of two, then three, then four (and so on) candidate signatures until we identified the smallest set that represented every strain. For HIV-30, however, the algorithm produced a minimal set containing nine candidate signatures; verifying that this was indeed the smallest possible set would have required an exhaustive search over approximately 5.6 × 10^10 possible combinations, which is computationally unfeasible.
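The selection step above is a greedy heuristic for the set-cover problem. The sketch below uses the textbook one-at-a-time greedy rule rather than the paper's initial two- or three-signature seed, and pairs it with the brute-force search described for verification. It assumes `coverage` maps each candidate name to the set of strains that contain it; the names and toy data are hypothetical, chosen so that greedy selection returns three signatures where two suffice.

```python
from itertools import combinations
from typing import Iterable, Optional


def greedy_cover(coverage: dict[str, set], all_strains: Iterable) -> list[str]:
    """Repeatedly add the candidate that represents the most still-uncovered strains."""
    uncovered = set(all_strains)
    chosen: list[str] = []
    while uncovered:
        best = max(coverage, key=lambda name: len(coverage[name] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:
            raise ValueError(f"no candidate represents strains {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gain
    return chosen


def smallest_cover(coverage: dict[str, set], all_strains: Iterable,
                   max_size: Optional[int] = None) -> Optional[list[str]]:
    """Exhaustive search: try all sets of size 1, 2, 3, ... (feasible only for small inputs)."""
    names = sorted(coverage)
    target = set(all_strains)
    limit = max_size if max_size is not None else len(names)
    for size in range(1, limit + 1):
        for combo in combinations(names, size):
            if set().union(*(coverage[n] for n in combo)) >= target:
                return list(combo)
    return None


coverage = {"A": {1, 2, 3, 4}, "B": {1, 2, 5}, "C": {3, 4, 6}}
print(greedy_cover(coverage, range(1, 7)))    # ['A', 'B', 'C']
print(smallest_cover(coverage, range(1, 7)))  # ['B', 'C']
```

The exhaustive search grows combinatorially with the size of the pruned list, which is why it was practical for the HIV subtypes but not for HIV-30.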
Finally, we used the Vmatch software to test whether each candidate signature in the minimal set was unique to the species in question; the candidates that passed this test constitute a minimal species-wide and species-specific set of potential signatures. Signatures for species of national security concern that will be used for detection in the field are rigorously tested by our colleagues in the laboratory (E. A. Vitalis, L. E. Danganan, J. R. Avila, A. Hubbell, L. Ott, T. A. Kuczmarski, L. Radnedge, N. K. Montgomery, C. L. Strout, T. R. Slezak, R. Meyers, and P. M. McCready, Int. Conf. Emerg. Infect. Dis., p. 163. 2002). Here, we present the results for the particular species evaluated only as examples to illustrate our computational methods; we will not use these assays in the field and therefore do not plan to test these potential signatures in the laboratory.
Second approach: PHYLIP clustering. The second approach, in which we attempted to predict clusters based on phylogenetic subgroups (PHYLIP clustering), turned out to be unreliable. Some strains fell within a cluster but shared no candidate signatures satisfying all TaqMan requirements, while other strains were on a branch outside of a cluster yet still shared candidate signatures with members of that cluster (Fig. 2). Consequently, this "searching-by-hand" method was tedious and unfeasible as a means of finding a minimal set of signatures.
FIG. 2. Phylogenetic tree describing the DNA sequence relationships among the HIV strains used in these analyses, constructed with the dnapars and drawgram programs in the PHYLIP package. For strains that share candidate TaqMan signatures with the strain with genomic identity (gi) number 11095910 (enlarged text, fifth from right), the number of (potentially overlapping) candidate signatures shared by each pair is indicated. The strain with gi number 11095910 shares candidate signatures with phylogenetically distant strains but not necessarily with many of the more closely related strains, and the numbers of shared signatures follow no distinct pattern. Thus, phylogenetic proximity does not directly predict whether strains share potential TaqMan signatures.
Third approach: finding a minimal set of candidate signatures. The third approach was successful at identifying a minimal set of signatures for each target species. Table 2 summarizes the total number of candidate signatures shared by at least two strains, the number of candidate signatures in the pruned list, and the number of signatures in the minimal set for detection of each species.
TABLE 2. Number of candidate assays found and size of minimal set
TABLE 3. Minimal sets of candidate TaqMan assays for different groups of HIVa
TABLE 4. Minimal sets of candidate TaqMan assays for four species of hepatitis viruses
Without taking care to select signatures that are conserved among strains, one runs the risk of false-negative results in a detection setting, with obvious, potentially calamitous consequences in a bioterror threat situation. The sequences of the forward primer, reverse primer, and probe used by Weinberger et al. (9) matched the sequences of only 17, 15, and 22, respectively, of the 44 strains that we used in our analyses of hepatitis B virus. The signature used in the assay for hepatitis C virus published by Morris et al. (6) also failed to be conserved among all 27 strains that we examined: only 24, 23, and 22 of the strains matched the forward primer, reverse primer, and probe sequences, respectively.
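Counts like those reported for the published hepatitis B and C assays can be obtained by checking each oligonucleotide against every strain. This sketch assumes exact matching on either strand with no tolerance for mismatches; the matching criteria behind the reported counts are not spelled out here, so treat the function and toy data as hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def oligo_match_counts(oligos: dict[str, str], strains: list[str]) -> dict[str, int]:
    """Number of strains containing each oligo exactly, on either strand."""
    genomes = [g.upper() for g in strains]
    counts: dict[str, int] = {}
    for label, oligo in oligos.items():
        fwd, rc = oligo.upper(), reverse_complement(oligo)
        counts[label] = sum(1 for g in genomes if fwd in g or rc in g)
    return counts


strains = ["AAACGGTT", "AAACTTTT", "CCCCAAAA"]
print(oligo_match_counts({"forward": "AAAC", "probe": "AACC"}, strains))
# {'forward': 2, 'probe': 1}
```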
Thus, our computations illustrate that strains of some species have diverged so much that multiple TaqMan signatures are required to preclude false-negative results in a detection assay. At a cost of approximately $2.30 per reaction mixture for a single-probe reaction or $2.50 per reaction mixture for two-probe reactions (12), a requirement for more than two to three TaqMan analyses with different signatures makes this technique unfeasible for large-scale use. Since a maximum of three assays may be carried out in a single reaction (with potential declines in the sensitivity of a triple-assay reaction compared to that of a single-assay reaction), our analyses suggest that a minimum of three reactions would be necessary to verify the presence of HIV, requiring nine signatures. The total cost for this would be over $7.50. In recent deployments with the signatures that we have developed, tens of thousands of assays were performed. Typically, samples are collected every 4 to 12 h from tens of detectors placed in numerous locations throughout a city. We anticipate widespread use of TaqMan assays in future deployments, perhaps spanning many cities nationally and internationally. Clearly, with hundreds of thousands of assays to be performed, it is essential that costs be kept down, and even with volume pricing, the cost for the detection of a single species is far too high. We are also investigating the use of primers containing degenerate bases, although the drawback of this approach is that degenerate bases decrease the sensitivity and selectivity of the reaction. Nevertheless, our laboratory colleagues have found assays containing degenerate bases to be successful.
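The cost arithmetic in this paragraph can be made explicit. The sketch assumes at most three signatures per reaction and prices every multiplexed reaction at the $2.50 two-probe rate quoted above; since a triple-assay reaction presumably costs more, the result is a lower bound, consistent with the statement that nine HIV signatures would cost over $7.50. The 100,000-sample figure in the usage line is a hypothetical deployment size.

```python
import math

TWO_PROBE_REACTION_COST = 2.50  # USD per reaction mixture, from the text (12)


def reactions_needed(n_signatures: int, assays_per_reaction: int = 3) -> int:
    """Reactions required when at most `assays_per_reaction` signatures share one tube."""
    return math.ceil(n_signatures / assays_per_reaction)


def cost_lower_bound(n_signatures: int, n_samples: int = 1,
                     cost_per_reaction: float = TWO_PROBE_REACTION_COST,
                     assays_per_reaction: int = 3) -> float:
    """Lower bound on cost: every multiplexed reaction priced at the two-probe rate."""
    return reactions_needed(n_signatures, assays_per_reaction) * cost_per_reaction * n_samples


print(reactions_needed(9))                       # 3
print(cost_lower_bound(9))                       # 7.5
print(cost_lower_bound(9, n_samples=100_000))    # 750000.0
```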
Instead of using a minimal set of multiple TaqMan assays or TaqMan assays with degenerate bases, alternative techniques should be considered (Fig. 3), such as custom DNA GeneChips (Affymetrix, Santa Clara, Calif.) or bead-based assays. GeneChips, a commercial product from Affymetrix, detect unknown nucleic acids that have been amplified and labeled with fluorescent dyes by hybridizing them to probes immobilized on a solid surface. Hybrids are detected by mapping fluorescent signals on the surface to specific probes. GeneChips use 20- to 25-mers for highly specific hybridization to detect a sequence (4, 10). Although DNA microarrays that use longer probes of 500 to 5,000 bases may work to distinguish groups of closely related species, they may be less appropriate for strain or species identification, since some hybridization to nonidentical sequences occurs. Such discrimination would matter, for example, if only one of a group of related species were virulent or drug resistant. Bead-based assays are another alternative (11). For example, the LabMAP system is commercially available from Luminex Corp. (Austin, Tex.). Bead-based technologies use uniquely color-coded microspheres whose surfaces carry a capture probe, either a specific antibody or a complementary oligonucleotide. Binding of an analyte to a probe is detected via a fluorochrome-labeled reagent, whose signal is measured by flow cytometry or by microfluidics with laser detection. Although other potential techniques for pathogen identification exist (enzyme-linked immunosorbent assays; other antibody tests with, for example, polystyrene optical fibers; and mass spectrometry, to name a few [www.spectrum.ieee.org/WEBONLY/publicfeature/oct01/bio2.html]), we limit our discussion to TaqMan, GeneChips, and bead-based assays.
FIG. 3. Comparison of the TaqMan assay, bead-based assays, and GeneChips for pathogen detection in terms of costs per reaction (A), numbers of assays per reaction (B), the cost per assay if the maximum number of assays per reaction are performed (the value for GeneChips is $0.002) (C), start-up costs (D), sensitivity in a pure sample (number of genome copies) (E), and speed of detection after the DNA is prepared, that is, the amplification and detection steps (F).
The ability to perform many assays in a single reaction has the additional advantage that one may test for the presence of multiple species in a single reaction. Thus, GeneChips in particular may be especially appropriate for situations in which one does not have any prior knowledge of the pathogens that may be present, since they make an expansive search possible in a single reaction. Ideally, a single "pathogen chip" can be designed to discriminate among a number of different bacteria, viruses, and fungi. Using software similar to that which we have described, one could find 25-mers that discriminate at the level of the genus, species, or strain. This would enable clinicians to identify not only the species but also where that strain originated on the basis of strain-specific sequences. With chips that can hold up to 400,000 different 20-mers, this is a feasible aim. Such a chip might also contain spots to determine whether various antibiotic resistance genes or virulence genes are present.
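A first pass at finding discriminating probe sequences for such a chip could enumerate k-mers that occur in every target strain and in no nontarget sequence. This is a minimal exact-match sketch (k = 25 by default, matching the probe length discussed above) with hypothetical names; it ignores hybridization to near-matches, probe thermodynamics, and chip capacity, all of which a real design would have to address. At least one target sequence is required.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> set[str]:
    s = seq.upper()
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def both_strands(seq: str, k: int) -> set[str]:
    return kmer_set(seq, k) | kmer_set(reverse_complement(seq), k)


def discriminating_kmers(targets: list[str], nontargets: list[str], k: int = 25) -> set[str]:
    """k-mers shared by every target strain and absent from both strands of all nontargets."""
    shared = set.intersection(*(kmer_set(t, k) for t in targets))
    background = set().union(*(both_strands(n, k) for n in nontargets))
    return shared - background


# Toy example with k = 4 so the sequences stay readable.
print(sorted(discriminating_kmers(["ACGTAC", "TTACGTA"], ["GGGTACGG"], k=4)))  # ['ACGT']
```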
At present, the first disadvantage of GeneChips is expense: the start-up costs of purchasing the equipment and making custom masks may surpass $200,000, depending on the institution, although with midi or mini arrays, this price could be reduced. Then, the cost per reaction is another $400 for chips with a large number of assays, again with some variation, depending on the institution and scale. With advances in technology, prices may drop. This compares with a start-up cost for TaqMan analyses of $47,000 for equipment and $2.30 per reaction for a single-probe assay, one-target reaction or $2.50 per reaction for a dual-probe assay, two-target reaction (Applied Biosystems) (12).
The second disadvantage of GeneChips compared to the TaqMan assay is the time required to get an answer. Rapid pathogen identification facilitates efforts to minimize exposure. The analysis time required for PCR amplification and detection is about 7 h for GeneChips but is only 1 to 3 h for the TaqMan assay (W. J. Wilson and K. S. Venkateswaran, personal communication). Bead-based assays can also be completed relatively rapidly (2 to 4 h).
DNA-based bead assays, such as those using Luminex microspheres, are both relatively rapid and relatively inexpensive, and a single multiplex reaction can assay for up to 100 analytes with only minor increments in the cost per reaction ($3.00 for a single analyte and an additional $0.25/analyte when multiplexing is used [K. S. Venkateswaran, personal communication]). However, this is 30 to 35% more expensive than a TaqMan reaction if one performs only one to three assays. In addition, unlike the TaqMan assay, bead-based assays can test for proteins as well as DNA. Bead-based assays may be appropriate when one wants to detect whether any of a moderate number of species or toxins are present, or when the species under consideration is too divergent for feasible detection by a TaqMan assay with only one or two probes. At present, the sensitivities of microsphere assays are limited by the extent of PCR amplification (Venkateswaran, personal communication). Such assays for HIV have been shown to have a detection limit of 500 RNA molecules, or 250 virions (totaling approximately 10 fg of DNA) (8).
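The comparison between bead-based and TaqMan pricing can be checked with the figures quoted above. The sketch assumes that the $0.25 multiplexing charge applies to each analyte beyond the first, which is one reading of the text, and compares only the one- and two-analyte cases because no TaqMan price is given for a triple-assay reaction. Names are hypothetical.

```python
TAQMAN_SINGLE = 2.30  # USD, single-probe, one-target reaction (from the text)
TAQMAN_DUAL = 2.50    # USD, dual-probe, two-target reaction (from the text)


def bead_reaction_cost(n_analytes: int, base: float = 3.00, per_extra: float = 0.25,
                       max_analytes: int = 100) -> float:
    """Assumed model: $3.00 for the first analyte plus $0.25 for each additional one."""
    if not 1 <= n_analytes <= max_analytes:
        raise ValueError(f"n_analytes must be between 1 and {max_analytes}")
    return base + per_extra * (n_analytes - 1)


def premium_over_taqman(n_analytes: int) -> float:
    """Fractional extra cost of one bead reaction vs. one TaqMan reaction (1 or 2 targets only)."""
    if n_analytes not in (1, 2):
        raise ValueError("TaqMan prices are only given for one- and two-target reactions")
    taqman = TAQMAN_SINGLE if n_analytes == 1 else TAQMAN_DUAL
    return bead_reaction_cost(n_analytes) / taqman - 1.0


print(round(premium_over_taqman(1), 3))  # 0.304
print(round(premium_over_taqman(2), 3))  # 0.3
print(bead_reaction_cost(100))           # 27.75
```

Under these assumptions, a single bead reaction is about 30% more expensive than a one- or two-target TaqMan reaction, while a 100-analyte bead reaction would cost under $0.28 per analyte.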
In conclusion, we described computational methods for identifying potential pathogen-detection signatures that are both species-wide and species specific. These methods minimize the expensive and time-consuming in vitro work required to verify such signatures. They also reveal species that are too divergent for feasible detection by the TaqMan assay, such as HIV and hepatitis E virus. We also discussed the relative merits of GeneChips and bead-based assays as alternative methods of pathogen detection.
Information provided by T. Z. DeSantis, K. S. Venkateswaran, and W. J. Wilson was very helpful. We are grateful to S. Kurtz for generously providing us with software. Many thanks to W. J. Wilson and T. Z. DeSantis for comments on drafts of the manuscript.
Copyright © 2009 by the American Society for Microbiology.