Journal of Clinical Microbiology, August 2005, p. 3811-3817, Vol. 43, No. 8
0095-1137/05/$08.00+0 doi:10.1128/JCM.43.8.3811-3817.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Dan J. Kuyper,2 Peter C. Iwen,1* Hesham H. Ali,2 Dhundy R. Bastola,1 and Steven H. Hinrichs1
Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, Nebraska 68198-6495,1 Department of Computer Science, University of Nebraska at Omaha, Omaha, Nebraska 68182-0116,2
Received 9 November 2004/ Returned for modification 20 December 2004/ Accepted 17 April 2005
| INTRODUCTION |
|---|
DNA probe assays for identification of a select group of Mycobacterium species from culture have been widely accepted (19). These probes have been shown to be sensitive and specific for the identification of the most common Mycobacterium species (19). However, they are limited in application and are not capable of identifying all mycobacterial species that may be encountered in the clinical laboratory (32).
An alternative approach to the use of DNA probes is sequence analysis of specific genetic elements (39). Three major genomic targets in the Mycobacterium genus have been investigated and include the 16S rRNA gene, the heat shock protein 65 gene (hsp65), and the recombinase A gene (recA) (2, 4, 33, 34, 38, 39). The 16S rRNA gene has been the target most widely used; however, the presence of identical or highly similar 16S rRNA sequences limits the use of this target for differentiation (4, 8, 18, 21, 27, 35, 36, 42). More recently, the internal transcribed spacer 1 (ITS-1) region sequence located between the 16S rRNA and the 23S rRNA genes has been proposed as an alternative target to the 16S rRNA gene due to a high level of expression and to the greater sequence variability among species and strains (12). Several ITS-1 sequence-based assays have been successfully developed as alternative approaches for the identification of Mycobacterium species (10-12).
Utilization of the ITS-1 sequence for species identification requires the availability of a reliable database for comparison studies and a computational approach to sequence alignment analysis (3, 5). The most popular approach currently used for sequence comparison analysis employs a BLAST search of the GenBank database (National Center for Biotechnology Information [NCBI], Washington, D.C.) (http://www.ncbi.nlm.nih.gov/BLAST/). Although this approach is valuable, the absence of a specific validation system to limit the number of low-quality sequences due to sequencing errors and the presence of improperly or ambiguously named sequences contributes to unreliability of the database. In addition, the large size of the GenBank database makes certain operations such as running optimal alignment algorithms impractical due to time constraints (1, 3).
To overcome these challenges, an algorithm-based method to rapidly and reliably identify Mycobacterium species was developed using the ITS-1 region as a molecular target. In addition, the algorithm was developed for evaluation of new sequences. The system, called MycoAlign, was investigated for functionality under clinical circumstances. The evaluation included three components: a comparison with the results generated using the GenBank BLAST with the 16S rRNA region as a target, comparison with the results of conventional culture testing, and analysis of discrepant isolates by use of an independent database.
| MATERIALS AND METHODS |
|---|
DNA extraction, target amplification, and sequencing. An inoculating loopful of mature culture on L-J medium was removed and subjected to DNA extraction. Genomic DNA was extracted from the isolates by the glass bead agitation method as previously described (28). The crude DNA extract was purified using a QIAamp blood kit (QIAGEN Inc., Valencia, Calif.) according to protocols provided by the manufacturer. Both the 16S rRNA and ITS-1 region targets were amplified from all isolates for sequence-dependent identification. The hypervariable segment of the 16S rRNA (approximately 500 bp) was amplified using the previously described primer set 5'-TGG AGA GTT TGA TCC TGG CTC AG-3' and 5'-TAC CGC GGC TGC TGG CAC-3' (13). The hypervariable segment of the ITS-1 region (approximately 250 to 350 bp) utilized by the MycoAlign custom database was amplified using a newly specified pan-Mycobacterium primer set. The forward primer ITS-A1 (5'-GAA GTC GTA ACA AGG TAG CCG-3') amplified from the 3' end of the 16S rRNA, while the reverse primer ITS-A6 (5'-G ATG CTC GCA ACC ACT ATC CA-3') amplified from within the ITS-1 target. The PCR for each target was performed using 5 µl of template DNA (10 ng/µl) in a total reaction volume of 50 µl to include PCR buffer (20 mM Tris-HCl [pH 8.4] and 50 mM KCl); 0.1 mM (each) dATP, dGTP, dTTP, and dCTP; 1.5 mM MgCl2; 0.3 µM (each) primer; and 1.5 U of Platinum Taq High-Fidelity DNA polymerase (Gibco BRL, Life Technologies, Gaithersburg, Md.). Amplification was performed on a Stratagene Robocycler model 96 thermocycler (Stratagene, La Jolla, Calif.), starting with an initial denaturation step at 95°C for 10 min, followed by 35 cycles, each cycle consisting of a denaturation step at 95°C for 1 min, an annealing step at 64°C for 1 min, and an extension step at 72°C for 1 min. An additional extension step at 72°C for 7 min was performed after the last cycle.
Ten microliters of amplicon was loaded onto a 2% agarose gel and subjected to electrophoresis to evaluate the size of the PCR products. PCR products were purified before being sent for sequencing using a QIAquick PCR purification kit (QIAGEN Inc., Valencia, Calif.). Purified PCR products of either 16S rRNA or ITS-1 targets were sequenced at the Eppley Molecular Biology Core Laboratory (University of Nebraska Medical Center, Omaha, NE) using the same forward and reverse PCR amplification primers for both targets.
Sequence source. The National Center for Biotechnology Information GenBank database (NCBI, Washington, D.C.) (http://www.ncbi.nlm.nih.gov/) and the Ribosomal Differentiation of Medical Microorganisms database (RIDOM, Würzburg, Germany) (http://www.ridom-rdna.de) were examined for the availability of Mycobacterium species ITS-1 sequences. When identified, the ITS-1 sequences were analyzed by the validation program as described in the custom database design section before incorporation into the MycoAlign database. A total of 74 Mycobacterium species ITS-1 region sequences were obtained from GenBank, 23 were obtained from the RIDOM database, and 1 sequence (M. nebraskense) was generated as a result of this study (23). The GenBank sequences include those of the following species (accession no.): M. abscessus (AJ314870), "M. acapulcensis" (AF191094), M. asiaticum (AB026703), M. africanum (AB026699), M. avium subsp. avium seq. I/Mav-A (AB026690), M. avium subsp. avium seq. I/Mav-D (L07858), M. bohemicum (AJ277282), M. botniense (AJ012756), M. bovis (L26328), M. celatum (AF375990), M. chelonae seq. I/Mche-A (AJ291582), M. chelonae seq. I/Mche-B (AJ291583), M. chelonae seq. II/Mche-C (AJ291584), M. chimaera (AJ548480), M. conspicuum (X92668), M. diernhoferi (AJ314877), M. farcinogenes (Y10384), M. flavescens seq. I/Mfla-A (AJ291586), M. fortuitum subsp. fortuitum seq. I/Mfo-A (AJ291587), M. fortuitum subsp. acetamidolyticum seq. I/Mfo-B (AJ291588), M. fortuitum seq. II/Mfo-C (AJ291589), M. fortuitum seq. III/Mfo-D (AJ291590), M. fortuitum seq. III/Mfo-E (AJ291591), M. fortuitum seq. IV/Mfo-F (AJ291592), M. fortuitum seq. IV/Mfo-G (AJ291593), M. gastri (AB026697), M. genavense (Y14183), M. gilvum (AJ314876), M. gordonae seq. I/Mgo-A (L42258), M. gordonae seq. II/Mgo-B (L42259), M. gordonae seq. III/Mgo-C (L42260), M. gordonae seq. IV/Mgo-D (L42261), "M. habana" (X74056), M. holsaticum (AJ310470), M. intracellulare seq. Min-A (AB026691), M. intracellulare seq. 
Min-B (Z46423), M. intracellulare seq. Min-C (Z46424), M. intracellulare seq. Min-D (Z46425), M. kansasii seq. I/Mka-A (AB026695), M. kansasii seq. II/Mka-B (L42263), M. kansasii seq. III/Mka-C (L42264), M. lentiflavum (AF317658), M. leprae (AL583920), M. malmoense (AB026696), "M. manitobense" (AY082001), M. marinum (AJ315572), M. microti (L26329), M. montefiorense (AF330038), M. palustre (AJ308603), M. parascrofulaceum (AY337279), M. peregrinum seq. Mpe-A (AY291594), M. peregrinum seq. Mpe-B (AJ291595), M. phlei (AJ291596), M. porcinum (AJ291598), M. saskatchewanense (AY208857), "M. savoniae" (AJ48836), M. scrofulaceum seq. Mscro-A (AB026702), M. senegalense (Y10385), M. shimoidei (AJ005005), M. simiae seq. I/Msi-A (AB026694), M. simiae seq. I/Msi-C (Y14187), M. simiae seq. I/Msi-D (Y14188), M. smegmatis seq. Msm-A (AJ291599), M. smegmatis seq. Msm-B (U07955), M. szulgai (X99220), M. tokaiense (AY642533), M. triviale (X99221), M. tuberculosis (L15623), M. ulcerans (X99217), M. vaccae (AJ291600), M. vanbaalenii (X84977), M. xenopi seq. I/Mxe-A (X14190), M. xenopi seq. II/Mxe-B (A14191), and M. xenopi seq. III/Mxe-C (X14192). The RIDOM sequences included the following: M. avium subsp. silvaticum seq. I/Mav-A, M. avium subsp. paratuberculosis seq. I/Mav-A, M. flavescens seq. I/Mfla-B, M. haemophilum, M. intracellulare seq. V/Mac-A, M. intracellulare seq. III/Mac-D, M. intracellulare seq. II/Mac-I, M. intracellulare seq. III/Mac-J, M. intracellulare seq. III/Mac-K, M. intracellulare seq. IV/Mac-L, M. kansasii seq. IV/Mka-D, M. kansasii seq. V/Mka-E, M. kansasii seq. VI/Mka-F, M. mucogenicum, M. novocastrense, M. peregrinum seq. Mpe-C, M. scrofulaceum seq. Mscro-B, M. septicum, M. simiae seq. II/Msi-E, M. terrae seq. I, M. terrae seq. II, M. terrae seq. III, and M. triplex.
Custom database design and validation. The MycoAlign system was implemented in a PostgreSQL relational database using the Java programming language on a Linux platform. The sequence analysis program used an optimal algorithmic approach as described by Pevzner (26). The algorithm was modified to incorporate prioritized filtering criteria that determined the acceptability of a sequence for inclusion in the database. The filtering criteria for input sequences were as follows: no more than two ambiguous bases (N) in the entire sequence, no continuous stretch of more than six of any one of the four nucleotide bases (C, A, T, G), and the presence of the 5'-end and 3'-end recognition sequences. Input sequences were first analyzed for the end recognition patterns. For the ITS-1 target analysis, these recognition patterns were 5'-CACCTCCTTTCT-3' as the start sequence and 5'-GGGGTGTGG-3' as the end sequence. If these recognition patterns were identified, the two other validation parameters, limiting ambiguous bases and long stretches of nucleotides, were evaluated. Sequences that passed the validation conditions were subsequently included in the customized database.
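The prioritized filtering described above can be sketched in a few lines. The following Python example is only an illustration and is not the MycoAlign implementation (which the text says was written in Java). The recognition patterns and the limits of two ambiguous bases and six-base stretches come from the text; the function name `validate_its1`, the returned messages, and the reading of "stretches" as runs of one repeated base are assumptions made for this sketch.

```python
import re

# Values quoted in the text.
START_PATTERN = "CACCTCCTTTCT"   # 5'-end recognition sequence
END_PATTERN = "GGGGTGTGG"        # 3'-end recognition sequence
MAX_AMBIGUOUS = 2                # at most two N bases in the whole sequence
MAX_RUN = 6                      # assumed meaning: no run of >6 identical bases


def validate_its1(sequence):
    """Hypothetical helper: return (accepted, reason) for an input sequence.

    Checks are applied in the order given in the text: end recognition
    patterns first, then ambiguous bases and long single-base runs.
    """
    seq = sequence.strip().upper()

    # 1. Both recognition patterns must be present, start before end.
    start = seq.find(START_PATTERN)
    if start == -1:
        return False, "start recognition pattern not found"
    end = seq.rfind(END_PATTERN)
    if end == -1 or end < start + len(START_PATTERN):
        return False, "end recognition pattern not found"

    # 2. No more than two ambiguous bases in the entire sequence.
    if seq.count("N") > MAX_AMBIGUOUS:
        return False, "too many ambiguous bases"

    # 3. No run longer than MAX_RUN of any single base
    #    (one base followed by MAX_RUN or more copies of itself).
    if re.search(r"([ACGT])\1{%d,}" % MAX_RUN, seq):
        return False, "single-base run too long"

    return True, "ok"
```

A sequence such as `"CACCTCCTTTCT" + "ACGTACGTAC" + "GGGGTGTGG"` is accepted by this sketch, whereas the same sequence with seven consecutive A bases in the middle is rejected at the third check.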
Unknown sequence evaluation. Unknown sequences submitted for analysis by the MycoAlign system were first validated by fulfillment of the validation parameters. Once the parameters were fulfilled, the sequence underwent alignment analysis against the sequences within the database. Results were given in terms of relative similarity for percentage of identity (RI%) to any of the known mycobacterial sequences in the customized database. The organism identification with the highest RI% value was used for comparison with other methods.
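To make the alignment and scoring step concrete, the sketch below aligns a query against each database entry and reports the best percentage of identity. It rests on several assumptions: the text does not give the scoring scheme or the exact definition of RI%, so a standard Needleman-Wunsch global alignment with match +1, mismatch -1, and gap -1 is used here, and identity is taken as matching columns divided by alignment length. The names `global_align`, `percent_identity`, and `best_match` are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(a, b):
    """Assumed stand-in for RI%: identical columns / alignment length * 100."""
    x, y = global_align(a, b)
    if not x:
        return 0.0
    matches = sum(1 for p, q in zip(x, y) if p == q)
    return 100.0 * matches / len(x)


def best_match(query, database):
    """Return (name, identity) of the database entry most similar to query.

    `database` is a hypothetical dict mapping a label to its sequence.
    """
    return max(((name, percent_identity(query, seq))
                for name, seq in database.items()),
               key=lambda item: item[1])
```

With a toy database `{"species_a": "ACGTACGT", "species_b": "ACGTTTTT"}`, the query `"ACGTACGA"` is assigned to `species_a`. Because the full dynamic-programming table is filled for every comparison, this kind of optimal alignment is practical for a small curated database but costly against a collection the size of GenBank, which matches the motivation given in the introduction.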
Comparison with 16S rRNA sequence identification. The taxonomic identification of a Mycobacterium species by the MycoAlign method was compared in tabular form with identification generated from 16S rRNA sequences. The submitted 16S rRNA sequence was evaluated using a nucleotide-nucleotide BLAST analysis against NCBI GenBank database sequences. The sequence was filtered for low complexity.
Discrepancy resolution. In instances when phenotypic culture identification of the clinical isolate did not match the MycoAlign sequence-based identification, the isolates were reevaluated by phenotypic tests and the sequences of both molecular targets were retested using a second independent database analysis tool from the University of Würzburg, Würzburg, Germany (Ribosomal Differentiation of Medical Microorganisms [RIDOM]) (available at www.RIDOM.de) (14-16).
| RESULTS |
|---|
Conventional versus sequence-dependent identification of referenced strains. A total of 18 of 22 (81.8%) referenced strains were correctly identified to the species level using the ITS-1 sequence as the target for sequence analysis (Table 1). The ITS-1 target was unable to differentiate between M. tuberculosis and M. bovis and between M. marinum and M. ulcerans. In comparison, only 14 of 22 (63.6%) referenced strains were correctly identified to the species level when the 16S rRNA sequence was used as a target. Sequence analysis using the 16S rRNA target was unable to differentiate between M. tuberculosis and M. bovis, between M. chelonae and M. abscessus, between M. kansasii and M. gastri, and between M. marinum and M. ulcerans.
Conventional versus sequence-dependent identification of clinical isolates. Of 159 phenotypically identified Mycobacterium isolates, the MycoAlign sequence-dependent identification tool using the ITS-1 target matched the conventional identification for 149 isolates (Table 2). All 10 discrepant results (comparing the ITS-1 to the conventional approach) were also discrepant using the GenBank 16S rRNA gene target. In seven of these cases, the ITS-1 and 16S rRNA identifications matched each other, while in the other three cases they did not. In one additional case, the 16S rRNA target identified a phenotypically identified M. terrae as M. nonchromogenicum, whereas the ITS-1 identification (M. terrae) was compatible with the phenotypic result.
| DISCUSSION |
|---|
One major challenge is the development and characterization of a validated database for sequence comparison analysis. Limitations of currently available public databases have been widely discussed and include the presence of sequencing errors and ambiguous bases within the database sequences as well as incorrectly identified sequences (3, 6, 40). These errors may not be evident to most users but can affect the accuracy of the search process; consequently, discrepancies and inconclusive results have been reported (25, 40). To overcome the limitations within public databases, several curated databases have been created, including the MicroSeq 500 database (Applied Biosystems, Foster City, Calif.) and the RIDOM database from the University of Würzburg, Würzburg, Germany (25, 36). These custom database identification systems have been shown to be useful; however, they lack ITS-1 sequence information for the more unusual Mycobacterium species as well as for recently described species (5, 40).
In the creation of the MycoAlign system, a series of steps was incorporated to overcome limitations of sequence validation in addition to incorporating an automated updating tool in the MycoAlign software program. This allowed for periodic automated searching of the GenBank database for the recently deposited sequences of newly established Mycobacterium species. New sequences identified for inclusion into the MycoAlign custom database were subjected to restricted validation conditions to ensure quality.
The MycoAlign assay was shown to be reliable for the identification of Mycobacterium species and illustrated the greater utility of choosing the ITS-1 hypervariable sequence target over the 16S rRNA sequence as a target for sequence-based differential identification. This was not a surprise, since earlier studies by Park and others had shown that mycobacterial strains could be identified only to the group or complex level using 16S rRNA sequence as a target but that they could be further differentiated to the species level using the ITS-1 target (12, 24, 36, 41). In the present study, the MycoAlign system was able to differentiate between M. abscessus and M. chelonae and between M. kansasii and M. gastri when the ITS-1 target was used. These isolates were not differentiated when using GenBank analysis and the 16S rRNA gene target. In addition, it was also possible to identify several mycobacterial isolates at the subspecies or strain level.
Although the MycoAlign system showed greater capability than other analysis systems based on 16S rRNA sequences alone, the algorithm was still unable to differentiate between some clinically relevant mycobacterial species. Most notable was the inability to differentiate M. tuberculosis from M. bovis and M. marinum from M. ulcerans. This finding was consistent with studies that showed these latter species or subspecies to be highly related, with identical sequences for both the 16S rRNA and ITS-1 region targets (10, 12, 29).
In the evaluation of identifications of clinical isolates by conventional culture methods versus the MycoAlign approach using the ITS-1 target, 10 discrepancies were noted whereas 11 discrepancies were noted using the 16S rRNA target. Retesting using expanded biochemicals and the RIDOM sequence analysis tool showed the MycoAlign identification was correct in seven cases. In two of the three remaining ITS-1 discrepant cases, the GenBank, RIDOM, and MycoAlign approaches were in agreement and conflicted with the culture identification. In the case of one isolate identified as M. xenopi by conventional culture methods, MycoAlign and RIDOM were in agreement but differed from the 16S rRNA sequence identification. Since conventional assays were designed primarily to detect and identify M. tuberculosis and other clinically important nontuberculous mycobacteria, it was not unexpected that this method failed to identify all the species correctly (17, 20). The increasing number of newly defined mycobacterial species and the "difficult-to-identify" variants of known species represent a significant challenge for conventional approaches (9).
During evaluation of the discrepant Mycobacterium species identification results between the conventional and the sequence-based test analysis, a difference in RI% of more than five units was noted when using the MycoAlign program for two samples (Table 3). In each case, the sequence identification using the 16S rRNA gene sequence and GenBank BLAST analysis, the ITS-1 region sequence and MycoAlign, and RIDOM system analysis gave the same identification for the isolates. In one case identified as "M. acapulcensis" by ITS-1 sequence analysis, the RI% value decreased from 78% in the original test to 64% in the repeat test, while in another case identified as M. scrofulaceum, the RI% decreased from 93% to 87%. Upon further evaluation of the MycoAlign algorithm, it was noted that minor changes within the sequence (base pair change or the addition of a no-read base) had a significant effect on the value. This "rigidity" of the database had originally been allowed for greater accuracy in species and strain identification. It appears that an RI% value < 98% suggests that the organism may be new to the database. In both of the discrepant cases, neither the "M. acapulcensis" nor the M. scrofulaceum isolate was identified with certainty using the ITS-1 region target, suggesting that these isolates may be unrecognized species with sequences not available in the MycoAlign database. Additional testing of phenotypically similar strains will help to resolve this issue. Additional testing of isolates has now shown the ability of the MycoAlign system to identify new species (23).
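The two numeric rules of thumb in this paragraph can be written down explicitly. The sketch below is illustrative only: the 98% cutoff and the five-unit difference are taken from the text, while the function names `may_be_new_to_database` and `repeat_is_discordant` are hypothetical and do not describe how the MycoAlign software reports results.

```python
NOVELTY_THRESHOLD = 98.0  # RI% below this suggests the organism may be new to the database
REPEAT_TOLERANCE = 5.0    # a repeat-test difference above this was flagged in the text


def may_be_new_to_database(ri_percent):
    """True when the best RI% falls below the 98% value discussed in the text."""
    return ri_percent < NOVELTY_THRESHOLD


def repeat_is_discordant(first_ri, repeat_ri):
    """True when two runs of the same isolate differ by more than five RI% units."""
    return abs(first_ri - repeat_ri) > REPEAT_TOLERANCE


# The two isolates described above: 78% -> 64% and 93% -> 87%.
for first, repeat in [(78.0, 64.0), (93.0, 87.0)]:
    print(first, repeat,
          repeat_is_discordant(first, repeat),
          may_be_new_to_database(max(first, repeat)))
```

Both isolates are flagged on both counts by this sketch, in line with the suggestion in the text that their sequences may not be represented in the database.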
Contamination of clinical specimens by nonmycobacterial species is a known problem that not only reduces the accuracy of the diagnostic process but also extends the time required to separate to purity prior to retesting (7). This problem was addressed in the MycoAlign system through the use of Mycobacterium-specific primers in the amplification reaction. Whereas universal 16S rRNA primers generated products from mixed samples or samples containing only nonmycobacterial species, a product was generated with the ITS-1 primers only in those samples containing mycobacterial DNA. The value of this capability was clearly demonstrated by the identification of a mixed clinical isolate as Nocardia flavorosea by use of 16S rRNA sequence, whereas the ITS-1 region sequence analysis identified M. septicum as the Mycobacterium species present in the sample.
This study showed the MycoAlign identification system to be a reliable alternative to conventional phenotypic methods for the identification of Mycobacterium species. It also confirmed the superiority of the ITS-1 sequence over the 16S rRNA sequence as a target for sequence-based species identification. Additional evaluation of the computational software with the addition of new ITS-1 sequences as they become available and undergo validation will allow for increased discriminatory power of the MycoAlign system in the future.
Nucleotide sequence accession numbers. The following sequences have been deposited in the GenBank database (accession no.) as a result of this study: Mycobacterium nebraskense ATCC BAA-837 complete 16S rRNA gene (AY368456) and 16S-23S rRNA intergenic spacer region (AY368458).
| ACKNOWLEDGMENTS |
|---|
A.M.M. was supported by a research fellowship from the Egyptian government.
In the interest of full disclosure, the authors acknowledge that the commercial application of this approach is being explored.
| FOOTNOTES |
|---|
Present address: Department of Animal Medicine, Faculty of Veterinary Medicine, Assiut University, Assiut, Egypt.