ABSTRACT
Mycobacteria include a large number of pathogens. Identification to the species level is important for diagnosis and treatment. Here, we report the development of a Web-accessible database of hsp65 locus sequences (http://msis.mycobacteria.info) from 149 of the 150 Mycobacterium species/subspecies. This database can serve as a reference for identifying Mycobacterium species.
TEXT
The mycobacteria include a large number of clinically important pathogens, both obligate (e.g., M. tuberculosis and M. leprae) and opportunistic (e.g., M. avium and M. kansasii). The impact of mycobacteria on human morbidity and mortality is hard to overstate. Although tuberculosis (TB) is arguably one of the most important infectious diseases in the world, the incidence of disease due to nontuberculous mycobacteria (NTM) has been steadily increasing worldwide (6, 27, 42, 48, 59) and has likely far surpassed that of TB in the United States (7). Thus, the American Thoracic Society and the Infectious Diseases Society of America have recommended identifying clinically significant NTM to the species level upon the diagnosis of nontuberculous mycobacterial disease (17).
The Mycobacterium genus currently includes 150 species/subspecies (http://www.bacterio.cict.fr/m/mycobacterium.html), and the number has been increasing exponentially (Fig. 1), making identification increasingly challenging. Current identification based on biochemical tests of cultures is slow and inadequate for differentiating among closely related mycobacteria, especially those that are biochemically inert and slowly growing. In contrast, molecular identification methods based on PCR and nucleotide sequencing dramatically shorten the detection time and improve the accuracy of identification. The most common genomic loci used in molecular identification are the 16S rRNA gene (25), the 16S-23S rRNA internal transcribed spacer (15), hsp65 (40, 47), and rpoB (22). Almost all recent publications on new mycobacterial species compared sequences from multiple loci to those of established species, and hsp65 is always included. A previous study has shown 99.1% agreement between identification using hsp65 sequencing and a conventional method combining Accuprobes, biochemical test panels, or 16S rRNA gene sequencing (33), suggesting that hsp65 sequencing is an effective method for identifying mycobacterial species. That study also suggested that the completeness of the sequence database is critical for this identification method (33). To our knowledge, there is no publicly accessible hsp65 sequence database that covers all currently validated mycobacterial species. Laboratorians and researchers have to rely on the sequences deposited in public databases, such as GenBank, EMBL, and DDBJ. The vast majority of these mycobacterial entries are from uncharacterized strains or undetermined species, and problematic entries, including those with base errors, incomplete sequences, invalid or misidentified species, and even species-strain mismatches, are frequently present. This makes identification by searching public databases onerous and error prone.
To facilitate the taxonomic identification of mycobacterial isolates, we have developed a Web-accessible database of mycobacterial hsp65 sequences from 149 species/subspecies, excluding the problematic GenBank entries mentioned above.
Fig. 1. Numbers of approved Mycobacterium species/subspecies from 1896 to 2010.
The type strain hsp65 sequences of 147 mycobacterial species/subspecies were downloaded from GenBank. M. lepraemurium and M. leprae do not have type strains because they are difficult to cultivate. We chose M. lepraemurium TS130 and M. leprae TN as their reference strains and obtained their hsp65 sequences under GenBank accession numbers AY550232 (34) and NC_002677 (11), respectively. Multiple identical GenBank sequences from the type strain of the same species were combined into a single entry in our database (Table 1). We trimmed the sequences to 401 bp, which corresponds to nucleotide positions 165 to 565 of the M. tuberculosis H37Rv hsp65 gene and can be amplified and sequenced using primers Tb11 and Tb12 (47). Entries that did not completely cover this 401-bp hsp65 locus were excluded from our database. Using previously described methods (13), we determined and verified 47 hsp65 sequences and submitted them to GenBank (bold accession numbers in Table 1). As a result of our verification, the incorrect GenBank sequences from M. asiaticum, M. flavescens, M. intracellulare, M. porcinum, M. senegalense, M. septicum, and M. szulgai were excluded from our database, and only verified sequences of these species were adopted. Sequence similarities were determined using MEGA 5.02 (http://www.megasoftware.net/). Currently, the database contains a total of 143 unique sequences from 149 species/subspecies. Identical sequences are found among the three M. avium subspecies, between two M. fortuitum subspecies, and among four M. tuberculosis complex members (i.e., M. bovis, M. caprae, M. microti, and M. tuberculosis). A BLAST server based on this database has been developed (http://msis.mycobacteria.info). Query sequences are accepted in FASTA format, either pasted in or uploaded as a file, and are then searched against the database using NCBI's BLASTN program. The output shows the 20 best hits, which suggest the taxonomic assignment of the query at the species level.
The pairwise alignments and the percent identities are shown in the results as well. PhyML 3.0 (18) was used to generate a maximum-likelihood phylogeny of these 149 Mycobacterium species/subspecies (Fig. 2), which is the most complete so far, covering 99.3% of the Mycobacterium genus (the only missing species is M. pinnipedii, an M. tuberculosis complex member). Slowly growing mycobacteria (SGM) and rapidly growing mycobacteria (RGM) are clearly separated, except that the slowly growing M. tusciae, M. hiberniae, M. nonchromogenicum, and M. triviale are grouped with the RGM.
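The trimming and merging steps described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes each input sequence is already laid out on the coordinates of the full-length M. tuberculosis H37Rv hsp65 gene (with "-" marking positions an entry does not cover), and the function names and toy records are hypothetical.

```python
from collections import defaultdict

# 1-based, inclusive positions of the locus in the H37Rv hsp65 gene (from the text).
START, END = 165, 565


def trim_to_locus(seq, start=START, end=END):
    """Return the start..end region, or None if the entry does not fully cover it.

    Assumption: seq is already in H37Rv hsp65 coordinates and uses '-' for
    positions that are missing from the entry.
    """
    region = seq[start - 1:end].upper()
    if len(region) < end - start + 1 or "-" in region:
        return None  # incomplete coverage: exclude the entry
    return region


def collapse_identical(records):
    """Map each distinct trimmed sequence to the list of names that share it."""
    groups = defaultdict(list)
    for name, seq in records:
        groups[seq].append(name)
    return dict(groups)


# Toy data only (not real hsp65 sequences).
full = "ACGT" * 150                 # 600 positions, covers the locus
short = "ACGT" * 100                # 400 positions, does not reach position 565
records = []
for name, seq in [("strain_1", full), ("strain_2", full), ("strain_3", short)]:
    trimmed = trim_to_locus(seq)
    if trimmed is not None:
        records.append((name, trimmed))
unique = collapse_identical(records)
```

In this toy run, `strain_3` is dropped for incomplete coverage and the two remaining entries collapse into one unique 401-bp sequence. The same bookkeeping is consistent with the counts reported above: 149 species/subspecies minus the 2 + 1 + 3 redundant entries in the M. avium, M. fortuitum, and M. tuberculosis complex groups leaves 143 unique sequences.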
Table 1. List of the Mycobacterium species/subspecies, the reference strains (all are type strains except M. leprae TN and M. lepraemurium TS130), and the GenBank sequence accession numbers for their hsp65 sequences from published studies used to generate the database.
Fig. 2. Maximum-likelihood phylogeny of the Mycobacterium genus. PhyML 3.0 with default settings (18) and iTOL (30) were used to generate the circular phylogenetic tree rooted with Nocardia farcinica (strain IFM 10152). Rapidly growing mycobacteria are in blue, while slowly growing mycobacteria are in red. The scale bar is equivalent to 0.02 substitutions/site.
The popularity of species identification using hsp65 sequences has resulted in a large number of mycobacterial hsp65 sequences being deposited in public repositories. McNabb et al. developed an in-house database that includes 111 Mycobacterium species (34). Accuracy and coverage are crucial for a database to become a viable solution for species identification. Here, we report a publicly accessible hsp65 database with 99.3% coverage of the entire Mycobacterium genus, of which 47 species/subspecies have been verified in our laboratory (boldface in Table 1) and 40 entries are supported by multiple GenBank sequences and thus are considered confirmed sequences (underlined in Table 1). A 97% identity was previously suggested as a criterion for identifying a species using hsp65 sequences (34). However, pairwise comparison of these 149 species/subspecies identified 219, 121, and 45 instances of sequence similarity greater than 97%, 98%, and 99%, involving 94 (63.1%), 82 (55.0%), and 42 (28.2%) species/subspecies, respectively (see the table in the supplemental material). This makes it challenging to identify isolates whose best match is less than 100% identical. Because the interspecies similarities vary from group to group in the phylogeny and change as the number of species/subspecies increases, investigators need to be cautious when assigning species/subspecies for these isolates. Further research is needed to establish a more reliable criterion and to validate our database with clinical and environmental isolates. Nevertheless, our database provides the most comprehensive phylogenetic information on the mycobacterial hsp65 locus and can facilitate species identification in this genus.
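The threshold analysis above can be illustrated with a short sketch. It assumes equal-length, ungapped sequences (as the trimmed 401-bp entries would be) and defines identity as the fraction of matching positions; the similarities in this study were computed with MEGA 5.02, whose exact settings are not stated here, so this is not a reproduction of that calculation. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def pairs_above(seqs, threshold):
    """Return (pairs, names) for all pairs with identity strictly above threshold.

    seqs maps a species name to its sequence.
    """
    pairs = [
        (n1, n2)
        for (n1, s1), (n2, s2) in combinations(seqs.items(), 2)
        if percent_identity(s1, s2) > threshold
    ]
    involved = {name for pair in pairs for name in pair}
    return pairs, involved


# Toy 10-nt sequences, for illustration only.
toy = {
    "sp_A": "ACGTACGTAC",
    "sp_B": "ACGTACGTAC",
    "sp_C": "ACGTACGTAA",
    "sp_D": "TTTTTTTTTT",
}
pairs, involved = pairs_above(toy, threshold=85.0)
share = 100.0 * len(involved) / len(toy)  # percentage of entries in at least one pair
```

Under these assumptions, an identity greater than 97% over 401 positions corresponds to at most 12 mismatches (389/401 is about 97.01%, whereas 388/401 is about 96.76%), which shows how few substitutions separate many of the species pairs counted above.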
ACKNOWLEDGMENTS
We thank Jack Crawford for thoughtful suggestions and helpful comments.
FOOTNOTES
- Received 22 December 2010.
- Returned for modification 26 January 2011.
- Accepted 18 March 2011.
- Accepted manuscript posted online 30 March 2011.
† Supplemental material for this article may be found at http://dx.doi.org/10.1128/JCM.02602-10.
- Copyright © 2011, American Society for Microbiology