Journal of Clinical Microbiology, April 2003, p. 1363-1369, Vol. 41, No. 4
0095-1137/03/$08.00+0 DOI: 10.1128/JCM.41.4.1363-1369.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
Research Service,¹ Clinical Microbiology Laboratory,² Infectious Diseases Section, VA Medical Center West Los Angeles,³ Department of Medicine,⁴ Department of Microbiology, Immunology, and Molecular Genetics, University of California at Los Angeles School of Medicine, Los Angeles, California 90073⁵
Received 30 September 2002/ Returned for modification 2 December 2002/ Accepted 7 January 2003
Early identification schemes for gram-positive anaerobic cocci (GPAC) depended on microscopic appearance, colonial morphology, and carbohydrate fermentation reactions (28). However, these tests proved to be of limited value for this group of bacteria, which are often pleomorphic and usually asaccharolytic. Gas-liquid chromatography was introduced to detect the volatile fatty acid end products of metabolism, but most GPAC produce a very limited range of volatile fatty acids. In the 1980s, Ezaki et al. (7) found that proteolytic enzyme profiles could distinguish clearly and reproducibly among recognized species of GPAC; this finding contributed to the development of several commercial preformed enzyme kits, such as RapID ANA (Innovative Diagnostic Systems, Atlanta, Ga.) and Rapid ID 32A (API-bioMerieux, Basingstoke, United Kingdom). These kits represent a considerable advance in identification methods in terms of speed, simplicity, and discrimination (16, 23). However, they are designed to identify as wide a range of anaerobes as possible and contain many tests of little relevance to the identification of GPAC. Furthermore, the databases accompanying the kits are often incomplete or inaccurate, especially given the rapid increase in newly described species. The challenge now is to develop a more reliable classification and identification scheme so that most clinical strains can be allocated to clearly defined, phylogenetically valid species.
Genotypic identification is emerging as an alternative or complement to established phenotypic methods. The 16S rRNA gene is the most widely accepted gene for bacterial classification and identification (35). Signature nucleotides of 16S rRNA genes allow classification even when a particular sequence has no match in the database, because otherwise unrecognizable isolates can still be assigned to phylogenetic branches at the class, family, genus, or subgenus level. 16S rRNA sequencing has contributed greatly to the discovery of new species of GPAC (8, 17, 21), and the variable regions of the gene have been used to design probes for detecting clinically significant GPAC (38, 39). Although direct sequencing of amplified 16S rRNA gene DNA should allow unambiguous, definitive identification and provides information on the taxonomic relatedness of new species, its use for species identification based on public 16S rRNA databases is not without limitations (4, 25, 34). Public database sequences suffer from multiple problems, such as base errors, ambiguous base designations, and incomplete sequences, that may not be evident to users and often lead to misidentification. Accurate, high-quality 16S rRNA reference sequences for GPAC are therefore needed to determine the relationships among clinically relevant GPAC species.
In the present study, we determined the nearly complete 16S rRNA gene sequence data (>1,400 bp) for 13 type strains of established GPAC species. The sequence data obtained were compared to those in public sequence repositories such as GenBank. Based on the sequence data of the reference strains obtained in the present study, we evaluated 16S rRNA sequencing identification of GPAC by reidentifying a collection of 156 clinical isolates of GPAC that had previously been identified by phenotypic tests and represented the most commonly isolated clinical GPAC species.
TABLE 1. List of type strains and clinical isolates of GPAC species used in this study
16S rRNA gene sequencing. Genomic DNA was extracted and purified from cells in mid-logarithmic growth phase by using a QIAamp DNA Mini Kit (Qiagen, Inc., Chatsworth, Calif.). 16S rRNA gene fragments were amplified as previously described (3). Briefly, two subregions of the 16S rRNA gene were amplified with two primer pairs: region A, an 899-bp segment between primers 8UA and 907B, and region B, a 711-bp segment between primers 774A and 1485B. PCR was performed for 35 cycles of 30 s at 95°C, 30 s at 45°C, and 1 min at 72°C, followed by a final extension at 72°C for 5 min. PCR products were excised from a 1% agarose gel after electrophoresis, purified by using a QIAquick gel extraction kit (Qiagen), and sequenced directly with a Biotech Diagnostics Big-Dye sequencing kit on an ABI 377 sequencer (Applied Biosystems, Foster City, Calif.).
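The reported amplicon lengths are consistent with the primer names encoding approximate E. coli 16S rRNA positions (907 − 8 = 899 bp; 1485 − 774 = 711 bp). Taking that as an assumption, not something stated in the text, the minimal Python sketch below computes how much the two regions overlap, the stretch of the gene they jointly cover, and the nominal hold time of the cycling program (ramp times and any initial denaturation step, which are not specified, are ignored). The `Amplicon` class and helper functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Amplicon:
    """A PCR product bounded by two primer positions (assumed E. coli numbering)."""
    name: str
    start: int  # position assumed to be encoded in the forward primer name
    end: int    # position assumed to be encoded in the reverse primer name

    @property
    def length(self) -> int:
        # Matches the lengths reported in the text: end - start
        return self.end - self.start


def overlap(a: Amplicon, b: Amplicon) -> int:
    """Number of positions shared by two amplicons (0 if they are disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def combined_span(a: Amplicon, b: Amplicon) -> tuple[int, int]:
    """Outermost coordinates covered by two overlapping amplicons."""
    return min(a.start, b.start), max(a.end, b.end)


def program_minutes(cycles: int, step_seconds: list[int], final_ext_min: int) -> float:
    """Nominal hold time of a cycling program, excluding ramps and initial denaturation."""
    return cycles * sum(step_seconds) / 60 + final_ext_min


region_a = Amplicon("A (8UA-907B)", 8, 907)
region_b = Amplicon("B (774A-1485B)", 774, 1485)

print(region_a.length, region_b.length)       # 899 711
print(overlap(region_a, region_b))            # 133
print(combined_span(region_a, region_b))      # (8, 1485)
print(program_minutes(35, [30, 30, 60], 5))   # 75.0
```

Under this assumption, the 133-bp overlap between regions A and B would allow the two products to be joined into a single contiguous sequence spanning the E. coli 49 to 1470 window used in the analysis.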
Sequence data analysis. The sequencing data were analyzed as follows: (i) the forward and reverse sequences were assembled into a consensus sequence, (ii) the consensus sequence was edited to resolve discrepancies between the two strands by inspection of the electropherograms, and (iii) the consensus sequences were compared with GenBank sequences by using the Ribosomal Database Project (RDP-II; Michigan State University, East Lansing) (18) and the Basic Local Alignment Search Tool (BLAST) (1). GPAC clinical isolates were analyzed by comparing their sequences against the type strain sequences determined in the present study, as well as against related sequences retrieved from GenBank. For an accurate determination of species similarities, all sequences were trimmed at the 5' and 3' ends to identical positions along the gene, corresponding to Escherichia coli positions 49 to 1470. The newly determined sequences were aligned with related sequences by using CLUSTAL W (13), and the resulting multiple sequence alignment was corrected manually with GeneDoc (24). A phylogenetic tree was constructed by the neighbor-joining method in PAUP 3.1.1 (D. L. Swofford, PAUP: phylogenetic analysis using parsimony [1993]).
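The trimming step can be illustrated with a short sketch. It assumes a gapped multiple alignment (such as one exported from CLUSTAL W) that includes an E. coli reference row; because gaps in that row shift the column index, E. coli positions must first be mapped to alignment columns before every row is cut to the same window. Percent identity is then computed only over columns where neither sequence has a gap, which is one common convention among several. All function names and the toy alignment are hypothetical.

```python
def ecoli_column_map(ref_row: str) -> dict[int, int]:
    """Map 1-based E. coli positions to 0-based alignment columns."""
    mapping: dict[int, int] = {}
    pos = 0
    for col, base in enumerate(ref_row):
        if base != "-":  # gap columns carry no E. coli coordinate
            pos += 1
            mapping[pos] = col
    return mapping


def trim_to_window(rows: dict[str, str], ref_name: str, start: int, end: int) -> dict[str, str]:
    """Cut every aligned row to the columns spanning E. coli positions start..end."""
    cols = ecoli_column_map(rows[ref_name])
    first, last = cols[start], cols[end]
    return {name: seq[first:last + 1] for name, seq in rows.items()}


def percent_identity(a: str, b: str) -> float:
    """Identity over columns where neither aligned sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Toy alignment (hypothetical sequences, not data from this study).
aln = {
    "E_coli":  "AC-GTACGTA",
    "isolate": "ACTGTACCTA",
    "type":    "ACTGTACGTA",
}
window = trim_to_window(aln, "E_coli", 2, 8)
print(window)
print(percent_identity(window["isolate"], window["type"]))  # 87.5
```

For a real alignment, the equivalent call would be `trim_to_window(aln, "E_coli", 49, 1470)`, after which every sequence covers the same positions before similarities are compared.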
98%) versus GenBank sequences because of abundant ambiguities, sequence gaps, and sequence errors (Table 2). For example, for the type strains of Anaerococcus prevotii, Micromonas micros, and Peptoniphilus asaccharolyticus, the best matches returned by a BLAST search against GenBank were uncharacterized Peptostreptococcus sp. clone KL-59-7-12 (97.8%), Peptostreptococcus sp. oral clone FG 014 (99.6%), and Peptostreptococcus sp. strain S1 (97.9%), respectively, because the GenBank sequences of these type strains were of poor quality. In the case of Peptoniphilus indolicus, the best match returned by both BLAST and RDP-II was Peptostreptococcus sp. strain S1 rather than the GenBank sequence of the Peptoniphilus indolicus type strain (GenBank accession no. D14147); indeed, D14147 did not even appear on the BLAST match list because of the low similarity between the two sequences. For the type strains of A. lactolyticus, A. tetradius, A. vaginalis, Finegoldia magna, and Peptoniphilus lacrimalis, the best matches returned by both BLAST and RDP-II were the GenBank sequences of the corresponding strains, but the similarities between those sequences and the ones determined in the present study were very low (sequence similarities of ≤98% as determined by BLAST; similarity scores of ≤0.916 as determined by RDP-II).
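The database problems described above (ambiguous base designations, sequence gaps, and incomplete sequences) can be screened for before a database entry is accepted as a reference. The following is a hypothetical filter, not a procedure described in this study: it counts IUPAC ambiguity codes and characters that are not valid nucleotide codes, and it checks length against a minimum. The default thresholds (1,400 bases, echoing the >1,400-bp sequences determined here, and zero ambiguous bases) are illustrative assumptions.

```python
IUPAC_AMBIGUOUS = frozenset("RYSWKMBDHVN")  # IUPAC nucleotide codes other than A, C, G, T/U
VALID_BASES = frozenset("ACGTU")


def check_reference(seq: str, min_length: int = 1400, max_ambiguous: int = 0) -> dict:
    """Summarize simple quality indicators for a candidate reference 16S rRNA sequence.

    min_length and max_ambiguous are illustrative thresholds (assumptions),
    not published acceptance criteria. Gap characters such as '-' and any
    other non-IUPAC symbols are counted as invalid.
    """
    bases = "".join(seq.split()).upper()  # drop whitespace/newlines from FASTA-style text
    ambiguous = sum(1 for c in bases if c in IUPAC_AMBIGUOUS)
    invalid = sum(1 for c in bases if c not in VALID_BASES and c not in IUPAC_AMBIGUOUS)
    return {
        "length": len(bases),
        "ambiguous": ambiguous,
        "invalid": invalid,
        "passes": len(bases) >= min_length and ambiguous <= max_ambiguous and invalid == 0,
    }


print(check_reference("ACGT" * 400))   # clean sequence of 1,600 bases
print(check_reference("ACGTN" * 300))  # contains 300 ambiguous N calls
print(check_reference("ACGT" * 100))   # too short to serve as a reference
```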
TABLE 2. Comparison of 16S rRNA sequences of GPAC species obtained in this study with the sequences in GenBank
Comparison between a conventional method and 16S rRNA sequencing for identification of GPAC clinical isolates. A total of 156 isolates, representing six clinically common GPAC species, were subjected to 16S rRNA sequence analysis. Of the 156 clinical strains, 131 had a sequence with high similarity (>99%) to the type strain of an established species, 12 had a sequence similar to that of a Peptostreptococcus sp. oral clone in GenBank, and 13 had eight unique sequences that were distinct from those of all established species and of uncharacterized strains in the GenBank database. A comparison between sequence-based and phenotypic identification showed that 88 strains (56%) gave concordant results by the original identification and by 16S rRNA gene sequencing, whereas the other 68 (44%) had a molecular identification that was discordant with the original identification (Table 3). Two clinically significant species, F. magna (n = 36) and M. micros (n = 33), were consistently identified correctly by both phenotypic and genotypic methods. In addition, 56% (15 of 27) of Peptostreptococcus anaerobius, 5% (1 of 20) of A. prevotii, and 30% (3 of 10) of A. tetradius isolates were identified correctly by both methods. Among the 68 discrepant isolates, 19 strains originally identified by phenotypic testing as Peptoniphilus asaccharolyticus and 19 originally identified as A. prevotii were reidentified as Peptoniphilus harei by sequencing. Six isolates of A. vaginalis had been misidentified as A. tetradius by phenotypic tests, and 12 isolates first identified as Peptostreptococcus anaerobius were found to be 99% similar to the uncharacterized oral clone CK 035 but only 98.0% similar to Peptostreptococcus anaerobius.
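The concordance figures above follow directly from the reported counts. The short Python check below uses only the per-species numbers given in the text; these sum to the 88 concordant isolates, and the remaining isolates form the discordant group. The variable and function names are illustrative.

```python
# Per-species counts taken from the text: species -> (concordant, total isolates).
# Only species for which the text gives both numbers are listed.
per_species = {
    "F. magna": (36, 36),
    "M. micros": (33, 33),
    "P. anaerobius": (15, 27),
    "A. prevotii": (1, 20),
    "A. tetradius": (3, 10),
}
TOTAL_ISOLATES = 156


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


concordant = sum(c for c, _ in per_species.values())
discordant = TOTAL_ISOLATES - concordant

for species, (c, n) in per_species.items():
    print(f"{species}: {c}/{n} = {pct(c, n)}%")
print(f"concordant: {concordant}/{TOTAL_ISOLATES} = {pct(concordant, TOTAL_ISOLATES)}%")
print(f"discordant: {discordant}/{TOTAL_ISOLATES} = {pct(discordant, TOTAL_ISOLATES)}%")
```

Rounded to whole numbers, the overall figures are the 56% and 44% reported above; the one-decimal value for the concordant group (56.4%) is the figure cited for phenotypic identification.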
16S rRNA sequence analysis indicated that 25 of the 68 strains misidentified to the species level by phenotypic tests may represent novel species or subspecies, since they had low sequence similarities to both the GenBank database sequences and the sequences determined in the present study (Table 3). These 25 strains had nine unique sequences, which were assigned phylogenetic positions by building a phylogenetic tree with their related sequences (Fig. 1).
TABLE 3. Comparison of 16S rRNA sequencing identification with phenotypic identification
FIG. 1. Phylogenetic tree showing the phylogenetic relationships of the nine novel species to their related established species, with one representative strain for each species. Sequences were determined in our laboratory unless indicated by a GenBank accession number. The tree was rooted with E. coli as the outgroup. Boldface type indicates possible novel species. Parenthetical percentages indicate 16S rRNA sequence similarities with the corresponding species. Some of the genera in the tree are abbreviated as follows: P., Peptostreptococcus; F., Finegoldia; M., Micromonas; Pn., Peptoniphilus; C., Clostridium.
The type strain sequences in GenBank may give misleading identification results for clinical isolates (Table 3). For example, 33 clinical isolates that were identified as M. micros by comparison with the M. micros type strain sequence obtained in the present study (sequence similarity of >99%) were given their highest match with oral clone FG 014 (sequence similarity of >99%) by a BLAST search against GenBank, even though the M. micros type strain sequence was present in the database. Similarly, 6 strains of A. vaginalis, 36 strains of F. magna, and 3 strains of A. tetradius that were identified to the species level by high sequence similarity (>99%) with the corresponding type strain sequences determined in the present study showed only low similarity (<97%) to the corresponding GenBank sequences by both BLAST and RDP-II. Although there is no accepted cutoff value of 16S rRNA sequence similarity for species definition (12, 31), studies of numerous diverse taxa show that most recognized species examined to date differ from related species of the same genus in at least 1%, and typically more, of their 16S rRNA sequence positions. Our 16S rRNA sequence data showed that recognized GPAC species within a given genus have an average divergence of 8.0% or more: the average species pair showed 12.9 and 8.0% sequence divergence within the genera Peptoniphilus and Anaerococcus, respectively, and even the most similar pair in the genus Peptoniphilus (Peptoniphilus asaccharolyticus and Peptoniphilus indolicus) exhibited only 98.2% sequence similarity. This clearly indicates that recognized GPAC species are typically separated by substantial evolutionary distances.
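The within-genus statistics quoted above (the average divergence over all species pairs and the most similar pair) can be derived from a table of pairwise similarities. A minimal sketch follows; the species labels and similarity values are hypothetical placeholders, not data from this study.

```python
from itertools import combinations
from statistics import mean


def divergence_summary(similarity: dict[frozenset, float],
                       species: list[str]) -> tuple[float, list[str], float]:
    """Return (mean pairwise divergence %, most similar pair, its similarity %).

    `similarity` maps each unordered species pair (a frozenset) to percent similarity.
    """
    pairs = [frozenset(p) for p in combinations(species, 2)]
    divergences = [100.0 - similarity[p] for p in pairs]  # divergence = 100 - similarity
    closest = max(pairs, key=lambda p: similarity[p])
    return mean(divergences), sorted(closest), similarity[closest]


# Hypothetical values for three species of one genus (illustrative only, not study data).
sim = {
    frozenset({"sp1", "sp2"}): 98.0,
    frozenset({"sp1", "sp3"}): 90.0,
    frozenset({"sp2", "sp3"}): 91.0,
}
avg, pair, best = divergence_summary(sim, ["sp1", "sp2", "sp3"])
print(f"mean divergence {avg:.1f}%, closest pair {pair} at {best}%")
```

As in the text, a genus can have a high average divergence while its closest pair still differs by only about 2%, so both numbers are worth reporting.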
Although the species of the genera Peptoniphilus and Anaerococcus, as currently constituted, may have to be reallocated to additional genera, in the present study we used 99% similarity (BLAST similarity of ≥99% and RDP similarity score of ≥0.99) as a suitable cutoff for identification at the species level. If an isolate showed a genetic difference of ≥1.00 and ≤2.00% from its best match, we reported it as closely related to that match; thus, the 12 isolates that had 98.0% sequence similarity with Peptostreptococcus anaerobius might belong to the same species as, or a subspecies of, Peptostreptococcus anaerobius. When the genetic difference was >2%, the isolate was reported as a unique isolate that may represent a novel taxon (Table 3). We used only type strains as references in order to eliminate errors arising from misidentification of the reference strains themselves. We are aware of GenBank sequences from strains of the same species that share very low similarity (<85%) with each other and with their corresponding type strains. For example, the GenBank sequences of two Peptoniphilus asaccharolyticus strains, GIFU 7717 and GIFU 7946, had only 86.4 and 84.3% sequence similarity, respectively, with the sequence of their type strain.
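The identification rule described above can be written as a small decision function. This is a sketch under stated assumptions: the input is the percent similarity to the best-matching type strain over the trimmed region, and because a similarity of exactly 99% satisfies both the species cutoff and the 1.00 to 2.00% difference range as worded, the sketch checks the species rule first. The names `Call` and `classify` are hypothetical.

```python
from enum import Enum


class Call(Enum):
    SPECIES = "identified to species"
    CLOSELY_RELATED = "closely related to best match"
    POSSIBLE_NOVEL = "unique isolate, possible novel taxon"


def classify(similarity_pct: float) -> Call:
    """Apply the species-level cutoffs to a best-match similarity (%).

    >=99% similarity -> species; 1.00-2.00% difference -> closely related;
    >2% difference -> possible novel taxon. Checking the species rule first
    at exactly 99% is an assumption of this sketch.
    """
    if not 0.0 <= similarity_pct <= 100.0:
        raise ValueError("similarity must be a percentage between 0 and 100")
    if similarity_pct >= 99.0:
        return Call.SPECIES
    if similarity_pct >= 98.0:  # genetic difference of at most 2.00%
        return Call.CLOSELY_RELATED
    return Call.POSSIBLE_NOVEL


for s in (99.6, 98.0, 96.5):
    print(s, classify(s).value)
```

With these rules, the 12 isolates at 98.0% similarity to Peptostreptococcus anaerobius fall into the closely related category.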
The purpose of performing sequence similarity searches with the 16S rRNA sequences of GPAC type strains was to evaluate the accuracy of results obtained by programs such as BLAST and RDP-II. We found that even when the same sequence was compared against the same database (GenBank) by different programs (BLAST and RDP-II), different similarity results were obtained (Table 2), leading to different identities being assigned. This is because similarity scores depend on the length of the sequences under analysis and on the number of gaps introduced into the query sequence to optimize the similarity. In addition, BLAST searches against all available sequences, whereas RDP-II acquires only selected GenBank sequences and incorporates them into its own database, against which searches are made. For accurate identification, we therefore consider that all sequences included in a similarity search should be trimmed to the same length and that, in addition to similarity searches with Internet-accessible programs such as BLAST and RDP-II, the sequences should be analyzed by multiple sequence alignment followed by manual correction.
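The dependence of similarity scores on sequence length and gap handling can be seen with a toy example. The sketch below is not the scoring used by BLAST or RDP-II; it simply computes percent identity for one hypothetical aligned pair under two conventions (excluding gap columns, or counting them as mismatches) and then again after trimming both sequences to the same length.

```python
def identity(a: str, b: str, count_gaps: bool) -> float:
    """Percent identity of two aligned, equal-length sequences ('-' marks a gap).

    count_gaps=False: denominator is the number of columns where both have a base.
    count_gaps=True:  denominator is every alignment column (gaps act as mismatches).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    if count_gaps:
        denominator = len(a)
    else:
        denominator = sum(1 for x, y in zip(a, b) if x != "-" and y != "-")
    if denominator == 0:
        raise ValueError("no columns to compare")
    return 100.0 * matches / denominator


query = "ACGTTGCA--"    # hypothetical query, shorter at its 3' end
subject = "ACGATGCAGG"  # hypothetical database entry

print(identity(query, subject, count_gaps=False))         # 87.5
print(identity(query, subject, count_gaps=True))          # 70.0
# Trimming both to the same positions removes the discrepancy.
print(identity(query[:8], subject[:8], count_gaps=True))  # 87.5
```

Once both sequences cover the same positions, the two conventions agree, which is the rationale for trimming all sequences to identical positions before comparison.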
The 16S rRNA sequencing method for identification of clinical GPAC isolates, based on the accurate type strain sequences determined in the present study, was evaluated by blindly reidentifying a collection of 156 GPAC isolates previously identified by phenotypic testing. 16S rRNA sequencing identification proved to be more accurate than phenotypic identification. The approach was efficient in the majority of cases: 92% (143 of 156) of isolates were identified to the species level, compared with 56.4% by phenotypic identification, which is subject to errors and to variability in character expression. Even the 13 isolates that could not be identified to the species level could be assigned a phylogenetic position by 16S rRNA sequence analysis (Fig. 1). Comparison of the two approaches showed that two clinically significant species, F. magna (n = 36) and M. micros (n = 33), were consistently identified correctly by both methods. However, all Peptoniphilus harei isolates had been misidentified as Peptoniphilus asaccharolyticus or A. prevotii by biochemical tests. Peptoniphilus harei was recently distinguished from Peptoniphilus asaccharolyticus by Murdoch and Mitchelmore (22) and resembles it biochemically; phenotypically, it is differentiated from Peptoniphilus asaccharolyticus by cell and colony morphology, criteria that can be highly subjective. Our results also showed that sequence-based identification distinguished heterogeneous A. prevotii isolates (9, 17) better than phenotypic identification did. These observations indicate that Peptoniphilus harei is of greater clinical importance than phenotypic tests would suggest.
Although 16S rRNA sequencing offers greater accuracy than phenotypic identification of GPAC, cost is a critical issue in evaluating 16S rRNA sequence-based analysis as a diagnostic tool. The initial cost of equipment can be recovered quickly through savings in personnel and time and, ultimately, in health care costs. Furthermore, driven in part by the technology underlying the human and microbial genome projects, sequencing costs will probably continue their rapid downward trend, bringing this technology within the reach of many microbiology laboratories.
In summary, 16S rRNA sequencing proved to be an accurate method for identifying GPAC species. It allows not only proper identification of isolates but also rapid recognition and classification of previously undescribed organisms. More effort should be made to complete 16S rRNA databases with high-quality sequences.