Journal of Clinical Microbiology, November 2008, p. 3766-3771, Vol. 46, No. 11
0095-1137/08/$08.00+0 doi:10.1128/JCM.00213-08
Copyright © 2008, American Society for Microbiology. All Rights Reserved.

Department of Microbiology and Immunology, Haukeland University Hospital, Bergen, Norway,1 Section for Microbiology and Immunology, the Gade Institute, University of Bergen, Bergen, Norway,2 iSentio Ltd., Thormøhlensgate 51, Bergen, Norway3
Received 2 February 2008/ Returned for modification 4 April 2008/ Accepted 23 August 2008
One of the remaining problems for this approach is samples containing more than one bacterial species. For these samples, direct sequencing results in mixed chromatograms containing two or more fluorescent signals in positions where the 16S rRNA genes differ. The problem can be solved by separating the products from the first PCR by cloning or using gradient gel electrophoresis, but these methods are labor-intensive and not suitable for routine diagnostics.
We have therefore designed an algorithm that sorts out the ambiguous signals from mixed chromatograms in order to identify the different contributing bacteria. The algorithm was implemented in the RipSeq computer program (iSentio) and was successfully used to analyze sequence data from mixed bacterial suspensions.
Reading the chromatogram. Direct 16S rRNA gene sequencing of polymicrobial samples results in mixed chromatograms containing two or more fluorescent signals in positions where the 16S rRNA genes differ for the bacteria present in the sample (Fig. 1A). Correct reading of these chromatograms is complicated by two factors. (i) In a chromatogram, there will always be some noise, i.e., low-intensity signals originating from the baseline or from nonspecific primer binding. Including these in the base calling can decrease the specificity of subsequent analysis. To avoid them, we used a cutoff value on the y axis (see Results). (ii) The migration of a DNA fragment through the capillary gel in the sequencing reaction depends on its number of bases but also, to some degree, on its base composition. Therefore, a fragment of n bases from bacterium A will migrate at a different speed than a fragment of n bases from bacterium B. This results in a relative displacement of the fluorescence peaks in the corresponding sequence position (Fig. 1B). The reading algorithm had to be able to distinguish between a base displacement and a new position. Also, with large displacements, it can be difficult to decide whether a signal from bacterium A corresponds to the signal in position Y or in position Y + 1 from bacterium B. The degrees and relative directions of displacements fluctuate through the sequence. In monobacterial chromatograms, although the distance between any two successive peaks varies, the average peak distance, D (length of chromatogram divided by number of bases), is very stable (this is valid for chromatograms from the ABI 3730 and 3100 sequencers). We used this average peak distance in our mixed chromatograms and divided them into N blocks of size D, where N is the length of the trimmed chromatogram divided by D. All fluorescence peaks within a block were determined to belong to the same sequence position.
In a mixed chromatogram, in positions where the different bacteria have identical bases, the peak position of the resulting fused fluorescent signal represents a compromise between the contributing signals. These peaks are referred to as anchor peaks (Fig. 1B). To optimize block positioning, the center of the first block was defined by the first anchor peak in the trimmed sequence, and base calling started at this point. Every time a new anchor peak was detected, the block positioning was readjusted accordingly (Fig. 1B).
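The block-based reading described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the RipSeq implementation: the function name `call_blocks`, the tuple layout `(position, base, height)`, and the rule that an anchor peak is a strong peak with no other strong peak within half a block are all assumptions made for the example.

```python
def call_blocks(peaks, avg_dist, cutoff):
    """Group peaks into blocks of width avg_dist; return the bases seen per block.

    peaks: iterable of (position, base, height) tuples (hypothetical layout).
    avg_dist: average peak distance D from monobacterial chromatograms.
    cutoff: y-axis cutoff below which signals are treated as noise.
    """
    # Drop low-intensity noise and order the remaining peaks along the x axis.
    strong = sorted((p for p in peaks if p[2] >= cutoff), key=lambda p: p[0])
    if not strong:
        return []
    half = avg_dist / 2
    positions = [p[0] for p in strong]

    # Assumption: an anchor peak has no other strong peak within half a block.
    def isolated(i):
        return all(abs(x - positions[i]) >= half
                   for j, x in enumerate(positions) if j != i)

    # Base calling starts at the first anchor peak (or the first strong peak).
    start = next((i for i in range(len(strong)) if isolated(i)), 0)
    center = positions[start]

    calls = []
    while center - half <= positions[-1]:
        in_block = [p for p in strong if center - half <= p[0] < center + half]
        calls.append("".join(sorted({p[1] for p in in_block})))
        if len(in_block) == 1:
            # A single fused signal: realign the block grid on this anchor peak.
            center = in_block[0][0]
        center += avg_dist
    return calls
```

With D = 10 and a cutoff of 30, two displaced peaks at x = 108 (C) and x = 113 (T) fall into the same block and are reported as one ambiguous position, while a single peak at x = 129 shifts the following blocks one unit to the left.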
FIG. 1. (A) Example of a mixed chromatogram. The chromatogram was obtained by direct 16S rRNA gene sequencing of pus from a brain abscess containing Streptococcus intermedius and A. aphrophilus. In positions where the 16S rRNA genes present differ, double peaks appear (arrows). The base compositions of the respective 16S rRNA genes are given under the chromatogram. (B) Examples of anchor peaks and sequence blocks. The chromatogram was obtained by direct 16S rRNA gene sequencing of a bacterial mixture containing E. coli and C. gingivalis (2 reverse). Different base compositions of equal-length DNA fragments resulted in the relative displacement of signals in corresponding sequence positions (black arrows). To achieve correct positioning of bases in such areas, the chromatograms were divided into equal-size blocks based on the average peak distance in monobacterial chromatograms. All bases inside a block were assigned to the same position. Block positioning was made dependent on anchor peaks (red arrows) representing the fusion of signals from the different bacteria in positions where these were identical. The base calling was started on the first anchor peak of the trimmed sequence. Every time a new anchor peak was detected, the block positioning was adjusted correspondingly. In this example, it is adjusted to the left (red circle). The 16S rRNA gene sequences for the respective genes are given under the chromatogram.
FIG. 2. Search principle. The mixed query sequence was divided into pieces of size w. For each piece, all possible base combinations (words) were constructed and compared to the subject sequences. In this example, from the random piece n, it is possible to create eight different words. (A) An unrestricted matching procedure for these eight words against one of the sequences in the database. Three of them have a similarity above the cutoff with different parts of the subject sequence and are recognized as hits. Only word 5 (red circle) represents a relevant hit. Words 1 and 4 have by chance significant similarity to other parts of the subject sequence and contribute to reduced search specificity. (B) A restricted matching procedure. The area (w + 2x) on the subject sequence that corresponds to the area of piece n on the query sequence is estimated based on the hypothetical binding site for the forward primer (F-PBS) and the distance d from the start of the query sequence to the beginning of piece n. It is made slightly larger than the piece size (w) to secure maximum sensitivity. Matching of the words from piece n is now restricted to this specific area (window), and the nonspecific hits from words 1 and 4 are avoided.
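The search principle in Fig. 2 can be illustrated with a short Python sketch. It is not the RipSeq code: the function names, the ungapped identity score, and the parameter names `primer_site`, `d`, `x`, and `min_similarity` are assumptions chosen to mirror the caption (F-PBS, distance d, window w + 2x, similarity cutoff).

```python
from itertools import product


def words_from_piece(piece):
    """Expand a piece (one string of candidate bases per position) into all words."""
    return ["".join(combo) for combo in product(*(sorted(p) for p in piece))]


def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def restricted_hits(piece, subject, primer_site, d, x, min_similarity):
    """Match all words of a piece against a window of size w + 2x on the subject.

    primer_site: assumed index on the subject where the query sequence starts.
    d: distance from the start of the query to the beginning of the piece.
    """
    w = len(piece)
    lo = max(0, primer_site + d - x)
    hi = min(len(subject), primer_site + d + w + x)
    hits = []
    for word in words_from_piece(piece):
        for start in range(lo, hi - w + 1):
            score = identity(word, subject[start:start + w])
            if score >= min_similarity:
                hits.append((word, start, score))
    return hits
```

A piece with three two-base positions, such as `["A", "CT", "G", "AG", "CT"]`, expands to eight words, as in the caption. Restricting the comparison to the window keeps words that happen to resemble distant parts of the subject from being counted as hits.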
Interpretation of results. The results were presented as a list sorted with the highest similarity percentage on top. To decide which ones and how many to include in the final answer, a cutoff value had to be established. In addition, the risk that bacteria similar to those actually present in the sample could reach the cutoff score by chance was addressed.
Bacterial strains and preparation of mixed bacterial samples. The following bacterial strains were used: Aggregatibacter aphrophilus (ATCC 7901), Bacillus subtilis (NCTC 6633), Bacteroides fragilis (ATCC 25285), Capnocytophaga gingivalis (ATCC 33624), Eikenella corrodens (clinical sample), Enterococcus faecalis (ATCC 29212), Escherichia coli (ATCC 25922), Haemophilus influenzae (ATCC 49247), Klebsiella pneumoniae (ATCC 13883), Moraxella catarrhalis (ATCC 8176), Morganella morganii (ATCC 25830), Proteus vulgaris (ATCC 6380), Pseudomonas aeruginosa (clinical sample), Staphylococcus aureus (ATCC 29213), Staphylococcus haemolyticus (laboratory strain), Stenotrophomonas maltophilia (ATCC 17666), Streptococcus agalactiae (ATCC 12386), and Streptococcus pneumoniae (ATCC 49619).
From the 18 different bacteria, suspensions of 2.0 McFarland standard were prepared in sterile water and mixed as described in Tables 1 and 2. The compositions of mixtures were mostly based on what is typically found in human clinical samples.
TABLE 1. Description of bacterial mixtures containing two bacteria and results from the RipSeq software analysisa
TABLE 2. Description of bacterial mixtures containing three bacteria and results from the RipSeq software analysisa
The remaining 19 samples were supplied with 300 µl of acid-washed glass beads (size, 106 µm; Sigma Aldrich), and lysis was performed by a 45-s run at a speed of 6.5 m/s in a FastPrep instrument (Qbiogene). This gave homogeneous signal intensities for all bacteria except S. pneumoniae, which still tended to give low peak signals.
After lysis, all tubes were centrifuged for 5 min at 13,000 rpm, and the supernatant was used directly as a template in the first PCR.
PCR. The following primers were used: forward primer, 5'-CGG-CCC-AGA-CTC-CTA-CGG-GAG-GCA-GCA-3'; reverse primer, 5'-GCG-TGG-ACT-ACC-AGG-GTA-TCT-AAT-CC-3'.
The resulting product had a size of approximately 460 bp, covering the variable areas V3 and V4 of the 16S rRNA gene (5).
The PCR products were electrophoresed through an agarose gel containing ethidium bromide, and bands were visualized by UV transillumination.
Sequencing. The PCR products were purified using the Qiaquick PCR purification kit (Qiagen) and sequenced using the ABI Prism Big-Dye sequencing kit and a 3730 DNA Analyzer (Applied Biosystems).
Identification. The resulting mixed chromatograms were first displayed using the Chromas Pro software (Technelysium Ltd.) to define left and right trimming and signal cutoff values. Subsequently, they were read and matched against a reference database using the RipSeq software (iSentio) and the algorithm described above. The database used contained 326 16S rRNA gene sequences representing 261 of the most common human pathogens, human-colonizing bacteria, and bacterial contaminants of human samples. The reference sequences were collected from GenBank and were mainly sequences of type strains.
The RipSeq program (iSentio) is available as a commercial Web service.
TABLE 3. Impact of word size and window size on test results for mixtures containing two different bacteria
The results were presented as a list with the highest-scoring subjects on top. The challenge was to decide which ones and how many to include in the answer. Close inspection of the chromatograms usually gave an indication of how many but could also be misleading. We therefore adopted an empirically set cutoff at 99.3% similarity. All subjects reaching this cutoff were included in the final answer (rule 1). If the answer included two bacteria from the same genus, both above the cutoff, only the higher-scoring species was chosen, although we could not exclude the possibility that both were present (rule 2). This rule was also applied to bacteria from different genera known to have very similar 16S rRNA genes, e.g., some Citrobacter/Enterobacter/Klebsiella spp. or E. corrodens/Kingella denitrificans. Based upon our experience with the method, the finding of two bacteria with similarity in the sequenced area of >95% should be interpreted with caution. Typically, they could be easily distinguishable if they were the only bacteria present in the mixture, but if they were together with a third bacterium, only the one with the higher score should be accepted.
To better illustrate the principles for interpretation, two examples are given. The interpretation of chromatogram 15 forward was simple. Only S. agalactiae and A. aphrophilus scored above the cutoff (according to rule 1), and they were not assumed to have similar 16S rRNA genes. The interpretation of chromatogram 13 forward was more complex. S. pneumoniae scored 100% and was included in the answer. Other streptococci also scored higher than 99.3% but lower than 100% and were therefore excluded (according to rule 2). K. pneumoniae, Klebsiella oxytoca, and Enterobacter aerogenes all scored above the cutoff. K. oxytoca scored lower than K. pneumoniae and was excluded (according to rule 2). A pairwise alignment was performed between K. pneumoniae and E. aerogenes. This showed 98.2% similarity, and consequently, only K. pneumoniae was included in the final answer (according to rule 2).
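The two interpretation rules can be written down as a short Python sketch. It simplifies the text in one respect, which is an assumption of the example: the >95% pairwise-similarity caution is applied unconditionally, whereas the text applies it with judgment. The function and constant names are hypothetical, and all scores below except the 100% for S. pneumoniae and the 98.2% pairwise similarity are invented placeholders, not results from the study.

```python
CUTOFF = 99.3         # rule 1: minimum similarity (percent) to enter the answer
SIMILAR_GENES = 95.0  # pairwise similarity (percent) above which rule 2 is applied


def interpret(hits, pairwise_similarity):
    """Apply rule 1 and rule 2 to (species, score) hits.

    pairwise_similarity: callable(species_a, species_b) -> percent similarity
    of the two reference sequences in the sequenced area.
    """
    above = sorted((h for h in hits if h[1] >= CUTOFF), key=lambda h: -h[1])
    accepted = []
    for species, _score in above:
        genus = species.split()[0]
        # Rule 2: skip a hit if a higher-scoring accepted hit is from the same
        # genus or has a very similar 16S rRNA gene.
        clash = any(
            genus == kept.split()[0]
            or pairwise_similarity(species, kept) > SIMILAR_GENES
            for kept in accepted
        )
        if not clash:
            accepted.append(species)
    return accepted


# Hypothetical scores loosely modelled on chromatogram 13 forward.
example_hits = [
    ("Streptococcus pneumoniae", 100.0),
    ("Streptococcus mitis", 99.5),      # placeholder score
    ("Klebsiella pneumoniae", 99.8),    # placeholder score
    ("Klebsiella oxytoca", 99.5),       # placeholder score
    ("Enterobacter aerogenes", 99.6),   # placeholder score
]
known_pairs = {frozenset(("Klebsiella pneumoniae", "Enterobacter aerogenes")): 98.2}


def example_similarity(a, b):
    # Pairs not listed get a placeholder value below the 95% threshold.
    return known_pairs.get(frozenset((a, b)), 90.0)


answer = interpret(example_hits, example_similarity)
```

Under these placeholder scores, `answer` contains only Streptococcus pneumoniae and Klebsiella pneumoniae, matching the reasoning given for chromatogram 13 forward.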
The detailed results for the mixed samples containing two bacteria using parameter combination C are presented in Table 1. An overview of the parameter combinations is given in Table 3. All 23 samples (combined interpretation of forward and reverse chromatograms) were correctly identified to the species level. If we look at the individual chromatograms, correct identification to the species level was achieved in 40 out of 46. Correct identification to the genus level was achieved in 46 out of 46. It was not possible to differentiate between E. coli and Shigella spp. This was not considered an error, as their sequences are identical with the primers used here. The results for the triple chromatograms (24-F to 28-F plus 24-R to 28-R) are presented in Table 2. These chromatograms were run with a piece size of 20, together with a window size of 20 + (2 x 2) (combination B from Table 3). If we look at each chromatogram individually, 5 out of 10 were correctly identified to the species level, whereas 9 out of 10 were correctly identified to the genus level. On the sample level, three out of five were identified to the species level and five out of five to the genus level.
To achieve the above-mentioned results, a single manual correction of the base calling was performed in chromatograms 24 reverse and 25 reverse. For the remaining 54 chromatograms, no corrections were necessary.
In our opinion, the most important feature of direct 16S rRNA gene sequencing is the possibility to analyze samples collected after the administration of antibiotics. Foci in internal organs are often difficult to discover in the first place, and once they are discovered, sample collection frequently requires complicated invasive procedures. Consequently, initiation of antibiotic treatment cannot await sample collection, and cultivation often yields a negative answer. In polymicrobial infections, cultivation can actually yield misleading answers if the antibiotics administered have affected the involved bacteria unequally, permitting some to grow and not others.
Because of the need to perform susceptibility testing, nucleic acid-based identification cannot replace cultivation. Cultivation is also more sensitive than PCR-based detection when samples contain viable bacteria. Large volumes of sample material can be cultivated in liquid media, with a theoretical sensitivity of 1 CFU/sample. In sequencing, the sensitivity is limited both by the method of DNA extraction and by the number of cycles that can be run in the first PCR before contaminant DNA in the reagents gives a false-positive result (4).
Some publications have raised the concern that primers can cross-react with eukaryotic DNA when bacterial 16S rRNA genes are amplified directly from a human clinical sample (7). Such cross-reactivity results in mixed chromatograms. The RipSeq algorithm can in part solve this problem with its ability to read mixed chromatograms and extract the information from the bacterial DNA. However, interference from human DNA decreases the differentiating power of the 16S rRNA gene and sometimes makes it impossible to distinguish between bacterial species with very similar 16S rRNA genes. Primers should therefore be selected carefully, and this was the main consideration when we chose our primers. Although they do not target the most variable parts of the 16S rRNA gene, in our experience, they do not cross-react with human DNA.
When direct sequencing is used on polymicrobial samples, additional issues have to be addressed. The main challenge of bacterial identification on the basis of a mixed chromatogram from the 16S rRNA gene is the enormous number of possible combinations in relation to the relatively short variable segments upon which differentiation is dependent (1). This applies in particular to species with very similar 16S rRNA genes. If, for example, the 16S rRNA genes of bacteria A and B are different only in position P and bacterium A is mixed with bacterium C, which is identical to bacterium B in position P, the algorithm will fail to decide whether the mixture contains bacteria A and C or B and C (or A, B, and C). However, the chance for this to happen in both the forward and reverse chromatogram is small, because the alignments of the two sequences normally will be different. This is exemplified in Table 1 for samples 11, 13, 17, 18, 19, and 21. When three bacteria are mixed, the chance for this to happen is substantially higher, as exemplified by sample 24 in Table 2.
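A toy Python example may help show why the ambiguity with bacteria A, B, and C arises. The sequences are invented for illustration, and the model of a mixed signal as the set of bases seen at each aligned position is an assumption of the sketch. In the mixture of A and C, every base that B carries is already present, so B matches the mixed signal completely even though it is absent.

```python
def mixed_signal(*sequences):
    """Set of bases visible at each aligned position of a mixture."""
    return [frozenset(column) for column in zip(*sequences)]


def compatible(sequence, signal):
    """True if every base of the sequence is present in the mixed signal."""
    return all(base in bases for base, bases in zip(sequence, signal))


# Invented aligned sequences: A and B differ only at index 2 (position P),
# and C carries the same base as B at that position.
A = "ACGTA"
B = "ACATA"
C = "TCAGA"

signal_ac = mixed_signal(A, C)
b_matches_fully = compatible(B, signal_ac)               # B is absent, yet fits
same_as_triple = signal_ac == mixed_signal(A, B, C)      # A+C looks like A+B+C
```

In this model the signals of A + C and A + B + C are identical, so the base sets alone cannot settle whether B is present. The opposite strand, read from the other primer, is aligned differently and can break the tie.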
The difference in the sequenced area between S. pneumoniae and Streptococcus mitis/Streptococcus oralis/Streptococcus intermedius is only a single base. Reliable differentiation between these species based on the 16S rRNA gene alone is probably not possible, even from pure culture (9). The difference between S. pneumoniae and Streptococcus sanguinis is 6 bases. Staphylococcus capitis/Staphylococcus caprae/Staphylococcus epidermidis are identical in the sequenced area and differ from S. aureus in three positions. Staphylococcus warneri, Staphylococcus lugdunensis, and Staphylococcus intermedius differ from S. aureus in three, four, and six positions, respectively. The difference between B. subtilis and Bacillus licheniformis is 5 bases.
As expected, the smallest differentiating margins were seen between some species within the genus Staphylococcus, between some species in the genus Streptococcus, and between some species in the genus Enterococcus. In some situations, the inability to distinguish between different species within these genera may not be clinically problematic because bacteria with similar 16S rRNA genes frequently have similar susceptibilities to antibiotics and similar clinical importance. It becomes a problem, however, when it is not possible to distinguish between, e.g., S. aureus and a coagulase-negative staphylococcus or between S. pneumoniae and other members of the S. mitis group. In these situations, other genes can be sequenced in addition, such as sodA or rpoB (8-10). The limitation that lies in the variability of the 16S rRNA gene is the main reason for us to say that in chromatograms containing more than three different species, specificity will not be sufficient to obtain reliable answers on a routine basis. This needs to be taken into consideration when deciding which samples to test. We expect suitable samples to include aspirates from abscesses in internal organs, except those related directly to the intestine, as well as aspirates from deep cutaneous, subcutaneous, muscular, and retroperitoneal abscesses. Body fluids, like cerebrospinal fluid, synovial fluid, pleural fluid, and bile, may also be relevant. The often serious nature of infections in these foci, in combination with the resources put into collecting the specimens in the first place, can justify the extra cost of the sequencing procedure. Samples from open wounds or other areas that communicate directly with mucous membranes or skin are not likely to be suitable. Swabs, small-needle biopsy specimens, or very small amounts of aspirate are probably not suitable, since the dilution rate in the DNA extraction procedure will become very high. 
If a sample that contains more than three species is sequenced, two situations might occur. (i) The 16S rRNA genes of all bacteria present are successfully sequenced and expressed in the chromatogram. The resulting answer from the RipSeq analysis contains more than three species and should be rejected. (ii) Because of unequal concentrations of the different species, only three or fewer are detectable in the chromatogram, giving an acceptable but incomplete answer. We have found that this occurs if the difference in concentrations exceeds 1:10, with some variations dependent on the affinity of the primers for the different targets and the number of copies of the 16S rRNA gene in the respective bacteria (3). Differences in concentration are in theory not a problem in cultivation, but competition for nutrients, different growth speeds, similar-looking colonies, and swarming may camouflage the presence of some species.
The triple chromatograms were more complex than those containing only two bacteria, and consequently, there was a higher risk for erroneous base calling. To compensate for this, more relaxed search parameters could have been beneficial. However, the higher number of words that could possibly be derived from each piece increased the risk for random hits against nonrelevant sequences in the database. Consequently, the samples had to be run with increased stringency to achieve sufficient specificity. In a clinical sample, the number of bacteria is a priori unknown. We therefore suggest an approach in which the samples are first run with parameter combination C. If the resulting answer yields one or two bacteria, the answer is accepted. If the answer yields three or more bacteria, the chromatogram will have to be reanalyzed using the more stringent parameter combination B (Table 3) and only this answer accepted.
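The suggested two-pass approach can be summarized as a small Python sketch. The `analyze` callable stands in for a RipSeq run with a given parameter combination. It is a hypothetical placeholder, not an actual RipSeq interface.

```python
def two_pass_identification(chromatogram, analyze):
    """Run combination C first and fall back to the more stringent combination B.

    analyze: callable(chromatogram, combination) -> list of identified species
    (hypothetical stand-in for the search with a parameter set from Table 3).
    Returns the accepted answer and the combination that produced it.
    """
    first = analyze(chromatogram, "C")
    if len(first) < 3:
        # One or two bacteria: the answer from combination C is accepted.
        return first, "C"
    # Three or more bacteria: only the answer from combination B is accepted.
    return analyze(chromatogram, "B"), "B"
```

The function returns the combination alongside the answer so that a report can state which parameter set produced the accepted result.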
If the respective bacteria were present in unequal concentrations, or unevenly lysed, correspondingly large differences in signal intensities were seen, and the cutoff value had to be set lower to detect all the relevant peaks. A cutoff lower than 30 generally led to the inclusion of a high number of nonspecific signals, which was the situation for chromatograms 11 forward and 19 reverse. In addition, it made base calling more vulnerable to relative base displacements in corresponding positions by reducing the number of anchor peaks.
We have shown in this study that it is possible to analyze a mixed chromatogram containing up to three different bacteria. How well this will work on clinical samples and to what degree it will provide valuable information remain to be established in a study of patient samples. Preliminary results from our laboratory indicate that the method is a valuable supplement to culture, in particular when the patient has received antibiotics prior to specimen collection.
We gratefully acknowledge Harald G. Wiker and Lars Haar for critical comments and suggestions on the manuscript.
A patent application (B. Karlsen, Ø. Sæbø, and Ø. Kommedal, patent PCT/NO2007/00314) has been filed for several aspects of the algorithm described in this article. The RipSeq program will be accessible through a commercial Web service partly owned by the authors.
Published ahead of print on 3 September 2008.