Journal of Clinical Microbiology, December 2004, p. 5757-5766, Vol. 42, No. 12
0095-1137/04/$08.00+0 DOI: 10.1128/JCM.42.12.5757-5766.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Yong Zhu,1 Hiroyuki Ogata,2 and Didier Raoult1*
Unité des Rickettsies, IFR 48, CNRS UMR 6020, Faculté de Médecine, Université de la Méditerranée,1 Information Génomique et Structurale, CNRS UPR2589, Marseille, France2
Received 13 April 2004/ Returned for modification 28 June 2004/ Accepted 9 August 2004
Another drawback of the methods listed above is the empirical choice of target sequences. Over recent years, full-genome sequencing has been performed on an increasing number of bacteria. This has provided useful information for taxonomic, evolutionary, and phylogenetic purposes, but although it is the most detailed form of genotyping, full-genome sequencing is clearly not suited to strain typing. The availability of full-genome sequences, however, enables the rational selection of target sequences in molecular studies. We speculated that the areas of the genome containing the most-variable sequences in closely related bacterial species would also be the most variable among strains of the species and hence the most useful for differentiation of strains. In plants, it has been demonstrated that noncoding sequences are superior to genes for the phylogenetic and genotypic classification of species (7, 10, 17, 18, 56, 61). We speculated that noncoding sequences, which should not be subject to selection pressure, would be appropriate for typing bacterial strains.
To test our hypotheses, we compared the genome sequences of two closely related bacteria and attempted to develop a rational method for selecting sequences suitable for strain typing. We used Rickettsia conorii in our study for several reasons, including (i) the availability and colinearity of the R. conorii and Rickettsia prowazekii genome sequences, which facilitate comparisons of various types of sequences, (ii) the absence of effective phenotypic (43) or genotypic (47-49, 51) typing methods at the strain level, and (iii) the availability in our laboratory of a large collection of R. conorii isolates. Because our purpose was not a taxonomic study of rickettsiae closely related to R. conorii but rather the development of a genotyping tool at the strain level for R. conorii sensu stricto (type strain, Malish), we did not include in our study the Astrakhan fever rickettsia, Israeli spotted fever rickettsia, and Indian tick typhus rickettsia, whose taxonomic status is uncertain and which can easily be differentiated from each other and from R. conorii sensu stricto on the basis of specific nucleotide substitutions within the ompA, ompB, and sca4 nucleotide sequences (45, 48, 51).
TABLE 1. Sequences of primers, amplicon sizes, annealing temperatures, and accession numbers used in this study
FIG. 1. Genome fragments selected for comparison of variability between R. prowazekii and R. conorii. Open boxes, ORFs present in both genomes; wavy lines, remnant in one Rickettsia species from a gene present in the other species; trisected box, gene split in R. conorii but not in R. prowazekii; solid lines between boxes, intergenic spacers exhibiting low interspecies variability; dashed lines between boxes, intergenic spacers with high interspecies variability.
FIG. 2. Degree of intergenic spacer variability between R. conorii and R. prowazekii genomes as estimated by BLASTN scores. The most variable spacers exhibited low scores, whereas the most conserved spacers had high scores. White diamonds represent the spacers amplified and sequenced in our study that were not discriminatory among R. conorii strains; black diamonds represent the discriminatory spacers. Dashed horizontal line, BLASTN score of 75.
TABLE 2. R. conorii strains included in our study
PCR amplification and sequencing. The primers used to amplify the fragments described were obtained from Eurogentec (Seraing, Belgium) and are listed in Table 1. Their specificity was predicted by using the BLAST software (1). Genomic DNA was extracted from the rickettsial cultures by using the QIAamp tissue kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions. PCRs were carried out in a PTC-200 automated thermal cycler (MJ Research, Waltham, Mass.). Two microliters of the DNA preparation was amplified in a 50-µl reaction mixture containing 50 pM each primer, 200 µM (each) dATP, dCTP, dGTP, and dTTP (Invitrogen, Gaithersburg, Md.), 1 U of eLONGase polymerase (Invitrogen), 2 µl of eLONGase buffer A, and 8 µl of eLONGase buffer B. The following conditions were used for amplification: an initial 3 min of denaturation at 94°C was followed by 40 cycles of denaturation for 30 s at 94°C, annealing for 30 s at various temperatures given in Table 1, and extension for 1 min at 68°C. Amplification was completed by holding the reaction mixture for 3 min at 68°C to allow complete extension of the PCR products. PCR products were purified by using a QIAquick Spin PCR purification kit (QIAGEN) as described by the manufacturer. Sequencing reactions were carried out by using the dRhodamine Terminator cycle sequencing ready reaction kit with Amplitaq polymerase FS (Perkin-Elmer, Coignieres, France) as described by the manufacturer. For all PCR products, sequences from both DNA strands were determined twice. Sequencing products were resolved by using an ABI 3100 automated sequencer (Perkin-Elmer). Sequence analysis was performed by using the ABI Prism DNA sequencing analysis software package (version 3.0; Perkin-Elmer). Sterile water was used as a negative control in each assay.
Sequence analysis. In order to estimate "in silico" (by computer simulation) the relative variability of coding and intergenic sequences between the two genomes, we selected 102 pairs of putative orthologous intergenic sequences and 770 pairs of orthologous coding sequences that were reciprocal best hits with a significant BLASTN E-value of <0.001 and exhibited small size differences (<20%). Then we calculated the mean (± standard deviation [SD]) nucleotide sequence similarity for noncoding and coding sequences.
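The selection step described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes BLASTN hits are already available as (query, subject, E-value, percent identity) tuples, that the size difference is measured relative to the longer sequence, and that the percent identity of the best hit stands in for nucleotide similarity. The function names and the input layout are hypothetical.

```python
from statistics import mean, stdev

E_VALUE_CUTOFF = 1e-3   # "significant BLASTN E-value of <0.001"
MAX_SIZE_DIFF = 0.20    # "small size differences (<20%)"


def best_hits(hits):
    """Keep, for each query, the hit with the lowest E-value.

    hits: iterable of (query, subject, evalue, identity) tuples (assumed layout).
    """
    best = {}
    for query, subject, evalue, identity in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue, identity)
    return best


def reciprocal_pairs(a_vs_b, b_vs_a, lengths):
    """Return (a, b, identity) for pairs that are reciprocal best hits,
    pass the E-value cutoff in both directions, and differ in length by
    less than MAX_SIZE_DIFF (relative to the longer sequence; assumption)."""
    forward, backward = best_hits(a_vs_b), best_hits(b_vs_a)
    pairs = []
    for a, (b, evalue, identity) in forward.items():
        back = backward.get(b)
        if back is None or back[0] != a:
            continue  # not a reciprocal best hit
        if evalue >= E_VALUE_CUTOFF or back[1] >= E_VALUE_CUTOFF:
            continue  # not significant
        longer = max(lengths[a], lengths[b])
        if abs(lengths[a] - lengths[b]) / longer >= MAX_SIZE_DIFF:
            continue  # sizes too different
        pairs.append((a, b, identity))
    return pairs


def summarize(pairs):
    """Mean and sample standard deviation of identity (needs >= 2 pairs)."""
    values = [identity for _, _, identity in pairs]
    return mean(values), stdev(values)
```

Running `reciprocal_pairs` once on intergenic sequences and once on coding sequences, then calling `summarize` on each result, would give the two mean (± SD) values that the text compares.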
DNA sequences obtained in our study and those from the genome of R. conorii strain Malish (accession no. NC_003103) were aligned by using CLUSTALW software, version 1.81 (1). For split and remnant genes, for which the lengths of the sequences compared could be quite different, we considered consecutive gaps as a single mismatch. Percentages of similarity among sequences were determined by using the MEGA 2.1 software package (31). Phylogenetic relationships among R. conorii strains were inferred from both the intergenic spacer sequences and the multigene sequences by using the MEGA 2.1 software package (31). Distance matrices were determined under the assumptions of Kimura by using complete deletion analysis and were used to infer dendrograms by the unweighted pair group method with arithmetic means (UPGMA) available in the MEGA 2.1 software package (31).
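Two of the rules in this paragraph are easy to misread, so a small sketch may help. It assumes pairwise, already-aligned sequences with `-` as the gap character, and it assumes that "the assumptions of Kimura" refers to the Kimura two-parameter model; neither function reproduces MEGA's implementation, and both names are hypothetical. UPGMA clustering of the resulting distance matrix is not shown.

```python
import math

PURINES = {"A", "G"}


def similarity_gap_runs(a, b):
    """Percent similarity of two aligned sequences, counting each run of
    consecutive gap columns as a single mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = mismatches = 0
    in_gap_run = False
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # column is empty in both sequences: ignore it
        if x == "-" or y == "-":
            if not in_gap_run:
                mismatches += 1  # the whole run costs one mismatch
                in_gap_run = True
            continue
        in_gap_run = False
        if x == y:
            matches += 1
        else:
            mismatches += 1
    return 100.0 * matches / (matches + mismatches)


def k2p_distance(a, b):
    """Kimura two-parameter distance with gapped columns dropped
    (the pairwise analogue of complete deletion)."""
    n = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases
        n += 1
        if x == y:
            continue
        if (x in PURINES) == (y in PURINES):
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    p, q = transitions / n, transversions / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For a split or remnant gene aligned against its intact ortholog, the gap-run rule keeps one long deletion from dominating the similarity score, which is presumably why it was applied only to those categories.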
Statistical analysis. Student's t test was used to compare the nucleotide sequence similarity means of noncoding and coding sequences and of coding genes, split genes, remnant genes, and intergenic spacers. Multispacer typing (MST) and multigene typing were compared by using the χ² test. STATA software (version 7.0; Stata Corporation, College Station, Tex.) was used for statistical analysis.
Nucleotide sequence accession numbers. The sequences reported in this paper have been deposited in GenBank (accession no. AY515514 to AY515518, AY518488 to AY518510, AY345057 to AY345070, AY345072 to AY345076, AY345079 to AY345081, AY345083, AY345084, AY518297 to AY518300, AY521231, AY345100, AY345085 to AY345087, AY345089, AY465118, AY345091, AY345092, AY345097, AY345099, AY518302, AY518303, AY518472 to AY518487, AY428738 to AY428750, AY462116, and AY497559).
In the 38 strains of R. conorii for which we determined sequences in this study, the sequences of all 4 conserved coding genes, 23 split genes, and 5 remnant genes were identical to those of the genome of R. conorii strain Malish (Seven) (Table 1). Analysis of ompA sequences enabled us to identify three genotypes: one comprising 36 strains which had 100% similarity with the genome of R. conorii strain Malish (Seven), one containing R. conorii strain Moroccan, and one consisting of R. conorii strain M1 (Table 1). Overall, by using multigene typing with coding genes in R. conorii (16S rDNA, gltA, ompB, sca4, and ompA), we were able to classify the 39 strains of R. conorii we compared into three genotypes.
PCR amplification of the 52 intergenic spacers we studied in the 38 test strains of R. conorii yielded product sizes consistent with those of the genome of R. conorii strain Malish (Seven) (Table 1) except for the dksA-xerC spacer, for which the amplicon length ranged from 100 to 549 bp in different strains of R. conorii (Fig. 3). Only 4 spacers, all of which belonged to the 25 variable spacers (Fig. 2), had nucleotide differences in the R. conorii strains we studied: dksA-xerC, mppA-purC, rpmE-tRNAfMet, and tRNAGly-tRNATyr (Table 3). Seven nucleotide differences within the mppA-purC spacer enabled R. conorii strains to be classified into five genotypes (Table 3). By using differences at two positions in the rpmE-tRNAfMet spacer, strains could be classified into two genotypes (Table 3). The tRNAGly-tRNATyr spacer enabled the identification of two genotypes based on three nucleotide mutations. The dksA-xerC spacer was found to contain variable numbers of 63- to 102-bp repeat units designated R1 to R5, depending on the strain (Fig. 3 and 4). The percentage of nucleotide sequence similarity between repeats ranged from 50% between R1 and R4 to 98% between R1 and R2. Their G+C contents ranged from 11.8% for R5 to 17.5% for R3 (Fig. 3). Differences observed in the dksA-xerC spacers enabled us to classify the R. conorii strains we studied into 15 genotypes (Table 3). The number, type, and arrangement of repeat units for all strains tested are detailed in Fig. 4. The two batches of R. conorii strain Malish (Seven) exhibited identical spacer sequences. By combining the results obtained from analysis of the four variable spacers, we were able to identify 27 genotypes of R. conorii (Table 3). However, identification of these 27 genotypes was obtained by combining the results from the dksA-xerC, mppA-purC, and rpmE-tRNAfMet spacers only. The tRNAGly-tRNATyr-based typing did not provide any additional genotype. Therefore, this spacer was removed from the MST analysis. 
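The way spacer results combine into MST genotypes, and the reason one spacer could be dropped, can be illustrated with a short sketch. It assumes each strain has already been assigned an allele label per spacer; the strain names and allele labels in the usage example are invented for illustration and are not taken from Table 3, and the function names are hypothetical.

```python
def count_genotypes(profiles, spacers):
    """Number of distinct allele combinations over the given spacers.

    profiles: dict mapping strain name -> {spacer name: allele label}.
    """
    return len({tuple(p[s] for s in spacers) for p in profiles.values()})


def assign_genotypes(profiles, spacers):
    """Number each distinct allele combination in order of first appearance."""
    seen = {}
    result = {}
    for strain, alleles in profiles.items():
        key = tuple(alleles[s] for s in spacers)
        if key not in seen:
            seen[key] = len(seen) + 1
        result[strain] = seen[key]
    return result


def redundant_spacers(profiles, spacers):
    """Spacers whose individual removal leaves the genotype count unchanged."""
    total = count_genotypes(profiles, spacers)
    return [
        s for s in spacers
        if count_genotypes(profiles, [x for x in spacers if x != s]) == total
    ]


# Hypothetical allele calls, for illustration only.
SPACERS = ["dksA-xerC", "mppA-purC", "rpmE-tRNAfMet", "tRNAGly-tRNATyr"]
toy = {
    "s1": dict(zip(SPACERS, ["A", "1", "x", "g1"])),
    "s2": dict(zip(SPACERS, ["B", "1", "x", "g1"])),
    "s3": dict(zip(SPACERS, ["A", "2", "x", "g1"])),
    "s4": dict(zip(SPACERS, ["A", "1", "y", "g2"])),
    "s5": dict(zip(SPACERS, ["A", "1", "y", "g2"])),
    "s6": dict(zip(SPACERS, ["A", "1", "z", "g2"])),
}
print(redundant_spacers(toy, SPACERS))  # ['tRNAGly-tRNATyr']
```

In the toy data the fourth spacer never separates two strains that the other three leave together, which mirrors the situation reported for the tRNAGly-tRNATyr spacer.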
Twenty-three of the 27 genotypes contained only a single strain, while the remainder contained four strains (2 genotypes), three strains (2 genotypes), or two strains (1 genotype). MST identified a significantly greater number of genotypes than multigene typing (27 of 39 versus 3 of 39; P < 10⁻²).
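The comparison of 27 of 39 versus 3 of 39 can be checked with a Pearson χ² test on a 2×2 table. The sketch below is an assumption about how the table was laid out (one row per typing method, columns for the genotype count and the remainder out of 39) and applies no continuity correction; the original analysis was run in STATA and its exact settings are not stated. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and P value (1 degree of freedom)
    for the 2x2 table [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Row 1: MST (27 genotypes, 12 remainder); row 2: multigene typing (3, 36).
chi2, p_value = chi_square_2x2(27, 39 - 27, 3, 39 - 3)
print(f"chi2 = {chi2:.1f}, P = {p_value:.2g}")
```

Under these assumptions the statistic is 31.2 with one degree of freedom, which gives a P value far below the reported threshold.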
FIG. 3. Repeated sequences making up the dksA-xerC intergenic spacers of R. conorii strains.
TABLE 3. Results of MST for R. conorii strains
FIG. 4. Number and distribution of VNTRs in the dksA-xerC intergenic spacers of 39 R. conorii strains. *, strain Malish (Seven) used for genome sequencing; ¶, strain Malish (Seven) with 60 more culture passages.
When sequences from the dksA-xerC, mppA-purC, and rpmE-tRNAfMet variable spacers were concatenated, three clusters could be differentiated. One contained all 5 sub-Saharan African strains; another was composed of 16 strains, mostly from France; and the third cluster included 13 strains from various geographical locations, including strains Moroccan and M1 (Fig. 5). Five strains of R. conorii did not group within any of the clusters.
FIG. 5. Unrooted tree showing the phylogenetic relationships among the 39 R. conorii strains studied, inferred from sequence analysis of the combination of the dksA-xerC, mppA-purC, and rpmE-tRNAfMet intergenic spacers using the UPGMA method. The geographic origins of strains are given either in their names or in parentheses. *, strain Malish (Seven) used for genome sequencing; ¶, strain Malish (Seven) with 60 more culture passages.
Prior to our work, there was no genotyping method described for rickettsiae at the strain level. The development of a typing method for rickettsial strains has become crucial with the classification of R. prowazekii as a potential agent of bioterrorism. We used a rational technique rather than an empirical strategy to search for the most suitable genome fragments for this purpose. Comparison of the R. conorii and R. prowazekii genomes, which exhibit a high degree of colinearity, enabled us to compare the interspecies variability of coding genes, degraded genes, conserved intergenic spacers, and variable spacers. By in silico analysis, we found that variable intergenic sequences were more variable than coding genes, degraded genes, and conserved spacers (P < 10⁻² in all cases). It has been suggested that intergenic spacer sequences are an important source of genome plasticity because they do not undergo selection pressure (13). For the rickettsiae, it has been suggested that most of the intergenic sequences of R. prowazekii and R. conorii consist of decayed genes that are no longer active but have not yet been totally eliminated from the genome (42). To date, the intergenic spacer used most widely in work with bacteria has been the 16S-23S rDNA spacer. In many bacterial species (24, 25, 32, 36, 46, 55) this spacer has been shown to have great variability, not only in its sequence and length but also in the number of alleles per genome (33, 53). However, studying the 16S-23S rDNA spacer of rickettsiae is not possible, because the 16S rDNA gene is separated from the 23S and 5S rDNA genes, which are tightly linked together (5).
When we compared the overall variability of intergenic spacers and coding sequences between the genomes of R. prowazekii and R. conorii, we found that spacers were significantly more variable than genes (P < 10⁻²). We were surprised, however, to observe that there were both highly variable spacers and spacers that were as conserved as coding sequences. R. conorii and R. prowazekii are estimated to have diverged from a common ancestor 80 million years ago (42), and it would seem probable that sequences which were not under selection pressure during this period would have been more variable than coding sequences. For prokaryotes, variations in the conservation of spacers have been reported, but the role of conserved spacers is incompletely understood (39, 44). For eukaryotes, recent studies on comparative genomics have shown that in yeasts (29) and mammals (11) some intergenic spacers are highly conserved at the species level. Some of these spacers include regulatory motifs, but the function of many remains unidentified. The factors responsible for the heterogeneity of intergenic spacers in rickettsial genomes have yet to be determined. Nevertheless, our results emphasize the importance of comparing genomes in order to select variable sequences instead of targeting sequences presumed to be variable.
At the intraspecies level, we confirmed that the 25 variable spacers we studied were the best targets, with 4 being highly variable and a combination of 3 enabling us to identify 27 genotypes among the 39 strains of R. conorii we studied. As predicted by in silico analysis, conserved genes were highly conserved among the R. conorii strains in our experiments. Only one of the coding genes we studied exhibited interstrain variability. This was ompA, which encodes a high-molecular-weight, surface-exposed protein and is one of the most variable coding genes in the spotted-fever-group rickettsiae (16). The variability we found in the gene was limited, however, and enabled us to differentiate only three genotypes. ompA-based genotyping was not congruent with MST, which classified strain M1 in a genotype with other strains. However, because the phylogenetic analysis using MST sequences was congruent with the geographic origins of strains, MST may be more relevant than ompA for R. conorii strain typing. Even when all five coding genes were used in multigene typing, there was less variability than that found with MST (P < 10⁻²). One could expect that genes which have undergone degradation since the divergence of R. conorii and R. prowazekii would exhibit higher levels of sequence divergence than conserved genes (3, 4) and thus would be better candidates for strain typing. We found, however, that split genes and remnant genes in R. conorii were also highly conserved and were not suitable for genotyping at the strain level. These findings support our strategy of selecting intergenic sequences with the greatest interspecies variability as targets for strain typing.
One of the 34 human isolates studied, URRCFranceFEe48, was obtained from a patient with malignant boutonneuse fever from Marseilles. Although it is likely that host factors play a key role in the development of severe forms of Mediterranean spotted fever (12), the specific role of strain variation in R. conorii has not been evaluated. The demonstration that this strain had a distinct genotype by MST highlights the usefulness of our genotyping method for isolates associated with particular clinical presentations. In addition, the phylogenetic classification inferred from the three spacers retained for MST was consistent with the geographic distribution of strains (Fig. 5).
In order to estimate the effect of culture passage on MST variation, we compared the intergenic sequences of two batches of R. conorii strain Malish (Seven) with different passage histories. No difference in spacer sequence was found between the two batches. Therefore, MST may be valuable for tracing rickettsial isolates from a single source with a difference in culture history of at least 60 passages.
Among the four variable spacers, the dksA-xerC spacer was composed of 63- to 102-bp repeat units (Fig. 3). One salient feature of the R. conorii genome is the high density of repeated sequences (40). Six hundred fifty-six interspersed repeated sequences, named Rickettsia palindromic elements (RPE), have been identified in R. conorii and represent 3.2% of the genome (2, 41, 42). Such interspersed repeats are usually confined to the intergenic regions of bacterial genomes (59), but in R. conorii the repeated sequences are also present within protein-coding regions (41). The repeats we found in the dksA-xerC intergenic spacer were highly conserved and were present only in this locus. Repeats representing a single locus and showing interindividual length variability are designated VNTRs (variable number of tandem repeats) (59). Changes in the number of repeats in a given genetic locus are an important source of DNA variability in eukaryotes (58). Such markers are well-established molecular targets for pedigree analysis in humans (27) and have also been used for bacteria (28). The VNTRs we found within the dksA-xerC intergenic spacer of R. conorii had two peculiar features. First, they occurred within an intergenic spacer, whereas VNTRs usually appear to be mainly involved in implementing size variation in cell wall- or membrane-associated proteins. This may cause enhanced or diminished exposure of active protein domains on bacterial surfaces (59). In Rickettsia species, VNTRs are known to occur within the ompA gene (59), and their number and arrangement vary in Rickettsia species (19). Second, the G+C content of the dksA-xerC VNTRs was low in contrast with that of the GC-rich RPEs common in intergenic regions of spotted-fever-group rickettsiae (3). In view of the fact that the overall G+C content of Rickettsia species is low, the dksA-xerC VNTRs may be remnants of a decaying rickettsial gene and not imported elements.
Our study demonstrated that in silico identification of the most variable sequences between two closely related bacterial genomes enables the selection of target sequences for strain genotyping. For rickettsiae, we demonstrated that intergenic spacers, in particular those that showed the greatest variability between the genomes of two closely related species, are more suitable for strain genotyping than coding sequences and degraded genes. The combined use of variable spacer sequences, which we named multispacer typing, is significantly more discriminatory than multigene sequencing. The advantages of MST include high discrimination, reproducibility, simplicity of interpretation (because one technique is used rather than a combination of techniques), and ease of incorporation of the data generated into databases that are directly comparable and readily shared by laboratories via the Internet. This technique may be applied for tracking isolates obtained from a wide variety of sources, including isolates from a single strain with different passage histories, and may even be applied directly to clinical specimens. Moreover, this technique may be applicable to other bacteria, in particular those considered potential agents of bioterrorism.
P.-E.F. and Y.Z. contributed equally to this work.