Journal of Clinical Microbiology, March 2008, p. 1026-1036, Vol. 46, No. 3
0095-1137/08/$08.00+0 doi:10.1128/JCM.02027-07
Copyright © 2008, American Society for Microbiology. All Rights Reserved.
Department of Analytical Microbiology, Centre d'Etudes du Bouchet, BP3, F-91710 Vert-le-Petit, France,1 Université Paris-Sud, Institut de Génétique et Microbiologie, and CNRS, Orsay, F-91405, France2
Received 17 October 2007/ Returned for modification 17 December 2007/ Accepted 14 January 2008
Shigella sp. strains are currently classified into four species: S. dysenteriae, S. flexneri, S. boydii, and S. sonnei. Three of them are apparently antigenically heterogeneous, comprising several serotypes, whereas S. sonnei is antigenically homogeneous.
Shigella typing relies on phenotypic characteristics, but discrimination between the four species can be difficult. Serotyping is not always able to provide a correct species identification, due to cross-reactions or the absence of agglutination. New serotypes are regularly discovered (28, 49) and are sometimes found to cross-react with Escherichia coli strains.
In addition, phenotypic tools and serotyping have limited discriminatory power and require manipulation of the live agent. The introduction of DNA-based molecular typing methods, such as ribotyping (5), plasmid profile analysis, restriction fragment length polymorphism (25, 26), and pulsed-field gel electrophoresis (PFGE) (37), has greatly improved the ability of researchers to discriminate between epidemiologically related and unrelated isolates in outbreaks. In the United States, PulseNet, a network of laboratories involved in food-borne disease surveillance (42), uses PFGE typing coupled with strict quality control procedures in order to ensure interlaboratory reproducibility, but this approach remains labor-intensive for routine clinical strain typing, so cheaper alternatives are actively being pursued. Multilocus sequence typing (MLST) is a very powerful approach, and it provides a clear view of the population structure (52). However, it is not yet appropriate for the routine, first-line genotyping of a large number of isolates. Recently, PulseNet members acknowledged that multilocus variable-number tandem-repeat (VNTR) analysis (MLVA) is a highly promising typing tool, likely to replace PFGE in the coming years (9). MLVA typing is being actively developed by a number of laboratories, together with the associated Internet-based query tools and databases (3, 6, 17, 18), to genotype several bacterial pathogens, in particular, potential biothreat agents, such as Bacillus anthracis (18), Yersinia pestis (18, 31), Legionella pneumophila (32, 33), Salmonella enterica (36), Brucella spp. (19), Francisella tularensis (11), Burkholderia mallei, and B. pseudomallei (43).
MLVA is also increasingly recognized as a future reference technique for bacterial genotyping allowing systematic typing of all isolates for a number of other pathogens of high public health interest (including Mycobacterium tuberculosis [17], Pseudomonas aeruginosa [30, 47], Streptococcus pneumoniae [13], and Staphylococcus aureus [40]) (the technique is reviewed in references 21, 44, and 45).
The present work aimed to set up MLVA for the Shigella genus to facilitate epidemiological follow-up, by providing easy-to-use typing tools, in countries suffering from a recrudescence of Shigella outbreaks. Recently, Liang et al. (20) proposed an MLVA for molecular typing of S. sonnei and explored the capability of a 26-VNTR set for detecting clusters of infection. Because the proposed VNTRs have very short repeat units, this assay requires a high precision of DNA length measurement, as provided, for instance, by microcapillary electrophoresis and fluorescent markers. This is a strong limitation for routine typing because of the cost of such equipment and the associated consumables. In contrast, the present investigation favors markers with larger repeat units in order to provide an assay widely accessible to research and public health laboratories, particularly in developing countries.
TABLE 1. List of strains used in this study
Loci with repeat units longer than 9 bp were favored in order to facilitate allele repeat number calling by a variety of DNA amplicon length estimation devices (24) and as a complement to a previous investigation (20). Selected primer pairs were validated in silico with available genome sequence data for Shigella and E. coli strains by using the multiple-PCR-primer BLAST Web service (http://minisatellites.u-psud.fr/), which provides the expected size and sequence of the PCR products. The resulting in silico typing data were integrated into the MLVA data set.
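The in silico validation step can be illustrated with a minimal Python sketch. It is not the Web service cited above: it assumes exact primer matches on the forward strand of a single sequence, and the function names and the toy sequence are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def in_silico_amplicon(genome, forward_primer, reverse_primer):
    """Return the predicted PCR product, or None if a primer is not found.

    Assumption: exact matching on the given strand only. A real tool
    would tolerate mismatches and search both strands.
    """
    genome = genome.upper()
    start = genome.find(forward_primer.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its reverse
    # complement is searched downstream of the forward primer.
    rc_reverse = reverse_complement(reverse_primer)
    end = genome.find(rc_reverse, start + len(forward_primer))
    if end == -1:
        return None
    return genome[start:end + len(rc_reverse)]


# Toy sequence (hypothetical): flank, three 6-bp repeat units, flank.
toy_genome = "TTT" + "ACGTAC" + "GGATCC" * 3 + "TTGCAC" + "CCC"
product = in_silico_amplicon(toy_genome, "ACGTAC", "GTGCAA")
print(len(product))  # 30
```

The predicted product length is what would then be converted into a repeat number for a sequenced strain.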
Some VNTR loci of E. coli O157:H7 described by Keys et al. (12) are present in the Shigella genus genome sequences. Five of them (O157-11, O157-13, O157-25, O157-33, and O157-34), selected for their relative ease of use (repeat unit size, >6 bp) or high rate of polymorphism, were evaluated in this study.
VNTR amplification and genotyping. PCRs were performed as previously described (18) and using an annealing temperature of 60°C. DNA from the S. flexneri 2457T strain was used as an internal control to ensure high-quality size assignments as described previously (36). The PCR products were run on agarose gels, stained with ethidium bromide, visualized under UV, and photographed as previously described (19).
Sequence-based typing. To allow comparison between the MLVA and MLST assays, a region of the thrC gene described by Pupo et al. (35) was selected because it carries enough information to assign the strains of the collection to the previously described clusters. Primers were designed with Primer3 software (39) and yielded a 351-bp amplicon. This region was amplified in all strains of the present collection, and PCR products were sent to MWG-BIOTECH AG (Ebersberg, Germany) for sequencing.
Data analysis. Gel electrophoresis images were analyzed by using the Bionumerics software package, version 5.0 (Applied-Maths, Sint-Martens-Latem, Belgium), as previously described (19). The number of repeats in each allele was deduced from the amplicon size. The resulting data were analyzed as a character data set with Bionumerics software. Clustering analysis was done by using the categorical parameter and the unweighted-pair group method with arithmetic averages (UPGMA) coefficient. The same weight was given to large and small numbers of differences in the repeats at each locus. The categorical parameter was also used to calculate the minimum spanning tree (MST). MST is a convenient complementary tool to cluster multiple isolates and visualize the relative diversity within different lineages. Polymorphism was quantified by the Hunter-Gaston diversity index (HGDI) (7). MLST sequences published by Pupo et al. (35) were imported into the Bionumerics program.
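The clustering principle can be sketched as follows. This is an illustration of a categorical coefficient combined with UPGMA, not the Bionumerics implementation; the function names and the four profiles are hypothetical, and equal weight is assumed for all loci.

```python
from itertools import combinations


def categorical_distance(profile_a, profile_b):
    """Fraction of loci at which two MLVA profiles carry different alleles.

    Any difference counts as 1, whatever the difference in repeat number.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    mismatches = sum(1 for a, b in zip(profile_a, profile_b) if a != b)
    return mismatches / len(profile_a)


def upgma(profiles):
    """Cluster named profiles by UPGMA and return the merge history.

    Each merge is (members_of_cluster_1, members_of_cluster_2, distance),
    where the distance is the average over all pairs of members.
    """
    leaf_dist = {
        frozenset((a, b)): categorical_distance(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def average_distance(c1, c2):
        total = sum(leaf_dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        merges.append((c1, c2, average_distance(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return merges


# Hypothetical repeat numbers at four loci (not data from this study).
example = {"A": [3, 5, 2, 7], "B": [3, 5, 2, 8],
           "C": [1, 4, 2, 7], "D": [1, 4, 6, 9]}
for merge in upgma(example):
    print(merge)
```

With the categorical coefficient, alleles differing by one repeat or by ten repeats contribute the same distance, which is the behavior described above.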
TABLE 2. List of TRs selected
FIG. 1. Illustration of the MLVA assay setup. The PCR products of an amplification using ms06 primers were loaded on an agarose gel, electrophoresed, and stained with ethidium bromide. Lanes M show a 100-bp-ladder molecular weight marker. Lanes 1 and 7 correspond to the S. flexneri strain 2457T; lanes 2 to 6 and lanes 8 to 12 correspond to strains from our assay. The image illustrates how the number of units can be directly deduced by manual reading. The marker name below the gel provides the repeat unit size of the TR, the expected PCR product size in the S. flexneri 2457T genome, and the corresponding number of units in the S. flexneri 2457T genome.
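The manual reading described in the legend amounts to a simple calculation, sketched below in Python. The function name and the numerical values are hypothetical (they are not the ms06 parameters), and the flanking sequence is assumed to have the same length as in the reference strain.

```python
def call_repeat_number(amplicon_bp, ref_amplicon_bp, ref_repeats, unit_bp):
    """Estimate the number of repeat units from an amplicon size.

    Assumption: the flanking regions are as long as in the reference
    strain, so any size difference is due to whole repeat units.
    """
    extra_units = (amplicon_bp - ref_amplicon_bp) / unit_bp
    return ref_repeats + round(extra_units)


# Hypothetical marker: 39-bp unit, 330-bp product with 3 units in the
# reference strain. A band read at about 410 bp is called as 5 units.
print(call_repeat_number(410, 330, 3, 39))  # 5
```

Rounding to the nearest whole unit is what makes markers with larger repeat units tolerant of the limited sizing precision of agarose gels.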
TABLE 3. Diversity indexes calculated for MLVA panels and individual markers in the four Shigella species and E. coli
Diversity indexes differ within the genus Shigella and the species E. coli (Table 3). As expected, the least variable group of organisms is the species S. sonnei, with an HGDI of 0.64 when the highly variable locus O157-11 is not included (MLVA14, comprising panels 1 and 2). Locus O157-11 alone has an HGDI of 0.94 for S. sonnei and leads to a general HGDI of 0.97 for this species. S. flexneri, S. boydii, and S. dysenteriae show much greater MLVA14 diversity, correlated with the higher number of serotypes (12, 15, and 10, respectively) in these species. Regarding E. coli, the limited number of strains included in the study does not allow comparison with the others.
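For reference, the HGDI is computed as D = 1 - [1/(N(N - 1))] Σ n_j(n_j - 1), where N is the number of strains and n_j is the number of strains belonging to the jth type. A minimal Python sketch follows; the function name and the example allele list are hypothetical and are not values from Table 3.

```python
from collections import Counter


def hunter_gaston_index(types):
    """Hunter-Gaston diversity index for a list of type assignments.

    It is the probability that two strains drawn at random without
    replacement belong to different types.
    """
    n = len(types)
    if n < 2:
        raise ValueError("at least two strains are required")
    counts = Counter(types)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1 - same_type_pairs / (n * (n - 1))


# Hypothetical alleles observed at one locus in eight strains.
alleles = [3, 3, 3, 4, 4, 5, 6, 7]
print(round(hunter_gaston_index(alleles), 3))  # 0.857
```

The same function applies to a single locus (list of alleles) or to a whole panel (list of genotype identifiers).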
By use of the MLVA14 assay (panels 1 and 2), the 89 strains (72 DNA samples tested and 17 sets of sequence data) were differentiated into 65 genotypes. When the O157-11 VNTR is included, the discriminatory power of the assay is significantly increased, with 83 genotypes, numbered 1 to 83 in the dendrogram produced (Fig. 2). Because of the low relative weight given to this very highly variable marker in the clustering analysis, the two clusterings (with and without O157-11) are highly similar. All strains are identified by species acronym and genotype number (e.g., Sd#01 or Sf#70). Strains sharing the same genotype are additionally differentiated by a letter (e.g., a, b, etc.).
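The effect of adding a highly variable locus on the number of genotypes can be illustrated with a short Python sketch. The function name and the repeat numbers are hypothetical; only the locus names are taken from this study.

```python
def count_genotypes(profiles, loci):
    """Count distinct allele combinations over the chosen loci.

    profiles maps a strain name to a dict of locus -> repeat number.
    """
    genotypes = {
        tuple(alleles[locus] for locus in loci)
        for alleles in profiles.values()
    }
    return len(genotypes)


# Hypothetical repeat numbers for four strains (not data from this study).
profiles = {
    "strain_1": {"ms06": 3, "ms22": 5, "O157-11": 8},
    "strain_2": {"ms06": 3, "ms22": 5, "O157-11": 12},
    "strain_3": {"ms06": 2, "ms22": 5, "O157-11": 8},
    "strain_4": {"ms06": 2, "ms22": 5, "O157-11": 8},
}

print(count_genotypes(profiles, ["ms06", "ms22"]))             # 2
print(count_genotypes(profiles, ["ms06", "ms22", "O157-11"]))  # 3
```

In this toy example, strains 1 and 2 are separated only when the variable locus is included, whereas strains 3 and 4 remain identical.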
FIG. 2. Dendrogram with all the typed strains, including the ones corresponding to sequenced strains that were included in the data set by their allele numbers as calculated theoretically from the expected amplicon sizes of all 15 loci based on their genome sequences. Clustering analysis was done using the categorical and UPGMA options. The columns indicate the genotype number, the genus, the species, the serotype, the strain identification number (collection), the cluster number as defined by Pupo et al., and the year and place of isolation.
To estimate the validity of the clustering observed by VNTR typing, selected strains were analyzed by sequence typing using published MLST data (35). The strains yielded amplification products of the expected size for the thrC gene segment. Sequences were imported and aligned in the Bionumerics program, and a cluster analysis was produced (see Table S2 in the supplemental material). A single discrepancy between the clusters inferred from known genera and serotypes on one hand and from sequence analysis on the other hand was observed, due to the incongruence of phylogeny at the different loci. S. dysenteriae CIP 53.134 (D8 in reference 35) fell within cluster 3 when only the thrC sequence data were analyzed. It is the trpC-trpB sequence used in the MLST assay, in particular, that excludes that strain from cluster 3. Subsequently, all strains from the collection for which the thrC locus was investigated were assigned to the clusters defined by Pupo et al., as indicated in Fig. 2.
This sequence analysis and comparison with published sequence data show that three of these clusters do correspond to the clusters classified by Pupo et al., and they are numbered accordingly in Fig. 2. Cluster 1 grouped S. boydii serotypes 1 to 4, 6, 8, 10, 14, and 18, S. dysenteriae serotypes 3 to 7, 9, and 11, and S. flexneri serotype 6. Cluster 1 was the most diverse, grouping representatives of all Shigella species except S. sonnei. This cluster could be divided into three subclusters, i.e., SC-1A, SC-1B, and SC-1C. Strains sharing the same serotype are usually discriminated by MLVA15 typing.
Cluster 2 comprises S. boydii serotypes 5, 7, 9, 11, 15, and 16, S. dysenteriae serotype 2, and all S. sonnei strains. The two S. boydii serotype 9 strains displayed the same MLVA15 pattern. The S. sonnei strains are located in two branches, one of which also contains the two representatives of S. boydii serotype 7. Three uropathogenic E. coli strains are included in cluster 2, together with an E. coli strain isolated from a patient at Hôpital d'Instruction des Armées, Bégin, France (Ec#47).
Cluster 3 groups all S. flexneri strains (with the exception of S. flexneri serotype 6, located in cluster 1), S. boydii serotype 12, and S. dysenteriae serotype 8 (Sd#56). Sd#56 is weakly associated with subcluster 3A together with S. flexneri serotypes 3, 3a, 5, and X. A second subcluster, tentatively called 3B (Fig. 2), comprises S. boydii serotype 12 and S. flexneri serotypes 1, 2, 2a, 2b, 4b, and Y. S. flexneri serotype 4c is more distantly related. Each strain has a unique genotype, although several serotypes were represented by more than one strain.
Consequently, and with the exception of Sd#56 (Sd8), the composition of the clusters is in remarkable agreement with the report of Pupo et al., although the underlying approaches are quite different (35).
We propose here to define a fourth cluster. Cluster 4 would include all the S. dysenteriae serotype 1 strains, five representatives of the E. coli K-12 strain (including two genomic sequences), and the two E. coli O157:H7 strains included in the study. The E. coli O157:H7 representatives are located in a quite distinct branch, slightly closer to the S. dysenteriae serotype 1 branch than to the E. coli K-12 branch. The eight S. dysenteriae serotype 1 strains are separated into six genotypes, showing the discriminative power of the VNTR panel. Among those strains, two were isolated from patients during an outbreak in a refugee camp in Rwanda in 1994. They show exactly the same MLVA15 pattern. Three of the E. coli K-12 strains were wild-type K-12 and two were derivatives (14). The three K-12 wild types share exactly the same genotype, while derivatives present different patterns at locus ms22, ms32, or O157-11. The O157:H7 strains exhibited two differences among them, at loci ms11 and O157-11.
In very rare instances, a given allele is strongly associated with a specific cluster, at least in the limited collection of strains investigated here. The three-repeat-unit allele at locus ms06 is observed only in cluster 2, and locus O157-33 has more than one repeat unit only in clusters 2 and 4 (associated with the E. coli strains investigated here).
ms25 and ms26 each gave rise to a very large amplicon in two strains and four strains, respectively, and these amplicons correspond to alleles with more than 60 repeat units (compared to the usual allele size ranges of one to three and two to four repeat units, respectively) (see Table S2 in the supplemental material). The six alleles were sequenced. The ms25 sequencing revealed that two different insertion sequences (ISs) were present, IS629 and IS2 for Sf#22 and Sd#26, respectively. IS629 is a member of the IS3 family of transposable elements. The ms26 allele sequencing led to the finding that only one IS, known as IS630, was present in this TR (27), and it was inserted at the same position and orientation in the four strains Sd#33, Sd#34, Sb#35, and Sb#36, confirming the close relationship between the four strains already suggested by MLVA15 clustering, in spite of a different species assignment.
As a complementary analysis, an MST analysis was performed. MST analysis is a convenient tool to cluster multiple isolates and illustrate the relative diversity within different lineages (Fig. 3). This kind of analysis is applicable to categorical data sets. The creation of hypothetical types further minimizes the summed distance of all branches of the tree. The MST was drawn without locus O157-11, which is too variable to reflect relatedness within different lineages correctly, especially since this analysis does not currently allow the assignment of different relative weights to markers. In cluster 1, the three main subclusters are conserved, with only minor changes. The two S. dysenteriae serotype 7 strains, Sd#26 and Sd#27, are located in the vicinity of subcluster A but are separated by Sb#04 and Sb#05. The composition of cluster 2, with S. sonnei and S. boydii serotype 7 in the same branch, is also found by this approach. Sd#56 is not included in cluster 3, which is in accordance with results obtained from other molecular methods and reflects the absence of amplification at several loci, which is taken into account in this analysis. The only change in cluster 4 is the location of the E. coli O157:H7 representatives, which are far from other members of the cluster and weakly linked to cluster 2.
FIG. 3. Shigella population modeling. The number in each circle indicates the corresponding genotype identified in Fig. 2. The empty circle indicates a hypothetical genotype (not present in the population analyzed) created to minimize overall distances between neighboring genotypes. The distance between neighboring genotypes is expressed as the number of allelic changes and is indicated by different line styles: a hatched bold line indicates one change, a full gray line indicates two changes, a black dotted line indicates three changes, and a gray dotted line indicates more than four changes. Gray hatched lines are used to separate subgroups inside clusters. The tree was calculated without locus O157-11, which explains why different genotypes can be located in the same circle.
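The principle of an MST built on the number of allelic changes can be sketched in Python with Prim's algorithm. This is a simplified illustration with hypothetical function names and profiles; unlike the analysis shown in Fig. 3, it does not create hypothetical genotypes and applies no tie-breaking rules beyond input order.

```python
def allelic_changes(profile_a, profile_b):
    """Number of loci at which two profiles carry different alleles."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def minimum_spanning_tree(profiles):
    """Prim's algorithm on categorical distances.

    Returns a list of edges (genotype_in_tree, added_genotype, n_changes).
    """
    names = list(profiles)
    if not names:
        return []
    in_tree = [names[0]]
    remaining = names[1:]
    edges = []
    while remaining:
        # Pick the cheapest link between the tree and an unlinked genotype.
        u, v = min(
            ((u, v) for u in in_tree for v in remaining),
            key=lambda e: allelic_changes(profiles[e[0]], profiles[e[1]]),
        )
        edges.append((u, v, allelic_changes(profiles[u], profiles[v])))
        in_tree.append(v)
        remaining.remove(v)
    return edges


# Hypothetical genotypes typed at four loci (not data from this study).
genotypes = {"g1": [3, 5, 2, 7], "g2": [3, 5, 2, 8],
             "g3": [3, 4, 2, 8], "g4": [1, 4, 6, 8]}
for edge in minimum_spanning_tree(genotypes):
    print(edge)
```

Each edge weight corresponds to the number of allelic changes that the line styles of Fig. 3 encode.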
The purpose of the present study was to develop an MLVA assay for Shigella and evaluate whether the resulting data would be in agreement with the complex concept of the Shigella-E. coli genomospecies, as revealed by MLST data. We investigated TR loci predicted to be polymorphic by comparing five Shigella genome sequences. The selection of 15 VNTRs allowed us to discriminate 89 strains, covering the large majority of known diversity in Shigella species and several E. coli strains (77 Shigella spp., 11 E. coli strains, and 1 S. enterica strain), into 83 genotypes. Strains sharing the same serotype rarely share the same genotype, with differences ranging from one marker (Sd#1 and Sd#2, of S. dysenteriae serotype 5) to five markers (Sd#26 and Sd#27, of S. dysenteriae serotype 7). On the other hand, the two strains of S. dysenteriae serotype 1 isolated in Rwanda in 1994 during a dysentery outbreak in refugee camps are identical.
Through MLVA15 analysis, four main clusters were defined, together with some outliers. Among them, the strains of S. boydii serotype 13 were clearly distinguished from the other Shigella strains. The absence of amplification at some loci, as well as unique amplicon sizes at several others, illustrates the wide distance between S. boydii serotype 13 and other Shigella representatives. According to Hyma et al. (8), S. boydii serotype 13 is closely related to a different Escherichia species, E. albertii. This supports the hypothesis of an ancient separation of lineages, with S. boydii serotype 13, as well as S. enterica LT2, being a clear outgroup. This finding also corroborates the results of MLST investigations by Pupo et al. (35) and Lan et al. (15).
In order to be able to compare our MLVA15 clustering data with published MLST data, part of the thrC gene was sequenced. In this way, the strains used in this study could be assigned to the different clusters defined by Pupo et al. (35), and it very clearly appears that MLVA15 clustering provides the same view of the Shigella population structure as MLST. Recently, Yang et al. (52) conducted another MLST analysis, based on 23 housekeeping genes. The greater number of genes analyzed allowed them to define three subclusters inside cluster 1. MLVA15 clustering reveals the same fine substructure. Such striking similarity demonstrates the potential usefulness of MLVA for Shigella typing. While the general patterns are highly similar, there are some differences between MLST and MLVA approaches. In the present study, the nine S. sonnei strains are assigned to cluster 2, close to S. boydii serotype 7. Yang et al. (52) placed the single S. sonnei investigated close to cluster 1 by MLST. The S. sonnei strain investigated by Escobar-Paramo et al. (4) was found to be closer to cluster 2 and cluster 3 by analysis of four other housekeeping genes. Pupo et al. (35) placed the single S. sonnei strain they analyzed far from the three clusters. Thus, the definitive location of S. sonnei strains remains undefined. Further MLST analysis of S. sonnei might eventually permit us to revisit the relative evolutionary position of this species.
Considering that Shigella clusters are now believed to have arisen independently several times from E. coli while acquiring pathogenicity factors (16, 35, 52), the underrepresentation of E. coli in the collection investigated here gives the impression of small groups of E. coli strains arising from Shigella clusters, whereas the actual situation would be the opposite. Such an analysis should be done by including a Shigella collection in a larger E. coli collection to obtain a general overview of the E. coli/Shigella sp. genomospecies (such work is in progress).
Combining UPGMA cluster analysis and MST provides an overview of Shigella intraspecies relationships. In agreement with previous investigations, we can infer from this combination that Shigella taxonomy is of little phylogenetic value. This is illustrated for instance by S. flexneri serotype 6 being associated with cluster 1 by MLVA or MLST, whereas the other S. flexneri serotypes are assigned to cluster 3 (together with the two serologically untyped S. flexneri strains). Four strains clustered together by MLVA share the same rare IS insertion event at the same position, while they are assigned to different species, reinforcing the fact that current classification does not reflect the genetic relatedness within the Shigella genus. It is now believed that Shigella organisms arose several times within E. coli species (35, 52), leading to three clusters not related to the currently recognized Shigella species taxonomy based upon biochemical and serological characteristics. The three main clusters are also identified here with a fourth corresponding essentially to S. dysenteriae serotype 1.
This study extends the available results on the discriminatory power, specificity, and sensitivity of the MLVA approach for Shigella species. Moreover, the MLVA15 panel shows some potential for typing E. coli, with a wider range of use than considered before in several published studies, which were mainly focused on Shiga toxin-producing E. coli (12, 23, 29). Another MLVA assay was validated for O157:H7 outbreak detection (12). It showed high discrimination between the E. coli O157 strains and appeared to have sensitivity equal to that of PFGE and specificity superior to that of PFGE. Lindstedt et al. (22) described a seven-TR panel designed to type the ECOR collection and pathogenic E. coli species. Several Shigella strains were included in the study and typed. However, not all serotypes were available, and in one case, the TR was absent, and in another, the TR was monomorphic. Our panel was primarily intended to cover all Shigella species, and its satisfactory performance also indicates a good capability to discriminate between nonpathogenic E. coli, uropathogenic E. coli, and enterohemorrhagic E. coli strains.
As suggested previously (45), the proposed combination of well-selected independent polymorphic TR loci is highly discriminatory and provides a relevant clustering compared with the currently accepted classification. The panel of 15 markers has been divided into three panels according to the polymorphism of each locus. It is important to keep in mind that the very high discriminatory power of some markers usually results in a very high homoplasy level. For this reason, such markers must be given a lower weight when similarity matrices are calculated. Here, the different weights proposed for each panel were empirically determined, but it is very likely that in future MLVA developments, special attention will be given to the optimization of these coefficients. For this to be done, much larger collections of strains, with detailed epidemiological data, will need to be typed first. It can only be hoped that in the future, funding bodies will support international consortiums of laboratories aimed at producing high-quality data of this kind. VNTR typing could then be widely accessible for research and public health laboratories, particularly in developing countries, where the majority of cases of Shigella occur, since this method is highly suitable for sharing results and for the generation of databases as previously demonstrated (http://mlva.u-psud.fr).
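The idea of giving a lower weight to highly variable markers when distances are calculated can be sketched in Python as follows. The function name, the weights, and the repeat numbers are hypothetical; they are not the empirically determined coefficients used in this study.

```python
def weighted_categorical_distance(profile_a, profile_b, weights):
    """Weighted fraction of loci at which two profiles differ.

    profile_a and profile_b map locus -> repeat number; weights maps
    locus -> relative weight (assumed values, for illustration only).
    """
    total = sum(weights.values())
    differing = sum(
        weight for locus, weight in weights.items()
        if profile_a[locus] != profile_b[locus]
    )
    return differing / total


# Hypothetical weights: the highly variable locus counts for less.
weights = {"ms06": 1.0, "ms22": 1.0, "O157-11": 0.5}
strain_a = {"ms06": 3, "ms22": 5, "O157-11": 8}
strain_b = {"ms06": 3, "ms22": 5, "O157-11": 12}
strain_c = {"ms06": 2, "ms22": 5, "O157-11": 8}

print(weighted_categorical_distance(strain_a, strain_b, weights))  # 0.2
print(weighted_categorical_distance(strain_a, strain_c, weights))  # 0.4
```

With these assumed weights, a change at the homoplasy-prone locus separates two strains less than a change at a more stable locus.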
Work on the typing of collections of dangerous pathogens and the making of reference genotype databases was supported by Délégation Générale pour l'Armement (PEA02-36-01).
Published ahead of print on 23 January 2008.
Supplemental material for this article may be found at http://jcm.asm.org/.