Journal of Clinical Microbiology, August 2005, p. 3688-3698, Vol. 43, No. 8
0095-1137/05/$08.00+0 doi:10.1128/JCM.43.8.3688-3698.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Department of Food Science(1) and Department of Population Medicine and Diagnostic Sciences(2), Cornell University, Ithaca, New York 14853
Received 16 September 2004/ Accepted 17 April 2005
While serotyping has been widely used to differentiate Salmonella subtypes, this method has limited discriminatory power and does not reveal the genetic relationships of strains within the same or different serotypes (32). More-discriminatory methods for subtyping of Salmonella isolates include phage typing (8) as well as pulsed-field gel electrophoresis (PFGE) (1). Multilocus enzyme electrophoresis (MLEE) has also been used successfully to subtype Salmonella isolates and to study the evolution and population genetics of various Salmonella serotypes (2, 28, 31, 32). However, MLEE is technically difficult and hard to standardize between laboratories and thus does not represent a subtyping method suitable for routine surveillance (34). The advent of automated DNA sequencing technology has led to the development and implementation of DNA sequence-based subtyping techniques, such as multilocus sequence typing (MLST). MLST is based on the concepts of MLEE except that allelic types are determined from nucleotide sequences of housekeeping genes rather than by the electrophoretic mobilities of the enzymes they encode (19). One key advantage of MLST over MLEE and other banding-pattern-based subtyping techniques is that the sequence data generated are nonambiguous and can be readily compared between laboratories, thus facilitating global, large-scale surveillance (19, 36). MLST methods have been used to subtype and explore the evolutionary relationships of a variety of bacterial pathogens, including Campylobacter jejuni, Vibrio cholerae, Listeria monocytogenes, Streptococcus agalactiae, and Salmonella enterica (9, 15-17, 30).
The changing epidemiology of Salmonella infections (21) and the emergence of new Salmonella strains (e.g., multidrug-resistant Salmonella serotype Typhimurium DT 104 and multidrug-resistant Salmonella enterica serotype Newport) make it imperative to develop new Salmonella subtyping methods that not only allow for sensitive subtype discrimination but also provide data that can be used for evolutionary analyses of Salmonella. In addition, molecular subtyping methods for Salmonella should allow for serotype prediction, thus obviating the need for maintenance of specialized serotype reagents for Salmonella. Thus, our goal was to develop an MLST scheme for Salmonella enterica serotypes that (i) provides sensitive subtype discrimination, (ii) reliably predicts Salmonella serotypes, and (iii) provides data that can be used for evolutionary analyses.
An additional 41 Salmonella isolates were added to our initial collection of 25 isolates to yield a collection of 66 Salmonella isolates (supplemental Table S2, available at http://www.foodscience.cornell.edu/wiedmann/Sukhnanand%20Supplementary.txt) representing a greater serotype diversity. We specifically used 2001 Public Health Laboratory Information System (PHLIS) surveillance data (6) to identify the five Salmonella serotypes most commonly isolated from human clinical cases, animal clinical cases, and environmental sources. The serotypes represented (in addition to the serotypes represented among the initial 25 isolates) included serotypes Montevideo, Newport, Kentucky, Enteritidis, Dublin, Senftenberg, and Javiana. Bovine and avian isolates representing most of these serotypes were obtained from the Cornell University Animal Health Diagnostic Laboratory Salmonella strain collection. One human Salmonella serotype Senftenberg isolate and six human Salmonella serotype Javiana isolates were obtained from the New York State Department of Health. A three-gene sequence-typing scheme was used to characterize these additional 41 isolates.
Salmonella serotyping of animal isolates was performed at the National Veterinary Services Laboratories (USDA-APHIS-VS, Ames, IA); serotyping of human isolates was performed at the New York State Department of Health.
Gene selection and primer design. A total of seven genes located around the Salmonella chromosome were chosen as targets for the initial development of a seven-gene MLST scheme (Table 1). Primers were obtained from the published literature (3, 17) or designed based on published sequences available in GenBank (Table 2). Primer design was performed using the PrimerSelect software program (DNAStar, Madison, WI). Primers designed for panB, fimA, icdA, spaN, and aceK amplified the complete coding domain sequence, while the published primers for manB and mdh amplified only parts of the respective open reading frames (65.1 and 90.5%, respectively).
TABLE 1. MLST target genes and their characteristics
TABLE 2. Primer sequences
DNA sequencing. Salmonella lysates for PCR amplification were prepared as described by Furrer et al. (10). All PCR amplifications were performed with either Thermus aquaticus (Taq) DNA polymerase (Promega, Madison, WI) or AmpliTaq Gold (Applied Biosystems, Foster City, CA). PCR conditions used for the initial set of 25 isolates are listed in Table 3. PCR conditions for the additional 41 isolates were optimized based on the conditions listed in Table 3; for example, annealing temperatures had to be reduced for some serotypes in order to achieve PCR amplification.
TABLE 3. PCR conditions
If the PCR product yield was insufficient for sequencing, PCR products were cloned into pCR 2.1-TOPO using the TOPO TA cloning kit (Invitrogen, Carlsbad, CA). Plasmids were purified using the QIAquick Plasmid Prep kit (QIAGEN, Inc., Chatsworth, CA), and the presence of inserts was confirmed by EcoRI restriction digestion. Plasmid DNA was quantified with the NanoDrop (Rockland, DE) ND-1000 spectrophotometer, and plasmids were sequenced using M13 forward and reverse primers. Three plasmid inserts were sequenced for each isolate and gene in order to allow for correction of sequence errors due to nucleotide misincorporation during PCR amplification. This cloning-and-sequencing approach had to be used to generate DNA sequences for manB for eight isolates (FSL S5-267, FSL S5-273, FSL S5-280, FSL S5-358, FSL S5-364, FSL S5-365, FSL S5-366, and FSL S5-367) and for mdh for two isolates (FSL S5-273 and FSL S5-291).
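The error-correction step for cloned inserts can be illustrated with a minimal sketch. It assumes the three insert sequences for a given isolate and gene are already aligned to equal length and that a simple per-position majority rule is applied; the text does not state the exact rule that was used, and the function name `majority_consensus` is hypothetical.

```python
from collections import Counter


def majority_consensus(reads):
    """Return the per-column majority base for equal-length aligned reads.

    Assumption: a strict majority rule; columns without a strict
    majority are reported as 'N' rather than guessed.
    """
    if len({len(r) for r in reads}) != 1:
        raise ValueError("reads must be non-empty and aligned to the same length")
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count > len(reads) / 2 else "N")
    return "".join(consensus)


# Toy data: the third clone carries a single misincorporated base.
clones = ["ATGGCA", "ATGGCA", "ATGACA"]
print(majority_consensus(clones))
```

With three clones, a misincorporation present in only one insert is outvoted by the other two, which is why sequencing an odd number of inserts is convenient for this kind of correction.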
MLST typing. While full MLST targeting seven different genes (19, 36) was performed on an initial set of 25 isolates, a more economical three-gene sequence typing scheme was developed and applied to our collection of 66 Salmonella isolates. Allele assignments for individual genes were determined using alignments of the complete coding sequence or available coding region for a given gene. Different alleles were assigned to sequences that differed by at least one nucleotide. Sequence types (STs) were determined from a concatenated code of allele assignments for individual genes.
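The allele and ST assignment rules described above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used to produce the published tables: allele and ST numbers are assigned here in order of first appearance, the function names are hypothetical, and the toy sequences are invented.

```python
def assign_alleles(sequences):
    """Number distinct sequences 1, 2, ... in order of first appearance.

    Two sequences receive different allele numbers if they differ by
    at least one nucleotide (i.e., if the strings are not identical).
    """
    allele_of = {}
    numbers = []
    for seq in sequences:
        if seq not in allele_of:
            allele_of[seq] = len(allele_of) + 1
        numbers.append(allele_of[seq])
    return numbers


def assign_sequence_types(isolates, genes):
    """Map each isolate to (allelic profile, ST).

    isolates: {isolate_id: {gene: sequence}}
    The ST is determined by the ordered combination of allele numbers.
    """
    ids = list(isolates)
    per_gene = {g: assign_alleles([isolates[i][g] for i in ids]) for g in genes}
    st_of = {}
    result = {}
    for idx, isolate_id in enumerate(ids):
        profile = tuple(per_gene[g][idx] for g in genes)
        st = st_of.setdefault(profile, len(st_of) + 1)
        result[isolate_id] = (profile, st)
    return result


# Toy data (invented sequences, hypothetical isolate names).
toy = {
    "iso1": {"fimA": "ATGC", "manB": "GGCA", "mdh": "TTAA"},
    "iso2": {"fimA": "ATGC", "manB": "GGCA", "mdh": "TTAA"},
    "iso3": {"fimA": "ATGT", "manB": "GGCA", "mdh": "TTAA"},
}
for isolate_id, (profile, st) in assign_sequence_types(toy, ["fimA", "manB", "mdh"]).items():
    print(isolate_id, profile, st)
```

Because the numbering depends on input order, numbers produced this way are only meaningful within one data set; shared databases avoid this by fixing allele numbers centrally.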
Evolutionary analyses. Descriptive evolutionary statistics, such as G+C content and percentage of polymorphism, were calculated using the DNASTAR software package or DnaSP (version 3.99) (29). The dN/dS (number of nonsynonymous substitutions per nonsynonymous site/number of synonymous substitutions per synonymous site) ratios for each gene were calculated using the Molecular Evolutionary Genetics Analysis software program (MEGA, version 2.1) (18). Genes with dN/dS ratios greater than 0.1 were tested for positive selection using the Phylogenetic Analysis by Maximum Likelihood software program (PAML, version 3.13) (37). The RETICULATE software program (13) was used to test for evidence of reticulate evolution by using separate alignments of the concatenated sequences for both the seven-gene MLST (25 isolates) and the three-gene sequence-typing scheme (66 isolates).
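The two descriptive statistics named above are simple to compute from an alignment. The following sketch shows one plausible definition of each; it is not the DNASTAR or DnaSP implementation, which may treat gaps and ambiguity codes differently, and the function names are hypothetical. Sequences are assumed to be uppercase and aligned to equal length.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    called = [b for b in seq.upper() if b in "ACGT"]
    return 100.0 * sum(b in "GC" for b in called) / len(called)


def percent_polymorphic(alignment):
    """Percentage of alignment columns showing more than one nucleotide.

    Assumption: gaps and ambiguity codes are ignored when counting
    states, but every column counts toward the denominator.
    """
    n_cols = len(alignment[0])
    polymorphic = 0
    for column in zip(*alignment):
        states = {b for b in column if b in "ACGT"}
        if len(states) > 1:
            polymorphic += 1
    return 100.0 * polymorphic / n_cols


# Toy alignment (invented): columns 2 and 4 are polymorphic.
toy_alignment = ["ATGC", "ATGT", "ATGC", "AAGC"]
print(gc_content("ATGC"), percent_polymorphic(toy_alignment))
```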
To determine the appropriate model of evolution for each gene, we first generated likelihood scores for each gene in PAUP*, version 4.0b10 (Sinauer Associates, Sunderland, MA) using modelblock3 (obtained from http://workshop.molecularevolution.org/software/modeltest/modelblock3.php). Likelihood scores were then exported into the MODELTEST software program (26) to determine the best-fitting model of evolution using a hierarchical likelihood ratio test. Once the model of evolution was determined, maximum-likelihood phylogenetic trees were built with and without a molecular clock imposed in PAUP*. Likelihood scores generated from these trees were used to conduct a likelihood ratio test to determine whether the nucleotide substitutions observed for a given gene followed a molecular clock. Phylogenetic trees were built using both the maximum-likelihood and Bayesian methods. Maximum-likelihood trees were constructed in PAUP* using 100 bootstrap replicates. Bayesian phylogenetic trees were built using MRBAYES (version 3.0) (12). For each tree, Markov chains were run at least three independent times to determine the proper "burn-in" time. Posterior probabilities, which represent the probability that a specific node is observed, were recorded. All phylogenetic trees were rooted using sequences from an Escherichia coli O157:H7 strain (GenBank accession no. NC002695) (11), which served as an outgroup. Phylogenetic trees generated from either MRBAYES or PAUP* were viewed using TreeView, version 32 (22). In all trees, the branch length of the outgroup was reduced by a factor of 10 or 100 so as to best view the topology of the tree.
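For reference, the molecular clock test described above is conventionally written as follows. The text does not give the formula, so this is the standard textbook form and not a transcription of the authors' calculation; the symbols are introduced here for illustration.

```latex
% Likelihood ratio test of the molecular clock (standard form).
% \ln L_{1}: log-likelihood of the tree without a clock (unconstrained)
% \ln L_{0}: log-likelihood of the tree with a clock enforced
% n:         number of sequences (taxa) in the tree
\[
  \delta = 2\left(\ln L_{1} - \ln L_{0}\right),
  \qquad
  \delta \sim \chi^{2}_{\,n-2} \quad \text{under the clock hypothesis.}
\]
```

The clock is rejected for a gene when the statistic exceeds the chi-square critical value for n - 2 degrees of freedom at the chosen significance level.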
World Wide Web-based data access. Detailed isolate source information, as well as all sequence data and allele assignments from this study, are accessible through the PathogenTracker, version 2.0, website (http://www.pathogentracker.net).
TABLE 4. Allelic profiles and MLST types for the initial set of 25 Salmonella isolates
TABLE 5. Genetic characteristics of MLST genesa
TABLE 6. Allelic profiles and STs for the set of 66 Salmonella isolates
FIG. 1. Phylogenetic analysis of manB alleles. The phylogenetic tree was built on the maximum-likelihood framework using an alignment of manB sequences representing each manB allelic type found among the 66 Salmonella isolates sequenced, with the exception of the manB1 and manB2 sequences, which include those for all three Salmonella serotype Montevideo isolates that showed an manB gene duplication. Bootstrap values of >50.0 are given at the nodes above the branches. The scale bar indicates relative sequence distance. Allele assignments, followed by the number of isolates within each allelic type (in parentheses), and the serotype associated with each allelic type are also shown.
Three-gene sequence-typing data for our set of 66 isolates further confirmed that STs were unique for all serotypes other than the one serotype Typhimurium var. Copenhagen isolate discussed above. Within-serotype differentiation was observed for 9 out of 12 serotypes analyzed (Table 6); serotypes Typhimurium, Dublin, Javiana, and Newport each contained three STs.
Evolutionary characteristics of MLST genes. Models of evolution determined for each gene are listed in Table 7 along with results from molecular clock tests. More than half of our genes follow the Hasegawa-Kishino-Yano (HKY) model. This model allows the frequencies of each nucleotide to differ and allows transitions and transversions to have different substitution rates (23). fimA, aceK, and icdA follow variants of the Kimura models (K80, K81), which are constrained versions of the HKY model, i.e., nucleotide frequencies are assumed to be equal (23). Most genes also follow a molecular clock, which implies that the underlying mutation rates for these genes are constant (23).
TABLE 7. Evolutionary characteristics of MLST genes based on the initial set of 25 Salmonella isolates
Evolutionary relationships among Salmonella serotypes. A concatenated alignment of all seven MLST genes sequenced for 25 isolates was used to test for evidence of reticulate evolution (i.e., recombination and/or repeated mutation) using RETICULATE (Fig. 2). The overall compatibility was 0.99, with a neighborhood similarity score of 0.98; the neighborhood similarity score was significantly higher than that for a randomized matrix, indicating that the overall pattern of compatibility and incompatibility between sites is not random and that the order of sites along the nucleotide alignment has increased clustering of compatible and incompatible sites. Reticulate analysis was also performed for a concatenated alignment of the fimA, mdh, and manB sequences obtained for all 66 isolates (Fig. 3). The overall compatibility was 0.77, with a neighborhood similarity score of 0.72; this neighborhood similarity score was also significantly higher than that for the corresponding randomized matrix. Based on these data, we concluded that only a limited number of incompatible sites was present between and within genes, thus allowing for construction of meaningful phylogenetic trees based on concatenated sequence data.
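The notion of pairwise site compatibility behind the overall compatibility score can be illustrated with a simplified sketch. It is not the RETICULATE implementation: it assumes an ungapped, uppercase alignment, considers only informative sites with exactly two states, uses the four-gamete criterion for compatibility, and does not compute the neighborhood similarity score. All function names are hypothetical.

```python
from itertools import combinations


def informative_biallelic_sites(alignment):
    """Return columns with exactly two states, each seen in >= 2 sequences."""
    sites = []
    for column in zip(*alignment):
        counts = {}
        for base in column:
            counts[base] = counts.get(base, 0) + 1
        if len(counts) == 2 and min(counts.values()) >= 2:
            sites.append(column)
    return sites


def compatible(site_a, site_b):
    """Four-gamete test: two biallelic sites fit on one tree without
    repeated mutation or recombination unless all four state
    combinations occur among the sequences."""
    return len(set(zip(site_a, site_b))) < 4


def overall_compatibility(alignment):
    """Fraction of informative site pairs that are compatible."""
    sites = informative_biallelic_sites(alignment)
    pairs = list(combinations(sites, 2))
    if not pairs:
        raise ValueError("need at least two informative sites")
    return sum(compatible(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (invented): sites 1 and 2 agree; site 3 conflicts with both.
toy_alignment = ["ACA", "ACG", "GTA", "GTG"]
print(overall_compatibility(toy_alignment))
```

In the toy alignment, only one of the three site pairs is compatible, so the score is one third; a score near 1, such as the 0.99 reported for the seven-gene alignment, means that almost all informative site pairs are consistent with a single tree.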
FIG. 2. Compatibility matrix for all seven MLST genes. The matrix was constructed using RETICULATE and a concatenated alignment of the seven genes (shown on the plot) for 25 Salmonella isolates. The figure is a plot of all pairwise comparisons of 152 informative sites that are phylogenetically compatible (white squares) or incompatible (black squares). Intragenic comparisons are marked within the triangles, and the number of informative sites for each gene is given in parentheses after the gene name.
FIG. 3. Compatibility matrix for fimA, manB, and mdh. The matrix was constructed using RETICULATE and a concatenated alignment of fimA, manB, and mdh obtained for 66 Salmonella isolates. The figure is a plot of all pairwise comparisons of 69 informative sites that are phylogenetically compatible (white squares) or incompatible (black squares). Intragenic comparisons are marked within triangles, while intergenic comparisons are marked in rectangles. The number of informative sites for each gene is given in parentheses after the gene name.
FIG. 4. Phylogenetic tree of 25 Salmonella isolates based on a concatenated alignment of the seven MLST genes (fimA, mdh, manB, spaN, icdA, panB, aceK) sequenced. The phylogenetic tree was built using the maximum-likelihood method. The branch length of the outgroup was collapsed so as to best view the topology of the tree. Bootstrap values of >50.0 are given at the nodes above the branches. Posterior probabilities of >0.50 are given at the nodes below the branches. The scale bar indicates relative sequence distance. Sequence type assignments, followed by the serotypes for the isolates within each sequence type, are given for all isolates. No sequence data for icdA and panB were available for the five Salmonella serotype Schwarzengrund isolates (since PCR primers did not yield amplification products), and appropriate gaps were introduced into the alignment for the Salmonella Schwarzengrund isolates.
FIG. 5. Phylogenetic tree for all sequence types found among 66 Salmonella isolates based on sequencing of three genes. The phylogenetic tree was built on the Bayesian framework using a concatenated alignment of fimA, manB, and mdh sequences representing all 25 sequence types found among the 66 isolates. The branch length of the outgroup was collapsed so as to best view the topology of the tree. Posterior probabilities of >0.50 are given at the nodes above the branches. Bootstrap values of >50.0 are given in parentheses below the branches. The scale bar indicates relative sequence distance. Sequence type assignments, followed by the number of isolates within each sequence type and their serotypes, are given for all isolates. Serogroups are given in parentheses after the serotype designation.
A three-gene sequence-typing scheme allows for serotype prediction and for limited subtype discrimination within a serotype. While MLST sequence typing schemes have been published for a variety of different pathogens (5, 9, 15, 16), only limited information is available on the use of MLST methods for subtype discrimination of salmonellae (17). Overall, our data indicate that both a seven-gene MLST and a three-gene sequence-typing scheme allow limited within-serotype discrimination for salmonellae; both schemes allowed for discrimination of only eight STs among our initial set of 25 Salmonella enterica isolates representing five serotypes. Sensitive subtyping of salmonellae with high within-serotype discriminatory ability has been documented for a variety of other subtyping techniques, such as MLEE, phage typing, and PFGE (1, 8, 33), indicating that these methods may provide for more-sensitive subtype discrimination for Salmonella enterica. In addition to a low level of discriminatory power, Salmonella MLST schemes also face the challenge of designing appropriate primers that allow for PCR amplification and sequencing of isolates representing at least the majority of common serotypes. For example, both the panB and icdA primers described here did not allow for amplification of the respective genes in serotype Schwarzengrund isolates. Similarly, Kotetishvili et al. (17) reported that the four primer sets used in their Salmonella sequence-typing study allowed for successful amplification of the respective genes in only 75 to 94% of the Salmonella isolates tested. The lack of MLST Salmonella papers in the primary literature, despite the recent boom in MLST subtyping, might be related to these types of technical issues. However, the availability of genome sequences for various Salmonella serotypes (24) may aid in the design of better universal primers for sequence-based Salmonella subtyping.
Even though sequence typing allowed for only limited within-serotype subtype discrimination, analyses of the seven-gene MLST and three-gene sequence-typing data both provided reliable prediction of serotypes. Only 1 out of 24 three-gene STs contained isolates from more than one serotype; isolates within this ST represented serotypes Typhimurium and Typhimurium var. Copenhagen, two very similar serotypes (27). Even sequencing of seven genes did not allow for differentiation of these serotypes. In other studies, differentiation between serotype Typhimurium and serotype Typhimurium variants has been possible with high-resolution subtyping techniques, such as phage typing (27). Interestingly, as discussed in more detail below, analyses of data from the three-gene sequence typing also allowed the definition of two distinct monophyletic lineages within serotypes Kentucky and Newport, indicating that sequence typing provides improved subtype discrimination as well as relevant evolutionary and biological information beyond that associated with serotyping.
Based on our data reported here, we propose that a three-gene sequence-typing scheme targeting fimA, manB, and mdh allows for prediction of the most common Salmonella enterica serotypes as well as for limited within-serotype discrimination (9 of the 12 serotypes in our study included multiple three-gene STs). The genes targeted in this scheme were shown to offer subtype discrimination equal to that of an initial seven-gene MLST; they include a virulence gene (fimA) as well as housekeeping genes that have previously been shown to be useful for determining phylogenetic relationships between various Salmonella subspecies (mdh [3]) and that have allowed for sensitive subtype discrimination among clinical and environmental isolates (manB [17]). Interestingly, our data also showed that gene duplication of manB has occurred in at least some serotype Montevideo isolates, complicating analyses of manB sequence data for these isolates.
Salmonella serotypes represent both monophyletic and polyphyletic lineages. Unlike banding-pattern-based subtyping methods (e.g., PFGE, ribotyping, randomly amplified polymorphic DNA), DNA-sequencing-based subtyping data can also be used to probe the evolutionary history of the isolates sequenced. Initial analyses for reticulate evolution revealed only limited evidence of reticulate evolution (recombination or repeated mutation) in concatenated alignments of the seven genes sequenced for 25 isolates or the three genes sequenced for 66 isolates. This is consistent with previous studies, which indicated that S. enterica basically shows a clonal population structure (2, 3). Since deviations from neutral selection can also hinder an accurate phylogenetic signal, we also tested for evidence of positive selection among the genes sequenced. dN/dS ratios revealed no evidence of positive selection within four of the housekeeping genes, which were included in our gene selection according to standard MLST practice (19). The virulence genes spaN and fimA as well as manB showed dN/dS ratios of >0.1; however, hypothesis testing of positive selection by PAML found no evidence for significant positive selection within these genes. Since the genes used in both our seven-gene MLST and our three-gene sequence-typing scheme did not show statistically significant evidence for either reticulate evolution or positive selection, we concluded that a concatenated sequence could be used to infer the phylogeny of our Salmonella isolates (14).
Phylogenetic analyses demonstrated that the majority of serotypes included in our study represented monophyletic lineages. Only serotypes Newport and Kentucky represented polyphyletic lineages. This is consistent with a previous report (2), which also showed, based on MLEE data, that some Salmonella serotypes, including serotype Newport, represent polyphyletic lineages. This study (2) also reported that the two major lineages of serotype Newport differ in their frequency of association with disease in humans versus animals. Interestingly, in our study, isolates in one lineage were associated exclusively with isolation from avian sources (STs 12 and 13), while the other lineage (ST 11) represented bovine isolates. The clustering of serogroups in our phylogenetic tree based on the three-gene sequence-typing data also correlated well with a previous phylogenetic analysis of Salmonella enterica isolates based on gene content microarray analyses (25); for example, like our data, the phylogenetic analyses reported by this group (25) also grouped the serogroup D1 serotype Javiana into a distinct lineage separated from the other D1 serotypes Dublin and Enteritidis. This further supports the argument that the evolutionary relationships revealed by analysis of sequence data from our three-gene sequence typing correctly represent Salmonella phylogenetic relationships.
Conclusions. Our data show that a seven-gene MLST as well as a more economical three-gene sequence-typing scheme allow for reliable prediction of the most common Salmonella serotypes and for some subtype classification within Salmonella serotypes. While further verification of our data on isolates representing a larger serotype diversity will be necessary, our data indicate that sequence-based subtyping may have the potential to replace classical serotyping. Our data also appear to indicate that MLST schemes have only a limited ability to provide the sensitive within-serotype subtype discrimination achieved with other subtyping methods, such as PFGE and phage typing. In contrast to our findings, Kotetishvili et al. (17) reported that a four-gene sequence-typing scheme allowed for more-sensitive subtype discrimination than PFGE, including sensitive within-serotype discrimination. The data of Kotetishvili et al. (17) indicated that even sequencing of a single gene (manB), which was also used in our study, allowed for sensitive subtype discrimination within many serotypes. These discrepancies between the two studies will need to be resolved to allow for a conclusive decision on the value of MLST for Salmonella subtyping. Our observation of manB gene duplication, however, emphasizes the need for careful evaluation of sequencing electropherograms when interpreting DNA-sequencing data for this gene.
We thank Kendra Nightingale and Ruth Zadoks for helpful discussions and Nellie Dumas and the New York State Department of Health for human Salmonella isolates.