School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia
Received 3 April 2007/ Returned for modification 6 August 2007/ Accepted 17 August 2007
| TEXT |
|---|
Two population genetics studies showed that serovar Typhi is a highly homogeneous clone. Multilocus enzyme electrophoresis of 24 metabolic enzymes revealed only two major electrophoretic types (27), and multilocus sequence typing (MLST) of seven housekeeping genes found only three base substitutions in a total of 3,336 bp analyzed and divided 26 serovar Typhi isolates into four sequence types (13). Therefore, there is insufficient variation for either multilocus enzyme electrophoresis or MLST to be useful for determining relationships among isolates or for global epidemiological studies.
To facilitate global epidemiological studies and to establish the evolutionary relationships within the serovar Typhi clone, a cheap, discriminatory, simple, and reproducible molecular method is needed for the large-scale typing of isolates. Single-nucleotide polymorphisms (SNPs) are potential markers and have been used to type several pathogens, including Escherichia coli O157:H7 (42), Bacillus anthracis (25), Mycobacterium tuberculosis (8, 11), and Yersinia pestis (1). The discovery of SNPs is facilitated by sequencing more than one genome from the same clone. The completed genome sequences of serovar Typhi strains CT18 and Ty2 (4, 24) allowed us to explore the differences between them and to identify SNPs suitable for typing. We selected 37 SNPs that could be differentiated by the presence or absence of a restriction enzyme site, used them to analyze a worldwide collection of serovar Typhi isolates, and showed that SNP typing is a good tool for genotyping global serovar Typhi isolates and for determining their evolutionary relationships.
Strains. Seventy-three worldwide serovar Typhi isolates, differing in localities and years of isolation, were obtained from the Salmonella Genetic Stock Centre, University of Calgary, Calgary, Canada, and one isolate was obtained from Imperial College London, London, United Kingdom (Table 1).
20 ng), 0.2 µl of each forward and reverse primer (concentration, 30 pmol/µl; Sigma-Aldrich), 0.2 µl of 10 mM (each) deoxynucleoside triphosphates, 2 µl of 10× Taq polymerase PCR buffer (New England Biolabs), 0.125 µl (1.25 U) of Taq polymerase (New England Biolabs), and Milli-Q water to a final volume of 20 µl. The PCR product (15 µl) was digested with 1 U of restriction enzyme at 37°C for 2 h and subsequently run on a 2% agarose gel in Tris-borate-EDTA buffer (31). The 20-µl sequencing reaction mixture contained 1 µl of BigDye (version 3.1; Applied Biosystems), 20 ng of the purified PCR product, 3.5 µl of 5× sequencing buffer (Applied Biosystems), 1 µl of forward primer (concentration, 3.2 pmol/µl; Sigma-Aldrich), and Milli-Q water. Unincorporated dye was removed by ethanol precipitation. The sequencing reaction mixtures were resolved on an ABI 3730 automated DNA sequence analyzer (Applied Biosystems) at the sequencing facility of the School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia.
Selection of SNPs for typing. By comparing the full-genome sequences of serovar Typhi CT18 (accession no. AL513382) and Ty2 (accession no. AE014613) (4, 24) with BLAST tools available from the Australian National Genetic Information Service, we found 253 single-copy genes carrying base substitutions (insertions and deletions were excluded). The number of polymorphisms within these genes ranged from 1 to 8, giving a total of 285 base substitutions, and most of the genes (239) carried a single substitution. Of the 285 polymorphisms, 111 were synonymous SNPs (sSNPs) and 174 were nonsynonymous SNPs (nsSNPs). It is interesting that nsSNPs outnumbered sSNPs (61% of all SNPs were nonsynonymous), since deleterious nsSNPs are expected to be eliminated from populations rather quickly. This seems to be a general phenomenon, as similar observations have been made previously in other bacterial clones (1, 8, 11, 25): 65% of the SNPs are nsSNPs in M. tuberculosis (8), and 58% are nsSNPs in B. anthracis (25). The likely explanation is that the time frame is too short for many of the nsSNPs, particularly the mildly deleterious ones, to have been removed by purifying selection (28).
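Classifying a substitution as synonymous or nonsynonymous amounts to translating the reference and variant codons and comparing the amino acids. The minimal sketch below assumes the standard codon assignments (NCBI translation tables 1 and 11 assign amino acids identically), an in-frame coding sequence that starts at the start codon, and a 0-based SNP position; the function name `classify_snp` and the example sequence are hypothetical and are not taken from the CT18 or Ty2 annotations.

```python
from itertools import product

# Standard codon assignments, with codons enumerated in TCAG order
# (first base varies slowest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snp(ref_cds: str, position: int, alt_base: str) -> str:
    """Return 'synonymous' or 'nonsynonymous' for one base substitution.

    ref_cds is an in-frame coding sequence; position is 0-based within it.
    A change to or from a stop codon counts as nonsynonymous.
    """
    ref_cds = ref_cds.upper()
    codon_start = position - position % 3
    ref_codon = ref_cds[codon_start:codon_start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls in an incomplete codon")
    offset = position - codon_start
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "nonsynonymous"


# Hypothetical coding sequence ATG GCT AAA (Met-Ala-Lys).
print(classify_snp("ATGGCTAAA", 5, "C"))  # GCT -> GCC, Ala -> Ala: synonymous
print(classify_snp("ATGGCTAAA", 3, "A"))  # GCT -> ACT, Ala -> Thr: nonsynonymous
```

Applied to every substitution found in the genome comparison, a function of this kind yields the sSNP and nsSNP tallies reported above.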
Thirty-six genes were selected for SNP typing, using the seven most economical 4-base restriction enzymes to discriminate 37 SNPs, of which 17 were sSNPs and 20 were nsSNPs. AluI was used to differentiate eight SNPs, HhaI and HaeIII seven each, NlaIII and RsaI five each, HpaII four, and TacI one. Apart from SNP 18 and SNP 19, each of the 37 SNPs was the only polymorphism in its gene. SNP 18 and SNP 19, an sSNP and an nsSNP, respectively, were located in the same gene, yehU, encoding a putative two-component-system sensor kinase. There were six polymorphic sites scattered along the 1,686 bp of yehU, four of which resulted in an amino acid change.
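The principle of PCR-restriction enzyme typing, in which one allele creates or destroys a recognition site, can be illustrated with an in silico digestion that predicts the fragment sizes expected on the gel. This is a minimal sketch, not the authors' workflow: the amplicon sequences are hypothetical, fragment sizes are approximated from top-strand cut positions, and TacI is omitted because its recognition sequence is not given in the text.

```python
import re

# Recognition sequence and top-strand cut offset (^) for six of the enzymes
# named above. All six sites are palindromic, so searching one strand suffices.
ENZYMES = {
    "AluI": ("AGCT", 2),    # AG^CT
    "HhaI": ("GCGC", 3),    # GCG^C
    "HaeIII": ("GGCC", 2),  # GG^CC
    "HpaII": ("CCGG", 1),   # C^CGG
    "NlaIII": ("CATG", 4),  # CATG^
    "RsaI": ("GTAC", 2),    # GT^AC
}


def digest(amplicon: str, enzyme: str) -> list[int]:
    """Approximate fragment lengths (bp) after complete digestion."""
    site, cut = ENZYMES[enzyme]
    seq = amplicon.upper()
    # A zero-width lookahead finds overlapping sites as well.
    cuts = [m.start() + cut for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    # Drop empty fragments (e.g. a cut exactly at the end of the amplicon).
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical amplicons differing by one base at an AluI site.
print(digest("TTTTAGCTTTTT", "AluI"))  # site present: [6, 6]
print(digest("TTTTAGTTTTTT", "AluI"))  # site absent: [12]
```

An allele is then called from whether the observed band pattern matches the cut or the uncut prediction.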
SNP typing. The 73 serovar Typhi isolates were typed for the 37 SNPs, of which 17 were shared by two or more isolates, while 16 and 4 were unique to CT18 and Ty2, respectively. It is unclear why CT18 carried four times as many unique SNPs as Ty2. An additional SNP was discovered upon analyzing SNP 18: sequencing of the PCR product revealed that two isolates (CC6 and CC7) had the same base as Ty2 at the site of SNP 18 but that a single base change 360 bases downstream created a new restriction enzyme site. This SNP was designated SNP 38, making a total of 38 SNPs. For SNP 37 (an A-for-G substitution), CT18 carries an A according to the genome data (24), but our CT18 isolate gave the same digestion pattern as Ty2, indicating that it carries the same base (G) as Ty2; this was confirmed by sequencing. This discrepancy may be due to an error in the CT18 genome sequence (24) or to a mutation in our CT18 isolate. Nevertheless, the A allele was present in eight isolates.
The 73 serovar Typhi isolates were grouped into 23 SNP profiles (Table 2). Twelve profiles, including the profiles of CT18 and Ty2, were represented by only one isolate. The other 11 profiles were shared by two or more isolates, with SNP profile 10 being the most common, shared by 23 isolates. It is interesting that isolate 422Mar92, belonging to a unique MLST sequence type (sequence type 8) previously thought to be restricted to African isolates (13), also fell into this largest SNP profile.
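Grouping isolates into SNP profiles means treating each isolate's alleles, in a fixed SNP order, as a string and numbering the distinct strings. The sketch below is illustrative only: the isolate names and four-SNP allele strings are hypothetical, and its numbering follows order of first appearance rather than the numbering in Table 2.

```python
from collections import Counter


def assign_profiles(isolates: dict[str, str]) -> tuple[dict[str, int], Counter]:
    """Give each distinct allele string a profile number (1, 2, ...).

    isolates maps an isolate name to its alleles, one character per SNP
    in a fixed order. Returns the isolate-to-profile assignment and the
    number of isolates in each profile.
    """
    profile_of_alleles: dict[str, int] = {}
    assignment: dict[str, int] = {}
    for name, alleles in isolates.items():
        if alleles not in profile_of_alleles:
            profile_of_alleles[alleles] = len(profile_of_alleles) + 1
        assignment[name] = profile_of_alleles[alleles]
    return assignment, Counter(assignment.values())


isolates = {"iso1": "AGTC", "iso2": "AGTC", "iso3": "GGTC", "iso4": "AGTA"}
assignment, sizes = assign_profiles(isolates)
singletons = sorted(p for p, n in sizes.items() if n == 1)
print(assignment)  # {'iso1': 1, 'iso2': 1, 'iso3': 2, 'iso4': 3}
print(singletons)  # [2, 3]
```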
A minimum spanning tree (MST) was constructed using Arlequin version 3.1 (5) to visualize the overall relationships among the profiles (Fig. 1). The MST groupings were consistent with the four clusters observed in the maximum-parsimony consensus tree. The MST showed that SNP profile 10 was the ancestral profile, connected to the out-group by two changes, indicating that cluster III arose first and that the other three clusters emerged from it. Most SNP profiles were linked to only one other SNP profile. However, SNP profiles 4, 16, and 17 were equidistant from two or more SNP profiles, and these alternative connections were represented on the tree as networks. The four CCs identified by eBURST were consistent with the phylogenetic clustering, except for SNP profile 21, which eBURST assigned to CC1 but which belongs to cluster IV. This is not a real conflict, however, as SNP profile 21 was the founding member of cluster IV. Cluster I was more homogeneous than the other clusters, as it was represented by CC1 only, whereas the other clusters contained more divergent members in addition to CCs.
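The MST idea can be sketched with Prim's algorithm on pairwise Hamming distances, that is, the number of SNPs at which two profiles differ. This is a generic sketch, not Arlequin's implementation, and the five four-SNP profiles are hypothetical. Because a single MST hides ties, a helper (the hypothetical `nearest_profiles`) lists every profile at the minimum distance; more than one such neighbor corresponds to the equidistant connections that the text describes as networks.

```python
def hamming(a: str, b: str) -> int:
    """Number of SNP positions at which two allele strings differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same SNPs")
    return sum(x != y for x, y in zip(a, b))


def prim_mst(profiles: dict[str, str]) -> list[tuple[str, str, int]]:
    """Prim's algorithm: grow the tree by the shortest edge to a new profile.

    Iterating over names in insertion order keeps tie-breaking deterministic.
    """
    names = list(profiles)
    in_tree = {names[0]}
    edges: list[tuple[str, str, int]] = []
    while len(in_tree) < len(names):
        best = None
        for u in names:
            if u not in in_tree:
                continue
            for v in names:
                if v in in_tree:
                    continue
                d = hamming(profiles[u], profiles[v])
                if best is None or d < best[2]:
                    best = (u, v, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges


def nearest_profiles(profiles: dict[str, str], name: str) -> tuple[list[str], int]:
    """All profiles at the minimum distance from `name`, plus that distance."""
    dists = {o: hamming(profiles[name], s) for o, s in profiles.items() if o != name}
    d_min = min(dists.values())
    return sorted(o for o, d in dists.items() if d == d_min), d_min


profiles = {"P1": "AAAA", "P2": "AAAG", "P3": "AAGG", "P4": "GAAG", "P5": "GAGG"}
print(prim_mst(profiles))
# [('P1', 'P2', 1), ('P2', 'P3', 1), ('P2', 'P4', 1), ('P3', 'P5', 1)]
print(nearest_profiles(profiles, "P5"))  # (['P3', 'P4'], 1): an equidistant link
```

Here P5 is one change from both P3 and P4, so the tree shows only one of two equally short connections; drawing both gives a small network.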
The phylogenetic tree allowed us to determine whether phylogenetic clustering was associated with genome type, defined by the arrangement of I-CeuI fragments (22), and/or phage type, determined by sensitivity or resistance to Vi phages (3) (Fig. 1 and Table 1). Phage type is largely independent of genome type, as shown in other studies (15). Nevertheless, a particular combination of genome type and phage type predominated in two of the phylogenetic clusters. Most of the isolates in cluster I had genome type 3 and phage type I+IV. Although genome type 3 dominated, other genome types were also present in the cluster; for example, SNP profile 2 contained genome types 22 and 27. However, these two genome types were likely derived from the predominant genome type 3, as each required only a single genomic rearrangement (18). Cluster IV contained all of the isolates in this study with phage type E1 or its variant E2 but had no dominant genome type. Clusters II and III showed no apparent association with particular genome or phage types: cluster II contained three genome types and four phage types, while cluster III had the most variable characteristics, with 18 different genome types and 16 different phage types. No association of phylogenetic clusters with year or locality of isolation was found. The 19 Canadian isolates analyzed were scattered across three of the four clusters, and isolates from cluster III spanned all five regions represented in the study: Africa, America, Asia, Europe, and Oceania. This finding suggests that the major serovar Typhi clones have spread globally. Large-scale typing of isolates from different regions by a genotyping method such as the SNP typing technique developed in this study will help to elucidate further any spatial or temporal clustering of serovar Typhi clones.
Parallel changes. The sites of six SNPs (8, 11, 17, 35, 36, and 37) appear to have undergone parallel changes in two or more independent lineages. For example, the A allele of SNP 8, which supports cluster II, was also present in SNP profile 3 of cluster I, and both alleles of SNP 37 were present in all four clusters. Because alleles were initially deduced from restriction enzyme digestion, we confirmed the base changes concerned by sequencing representative isolates. Thus, the polymorphisms observed resulted from genuine parallel changes due to either mutation or recombination, although it is difficult to determine which. We have recently shown that recombination is frequent within S. enterica subspecies I (23). Recombination within a serovar may also be frequent, and the parallel changes observed are likely to be due to recombination. It is interesting that four of the six SNPs involved were nonsynonymous, suggesting that selection pressure may play a role in driving some of these parallel changes; however, none of the genes concerned is known to be related to virulence. Note that, although the T allele of SNP 25 was shared by all SNP profiles of cluster IV and by two profiles of cluster III, this allele is not considered to be the result of a parallel change, because the two SNP profiles of cluster III were in direct line with the emergence of cluster IV.
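One way to make the notion of a parallel change concrete is the Fitch small-parsimony algorithm, which counts the minimum number of state changes needed to explain the alleles observed at one site on a fixed, rooted binary tree; a biallelic SNP that needs more than one change is homoplasious. The sketch assumes such a tree is already available (the nested-tuple tree and allele assignments are hypothetical) and, as noted above, it cannot tell whether a parallel change arose by mutation or by recombination.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (rooted, binary).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, alleles: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch parsimony: candidate root states and minimum number of changes."""
    if isinstance(tree, str):
        return {alleles[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, alleles)
    right_set, right_changes = fitch(right, alleles)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    # Disjoint child sets: one change is needed on one of the two branches.
    return left_set | right_set, left_changes + right_changes + 1


tree = ((("a", "b"), "c"), (("d", "e"), "f"))
# Allele A in two unrelated lineages (b and e): two changes, a parallel change.
parallel = {"a": "G", "b": "A", "c": "G", "d": "G", "e": "A", "f": "G"}
# Allele A confined to one clade (a, b): a single change.
single = {"a": "A", "b": "A", "c": "G", "d": "G", "e": "G", "f": "G"}
print(fitch(tree, parallel)[1])  # 2
print(fitch(tree, single)[1])    # 1
```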
Origin of isolates expressing the z66 flagellar antigen. Although most serovar Typhi isolates are monophasic for the expression of the flagellar antigen encoded by the fliC gene at the H1 locus, some Indonesian isolates have an additional z66 flagellar antigen, which was first described in 1981 (10) and was thought to be encoded by fljB at the H2 locus (19). The z66 flagellar antigen is now known to be encoded by a gene in the fljBA-like operon not located in the H2 locus (12). All the serovar Typhi isolates used in this study were typed for the presence of the z66 flagellar antigen gene by PCR using the primers described by Huang et al. (12). Among the 73 serovar Typhi isolates, 15 were known from serotyping results to express z66, and all 15 were confirmed to be positive for z66 by PCR (Table 1; data not shown). An additional three isolates were found to carry the z66 flagellar antigen gene by PCR. The z66-positive isolates had SNP profile 1, 2, 4, or 5 and were all grouped into cluster I. It appears that isolates expressing flagellar antigen z66 had a single origin in cluster I. However, one isolate from SNP profile 1 and all SNP profile 3 isolates did not have the z66 flagellar antigen. Presumably, these isolates had lost the gene. The presence of the z66 flagellar antigen only in cluster I suggests that serovar Typhi was originally monophasic, having only an H1 antigen, and then gained a new phase 2 flagellin gene-like operon only recently during the divergence of cluster I. This new flagellin gene is more similar to H27 fliC of E. coli than to other H antigen genes of S. enterica (12) and is located on a linear plasmid, as shown recently (2), which further supports the hypothesis of the recent acquisition of the z66 antigen through lateral transfer. 
Our findings suggest that the earlier hypothesis (9), that serovar Typhi first adapted to humans in Indonesia and was initially diphasic, with an H2 locus encoding the z66 flagellar antigen, before spreading globally from Indonesia, is unlikely to be correct.
It is unclear whether isolates carrying the z66 antigen have a selective advantage, as the antigen has not been stably maintained within cluster I. The H antigen is exposed on the cell surface, is one of the targets of the host immune system, and is thus under intense selection pressure to change (40). The z66 antigen might confer an advantage, considering the high incidence of typhoid fever in Indonesia. However, earlier findings that z66-positive isolates are found almost exclusively in Indonesia and have not spread widely across the globe (38, 39) argue against this hypothesis. It is possible that the coexistence of monophasic and diphasic serovar Typhi isolates in Indonesia results from balancing selection maintaining the genetic diversity of the flagellar antigen in serovar Typhi (32).
The discriminatory power of SNP typing was determined using Simpson's index of diversity (D) calculated with an in-house program, MLEECOMP (available upon request). The D value for this study was 0.870. We compared the D value of SNP typing to those of MLST (13) and ribotyping (15), both of which used global isolates, as no comparable data set is available for pulsed-field gel electrophoresis, the "gold standard" for the comparison of the powers of typing methods (29). MLST and ribotyping had D values of 0.503 and 0.873, respectively. The SNP typing method developed in this study had a considerably higher discriminatory power than MLST but a power similar to that of ribotyping. However, the power of SNP typing can be increased by incorporating more SNPs, while ribotyping is constrained to detecting variation in the seven regions containing rrn operons only. Furthermore, the variation detected by ribotyping has resulted mostly from genome rearrangement due to rrn recombination rather than mutation (22), and this type of variation cannot be used to determine true relationships.
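Simpson's index of diversity is commonly computed for typing methods in the form given by Hunter and Gaston, D = 1 - [1/(N(N - 1))] Σ n_j(n_j - 1), where N is the number of isolates and n_j the number of isolates of type j; D is the probability that two isolates drawn at random without replacement belong to different types. The sketch assumes this is the form MLEECOMP implements, which the text does not state, and the counts in the example are hypothetical, so it does not reproduce the value of 0.870.

```python
def simpsons_diversity(type_counts: list[int]) -> float:
    """Simpson's index of diversity (Hunter-Gaston form).

    type_counts holds the number of isolates assigned to each type
    (e.g. to each SNP profile).
    """
    n = sum(type_counts)
    if n < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(c * (c - 1) for c in type_counts)
    return 1 - same_type_pairs / (n * (n - 1))


# Hypothetical: four isolates falling into three types (2 + 1 + 1).
print(round(simpsons_diversity([2, 1, 1]), 4))  # 0.8333
```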
Minimal SNP set required for differentiating SNP profiles. To reduce the cost of genotyping and to facilitate large-scale typing, it would be useful to define a minimal SNP set that can identify all of the SNP profiles assigned in this study. We identified 16 SNPs that classify the 73 serovar Typhi isolates into the same 23 SNP profiles (Table 2). These 16 SNPs require only four of the seven enzymes used in this study, AluI, HaeIII, HhaI, and HpaII, to type 3, 4, 5, and 3 SNPs, respectively. The enzyme cost of typing an isolate is therefore very small and far lower than the cost of MLST. Further refinements, such as automation and PCR multiplexing, could make this approach even more attractive.
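The text does not describe how the reduced SNP set was chosen. One standard heuristic, shown here only as an assumption, is greedy set cover: repeatedly pick the SNP that separates the largest number of still-unresolved pairs of profiles until every pair is separated. Greedy selection is not guaranteed to find the smallest possible set, and the profiles below are hypothetical.

```python
from itertools import combinations


def greedy_discriminating_snps(profiles: list[str]) -> list[int]:
    """Indices of a SNP subset that still distinguishes every profile.

    profiles are allele strings over the same SNPs, in the same order.
    """
    unique = sorted(set(profiles))
    unresolved = set(combinations(range(len(unique)), 2))
    n_snps = len(unique[0])
    chosen: list[int] = []
    while unresolved:
        best_snp, best_split = None, set()
        for s in range(n_snps):
            if s in chosen:
                continue
            # Pairs of profiles that carry different alleles at SNP s.
            split = {(i, j) for i, j in unresolved if unique[i][s] != unique[j][s]}
            if len(split) > len(best_split):
                best_snp, best_split = s, split
        if best_snp is None:
            raise ValueError("remaining profiles cannot be separated")
        chosen.append(best_snp)
        unresolved -= best_split
    return sorted(chosen)


profiles = ["AAAA", "AAAG", "AAGG", "GAAG", "GAGG"]
subset = greedy_discriminating_snps(profiles)
print(subset)  # [0, 2, 3]: the invariant SNP 1 is dropped
```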
Comparison of approaches to SNP discovery for typing. In this study, we took advantage of the available genome sequences to obtain SNPs for typing. Because these SNPs were derived from the comparison of only two genomes, they can reveal only the evolutionary path separating Ty2 and CT18, owing to phylogenetic discovery bias (25). Nevertheless, the SNPs used in our study allowed us to determine the position of the last common ancestor of serovar Typhi and the node positions of the SNP profiles, although the true branch lengths could not be determined for any SNP profiles other than the two representing Ty2 and CT18. A recent study by Roumagnac et al. (30) took a different approach to obtaining SNPs, aiming to circumvent the problem of phylogenetic discovery bias. That study surveyed 200 gene fragments from over 100 serovar Typhi isolates for variation and found 88 informative SNPs. These SNPs were used as markers to differentiate 481 global serovar Typhi isolates into 85 haplotypes and to group them into five major clusters. Despite the large number of SNPs used, each of the five clusters was supported by a single SNP only, and there was little resolution of the relationships within a cluster. No parallel changes were detected, in contrast to what we observed in this study.
Twenty-nine of the isolates analyzed in our study were also studied by Roumagnac et al. (30). These isolates were divided into 17 SNP profiles in this study and into 14 haplotypes in the Roumagnac et al. study (30) (Table 1). Overall, our SNPs offered a slightly higher degree of differentiation for these 29 isolates. However, the SNPs from the two studies resolved different groups: our SNPs further divided haplotypes 39, 42, 50, and 59, while their SNPs distinguished six isolates of SNP profile 10, our largest profile, into individual haplotypes. At the cluster level, our clusters II and III appear to be subdivisions of their cluster II, as the six SNP profiles falling into cluster II of Roumagnac et al. (30) were grouped into two separate clusters in our study (SNP profiles 6, 7, and 8 in cluster II and 10, 11, and 13 in cluster III). Further studies will be useful to identify a combination of SNPs from the two studies that offers optimal differentiation.
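Comparing how two schemes resolve the same isolates amounts to cross-tabulating two partitions. In the minimal sketch below, which uses hypothetical isolate names and group labels, each group in one scheme is mapped to the groups of the other scheme that its isolates fall into; a group that maps to more than one label is split further by the other scheme.

```python
from collections import defaultdict


def cross_tabulate(scheme_a: dict[str, str], scheme_b: dict[str, str]) -> dict[str, set[str]]:
    """Map each scheme_a group to the set of scheme_b groups of its isolates.

    Raises KeyError if an isolate typed in scheme_a is missing from scheme_b.
    """
    table: dict[str, set[str]] = defaultdict(set)
    for isolate, group_a in scheme_a.items():
        table[group_a].add(scheme_b[isolate])
    return dict(table)


def split_groups(scheme_a: dict[str, str], scheme_b: dict[str, str]) -> list[str]:
    """Groups of scheme_a that scheme_b divides further."""
    table = cross_tabulate(scheme_a, scheme_b)
    return sorted(g for g, others in table.items() if len(others) > 1)


snp_profiles = {"iso1": "S1", "iso2": "S1", "iso3": "S2", "iso4": "S3"}
haplotypes = {"iso1": "H1", "iso2": "H2", "iso3": "H3", "iso4": "H3"}
print(split_groups(snp_profiles, haplotypes))  # ['S1']: split by haplotypes
print(split_groups(haplotypes, snp_profiles))  # ['H3']: split by SNP profiles
```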
In conclusion, we have shown that SNPs obtained from genome comparisons are valuable markers for typing serovar Typhi isolates. The distinctive advantage of an SNP-based method is that true genetic relationships can be established. We have developed an SNP typing method based on PCR-restriction enzyme digestion, and our method has the advantage of minimal cost for consumables and the need for only basic laboratory equipment for the detection of SNPs.
| ACKNOWLEDGMENTS |
|---|
We thank Ken Sanderson (University of Calgary) and Gordon Dougan (Imperial College London) for generously providing us with the strains. We also thank the anonymous referees for their constructive comments and suggestions.
| FOOTNOTES |
|---|
Published ahead of print on 29 August 2007.
Supplemental material for this article may be found at http://jcm.asm.org/.
| REFERENCES |
|---|