Journal of Clinical Microbiology, March 2009, p. 596-602, Vol. 47, No. 3
0095-1137/09/$08.00+0 doi:10.1128/JCM.01693-08
Copyright © 2009, American Society for Microbiology. All Rights Reserved.

Usha Srinivasan,1
Lixin Zhang,1
Thomas S. Whittam,2,
Carl F. Marrs,1 and
Betsy Foxman1*
Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, Michigan 48109,1 and Microbial Evolution Laboratory, National Food Safety and Toxicology Center, Michigan State University, East Lansing, Michigan 48824 2
Received 2 September 2008/ Returned for modification 10 October 2008/ Accepted 29 December 2008
Many consider multilocus sequence typing (MLST), which is based on sequence variation in housekeeping loci in the genome, the ideal typing method because sequence data are both portable and unambiguous. MLST is highly discriminatory for establishing long-term patterns of evolution (9). However, MLST does not provide much insight into recent genetic history, such as acquisition of mobile genetic elements. For example, alleles of the highly clonal food-borne pathogen Escherichia coli O157:H7, which causes bloody diarrhea and hemolytic-uremic syndrome, share nearly identical (>99.9%) nucleotide sequences among different isolates (12). In this case, MLST cannot differentiate E. coli O157:H7 isolates from each other, making it less useful for pathogen tracking and outbreak investigations.
In addition, MLST analysis can be quite daunting, especially for a large collection of isolates. For example, a study involving typing of 1,000 E. coli isolates results in 14,000 DNA sequences that must be visually examined for base-calling errors, adjusted for insertions or deletions relative to the reference sequence, trimmed to the appropriate length, and finally assigned the correct allele numbers. Therefore, the analysis for MLST requires significant personnel time and laboratory resources beyond the base cost of DNA isolation, PCR, and sequencing.
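The figure of 14,000 sequences is consistent with the MLST scheme described in the methods of this article (seven gene fragments, each sequenced in both directions). The short calculation below is only a sketch of that arithmetic; the variable names are illustrative and the breakdown is an assumption inferred from the methods, not something the text states explicitly.

```python
# Back-of-the-envelope count of sequence reads for an MLST study.
# Assumption: 7 loci per isolate, each read in the forward and reverse direction.
isolates = 1_000
loci_per_isolate = 7
reads_per_locus = 2  # one read per sequencing direction

total_reads = isolates * loci_per_isolate * reads_per_locus
print(total_reads)  # 14000
```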
Probe hybridization array typing (PHAT) is an alternative typing method complementary to MLST. PHAT is a highly discriminatory method that determines relatedness of strains by using a binary system based on results of DNA dot blot hybridization for the presence or absence of genetic material (13). Strains that share diverse regions of genetic content are more likely to be related than strains that do not. By using gene probes that reflect genetic variation, PHAT provides insight into a more recent genetic history of a strain than that found using MLST. PHAT is ideal for use with large isolate collections since it can be easily adapted to a high-throughput "library-on-a-slide" (LOS) microarray format (20). LOS is capable of testing up to 1,200 isolates in duplicate in a single experiment. In addition, the binary output of PHAT typing is easily digitally formatted for large databases, making data analysis of PHAT more time-efficient than for MLST. While using MLST on 1,000 isolates can be daunting, if they can be grouped first using PHAT, only a subset need be typed by MLST.
The previously published set of PHAT probes was a proof of concept requiring further refinement (13). First, the probes included in the initially published set were validated using uropathogenic and rectal isolates of E. coli. Therefore, the discriminatory power of PHAT for other pathotypes was not tested. Second, all probes were gene fragments from subtraction PCR experiments with rectal isolates, many without clearly defined genetic functions or relationships to virulence. Lastly, the PHAT types could not be mapped to MLST types, and thus direct comparisons between PHAT and MLST could not be made.
Here we describe a refinement of the PHAT probe set intended to classify E. coli isolates into groupings corresponding to MLST clonal groups (CGs). Probes were selected from genes with known and unknown functions related to virulence potential. The probe set was developed and validated by using a diverse collection of E. coli isolates, including strains that cause diarrhea, urinary tract infections (UTIs), and meningitis, as well as commensal vaginal and rectal isolates, and thus should be generalizable to isolates from these groups.
Selection and creation of PHAT probes. To determine the PHAT probe set, we used three sets of previously described genes and an additional new set of genes. The first set of genes includes the previously described genes chuA, yjaA, and tspE4.C2 (1). These genes categorize E. coli isolates into one of the four main ECOR phylogenetic groups: A, B1, B2, or D. An isolate that is positive for both chuA and yjaA is categorized as phylogenetic group B2. An isolate that is positive for chuA but negative for yjaA is categorized as group D. An isolate that is negative for chuA and positive for tspE4.C2 is categorized as group B1, and an isolate that is negative for both chuA and tspE4.C2 is categorized as group A.
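The decision rules above can be written as a small function. This is a minimal sketch of the rules exactly as stated in the paragraph; the function name `phylogroup` and its boolean arguments are hypothetical and only stand in for the presence or absence of each probe signal.

```python
def phylogroup(chuA: bool, yjaA: bool, tspE4_C2: bool) -> str:
    """Assign an ECOR phylogenetic group from three presence/absence results."""
    if chuA:
        # chuA-positive isolates are split by yjaA.
        return "B2" if yjaA else "D"
    # chuA-negative isolates are split by tspE4.C2.
    return "B1" if tspE4_C2 else "A"


print(phylogroup(chuA=True, yjaA=False, tspE4_C2=True))  # D
```

Note that, under these rules, tspE4.C2 is only consulted when chuA is absent, and yjaA only when chuA is present.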
The second set of genes includes the previously described genes stx1, stx2, eae, bfp, lt, virF, ipaH, and aafII (15). These genes help to categorize E. coli isolates into one of six diarrheagenic pathotypes: enterotoxigenic E. coli (ETEC), enteropathogenic E. coli (EPEC), Shiga toxin-producing E. coli (STEC), enterohemorrhagic E. coli (EHEC), enteroinvasive E. coli, or enteroaggregative E. coli. E. coli isolates that are positive for either of the Shiga toxin genes, stx1 or stx2, are categorized as STEC. If these isolates are also positive for the intimin gene, eae, they are categorized as EHEC. E. coli isolates that are negative for stx1 and stx2 and are positive for eae are categorized as EPEC. EPEC isolates that are positive for the bundle-forming pilus gene, bfp, are considered typical EPEC, while isolates that are negative for bfp are considered atypical EPEC. Isolates positive for the heat-labile toxin gene, lt, are categorized as enterotoxigenic E. coli, while isolates positive for the aggregative adherence fimbria II gene, aafII, are considered enteroaggregative E. coli. Isolates that are positive for ipaH and virF, an invasion plasmid antigen and a transcriptional activator of virulence loci, respectively, are considered to be enteroinvasive E. coli.
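As an illustration only, the pathotype rules in this paragraph can be expressed as a lookup over the set of genes detected in an isolate. The function name `pathotypes`, the abbreviations EIEC and EAEC, and the choice to report EHEC in place of STEC when eae is also present are assumptions made for this sketch; the text does not say how isolates matching more than one rule were handled, so the function simply returns every label that applies.

```python
def pathotypes(genes):
    """Return the diarrheagenic pathotype labels implied by a set of detected genes."""
    labels = []
    has_stx = "stx1" in genes or "stx2" in genes
    has_eae = "eae" in genes

    if has_stx:
        # Shiga toxin gene present: STEC, or EHEC when intimin (eae) is also present.
        labels.append("EHEC" if has_eae else "STEC")
    elif has_eae:
        # eae without stx: EPEC, typical only when bfp is also present.
        labels.append("typical EPEC" if "bfp" in genes else "atypical EPEC")

    if "lt" in genes:
        labels.append("ETEC")
    if "aafII" in genes:
        labels.append("EAEC")  # enteroaggregative E. coli
    if "ipaH" in genes and "virF" in genes:
        labels.append("EIEC")  # enteroinvasive E. coli
    return labels


print(pathotypes({"stx2", "eae"}))  # ['EHEC']
print(pathotypes({"eae"}))          # ['atypical EPEC']
```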
The third set of genes includes the previously described genes iroN, ompT, hly, kpsMT, and aer (5, 8, 10). These genes represent virulence factors that have been associated with E. coli that cause UTIs (uropathogenic E. coli).
The fourth set of genes consists of genes that were identified by in silico comparison of the sequenced strains E. coli CFT073 and Shigella flexneri 2457T to E. coli MG1655 using GenomeComp (17) and BLAST2 (National Center for Biotechnology Information [NCBI] database, National Institutes of Health [www.ncbi.nlm.nih.gov]) software (L. Zhang, unpublished data). Genes were considered ideal if they were present in ca. 40 to 60% of the sequenced strains of E. coli and Shigella sp. in the NCBI database and had a potential virulence function. Genes selected for PHAT probes from the genome of E. coli CFT073 were c0286, c0311, c0340, c1164, c1600, c3389, and c3680 (named after the gene locus tag for E. coli CFT073 in the NCBI database). The gene selected for a PHAT probe from the S. flexneri 2457T genome was S3187 (named after the gene locus tag for S. flexneri 2457T in the NCBI database).
Using Primer Select Lasergene software (DNAstar, Inc., Madison, WI), we redesigned the primers used for each gene (except the genes associated with UTI in set 3) to give product lengths that would optimize the LOS hybridization procedure. The primer sequences, product lengths, and annealing temperatures for each gene probe are shown in Table 1. Each PCR tube contained 50 ng of purified DNA template, 100 pmol of each primer, and Platinum PCR Supermix (Invitrogen, Carlsbad, CA), resulting in a total volume of 50 µl. PCR amplification was performed by using the PTC-100 programmable thermal cycler (MJ Research, Waltham, MA). PCR conditions used were as follows: soaking time of 2 min at 94°C, followed by 30 cycles of denaturing at 94°C for 30 s, annealing at the temperatures given in Table 1 for 30 s, and extension at 72°C for 2 min. This was followed by a final extension at 72°C for 5 min. PCR products were then purified by gel electrophoresis using a 1% agarose gel. Gels were then stained with ethidium bromide for visualization of DNA bands. Bands were excised from the gel and purified by using a QIAquick gel extraction kit (Qiagen, Inc., Valencia, CA). PCR amplification was performed a second time using the same procedure except using purified PCR product as template DNA. The resulting PCR products were then purified by using a QIAquick PCR purification kit (Qiagen). Probes were labeled with fluorescein-12-dCTP (Perkin-Elmer, Waltham, MA) by using a BioPrime labeling kit (Invitrogen). A DNA quantification probe was created using the seven MLST genes: aspC, clpX, fadD, icdA, mdh, lysP, and uidA. PCR amplification of the MLST genes was performed by using the MLST protocols described below and using DNA from E. coli CFT073 as the template. PCR products from all seven genes were pooled and labeled with digoxigenin-11-dUTP (Enzo Life Sciences, Inc., Farmingdale, NY) using a BioPrime labeling kit (Invitrogen).
All probes were tested with positive (template strain and purified probe) and negative (water) controls on nylon membranes (Hybond H+; Amersham Pharmacia, Buckinghamshire, United Kingdom) using the LOS hybridization procedure below to ensure the probes would hybridize as expected before use.
TABLE 1. Refined set of PHAT gene probes
MLST. MLST had been performed previously on all isolates in the representative E. coli collection of T. Whittam. We used the same protocol to perform MLST on selected isolates from the cystitis, pyelonephritis, and vaginal and rectal collections for verification of the cluster analysis. MLST PCR protocols from the EcMLST database (16) webpage (www.shigatox.net/stec/mlst-new/index.html) were followed using Platinum Taq DNA polymerase (Invitrogen). PCR products were purified by using a QIAquick PCR purification kit (Qiagen) and then sequenced at the University of Michigan Biology Core facility using primers for both the 3' and 5' directions. DNA sequence chromatograms were visualized and edited by using FinchTV software (Geospiza, Inc., Seattle, WA). A consensus sequence was obtained for each of the seven gene fragments for each isolate by using MegAlign (DNAstar, Inc.). DNA sequences were then compared to the EcMLST database to determine the allele type for each gene. The allele type profiles for each isolate were then used to assign sequence type (ST) and CG designations.
Analytic strategy. Our aim was to identify an optimal PHAT probe set that maximally classifies the E. coli isolates into different phylogenetic groups and to compare these groups to the classification obtained using MLST. We performed cluster analysis using Cluster v2.11 and subsequently visualized the results by creating dendrograms with TreeView v1.60 software available at: http://rana.lbl.gov/EisenSoftware.htm (4), initially for classifying the representative E. coli collection for which MLST data were available. In the cluster analysis, distances between all pairs of data to be clustered (e.g., all of the CGs in the current data set) were calculated using a Pearson correlation. Cluster uses agglomerative hierarchical processing, which consists of repeated cycles where the two closest remaining items (those with the smallest distance) are joined by a node/branch of a tree, with the length of the branch set to the distance between the joined items. The two joined items are removed from the list of items being processed and replaced by an item that represents the new branch. The distances between this new item and all other remaining items are computed, and the process is repeated until only one item remains. The CGs and STs used in the cluster analysis were as follows: CG7 (STs 23, 24, 25, 26, 357, and 378), CG13 (STs 86, 87, 88, and 296), CG14 (STs 104, 106, 110, and 310), CG17 (STs 118, 119, 120, 225, and 255), CG23 (STs 169, 170, 171, 172, 272, 273, 298, and 343), CG38 (STs –20, –3, 27, 28, 265, 271, 299, 338, and 346), and CG58 (STs 281, 282, 300, 344, and 384). The identified probe set was then used to classify isolates from other collections. As a validation, MLST was performed on a subset of isolates with PHAT signatures that corresponded to the consensus PHAT signatures of major CGs in the representative E. coli collection.
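The agglomerative procedure described above can be sketched in a few lines of plain Python. This is not the Cluster v2.11 implementation: the distance is taken here as 1 minus the Pearson correlation, and the item representing a new branch is assumed to be the size-weighted average of the two joined profiles. Both choices, along with the function names and the four toy signatures, are assumptions made for illustration.

```python
import math


def pearson_distance(x, y):
    """Distance defined as 1 - Pearson correlation between two profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # correlation is undefined for a constant profile
    return 1.0 - cov / (sx * sy)


def cluster(profiles):
    """Join the two closest items repeatedly; return the merges in order."""
    # Each item holds (profile, number of original items it represents).
    items = {name: (list(vec), 1) for name, vec in profiles.items()}
    merges = []
    while len(items) > 1:
        names = sorted(items)
        best = None
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                d = pearson_distance(items[a][0], items[b][0])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        (va, wa), (vb, wb) = items.pop(a), items.pop(b)
        # Assumed rule for the new branch: size-weighted average profile.
        merged = [(x * wa + y * wb) / (wa + wb) for x, y in zip(va, vb)]
        items[f"({a},{b})"] = (merged, wa + wb)
        merges.append((a, b, d))
    return merges


# Hypothetical binary probe signatures (1 = probe present, 0 = absent).
signatures = {
    "A": [1, 1, 0, 0, 1, 0],
    "B": [1, 1, 0, 0, 1, 0],
    "C": [0, 0, 1, 1, 0, 1],
    "D": [1, 1, 0, 0, 0, 0],
}
merges = cluster(signatures)
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

With these toy signatures, the identical profiles A and B join first, D joins that branch next, and the complementary profile C joins last.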
Cluster analysis. The distribution of PHAT probes in the representative E. coli collection was analyzed to compare the PHAT classification to MLST CGs. The results were imported into TreeView to create a dendrogram (Fig. 1). Here, all 62 CGs in the collection were distinguished as separate branches in the tree. With fewer than the full set of 24 PHAT probes, not all 62 CGs could be distinguished.
FIG. 1. Dendrogram of cluster analysis results for isolates of the representative E. coli collection (n = 221). All 62 multilocus ST CGs were distinguished using 24 PHAT probes. CG 0 contains 73 singleton STs.
The PHAT probe set distinguished between many of the STs (Fig. 2) but had trouble distinguishing others, sometimes placing STs from the same CG into different PHAT groups. The sensitivity and specificity, respectively, of this PHAT probe set for placing STs into the correct MLST CG varied by CG as follows: CG7 (100.0 and 100.0%), CG13 (66.6 and 95.0%), CG14 (80.0 and 100.0%), CG17 (100.0 and 97.5%), CG23 (50.0 and 94.7%), CG38 (80.0 and 94.4%), and CG58 (80.0 and 100.0%). Overall, the PHAT probe set has a sensitivity and a specificity of 78.3 and 97.5%, respectively, for this subset of strains from the representative E. coli collection, with the highest sensitivity and specificity for CG7 and CG17 and somewhat lower values for the other CGs. Identifying additional probes specific for each CG would improve this classification.
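For readers who want to see how per-CG values of this kind are obtained, the sketch below applies the standard definitions of sensitivity and specificity, treating the MLST CG as the reference and the PHAT group as the prediction. The article does not spell out its exact calculation, so this is an assumption; the function name and the six-item data set are hypothetical and do not reproduce the study data.

```python
def sensitivity_specificity(true_labels, predicted_labels, group):
    """Sensitivity and specificity for one group, with true_labels as reference."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == group and p == group for t, p in pairs)
    fn = sum(t == group and p != group for t, p in pairs)
    tn = sum(t != group and p != group for t, p in pairs)
    fp = sum(t != group and p == group for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Hypothetical example: reference CG by MLST vs. CG predicted from PHAT.
mlst_cg = ["CG7", "CG7", "CG13", "CG13", "CG13", "CG23"]
phat_cg = ["CG7", "CG7", "CG13", "CG13", "CG23", "CG23"]

sens13, spec13 = sensitivity_specificity(mlst_cg, phat_cg, "CG13")
sens23, spec23 = sensitivity_specificity(mlst_cg, phat_cg, "CG23")
print(round(sens13, 3), spec13)  # 0.667 1.0
print(sens23, spec23)            # 1.0 0.8
```

In this toy example, one CG13 item placed in CG23 lowers the sensitivity for CG13 and, at the same time, the specificity for CG23.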
FIG. 2. Dendrogram of the STs of the major CGs from the representative E. coli collection (n = 46) with PHAT genetic signatures. Gray squares indicate the presence of the gene for a given ST, and black indicates the absence of the gene. All STs in CG7 and CG17 were correctly grouped together, while between one and four STs were misclassified for CG13, CG14, CG23, CG38, and CG58.
MLST validation. From the cluster analysis of representative E. coli strains, we assigned a PHAT type that corresponded to each CG. In order to validate the PHAT assignment to CG, we identified all isolates in the cystitis, pyelonephritis, rectal, and vaginal collections whose PHAT type corresponded to a CG, based on the representative E. coli collection strains tested. From each pool of isolates identified from a specific CG, we randomly selected up to six isolates for MLST, for a total of 24. No isolates were chosen from CG14 (EHEC 2) because there were no isolates in the cystitis, pyelonephritis, rectal, or vaginal collections which matched the PHAT probe signature for this CG. The MLST typing results are shown in Table 2. The overall sensitivity and specificity of PHAT to predict MLST CG is 64.7 and 88.3%, respectively, with the most accurate assignments for CG7 and CG17: CG7 (100.0 and 95.6%), CG13 (0.0 and 75.0%), CG17 (100.0 and 100.0%), CG23 (40.0 and 92.8%), CG38 (100.0 and 81.8%), and CG58 (100.0 and 86.4%).
TABLE 2. Comparison of CG assignment by MLST and PHAT for selected E. coli isolates
The PHAT probe set can be refined to include a varied set of genes, including potential virulence genes, phylogenetic markers, antibiotic resistance genes, or other genes of interest. Probes can be added to a core set of PHAT probes, so as to better define closely related groups that may need more discrimination. For bacteria with very heterogeneous genomes, such as E. coli, the accuracy of assignment to CGs based on PHAT may vary somewhat with study collection, so additional or alternative probes may be required. While the PHAT probe set presented here was created and validated using E. coli isolates, other probe sets can be created to type other bacterial agents. Any new PHAT probe set, however, would need to be properly validated for the agent of interest. Thus, PHAT is a very adaptable system that can be modified to suit many typing objectives, in a relatively time- and cost-efficient manner.
MLST analysis demonstrated that the refined PHAT probe set is able to correctly resolve most isolates for MLST CGs CG7, CG17, and CG14, with the most discriminatory probes for these CGs being chuA, yjaA, tspE4.C2, c1164, and c3389. However, PHAT did not resolve all CGs the same as MLST, since classification by PHAT grouped strains of certain CGs together (CG13/CG23 and CG38/CG47), and a majority of CG58 strains were classified into different CGs. Isolates of CG13 and CG23 are both phylogenetic group A, while CG38 and CG47 are both group B2. The PHAT probe results have considerable overlap for each pair of CGs, suggesting that each pair may be closely related. While four of five CG58 isolates were correctly classified in the cluster analysis, only two validation isolates were typed as CG58 by MLST. Three other validation isolates were typed as CG0 or CG23. The PHAT probe set was developed using uropathogenic, diarrheal, meningitic, and commensal E. coli isolates, whereas the isolates included in the final validation were mostly commensal E. coli isolates. This difference may have contributed to lower concordance between PHAT and MLST typing. A more detailed discussion of these results follows.
In the cluster analysis, PHAT correctly classified all isolates within CG7 (EPEC 4) and CG17 (EPEC 2) and all but one isolate of CG14 (EHEC 2). All isolates in CG7, CG14, and CG17 were positive for the gene eae, while all other CGs were negative for this gene. The eae gene encodes the protein intimin and is located in the locus of enterocyte effacement. The locus of enterocyte effacement is responsible for attaching and effacing histopathology, a defining feature of the diarrheal disease caused by the pathotypes EPEC and EHEC (11).
The phylogenetic probes chuA, tspE4.C2, and yjaA were useful in differentiating CGs by PHAT. The cluster analysis shows that CG14 and CG17 are related and are both phylogenetic group B1. This finding is consistent with previous studies (2, 7). CG14 and CG17 are further differentiated by probes c1164 and c3389, which are positive for CG17 and negative for CG14. The gene c1164, also known as ycdT, encodes a hypothetical inner membrane protein with a predicted diguanylate cyclase domain. The gene c3389 encodes a hypothetical protein that is similar to the OmpA family of outer membrane proteins. The gene c3389 was also found in most CG38 isolates, while being absent in all of the other major CGs. c3389 has been previously associated with group B2 strains and has been found to be absent in strains from phylogenetic groups A and D (3). A better understanding of the function of c1164 and c3389 would help to explain the differences between these CGs.
The refined PHAT probe set is capable of classifying isolates into groups in a manner similar to major clonal complexes of MLST, indicating coevolution between the chromosomal background and the flexible gene pool. While groupings between MLST and PHAT are similar, they do not match exactly, and therefore the results are not directly comparable. However, this is to be expected given that the evolutionary basis for change is different in each method. MLST classifies isolates based on base pair changes in DNA fragments of conserved housekeeping genes. Most changes in these genes are thought to represent a distant genetic history of a given strain, giving rise to a situation where two related strains share a common ancestor. PHAT, on the other hand, classifies isolates based on the presence or absence of genes to give a binary signature. Changes in whole gene presence or absence are likely to reflect much more recent genetic history for a given strain, given that these virulence genes tend to be propagated by horizontal gene transfer mechanisms, such as bacteriophages, plasmids, and transposable elements. Therefore, two strains that are related by PHAT typing do not necessarily share a common ancestor for core genes. Thus, we are comparing two very different evolutionary measurement scales, with MLST changes occurring on a slow time scale and PHAT changes potentially occurring on a faster time scale. Instances where STs share similar lineages (as determined by MLST) but different PHAT signatures are especially interesting to characterize further, since they may represent lineages where acquisition of specific sets of virulence genes may result in increased propensity to cause disease.
Although these two typing systems give results that are not directly comparable, they are highly complementary. Many studies are focused on the virulence potential and the clinical implications of particular strains, and the relevance of the MLST genes to these questions is likely to arise only through indirect genetic linkages to actual virulence genes (14). PHAT, however, would be an ideal typing system to use to answer these types of questions since any potential virulence gene could be a PHAT probe candidate. These candidate genes can then be screened across the collection relatively quickly to determine whether any relationships between the gene and clinical disease exist, which may help to determine whether further study is warranted. One caveat is that the discriminatory power of a specific PHAT probe may vary by study collection. As the PHAT probe set is further developed and refined by applying it to large sets of isolates from various diseases, it will be possible, and of practical importance, to establish an all-round array for PHAT.
We thank Harry Mobley and Patricia Brown for providing some of the E. coli isolates for the present study and Hardik Doshi for help with the LOS hybridizations.
Published ahead of print on 14 January 2009.
Present address: Michigan Department of Community Health, Lansing, MI 48909.