Journal of Clinical Microbiology, January 2007, p. 206-214, Vol. 45, No. 1
0095-1137/07/$08.00+0 doi:10.1128/JCM.01543-06
Copyright © 2007, American Society for Microbiology. All Rights Reserved.

Department of Epidemiology¹ and Department of Biostatistics,² School of Public Health, University of Michigan, Ann Arbor, Michigan 48109
Received 25 July 2006/ Returned for modification 11 September 2006/ Accepted 16 October 2006
Most gel- and PCR-based techniques generate complex banding patterns that lack uniform interpretation criteria (17). Although PFGE can be highly reproducible when a standard protocol and equipment are used, problems remain (17). The interpretation of gel-based methods is most straightforward when additional information regarding the relationships between strains is available, such as when they are epidemiologically linked and when assays are conducted in a single laboratory (24).
DNA-based typing methods have the advantage of portability and reproducibility. MLST is based on direct sequencing of 400- to 500-bp regions of five to seven housekeeping genes (1, 14). Each strain is scored based on the nucleotide substitutions observed and assigned to a unique allelic profile, or sequence type. This method has a high discriminatory power but is labor-intensive, time-consuming, and still impractical for high-throughput applications. SNP typing based on high-throughput sequencing of 13 SNPs from 11 genes used for MLST has been demonstrated for E. coli (10). Although SNP typing is less discriminatory than MLST (for the SNPs analyzed), when used for phylogeny the resulting groupings are similar to those found by using MLST.
Binary typing is an alternative DNA-based typing method to MLST and is suitable for organisms with a large variation in genetic content. In binary typing, each strain is assigned a signature based on the presence or absence of a set of defined DNA sequences rather than allelic profiles. Binary typing using comparative genomic hybridization, containing all of the open reading frames (ORFs) of a sequenced genome (genomotyping), has been demonstrated for typing clinical bacterial Campylobacter and Salmonella strains (13, 18). In this method, strains can be typed for the presence or absence of all the coding regions on the bacterial genome. Although genomotyping has high discriminatory power, it is time-consuming for typing large collections since it uses a large number of ORFs to type a few bacterial strains. Oligonucleotide-based arrays have also been used to type bacterial strains (10).
A binary typing method using probes generated from RAPD sequences has been validated for Staphylococcus aureus (25, 26, 29). We describe here the development and validation of a hybridization-based binary typing method for E. coli, probe hybridization array typing (PHAT), and compare it to other typing methods. By selecting probes with the most discriminating power, we demonstrate that a relatively small probe set can be used to type large numbers of diverse bacterial strains. Consecutive additions to the PHAT probe set can be used to adjust the discriminatory power of PHAT.
PHAT uses the genetic diversity of the genome for identification rather than the conserved sequences favored by MLST. The more diverse regions that are shared among a group of strains, the more likely the strains are closely related. By focusing on the presence or absence of genetic content rather than allelic variation in conserved genes, PHAT detects changes on a relatively short time scale. The presence or absence of genetic regions is identified by using DNA hybridization. The resulting string of zeros and ones, corresponding to the absence and presence of the chosen genetic regions, creates a reproducible and portable PHAT "type" that is easily compared across laboratories. PHAT has the advantage of an adjustable level of discrimination: increasing the number of probes in the probe set will increase the level of discrimination between strains. Further, collapsing to a smaller probe set has a clearer biological meaning than similarity based on gel band pattern, since the genetic content of specific bands is usually unknown. PHAT can be applied in a high-throughput "library-on-a-slide" (LOS) format (33) and is readily adapted to other bacterial species with high variation in genetic content.
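The signature idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the probe names in `PROBE_ORDER` are hypothetical placeholders, and counting differing positions (a Hamming distance) is one simple way, assumed here, to express "number of probe differences" between two strains.

```python
from typing import Dict, List

# Hypothetical probe order; these names are placeholders, not the real probe set.
PROBE_ORDER: List[str] = ["probe_A", "probe_B", "probe_C", "probe_D"]


def phat_signature(calls: Dict[str, bool], probe_order: List[str] = PROBE_ORDER) -> str:
    """Concatenate presence (1) / absence (0) calls in a fixed probe order."""
    return "".join("1" if calls[probe] else "0" for probe in probe_order)


def probe_differences(sig_a: str, sig_b: str) -> int:
    """Count the probes at which two signatures differ (Hamming distance)."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must come from the same probe set")
    return sum(a != b for a, b in zip(sig_a, sig_b))


strain_1 = {"probe_A": True, "probe_B": True, "probe_C": False, "probe_D": False}
strain_2 = {"probe_A": True, "probe_B": False, "probe_C": False, "probe_D": False}
sig_1 = phat_signature(strain_1)
sig_2 = phat_signature(strain_2)
n_diff = probe_differences(sig_1, sig_2)
```

Because the probe order is fixed, signatures produced in different laboratories can be compared as plain strings, which is what makes the type portable.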
E. coli collections. Subtraction PCR (sPCR) probes generated from genome subtraction experiments were used to probe three different E. coli collections: (i) the E. coli reference collection (ECOR), which is a collection of 72 strains isolated from a variety of hosts and geographical locations (http://foodsafe.msu.edu/Whittam/ecor/); (ii) a set of 33 E. coli strains for which PFGE was available, also selected from college women aged 18 to 39 years with urinary tract infections (UTI) (8); and (iii) a set of 106 rectal strains randomly selected from E. coli isolates collected from college women aged 18 to 39 years with their first diagnosed UTI (9). The UTI collections have previously been characterized for the presence or absence of genes encoding adhesin P-pili (pff) further divided by adhesin subgroup (papGAD, papGJ96, and prsGJ96), S fimbrial adhesin (sfa), aerobactin (aer), group II capsule (kpsMT), cytotoxic necrotizing factor (cnf1), Dr family of adhesins (drb), hemolysin (hly), outer membrane protease T (ompT), Irg homolog adhesin (iha), uropathogenic specific protein (usp), catechol siderophore receptor gene (iroN of E. coli), and heat-resistant agglutinin (hra) as described previously (4, 15, 23).
sPCR fragment selection. We generated a library of genomic sequences that are present on one bacterial strain (tester) but absent on another (driver) using sPCR. sPCR fragments from four different subtractions were used. These genomic subtraction experiments yielded sPCR fragments that were either uniquely present in a greater number of pathogenic UTI strains or more likely to be involved in shared strains between heterosexual partners or shared between bladder, vaginal, and rectal sites. The details of these subtractions are described elsewhere (3, 23, 28, 30, 32). sPCR fragments were cloned into commercial vectors (TOPO; Invitrogen, Inc.) and probed for presence or absence in UTI and non-UTI E. coli collections. Probes that were present in 40 to 60% of the screened study populations were selected as possible PHAT candidates. The magnitudes of the association between the different sPCR fragments were estimated by using the odds ratios and 95% confidence intervals, and the significance was tested by using the chi-square test. All analyses were done by using SAS v8.0.
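The selection criteria in this paragraph (a 40 to 60% prevalence window, and odds ratios with 95% confidence intervals) can be illustrated with a short Python sketch. The original analyses were run in SAS; the code below is only an approximation under stated assumptions: the interval is the Woolf (log odds ratio) interval, which the text does not name, and the function names and example counts are hypothetical.

```python
import math
from typing import Tuple


def prevalence_ok(n_positive: int, n_total: int,
                  low: float = 0.40, high: float = 0.60) -> bool:
    """True when a probe is present in 40 to 60% of the screened strains."""
    return low <= n_positive / n_total <= high


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Woolf 95% CI (assumed method) for a 2x2 table.

    a = both fragments present, b = first only,
    c = second only, d = neither. All cells must be nonzero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
keep_candidate = prevalence_ok(50, 106)
or_value, ci_low, ci_high = odds_ratio_ci(30, 20, 20, 30)
```

Probes near 50% prevalence split a collection most evenly, which is why a mid-range window is a natural first filter for a binary typing scheme.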
Preparation of DNA probes. sPCR fragments were prepared by PCR from the strains from which they were originally cloned by using M13R and T7 primers. PCR amplification was performed using the model PTC-100 programmable thermal cycler (MJ Research) with an initial step at 94°C for 1 min, followed by 30 cycles of denaturation at 94°C for 30 s, annealing at 68°C for 30 s, and extension at 74°C for 1 min. The PCR products were purified by using a commercial PCR purification kit (QIAGEN, Inc.) and stored at -20°C for long-term use.
Dot blot hybridizations for PHAT. E. coli strains were probed by using dot blot hybridization with fluorescence-labeled PHAT probes. Briefly, bacterial DNA was prepared by growing strains overnight in LB medium in a 96-well deep-well plate (1 ml per well; Corning, Inc.). Bacterial cells were pelleted by centrifugation at 3,000 rpm in a Beckman desktop centrifuge and lysed with 800 µl of 0.4 N NaOH-10 mM EDTA at 70°C for 30 min. The bacterial lysate was arrayed on nylon membrane (Hybond H+; Amersham Pharmacia) using a BIO-dot microfiltration apparatus (Bio-Rad Laboratories). Nylon membranes were washed with 2x SSC (1x SSC is 0.15 M NaCl plus 0.015 M sodium citrate), dried, and fixed by using UV light. Fluorescently labeled gene fragments were hybridized to nylon membranes and detected by using the ALKPHOS fluorescein-based detection kit (Amersham) according to the manufacturer's instructions. Membranes were prehybridized with 20 ml of hybridization buffer for 30 min, followed by the addition of probe (200 ng). Hybridizations were carried out at 55°C overnight, and membranes were washed with primary and secondary wash buffers according to the manufacturer's protocol. Fluorescent signal was generated by using the ECF substrate provided in the kit. Hybridization intensities were detected by using Storm 860 PhosphorImager (Molecular Dynamics) and analyzed by using Image-QuaNT 5.0. The signal intensity of each spot was normalized to the intensity of each probe's positive control according to a previously published protocol (32). All strains were tested for the presence or absence of probe with a minimum of two independent membranes. Ambiguous results were retested on duplicate membranes and confirmed by Southern hybridization using previously described protocols (32). Sequencing of sPCR fragment DNA was performed at the University of Michigan Molecular Biology Core Facility using an Applied Biosystems model 373A automated sequencer.
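The scoring logic described above (normalize each spot to the probe's positive control, require at least two independent membranes, and retest ambiguous results) might be sketched as follows. The numeric cutoffs are hypothetical placeholders: the text defers to a published protocol (32) and does not state them here.

```python
from typing import List, Optional

# Hypothetical cutoffs for illustration; the actual criteria are in reference 32.
POSITIVE_CUTOFF = 0.5
NEGATIVE_CUTOFF = 0.2


def normalized_signal(spot_intensity: float, positive_control: float) -> float:
    """Express a spot's intensity relative to the probe's positive control."""
    return spot_intensity / positive_control


def call_membrane(ratio: float) -> Optional[int]:
    """1 = probe present, 0 = absent, None = ambiguous on this membrane."""
    if ratio >= POSITIVE_CUTOFF:
        return 1
    if ratio <= NEGATIVE_CUTOFF:
        return 0
    return None


def call_probe(ratios: List[float]) -> Optional[int]:
    """Combine independent membranes; None means the strain should be retested."""
    if len(ratios) < 2:
        raise ValueError("at least two independent membranes are required")
    calls = {call_membrane(r) for r in ratios}
    if calls == {1}:
        return 1
    if calls == {0}:
        return 0
    return None  # discordant or ambiguous: retest and confirm by Southern blot
```

Returning `None` rather than forcing a 0 or 1 keeps ambiguous strains out of the signature until they have been confirmed.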
MLST. MLST was performed using the protocols listed on the EcMLST database (www.shigatox.net). Briefly, PCRs were performed to obtain 500-bp fragments for seven housekeeping genes, purified and sequenced at the University of Michigan Molecular Biology Core Facility in both the 3' and the 5' directions. A consensus sequence was obtained for each of the seven gene fragments in 33 strains of E. coli. Allele types were assigned to the PCR-amplified sequences after comparison with the EcMLST database for nucleotide substitutions. The combination of allele types for the seven housekeeping genes gave the sequence type (ST) for each strain.
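The step from allele types to a sequence type can be illustrated with a small lookup sketch. The locus labels are hypothetical placeholders, and numbering new alleles and STs locally is an assumption made for illustration; in practice these numbers are assigned by the curated database.

```python
from typing import Dict, Tuple

# Hypothetical locus labels standing in for the seven housekeeping genes.
LOCI = ("locus1", "locus2", "locus3", "locus4", "locus5", "locus6", "locus7")


def assign_allele(sequence: str, known_alleles: Dict[str, int]) -> int:
    """Return the allele number for an exact match, registering a new one if needed."""
    seq = sequence.upper()
    if seq not in known_alleles:
        # Any nucleotide substitution yields a new allele number (local numbering).
        known_alleles[seq] = max(known_alleles.values(), default=0) + 1
    return known_alleles[seq]


def assign_st(profile: Tuple[int, ...], st_table: Dict[Tuple[int, ...], int]) -> int:
    """Map a seven-locus allelic profile to a sequence type (ST)."""
    if len(profile) != len(LOCI):
        raise ValueError("profile must contain one allele number per locus")
    if profile not in st_table:
        st_table[profile] = max(st_table.values(), default=0) + 1
    return st_table[profile]
```

Two strains share an ST only if all seven allele numbers match, so a single substitution in one locus is enough to separate them.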
PFGE. PFGE was performed according to our previously published protocol (8). Briefly, NotI-digested DNA was electrophoresed in a Bio-Rad pulsed-field apparatus (Hercules, CA) in 1.3% SeaKem HGT agarose at 14°C with pulse ramping from 10 to 22 s for 14 h and from 55 to 60 s for 8 h at a field strength of 6 V/cm. Gels were stained with Vistra green (Amersham Biosciences) and scanned by using a Storm phosphorimager. The data were analyzed by using commercially available software (BioNumerics). The sequenced E. coli strain CFT-073 was used as the internal control for creating a dendrogram based on PFGE types.
ERIC-PCR and automated ribotyping (AR). Ribotyping was performed by using the RiboPrinter microbial characterization system from Qualicon (Wilmington, DE). This automated typing system produces a RiboPrint pattern using an E. coli rRNA probe hybridized to restriction enzyme-digested chromosomal DNA. E. coli strains were digested using EcoRI enzyme based on the manufacturer's instructions. Ribotype groups were defined by the RiboPrinter system, which assigns ribogroups by comparing differences in band number, position, and signal intensity (19).
PCR amplifications of ERIC sequences were performed on E. coli strains using a modification of a protocol described previously (31). ERIC patterns were evaluated by using BioNumerics software from Applied Maths (Kortrijk, Belgium) (16, 31). Briefly, similarity matrices were constructed on the basis of Pearson correlation coefficient analysis of pairwise comparisons of ERIC patterns. We performed clustering analysis and constructed a dendrogram with the unweighted pair group method using arithmetic averages based on the similarity matrices. Strains with more than 90% similarity were placed in the same ERIC group.
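The clustering procedure described above (Pearson similarity, unweighted pair group method using arithmetic averages, and a 90% grouping threshold) can be approximated in plain Python. This is a sketch only: the actual analysis was done in BioNumerics, and representing each ERIC pattern as a list of band intensities is an assumption made here for illustration.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two intensity profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upgma_groups(profiles: Dict[str, Sequence[float]],
                 cutoff: float = 0.90) -> List[List[str]]:
    """Average-linkage (UPGMA) clustering, cut where similarity falls to the cutoff."""
    names = list(profiles)
    sim = {(a, b): pearson(profiles[a], profiles[b])
           for a in names for b in names if a != b}
    clusters: List[List[str]] = [[n] for n in names]

    def avg_sim(c1: List[str], c2: List[str]) -> float:
        # UPGMA similarity: mean over all between-cluster strain pairs.
        return sum(sim[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, i, j = max((avg_sim(clusters[i], clusters[j]), i, j)
                         for i in range(len(clusters))
                         for j in range(i + 1, len(clusters)))
        if best <= cutoff:  # only strains with more than 90% similarity are grouped
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


# Hypothetical intensity profiles, for illustration only.
example = {"s1": [1.0, 2.0, 3.0, 4.0],
           "s2": [1.1, 2.0, 3.2, 3.9],
           "s3": [4.0, 3.0, 2.0, 1.0]}
groups = sorted(upgma_groups(example))
```

Stopping the merges once the best average similarity drops to the cutoff is equivalent to cutting the finished dendrogram at that level, since average linkage never merges at a higher similarity than an earlier step.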
Microarray LOS arraying and hybridizations. Genomic DNA (target) was purified from bacterial strains by using a QIAGEN genomic DNA purification kit according to the manufacturer's recommendations, sonicated, and centrifuged, and supernatants were arrayed and hybridized according to previously published protocols (33). Cy3 and Cy5 fluorescence- and biotin-labeled probes were generated from SJX206 and the 16S rRNA housekeeping gene by using the BioPrime DNA labeling system (Invitrogen) and appropriate deoxynucleoside triphosphate mixtures. The probes were hybridized to glass slides that were previously arrayed with purified genomic DNA from 106 bacterial isolates in triplicate on Superamine glass slides (Telechem), and the hybridization signals were detected by using a Versarray Chipreader (Bio-Rad). The signal intensity of each spot was normalized to the signal intensity of the 16S rRNA probe (housekeeping gene) to account for differences in genomic DNA concentrations at different spots and compared to the intensity of the positive control (sequenced strain known to contain the gene probe) to determine the presence or absence of the sPCR fragment in different bacterial strains (see Fig. 6). Since LOS is a high-throughput microarray-based dot blot hybridization platform, we used the criteria established previously for probe-positive cutoffs in dot blot hybridization to set the positive cutoff points for LOS (32).
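The two-step normalization described above can be written as a ratio of ratios. The sketch below is illustrative only; summarizing the triplicate spots by their median is an assumption, since the text does not say how replicates were combined, and the function name is hypothetical.

```python
from statistics import median
from typing import Sequence


def los_relative_signal(probe_spots: Sequence[float],
                        rrna_spots: Sequence[float],
                        control_probe_spots: Sequence[float],
                        control_rrna_spots: Sequence[float]) -> float:
    """Probe signal relative to the positive control, after 16S rRNA normalization.

    Each argument holds replicate spot intensities (triplicates in the text).
    Dividing by the 16S rRNA signal corrects for the amount of genomic DNA
    deposited at each spot; dividing by the positive control's ratio puts
    every strain on the same scale.
    """
    strain_ratio = median(p / r for p, r in zip(probe_spots, rrna_spots))
    control_ratio = median(p / r for p, r in zip(control_probe_spots, control_rrna_spots))
    return strain_ratio / control_ratio


# Hypothetical intensities, for illustration only.
relative = los_relative_signal([100.0, 110.0, 90.0], [200.0, 220.0, 180.0],
                               [400.0, 400.0, 400.0], [400.0, 400.0, 400.0])
```

The resulting value can then be compared with whatever positive cutoff the dot blot protocol prescribes.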
FIG. 6. PHAT in an LOS microarray format.
Statistical entropy. To determine the optimal number of probes required for PHAT typing, we calculated the entropy for the first probe and then calculated the entropy iteratively as more probes were added to the PHAT probe set. Entropy (E) is calculated as follows: $E = -\sum_{i=1}^{k} p_i \log(p_i)$, where $p_i$ is the proportion of isolates carrying the ith unique PHAT signature and k is the number of unique signatures (22). A binary PHAT signature was generated by collating the presence or absence of different sPCR fragments (Table 1). The occurrence of each unique PHAT signature in the collection was determined as a fraction of the total frequency of all PHAT signatures. This established the contribution of each unique PHAT signature to the total entropy for a given probe set. The total entropy calculation was repeated iteratively as additional PHAT probes were added to maximize the discrimination with a minimal number of probes for isolates in this collection.
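The iterative entropy calculation can be sketched as follows. This is an illustration under stated assumptions: the entropy is written in the conventional Shannon form with a leading minus sign, the natural logarithm is assumed because the text does not give the base, and the probe names and calls are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def signature_entropy(signatures: List[str]) -> float:
    """Shannon entropy of the signature frequencies (natural log, assumed base)."""
    n = len(signatures)
    counts = Counter(signatures)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def entropy_by_probe_count(calls: Dict[str, List[int]], order: List[str]) -> List[float]:
    """Entropy after using the first 1, 2, ..., len(order) probes.

    calls[probe][s] is 1 if the probe is present in strain s, else 0.
    """
    n_strains = len(calls[order[0]])
    curve = []
    for k in range(1, len(order) + 1):
        sigs = ["".join(str(calls[p][s]) for p in order[:k]) for s in range(n_strains)]
        curve.append(signature_entropy(sigs))
    return curve


# Hypothetical calls for four strains and two probes.
toy_calls = {"probe_A": [0, 0, 1, 1], "probe_B": [0, 1, 0, 1]}
curve = entropy_by_probe_count(toy_calls, ["probe_A", "probe_B"])
```

Entropy reaches its maximum, log(N) for N strains, when every strain has its own signature, so a curve that flattens indicates that further probes add little discrimination in that collection.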
TABLE 1. PHAT probe candidates used for the calculation of Simpson's diversity index and entropy calculations
We did a pairwise comparison of the association between the prevalences of each probe in the rectal E. coli sample and all possible combinations of probes. If the association (as estimated by the odds ratio) between two probes exceeded 1.8 and was statistically significant by the chi-square test, the one with the higher prevalence was selected for inclusion in order to reduce redundancy among the probes selected for PHAT typing (data not shown). The final list of candidate probes is shown in Table 1.
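The redundancy filter described above can be sketched as a pairwise screen. The thresholds for the odds ratio (1.8) and the rule of keeping the more prevalent probe come from the text; everything else is an assumption for illustration: a Pearson chi-square test without continuity correction at a significance level of 0.05, and a 0.5 correction added to each cell so that the odds ratio stays defined when a cell is empty.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple


def two_by_two(x: List[int], y: List[int]) -> Tuple[int, int, int, int]:
    """Cross-tabulate two presence/absence vectors over the same strains."""
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if not i and j)
    d = sum(1 for i, j in zip(x, y) if not i and not j)
    return a, b, c, d


def is_redundant_pair(x: List[int], y: List[int],
                      or_cutoff: float = 1.8, alpha: float = 0.05) -> bool:
    a, b, c, d = two_by_two(x, y)
    # 0.5 added to each cell (assumed correction) to avoid division by zero.
    odds_ratio = ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return False  # a probe that never varies cannot be tested
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 degree of freedom
    return odds_ratio > or_cutoff and p_value < alpha


def prune_redundant(calls: Dict[str, List[int]]) -> Set[str]:
    """Drop the less prevalent probe of every strongly associated pair."""
    kept = set(calls)
    for p, q in combinations(sorted(calls), 2):
        if p in kept and q in kept and is_redundant_pair(calls[p], calls[q]):
            kept.discard(p if sum(calls[p]) < sum(calls[q]) else q)
    return kept


# Hypothetical calls for 20 strains; probe_B nearly duplicates probe_A.
toy = {"probe_A": [1] * 10 + [0] * 10,
       "probe_B": [1] * 9 + [0] * 11,
       "probe_C": [1, 0] * 10}
selected = prune_redundant(toy)
```

Two probes that almost always co-occur split the collection in the same way, so keeping both adds a column to the signature without adding discrimination.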
Comparison of phylogenetic groupings based on PHAT, PFGE typing, ERIC-PCR typing, and AR. Thirty-three rectal strains from otherwise healthy women with UTI were typed by using PFGE (Fig. 1). We identified 25 pulsotypes (groups by PFGE) using 85% similarity as the cut-point. Note that some strains that are >90% similar by PFGE (for example, 88F62 and 324F63, the third and fourth strains from the top) have a single probe difference in PHAT signature. In contrast, 6F62 (fifth from the bottom of the dendrogram) has a PHAT type identical to that of 88F62, although it is considered quite distant from 88F62 by PFGE. The 72 ECOR strains were also typed by AR and ERIC-PCR (Fig. 2 and 3) and clustered based on their AR and ERIC-PCR types, respectively. A number of strains that were grouped similarly by PHAT and AR had the least resolved PHAT signature (00000000000). ERIC typing gave similar results; for example, ECOR strains 30 and 5, which are only one probe different by PHAT (00100000000), were determined to be more distant by ERIC (<70% similarity), whereas ECOR strains 20 and 21 are only one probe different by PHAT (11000000000 and 10000000000) and >90% similar by ERIC.
FIG. 1. PFGE and PHAT analysis of 33 rectal E. coli strains. Clustering was constructed using PFGE data.
FIG. 2. AR analysis of 72 strains from the E. coli reference collection (ECOR). The clustering dendrogram was constructed using AR data. PHAT signatures are shown adjacent to the ECOR strain names and phylogenetic groups.
FIG. 3. ERIC-PCR analysis of 72 strains from the E. coli reference collection (ECOR). The clustering dendrogram was constructed using ERIC-PCR data. PHAT signatures are shown adjacent to the ECOR strain names and phylogenetic groups.
TABLE 2. Discriminatory power of PHAT, a binary typing method, compared to other genotyping techniques among the E. coli reference collection (ECOR) and a collection of human rectal isolates as determined by using Simpson's diversity index
FIG. 4. PHAT analysis of 26 strains from the E. coli reference collection (ECOR) belonging to B2/D phylogenetic groups. The clustering dendrogram was constructed using PHAT signatures. MLST types are shown adjacent to the PHAT signatures.
FIG. 5. Statistical entropy by number of probes used in PHAT in a collection of 106 rectal strains.
Sequence-based methods such as MLST use the variation within housekeeping loci to determine evolutionary relatedness within strains. Sequence variation in housekeeping genes is more likely to reflect phylogenetic descent than genes whose products are under selection. Thus, MLST is suitable for establishing evolutionary patterns in long-term global studies but less so for discriminating closely related strains (6) or strains involved in pathogenesis and antibiotic resistance. As for Streptococcus pneumoniae, invasive disease is rare for E. coli compared to the frequency of asymptomatic colonization, and MLST genotypes do not always correlate with virulence potential (5). Furthermore, even for MLST, the level of discrimination depends on the number of loci and the degree of allelic variation present in the population (6). For example, MLST lacks the discriminatory power required to distinguish between pathogenic strains of Listeria monocytogenes; in a recent study, more rapidly evolving virulence-associated genes were used to increase discriminatory power (34). Supplementing MLST by including sequence variation in multiple hypervariable loci also increases the discriminatory power of MLST (7, 20). In PHAT, many strains are screened for a few genes, and all strains are scored as 0/1 for each of the genes tested. By expanding the number of probes in the PHAT probe set, the discriminatory power of PHAT can be optimized to differentiate closely related strains.
PHAT resolution was at least as good as PFGE when we compared human rectal strains typed by both methods. However, the classifications of strains were different by the two systems. Strains that were determined to be similar by PFGE were not always classified in the same PHAT group and vice versa. Thus, the underlying genetic differences in the E. coli strains revealed by PHAT and PFGE are different. This is of critical importance in deciding which typing method to use. For example, integration of horizontally acquired genes will result in a change in the banding pattern obtained from PFGE but will be less likely to change the PHAT type, unless one or more of the newly integrated genes are included in the PHAT probe set. Analyzing the differences between closely related PHAT types provides more information about the genetic basis of differences between two strains than does PFGE; for example, we can determine whether strains are related by the loss or gain of a mobile genetic element, such as one conferring antimicrobial resistance.
A challenge of binary typing is determining the best candidate probe set to get maximum discriminatory power using the fewest probes. The minimum probe set is a function of the study population. For example, the PHAT probes in the present study were developed for human strains of E. coli. In that population discrimination was excellent (D = 94%); however, in the ECOR collection, which consists of E. coli strains from different organisms, serotypes, geographic regions, and phylogeny, discrimination was lower (D = 80%). Adding probes specific to the diverse species found in ECOR would undoubtedly increase the discriminatory power for PHAT in ECOR.
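The discrimination values (D) quoted above are Simpson's diversity index. A minimal sketch of the calculation is given below, assuming the finite-sample form commonly used for typing methods, D = 1 - sum(n_j(n_j - 1)) / (N(N - 1)), where n_j is the number of strains of the jth type and N is the total number of strains; the text does not state which form was used, and the example types are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def simpson_diversity(types: Sequence[Hashable]) -> float:
    """Probability that two strains drawn without replacement have different types."""
    n = len(types)
    if n < 2:
        raise ValueError("at least two strains are required")
    counts = Counter(types)
    return 1.0 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# Hypothetical PHAT signatures for four strains.
d_example = simpson_diversity(["1100", "1100", "1000", "0000"])
```

D equals 1 when every strain has a unique type and 0 when all strains share one type, so the same probe set can score differently in collections with different underlying diversity.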
The discriminatory power observed with PHAT is also influenced by the number of strains to be typed. In theory, an array consisting of n probes can result in 2^n signatures, but the number of strains and the nature of probes will dictate the actual number of observed signatures. As the number of strains increases, more "unique" PHAT signatures get populated, resulting in a bigger increase in discriminatory power. To maximize the discriminatory power attainable for a larger set of strains, additional probes may be added. The choice of probes is critical to increasing the discriminatory power of PHAT. Probes that appear frequently across strains in a small study and contribute minimally to the discriminatory power of PHAT may still prove to be useful in a more global epidemiologic setting. An analogy can be found in the coa and spa typing of methicillin-resistant S. aureus strains, where the less discriminating coa typing reveals the relatedness of clonal groups of methicillin-resistant S. aureus strains from temporally and geographically diverse locations (21).
Optimal PHAT probes provide unambiguous results (32). Some probes have a high degree of nonspecific binding and background signal, probably due to the degree of sequence homology with other ORFs. In such cases, probe-positive and probe-negative strains are hard to determine accurately; we excluded such probes from our PHAT set. One of the sPCR fragments initially included, sRB33, was later replaced due to high levels of cross-hybridization with other strains.
In conclusion, binary typing for bacterial strain classification, such as PHAT, provides a high-resolution, direct method that measures the presence or absence of genetic content, and the binary output can be easily formatted in large databases, allowing for data storage and portability. PHAT is a reproducible, cost-effective, and time-effective means for fine discrimination and for identifying short-term outbreaks and person-to-person transmission. Since PHAT relies on the presence or absence of genes determined by dot blot hybridization, it can be easily adapted to a high-throughput LOS microarray format wherein thousands of strains can be typed simultaneously (33). Implementing PHAT on a microarray increases the efficiency of the typing process, reducing the cost and time required to type large numbers of strains. When hypervariable loci are used as probes, PHAT complements the basic clonal assignments at a population level from MLST (1, 2). In the long term, PHAT in conjunction with MLST may lead to a more complete picture of strain variations within the context of a slowly evolving core genome.
This study was supported by an award from the National Institutes of Health (grant RO1 DK55496 to C.F.M.).
Published ahead of print on 1 November 2006.