Journal of Clinical Microbiology, September 2004, p. 4275-4283, Vol. 42, No. 9
0095-1137/04/$08.00+0 DOI: 10.1128/JCM.42.9.4275-4283.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
W. Mounts,2 F. McAleese,1 F. Immermann,3 D. Macapagal,1 E. Marsilio,4 L. McDougal,4 F. C. Tenover,4 P. A. Bradford,1 P. J. Petersen,1 S. J. Projan,5 and E. Murphy1*
Wyeth Research,1 Biometrics Research,4 Pearl River, New York,3 Wyeth Genomics,2 Protein Technologies, Cambridge, Massachusetts,6 Division of Healthcare Quality Promotion, Centers for Disease Control and Prevention, Atlanta, Georgia5
Received 28 January 2004/ Returned for modification 26 March 2004/ Accepted 13 May 2004
The techniques currently used for strain surveillance include spa typing, ribotyping, pulsed-field gel electrophoresis (PFGE), and multilocus sequence typing (MLST) (2, 3, 8, 20). Although these methods have proven valuable for monitoring strain relatedness, none of them comprehensively defines the genes that constitute the organism(s) under investigation. Genes of interest within an individual isolate can be evaluated further by PCR amplification of the relevant loci followed by restriction enzyme analysis or sequencing. Vandenesch and colleagues recently combined PFGE, MLST, and PCR for 24 individual virulence genes in a study designed to examine the relatedness and genetic composition of 117 community-associated oxacillin-resistant S. aureus (CO-ORSA) isolates from several countries (22). The study demonstrated that, regardless of the geographic origin or relatedness of the strains, all CO-ORSA isolates contained the Panton-Valentine leukocidin virulence factor locus. None of the other virulence factors analyzed was detected in every strain. It is likely that additional genes, which were not examined by PCR in that study, are also conserved across all CO-ORSA isolates and may play a direct role in the prevalence of these strains within the community.
Recently, Musser and colleagues used a more comprehensive approach to evaluate the relatedness and genomic composition of 36 S. aureus clinical isolates (9). Using a DNA microarray constructed from the genomic sequence of S. aureus strain COL, they compared the hybridization patterns of genomic DNA isolated from each isolate. The study identified genomic elements that were present in COL but absent from the strains of interest. Inferences could also be made about the relatedness of the strains analyzed; although not extensively evaluated, these inferences generally correlated well with more established methods of determining genetic relationships, such as multilocus enzyme electrophoresis. However, the study could compare strains of interest only with respect to the COL open reading frames (ORFs) represented on the microarrays used. As a result, genes that were present in clinical isolates but absent from the COL DNA sequence could not be identified, despite their potential importance in pathogenesis. In addition, many well-studied genes are not present in COL and could not be analyzed, including collagen adhesin (cna), epidermal cell differentiation inhibitor (edin), exfoliative toxins A and B (eta and etb), staphylokinase (sak), Panton-Valentine leukotoxin (luk-PV), and toxic shock syndrome toxin (tst). Similarly, the microarray could not interrogate any antimicrobial resistance determinant other than tetracycline resistance, loci for heavy metal resistance, non-type 5 capsule biosynthesis genes, 27 enterotoxin and exotoxin genes, or 315 hypothetical genes from the genomes of S. aureus strains N315 and Mu50 or from individual GenBank records.
In the present study, we used an Affymetrix GeneChip representing predicted ORFs from six genetically divergent S. aureus strains and novel GenBank entries to analyze the relatedness of 21 ORSA isolates and of each strain for which genomic sequence information is currently available. The 21 isolates represent strains from eight U.S. oxacillin- and methicillin-resistant lineages (15). We compared these results with ribotyping and PFGE results and found that GeneChip typing was more discriminative than these conventional approaches. In addition, we established parameters that predict whether individual genes are present within a given strain. These parameters were validated by PCR analysis, growth on selective media, and DNA sequence analysis. Collectively, these results demonstrate that GeneChips provide a highly discriminative genotyping procedure while simultaneously identifying specific loci that are present in or absent from an S. aureus strain.
Several well-conserved genes were divided into three segments so that 5'/3' signal ratios could be tracked (SA0014 [rplI], SA0352 [rpsF], SA0459 [rplY], SA0506 [tuf], SA0578 [conserved hypothetical], SA0600 [conserved hypothetical], SA1129 [conserved hypothetical], SA1334 [proC], SA1456 [aspS], and SA1682 [conserved hypothetical]). Several very large genes encoding surface proteins were divided into 5,000-bp segments (SA0173 and homologs in other strains [gramicidin S synthetase], SA1267 and homologs [ebhA], SA1964 and homologs [fmtB], COL-SA0379 [bacteriophage L54a peptidase], COL-SA1472 and homologs [putative pathogenicity protein], COL-SA2676 [LPXTG surface protein], and biofilm-associated protein [bap]). Genes known to contain constant and variable domains, such as agrB and agrC, were divided into their individual domains.
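The segmentation described above can be sketched as follows. This is a minimal illustration that assumes equal-length thirds and consecutive, non-overlapping 5,000-bp windows; the exact boundary placement used for Saur2a is not stated, and the function names are hypothetical.

```python
from typing import List


def split_into_thirds(seq: str) -> List[str]:
    """Split a gene into 5', middle, and 3' segments of near-equal length.

    Any remainder from the integer division is absorbed by the 3' segment.
    """
    third = len(seq) // 3
    return [seq[:third], seq[third:2 * third], seq[2 * third:]]


def split_into_windows(seq: str, size: int = 5000) -> List[str]:
    """Split a very large gene into consecutive, non-overlapping segments.

    Every segment is `size` bp except the last, which holds the remainder.
    """
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def five_to_three_ratio(signal_5p: float, signal_3p: float) -> float:
    """Ratio of the 5'-segment signal to the 3'-segment signal."""
    return signal_5p / signal_3p
```

For example, `split_into_thirds("AAACCCGGGT")` returns `["AAA", "CCC", "GGGT"]`, and a 12,001-bp gene yields windows of 5,000, 5,000, and 2,001 bp.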
Clustering. The ORFs were clustered to identify homologs among the individual strains by using the CAT 4.5 (clustering and alignment tool) software from DoubleTwist (4). To maximize both the discriminatory power of the array and the number of unique probe sets representing each qualifier, the sequence identity threshold for inclusion in an alignment was set at 97%. Each resulting cluster contained a set of highly similar ORFs that were aligned to derive a consensus sequence. The clusters were manually curated to correct obvious errors, such as cases in which two adjacent genes had merged into a single cluster because of a read-through error in a sequence record.
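CAT 4.5 is a commercial tool whose algorithm is not described here, so the following is only a stand-in: a greedy clustering sketch in which each ORF joins the first cluster whose founding member it matches at ≥97% identity. Identity is approximated with Python's `difflib.SequenceMatcher` ratio (an assumption; it is not an alignment-based percent identity), and `autojunk=False` is set because the default heuristic would treat every nucleotide in sequences of 200 bases or more as junk. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Approximate identity in [0, 1] as 2 * matched bases / total length.

    Stand-in only: this is difflib's matching-block ratio, not the
    alignment-based identity computed by CAT.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.97) -> List[List[str]]:
    """Each ORF joins the first cluster whose founding member it matches at
    >= threshold identity; otherwise it founds a new cluster."""
    clusters: List[List[str]] = []
    for name, seq in seqs.items():
        for cluster in clusters:
            if identity(seq, seqs[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:  # no existing cluster was close enough
            clusters.append([name])
    return clusters
```

Greedy, founder-based clustering is order dependent; it is shown only to make the role of the 97% threshold concrete.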
Intergenic regions and probe selection. Intergenic sequences >50 bases in length were extracted from the N315 genome by using the published ORF coordinates. Any portion of an intergenic region with >90% identity to an ORF in another genome or in the N315 Glimmer ORF set was excised. Intergenic sequences (both strands) were clustered separately from ORFs. The ORF and intergenic sequences were submitted to Affymetrix (Santa Clara, Calif.) for probe selection. Thirty-four probe pairs were requested for each ORF, and 15 were requested for each intergenic region. The final array contained 7,792 qualifiers: 7,723 S. aureus qualifiers recognizing 4,380 ORFs and 3,343 intergenic regions, plus 69 exogenous control probe sets.
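Extracting intergenic regions from ORF coordinates amounts to finding the gaps between (possibly overlapping) ORFs. A minimal sketch, assuming 1-based inclusive coordinates with start ≤ end on both strands and ignoring the wrap-around gap of the circular chromosome; the >90%-identity excision step is not modeled, and the function name is hypothetical.

```python
from typing import List, Tuple


def intergenic_regions(
    orfs: List[Tuple[int, int]], genome_length: int, min_gap: int = 50
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) gaps between ORFs longer than min_gap bases."""
    regions: List[Tuple[int, int]] = []
    last_end = 0  # rightmost base covered by any ORF seen so far
    for start, end in sorted(orfs):
        gap_start, gap_end = last_end + 1, start - 1
        # Overlapping or abutting ORFs give a zero or negative gap and are skipped.
        if gap_end - gap_start + 1 > min_gap:
            regions.append((gap_start, gap_end))
        last_end = max(last_end, end)
    # Region after the last ORF; wrap-around to the chromosome start is ignored.
    if genome_length - last_end > min_gap:
        regions.append((last_end + 1, genome_length))
    return regions
```

With ORFs at (101, 400), (431, 900), and (1001, 1500) on a 1,600-base sequence, the 30-base gap between the first two ORFs is dropped and the regions (1, 100), (901, 1000), and (1501, 1600) are kept.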
DNA isolation and labeling. S. aureus strains were grown overnight in brain heart infusion medium in ambient air at 37°C with vigorous aeration. For chromosomal DNA isolation, 1.5 ml of an overnight culture was transferred to a 1.5-ml Eppendorf tube and centrifuged for 5 min at 4°C at high speed in a table-top centrifuge. Supernatants were discarded, and cell pellets were resuspended in an equal volume of ice-cold TE buffer (10 mM Tris, 1 mM EDTA; pH 8.0). Suspensions were then transferred to 2-ml lysing matrix tubes (Bio 101, Vista, Calif.). Cells were lysed by shaking twice in an FP120 reciprocating shaker (Bio 101) at 6,000 rpm for 20 s, and cell debris was pelleted by centrifugation at high speed in a table-top centrifuge for 10 min. Chromosomal and plasmid DNA was then purified from the supernatant on a Qiagen DNeasy Tissue column (Valencia, Calif.) according to the manufacturer's recommendations for bacterial DNA purification. Two micrograms of purified DNA was subjected to electrophoresis on a 0.8% native agarose gel to assess DNA integrity. For DNA labeling, 5 µg of purified DNA was incubated at 90°C for 3 min and then plunged into an ice bath, followed by standard DNA fragmentation and labeling according to the manufacturer's (Affymetrix Inc.) instructions for labeling mRNA for antisense prokaryotic arrays. A 1.5-µg aliquot of labeled DNA was hybridized to a GeneChip and processed according to the manufacturer's protocol for GeneChip hybridization and washing. GeneChips were scanned as previously described (7). To account for loading errors and differences in labeling efficiency, the signal intensity of each element tiled onto a GeneChip was normalized by dividing it by the mean signal intensity for that GeneChip. Results were analyzed using GeneSpring version 6.1 (Silicon Genetics) and Spotfire version 7.0.
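The per-chip normalization is a single division. A minimal sketch, assuming raw signals are held in a dictionary keyed by qualifier name (a hypothetical layout) and that the mean is taken over every qualifier on the chip; whether control probe sets were excluded from the mean is not stated.

```python
from statistics import mean
from typing import Dict


def normalize_chip(signals: Dict[str, float]) -> Dict[str, float]:
    """Scale one GeneChip so that its mean signal becomes 1.0.

    Dividing by the chip-wide mean compensates for differences in loading
    and labeling efficiency between chips.
    """
    chip_mean = mean(signals.values())
    return {qualifier: value / chip_mean for qualifier, value in signals.items()}
```

For example, `normalize_chip({"a": 100.0, "b": 300.0})` returns `{"a": 0.5, "b": 1.5}`.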
PCRs. All PCR assays were performed using Platinum PCR SuperMix kits (Invitrogen, Carlsbad, Calif.) according to the manufacturer's recommendations. The cna gene was amplified using the primers 5'-ACTGGACACATACGTGGACAGGATT and 5'-TTTTCCTGTTGCTTTTCCATCTTGA. Primers for PCR amplification of the srtA gene were 5'-AGCAGCAAGCTAAACCTCAAATTCC and 5'-AAGATTTTACGTTTTTCCCAAACGC. PCR amplification and typing of the agr locus were performed as described in reference 14.
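The PCR assays above were performed at the bench. As a rough in-silico counterpart, the sketch below predicts the product size when both primers occur as exact matches on opposite strands of a template sequence. It assumes exact matching only (no mismatches, degenerate bases, or melting-temperature checks), and the function names are hypothetical.

```python
from typing import Optional

# Only unambiguous A/C/G/T bases are handled; other characters raise KeyError.
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def predicted_amplicon_length(template: str, fwd: str, rev: str) -> Optional[int]:
    """Predicted product length (bp) if both primers match exactly, else None.

    The forward primer is searched on the given strand; the reverse primer is
    searched as its reverse complement downstream of the forward primer.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    rev_site = template.find(reverse_complement(rev), start + len(fwd))
    if rev_site == -1:
        return None
    return rev_site + len(rev) - start
```

With the cna primers, a synthetic template made of the forward primer, 50 spacer bases, and the reverse complement of the reverse primer gives a predicted 100-bp product; dropping the reverse site gives `None`.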
Ribotyping and PFGE. Strains were subjected to PFGE as previously described (15). Ribotyping was performed using the RiboPrinter system (Qualicon, Wilmington, Del.) according to the manufacturer's instructions. Each strain was analyzed with two restriction enzymes, EcoRI and PvuII. The software assigned each strain's computer-generated riboprints to an EcoRI and a PvuII ribogroup, and these assignments were then verified by visual inspection. Each strain was assigned a ribotype based on its ribogroups for both restriction enzymes; strains shared a ribotype only when both ribogroups were identical.
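The ribotype rule, identical ribogroups for both enzymes, can be expressed by keying each strain on its (EcoRI, PvuII) ribogroup pair. A minimal sketch with hypothetical ribogroup labels and function name; the RiboPrinter software's own grouping step is not modeled.

```python
from typing import Dict, Tuple


def assign_ribotypes(ribogroups: Dict[str, Tuple[str, str]]) -> Dict[str, int]:
    """Number ribotypes so that two strains share a ribotype only when both
    their EcoRI and PvuII ribogroups are identical."""
    type_ids: Dict[Tuple[str, str], int] = {}
    ribotypes: Dict[str, int] = {}
    for strain, pair in ribogroups.items():
        if pair not in type_ids:
            type_ids[pair] = len(type_ids) + 1  # first time this combination is seen
        ribotypes[strain] = type_ids[pair]
    return ribotypes
```

Two strains that agree for EcoRI but differ for PvuII therefore receive different ribotypes.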
Analysis of sequenced strains. Before using the Saur2a array to evaluate the genetic composition and relatedness of clinical isolates, we determined how accurately the array monitored the genes that constitute each of the six strains used in its development. We also analyzed S. aureus strain MW2, whose sequence was published after Saur2a was developed (1). An initial analysis was performed using Affymetrix Microarray Suite 5.0 algorithms, which provide detection calls (present, absent, or marginal) for RNA transcripts. Although these algorithms were optimized for RNA analysis, we initially used them to analyze DNA purified from each of the sequenced strains. DNA from each strain was labeled and hybridized to a Saur2a array, the signal intensity of each qualifier (predicted ORF or intergenic region) was measured, and the Affymetrix algorithms assigned a detection call to each qualifier. Results are shown in Table 1.
TABLE 1. Analysis of sequenced S. aureus strains
The COL strain used was obtained from a publicly available strain repository. On further investigation, the strain produced two colony morphologies after extended incubation on nutrient-rich agar plates. Each colony type was purified and subjected to microarray analysis (Table 1). One colony type (COL type 1) showed an error rate of 0.1%, was resistant to tetracycline, and matched the profiles of COL strains obtained from three independent laboratories. The second colony type (COL type 2) showed an error rate of 4.5% and was tetracycline susceptible. These results suggested that the initial strain analyzed was contaminated with at least one additional S. aureus strain, which likely caused the major discrepancies between the expected and observed results. Scientists at the strain stock center confirmed that the repository stock was contaminated with another S. aureus strain. (The strain stock center subsequently notified investigators to whom contaminated stocks had been shipped.)
Although the Affymetrix algorithms accurately detected genes known to constitute each of the strains under investigation, between 424 and 1,838 other genes were erroneously identified as present in each strain (Table 1). Sequencing data showed that these genes were absent from the genomic sequence of the given strain, yet they were called present by Affymetrix criteria. This finding indicated that the Affymetrix algorithms tended to overestimate the number of genes present in a DNA sample. We therefore set out to redefine parameters that would allow more accurate calls to be made for all genes of an organism under investigation.
Calculating present and absent call determination values. The goal was to evaluate each sequenced strain independently and empirically identify signal intensity cutoff values that define whether a gene is present in or absent from a given strain. A present call cutoff was set so that 90% of the qualifiers known to be present in a given strain had signal intensities above it. Similarly, an absent call cutoff was defined so that 90% of the qualifiers expected to be absent had signal intensities below it.
The first step in defining the cutoff values was identifying all genes expected to be present in a given sequenced strain by using the 70% matching requirement described above. Next, GeneChip data for that strain were normalized to account for differences in labeling efficiency by dividing the log-transformed raw signal intensity of each qualifier by the mean signal for the entire chip. For each strain, the distribution of normalized values for the qualifiers expected to be present was examined, and the cutoff value above which 90% of these signals fell was determined; this value was termed the adjusted present call determination value. Similarly, an adjusted absent call determination value was calculated from the distribution of normalized signal values for the qualifiers expected to be absent. Qualifiers with intermediate signal intensities (between the absent and present call determination values) were considered undeterminable. The distributions of expected present and absent qualifiers for strain NCTC 8325 are shown in Fig. 1. All other sequenced strains showed similar distributions (results not shown).
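The cutoff rule described above can be sketched with order statistics. This is an illustration, not the authors' code: it assumes the inputs are already-normalized signals for qualifiers known from sequence to be present or absent, and it picks cutoffs from the observed values so that at least 90% of known-present qualifiers lie at or above the present cutoff and at least 90% of known-absent qualifiers lie at or below the absent cutoff. Tie handling, the choice of order statistic, and the function names are assumptions.

```python
import math
from typing import Sequence


def present_cutoff(present_values: Sequence[float], percent: int = 90) -> float:
    """Cutoff with at least `percent`% of known-present signals at or above it."""
    vals = sorted(present_values)
    # Skip the lowest (100 - percent)% of values (integer arithmetic avoids float drift).
    return vals[len(vals) * (100 - percent) // 100]


def absent_cutoff(absent_values: Sequence[float], percent: int = 90) -> float:
    """Cutoff with at least `percent`% of known-absent signals at or below it."""
    vals = sorted(absent_values)
    # Index of the ceil(n * percent / 100)-th smallest value.
    return vals[math.ceil(len(vals) * percent / 100) - 1]


def adjusted_call(value: float, absent_cut: float, present_cut: float) -> str:
    """Three-way call: present, absent, or undeterminable (in between)."""
    if value >= present_cut:
        return "present"
    if value <= absent_cut:
        return "absent"
    return "undeterminable"
```

With known-present signals 11 to 20 and known-absent signals 1 to 10, the sketch yields a present cutoff of 12 and an absent cutoff of 9, so a qualifier at 10.5 is undeterminable.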
FIG. 1. Distribution of strain NCTC 8325 normalized signal intensity values for all qualifiers represented on Saur2a. Light grey indicates qualifiers with <70% perfect-match probe matches to strain NCTC 8325 chromosomal DNA. Dark grey indicates qualifiers with ≥70% perfect-match probe matches to the strain NCTC 8325 genome.
TABLE 2. Adjusted call determinations for sequenced S. aureus strains
6,000 qualifiers for each strain. To test the reproducibility of the adjusted call determination method, DNA isolation, GeneChip hybridization, and analysis of the adjusted detection calls were repeated for strains COL (two times) and Mu50 (one time). As expected, an average of 6,410 qualifiers (83.2%) were accurately determined, and 736 qualifiers (9.5%) were incorrectly identified as present or absent (data not shown). These results suggested that the adjusted determination calls give reproducible results across a diverse strain set. The accuracy of the adjusted call method was assessed in several ways. First, Affymetrix and adjusted call determinations were compared for 315 genes that are known to be missing from COL but are represented on the Saur2a array (data not shown), using COL strains from three independent laboratories. Affymetrix software erroneously identified between 28 and 46 genes as present in the three strains (8.8 to 14.6% false-positive rate), whereas 10 to 15 genes (3.0 to 4.7%) were undeterminable. In contrast, adjusted present call determinations incorrectly identified one gene as present in each sample (0.3% false-positive rate), and either none or one gene (<0.3%) was considered undetermined for each replicate. For genes known to be present in COL and represented on the array, the Affymetrix algorithms showed a slight advantage (0.1% false-negative rate) over the adjusted call values (1.2% false-negative rate). To further evaluate the two methods, GeneChip analysis was performed on 21 clinical isolates obtained from the Centers for Disease Control and Prevention (CDC), using both the Affymetrix algorithms and adjusted present and absent call determination values of 0.894 and 0.981, respectively.
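The error rates quoted above are simple proportions over genes of known status; for example, 46 of the 315 genes absent from COL being called present gives 46/315 ≈ 14.6%. A minimal sketch, assuming calls are stored as "present"/"absent"/"undeterminable" strings keyed by gene (a hypothetical layout) and that only an "absent" call on a known-present gene counts as a false negative.

```python
from typing import Dict, Set


def false_positive_rate(calls: Dict[str, str], known_absent: Set[str]) -> float:
    """Fraction of genes known to be absent that were nevertheless called present."""
    wrong = sum(1 for gene in known_absent if calls.get(gene) == "present")
    return wrong / len(known_absent)


def false_negative_rate(calls: Dict[str, str], known_present: Set[str]) -> float:
    """Fraction of genes known to be present that were called absent
    (undeterminable calls are not counted as false negatives here)."""
    wrong = sum(1 for gene in known_present if calls.get(gene) == "absent")
    return wrong / len(known_present)
```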
Table 3 compares Affymetrix and adjusted call determinations for the genes encoding the collagen adhesin (cna) and sortase (srtA) virulence factors. Both methods indicated that the gene encoding sortase (srtA) was present in every strain analyzed, and these results were validated by PCR. However, the two methods disagreed for the gene encoding collagen adhesin. Affymetrix algorithms determined that cna was present in both isolate 20 and the sequenced strain Mu50. cna was also called present in two of the three COL samples tested and was undeterminable in the third sample (laboratory 3) and in strain N315. In contrast, the adjusted call parameters indicated that cna was absent from each of these strains (isolate 20, Mu50, N315, and each COL sample). In each case, PCR analysis, together with sequence analysis of Mu50, N315, and COL, showed that the adjusted determinations were correct and that cna is not present in these strains, further validating the accuracy of the methodology (Table 3; Fig. 2A).
TABLE 3. Comparison of Affymetrix, adjusted present calls, and PCR detection for sequenced and clinical isolates
FIG. 2. PCR detection of cna, srtA, and agr type. PCR analysis was performed for cna and srtA and to distinguish agr alleles for all strains in this study. In each case, PCR results validated adjusted call predictions. (A) PCR detection of srtA (S) and cna (C) for several strains in which there were discrepancies between Affymetrix and adjusted call determinations. Positive (MRSA252) and negative (COL, laboratory 1) control strains are also shown. (B) Multiplex PCR detection of S. aureus agr alleles, as described in reference 14. CDC strains tested are indicated across the top. M, molecular marker; C1 to C4, controls for agr types 1 to 4, respectively.
Use of GeneChips to monitor strain relatedness. In addition to providing gene-by-gene information for a strain under investigation, Saur2a could also be used to determine the relatedness of the strains analyzed. This was accomplished by hierarchical clustering, which generated a dendrogram comparing the normalized signal intensity of each qualifier in a given strain with the signal intensity of the same qualifier across all strains analyzed (Fig. 3A). With this approach, strains with similar signal intensities across all qualifiers are positioned closer together on the dendrogram than strains with divergent genomic compositions (differing signal intensities for the same qualifiers).
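The clustering step was run in GeneSpring and Spotfire, and the distance metric and linkage method are not stated here. The sketch below assumes 1 minus the Pearson correlation of normalized signal profiles as the distance and average linkage as the merge rule, written in plain Python for self-containment; the function names are hypothetical. For real data, `scipy.cluster.hierarchy.linkage` with `method="average"` and `metric="correlation"` performs the same computation far more efficiently.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple, Union

# A dendrogram is either a strain name (leaf) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def pearson_distance(x: List[float], y: List[float]) -> float:
    """1 - Pearson correlation of two signal profiles (undefined for constant profiles)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return 1.0 - cov / (var_x * var_y) ** 0.5


def average_linkage(profiles: Dict[str, List[float]]) -> Tree:
    """Agglomerative clustering with average linkage: repeatedly merge the two
    clusters whose members have the smallest mean pairwise distance."""
    clusters: Dict[Tree, List[str]] = {name: [name] for name in profiles}
    cache: Dict[Tuple[str, str], float] = {}

    def leaf_distance(a: str, b: str) -> float:
        key = (a, b) if a < b else (b, a)
        if key not in cache:
            cache[key] = pearson_distance(profiles[a], profiles[b])
        return cache[key]

    def cluster_distance(pair: Tuple[Tree, Tree]) -> float:
        left, right = pair
        return mean(leaf_distance(a, b) for a in clusters[left] for b in clusters[right])

    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2), key=cluster_distance)
        members = clusters.pop(left) + clusters.pop(right)
        clusters[(left, right)] = members
    return next(iter(clusters))
```

Strains whose qualifier profiles rise and fall together end up in the same subtree, mirroring the grouping shown in the dendrogram.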
FIG. 3. GeneChip-based genotyping. (A) Saur2a-derived dendrogram (top) with heat map (beneath) for all qualifiers that were analyzed in each strain. The dendrogram illustrates the relatedness of each strain based on the signal intensity of each qualifier across all strains. Within the heat map, each qualifier (total, 7,723) is shown vertically for each strain. Red indicates a high signal intensity; green indicates a low signal intensity. The order of qualifiers is identical for all strains. Scanning horizontally identifies qualifiers that have high signal intensity (red) in some strains but low intensities (green) in others. (B) Dendrogram of CDC strains 10, 13, 12, 9, and 8, which were all considered to be identical strains by both ribotyping and PFGE. The heat map illustrates 36 qualifiers (horizontally) that are considered present in strains 10 and 13 but absent in other strains, based on adjusted call determinations. (C) Growth characteristics of CDC strains 10, 13, 12, 9, and 8 on kanamycin-containing agar plates.
TABLE 4. Ribotyping, GeneChip, and PFGE genotyping results
This technology is expected to provide novel information about S. aureus pathogenesis, antimicrobial resistance, and vaccine tolerance. For example, studies such as that of Vandenesch and colleagues, which showed that the Panton-Valentine leukocidin virulence factor is present in every CO-ORSA strain tested, can now be extended to determine whether these genes are also present in health care institution-associated strains. Such a study is likely to help define whether a subset of genes can distinguish community-associated from nosocomial ORSA strains. Defining the entire repertoire of genes conserved across diverse CO-ORSA strains may also clarify how the proteins they encode influence the prevalence of ORSA within the community.
Several genes have previously been linked to particular types of S. aureus infection, such as tst with toxic shock syndrome and the exfoliative toxins with scalded-skin syndrome. We anticipate that this technology will also make it possible to associate subsets of S. aureus genes with particular types of infections. Moreover, because the GeneChip used contains alleles of many genes, it may be possible to associate a particular phenotype with a particular allele. Studies of agr types have previously shown that allelic type influences pathogenesis, so identifying it is important for epidemiological studies. Most clinical isolates are agr group 1. agr group 3 has been associated with community-associated methicillin-resistant S. aureus, group 2 has been linked to intermediate glycopeptide resistance, and group 4 has been associated with exfoliative toxin-producing strains (6, 11, 16, 18, 19, 23). Because adjusted call predictions accurately determined the agr type of each strain analyzed in this study, this technology could likely be used to analyze the association of specific agr types, and of other genes or alleles, with disease-causing strains.
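Inferring the agr group from adjusted calls can be sketched as a consistency check over allele-specific qualifiers. The qualifier names, the function name, and the rule that exactly one group must be fully present while no other group has any present call are assumptions for illustration, not the procedure used in the study.

```python
from typing import Dict, List, Optional


def agr_type(calls: Dict[str, str], allele_qualifiers: Dict[int, List[str]]) -> Optional[int]:
    """Return the agr group whose allele-specific qualifiers are all called present,
    or None when no group qualifies or present calls are mixed across groups."""
    complete = [g for g, quals in allele_qualifiers.items()
                if all(calls.get(q) == "present" for q in quals)]
    partial = [g for g, quals in allele_qualifiers.items()
               if any(calls.get(q) == "present" for q in quals)]
    if len(complete) == 1 and partial == complete:
        return complete[0]
    return None
```

A mixed pattern, for example group 1 fully present plus one group 2 qualifier present, returns `None` rather than forcing an assignment.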
Although the technology described here globally evaluates the genetic composition of an organism, the GeneChip approach currently has limitations. Each qualifier contains an average of 34 probes, which collectively monitor its presence or absence. Because of the stringency of the microarray design, probes are in some instances not distributed evenly across a gene. In addition, gaps exist between probes for most genes represented on the GeneChip, so not every nucleotide of a coding sequence is interrogated. Results should therefore be considered an estimate of whether a particular gene is present or absent within a sample. The same limitation applies to standard PCR-based microarrays, which are less stringent and would erroneously consider a gene to be present if a fragment of the gene or a homologous sequence hybridized to a qualifier. However, the stringency of the GeneChip probe design provides an additional layer of information that allows genes and promoter regions to be monitored for point mutations and deletions (unpublished data). Small deletions or point mutations within a gene are unlikely to be readily detected with PCR-based microarrays.
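How much of a gene its probes actually interrogate can be estimated from probe positions. A minimal sketch, assuming 25-base probes (the standard Affymetrix GeneChip probe length) and hypothetical 0-based probe start coordinates; the function name is also hypothetical. It reports the covered fraction and the longest uncovered stretch.

```python
from typing import List, Tuple


def probe_coverage(gene_length: int, probe_starts: List[int], probe_len: int = 25) -> Tuple[float, int]:
    """Return (fraction of bases covered by at least one probe, longest uncovered run)."""
    covered = [False] * gene_length
    for start in probe_starts:
        for i in range(start, min(start + probe_len, gene_length)):
            covered[i] = True
    longest_gap = run = 0
    for is_covered in covered:
        run = 0 if is_covered else run + 1  # length of the current uncovered run
        longest_gap = max(longest_gap, run)
    return sum(covered) / gene_length, longest_gap
```

For a 100-base gene with probes starting at positions 0 and 50, half the bases are interrogated and the longest unprobed stretch is 25 bases, which illustrates why a small deletion could fall between probes.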
Because Saur2a arrays measure over 7,700 qualifiers, it is not surprising that the array has considerable potential for assessing strain relatedness. Importantly, the approach was robust enough to identify replicates of four very divergent strains. This indicates that the technology can determine whether a group of similar strains under investigation is clonal or slightly divergent in genetic composition, a distinction that is critical for monitoring outbreaks.
This technology is also likely to be powerful for analyzing the acquisition of antimicrobial resistance determinants and may provide a means to evaluate whether other genetic determinants predispose strains to, or contribute to, the development of resistance. McDougal and colleagues recently showed that oxacillin-resistant strains from the United States belong to eight major lineages (15). The present study used strains from each of these lineages to develop a novel GeneChip-based method of interrogating S. aureus strains. We are currently using this methodology to further characterize each lineage.
In most cases, MLST, ribotyping, and PFGE provide the level of discrimination needed to monitor strains circulating in community and health care environments. These techniques are more rapid, require less extensive analysis, and cost a fraction of what microarrays cost. However, none of these methods can simultaneously define, on a genome scale, the genes that constitute the organism(s) under investigation. In addition to the uses described above, we envision the approach developed here being helpful for characterizing isolates within the same ribotype, MLST type, or PFGE group, or in other studies where further characterization is needed.
Present address: University of Nebraska Medical Center, Omaha, NE 68198.