Journal of Clinical Microbiology, September 2002, p. 3319-3325, Vol. 40, No. 9
0095-1137/02/$04.00+0 DOI: 10.1128/JCM.40.9.3319-3325.2002
Copyright © 2002, American Society for Microbiology. All Rights Reserved.
Arnaldo Yoshiteru Kuaye,1,
Theresa Gina Cargioli,1 Michael S. Chung,1 Rasmus Nielsen,2 and Martin Wiedmann1*
Department of Food Science,1 and Department of Biometry,2 Cornell University, Ithaca, New York
Received 23 April 2002/ Returned for modification 29 May 2002/ Accepted 1 July 2002
Commonly used phenotype-based subtyping methods for L. monocytogenes and other food-borne pathogens include serotyping, phage typing, and multilocus enzyme electrophoresis (MLEE) (14, 24). DNA-based subtyping methods include PCR-based approaches (e.g., random amplified polymorphic DNA and amplified fragment length polymorphism), ribotyping, and pulsed-field gel electrophoresis (2, 13, 16). These DNA-based methods define bacterial subtypes by using either PCR amplification or restriction digestion of bacterial DNA to generate DNA fragment banding patterns. While many of these methods have proven effective for differentiating L. monocytogenes subtypes, DNA fragment size-based subtyping methods have significant drawbacks. For example, despite the existence of software packages for data normalization and analyses (12), these subtyping methods are often difficult to standardize. As a consequence, the ease of exchanging and comparing subtype data among laboratories can be severely limited. While DNA fragment size-based subtyping methods have been used for cluster analyses, they generally do not provide information amenable to the inference of primary genetic characteristics (i.e., nucleotide sequences) for evolutionary analyses. As long-term studies on the epidemiology, ecology, and evolution of bacterial pathogens require subtyping data that can be used to infer and quantify the genetic relatedness of isolates, DNA fragment size-based subtyping methods have limited utility for these applications.
DNA sequencing-based methods are being developed and increasingly used for subtyping and characterizing bacterial isolates. In these methods, complete or partial nucleotide sequences are determined for one or more bacterial genes or chromosomal regions, thus providing unambiguous and discrete data. Sequencing can target a single gene (single locus approach) or multiple genes. The advantages of sequencing methods over DNA fragment size-based typing methods include their ability to generate unambiguous data that are portable through web-based databases and that can be used for phylogenetic analyses (3, 9). While a variety of DNA sequence-based subtyping strategies targeting virulence genes, housekeeping genes, or other chromosomal genes and regions are feasible, multilocus sequence typing (MLST), which is an extension of MLEE, represents a widely used strategy (26).
MLEE differentiates bacterial strains by detecting variations in the patterns of the electrophoretic mobilities of various constitutive enzymes. Cell extracts containing soluble enzymes are separated by size in nondenaturing starch gels, and enzyme activities are determined in the gels through application of color-generating substrates (14). While MLEE has been used to study the population genetics of many bacterial pathogens, including L. monocytogenes (23), this method is difficult to standardize among laboratories.
MLST directly determines the allelic variation of multiple housekeeping genes by using DNA sequencing instead of indirectly characterizing these alleles via measurement of the electrophoretic mobilities of the gene products through MLEE (7). MLST approaches have been developed for several organisms, including group A streptococci, Staphylococcus aureus, Neisseria meningitidis, and Campylobacter jejuni (4-6). MLST traditionally targets multiple loci that have slowly diversified from each other through accumulation of neutral or near neutral changes, thus providing reliable differentiation without the potentially confounding effects of positive selection that may particularly occur in certain categories of genes, such as bacterial virulence or surface genes (20, 26). On the other hand, direct sequencing of virulence genes (9) or intergenic regions may provide more-sensitive discrimination for cluster analyses and short-term epidemiological questions.
While some studies have explored the suitability of sequencing single genes to differentiate L. monocytogenes strains (10, 28), the discriminatory power of DNA sequencing strategies that target multiple distinct genes or regions has not yet been reported. Thus, we selected a well-characterized set of L. monocytogenes isolates to determine the suitability of DNA sequencing of housekeeping and stress response genes (sigB, prs, and recA), two virulence genes (actA and inlA), and two intergenic regions (plcA-hly and hly-mpl) to differentiate L. monocytogenes subtypes. The complete sequence information for these genes and regions was also used to define the most-discriminatory 600-bp fragments within these genes that could be used for rapid sequencing-based subtyping. This study also provides a general outline for a rational approach to the selection of target genes for DNA sequence-based subtyping of bacterial pathogens.
TABLE 1. L. monocytogenes isolates used in this study
TABLE 2. Relevant characteristics of target genes and regions used for DNA sequencing
TABLE 3. PCR primers
TABLE 4. PCR conditions
DNA sequencing and analyses. DNA sequencing was performed at Cornell University's Bioresource Center with an ABI 3700 DNA Sequencer. With the exception of actA and inlA, PCR primers were also used for DNA sequencing. Following initial sequencing with PCR primers, primer walking with primers designed to match strain-specific internal sequences was used to complete the sequencing of actA and inlA. Nucleic acid sequences were proofread and aligned with Seqman (DNAStar) and MegAlign (Lasergene). Cluster analyses were conducted using MEGA, version 2.1 (19), and the unweighted pair group method with arithmetic mean (UPGMA) (number of nucleotide differences) model. The resulting clustering data were assessed together with the alignments to assign allelic types for each region or gene. Two sequences were assigned different allele numbers if the sequences differed by at least 1 nucleotide (nt). Sliding window analyses to determine the number of DNA polymorphisms per 600-bp window for each gene were performed with ProSeq software (http://helios.bto.ed.ac.uk/evolgen/filatov/proseq.html).
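The allele assignment rule described above (two sequences receive different allele numbers if they differ by at least 1 nt) can be illustrated with a short Python sketch. This is not the procedure used in the study, which assessed the MEGA clustering output together with the alignments; the function name, isolate labels, and sequences below are hypothetical, and numbering alleles in order of first appearance is an assumed convention.

```python
from typing import Dict


def assign_allelic_types(aligned: Dict[str, str]) -> Dict[str, int]:
    """Give each isolate an allele number; identical aligned sequences share one.

    Assumption: numbers are handed out in order of first appearance.
    """
    allele_of_sequence: Dict[str, int] = {}
    result: Dict[str, int] = {}
    for isolate, sequence in aligned.items():
        if sequence not in allele_of_sequence:
            # Any difference of >= 1 nt yields a new sequence string,
            # and therefore a new allele number.
            allele_of_sequence[sequence] = len(allele_of_sequence) + 1
        result[isolate] = allele_of_sequence[sequence]
    return result


# Hypothetical isolates and sequences, for illustration only.
toy_alignment = {
    "isolate_A": "ATGGCTAAAG",
    "isolate_B": "ATGGCTAAAG",  # identical to isolate_A
    "isolate_C": "ATGGCTAATG",  # 1-nt difference from isolate_A
    "isolate_D": "ATGACTAAAG",  # 1-nt difference from isolate_A
}
allelic_types = assign_allelic_types(toy_alignment)
```

In this toy alignment, isolates A and B share one allelic type, while C and D each receive their own.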
A new algorithm (WINDOWMIN) was developed and implemented for identifying the most-discriminatory 600-bp region within each gene from a set of aligned sequences. In this algorithm, the most-discriminatory region for a sample of size n is defined as the region that maximizes the number of different DNA sequences in the sample. WINDOWMIN allows the user to define the criteria that will classify two DNA sequences as different. For example, if sequencing errors are common, two DNA sequences may be defined as different from each other if they differ at more than 2 nt positions. If sequencing errors can be ruled out, it may be more reasonable to define two DNA sequences as being different if they differ in at least 1 nt position. For this study, two DNA sequences were defined as different if they differed by at least 1 nt. In WINDOWMIN, the number of different DNA sequences distinguishable by at least k nucleotides in a window starting at position i of the sequences is defined as $n_{k,i}$. A window starting at position j in the sequence is most discriminatory if $n_{k,j} = \max_i(n_{k,i})$ for a particular value of k. If many possible windows fulfill this criterion, the procedure can be iterated to find the set of windows fulfilling $n_{k+1,j} = \max_i(n_{k+1,i})$ among the windows fulfilling $n_{k,j} = \max_i(n_{k,i})$. Window size must also be defined; for the analyses performed here, the window size was set to 600 bp. As a practical approach, we iterated the algorithm 10 times and then chose the window with the most segregating sites among the set of remaining windows. WINDOWMIN can be obtained by download at http://www.foodscience.cornell.edu/wiedmann/programs.html.
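The window search described above can be sketched in Python as follows. This is a minimal illustration written from the description in the text, not the WINDOWMIN source code. Two points are assumptions: for k greater than 1 the text does not specify how sequences are grouped, so the sketch uses a greedy count of representatives that are at least k nt apart, and gap characters are treated like any other character. All function names and the toy alignment are hypothetical.

```python
from typing import List, Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_distinct(fragments: Sequence[str], k: int) -> int:
    """Greedy count of fragments that differ by >= k nt from every
    previously accepted representative (assumed grouping rule for n_{k,i})."""
    representatives: List[str] = []
    for fragment in fragments:
        if all(hamming(fragment, rep) >= k for rep in representatives):
            representatives.append(fragment)
    return len(representatives)


def segregating_sites(fragments: Sequence[str]) -> int:
    """Number of alignment columns containing more than one character."""
    return sum(len(set(column)) > 1 for column in zip(*fragments))


def most_discriminatory_window(
    alignment: Sequence[str], window: int, max_k: int = 10
) -> int:
    """Return the 0-based start of the most-discriminatory window."""
    def cut(start: int) -> List[str]:
        return [seq[start:start + window] for seq in alignment]

    candidates = list(range(len(alignment[0]) - window + 1))
    for k in range(1, max_k + 1):
        # Keep only the windows that maximize the count for this k.
        scores = {start: count_distinct(cut(start), k) for start in candidates}
        best = max(scores.values())
        candidates = [start for start in candidates if scores[start] == best]
        if len(candidates) == 1:
            break
    # Tie-break among remaining windows by number of segregating sites.
    return max(candidates, key=lambda start: segregating_sites(cut(start)))


# Hypothetical 10-nt alignment with a 4-nt window, for illustration only.
toy_alignment = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAACAA", "GGAAAAAAAA"]
best_start = most_discriminatory_window(toy_alignment, window=4)
```

In the toy alignment, only the window starting at position 6 separates three different sequences, so it is returned after the first iteration.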
Nucleotide sequence accession number. The DNA sequences determined in this study have been deposited in the GenBank database and given accession numbers AF497139 through AF497243.
Analyses of DNA polymorphisms and allelic types. DNA sequences for each of the seven selected genes and intergenic sequences were aligned and evaluated. The size of these seven target sequences varied from 242 bp (plcA-hly intergenic sequence) to 2,235 bp (inlA) (Table 5). Based on the total number of polymorphic sites in each sequence (Table 5), actA displayed the highest sequence variability (14.3% of the nucleotides were polymorphic) while prs displayed the lowest level of overall sequence variability (4.9%). Only one target sequence (actA) allowed discrimination of all 15 strains characterized. Two strains (FSL J1-022 and FSL J1-047) differed by only a single nucleotide in actA. The other target genes and regions differentiated between 8 and 14 allelic types for each gene (Table 5). Two sequences were assigned different allelic types (e.g., 1 to 15 for actA) if the sequences differed by at least 1 nt; the allelic types for all strains are summarized in Table 6. Table 5 also shows how many allelic subtypes were defined based on at least a single nucleotide polymorphism and how many subtypes could be defined based on a cutoff of at least a 2-nt difference. For example, for the full-length prs sequence 10 allelic subtypes were defined based on at least a 1-nt difference and 7 allelic subtypes were defined based on at least a 2-nt difference. Thus, three of the prs allelic types were defined based on only a 1-nt difference. In addition, allelic information for the housekeeping genes prs and recA and the stress response gene sigB was used to assign MLST types (12 MLST types, A through L) (Table 6) based on the allelic types for all three genes.
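The assignment of MLST types from the allelic types of several genes, as described above for prs, recA, and sigB, can be sketched as follows. The allelic profiles shown are hypothetical and do not reproduce Table 6; the function name and the lettering in order of first appearance are assumptions made for illustration.

```python
from string import ascii_uppercase
from typing import Dict, Tuple

# An allelic profile: (prs allele, recA allele, sigB allele).
Profile = Tuple[int, int, int]


def assign_mlst_types(profiles: Dict[str, Profile]) -> Dict[str, str]:
    """Give isolates with identical allelic profiles the same MLST letter.

    Assumption: letters are assigned in order of first appearance, so this
    sketch supports at most 26 distinct profiles.
    """
    letter_of_profile: Dict[Profile, str] = {}
    result: Dict[str, str] = {}
    for isolate, profile in profiles.items():
        if profile not in letter_of_profile:
            letter_of_profile[profile] = ascii_uppercase[len(letter_of_profile)]
        result[isolate] = letter_of_profile[profile]
    return result


# Hypothetical allelic profiles, for illustration only.
toy_profiles = {
    "isolate_A": (1, 1, 1),
    "isolate_B": (1, 1, 1),  # same profile as isolate_A
    "isolate_C": (1, 2, 1),  # differs at one locus
    "isolate_D": (2, 2, 3),
}
mlst_types = assign_mlst_types(toy_profiles)
```

A difference at any single locus is enough to place two isolates in different MLST types, which is why combining three loci can resolve more types than any one of them alone.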
TABLE 5. Summary of allelic subtypes and polymorphisms in actA, inlA, prs, sigB, recA, hly-mpl, and plcA-hly
TABLE 6. Allelic profiles of the virulence genes (actA and inlA), intergenic regions (hly-mpl and plcA-hly), and housekeeping genes (sigB, prs, and recA)
Sequence analyses of the hly-mpl intergenic region also revealed lineage-specific insertions or deletions. The five lineage I strains show two additional nucleotides at position 49. All lineage II strains have an additional adenine at nt 52. Insertion or deletion polymorphisms in the nt 121 to 129 region also show distinct patterns that differentiate strains in lineages I, II, and III; lineage II isolates have a 1-bp deletion compared to lineage I isolates, whereas lineage III isolates show 9- and 8-bp deletions compared to the lineage I and II isolates, respectively.
Definition of most-discriminatory gene fragments. Since complete sequencing of large DNA fragments (e.g., the 1,929-bp actA ORF) is not practical, in terms of cost and time, for large-scale subtyping, we utilized two different approaches to define smaller (600 bp) gene fragments that will provide optimal strain differentiation, ideally with the same discriminatory power achieved by analysis of the complete ORF sequences. The software ProSeq (version 2.8) was used initially in a sliding window analysis to define the 600-bp region within each gene (actA, inlA, recA, sigB, and prs) that showed the highest number of nucleotide polymorphisms. ProSeq calculated the value π, which is the average number of nucleotide differences per site between two sequences (also termed nucleotide diversity) (22), for each gene from the nucleotide alignment of the 15 strains. The region with the maximum nucleotide diversity is shown as the highest peak on a plot of π determined at sliding windows of 600 bp (Fig. 1). The locations of the 600-bp regions with the maximum nucleotide diversities for each gene are shown in Table 5. For actA, inlA, and sigB, the 600-bp window with the maximum nucleotide diversity did not allow the same level of allelic subtype differentiation as was achieved with the full ORF sequences (Table 5).
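The sliding window calculation of nucleotide diversity described above can be sketched in Python. This is an illustration of the definition given in the text (average number of nucleotide differences per site between two sequences), not a reimplementation of ProSeq; in particular, the sketch assumes an alignment without gaps, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def nucleotide_diversity(fragments: Sequence[str]) -> float:
    """Average number of differences per site over all sequence pairs."""
    pairs = list(combinations(fragments, 2))
    site_count = len(fragments[0])
    differences = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in pairs
    )
    return differences / (len(pairs) * site_count)


def sliding_window_pi(
    alignment: Sequence[str], window: int
) -> List[Tuple[int, float]]:
    """Return (0-based start, pi) for every window position."""
    last_start = len(alignment[0]) - window
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in alignment]))
        for start in range(last_start + 1)
    ]


# Hypothetical 8-nt alignment with a 4-nt window, for illustration only.
toy_alignment = ["AAAAAAAA", "AAAAAATT", "AAAAAATA"]
profile = sliding_window_pi(toy_alignment, window=4)
best_start, best_pi = max(profile, key=lambda item: item[1])
```

Plotting the second element of each tuple in `profile` against the first would give a curve of the kind described for Fig. 1, with the peak marking the most polymorphic window.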
FIG. 1. Graphical representation of polymorphisms within 600-bp sliding windows in the genes recA (a) and actA (b). Pi denotes the average number of nucleotide differences per site between two sequences, or nucleotide diversity (22). Pi was calculated with the computer program ProSeq (see Materials and Methods).
Target genes for DNA sequence-based subtyping of L. monocytogenes. Housekeeping genes are commonly used for conventional MLST as these genes are thought to diversify by neutral or near neutral nucleotide changes due to the vital roles of these gene products in contributing to an organism's survival (7, 9). While sequence analysis of housekeeping genes has been a valuable tool for studying the population genetics of bacterial pathogens, DNA sequencing of more-rapidly evolving genes may allow more-sensitive subtype discrimination. We thus selected the housekeeping genes recA and prs, the stress response gene sigB, the virulence genes actA and inlA, and the intergenic regions hly-mpl and plcA-hly to determine the relative abilities of these target genes to differentiate closely related L. monocytogenes strains. While prs is located close to actA on the L. monocytogenes chromosome, sigB, recA, and the virulence genes were chosen to represent distinct chromosomal locations. prs was chosen as a target gene that would allow comparison between the discriminatory power of virulence genes and housekeeping genes located in close proximity. actA and inlA were specifically selected as target virulence genes since they are located in two different L. monocytogenes virulence gene islands (18). The complete gene sequences for actA and inlA provided the highest discrimination among the 15 isolates (15 and 14 allelic types, respectively). In addition, preliminary analyses of our data showed that the actA virulence gene indeed may be under positive selection (dN/dS > 1.0; dN, rate of nonsynonymous substitutions; dS, rate of synonymous substitutions) (unpublished data). The complete gene sequences for the housekeeping and stress response genes provided discrimination into 8 to 10 subtypes, and the sequences for the intergenic regions provided 8 and 12 allelic types based on 335- and 242-bp sequences for hly-mpl and plcA-hly, respectively. 
When sequence information for the two housekeeping genes and the stress response gene was used to define MLST types, a total of 12 types (A to L) (Table 6) could be differentiated. These results indicate that sequencing of the virulence genes actA and inlA provides the most-discriminatory DNA sequence-based subtyping for L. monocytogenes. While limited sequencing and PCR-restriction fragment length polymorphism analysis of other L. monocytogenes virulence genes has been performed by various groups (10, 27, 28, 30), the results reported here provide the first comparative evaluation of different target genes for subtyping L. monocytogenes.
Our results support previous observations that DNA sequencing of virulence or surface protein-encoding genes that may have been subjected to positive selection pressures can allow more-sensitive strain discrimination than sequencing of housekeeping genes can. To illustrate, Enright et al. (9) showed that emm sequences differentiated more subtypes than single housekeeping gene sequences in Streptococcus pyogenes. However, interpretation of subtyping results based on highly variable sequences of virulence genes (such as actA) or surface proteins may be misleading, particularly if the data are used to probe the population genetics or long-term phylogenetic patterns of bacterial pathogens such as L. monocytogenes. Specifically, because of high rates of evolution and recombination, relationships inferred from these genes may not reflect the true phylogenetic relationships among isolates (26). To overcome this obstacle, a subtyping scheme that includes sequencing of selected virulence genes in combination with sequencing of housekeeping and/or stress response genes and of regions with little or no selective pressure (such as intergenic regions) may provide the most appropriate approach for subtyping L. monocytogenes and other bacterial pathogens. Initial analyses for recombination within the two housekeeping genes and one stress response gene sequenced here indeed showed no (sigB and recA) or weak (prs) indication of recombination in these genes (unpublished data). The sequencing of additional housekeeping genes as previously described for MLST approaches (4-6) for other bacteria may further improve the ability to study the phylogeny of L. monocytogenes. However, the inclusion of both positively selected genes, such as actA, and neutrally evolving intergenic regions (such as the more-discriminatory plcA-hly region) maximizes the discriminatory power of a typing scheme. Maximum discrimination is particularly important for bacterial pathogens such as L. monocytogenes, for which rapid and standardized cluster detection through molecular subtyping represents a critical public health need (29). The use of virulence gene targets in DNA sequence-based subtyping strategies also creates the opportunity to use pathogen-specific PCR primers to develop integrated PCR-based detection and subtyping strategies that do not require a culturing step (5).
Rational design of high-throughput DNA sequence-based subtyping schemes. Sequencing of complete virulence, housekeeping, and stress response genes (with ORF lengths between 780 and 2,235 bp) (Table 5) does not provide a suitable approach for high-throughput subtyping. High-throughput sequencing for subtyping purposes generally targets gene fragments between 450 and 600 bp in length, since these fragment sizes can easily be amplified and sequenced with a single set of primers or one set of sequencing primers nested inside the PCR primers (7, 20). In the past, target regions have been selected without prior identification of specific desirable characteristics, such as an optimum number of polymorphic and discriminatory nucleotide sites. To identify target regions through a rational strategy, the complete ORFs for the five genes sequenced in this study were aligned to define the 600-bp section(s) for each gene that was (i) most polymorphic (with ProSeq) or (ii) most discriminatory (with WINDOWMIN). Not surprisingly, most genes, and particularly the larger virulence genes actA and inlA, displayed considerable differences in the numbers of polymorphic residues found in different regions. Thus, the allelic discrimination achieved with different 600-bp regions within a given gene also differed considerably (Table 5). Interestingly, the 600-bp regions with the highest proportion of polymorphic residues did not necessarily provide the highest level of allelic discrimination. For three of the five genes sequenced (actA, inlA, and sigB), WINDOWMIN was able to define more discriminatory 600-bp regions than a program (ProSeq) that only determined the most-polymorphic 600-bp region within a gene. We conclude that our newly developed algorithm provides an improved rational approach for the selection of target regions for DNA sequence-based subtyping of bacterial pathogens and other microorganisms.
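The observation that the most-polymorphic window is not necessarily the most-discriminatory one can be reproduced with a small constructed example. The sketch below is an illustration only: it uses the count of polymorphic sites as a simple stand-in for the ProSeq criterion and the count of distinct sequences (at least a 1-nt difference) as the discrimination criterion. The function name and the toy alignment are hypothetical.

```python
from typing import List, Sequence, Tuple


def window_stats(
    alignment: Sequence[str], window: int
) -> List[Tuple[int, int, int]]:
    """Return (start, polymorphic sites, distinct sequences) per window."""
    stats = []
    for start in range(len(alignment[0]) - window + 1):
        fragments = [seq[start:start + window] for seq in alignment]
        polymorphic = sum(len(set(column)) > 1 for column in zip(*fragments))
        alleles = len(set(fragments))
        stats.append((start, polymorphic, alleles))
    return stats


# Hypothetical alignment: three changes clustered in one sequence at the
# 5' end, and single changes in two different sequences at the 3' end.
toy_alignment = ["AAAAAAAA", "TTTAAAAA", "AAAAAACA", "AAAAAAAG"]
stats = window_stats(toy_alignment, window=4)

most_polymorphic = max(stats, key=lambda item: item[1])[0]
most_discriminatory = max(stats, key=lambda item: item[2])[0]
```

Here the window starting at position 0 has the most polymorphic sites (three) but separates only two sequences, whereas the window starting at position 4 has two polymorphic sites and separates three sequences, because its polymorphisms fall in different isolates.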
DNA sequence-based subtyping in L. monocytogenes. A variety of molecular subtyping approaches have been applied to L. monocytogenes, and the application of these techniques has allowed a better understanding of the biology, ecology, and epidemiology of L. monocytogenes and other bacterial pathogens (29). Studies using subtyping approaches also have suggested that L. monocytogenes subtypes may display heterogeneity in their potentials to cause disease in humans and animals (15, 16, 30). The data reported here provide the framework for the development and implementation of DNA sequence-based subtyping methods for L. monocytogenes. We have identified specific 600-bp regions that provide the most discriminatory targets for subtyping within different L. monocytogenes genes. The presence of insertion-deletions in some of these regions (e.g., at nt 852 to 1019 in actA) may complicate the interpretation of subtyping results for some isolates and may hamper design of PCR primers that allow reliable amplification of all L. monocytogenes subtypes. Thus, in addition to identifying the most discriminatory gene regions, the presence of insertion-deletions must also be carefully considered when selecting target regions for DNA sequence-based subtyping methods. Interestingly, our results also indicate that targeting specific insertion-deletions (e.g., in the hly-mpl intergenic region) by appropriate PCR assays may allow for sensitive differentiation of the three previously described L. monocytogenes lineages (17, 30). Phylogenetic analyses of housekeeping, stress response, and virulence gene sequences also confirmed that the 15 L. monocytogenes isolates tested fall into the previously determined three distinct lineages (unpublished data) (30). The existence of these lineages has previously been confirmed by a variety of subtyping methods and thus appears to be evolutionarily relevant (29).
While current DNA fragment size-based subtyping methods (such as pulsed-field gel electrophoresis and ribotyping) may provide good subtype differentiation, data obtained by these methods typically cannot be used to determine the evolutionary relatedness of isolates. The implementation of DNA sequence-based subtyping approaches for routine characterization of human, animal, and food L. monocytogenes isolates will not only allow for sensitive and standardized subtyping for outbreak detection, but will also provide an opportunity for using subtyping data to probe the evolution of this food-borne pathogen and to track the spread of clonal groups (25, 29). DNA sequence-based subtyping methods will also provide standardized data that can easily be shared electronically and through the World Wide Web (26), concomitantly providing public health professionals and laboratories around the world with direct access to the information needed to identify and monitor emerging pathogenic bacteria. The resolution power of DNA sequence-based subtyping methods is unmatched by any other subtyping method. For example, while it is estimated that MLEE requires approximately 26 nt changes in order to determine a new electrophoretic type (1), one single nucleotide change at a targeted locus will result in a new subtype classification for DNA sequence-based subtyping (8). The continued development of new technologies for automated high-throughput sequencing and the availability of these technologies at a reasonably low cost will further facilitate widespread implementation of DNA sequence-based subtyping methods. Further application of the DNA sequencing-based subtyping approaches described here on large sets of epidemiologically well-defined isolates will also provide critical validation of the subtyping scheme proposed here.
We thank Celine Nadon and Kathryn Boor for help with this project and for critical review of the manuscript.
Present address: Faculdade de Engenharia de Alimentos, Universidade Estadual de Campinas, 13083-970 Campinas, SP, Brazil.