Journal of Clinical Microbiology, May 2004, p. 1923-1932, Vol. 42, No. 5
0095-1137/04/$08.00+0 DOI: 10.1128/JCM.42.5.1923-1932.2004
Foodborne and Diarrheal Diseases Branch, Division of Bacterial and Mycotic Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia 30333
Received 1 August 2003/ Returned for modification 16 December 2003/ Accepted 16 February 2004
cluster. The fourth cluster contained a single antigen type, H:z29. The amino acid sequences of the conserved regions within each cluster have greater than 95% amino acid identity, whereas the conserved regions differ substantially between clusters (75 to 85% identity). Substantial sequence heterogeneity existed between alleles encoding different flagellar antigens while alleles encoding the same flagellar antigen were homologous, suggesting that flagellin genes may be useful targets for the molecular determination of flagellar antigen type.
Serotyping consists of the immunologic classification of two surface structures, O-polysaccharide (O antigen) and flagellin protein (H antigen). Salmonella is unique among the Enterobacteriaceae in that it commonly has two distinct H antigens, the phase 1 and phase 2 flagellar antigens, that are coordinately regulated such that only one flagellar antigen is expressed at a time in a single cell (23). Rarely, Salmonella isolates express additional flagellar antigens. Some of these additional antigens have been unstable while others behave as typical flagellar antigens or are thought to be variants of common flagellar antigen types (10, 12, 19). They have been termed phase 3 and R phases; for simplicity, we will refer to all of these additional flagellar antigens as phase 3/R antigens.
The Kauffmann-White serotyping scheme for designation of Salmonella serotypes, which is used by most laboratories for the characterization of Salmonella isolates, recognizes 46 O serogroups and 114 H antigens that, in various combinations, make up the 2,523 characterized serotypes (19, 20). Some H antigens are composed of multiple antigens, termed factors; for example, H:e,n,x is the designation for a flagellar antigen that consists of three separate factors, e, n, and x, that occur together in one flagellum. The 114 H antigens are composed of combinations of 99 distinct antigenic factors. Flagellar antigens that are immunologically related are known as complexes. For example, the G complex includes all flagellar antigen types that contain antigenic factor g (e.g., g,m; f,g; g,z51), plus flagellar antigen m,t. Flagellar antigen types that include antigen H:z4 are considered the Z4 complex.
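The factor and complex conventions described above can be sketched in a few lines of Python. This is only an illustration of the serological definitions given in this paragraph (an antigen belongs to the G complex if it contains factor g or is m,t, and to the Z4 complex if it contains factor z4); the function names `parse_factors` and `complexes` are hypothetical, and other complexes are not modeled.

```python
def parse_factors(h_antigen):
    """Split an H antigen designation such as 'H:e,n,x' into its factors."""
    text = h_antigen.strip()
    if text.startswith("H:"):
        text = text[2:]  # drop the optional 'H:' prefix
    return [f.strip() for f in text.split(",") if f.strip()]


def complexes(h_antigen):
    """Return the complexes (as defined in the text) that an antigen belongs to."""
    factors = parse_factors(h_antigen)
    found = []
    # G complex: any antigen containing factor g, plus the antigen m,t.
    if "g" in factors or factors == ["m", "t"]:
        found.append("G")
    # Z4 complex: any antigen that includes factor z4.
    if "z4" in factors:
        found.append("Z4")
    return found


if __name__ == "__main__":
    for antigen in ["H:e,n,x", "g,m", "f,g", "g,z51", "m,t", "z4,z23"]:
        print(antigen, parse_factors(antigen), complexes(antigen))
```

Splitting on commas treats each factor as an independent token, which matches the way designations such as H:e,n,x are described here.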
The Kauffmann-White scheme for serotype designation includes subspecies identification, which is typically determined by biochemical characterization. Phenotypic and genetic characterization of Salmonella identified two species, Salmonella enterica and Salmonella bongori (3). S. enterica was further subdivided into six subspecies (I, II, IIIa, IIIb, IV, and VI) (5, 8, 9). A seventh subspecies has also been described based on multilocus enzyme electrophoresis (2), but it is not used for the purpose of serotype determination. Subspecies IIIa and IIIb comprise what was formerly referred to as the genus Arizona (7). S. bongori was initially described as subspecies V, but subsequent studies showed that it was sufficiently divergent to be considered a separate species (22). However, for simplicity, S. bongori is still commonly referred to as subspecies V for the purpose of serotype designation. The name S. enterica does not have taxonomic standing with the Judicial Commission of the International Committee of Systematic Bacteriology; the nomenclature used here is based on the recommendations of the World Health Organization Collaborating Center for Reference and Research on Salmonella and is used by most laboratories worldwide (3).
Serotyping by traditional methods has several drawbacks. The complexity of the serotyping scheme makes it difficult to maintain. It requires more than 250 different typing sera as well as 350 different antigens for preparation and quality control of the antisera. Commercial antisera often are unavailable for less common antigens or, if available, are of variable quality. According to the current serotyping method, a minimum of 3 days is required to determine the serotype of an isolate; depending on the complexity of the serotype, it can take much longer. To circumvent the problems associated with traditional serotyping, we have begun the development of a system for the molecular identification of serotype based on the genes responsible for expression of serotype antigens. There are many advantages to this approach. DNA probes can be chemically rather than biologically synthesized, making them easier to reproduce and quality control than antisera. The technology for DNA-based assays is fairly universal, making it available to more laboratories than traditional serotyping. Also, DNA-based methods have the potential to be faster and able to be automated, and generally, they are more precise than traditional serological typing.
In most isolates of Salmonella, two genes encode flagellar antigens. fliC encodes the phase 1 antigens, and fljB encodes the phase 2 antigens (30). These genes are coordinately expressed by a phase-variation mechanism (23). fliC is located in one of the flagellar biosynthesis operons, is present in all Salmonellae, and has a homologue in Escherichia coli (16). fljB is located in a region of the genome that is unique to Salmonella and is present in four of the six subspecies. Isolates of S. bongori have been reported to have a gene homologous to fljB, although this species is typically monophasic (1). A triphasic isolate that was genetically described possessed the third flagellin gene, flpA, on a plasmid (24).
Genes that encode bacterial flagellin are typically highly conserved at their 5' and 3' ends while the middle region is generally quite variable. The conserved regions encode the flagellar filament backbone and are critical for the assembly of the filament. The central region, corresponding approximately to amino acids 181 to 390, encodes the surface-exposed and antigenically variable portion of the filament (13-15, 29). Several studies have reported DNA sequences for Salmonella flagellin genes (6, 11, 13, 15, 17, 24, 25, 27, 29). As of June 2003, 74 complete or partial Salmonella fliC alleles and 25 complete or partial Salmonella fljB allele sequences had been reported in GenBank release no. 132, excluding complete genome sequences.
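As a rough illustration of the domain layout described above, the following Python sketch splits a flagellin protein sequence into its N-terminal conserved, central variable, and C-terminal conserved parts. The default boundaries (amino acids 181 to 390, 1-based and inclusive) come from the text and are only approximate; the function name `split_flagellin` and the toy sequence are assumptions made for the example.

```python
def split_flagellin(protein, var_start=181, var_end=390):
    """Return (N-terminal conserved, central variable, C-terminal conserved).

    var_start and var_end are 1-based, inclusive amino acid positions.
    """
    if not 1 <= var_start <= var_end <= len(protein):
        raise ValueError("variable-region bounds fall outside the sequence")
    n_term = protein[:var_start - 1]
    variable = protein[var_start - 1:var_end]
    c_term = protein[var_end:]
    return n_term, variable, c_term


if __name__ == "__main__":
    # Toy sequence only: 180 'N', 210 'V', 110 'C' residues as placeholders.
    toy = "N" * 180 + "V" * 210 + "C" * 110
    n_term, variable, c_term = split_flagellin(toy)
    print(len(n_term), len(variable), len(c_term))
```

Fixed coordinates are a simplification; for alleles of unusual length the boundaries would need to be taken from an alignment instead.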
Here we report an analysis of 280 flagellin alleles from Salmonella. We sequenced complete fliC, fljB, and flpA alleles that represented 67 of the 114 known flagellar antigenic types. The 67 flagellar antigen genes that were characterized include all flagellar antigens found in the 100 most common serotypes in the United States; the 100 most common serotypes are responsible for 98% of culture-confirmed human infections (4). We characterized common and unique features of the flagellin alleles as a group, and we performed comparative DNA sequence analysis to determine the amount of genetic diversity within flagellin alleles and to determine whether or not flagellin sequences might serve as a useful target for molecular determination of flagellar antigen type.
Genomic DNA extraction. Genomic DNA for PCRs was prepared by using the QIAamp DNeasy kit (Qiagen) and the procedure for bacterial DNA isolation supplied by the manufacturer. Approximately 40 µg of genomic DNA was isolated from a 10-µl loop of bacteria.
Primers. Primers for amplification of the fliC, fljB, and flpA genes and for DNA sequencing are listed in Table 1. Because of their different genomic locations, the sequences that flank fliC, fljB, and flpA are distinct. Primers for the amplification of each gene correspond to the 5' and 3' noncoding sequences. Two nested, external primers were used to sequence each amplicon; an additional six internal primers were used to sequence alleles from the alpha cluster. To sequence alleles from the G and Z4 complexes, six and four additional internal primers were used for each complex, respectively. Additional primers were needed to complete the sequences of the more divergent alleles (b; d; j; k; l,v; z29; z36; z38; and z81). These primer sequences are available upon request.
TABLE 1. PCR and sequencing primers used in this study
DNA sequencing. All sequencing was performed on an ABI 377 by using the Big Dye sequencing kit (Applied Biosystems). Sequencing reactions were performed according to the protocol supplied with the kit, except that 3 µl of Big Dye mix was used with 3.3 pmol of sequencing primer and 11 µl of template in a 15-µl reaction. Dye terminators were removed with Centri-sep spin columns (Princeton Separations). Sequences were determined for both strands, resulting in twofold redundancy. Unique variable bases were confirmed by sequencing a second sequencing template.
DNA sequence analysis. DNA sequences were assembled and analyzed by using Lasergene 5.0 software (DNAstar) and the Wisconsin Package, version 10.1 (Genetics Computer Group, Madison, Wis.).
Nucleotide sequence accession numbers. All sequences from this study are available from GenBank under accession numbers AY353258 to AY353269, AY353271 to AY353287, AY353289 to AY353296, AY353298 to AY353303, AY353305 to AY353309, AY353311 to AY353389, AY353391 to AY353434, and AY353436 to AY353549.
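For readers who want to retrieve these records, the accession ranges above can be expanded into individual accession numbers. The sketch below assumes that each range is written as "PREFIXnnnnnn to PREFIXnnnnnn" with a shared letter prefix; the function name `expand_accessions` is hypothetical and no download step is included.

```python
import re

RANGES = (
    "AY353258 to AY353269, AY353271 to AY353287, AY353289 to AY353296, "
    "AY353298 to AY353303, AY353305 to AY353309, AY353311 to AY353389, "
    "AY353391 to AY353434, and AY353436 to AY353549"
)


def expand_accessions(text):
    """Expand 'AY000001 to AY000003' style ranges into a flat list."""
    accessions = []
    for first, last in re.findall(r"([A-Z]+\d+) to ([A-Z]+\d+)", text):
        prefix = re.match(r"[A-Z]+", first).group()
        width = len(first) - len(prefix)  # keep zero padding if present
        start = int(first[len(prefix):])
        end = int(last[len(prefix):])
        for number in range(start, end + 1):
            accessions.append(f"{prefix}{number:0{width}d}")
    return accessions


if __name__ == "__main__":
    accessions = expand_accessions(RANGES)
    print(accessions[0], accessions[-1], len(accessions))
```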
TABLE 2. fliC alleles sequenced
TABLE 3. fljB and other flagellin alleles sequenced
FIG. 1. Ninety representative Salmonella fliC, fljB, and flpA alleles. Amino acid sequences of representative alleles encoding the 67 flagellar antigens were aligned with the program Clustal V (DNAstar). The figure is a cluster analysis and does not imply any phylogenetic relationships between the sequences. The tree was generated in DNAstar. E. coli sequences are from Wang and colleagues (28); GenBank accession numbers for these sequences are shown on the figure.
cluster; the fourth group contained a single antigenic type, z29. The G complex contained all alleles encoding flagellar antigen g plus alleles encoding the immunologically related antigen m,t. The Z4 complex contained all alleles encoding flagellar antigen z4 in addition to antigens z36; z38; and z36,z38. Flagellar antigens z36; z38; and z36,z38 had not previously been noted to be related to the Z4 complex. The alpha cluster contained the remainder of the flagellar antigen types, with the exception of H:z29, which separated into a group by itself.
Comparison of the conserved region. Overall, the 5' and 3' ends of the genes between the fliC, fljB, and flpA alleles were highly conserved, particularly at the extreme ends, where they approached 100% identity. However, with a few exceptions, there were 6 and 8 synonymous nucleotide substitutions within the first 37 and last 30 nucleotides, respectively, when comparing fliC alleles to fljB alleles (Fig. 2). The flpA alleles were closely related to fljB alleles in the conserved regions, as noted by Smith (24), although they encoded flagellar antigen d, which is typically encoded by fliC. We designated these unique sequences at the extreme 5' and 3' ends of the gene as the fliC and fljB signature sequences. There were 20 exceptions to the signature sequences among the 280 alleles sequenced (Fig. 2).
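The comparison of the extreme ends described above can be illustrated with a short Python sketch that counts nucleotide differences in the first 37 and last 30 positions of two coding sequences. The window sizes come from the text; the function name `end_differences` and the toy sequences are assumptions, and the actual signature sequences (Fig. 2) are not reproduced here.

```python
def end_differences(seq_a, seq_b, n_start=37, n_end=30):
    """Count mismatches in the first n_start and last n_end nucleotides.

    The two sequences may differ in length, because each window is
    anchored to its own end of the gene.
    """
    if min(len(seq_a), len(seq_b)) < max(n_start, n_end):
        raise ValueError("sequences are shorter than the comparison window")

    def mismatches(a, b):
        return sum(1 for x, y in zip(a, b) if x != y)

    at_start = mismatches(seq_a[:n_start], seq_b[:n_start])
    at_end = mismatches(seq_a[-n_end:], seq_b[-n_end:])
    return at_start, at_end


if __name__ == "__main__":
    # Toy sequences only; they are not real fliC or fljB alleles.
    toy_a = "A" * 100
    toy_b = list(toy_a)
    toy_b[3], toy_b[10], toy_b[95] = "G", "C", "T"
    print(end_differences(toy_a, "".join(toy_b)))
```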
FIG. 2. Diagram of 5' and 3' ends of fliC, fljB, and flpA. Sequences on the left are nucleic acid sequences; the corresponding amino acid sequences are on the right. Start and stop (Term.) codons are indicated. fliC is the reference sequence; bases or amino acids that differ from this consensus are indicated by substitutions. Numbers in parentheses are numbers of alleles with the sequence.
cluster had >89% amino acid sequence identity, but this number rose to greater than 94% amino acid sequence identity when the most divergent antigens (flagellar antigens b and d) were removed from the analysis. Li and colleagues (15) noted that flagellar antigen g,z51 clustered separately from the rest of the G complex; when g,z51 alleles were removed from the analysis, the G complex alleles had greater than 98% amino acid identity in the conserved region. In contrast, there was 73 to 87% amino acid identity between the groups, which is on the order of the sequence identity with representative E. coli fliC sequences from GenBank release 132.
Comparison of the variable region. Alleles in the four clusters exhibited different levels of diversity within the variable region depending on the cluster. H:z29 alleles from multiple subspecies had greater than 98% amino acid identity within the variable region. With the exception of the most divergent antigens, H:m,t and H:g,z51, alleles in the G complex had at least 90% amino acid identity within the variable region; some antigens, such as H:g,p and H:g,p,s, differed by a single amino acid. Alleles in the Z4 complex that encoded an H:z4 epitope had greater than 84% amino acid identity in the variable region. Alleles encoding H:z36 and H:z38 antigens shared very little identity with the rest of the Z4 complex in the variable region.
The alpha cluster, which is composed of the largest number of antigens, was also the most diverse. Alleles encoding immunologically related antigens, such as those of the L or EN complexes, typically had greater than 90% amino acid identity in the variable region. Alleles encoding immunologically unrelated antigens had as high as about 70% amino acid identity in the variable region, although most alleles had no identifiable homology in the variable region. Alleles encoding antigen H:k were noted to be particularly diverse. Sequence comparison of the variable region from H:k alleles from subspecies I, II, IIIa, and IIIb revealed three groups of alleles that differed by about 20 to 25% of their amino acids (Fig. 3). In contrast, alleles encoding H:i from subspecies I, II, and IIIb had greater than 97% amino acid identity in the variable region and greater than 94% amino acid identity when compared to H:i alleles from S. bongori (Fig. 3).
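Percent amino acid identity, as reported throughout this section, can be computed from a pairwise alignment in several ways. The Python sketch below shows one common convention and is not necessarily the calculation performed by the software packages used in this study: columns where both sequences have a gap are ignored, and a gap opposite a residue counts as a mismatch. The function name `percent_identity` and the toy peptides are assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # a column that is a gap in both carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy peptides only; they are not flagellin sequences.
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
    print(percent_identity("AC-E", "ACDE"))
```

Restricting the input to the conserved or variable region before calling the function gives region-specific values of the kind quoted above.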
FIG. 3. Characterization of H:k alleles. (a) Dendrogram of H:k and H:i alleles. The serotype from which the sequence was obtained is listed to the right of each branch. Named serotypes are from subspecies I. Other subspecies are indicated as part of the serotype designation. (b) Alignment of amino acids 250 to 300 of the variable region. Amino acids that differ from the consensus are indicated (boxed).
Thr). The deletion and substitution were identical to the corresponding residues in H:i alleles. An allele encoding flagellar antigen 1... (a Phase 3/R antigen related to 1 complex antigens) contained a typical H:1,2 allele but with a 40-amino-acid deletion. Two other H:1... alleles were the same length as 1 complex alleles but were distinct from any other 1 complex alleles (Table 3; Fig. 1).
Genetic location of 1 complex, EN complex, and other alleles. The Kauffmann-White serotyping scheme lists most 1 complex flagellar antigens as phase 2 antigens; a few 1 complex antigens are considered phase 3/R. Flagellar antigens of the EN complex are listed in both phase 1 and phase 2. Ten of 34 1 complex alleles that were sequenced were found to be at the fliC locus (Tables 2 and 3). Most of these isolates were from subspecies II, and many had an EN complex allele at the fljB locus. The identity of the antigens encoded by fliC and fljB in these isolates was inferred from their homology to other 1 complex alleles that were encoded by fljB and to other EN complex alleles encoding H:e,h and H:e,n,x. To confirm that 1 complex antigens were encoded by fliC, primers corresponding to sequences in fliD and fliB, which flank fliC in the genome, were designed and used to generate sequencing template. The fliD-fliB PCR fragment was sequenced by using both the fliC and fliD/fliB primers and was shown by sequence homology to encode a 1 complex allele in the fliC locus.
Alleles encoding flagellar antigens a, k, z6, z10, and z41 were also found in loci different from that predicted by the Kauffmann-White scheme and different from where they were typically found. H:a is always listed in phase 1 in the Kauffmann-White scheme but was identified in fljB in serotype II 45:a:z10 (Table 3). H:z6 is listed primarily in phase 2 in the Kauffmann-White scheme; however, an H:z6 allele was found at fliC in a subspecies II isolate with an H:e,n,x allele at fljB. Flagellar antigen d is commonly found in fliC but was identified at flpA and at fljB in the newly described Salmonella serotype Houston (20).
Flagellar antigens not sequenced. The genes encoding most of the phase 1 flagellar antigens, predicted to be encoded by fliC, were found and sequenced (Table 2). However, genes encoding five flagellar antigens that are listed in phase 1 of the Kauffmann-White scheme and expected to be encoded by fliC were not found. The fliC allele found in isolates expressing these flagellar antigens contained a premature stop codon or an insertion sequence that inactivated the fliC gene (Table 4). These sequences, with the insertion or stop codon deleted, had greater than 99% nucleic acid identity to alleles that were determined for other flagellar antigens. For example, an isolate of serotype Aesch, expressing flagellar antigens z60 and 1,2, contained a fliC H:e,h allele that contained a premature stop codon. An isolate of serotype Delmenhorst, expressing flagellar antigen H:z71, had an H:z4,z23 fliC allele with an insertion that was 99.9% identical to the insertion sequence 30B family of insertion elements (26). The locations of the stop codons were confirmed by sequencing three independently amplified templates.
TABLE 4. Flagellar antigens not encoded by fliC
We determined the sequences of 280 flagellin alleles from Salmonella, representing approximately 90% of the phase 1 and phase 2 antigenic types. Comparative analysis of fliC and fljB alleles showed that flagellin alleles that are encoded by fliC, fljB, and flpA were homologous, particularly in the conserved region, and upon alignment, they clustered together irrespective of their genomic location. However, most fliC and fljB/flpA alleles could be distinguished by unique, phase-specific sequences at the extreme 5' and 3' ends of the gene. Smith also noted these conserved sequences, but as yet, no biological role has been ascribed to them (24). Most fliC and fljB alleles reported here and in the literature were fairly homogeneous in size, ranging from 1,488 to 1,536 bp, except for fliC alleles from isolates of the Z4 complex, which were 200 to 300 nucleotides shorter.
Genes encoding flagellin are typically conserved, particularly at the 5' and 3' ends, across most bacterial species. This was also true of the Salmonella flagellin alleles. All flagellin sequences from Salmonella fell into one of four groups based on amino acid identity in the conserved region. Three groups contained multiple flagellar antigen types; these were designated the alpha cluster, the G complex, and the Z4 complex. Flagellar antigens z36; z38; and z36,z38 were included in the Z4 complex. Although these flagellar antigens appear to be genetically related to the Z4 complex, no antigenic cross-reactivity between these groups has been noted. The fourth group contained a single flagellar antigen type, z29. Within the four clusters, the sequences typically grouped by the antigens they encoded; antigens that were immunologically related were also genetically related.
The genus Arizona was originally described as distinct from Salmonella, and an independent serotyping scheme was developed for these organisms. Subsequently, DNA-DNA hybridization studies showed that arizonae belonged in the genus Salmonella (3), and Arizona serotypes were merged into the Kauffmann-White scheme (5). Most Arizona antigens fit well into the Kauffmann-White scheme; however, a few antigens were not completely compatible. For example, flagellar antigen k is weakly expressed in some serotypes, represented by "(k)" in the Kauffmann-White scheme, where the parentheses indicate a weak seroagglutination reaction. Sequence comparisons from H:k alleles from subspecies I, II, and IIIb revealed three distinct clusters of alleles with 60 to 80% amino acid identity in the variable region (Fig. 3a). In contrast, alleles encoding other flagellar antigens, such as H:i, were highly conserved across the subspecies. This observation suggests that flagellar antigen k could be considered multiple flagellar antigen types.
The Kauffmann-White scheme places most 1 complex antigens in phase 2, a few are listed as a third phase, and none are listed in phase 1. Ten of 34 of the 1 complex antigens sequenced were encoded by the fliC locus. Other antigens were also found at a locus not predicted by the Kauffmann-White scheme. Most of these instances were in less-common serotypes, and only one isolate of each serotype was characterized. It is unknown whether the location of the alleles encoding those antigens is unique to the serotype or to the isolates tested; however, it may be of interest that most of the isolates (14 of 18) belonged to subspecies II. The observation that the genetic location of the antigen-encoding gene does not always correspond to that predicted from the Kauffmann-White scheme should be considered when using DNA sequence data to determine flagellar antigen type.
Genes encoding several of the Salmonella flagellar antigens were not found at either the fliC or fljB locus; either the fliC locus contained an inactivated flagellin allele or the fljB locus was absent in a diphasic strain. The Salmonella serotype Aesch isolate, antigenic formula I 6,8:z60:1,2, had an inactivated H:e,h allele at fliC, suggesting the possibility that it is a variant of the more common Salmonella serotype Anatum (antigenic formula I 6,8:e,h:1,2) with the H:z60 antigen expressed from an uncharacterized genetic locus. Similarly, the Salmonella serotype Delmenhorst isolate, antigenic formula I 18:z71:, has an inactivated z4,z23 allele at fliC and may be a variant of Salmonella serotype Cerro, antigenic formula I 18:z4,z23:. In contrast, H:z48, predicted to be an R phase based on the Kauffmann-White scheme, was found at fliC. We found the gene for only one flagellar antigen that was not located at fliC or fljB, a previously characterized flagellar antigen d allele in a triphasic isolate of Salmonella serotype Rubislaw that was carried on a plasmid (24). Alternate genomic locations for flagellar antigen genes have also been described for E. coli, where several flagellar antigens have been shown to be encoded at loci other than fliC (21, 28). Sequencing flagellin genes that are in undetermined loci in Salmonella may prove difficult; amplifying these alleles with primers to the conserved regions will result in multiple amplicons in diphasic and triphasic strains. Cloning the intact flagellin allele may be required; this approach may have the added benefit of providing information regarding the genomic location of the flagellin allele, but the possibility exists that these genes are located in regions of the genome that are unique to a particular Salmonella serotype(s).
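The antigenic formulas quoted above follow the pattern "subspecies O antigens:phase 1:phase 2". The Python sketch below parses that simple pattern and reports which fields differ between two formulas, which makes the proposed Aesch/Anatum and Delmenhorst/Cerro relationships easy to see. It is an illustration only: the names `Formula`, `parse_formula`, and `differing_fields` are hypothetical, and notations used elsewhere in the Kauffmann-White scheme, such as parentheses for weak reactions, are not handled.

```python
from typing import NamedTuple, Tuple


class Formula(NamedTuple):
    subspecies: str
    o_antigens: Tuple[str, ...]
    phase1: Tuple[str, ...]
    phase2: Tuple[str, ...]


def parse_formula(text):
    """Parse a formula such as 'I 6,8:z60:1,2' into its four fields."""
    subspecies, rest = text.strip().split(None, 1)
    parts = rest.split(":")
    if len(parts) != 3:
        raise ValueError("expected the form 'subspecies O:phase1:phase2'")

    def factors(field):
        return tuple(f.strip() for f in field.split(",") if f.strip())

    return Formula(subspecies, *(factors(p) for p in parts))


def differing_fields(a, b):
    """Names of the fields in which two parsed formulas differ."""
    return [name for name in Formula._fields if getattr(a, name) != getattr(b, name)]


if __name__ == "__main__":
    aesch = parse_formula("I 6,8:z60:1,2")
    anatum = parse_formula("I 6,8:e,h:1,2")
    delmenhorst = parse_formula("I 18:z71:")
    cerro = parse_formula("I 18:z4,z23:")
    print(differing_fields(aesch, anatum))
    print(differing_fields(delmenhorst, cerro))
```

An empty phase 2 field, as in I 18:z71:, is returned as an empty tuple, which corresponds to a monophasic formula.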
Comparative DNA and amino acid sequence analysis of our sequences and those available in GenBank identified regions that appear to be unique to specific flagellar antigen groups and types. It remains to be determined whether these amino acid substitutions are simply markers for a particular antigen or if they are responsible for the antigenic differences that are detected by serotyping. In either case, the sequence differences will be useful for the molecular identification of alleles encoding different flagellar antigen types. These unique sequences are being targeted in the development of probes that are specific for a particular antigen or group of antigens. We are developing a PCR combined with a DNA enzyme immunoassay to differentiate phase, complex, and specific antigen types. Combination of this approach with the molecular identification for O antigen type may prove to be a useful method for determination of serotype of Salmonella and can complement or largely replace traditional serotyping methods.
Copyright © 2009 by the American Society for Microbiology.