Genome Sequence-Based Fluorescent Amplified Fragment Length Polymorphism of Campylobacter jejuni, Its Relationship to Serotyping, and Its Implications for Epidemiological Analysis

ABSTRACT The published genome sequence of Campylobacter jejuni strain NCTC 11168 was used to model an accurate and highly reproducible fluorescent amplified fragment length polymorphism (FAFLP) analysis. Predicted and experimentally observed amplified fragments (AFs) generated with the primer pair HindIII+A and HhaI+A were compared. All but one of the 61 predicted AFs were reproducibly detected, and no unpredicted fragments were amplified. This FAFLP analysis was used to genotype 74 C. jejuni strains belonging to the nine heat-stable (HS) serotypes most prevalent in human disease in England and Wales. The 74 C. jejuni strains exhibited 60 FAFLP profiles, and cluster analysis of them yielded a radial tree showing genetic relationships between and within 13 major clusters. Some clusters were related, and others were unrelated, to a single HS serotype. For example, all strains belonging to serotypes HS6 and HS19 grouped into corresponding single genotypic clusters, while strains of serotypes HS11 and HS18 each grouped into two genotypic clusters. Strains of HS50, the most prevalent serotype infecting humans, were found both in one large (multiserotype) cluster complex and dispersed throughout the tree. The strain genotypes within each FAFLP cluster were characterized by a particular combination of AFs, and among the cluster there were additional differential AFs. Identification of such AFs could act as a search tool to look for potential associations with disease or animal hosts when applied to large numbers of human isolates. Genome sequence-based FAFLP thus has the potential to establish a genetic database for epidemiological investigations of Campylobacter.

Campylobacter is the commonest cause of bacterial gastroenteritis in developed countries. The incidence of human campylobacteriosis in England and Wales has risen in recent years from ~34,000 cases reported to the Public Health Laboratory Service Communicable Disease Surveillance Centre in 1990 to ~54,000 in 2000. It now surpasses the incidence of reported salmonella infections 3.6-fold (information found at the website http://www.phls.co.uk). Although most isolates are reported simply as Campylobacter spp., available data suggest that over 90% of campylobacters detected by culture belong to the species Campylobacter jejuni. The next commonest species is Campylobacter coli (10). Reproducible and discriminatory typing methods are needed to identify the sources of campylobacter infections, and to trace the routes of transmission of strains through the food chain.
Phenotypic discrimination between isolates of C. jejuni and C. coli uses serotyping of the heat-stable (HS) (10,24) or heat-labile (16) surface-exposed antigens. The two HS schemes both use antisera raised against the same type strains but differ in their antigen detection methods. The method of Penner and Hennessy (24) uses passive hemagglutination, while the method of Frost et al. (10) employs direct whole bacterial cell agglutination. Recent studies have shown the existence of two distinct HS antigens in Campylobacter, which may account for the differences between the two schemes. These antigens are a polysaccharide capsule (3,13) and a lipopolysaccharide and/or lipooligosaccharide component of the cell surface (17). The majority of C. jejuni isolates belong to only a limited number of HS serotypes, and 19% of isolates are nontypeable with the current antiserum panel using the direct agglutination method. Some of these limitations have been addressed by employing phage typing (9) as an adjunct to serotyping.
General drawbacks of these phenotypic typing methods are the restricted availability, cost and quality of the antiserum reagents and phage panels; cross-reactivity; and the level of nontypeability. These factors have limited epidemiological studies of Campylobacter, both nationally and globally (26). Genotyping may therefore have a particularly important role in studying the epidemiology of Campylobacter, and a number of molecular methods have been applied and evaluated. They include flagellin gene (PCR-restriction fragment length polymorphism analysis) typing, pulsed-field gel electrophoresis (PFGE), random amplified polymorphic DNA, and ribotyping. The main features of and findings by these methods have been reviewed by Wassenaar and Newell (26).
The genome-sampling technique of amplified fragment length polymorphism (AFLP) analysis has been applied in empirical fashion to Campylobacter spp., yielding promising new information based on percentage similarities of AFLP profiles, with regard to taxonomic relationships, genetic groupings, and outbreaks (7,15,20). Recently the complete C. jejuni genome sequence (22) has been released, enabling us to apply to C. jejuni, for the first time, predictive modeling of fluorescent AFLP (FAFLP) (1,12).
The aim of the present study was to model FAFLP from the genome sequence of C. jejuni strain NCTC 11168 and to validate the accuracy and reproducibility of this FAFLP by comparing predicted with experimental AF sizes. We then used the standardized FAFLP analysis to examine genotypic variation in relation to HS serotype for 74 C. jejuni isolates.

MATERIALS AND METHODS
Bacterial strains, phenotypic typing, and culture conditions. The 74 strains of C. jejuni studied were from samples of meat from retail outlets, unpasteurized milk, and human fecal specimens from sporadic campylobacter infections. They belonged to nine of the most prevalent HS serotypes isolated from humans in England and Wales during 1998 and 1999 and belonged to various phage types (PTs) (Table 1). One C. coli strain (strain 75 [Table 1]) isolated from a sample of poultry meat was also included. The genome-sequenced strain NCTC 11168 (22) and the reference strains for six serotypes were also included. All strains were HS serotyped using the direct agglutination method and phage typed according to the standard protocols of the Campylobacter Reference Unit (CRU) (Central Public Health Laboratory, London, United Kingdom). Strains were cultured microaerobically at 37°C for 48 h on blood agar plates and preserved for reference at −80°C in Microbank cryovials (Pro-Lab Diagnostics, Cheshire, United Kingdom).
Standard nucleic acid extraction. Genomic DNA was extracted from 48-h Campylobacter plate cultures using the DNeasy tissue kit (Qiagen Ltd., Crawley, West Sussex, United Kingdom) according to the manufacturer's instructions. The concentration of DNA was estimated using a spectrophotometer (Beckman DU 640) by standard methods (25).
Computer methods. The complete genome sequence of C. jejuni NCTC 11168 (accession no. AL111168) was analyzed with Lasergene (DNAStar, Madison, Wis.) and MacVector (Oxford Molecular, Oxford, United Kingdom) software. Data for the sizes and number of fragments generated by HindIII and HhaI digestion were imported into a spreadsheet. These fragment sizes were then adjusted to allow the addition of primer sequence during PCR amplification.
Touchdown PCR cycling conditions were as described previously (4,5). FAFLP products were separated on an ABI 377 automated DNA sequencer (Perkin-Elmer Corp., Norwalk, Conn.) using Premix Long Ranger 5% polyacrylamide gel solution (FMC BioProducts, Vallensbaek Strand, Denmark) as described previously (4,5). Each FAFLP reaction (1 µl) was loaded with an internal size marker (GeneScan-2500 labeled with the red fluorescent dye 6-carboxy-x-rhodamine [ROX; PE Applied Biosystems, Warrington, United Kingdom]). The running buffer was 1× Tris-borate-EDTA (TBE), and the electrophoresis conditions were 2.0 kV at 51°C for 14 h. The well-to-read distance was 48 cm.
Fragment analysis. Fluorescent amplified fragments (AFs) visualized on polyacrylamide sequencing gels were sized with GeneScan 3.1.0 software (Perkin-Elmer Corp.). Gel displays were transformed into electropherograms, and Genotyper 2.5 software (Perkin-Elmer Corp.) was used to generate a table with presence or absence of fragments. Fragment data were recorded in a binary format in Excel (version 6.0; Microsoft). Dice coefficients of similarity were calculated with an in-house program. Cluster analysis was performed by UPGMA (NEIGHBOR program of PHYLIP) (8), and a dendrogram was displayed with the TreeView program (21).
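For illustration only, the in silico digestion described under Computer methods could be sketched in Python as shown here. This is not the Lasergene/MacVector workflow used in the study. The function names and the `adapter_bp` parameter are hypothetical, the chromosome is treated as linear, fragment sizes are taken as the distance between top-strand cut positions, and filtering on the selective +A nucleotides is omitted, so the output would overestimate the number of amplified fragments.

```python
import re

# Recognition sites and top-strand cut offsets for the two enzymes.
# HindIII cuts A^AGCTT; HhaI cuts GCG^C.
ENZYMES = {
    "HindIII": ("AAGCTT", 1),
    "HhaI": ("GCGC", 3),
}

def cut_positions(seq, site, offset):
    """Return 0-based top-strand cut positions for every occurrence of site."""
    # A lookahead allows overlapping occurrences (e.g. GCGCGC) to be found.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]

def predict_fragments(seq, adapter_bp=0, min_bp=50, max_bp=600):
    """List sizes of fragments with one HindIII end and one HhaI end.

    adapter_bp is a hypothetical constant standing in for the adapter and
    primer sequence added to each fragment during amplification.
    """
    seq = seq.upper()
    cuts = []
    for name, (site, offset) in ENZYMES.items():
        cuts += [(pos, name) for pos in cut_positions(seq, site, offset)]
    cuts.sort()
    sizes = []
    # Adjacent cut sites delimit the fragments of a complete double digest.
    for (p1, e1), (p2, e2) in zip(cuts, cuts[1:]):
        if e1 != e2:  # keep only fragments with two different ends
            size = (p2 - p1) + adapter_bp
            if min_bp <= size <= max_bp:
                sizes.append(size)
    return sorted(sizes)

# Toy sequence: one HindIII site and one HhaI site separated by 100 A residues.
toy = "TT" + "AAGCTT" + "A" * 100 + "GCGC" + "TT"
print(predict_fragments(toy))  # [108]
```

In practice the adapter and primer lengths, and hence the size adjustment, depend on the oligonucleotides used (4,5), which is why the adjustment is left here as a parameter.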

RESULTS
Predictive modeling of FAFLP for the genome sequence of strain NCTC 11168 and its experimental validation. When the complete genome sequence of NCTC 11168 was analyzed (see Materials and Methods) to predict the sizes of HindIII and HhaI restriction fragments amplified by the primer pair HindIII+A and HhaI+A (7), 61 AFs were expected between 50 and 600 bp. The precision of sizing these AFs was ±0.5 bp. Their positions in relation to the genetic loci on the NCTC 11168 chromosome (22) are shown in Fig. 1. To determine the accuracy and reproducibility of FAFLP for C. jejuni, three different DNA preparations of strain NCTC 11168 were each subjected to three different FAFLP reactions and run on individual sequencing gels. When experimental data were compared with expected values, all but one (sized 462 bp) of the predicted 61 AFs were observed in all nine of the reactions, and no unpredicted fragments were amplified. Of the 60 AFs observed, 36 were in the size range 55 to 300 bp; 35 of these were within 1 bp and 1 was within 2 bp of the predicted size. Among the remaining 24 AFs in the higher size range (300 to 600 bp), 9 AFs were within 1 bp, 6 were within 2 bp, 8 were within 3 bp, and 1 was within 4 bp of the predicted size.
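The comparison of predicted with observed AF sizes can be illustrated with a short Python sketch. It is an assumption-laden example, not the procedure used in the study: the greedy nearest-size pairing, the function names, and the 4-bp `max_dev` cutoff (chosen because the largest deviation reported above was within 4 bp) are all hypothetical, and the sizes in the example are invented.

```python
import math
from collections import Counter

def match_fragments(predicted, observed, max_dev=4.0):
    """Pair each predicted size with the nearest unused observed size.

    Returns (matches, missing, unpredicted); matches holds
    (predicted, observed, absolute deviation) tuples.
    """
    remaining = sorted(observed)
    matches, missing = [], []
    for p in sorted(predicted):
        best = min(remaining, key=lambda o: abs(o - p), default=None)
        if best is not None and abs(best - p) <= max_dev:
            matches.append((p, best, abs(best - p)))
            remaining.remove(best)
        else:
            missing.append(p)  # predicted but not detected
    return matches, missing, remaining  # remaining = unpredicted fragments

def tally_by_deviation(matches):
    """Count matches by deviation rounded up to whole base pairs.

    A key of 1 means "within 1 bp", 2 means "more than 1 and up to 2 bp",
    and so on; exact matches fall under key 0.
    """
    return Counter(math.ceil(dev) for _, _, dev in matches)

# Invented example sizes (bp), for illustration only.
matches, missing, unpredicted = match_fragments(
    predicted=[100, 200, 462], observed=[100.4, 201.8, 350.0]
)
print(missing, unpredicted, dict(tally_by_deviation(matches)))
```

A greedy pairing is adequate when predicted fragments are well separated; closely spaced fragments such as doublets would need a more careful assignment.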
Strain genotyping by FAFLP analysis. The FAFLP profiles of the 75 strains analyzed in this study consisted of 45 to 100 AFs ranging in size from 60 to 600 bp; AFs larger than 600 bp were not included in our analysis. Sixty-one different FAFLP profiles were detected among the 75 strains, and 51 strains, including NCTC 11168 and the C. coli strain, had unique profiles. Eight profiles were each shared by a pair of isolates, including a pair of isolates from the same meat sample (isolates 9 and 10 [Table 1]). One profile was shared by three isolates, and another profile was shared by five isolates; these isolates sharing the same profile were from sporadic infections. Among all 75 isolates, there were 231 polymorphic AFs (a polymorphic AF is here defined as one that is either uniquely present in a single profile or present in some profiles and absent in others). There were 182 polymorphic AFs among the 74 C. jejuni isolates.
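The counts reported above (distinct profiles, unique profiles, and polymorphic AFs) follow directly from the presence/absence data. A minimal Python sketch is given here, assuming each strain is represented by the set of AF sizes scored as present and that at least one strain is supplied; the function name, the data layout, and the toy data are hypothetical. A polymorphic AF is counted, following the definition in the text, as any AF that is present in at least one profile but not in all of them.

```python
from collections import Counter

def summarize_profiles(profiles):
    """Summarize FAFLP profiles.

    profiles maps a strain identifier to the set of AF sizes (bp) scored
    as present in that strain; it must contain at least one strain.
    """
    # Strains with identical AF sets share one FAFLP profile.
    groups = Counter(frozenset(afs) for afs in profiles.values())
    all_afs = set().union(*profiles.values())
    common = set.intersection(*(set(afs) for afs in profiles.values()))
    return {
        "distinct_profiles": len(groups),
        "unique_profiles": sum(1 for n in groups.values() if n == 1),
        "polymorphic_afs": len(all_afs - common),
        "common_afs": len(common),
    }

# Toy data: strains s1 and s2 share a profile; the 60-bp AF is common to all.
toy_profiles = {
    "s1": {60, 100, 250},
    "s2": {60, 100, 250},
    "s3": {60, 120},
    "s4": {60, 100, 300},
}
print(summarize_profiles(toy_profiles))
```

Exact set comparison presupposes that fragment sizes have already been binned to integer size calls, as they are once a presence/absence table has been generated.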
Cluster analysis of FAFLP profiles and relationship to HS serotype. For each strain, AFs were sized (size calling tolerance was ±0.5 bp) and scored as present (as 1) or absent (as 0) in a binary matrix. Cluster analysis was performed and a radial distance tree was generated, on the basis of which FAFLP profiles were given genotype designations (Fig. 2; Table 1). The radial tree shows 13 major genotypic clusters (designated A-1 to A-6 and B to H in Fig. 2; Table 1). The isolates within all the 13 clusters showed ≤10% divergence. Cluster A complex (shown as a dotted line in Fig. 2) was defined as such to reflect the serological relatedness of its isolates (10) and con-
and D and clusters C and E, respectively). All eight strains belonging to serotype HS19 grouped into cluster F, and all five HS6 strains grouped together in cluster H. Cluster A complex (
Footnotes to Table 1:
a Serotype of isolates as determined by direct agglutination method (see Materials and Methods).
b FAFLP genotypes were designated following cluster analysis (see Fig. 2) as cluster B to H followed by clone number or, for cluster A complex, as clusters A-1 to A-6 followed by clone type (designated by letters a to f).
c A-UG designated to a characterized FAFLP genotype which is unclustered within the cluster A complex.
d UG designated to a characterized FAFLP genotype which is unclustered within this study.
e Indicates the C. coli strain, which exhibited a diverse and unclustered genotype.
NCTC 11168, which serotypes as HS44. Strains belonging to serotype HS50 were distributed throughout the tree. Nine were found in the multiserotype cluster A complex described above; four of these grouped into cluster A-5, two grouped into cluster A-1, and one strain each grouped in clusters A-2 and A-4, respectively. Two strains within cluster A-complex showed unclustered genotypes (designated A-UG [
FIG. 2. ... (21). The HS serotype of each strain is shown in boldface type following the strain number (see Table 1). Following cluster analysis, FAFLP genotypes were designated as clusters A-1 to A-6 and clusters B to H or as unclustered genotypes, shown in italics (UG1 to UG11). The isolates in all 13 of the clusters showed ≤10% divergence. The dotted line delineates cluster A complex isolates that are serologically related (8), and the isolates within this complex exhibited ≤25% divergence. Within this cluster complex, six genotypic clusters were designated A-1 to A-6 (shown in italics), and two isolates with unique genotypes were designated A-UG (see Results). An asterisk indicates a C. coli strain which was untypeable by both sero- and phage-typing and exhibited genotype UG11.
the serotype-specific cluster A-3, two strains each grouped into multiserotype clusters A-5 and A-6, and one strain grouped into cluster G. The remaining two strains had unclustered genotypes (Fig. 2; Table 1). All three strains belonging to serotype HS5 had unclustered genotypes (Fig. 2; Table 1). The C. coli strain exhibited a divergent and unclustered genotype.
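The scoring and similarity steps that underlie the cluster analysis can be sketched in Python as follows. The Dice coefficients in this study were calculated with an in-house program that is not described, so this is only a minimal illustration of the standard Dice formula, 2a/(n1 + n2), where a is the number of AFs shared by two profiles and n1 and n2 are the numbers of AFs in each; the function names and the example vectors are hypothetical.

```python
def dice(a, b):
    """Dice similarity between two binary (0/1) presence/absence vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    shared = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    # Two empty profiles are treated as identical by convention.
    return 2 * shared / total if total else 1.0

def distance_matrix(matrix):
    """Pairwise distances (1 - Dice) between the rows of a binary matrix."""
    return [[1.0 - dice(r1, r2) for r2 in matrix] for r1 in matrix]

# Two toy profiles sharing 2 AFs out of 3 and 2 present, respectively.
print(dice([1, 1, 0, 1], [1, 0, 0, 1]))  # 0.8
```

A distance matrix of this kind is the form of input expected by distance-based clustering methods such as UPGMA; the clustering itself is not reproduced here.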
All genotypic clusters included isolates belonging to diverse PTs. The 31 strains in cluster A complex belonged to 14 PTs, including five strains of PT33; four strains each of PT2, PT5, and PT35; and three strains each of PT1 and PT36 (Table 1). The remaining eight strains, however, belonged to single PTs (Table 1). All three HS2 PT35 isolates grouped together in cluster A-1. Three of the four strains within cluster B belonged to HS11 PT44, while the six HS11 strains of cluster D belonged to five different PTs. Of the four strains in cluster C, two phenotyped as HS18 PT1 and the other two phenotyped as HS18 PT2. Seven of the eight strains in cluster F belonged to HS19 PT2.
For the 74 C. jejuni isolates, 165 of the 182 polymorphic AFs were found to combine in various ways to yield specific FAFLP genotypes. For example, cluster F strains were characterized by a combination of 55 marker AFs, and 10 other intragroup marker AFs distinguished strains within the cluster. This cluster contained clonally related strains of serotype HS19. Among strains of serotype HS11, the four strains in cluster B were differentiated from other clusters by a combination of 59 marker AFs, and cluster D was differentiated by a combination of 63 marker AFs. The two clusters, C and E, containing strains of serotype HS18 were distinguished by a combination of 59 marker AFs and a combination of 57 marker AFs, respectively. The cluster A complex, containing 41% of the strains (31 out of 75), was differentiated from the other clusters by a combination of 39 marker AFs; 57 other marker AFs distinguished the individual strains within it.

DISCUSSION
Human Campylobacter infections now greatly exceed Salmonella infections in incidence, but very few outbreaks have been linked to particular food sources (23). The frequent contamination of food by multiple strains has added to the difficulty of tracing the origins and routes of human infection and of recognizing outbreaks (14). Various phenotypic and genotypic methods have been applied in epidemiological studies of Campylobacter (18), but the question remains whether the recent large rise in C. jejuni infections reflects an increased incidence of sporadic infections or a number of outbreaks. In this study, we have standardized FAFLP to the genome sequence of C. jejuni NCTC 11168 (22) and experimentally validated this high-resolution genotyping technique using the published primer combination HindIII+A and HhaI+A, first defined in the novel empirical FAFLP studies of Duim and colleagues (7). We then investigated its use in reproducibly genotyping C. jejuni isolates.
The 60 experimentally recovered AFs (98% of the predicted AFs) derived from the NCTC 11168 strain represent approximately 1% of its total genome. The precision of sizing AFs was ±0.5 bp. Seventy-three percent of these AFs were within 1 bp of the predicted size. The accuracy of sizing was ±1 bp for AFs up to 300 bp but decreased to ±2 or 3 bp for AFs between 300 and 600 bp. AFs larger than 600 bp were not included in our analysis since the accuracy of sizing decreased further for larger fragments. The variation in sizing of >±1 bp can be attributed to the irregular spacing of bands in the ROX 2500 internal lane standard, and the use of more evenly spaced size markers would minimize this slight inaccuracy in AF sizing (data not shown). A few single base pair differences between pairs of predicted AFs (e.g., 97 and 98 bp and 104 and 105 bp) which could not be resolved by the ABI 377 sequencer presented as doublets on the electropherograms, both with higher signal strength. A few other identically sized AFs amplified from different regions of the genome (e.g., two of 117 bp and two of 237 bp) appeared either as broader peaks on the trace with a higher signal strength (e.g., the 117-bp AF) or as two AFs 1 bp apart.
Among the 74 isolates of C. jejuni examined in this study, seven AFs were common to all genotypes. This finding is being further investigated in our laboratory, with a view to sequencing AFs which have potential in molecular identification assays, e.g., in DNA arrays. The identification of AFs such as these also shows that FAFLP could act as a powerful search tool when used to screen large numbers of isolates from human disease. Clearly, there is a potential application in searching clinical microbiology FAFLP databases for as-yet-unidentified associations with host specificity and specific differential AFs.
Dingle et al. (6) examined the genetic variation present in seven housekeeping genes of C. jejuni by multilocus sequence typing (MLST). In a strain collection comprising 194 diverse C. jejuni strains from humans, animals, and the environment, they found 155 sequence types (STs) comprising 11 clone complexes and 51 unique STs. One of the largest MLST lineages, the ST-21 complex, predominantly comprised strains of the Penner HS serotypes Pen1, Pen2, and Pen4 and included NCTC 11168. The most commonly encountered HS serotypes in England and Wales include Pen1, Pen2, and Pen44 as serotyped by passive hemagglutination (24), or HS1, HS2, HS44, and HS50 as serotyped by direct agglutination (10). This group of serotypes can therefore be related to the FAFLP cluster A complex. Since our FAFLP cluster A complex included strain NCTC 11168 and comprised multiple isolates from serotypes HS2, HS16, HS44, and HS50, it is likely to correspond to or overlap with the MLST lineage ST-21. By the same reasoning, FAFLP cluster F, comprising strains of serotype HS19, can be related to MLST lineage ST-22. These conclusions are in line with the previously published congruencies of FAFLP with either MLST or multilocus enzyme electrophoresis (2, 11). Olive and Bean (19), in a recently published comparison of various DNA-based typing methods of bacterial organisms, estimate the costs for a single FAFLP reaction at $20 and that for seven-locus MLST of a single strain at $280. In our view, FAFLP exhibits superior cost-effectiveness, adaptability, discriminatory power, and ease of use. Nonetheless the two methods can be described as complementary because, both being based on the genome sequence, they are theoretically as well as experimentally congruent.
Serotype HS50 is the most prevalent serotype among C. jejuni strains isolated from humans, and it accounts for approximately 20% of all infections. The FAFLP data presented here indicate that this serotype is genetically heterogeneous. Although nine HS50 strains were found in the multiserotype cluster A complex and two strains were found in cluster G, there was no congruence between FAFLP genotypes and serotype for the remaining five HS50 strains included in the study. Strains belonging to serotypes HS11 and HS18 grouped into two separate genotypic clusters each. In contrast, all strains belonging to serotypes HS19 and HS6 grouped into serotype-specific clusters, indicating the genetic homogeneity of these two serotypes. The lack of congruence between FAFLP genotypes and certain serotypes and between FAFLP genotypes and sero-phage types of C. jejuni indicates that further investigative genotyping of this major enteropathogen will define its epidemiological clonality more accurately.
We have also demonstrated that FAFLP defines strain genotypes within and between clusters by unique combinations of precisely sized marker AFs. These AFs could serve as identification markers ("bar codes") that define clonality and could be used as the basis of a molecular typing scheme. A continuously updated radial tree such as that shown in Fig. 2 could, in effect, serve as a genetic record of strain genotypes. The genetic data could then be linked in a comprehensive database with epidemiological information and phenotyping data, to facilitate the identification of outbreaks and sources of apparently sporadic human infection.
We conclude that FAFLP is a highly reproducible method for typing C. jejuni (as demonstrated with the NCTC 11168 strain), capable of recognizing genotypic clusters. These FAFLP genotypic clusters were not congruent with all HS serotypes. Within genotypic clusters, FAFLP could readily distinguish between individual strains. We expect FAFLP to become a powerful tool in the macro- and microepidemiological analysis of campylobacter incidence, including outbreak investigations. It could also be used to establish a genetic database of strains drawn from across the food chain and human cases.