Journal of Clinical Microbiology, August 2003, p. 3765-3776, Vol. 41, No. 8
0095-1137/03/$08.00+0 DOI: 10.1128/JCM.41.8.3765-3776.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
Department of Molecular & Cell Biology, Institute of Medical Sciences, University of Aberdeen, Aberdeen AB25 2ZD,1 Peter Medawar Building for Pathogen Research and Department of Zoology, University of Oxford, Oxford OX1 3SY, United Kingdom,3 Dipartimento di Patologia Sperimentale, Biotecnologie Mediche, Infettivologia ed Epidemiologia, Università degli Studi di Pisa, Pisa 56127, Italy2
Received 20 March 2003/ Returned for modification 5 May 2003/ Accepted 7 May 2003
| ABSTRACT |
|---|
500 bp) of eight genes encoding housekeeping functions were sequenced, including four that have been described before for C. albicans MLST, and four new gene fragments, AAT1a, AAT1b, MPI, and ZWF1. In total, 87 polymorphic sites were found among 50 notionally different isolates, giving 46 unique sequence types, underlining the power of MLST to differentiate isolates for epidemiological studies. Additional typing information was obtained by detecting variations in size at the transcribed spacer region of the 25S rRNA gene and tests for homozygosity at the mating type-like (MTL) locus. The stability of MLST was confirmed in two sets of consecutive isolates from two patients. In each set the isolates were identical or varied by a single nucleotide. Reference strain SC5314 and a derived mutant, CAF2, gave identical MLST types. Heterozygous polymorphisms were found in at least one isolate for all but 16 (18.4%) of the variable nucleotides, and 35 (40%) of the 87 individual sequence changes generated nonsynonymous amino acids. Cloning and restriction digestion of a gene fragment containing heterozygous polymorphisms indicated that the heterozygosity was genuine and not the result of sequencing errors. Our data validate and extend previous MLST results for C. albicans, and we propose an optimized system based on sequencing eight gene fragments for routine MLST with this species.
| INTRODUCTION |
|---|
The particular advantages of MLST as a typing method are that DNA nucleotide sequences can be determined by automated technology with minimal subjective interpretation of data such as exists in all methods dependent on phenotypic characteristics, fermentation profiles, and other qualitative comparators. In addition, MLST data from different sources can be archived and distributed electronically and interrogated and added to from distant locations to facilitate comparisons for global epidemiology and population studies. A web site (http://www.mlst.net/new/index.htm) has already been created for data archiving and analysis with six pathogenic bacteria.
Recently, Bougnoux et al. described an MLST system for the opportunistic fungal pathogen Candida albicans (2). Many approaches to strain typing have been developed for this species (32), but no system has yet achieved universal acceptance. Those based on C. albicans nucleotide sequences show greater or lesser diversity of types depending on the extent of conservation of the target sequences. Typing based on the intergenic transcribed spacer sequences of genes encoding rRNA tends to differentiate isolates into a very small number of major subclasses (22, 33), and similar conserved sequences have also been used for differentiation at the species level (4). By contrast, DNA fingerprints revealed with oligonucleotide probes for widely dispersed repeat sequences in the C. albicans genome show a great diversity of strain types (28, 29) and are even capable of revealing minor genomic adaptations to host microenvironments, a process known as microevolution (25, 31). MLST, based on allelic variation in the nonconserved portion of unrelated genes, aims to provide characterization that is sufficiently conservative to be robust and reproducible but provides levels of discrimination appropriate for the purposes both of investigation of clinically relevant problems, such as epidemic outbreaks of infection and resistance to antifungal agents, and of population analyses.
Most MLST schemes described to date have been for haploid microorganisms. The permanently diploid chromosome complement in C. albicans allows extra differentiation of isolates in this species because MLST data for some isolates may show two bases at the same variable site, indicative of the presence of two different alleles in these diploid organisms (2).
To optimize and validate MLST for typing C. albicans, we analyzed results with 75 C. albicans isolates for sequences of the most discriminatory genes described by Bougnoux and colleagues (2) and of four other sequences chosen on the basis that the encoded enzymes were shown previously to be polymorphic in multilocus enzyme electrophoresis analyses (26). Our results allow us to propose an improved gene set for C. albicans MLST, and we demonstrate that MLST data for C. albicans showing two bases at a single site indeed represent allelic heterozygosity and not sequencing errors. Finally, we included two non-MLST, DNA-based typing characters in our typing scheme. The first is the subdivision of isolate types as described by McCullough et al., based on sequence variation at the transcribed spacer locus of the 25S rRNA gene (22), which divides the species into three major subtypes of epidemiological significance (21). The second is the determination of homozygosity or heterozygosity at the mating type-like (MTL) locus, originally described by Hull and Johnson (13), which is emerging as being associated with important properties such as antifungal resistance (27) and rapid phenotypic switching (17). We propose a unified scheme for gene sequence-based strain typing of C. albicans that is portable, reproducible, and discriminatory.
| MATERIALS AND METHODS |
|---|
The remaining 45 isolates were chosen to represent different degrees of genetic and phenotypic diversity based on their date, anatomical site, and geographical source of isolation. The set included 13 vaginal isolates originating from the United States in 1998, intended to represent isolates with a constant anatomical location. The set of 45 isolates plus SC5314, CAF2, T26, and one representative each from the two sequential series constituted our main set of 50 notionally diverse strain types. The yeasts were maintained on Sabouraud agar (Oxoid, Basingstoke, United Kingdom).
Choice of loci for MLST. Initially, 10 gene fragments were chosen: five that gave the greatest MLST discrimination in the hands of Bougnoux et al. (2) and five that corresponded to a subset of the 13 C. albicans housekeeping genes that were previously shown to be polymorphic in multilocus enzyme electrophoresis experiments (26). We reduced the set to a total of eight gene fragments that gave good discrimination in pilot experiments with 20 isolates of C. albicans (Table 1). Use of these two sets of four gene fragments allowed us to make direct comparisons of the results obtained from our full panel of isolates with those for the same four gene fragments already published (2). Primers were designed to amplify gene fragments of 450 to 750 bp and are also detailed in Table 1. The primers 5'-ACTCAAGCTAGATTTTTGGC-3' (forward) and 5'-CAGCAACATGATTAGCCC-3' (reverse), which are specific for the AAT1a region of the AAT1 gene upstream of the conserved region, were used for experiments to investigate heterozygosity in MLST.
Amplification and nucleotide sequence determination. PCR assays were used to amplify the gene fragments. Reaction volumes of 50 µl contained 100 ng of genomic DNA, 2.5 U of Pfu DNA polymerase (Promega, Madison, Wis.), 5 µl of 10x buffer (supplied with the enzyme), 200 µM deoxynucleoside triphosphate mix (Promega) and 10 µM each of the forward and reverse primers. A Flexigene thermocycler (Techne, Cambridge, United Kingdom) was set up with a first cycle of denaturation for 2 min at 94°C, followed by 25 cycles of denaturation at 94°C for 1 min, annealing at 52°C for 1 min, elongation at 72°C for 1 min, and a final extension step of 10 min at 72°C. The amplified products were purified with a commercial PCR purification system (Wizard PCR preps DNA purification system; Promega, Southampton, United Kingdom). Both strands of purified gene fragments were sequenced on an ABI (Foster City, Calif.) 3700 DNA analyzer with a 2.5 µM concentration of the same primers that were used in the PCR step. The sequence data were coupled with DNAStar software. Heterozygosities were defined by the presence of two coincident, equivalently sized peaks in the forward and reverse sequence chromatograms. The one-letter code for nucleotides from the International Union of Pure and Applied Chemistry nomenclature was used to define the results.
Statistical analysis of MLST data. To determine similarities between MLST strain types, the nucleotides at all 87 polymorphic loci found for the eight gene fragments and 50 notionally diverse isolates were scored for each pair of isolates as 1.0 for identical nucleotides, 0.5 for heterozygous or homozygous pairs that shared one nucleotide, and 0.0 for nonidentical nucleotides. A similarity matrix was generated by adding the 87 scores for each pair of isolates and dividing the result by 87. Each pair of isolates was therefore assessed by a similarity index between 0 (complete nonidentity) and 1.0 (complete identity). To represent the matrix in two dimensions, a dendrogram was constructed by the unweighted pair group method with arithmetic mean (UPGMA) with the aid of Mega software (http://www.megasoftware.net/). The same software was used to construct a neighbor-joining tree.
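The pairwise scoring can be sketched in Python as follows. This is a minimal illustration, not the software used in the study: it assumes each isolate's polymorphic sites are stored as a string of IUPAC one-letter codes, that identical codes score 1.0, that differing codes sharing a base score 0.5, and that all other pairs score 0.0. The function names and the four-site example profiles are hypothetical.

```python
# IUPAC one-letter codes mapped to the bases they represent
# (only homozygous and two-base heterozygous codes are needed here).
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}


def site_score(code_a, code_b):
    """Score one polymorphic site for a pair of isolates."""
    bases_a, bases_b = IUPAC[code_a], IUPAC[code_b]
    if bases_a == bases_b:
        return 1.0   # identical nucleotides
    if bases_a & bases_b:
        return 0.5   # the two isolates share one nucleotide
    return 0.0       # nothing in common


def similarity(profile_a, profile_b):
    """Sum the site scores and divide by the number of sites."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same polymorphic sites")
    total = sum(site_score(a, b) for a, b in zip(profile_a, profile_b))
    return total / len(profile_a)


def similarity_matrix(profiles):
    """Return {(name_1, name_2): similarity} for every pair of isolates."""
    return {
        (p, q): similarity(profiles[p], profiles[q])
        for p in profiles
        for q in profiles
    }


# Hypothetical four-site profiles (the study scored 87 sites per isolate).
profiles = {"iso1": "GAAC", "iso2": "RAAY", "iso3": "AGGT"}
matrix = similarity_matrix(profiles)
```

A matrix of this form is the input that a clustering method such as UPGMA would take; the clustering step itself is not shown.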
To determine closely related genotypes within highly similar clusters, the data were subjected to a Burst analysis (http://www.mlst.net/Burst/burst.htm) that was devised originally for analysis of bacterial MLST data (9, 20). The software determines clonal complexes from the data, suggesting a consensus "ancestral" type which contains the most-represented identical loci for a subgroup of isolates and indicates variants differing at just one, two, or three loci. The results are displayed as concentric circles, with the consensus strain type in the center and each new circle indicating isolates that differ from the consensus sequence by one nucleotide for each circle. The default design of the software analyzes seven gene fragments for five groups. We set our data input to accommodate eight gene fragments and scrutinized the results with group settings of three, four, five, and six. The five-group and six-group settings gave identical and interpretable results.
The discriminatory power (D) of the MLST system was determined by the formula of Hunter (14).
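A minimal sketch of this calculation follows, assuming the cited formula is the Simpson-type index of diversity widely used for typing schemes, D = 1 - [1/(N(N-1))] Σ n_j(n_j-1), where N is the number of isolates and n_j is the number of isolates of type j. The type labels are placeholders; the group sizes (43 singletons, two pairs, and one triad) follow the 46 types among 50 isolates reported in the Results.

```python
from collections import Counter


def discriminatory_power(type_labels):
    """Simpson-type index of discrimination for a list of type labels."""
    n = len(type_labels)
    if n < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(type_labels).values()
    # Probability that two isolates drawn without replacement share a type.
    same_type = sum(c * (c - 1) for c in counts) / (n * (n - 1))
    return 1.0 - same_type


# Placeholder labels: 43 unique types, two pairs, and one triad (50 isolates).
labels = (
    [f"single_{i}" for i in range(43)]
    + ["pair_1"] * 2
    + ["pair_2"] * 2
    + ["triad"] * 3
)
d_value = discriminatory_power(labels)
print(round(d_value, 3))  # 0.996
```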
Investigation of heterozygous loci. The accuracy of determination of heterozygosity was ascertained with experiments done with the AAT1a fragment (478 bp) from C. albicans 76/002, which showed four putatively heterozygous sites. It was cloned in the pGEM-T Easy vector system (Promega). Because Pfu polymerase generates blunt ends, the PCR product was A-tailed by incubating 5 µl of the purified gene fragment at 70°C for 30 min in the presence of 1 µl of 10x buffer, 1 µl of 25 mM MgCl2, 1 µl of 10 mM dATP, and 5 U of Taq polymerase. The DNA ligation reaction mix comprised a 10-µl volume containing 1 µl of the A-tailed PCR product, 1 µl of 50-ng/µl pGEM-T vector, 5 µl of 2x rapid ligation buffer (supplied with the enzyme), and 3 U of T4 DNA ligase (Promega), and the mixture was incubated at 4°C overnight. Then 5 µl of ligation mix was used to transform Escherichia coli XL-1-Blue competent cells, following the method described by Hanahan (12). Transformed cells were subsequently plated on Luria-Bertani plates with ampicillin, 5-bromo-4-chloro-3-indolyl-ß-D-galactopyranoside, and isopropylthiogalactopyranoside (IPTG) (12).
Plasmid DNA was extracted from six colonies of E. coli that grew on this medium with the Qiaprep Spin Miniprep kit (Qiagen, West Sussex, United Kingdom), following the manufacturer's protocol. The presence of the expected gene fragments was separately checked by digesting the plasmid DNA obtained from the six selected clones with EcoRI (New England Biolabs, Beverly, Mass.) and by PCR amplifying the AAT1 fragment from plasmid DNA. Plasmid DNA obtained from the six clones was then sequenced as previously described, with a 5 µM concentration of the same primers that were used in the PCRs.
Second, we used MspI to digest the AAT1a fragment from C. albicans 76/002. Sequence analyses had shown that the heterozygosities in this PCR product should result in the creation of an MspI restriction site in one of the alleles at polymorphic site 6. No MspI restriction site was found anywhere else in the whole gene fragment. The gene fragment was digested for 4 h at 37°C with MspI (New England Biolabs) in a 30-µl reaction volume containing 5 µl of the PCR product, 3 µl of 10x buffer 2 (supplied with the enzyme), and 1.5 µl of 20-U/µl MspI. Digestion products were loaded onto a 1.8% agarose gel containing ethidium bromide (0.5 µg/ml). TAE (40 mM Tris acetate [pH 8.0], 1 mM EDTA) was used as the running buffer, and a 100-bp DNA ladder (Promega) was used as molecular size markers. DNA bands were visualized by UV transillumination.
Additional strain typing characters. PCR for MTL status used the primers Fwd (5'-GAATTCACATCTGGAGGC-3') and Rev (5'-CAAAGCAGCCAACTCAGG-3') for MTLα and Fwd (5'-ACCTGCATGAAGAAACAG-3') and Rev (5'-GTGGCTAGGTTGAATTTG-3') for MTLa. Conditions were as described above, but 50-µl multi-PCR volumes contained 100 ng of genomic DNA, 2.5 U of Taq polymerase (Promega), 5 µl of 10x magnesium-free buffer, 3 µl of 25 mM MgCl2, 200 µM deoxynucleoside triphosphate mix, and 5 µM each of the forward and reverse primers. PCR for the rRNA gene transcribed spacer region was done as previously described (22).
| RESULTS |
|---|
For the previously studied gene fragment derived from CaVPS13, we found a further four polymorphic sites upstream and one further site downstream of the portion of sequence already published (2). For CaSYA1, an additional two polymorphic sites upstream of the published sequence were found, and for CaRPN2 one more polymorphic site was revealed downstream of the published sequence. The data for polymorphic nucleotide sites in Table 1 for the four gene fragments already published are limited to the range of the published sequences and do not include these extra polymorphisms. The results show that the isolates that were investigated revealed two new polymorphic sites (positions 157 and 350) within the published range for CaADP1 and two (positions 32 and 307) for CaSYA1.
Nucleotide polymorphisms and amino acid changes. To investigate the impact of nucleotide polymorphisms on amino acid sequence, we mapped the triplet codons for each gene fragment, based on the genomic information available from the Stanford (http://www-sequence.stanford.edu/group/candida/), Galar Fungail (http://www.pasteur.fr/recherche/unites/Galar_Fungail/), and Minneapolis (http://alces.med.umn.edu/bin/genelist?genes) C. albicans genome databases. While most of the polymorphisms were synonymous, 35 (40%) of the 87 individual changes recorded were nonsynonymous. Details of the alterations are shown in Table 2. Of the 35 amino acid changes, 19 were substantive changes, e.g., basic to acidic side chains, aliphatic to aromatic side chains, etc. In two cases, the change was between proline and serine.
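Classifying a single-base change as synonymous or nonsynonymous can be sketched as below. This is an illustrative sketch and not the procedure used in the study: the function name and example codons are hypothetical, and the codon table is built on the assumption that C. albicans follows the alternative yeast nuclear code, in which CTG is read as serine rather than leucine.

```python
from itertools import product

# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
# Assumption: alternative yeast nuclear code, where CTG encodes serine.
CODON_TABLE["CTG"] = "S"


def classify_change(codon, position, new_base):
    """Return (old_aa, new_aa, label) for a single-base change in a codon."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    old_aa, new_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    label = "synonymous" if old_aa == new_aa else "nonsynonymous"
    return old_aa, new_aa, label


# Hypothetical examples: a proline-to-serine change and a silent change.
print(classify_change("CCT", 0, "T"))  # ('P', 'S', 'nonsynonymous')
print(classify_change("GGT", 2, "C"))  # ('G', 'G', 'synonymous')
```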
C. albicans isolate 76/002 had four potentially heterozygous sites in the AAT1a sequence (genotype 1 in the Appendix). At position 40, the computer analysis of the sequence showed two equally sized peaks for adenine and guanine only for the reverse direction (Fig. 1a), while the putative heterozygosity was detected by sequencing in both directions at nucleotide 124 (Fig. 1b). The putative heterozygosities at loci 7 and 89 were of the clear double peak variety shown in Fig. 1b. Six colonies of E. coli transformed with the cloned AAT1a PCR fragment were selected randomly for plasmid DNA extraction and sequencing analysis. The results obtained showed that four of the six clones, each theoretically containing one of the two alleles of isolate 76/002, carried the bases G, A, A, and C at the four polymorphic sites, while for the other two clones the bases A, G, G, and T were identified in the same nucleotide positions.
Validation of MspI polymorphism. One of the putative sequence heterozygosities (position 124, Y = C or T) observed in the PCR product in C. albicans 76/002 created an MspI restriction site (CCGG) in one of the alleles. Since no MspI restriction sites were found anywhere else in the gene fragment, the PCR product was digested with MspI. As shown in Fig. 2, the digested products confirmed that the allele with the MspI restriction site gave the two predicted DNA fragments of 305 and 173 bp, while the other one remained undigested, as evidenced by the DNA band of 478 bp. Therefore, sequencing errors were unlikely to account for the polymorphisms observed in MLST for this diploid organism.
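The expected banding pattern can be illustrated with a small in silico digest. The sketch below is a simplification under stated assumptions: MspI is taken to cut after the first C of CCGG, and the two 478-bp alleles are synthetic placeholder sequences, not the AAT1a sequence, with the site placed so that the cut allele yields the reported 305- and 173-bp fragments. Which fragment lies upstream is also an assumption.

```python
def digest(sequence, site="CCGG", cut_offset=1):
    """Return fragment lengths after cutting at every occurrence of the site.

    cut_offset is the number of bases of the recognition site that remain
    on the upstream fragment (1 for a cut between the first two bases).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(len(sequence) - start)
    return fragments


# Synthetic placeholder alleles, 478 bp each (not the real AAT1a sequence).
allele_with_site = "A" * 304 + "CCGG" + "A" * 170
allele_without_site = "A" * 304 + "CTGG" + "A" * 170

print(digest(allele_with_site))     # [305, 173]
print(digest(allele_without_site))  # [478]
```

A heterozygous isolate would therefore be expected to show all three bands (478, 305, and 173 bp) in the same lane, which is the pattern described for isolate 76/002.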
The majority of the isolates (40 of 50) were genotype A by the transcribed spacer element PCR. There were seven type B strains and three type C strains. These characters are shown, together with the MLST data, in Table 3.
Epidemiological relationships of isolates typed. In Table 3 the diversity of genotypes detected across all eight MLST fragments is shown for all 50 isolates tested. For the four previously studied gene fragments (2), the published genotype numbers are used, and additional, higher numbers are assigned for the novel genotypes that we found in this study (Table 3). The final column, which gives the diploid sequence types (DSTs; the combination of genotypes from all individual gene fragments), indicates that 46 unique types were found among the 50 isolates examined. Two identical DSTs were found for SC5314 and its derivative CAF2. An oral isolate, 78/028, from a healthy volunteer first cultured in 1978, and J981305, from a patient with vaginitis in the United States, were found to be identical to type 22. Three isolates, two from different U.S. patients with vaginitis and an isolate from a penis obtained 17 years earlier in the United Kingdom, shared DST 28.
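Combining the per-fragment genotype numbers into a DST amounts to treating each isolate's eight genotype numbers as one composite key. A minimal sketch, assuming DST numbers are simply assigned in order of first appearance (the numbering used in Table 3 may differ); the isolate names and genotype numbers are hypothetical.

```python
def assign_dsts(profiles):
    """Give each distinct genotype profile a DST number, in order of first appearance."""
    dst_of_profile = {}
    dst_of_isolate = {}
    for isolate, profile in profiles.items():
        key = tuple(profile)
        if key not in dst_of_profile:
            dst_of_profile[key] = len(dst_of_profile) + 1
        dst_of_isolate[isolate] = dst_of_profile[key]
    return dst_of_isolate


# Hypothetical genotype numbers for eight gene fragments (not from Table 3).
profiles = {
    "isolate_A": (1, 2, 1, 3, 1, 1, 2, 4),
    "isolate_B": (1, 2, 1, 3, 1, 1, 2, 4),  # identical to isolate_A
    "isolate_C": (1, 2, 1, 3, 1, 1, 2, 5),  # differs at one fragment
}
dsts = assign_dsts(profiles)
unique_types = len(set(dsts.values()))
```

Isolates that share every genotype number receive the same DST, while a difference at any single fragment produces a new one.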
A UPGMA similarity dendrogram from the data for the 50 isolates tested was generated with Mega software (Fig. 3). The two pairs and triad of isolates with identical MLST types, together with a further 19 isolates, clustered in a single group (bracketed in Fig. 3) with >84% identity by this method of analysis. The remaining isolates generally showed a higher level of diversity. Analysis of the strain types by means of a neighbor-joining tree similarly clustered the same 26 isolates bracketed in Fig. 3 into a single, highly related group. Nine of the 13 vaginal isolates obtained from the United States in 1998 fell into the highly similar clusters in both analyses, with the remainder showing little relation to each other.
| DISCUSSION |
|---|
Data for highly related isolates show that MLST gives high reproducibility within and between laboratories. Isolate SC5314 was also tested by MLST in the study by Bougnoux and colleagues; the SC5314 genotypes found by them (2) were identical to those we determined with the same DNA fragments. CAF2, derived from SC5314 by deletion of one copy of the URA3 gene, also gave the same result. This consistency is a demonstration of the power and reproducibility of MLST applied to C. albicans. Unlike some typing systems, in which reproducibility and discriminatory power are inversely related (14), our data show both 100% reproducibility and a discriminatory power of 0.996 (14). Moreover, sequential isolates from each of two patients gave identical MLST types in one instance and types that differed by only a single nucleotide in the second case. These findings demonstrate that MLST with the gene fragments used in this study can recognize isolates of C. albicans that are identical or nearly so and may be detecting minor microevolutionary changes, known to occur in longitudinal studies with repeated isolates from the same patient (16, 25, 31).
That such changes may occur even in laboratory isolates is exemplified by the finding of four nucleotide differences between strain T26 and its parents SC5314 and CAF2. T26 was engineered from SC5314 by a series of changes that included gene disruptions and selection for spontaneous resistance to echinocandins (6, 15). We conclude that one or more of the steps in the lineage of T26 resulted in the small sequence differences detected in this study. All four nucleotide changes in strain T26 occurred in a single strand of the diploid DNA in the AAT1 gene, since each involved loss of nucleotide heterozygosity.
Among the amino acid changes resulting from 40% of the nucleotide polymorphisms, many involved switches between types of amino acid, including two instances of proline to serine (Table 2), which would be expected to effect significant alterations in secondary and higher peptide structures. If this level of sequence change is representative of variation for the products of other C. albicans genes, it must be concluded that the fungus is tolerant of the observed differences in protein structure. It is likely that many subtle phenotypic differences between strains may exist that have yet to be detected. The high level of genotype-phenotype differences possible between strains of a yeast species was indicated in a recent study based on expression profiling with a fresh, wild-type isolate of Saccharomyces cerevisiae and a laboratory-maintained isolate. This analysis showed that 1,500 of the 6,116 genes in the yeasts differed in levels of expression (3). To what extent the differences in expression are reflected as differences in the genome sequence has yet to be determined.
The findings of this study indicate that the set of genes listed in Table 1 is adequate for high-quality strain typing by MLST. For population genetic analyses and other statistical approaches to the epidemiological study of C. albicans, this larger set of MLST fragments should represent a more discriminatory tool than the six-fragment set already proposed (2). It is notable that the sequence differences between SC5314 and T26 would not have been detected with the published six-fragment set. These two strains are not the only examples of isolates whose types were indistinguishable by MLST with the four published gene fragments but could be distinguished by the new gene fragments. Conversely, some strains could be distinguished by the published gene fragments but not by the new ones.
So far, C. albicans is the only example for which MLST has been attempted with a species having a permanently diploid genome, where heterozygous alleles may occur. The frequency with which heterozygous sites occur in MLST with C. albicans is high, both in our own and in the previous study of MLST (2). The present study investigated the possibility that apparent heterozygosity may arise through sequencing artifacts and confirmed it to be genuine. Heterozygosity in a diploid genome adds extra characters for strain discrimination over simple sequence variation and will allow future population genetic analyses based on haplotypes. The relative frequencies of clonal reproduction and sexual or other recombinational events in natural populations of the fungus remain an open question (1, 10, 11, 16, 18, 26, 30, 34).
The addition of two non-MLST characters to a C. albicans strain-typing scheme adds extra discriminatory detail. Data from surveys based on the system that divides C. albicans into three subtypes, A, B, and C, based on sequence differences in the transcribed spacer region in the gene encoding rRNA have already been used to demonstrate geographical differences in C. albicans isolate populations (21). This approach also allows direct, unequivocal recognition of the species C. dubliniensis (22). In common with McCullough et al. (21), we found most of our isolates were type A by rDNA PCR. Of note, all the isolates in the cluster of highly related strains (Fig. 3) were type A. The level of discrimination possible by MLST clearly exceeds that of ribosomal DNA typing, since we could distinguish 36 distinct strains among the 40 designated as type A. All the isolates of type B and type C strains could be differentiated by MLST.
Determination of mating type in C. albicans isolates is of possible relevance to antifungal resistance (27) and to phenotypic switching in this fungus (17). In common with Rustad and colleagues (27), we found that only a minority of isolates were homozygous at MTL. They found 12 homozygous isolates among 96 tested (12.5%); we found 4 among 50 (8%). Even if the frequency of homozygous mating types in the clinical population of C. albicans is only on the order of 10%, this may be adequate to explain small departures from clonality in the Hardy-Weinberg equilibrium analyses that have been described previously for this species (11, 26).
We conclude that MLST with C. albicans offers an effective system for epidemiological work with the species, that the high frequency of heterozygous sequences in the DNA regions chosen for MLST adds extra information to MLST that is not available with haploid organisms, and that the creation of a central database for archiving of MLST data will enhance research based on strain typing. Although at present MLST is more likely to constitute a research rather than a reference tool, MLST has the advantage that it is scalable from a small number of isolates to many hundreds or even thousands of isolates by the exploitation of robotic DNA extraction and high-throughput nucleotide sequence determination technologies. The application of high-throughput technology also leads to substantial reductions in the cost of isolate characterization. For example, in this study full MLST profiles were obtained for a consumables cost of approximately US$30 per isolate; however, with recently developed DNA analyzers, a cost of less than US$15 per isolate is now attainable, with the prospect of further substantial reductions in cost in the near future. Such automation will not only reduce costs but also increase throughput of the method. At present, at least 24 isolates per week can be typed with all procedures done by hand, but automation of the processes will increase this number to hundreds per week.
Inclusion of the MTL status and genotype (A, B, or C) as extra typing data further refines the ability of DNA-based methods to distinguish isolates of C. albicans. The findings of this study and the previous investigation (2) show a high level of sequence variation in transcribed housekeeping genes among isolates of C. albicans. We are now establishing MLST for other Candida species and investigating the frequencies of sequence changes in isogenic strains of C. albicans exposed to various conditions in vitro and in vivo.
| APPENDIX |
|---|
| ACKNOWLEDGMENTS |
|---|
We thank Amanda Davidson for excellent technical assistance; Merck, Inc., for strain T26; and the colleagues who have supplied clinical isolates of C. albicans to our collection over the last 30 years.