Journal of Clinical Microbiology, December 2004, p. 5624-5635, Vol. 42, No. 12
0095-1137/04/$08.00+0 DOI: 10.1128/JCM.42.12.5624-5635.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Department of Biology, Duke University,1 Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina,3 Institut für Mikrobiologie und Hygiene (Charité Hospital), Humboldt Universität, Berlin, Germany2
Received 2 March 2004/ Returned for modification 2 April 2004/ Accepted 13 August 2004
The order Saccharomycetales contains many species of practical importance and scientific interest, including several species that are pathogenic for humans, such as Candida parapsilosis, Candida tropicalis, and Candida albicans, which is the most common human pathogenic fungus (4). Other species of the Saccharomycetales are exploited by industry to produce secondary metabolites and fermentative by-products (23). During the translation of mRNA to polypeptides, several species of Candida exhibit alternative codon usage. The reassignment of the codon CUG from leucine to serine was first described by Kawaguchi et al. (16) for Candida cylindraceae. Sugita and Nakase (44) applied phylogenetic methods to investigate relationships among Candida species with alternative use of the CUG codon. They showed that only 11 of 78 species of Candida used CUG as a codon for leucine. The remaining 67 species translated CUG as serine, but they did not form a monophyletic group. The authors suggested a correlation between codon reassignment and coenzyme Q9 (Co-Q9) as the predominant ubiquinone in these species based on chemotaxonomy. Co-Q is a mitochondrial electron carrier with various numbers of isoprene units. The length of the isoprene chain is usually consistent within a monophyletic group (49) and has therefore been used in yeast taxonomy.
The order Saccharomycetales is purported to comprise 11 families and 55 genera (23). However, the family assignments of several genera of budding yeasts, e.g., Pichia and Debaryomyces, remain questionable (23). Because of a lack of distinctive morphological characters, molecular methods are invaluable for clarifying the phylogenetic relationships among ascomycetous yeast species. Previous phylogenies were based on single genes (7, 21, 22), sampled only one family (25), or were restricted to the most common pathogenic species of Candida (14, 30, 31, 50). The single-gene studies have focused on the actin gene (ACT1) (7) or the large (26S rDNA) or small (18S rDNA) ribosomal subunit (21) and generally lacked good statistical support, especially for relationships along the backbone of the trees. Relationships among different genera or families of the Saccharomycetales were rarely resolved. Recently, Kurtzman and Robnett (25) presented a multigene phylogeny based on three nuclear rDNA genes (18S rDNA, 26S rDNA, and ITS [internal transcribed spacer]), three protein-coding genes (EF-1α, ACT1, and RPB2), and two mitochondrial genes (small-subunit rDNA and COX2). This impressive study included 75 species but focused on members of the "Saccharomyces complex" and included only one pathogenic species of Candida (25).
The major goal of the present study was to generate a multigenic phylogeny of the families of Saccharomycetales that included the clinical and related species. We used maximum likelihood inference and a Bayesian Metropolis-coupled Markov chain Monte Carlo analysis to evaluate the statistical support for the observed evolutionary relationships. We used the phylogeny to examine the evolution of alternative codon usage and isoprene chain length in Co-Q. Although others have correlated these two traits (44), our extensive data set of DNA sequences allowed the application of different statistical methods of character mapping, reconstruction of the ancestral character state, and evaluation of correlated character evolution. We used six nuclear genes (four encoding proteins, 18S rDNA, and 26S rDNA) and included 38 species from 5 of the 11 families recognized by Kurtzman and Fell (23). Several criteria guided the selection of strains: (i) a broad taxonomic sampling that included environmental and clinical isolates, (ii) taxa with different Co-Q systems, and (iii) Candida species with reassigned codon usage. We selected strains that permitted each of these features to be analyzed independently and allowed evaluation of putative associations among them.
TABLE 1. Taxa and DNA sequences used in this investigation
TABLE 2. Primers used for PCR and cycle sequencing
(ii) Tests for substitutions and saturation of codon positions. All three codon positions in the protein-coding genes were tested independently for saturation by plotting genetic distances estimated under a two-parameter model (F84; described by Kishino and Hasegawa [18]) against the uncorrected p-distance estimates (20). Deviation from a 1:1 ratio of the two distances was evaluated visually.
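The saturation check described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the study: it substitutes the Kimura two-parameter (K80) correction for the F84 model, and the function names and toy alignment are hypothetical. For each pair of taxa and a chosen codon position it returns the uncorrected p-distance together with the corrected distance, so that the pairs can be plotted and compared with the 1:1 line.

```python
import math
from itertools import combinations

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def site_differences(seq_a, seq_b):
    """Return (P, Q): proportions of transitions and transversions over sites
    where both sequences carry an unambiguous base."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    ts = sum(1 for pair in pairs if pair in TRANSITIONS)
    tv = sum(1 for a, b in pairs if a != b and (a, b) not in TRANSITIONS)
    return ts / len(pairs), tv / len(pairs)

def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites."""
    P, Q = site_differences(seq_a, seq_b)
    return P + Q

def k80_distance(seq_a, seq_b):
    """Kimura two-parameter distance (stand-in for F84, an assumption of this sketch).
    Returns infinity when the correction is undefined (fully saturated pair)."""
    P, Q = site_differences(seq_a, seq_b)
    arg1, arg2 = 1 - 2 * P - Q, 1 - 2 * Q
    if arg1 <= 0 or arg2 <= 0:
        return math.inf
    return -0.5 * math.log(arg1) - 0.25 * math.log(arg2)

def codon_position(seq, pos):
    """Extract every third base, starting at codon position pos (1, 2, or 3)."""
    return seq[pos - 1::3]

def saturation_points(alignment, pos):
    """(p-distance, corrected distance) for every pair of taxa at one codon position."""
    points = []
    for (_, sa), (_, sb) in combinations(alignment.items(), 2):
        a, b = codon_position(sa, pos), codon_position(sb, pos)
        points.append((p_distance(a, b), k80_distance(a, b)))
    return points

# Hypothetical in-frame toy alignment, for illustration only.
toy = {
    "taxonA": "ATGGCTGATTCGAAAGTT",
    "taxonB": "ATGGCCGATTCAAAGGTC",
    "taxonC": "ATGGCAGACAGTAAAGTG",
}
for p, d in saturation_points(toy, 1):
    print(f"p-distance={p:.3f}  corrected={d:.3f}")
```

Points that fall well above the 1:1 line (corrected distance much larger than the p-distance) indicate that multiple substitutions are hidden at that codon position.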
(iii) Data set partitioning, maximum likelihood, and Bayesian analyses. To appropriately model nucleotide substitutions, the data set was partitioned in several ways. In addition to analyzing all the data together, we analyzed separately the data for each gene. Each of the four protein-coding genes was analyzed for differences in the first two codon positions and the third codon position. The sequence data were also partitioned into protein-coding and rDNA genes, and the resolving powers of these data sets were compared by using maximum likelihood.
We initially analyzed each gene of the Saccharomycetales data set by using MrBayes v3.0B4 (12). Each protein-coding gene was partitioned into two character sets (first and second codon positions versus the third codon position), and the best-fitting evolutionary model was determined for each partition by using Modeltest 3.06 (40) (Table 3). Each Bayesian analysis consisted of six runs of 2,000,000 generations, each using the default, uniform priors, and a sample frequency of 100. Likelihood scores of each sampled generation were plotted by using Excel (Microsoft Corp.) and visually analyzed. The trees collected before the stationary phase of the chain was reached were discarded. The trees remaining from each of the six runs were combined, and a 95% consensus tree for each gene was generated by using PAUP* 4.0b10 (45). Consensus trees for the six different genes were then compared for topological congruence, as described by Kauff and Lutzoni (15). Taxa were considered to be in conflict when they showed different relationships in two genes, supported by posterior probabilities of ≥95%.
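The bookkeeping behind this step (sampling, discarding the pre-stationary trees, pooling the runs, and scoring clades) can be sketched as follows. This is an illustrative sketch only: trees are represented as sets of clades, each clade a frozenset of taxon names, and the burn-in cutoff is passed in as a number, whereas the study chose it by visual inspection of the likelihood plots. All names and the toy data are hypothetical.

```python
from collections import Counter

def samples_per_run(generations, sample_every):
    """Number of trees sampled in one run, not counting the starting state."""
    return generations // sample_every

def pool_after_burnin(runs, burnin):
    """Drop the first `burnin` sampled trees of every run and pool the rest."""
    pooled = []
    for run in runs:
        pooled.extend(run[burnin:])
    return pooled

def clade_posteriors(trees):
    """Posterior probability of a clade = fraction of pooled trees containing it."""
    counts = Counter()
    for tree in trees:
        counts.update(set(tree))
    return {clade: n / len(trees) for clade, n in counts.items()}

def supported(posteriors, threshold=0.95):
    """Clades whose posterior probability reaches the threshold."""
    return {clade for clade, p in posteriors.items() if p >= threshold}

# Six runs of 2,000,000 generations sampled every 100 generations.
print(samples_per_run(2_000_000, 100), "sampled trees per run")

# Hypothetical toy data: two short runs over taxa A-D.
ab, cd, ac = frozenset("AB"), frozenset("CD"), frozenset("AC")
run1 = [{ac}, {ab, cd}, {ab, cd}, {ab}]
run2 = [{ac}, {ab, cd}, {ab, cd}, {ab, cd}]
pooled = pool_after_burnin([run1, run2], burnin=1)
post = clade_posteriors(pooled)
print(supported(post))
```

Two single-gene consensus trees would then be flagged as conflicting for a taxon only when each gene supports a different placement at or above the threshold.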
TABLE 3. Overview of evolutionary substitution models applied to combined- and single-gene analyses and to different codon positions in the protein-coding genes
Maximum likelihood analyses were applied to the combined data for all the Saccharomycetales in a homogeneous analysis as described above. We also conducted two heterogeneous Bayesian analyses using the same search options and combined model settings that were used for the single-gene analyses. In the first heterogeneous analysis, each of the six genes was accommodated with a different model (Table 3) without consideration of different codon positions ("6-model analysis"). For the second heterogeneous analysis, separate models were applied to the combined first and second codon positions and to the third codon positions of the four protein-coding genes ("10-model analysis"); the 18S and 26S rDNA data sets were not further partitioned (Table 3). We also compared the protein-coding and rDNA genes in separate heterogeneous analyses. Different evolutionary models were applied to the codon positions in the combined protein-coding genes as described above ("8-model analysis"). The rDNA genes were analyzed with one model for the 18S rDNA and another for the 26S rDNA ("2-model analysis").
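The names of the four heterogeneous analyses follow directly from how the data are partitioned. The sketch below enumerates the partitions for each scheme. The gene names are those listed for the combined analysis, while the function names and partition labels are hypothetical.

```python
PROTEIN_GENES = ["ACT1", "EF2", "RPB1", "RPB2"]
RDNA_GENES = ["18S rDNA", "26S rDNA"]

def gene_partitions(genes):
    """One partition (and therefore one model) per gene."""
    return [(gene, "all sites") for gene in genes]

def codon_partitions(genes):
    """Two partitions per protein-coding gene: positions 1+2 and position 3."""
    parts = []
    for gene in genes:
        parts.append((gene, "codon positions 1+2"))
        parts.append((gene, "codon position 3"))
    return parts

SCHEMES = {
    "6-model": gene_partitions(PROTEIN_GENES + RDNA_GENES),
    "10-model": codon_partitions(PROTEIN_GENES) + gene_partitions(RDNA_GENES),
    "8-model": codon_partitions(PROTEIN_GENES),
    "2-model": gene_partitions(RDNA_GENES),
}

for name, parts in SCHEMES.items():
    print(name, len(parts), "partitions")
```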
A likelihood ratio test was used to test whether the heterogeneous model was a significant improvement over the homogeneous model and to determine which heterogeneous analysis best fit the Saccharomycetales data (8). The likelihood of the observed data and the degrees of freedom were calculated using PAUP* 4.0b10 (45) for the homogeneous analysis and p4 v. 0.80 (9) for the 6- and 10-model analyses. To assess whether the likelihood of the more complex model was a significant improvement, the likelihood ratio was compared to a χ² distribution.
(iv) Reconstruction of the ancestral character states and testing for the correlation of character evolution. To study the evolutionary history of alternative codon usage and number of isoprene chain units in the Co-Q system within the Saccharomycetales, we reconstructed the ancestral character states of these traits. Data for translation of the CUG codon were obtained from the National Center for Biotechnology Information database (http://www.ncbi.nlm.nih.gov) and an earlier study by Sugita and Nakase (44) and scored as leucine (1) or serine (0). Information on the Co-Q system was gathered from Kurtzman and Fell (23), supplemented by the CBS database (http://www.cbs.knaw.nl). For the Co-Q system, we coded Co-Qs with nine isoprene units as 1 and Co-Qs with any other isoprene chain length as 0. Each trait was mapped under the likelihood criterion as implemented in Mesquite v. 1.02 (32) on the maximum likelihood tree of the Saccharomycetales, using inferred branch lengths. The likelihood of the observed character state distributions was calculated by using two models: a one-parameter Markov k-state model (27), which is a generalization of the Jukes-Cantor model (13), and an asymmetrical two-parameter Markov k-state model, which allows two different rates of change (forward and reverse) (36). A likelihood ratio test (8) was applied, and the model with the best fit was chosen for ancestral character state reconstruction. At each node the ratios of likelihoods for both character states (0 and 1) were calculated. A likelihood ratio of at least 7:1 for a given character state at a node was considered to be significant (42).
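The two models compared here differ only in whether the forward and reverse rates are allowed to differ. The sketch below shows the branch-level transition probabilities of a two-state continuous-time Markov model and the arithmetic behind the 7:1 criterion. It is a simplified illustration and not Mesquite's implementation, which sums such probabilities over the whole tree. The function names and the example rates are hypothetical.

```python
import math

def transition_probs(q01, q10, t):
    """2x2 matrix P[i][j]: probability of ending in state j after branch
    length t when starting in state i, for forward rate q01 and reverse rate q10."""
    total = q01 + q10
    decay = 1.0 - math.exp(-total * t)
    p01 = q01 / total * decay
    p10 = q10 / total * decay
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]

def one_parameter_probs(rate, t):
    """One-parameter Markov k-state model for k = 2: both rates are equal."""
    return transition_probs(rate, rate, t)

def proportional_likelihood(ratio):
    """Share of the total likelihood held by the favored state for a given ratio."""
    return ratio / (ratio + 1.0)

def is_significant(lik_state0, lik_state1, min_ratio=7.0):
    """True when one state is at least `min_ratio` times as likely as the other."""
    high, low = max(lik_state0, lik_state1), min(lik_state0, lik_state1)
    return high >= min_ratio * low

# Hypothetical rates and branch length, for illustration only.
print(transition_probs(0.5, 2.0, 0.1))
print(one_parameter_probs(1.0, 0.1))
print(proportional_likelihood(7.0))
```

A 7:1 ratio corresponds to the favored state holding 7/8 (87.5%) of the likelihood at a node, which matches the roughly 87% support threshold used when shading the reconstructed branches.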
To test whether the two characters evolved independently, we used a continuous Markov model in a maximum likelihood framework, as described by Pagel (36, 38, 39) and implemented in the program Discrete (37). Discrete uses the likelihood ratio test to compare likelihoods for a model with independent transition rates for each character and a model where the transition rate for one character is dependent on the state of the other characters. The null hypothesis of independent evolution is rejected if the model of correlated evolution fits the data significantly better than the simpler model of independent evolution.
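The structure of the two competing models can be sketched as rate matrices over the four combined states of two binary characters. This is a minimal illustration assuming the usual formulation of Pagel's method, in which only one character may change in an instant and the dependent model has eight free rates against four for the independent model. It is not the code of the program Discrete, and the function names and rates are hypothetical.

```python
# Combined states (X, Y) of two binary characters.
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]

def dependent_rate_matrix(rates):
    """Build the 4x4 rate matrix from a dict {(from_state, to_state): rate}
    covering the eight single-character changes. Simultaneous changes in both
    characters keep rate 0; diagonals make each row sum to zero."""
    Q = [[0.0] * 4 for _ in range(4)]
    for i, s in enumerate(STATES):
        for j, t in enumerate(STATES):
            changes = (s[0] != t[0]) + (s[1] != t[1])
            if changes == 1:
                Q[i][j] = rates[(s, t)]
        Q[i][i] = -sum(Q[i])
    return Q

def independent_rates(x01, x10, y01, y10):
    """Independent model as a special case: the rate of change in one
    character does not depend on the state of the other (four free rates)."""
    rates = {}
    for s in STATES:
        for t in STATES:
            if s[0] != t[0] and s[1] == t[1]:
                rates[(s, t)] = x01 if s[0] == 0 else x10
            elif s[1] != t[1] and s[0] == t[0]:
                rates[(s, t)] = y01 if s[1] == 0 else y10
    return rates

# Hypothetical rates, for illustration only.
rates = independent_rates(0.2, 0.5, 0.1, 0.4)
Q = dependent_rate_matrix(rates)
for row in Q:
    print(row)
```

Under this assumption the likelihood ratio test compares the best fit with eight free rates against the best fit with four, a difference of four parameters.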
FIG. 1. Single most likely tree based on a combined analysis of the nuclear 18S rDNA and 26S rDNA of 73 ascomycetous taxa, using two basidiomycetes as outgroups. The nodes marking the Ascomycota, Euascomycetes, Archiascomycetes, and Hemiascomycetes as well as many terminal branches are supported by homogeneous Bayesian posterior probabilities ≥95%. Arrows indicate the origins of the three classes of the phylum Ascomycota. The branch length for Y. lipolytica = 0.22906 U of expected substitutions per site.
FIG. 2. Combined maximum likelihood analysis of six genes (ACT1, EF2, RPB1, RPB2, 18S rDNA, and 26S rDNA) for 38 taxa of Hemiascomycetes and two outgroup species, an Archiascomycete (S. pombe) and a Euascomycete (N. crassa). Thickened lines denote heterogeneous Bayesian posterior probabilities ≥95% as calculated in the combined analysis. Node numbers are indicated above each branch. Refer to Table 4 for the statistical support of each node by each gene, as well as the combined multigenic support. Branch lengths leading to Y. lipolytica (0.19157) and the outgroup taxa (0.31967) were shortened to fit the figure. Black lines on the right indicate the three clades recognized in this study.
95%) (28 groups; thick branches in Fig. 2). Although single-gene trees were compatible with the combined phylogeny in almost all cases, only 29 to 64% as many clades received significant support in the single-gene trees compared to the combined analysis. The individual genes and the number of groups with significant support are as follows: RPB1 (18 clades), RPB2 (15 clades), EF2 (10 clades), 18S rDNA (10 clades), 26S rDNA (10 clades), and ACT1 (8 clades). Table 4 lists each node recognized in the combined analysis (Fig. 2) and its statistical support as calculated in the single-gene tree. Twelve nodes were not supported by any of the genes. However, 5 of the 12 (nodes 4, 15, 16, 32, and 35) appeared in the combined analysis with significant support. The remaining seven were recognized but not significantly supported. Fifteen clades were supported when the rDNA genes were combined, but when the four protein-coding genes were analyzed, 35 groups were resolved with statistically significant support. In the combined analysis of protein-coding genes and rDNA genes, 7 of the 35 nodes lost their statistical support. These nodes can be found throughout the tree: node 3 is located on the backbone, nodes 19 and 20 define relationships of groups of 10 and 6 taxa, respectively, and nodes 6, 12, 14, and 24 show sibling relationships between two terminal taxa.
TABLE 4. Single-gene posterior probabilities for nodes in the combined analysis of the Saccharomycetales shown in Fig. 2
Phylogenetic analysis of the combined data sets for Saccharomycetales. (i) Homogeneous and heterogeneous Bayesian analyses of the Saccharomycetales. The maximum likelihood analysis of the combined data for the six genes (5,064 aligned nucleotides) resulted in the single tree shown in Fig. 2 (ln L = -48,580; 8 df). The heterogeneous analyses of the combined data set resulted in trees with ln L = -47,829 (6-model analysis) and ln L = -46,486 (10-model analysis) at 142 and 168 df, respectively. The likelihood ratio for the homogeneous analysis and the 6-model analysis revealed that the heterogeneous model significantly improved the likelihood of the data. The 6-model analysis was then compared with the 10-model analysis. Applying different evolutionary models to codon positions, as in the 10-model analysis, results in the highest likelihood and statistical support for the relationships within the Saccharomycetales.
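The two model comparisons can be reproduced approximately from the rounded values given above. The sketch below takes the log likelihoods as negative numbers and treats the reported df as the number of free parameters of each model; both are assumptions of this illustration. Because the resulting P values are far too small to represent as ordinary floating-point numbers, the upper-tail chi-square probability is computed on the log scale, using the closed form that holds for an even number of degrees of freedom. The function names are hypothetical.

```python
import math

def log_chi2_sf_even_df(x, df):
    """Natural log of P(chi-square with `df` degrees of freedom >= x).
    Uses sf = exp(-x/2) * sum_{i < df/2} (x/2)**i / i!, valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 0.0
    half = x / 2.0
    log_terms = [i * math.log(half) - math.lgamma(i + 1) for i in range(df // 2)]
    largest = max(log_terms)
    log_sum = largest + math.log(sum(math.exp(t - largest) for t in log_terms))
    return -half + log_sum

def likelihood_ratio_test(lnl_simple, df_simple, lnl_complex, df_complex):
    """Return (statistic, degrees of freedom, log P value)."""
    statistic = 2.0 * (lnl_complex - lnl_simple)
    df = df_complex - df_simple
    return statistic, df, log_chi2_sf_even_df(statistic, df)

homogeneous = (-48580.0, 8)
six_model = (-47829.0, 142)
ten_model = (-46486.0, 168)

for label, simple, complex_ in [
    ("homogeneous vs 6-model", homogeneous, six_model),
    ("6-model vs 10-model", six_model, ten_model),
]:
    stat, df, log_p = likelihood_ratio_test(*simple, *complex_)
    print(f"{label}: statistic={stat:.0f}, df={df}, "
          f"significant at 0.05: {log_p < math.log(0.05)}")
```

Both differences in degrees of freedom (134 and 26) happen to be even, which is what makes the closed form applicable here. For odd df a general incomplete-gamma routine would be needed.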
(ii) Phylogenetic relationships among Saccharomycetales. Three major clades were resolved within the Saccharomycetales with strong support. Clade 1 originates with node 29, clade 2 originates at node 16, and clade 3 originates at node 4 (Fig. 2). Yarrowia lipolytica was significantly supported as a sibling species to these three clades. Stephanoascus ciferrii formed the most basal taxon of the order, although the position of S. ciferrii was supported only by the analysis of the protein-coding genes. Clade 1 is comprised of six Candida species (C. albicans, C. dubliniensis, C. maltosa, C. tropicalis, C. viswanathii, and C. parapsilosis) and Lodderomyces elongisporus. They are most closely related to clade 2, which contains a monophyletic clade of the C. guilliermondii complex, the Metschnikowiaceae (i.e., species of Clavispora and Metschnikowia), and the sibling species C. zeylanoides and P. norvegensis. These taxa have a sibling relationship with a monophyletic clade of three Debaryomyces species. However, this association was significantly supported only by the analysis of the protein-coding genes, and there was no support for monophyly of the Debaryomyces clade. Clade 3 contains the Saccharomycetaceae and has a sibling relationship with clades 1 and 2 (Fig. 2). In clade 3, S. cerevisiae appears most closely related to Candida castellii and Candida glabrata. These three species are related to a clade composed of Eremothecium gossypii, Saccharomyces kluyveri, Kluyveromyces lactis, and K. marxianus, which together form the sibling group to Candida norvegica and Pichia jadinii. A clade of Issatchenkia orientalis, Pichia fermentans, and Pichia membranifaciens completes this sample of Saccharomycetaceae.
Ancestral character state evolution.
Ancestral character states for codon reassignment were reconstructed under the asymmetric two-parameter model. The log likelihood calculated under this model (-15.57) was significantly greater than that calculated under the one-parameter Markov k-state model (-20.57) at α = 0.05. The reconstruction indicated that codon reassignment occurred once in the evolutionary history of the Saccharomycetales in the most recent common ancestor of clades 1 and 2 (Fig. 3). The forward transition rate (0 → 1) was calculated to be 0.16, while the backward rate (1 → 0) was 4.26. Losses of the character are therefore more likely than gains. The left panel in Fig. 3 shows the reconstructed character states for each node. Gains or losses were assigned to nodes with statistical support of at least 87% (black and white branches in Fig. 3). A character state could not be assigned unambiguously for nodes with less than 87% support (gray branches). The reconstruction indicates that codon reassignment occurred once, and there were at least five specific losses at the branches leading to Lodderomyces elongisporus, Clavispora opuntiae, Metschnikowia pulcherrima, Pichia norvegensis, and the Debaryomyces species (compare Fig. 2 and 3).
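The model comparison and the rate asymmetry reported above can be checked with a short calculation. The sketch takes the two log likelihoods as negative values (-15.57 and -20.57) and assumes one degree of freedom, since the asymmetric model adds a single rate parameter. For one degree of freedom the chi-square upper-tail probability has an exact expression in terms of the complementary error function. The variable and function names are hypothetical.

```python
import math

def chi2_sf_1df(x):
    """P(chi-square with 1 degree of freedom >= x)."""
    return math.erfc(math.sqrt(x / 2.0))

lnl_one_parameter = -20.57   # symmetric model, single rate
lnl_two_parameter = -15.57   # asymmetric model, forward and backward rates

statistic = 2.0 * (lnl_two_parameter - lnl_one_parameter)
p_value = chi2_sf_1df(statistic)
print(f"statistic={statistic:.2f}, P={p_value:.4f}")

forward_rate, backward_rate = 0.16, 4.26
rate_ratio = backward_rate / forward_rate
print(f"backward/forward rate ratio: {rate_ratio:.1f}")
```

A statistic near 10 with one degree of freedom lies well beyond the 0.05 critical value of 3.84, consistent with the preference for the two-parameter model.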
FIG. 3. Diagrammatical representation of the phylogram in Fig. 2 with character states for the translation of CUG (left side) and the presence or absence of Co-Q9 (right side) in the terminal taxa. The small rectangles in the center denote each species in Fig. 2. Different branch shading marks the reconstruction of ancestral character states. Black and white branches represent branches for which the presence or absence of the character could be reconstructed unambiguously with a statistical support value of ≥87%. Uncertainty in character reconstruction is indicated by gray branches (<87%).
When correlated evolution between codon recapture and Co-Q9 was tested, the null hypothesis of independent evolution could not be rejected (P = 0.1430). Likelihoods for two models of evolution (independent versus dependent) were calculated and compared in a likelihood ratio statistic. The likelihood of the model of dependent evolution was not significantly higher than the likelihood of the independent model. Since no significant correlation between the evolution of these characters could be detected by these reconstruction methods, we conclude that they evolved independently.
Most, but not all, of the pathogenic species of Candida were placed within the well-supported clade 1. However, common pathogenic Candida species and emerging pathogenic yeasts (11) can be found in every major clade of the phylogram (Fig. 2); e.g., C. glabrata is in clade 3. C. viswanathii, a rare opportunistic pathogen, is closely related to C. tropicalis, as noted by Barns et al. (2). With the exception of C. glabrata (Torulopsis glabrata), the most prominent clinical species of Candida are clustered in clade 1. However, clade 1 also includes at least two nonpathogenic species (C. maltosa and L. elongisporus). This result, as well as the placement of other opportunistic pathogens in different clades (e.g., C. glabrata, C. lusitaniae, C. guilliermondii, I. orientalis [C. krusei], and S. cerevisiae), suggests that pathogenicity evolved independently on multiple occasions. Indeed, the base of the phylogram includes two extremely rare pathogens, Y. lipolytica (anamorph, Candida lipolytica) and Stephanoascus ciferrii (anamorph, Candida ciferrii) (11).
Combining sequence data for six genes in a phylogenetic analysis enabled us to clarify several other relationships among the species and families of medical yeasts. (i) Previous studies analyzed single genes and failed to define the relationships among C. albicans, C. viswanathii, C. tropicalis, and C. parapsilosis with significant statistical support (7, 14, 21, 50). This investigation resolved the phylogeny of these species. (ii) S. kluyveri was thought to be most closely related to S. cerevisiae, despite low statistical support (7). However, our analysis determined that S. kluyveri belongs in the Saccharomycetales but is more closely related to the genus Kluyveromyces than Saccharomyces. (iii) The results also support transferring Eremothecium gossypii from the family Eremotheciaceae to the Saccharomycetaceae, where it is closely related to S. kluyveri and Kluyveromyces species (25). This finding warrants further analysis of the Eremotheciaceae family, which contains four other species of Eremothecium. (iv) We included strains of the three recognized genotypes of C. parapsilosis (28) and confirmed that genotype I (ATCC 96138) most resembles the type strain (CBS 604). This analysis provides statistical support for a sibling relationship between genotypes II (ATCC 96140) and III (ATCC 96144) within a clade that includes genotype I and the type strain. (v) Our data confirm the polyphyletic composition of both Pichia and the anamorphic genus, Candida (7, 21). As neither genus is monophyletic, we recommend a reexamination and revision of these multifarious genera.
A discrepancy of minor importance involves the placement of P. norvegensis, a member of the polyphyletic genus Pichia. Kurtzman and Robnett's comparison of 26S rDNA sequences of 500 species of ascomycetous yeasts put C. zeylanoides in the Debaryomyces clade, and P. norvegensis was grouped with Pichia and Issatchenkia (24). Similarly, an analysis of the actin gene phylogeny placed P. norvegensis with the other Pichia species, some distance from C. zeylanoides (7). In agreement with these studies, our actin gene tree also grouped P. norvegensis with other Pichia species. However, analyses of the RPB1, RPB2, and 26S rDNA genes (Table 4, node 21) provided excellent support for the arrangement in our combined tree (Fig. 2). The previous studies used the type strain of P. norvegensis (CBS 6564), whereas we used a different strain, P. norvegensis var. zeylanoides (CBS 1922).
We also investigated the evolutionary associations of two traits common among the Saccharomycetales: codon usage and the Co-Q9 system. Although an examination of these character states among the terminal taxa suggested a correlation between codon recapture and Co-Q9 (44), our comparative analyses did not find significant support for the coevolution of these characters. Of course, this analysis may have been affected by the taxa we sampled, and the inclusion of more taxa might lead to a different conclusion. However, there was significant support for the monophyletic origin of CUG usage.
Most human pathogenic species of Saccharomycetales are equipped with alternative CUG usage, Co-Q9, or both. These attributes may be advantageous for pathogenicity of these organisms in mammals, which is a hypothesis that remains to be tested. Recent reviews of the CUG reassignment from leucine to serine indicate that the requisite serine-tRNACAG evolved in an ancestor to both Saccharomyces and Candida, and this gene was subsequently lost from Saccharomyces (34, 43). There is some experimental evidence that the redefinition of the CUG codon destabilized the proteome, leading to the overexpression of stress proteins that may have imparted an evolutionary advantage to pathogenic yeasts (43).
Regarding the type of Co-Q, our data significantly support a phylogeny with Co-Q9 as the ancestral character state for the entire Saccharomycetales, with a loss in the Saccharomycetaceae lineage. Co-Q9 is present in the two most distal taxa, Y. lipolytica and Stephanoascus ciferrii, as well as in the Archiascomycetes, Euascomycetes, and many basidiomycetes. Additional taxa will need to be analyzed to determine more precisely where, and how often, species within the Saccharomycetaceae replaced Co-Q9 (with Co-Q6 or Co-Q7 among the taxa in clade 3). Obviously, neither the redefinition of CUG nor the presence of Co-Q9 is essential for pathogenicity, as many pathogenic yeasts and molds translate CUG as leucine and use other forms of Co-Q.
For a long time, rDNA genes were the only accessible source of data to investigate phylogenetic relationships among fungi. That situation has changed over the last several years with the increasing accumulation of protein-coding DNA sequences in databases and the completion of genome sequencing projects. This investigation demonstrates the value of including protein-coding sequences, which contributed significantly to the statistical support and resolution of the phylogeny. Protein-coding DNA sequences are easier to align than data derived from rDNA genes, especially when the investigated taxa span several families, as presented here. Another limitation of rDNA sequence data is that these genes are highly conserved, variation is limited, and multiple substitutions may occur. This study also illustrates the importance of analyzing relevant genes to elucidate specific relationships; for example, clade 1 was defined by the rDNA genes, but relationships within the clade were resolved by analyzing the RPB1 and RPB2 genes. As illustrated in Table 4, reliance on a single gene or two may yield inaccurate results. The availability of a well-supported multigene phylogeny also provides a valuable framework for assessing the results of ongoing genome projects and comparative genomics.
This multilocus phylogeny has clarified the evolutionary relationships among the medically important and related species of Saccharomycetales, defined taxa that require further investigation, and analyzed the origins of pathogen-related characters.
This study was funded by Public Health Service grant AI 28836 from the National Institutes of Health and the German Academic Exchange Service in the form of a diploma stipend for S.D.