ABSTRACT
Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated with human or animal mycoses encountered in clinical microbiology laboratories. The database comprises partial sequences from three nuclear genes: translation elongation factor 1α (EF-1α), the largest subunit of RNA polymerase (RPB1), and the second largest subunit of RNA polymerase (RPB2). These three gene fragments can be amplified by PCR and sequenced using primers that are conserved across the phylogenetic breadth of Fusarium. Phylogenetic analyses of the combined data set reveal that, with the exception of two monotypic lineages, all clinically relevant fusaria are nested in one of eight variously sized and strongly supported species complexes. The monophyletic lineages have been named informally to facilitate communication of an isolate's clade membership and genetic diversity. To identify isolates to the species included within the database, partial DNA sequence data from one or more of the three genes can be used as a BLAST query against the database which is Web accessible at FUSARIUM-ID (http://isolate.fusariumdb.org ) and the Centraalbureau voor Schimmelcultures (CBS-KNAW) Fungal Biodiversity Center (http://www.cbs.knaw.nl/fusarium ). Alternatively, isolates can be identified via phylogenetic analysis by adding sequences of unknowns to the DNA sequence alignment, which can be downloaded from the two aforementioned websites. 
The utility of this database should increase significantly as members of the clinical microbiology community deposit in internationally accessible culture collections (e.g., CBS-KNAW or the Fusarium Research Center) cultures of novel mycosis-associated fusaria, along with associated, corrected sequence chromatograms and data, so that the sequence results can be verified and isolates are made available for future study.
In addition to being the single most important genus of toxigenic phytopathogens (40), Fusarium (Hypocreales, Ascomycota) has emerged over the past 3 decades as one of the most important genera of filamentous fungi responsible for deeply invasive, opportunistic infections in humans (83). Clinically, fusarioses in immunocompetent patients typically present as superficial infections, such as onychomycosis and trauma-associated keratitis, or locally invasive infections, such as sinusitis, catheter-associated peritonitis, pneumonia, or diabetic cellulitis (77). The 2005-2006 keratitis outbreaks within the United States and Asia, however, were unusual in that they were linked to the use of a novel soft contact lens cleaning solution, which was subsequently removed from the market (11). In contrast, immunocompromised or immunosuppressed patients who are persistently and profoundly neutropenic may acquire life-threatening angioinvasive, hematogenously disseminated fusarial infections associated with high morbidity and mortality rates (15). The high mortality of immunosuppressed patients is due in part to the broad resistance of most fusaria to the spectrum of antifungals currently available (1, 4-7, 56); liposomal amphotericin B shows the greatest efficacy among the drugs currently in use (3, 17, 66).
A series of molecular phylogenetic studies has led to the important conceptual advance that morphological species recognition within Fusarium (22, 38, 47) greatly underestimates its species diversity (49, 50, 53, 54-57, 59, 70, 85). This finding is not too surprising, given that phenotypic methods for identifying fusaria rely on relatively few morphological and cultural characters (75). Based on an extensive literature review, Nucci and Anaissie (48) recently recorded 12 morphospecies associated with fusarial infections within the immunocompromised patient population. However, phylogenetic species recognition based on genealogical concordance of multilocus DNA sequence data (herein referred to as GCPSR) (79) has identified at least 69 clinically important Fusarium species (Table 1) (49, 54, 56, 57, 70, 85). Phylogenetic species in these studies were recognized if they received ≥70% maximum parsimony (MP) bootstrap support (78) from the majority of the individual gene partitions and the combined data set and if their monophyly was not contradicted by analyses of any of the individual single-gene partitions.
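The recognition criterion described above can be written as a small decision rule. The following Python sketch is illustrative only: the function name, the data layout, and the support values are assumptions introduced here and are not taken from the cited studies.

```python
from typing import Dict, Set

BOOTSTRAP_THRESHOLD = 70.0  # percent MP bootstrap support, as stated in the text


def is_phylogenetic_species(
    partition_support: Dict[str, float],
    combined_support: float,
    contradicting_partitions: Set[str],
) -> bool:
    """Apply the recognition rule to one candidate clade (illustrative only)."""
    # Any single-gene partition that contradicts monophyly rules the clade out.
    if contradicting_partitions:
        return False
    # Count the single-gene partitions that support the clade at the threshold.
    supporting = sum(
        1 for support in partition_support.values()
        if support >= BOOTSTRAP_THRESHOLD
    )
    has_majority = supporting > len(partition_support) / 2
    return has_majority and combined_support >= BOOTSTRAP_THRESHOLD


# Hypothetical bootstrap values for one candidate clade.
candidate = {"EF-1a": 92.0, "RPB1": 88.0, "RPB2": 61.0}
print(is_phylogenetic_species(candidate, 99.0, set()))     # True
print(is_phylogenetic_species(candidate, 99.0, {"RPB2"}))  # False
```

In this toy case two of three partitions reach the threshold, so the clade qualifies unless a partition contradicts its monophyly.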
Table 1. Fusaria subjected to DNA MLST
Although GCPSR-based studies have revealed extensive cryptic speciation across the phylogenetic breadth of the genus and within other medically (9, 10, 33, 35, 65) and agriculturally important fungi (reviewed in references 21, 80, and 81), the level of cryptic speciation was especially pronounced within the Fusarium solani species complex (FSSC) (49, 56, 85) and the F. incarnatum-F. equiseti species complex (FIESC) (57). These two species complexes collectively harbor at least 75 species, including 41 associated with mycotic infection of humans and other animals. Multilocus DNA sequence data have proven to be essential for accurately circumscribing species boundaries within Fusarium and also have demonstrated utility in identifying epidemiologically important multilocus haplotypes, such as the widespread F. oxysporum clonal lineage (F. oxysporum species complex 3-a [FOSC 3-a], sequence types [ST] 33, 51, and 58) and FSSC 1-a and 2-d, which appear to be common in water systems (43, 54), including those of hospitals, where they pose a significant risk for nosocomial infections (2, 58).
Given the importance of fusaria to medicine, veterinary science, and agriculture, it is not surprising that diverse molecular methods for their identification have been published. The majority of these methods target the nuclear ribosomal internal transcribed spacer (ITS) region (30, 37, 67, 73) or domains D1 and D2 of the nuclear large-subunit ribosomal DNA (rDNA) (27, 28) as markers. Unfortunately, these methods were developed in reference to Fusarium morphospecies concepts, which greatly underestimate the species diversity reported herein based on GCPSR. Moreover, rDNA loci are too conserved to distinguish many closely related human pathogenic fusaria (8, 13, 54). Fortunately, recently published multilocus molecular phylogenetic studies of Fusarium have revealed that certain protein-encoding genes contain a wealth of phylogenetic signal (19, 53, 54, 56, 57, 70, 85). It is reasonable to assume that the genetic diversity of clinically and veterinarily relevant fusaria will continue to expand, whereas phenotypic methods will remain woefully inadequate for yielding accurate species-level identifications for over two-thirds of the fusaria encountered in the clinical laboratory. In response to this growing need for accurate species identification, the present study was initiated with the aim of developing a comprehensive DNA sequence database that includes a representative of all presently known human/animal pathogenic Fusarium species identified previously using GCPSR.
Toward this end, a three-locus DNA sequence database for all known human opportunistic/pathogenic fusaria (i.e., 69 species) was developed to meet the following four objectives: (i) determine the utility of single- and multilocus DNA sequence data (EF-1α, RPB1, and RPB2) for accurately identifying clinically important fusaria to species level, including partial sequence data from the DNA-directed RNA polymerase largest subunit (RPB1), which is used here for the first time for phylogenetic inference within Fusarium; (ii) investigate the phylogenetic diversity and evolutionary relationships of mycosis-associated fusaria; (iii) provide an Internet-accessible, three-locus database for accurately identifying and placing novel etiologic agents of fusarioses within a precise phylogenetic framework as they are encountered in the clinical microbiology laboratory; and (iv) archive a duplicate set of isolates at the CBS-KNAW in Europe and the ARS (NRRL) Culture Collection in the United States that is readily accessible to various research groups wanting to pursue further research on this topic. This Fusarium database, together with alignments and the corrected sequence chromatograms, will be incorporated into the FUSARIUM-ID database accessible via the Web at Pennsylvania State University (http://isolate.fusariumdb.org ) and the Centraalbureau voor Schimmelcultures (CBS) Biodiversity Centre (http://www.cbs.knaw.nl/fusarium ) to facilitate global identifications via the Internet and to promote cooperation and coordination in documenting and sharing the diversity and occurrence of clinically relevant fusaria.
MATERIALS AND METHODS
Fusarium isolates. The 71 isolates included in this study comprise 69 phylogenetically distinct species (Table 1). Of these, 65 were cultured from human clinical or veterinary sources. Actual medical or veterinary case isolates were unavailable for three fusaria reported to cause infections in humans, so representative isolates of these three species (i.e., F. napiforme, F. sporotrichioides, and F. lateritium) from other sources were used as substitutes. With the exception of these three species, all of the other isolates have been characterized molecularly in published studies using partial DNA sequence data (see references in Table 1). The isolates included in this study are available upon request from the Agricultural Research Service (NRRL) Culture Collection (http://nrrl.ncaur.usda.gov/TheCollection/index.html), National Center for Agricultural Utilization Research, Peoria, IL, and the CBS-KNAW, where they are stored cryogenically.
Molecular biology. Mycelia were grown in 300-ml Erlenmeyer flasks containing 100 ml of yeast-malt broth (20 g dextrose, 5 g peptone, 3 g yeast extract, and 3 g malt extract per liter; Difco, Detroit, MI) for 2 or 3 days on a rotary shaker at 100 rpm, harvested over a Büchner funnel, and then freeze-dried. Total genomic DNA was extracted from ∼100 mg of freeze-dried mycelium using a cetyl trimethyl-ammonium bromide (CTAB; Sigma-Aldrich, St. Louis, MO) protocol as previously described (50). Portions of the translation elongation factor (EF-1α) and DNA-directed RNA polymerase second largest subunit (RPB2) were selected based on their demonstrated utility in previous studies (54, 56, 57, 85). DNA-directed RNA polymerase subunit 1 (RPB1) was chosen based on published species-level studies from the Assembling the Fungal Tree of Life (AFTOL) project (18, 42). PCR and sequencing primers used for these three loci are provided in Table 2. Platinum Taq DNA polymerase (Invitrogen, Carlsbad, CA) was used in all PCRs, which were conducted in an Applied Biosystems (ABI) 9700 thermocycler (Emeryville, CA) using published cycling parameters (50). Amplicons were size fractionated via gel electrophoresis in 1.5% agarose gels (Invitrogen) run in 1× TAE buffer (69), stained with ethidium bromide, and then photographed over a UV transilluminator. Prior to cycle sequencing, amplicons were purified using Montage96 filter plates (Millipore Corp., Billerica, MA). Sequencing reactions were conducted in a 10-μl volume containing 2 μl of ABI BigDye Terminator, version 3.1, reaction mixture, 2 to 4 pmol of a sequencing primer, and approximately 50 ng of amplicon as previously described (50). After cycle sequencing, all reaction mixtures were cleaned up using an XTerminator purification kit and then run on an ABI 3730 48-capillary automated sequencer.
Table 2. Primers used for PCR and DNA sequencing
Phylogenetic analysis. Sequencher, version 4.9 (Gene Codes, Ann Arbor, MI), was used to edit and align raw ABI chromatograms, after which the RPB1 and RPB2 alignments were manually edited using TextPad, version 5.1.0 for Windows (Helios Software Solutions; Longridge, United Kingdom). Due to the presence of a number of length-variable indels within the three introns, sequences from the EF-1α partition were aligned automatically using MAFFT, version 6.0 (http://align.bmr.kyushu-u.ac.jp/mafft/software/), after which 92 ambiguously aligned intron nucleotide positions were excluded from the subsequent phylogenetic analyses. It is important to note that the entire region of EF-1α sequenced is archived at the Fusarium-ID and CBS-KNAW databases. A conditional combination approach, which employed maximum parsimony bootstrap values of ≥70% as the threshold for topological discordance, indicated that the three individual partitions could be analyzed as a combined data set (Table 3). Phylogenetic relationships among the clinically relevant fusaria were inferred from the combined three-locus data set using unweighted MP implemented in PAUP, version 4.0b10 (78), and maximum likelihood (ML) employing GARLI, version 0.951 (86), as previously described (56). MrModeltest, version 3.8 (64), using the ModelTest server 1.0, identified the general-time-reversible model with a proportion of invariant sites and gamma-distributed rate heterogeneity (GTR+I+Γ) as the best-fit model of nucleotide substitution for the combined data set for the ML analyses. Searches for the shortest MP trees employed tree bisection and reconnection (TBR) branch swapping and 1,000 random sequence addition replicates. MP clade support was assessed by nonparametric bootstrapping, employing 1,000 pseudoreplicates of the data, 10 random addition sequences per replicate, and TBR branch swapping.
Nonparametric ML bootstrapping was conducted with a 2.6-GHz MacBook Pro, using 5,000 generations without improving the topology parameter and 1,000 ML pseudoreplicates of the data.
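One way to read the conditional combination test described above is as a search for strongly supported clades in one partition that are incompatible with strongly supported clades in another. The Python sketch below makes that reading concrete under simplifying assumptions: trees are treated as rooted, clades are represented as sets of taxon labels, and the taxa and bootstrap values are hypothetical. It is not the procedure implemented in PAUP.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Clade = FrozenSet[str]
THRESHOLD = 70.0  # percent bootstrap support used as the discordance threshold


def compatible(a: Clade, b: Clade) -> bool:
    """Two clades can coexist on one rooted tree if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def strong_conflicts(
    partitions: Dict[str, Dict[Clade, float]]
) -> List[Tuple[str, str, Clade, Clade]]:
    """List pairs of incompatible clades that both reach the threshold."""
    conflicts = []
    for (gene1, clades1), (gene2, clades2) in combinations(partitions.items(), 2):
        for clade_a, support_a in clades1.items():
            if support_a < THRESHOLD:
                continue
            for clade_b, support_b in clades2.items():
                if support_b >= THRESHOLD and not compatible(clade_a, clade_b):
                    conflicts.append((gene1, gene2, clade_a, clade_b))
    return conflicts


# Hypothetical clades and bootstrap values for taxa A, B, and C.
partitions = {
    "EF-1a": {frozenset({"A", "B"}): 95.0, frozenset({"A", "B", "C"}): 80.0},
    "RPB1": {frozenset({"A", "B"}): 99.0},
    "RPB2": {frozenset({"B", "C"}): 55.0},
}
# The RPB2 clade {B, C} disagrees with {A, B} but is weakly supported,
# so no strongly supported conflict is reported and the data can be combined.
print(strong_conflicts(partitions))  # []
```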
Table 3. Tree statistics for the individual and combined partitions
Nucleotide sequence accession numbers. DNA sequences have been deposited in GenBank under accession numbers HM347114 to HM347221.
RESULTS AND DISCUSSION
The primary objective of this study was to develop a Web-accessible three-locus DNA sequence database to facilitate identification of fusaria associated with human and animal infections. Additionally, we describe the utility of single- and multilocus DNA sequence data for accurately identifying clinically important fusaria, as well as the duplicate sets of isolates archived at the CBS-KNAW in Europe and the ARS (NRRL) Culture Collection in the United States.
Phylogenetic relationships and identification of human pathogenic fusaria. The database was populated with aligned partial sequences from the nuclear genes RPB1 (1,607 sites), RPB2 (1,742 sites), and EF-1α (632 sites) from 71 isolates representing 69 fusariosis-associated species reported in the literature (Table 1). Sequences of F. sporotrichioides and F. lateritium were included in the database, with the caveat that the single reports of these species causing infections in humans need to be verified. Although F. napiforme has clearly been shown to cause a human mycotic infection (44), we consider reports of F. sporotrichioides (60) and F. lateritium (46) as etiological agents of fusarioses to be tentative because these identifications were not supported by molecular data and because these isolates were unavailable for further study. The morphological concepts of both species are known to comprise multiple phylogenetic species (20; K. O'Donnell, unpublished data). If identification of haplotypes within a species is required, detailed information on additional loci to sequence has previously been published (34, 51, 56, 57, 71, 85). NEXUS files with a PAUP block used to develop the FSSC, FIESC, F. chlamydosporum species complex (FCSC), F. dimerum species complex (FDSC), and Gibberella fujikuroi species complex (GFSC) can be downloaded from the Internet-accessible Fusarium database sites cited above or accessed via the dedicated BLAST servers.
PCR primers Fa and G2R, which were designed for higher-level phylogenetics as part of the AFTOL project (Table 2), successfully amplified a 1,894-bp fragment from the RPB1 D-to-G region in all of the fusaria included in the database except for F. cf. lateritium NRRL 25197. The Fa and G2R primer sites in this isolate, however, appear to be conserved, based on DNA sequence analysis of overlapping fragments obtained using Fa/R8 and F7/R9 as PCR primers (Fig. 1). Because DNA sequence data from the RPB1 locus had not been used previously for Fusarium phylogenetics, the design of internal sequencing primers was accomplished by downloading sequences of this gene from the three sequenced fusarial genomes (F. graminearum, F. oxysporum, and F. verticillioides) at the Broad Institute of MIT and Harvard University (http://www.broadinstitute.org/annotation/genome/fusarium_group/MultiHome.html) and the genome of F. solani f. sp. pisi/Nectria haematococca from the Department of Energy Joint Genome Institute (JGI) (http://genome.jgi-psf.org/Necha2/Necha2.home.html). As reported for other filamentous ascomycetes (26), the RPB1 D-to-G region in Fusarium was entirely exonic (i.e., free of introns). Alignment of the four RPB1 sequences facilitated the design of five conserved internal sequencing primers (Fig. 1). However, only three were needed (i.e., F5, F7, and F8) to obtain reliable sequence coverage of 66 of the 71 fusaria included in this study. Two additional primers (i.e., F6 and Fa) were required to completely sequence the D-to-G region in members of the F. dimerum species complex (FDSC) and F. fujikuroi NRRL 43610. Alignment of the RPB1 region between primers F5 and F7 required the insertion of two indels 3 and 39 bp in length to accommodate 1 and 13 additional codons, respectively, within members of the FDSC and the Gibberella clade.
By way of contrast, due to the presence of three length-variable introns, we employed the software program MAFFT (http://align.bmr.kyushu-u.ac.jp/mafft/software/ ) to align the EF-1α region. To obtain an alignment of the EF-1α region that reflected positional homology, 92 intron nucleotide positions were coded as ambiguously aligned and excluded from all subsequent phylogenetic analyses.
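Excluding ambiguously aligned positions amounts to dropping the same columns from every sequence in the alignment. The following Python sketch shows the idea on a toy alignment; the function name, the isolate labels, the sequences, and the 0-based column indices are all hypothetical, and the actual exclusion set used in the analyses was defined within the phylogenetic software.

```python
from typing import Dict, Iterable


def exclude_columns(alignment: Dict[str, str], excluded: Iterable[int]) -> Dict[str, str]:
    """Return a copy of the alignment without the listed 0-based columns."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all aligned sequences must have the same length")
    drop = set(excluded)
    return {
        name: "".join(base for i, base in enumerate(seq) if i not in drop)
        for name, seq in alignment.items()
    }


# Toy alignment in which columns 3 and 4 are treated as ambiguously aligned.
toy = {"isolate_1": "ATG--CGT", "isolate_2": "ATGAACGT"}
print(exclude_columns(toy, [3, 4]))
# {'isolate_1': 'ATGCGT', 'isolate_2': 'ATGCGT'}
```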
Fig. 1. Map of the RNA polymerase largest-subunit (RPB1) locus. Location and orientation of PCR and sequencing primers are indicated by half-arrows. PCR primers Fa and G2R (Table 2) successfully amplified a 1,894-bp fragment in all of the isolates except for Fusarium cf. lateritium NRRL 25197, even though the PCR primer sites appear to be conserved in this isolate. Therefore, the Fa × G2R region was amplified in this isolate as two overlapping fragments using the PCR primer pairs Fa/R8 (1,127 bp) and F7/R9 (2,217 bp). Primers F5 and G2R flank the 1,607-bp RPB1 D-to-G region that was sequenced and analyzed. Sequences for 65 of the 71 isolates were generated using the F5, F7, and F8 sequencing primers. See Fig. 1 in Matheny et al. (41) for a detailed map of the entire RPB1 locus showing the position of the D-to-G region.
The three-locus data set totaled 3,981 bp of aligned DNA sequence data, including 1,531 parsimony-informative nucleotide positions. Summary sequence and tree statistics for the individual and combined data sets indicated that the three loci contained very similar levels of parsimony-informative characters (PIC) per bp of sequence (Table 3). The best ML tree, based on 10 independent analyses of the concatenated data set, yielded a log likelihood of −38,129.37 (Fig. 2). The four most-parsimonious trees were 6,683 steps in length and differed only in minor rearrangements of four closely related phylogenetic species within the FIESC (19, 22-24) and members of the FOSC (Table 3). Because the root position of the tree is unknown (54), the trees were midpoint rooted. Irrespective of whether trees were midpoint rooted or rooted using sequences of the FDSC or FSSC as a sister to the ingroup, the root always joined the tree with F. cf. lateritium NRRL 25197 forming the most basal divergence within a strongly supported (100% bootstrap support [BS]) Gibberella clade (Fig. 2). ML and MP phylogenetic analyses of the concatenated data set recovered trees that were highly concordant topologically (Fig. 2; only the best ML tree is shown). Evolutionary relationships among the six informally named species complexes within the Gibberella clade were fully resolved by ML bootstrapping. ML and MP bootstrapping recovered similar levels of clade support, with two exceptions. One of these involved the monophyly of the GFSC and its three biogeographically structured subclades (50). Although the GFSC and its three subclades were strongly supported by ML bootstrapping (Fig. 2), as were the previously inferred (American (Asian, African)) evolutionary relationships of the subclades (50), only the American clade received strong MP bootstrap support. In the second exception, the F. tricinctum species complex (FTSC) received moderate support (79% BS) as a sister to the ((FCSC, F. sambucinum species complex [FSAMSC]) FIESC) clade in the ML analysis, but this relationship was not supported by MP bootstrapping (51% BS). Close to three-quarters of the medically relevant Fusarium species were nested within the following three species complexes: FSSC (n = 21), FIESC (n = 20), and GFSC (n = 10). Of these, members of the FSSC are by far the most important, accounting for approximately 50 to 60% of all fusarioses worldwide (5, 56, 85).
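The summary figures reported above can be checked with simple arithmetic, and the notion of a parsimony-informative character can be made concrete. The Python sketch below uses the standard definition (a site is informative when at least two character states each occur in at least two sequences). Treating gaps and ambiguity codes as missing data is an assumption made here, and the toy alignment is hypothetical.

```python
from collections import Counter
from typing import Dict


def count_parsimony_informative(alignment: Dict[str, str]) -> int:
    """Count sites where at least two states each occur in at least two sequences."""
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    informative = 0
    for column in zip(*seqs):
        # Gaps and ambiguity codes are ignored (an assumption of this sketch).
        counts = Counter(base.upper() for base in column if base.upper() in "ACGT")
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return informative


# Toy alignment: only the second and third columns are informative.
toy = {"t1": "AAGT", "t2": "AAGT", "t3": "ACTT", "t4": "ACTC"}
print(count_parsimony_informative(toy))  # 2

# Site counts per locus and the PIC total, as reported in the text.
sites = {"RPB1": 1607, "RPB2": 1742, "EF-1a": 632}
total_sites = sum(sites.values())
print(total_sites)                   # 3981
print(round(1531 / total_sites, 3))  # 0.385 PIC per bp in the combined data set
```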
Fig. 2. Best maximum likelihood tree inferred from the combined three-locus data set for 71 isolates representing 69 medically and veterinarily important Fusarium species. Because the branching order of the two most basal lineages, the F. solani and F. dimerum species complexes (FSSC and FDSC), was unresolved in more inclusive analyses (54), the phylogram was midpoint rooted. The Gibberella clade contains the six most derived, clinically relevant species complexes. Species and their multilocus haplotypes are identified by Arabic numbers and lowercase Roman letters, respectively, for members of the Fusarium incarnatum-F. equiseti species complex (FIESC), the F. chlamydosporum species complex (FCSC), and the FSSC as previously reported (56, 57). Numbers in parentheses by the three F. oxysporum species complex (FOSC) isolates refer to clades as reported by O'Donnell et al. (50). Note that Latin binomials can be applied with confidence to only 23 of the 69 species. Fusarium sporotrichioides and F. cf. lateritium are highlighted in gray to indicate that reports of these species causing human infections need to be confirmed. Numbers above internodes represent ML bootstrap values based on 1,000 pseudoreplicates of the data. MP bootstrap values are indicated below internodes only when they differed by ≥5% from the ML value. Af, African subclade; Am, American subclade; As, Asian subclade; FSAMSC, F. sambucinum species complex; FTSC, F. tricinctum species complex; and GFSC, Gibberella fujikuroi species complex.
Web-based identification of human pathogenic fusaria. The database described in this paper is accessible via the Web in two forms with different features, either of which can be used for routine identification, and is housed and maintained at Pennsylvania State University (http://isolate.fusariumdb.org) and at the CBS Fungal Biodiversity Center (http://www.cbs.knaw.nl/fusarium). FUSARIUM-ID was originally set up in 2004 (19) as the first dedicated website for the molecular identification of fusaria using a partial EF-1α gene sequence of an unknown as the query to BLAST the database. Construction of the FUSARIUM-ID database was motivated in part to ensure that researchers could use sequence data to make connections between their isolates of interest and sequence-characterized isolates available in public culture collections. This is essential because queries of GenBank (http://blast.ncbi.nlm.nih.gov/Blast.cgi) consistently recovered Fusarium sequences that were incorrectly identified (discussed in reference 63). For the latter reason, researchers who choose to access the data generated in the present study via GenBank are advised to look at the top sequences sorted by maximum identity (“Max ident”) producing significant alignments, to make sure that the organism name is used consistently. FUSARIUM-ID has been updated with new data search and visualization tools (S. Kang, unpublished data) and currently archives multilocus DNA sequence data from selected species, including the broad spectrum of medically important fusaria, and sequence data from most previously published studies will be uploaded into FUSARIUM-ID. Expansion of FUSARIUM-ID should greatly facilitate accurate species identifications, especially in light of the fact that at least two-thirds (46/69) of the human pathogenic species cannot be identified currently using morphological data.
Using the FUSARIUM-ID database. A Web-accessible user's guide to FUSARIUM-ID can be found at the following link: http://isolate.fusariumdb.org/guide.php. The updated FUSARIUM-ID database discussed here can be queried via the BLAST feature using sequence data or by other information associated with isolates, including host or substrate of origin, geographic origin, and accession numbers from other culture collections. We recommend using one of the three loci highlighted in this paper (EF-1α, RPB1, or RPB2) first because of their relatively complete coverage of human pathogenic fusaria.
Also available are the DNA sequence alignments used in the various published multilocus phylogenetic and identification studies that are the basis for this database (49-59, 70, 85). Additional tools for manipulating DNA sequence data and for visualization of geographic distribution of isolates are available, as they are for the Phytophthora plant pathogen database (62). The Biolomics software package, utilized in the CBS strain database (http://www.cbs.knaw.nl/fungi/BioloMICS.aspx?searchopt=4 ), provides a wide array of additional tools, including simultaneous searches with multiple loci utilizing the “multiple sequences” tool and searches against all of the accession numbers within the entire CBS fungal culture collection.
Interpreting the results. We anticipate that clinical microbiologists with access to DNA sequencing technology will utilize this database for identification of isolates to the species level, often using a single DNA marker (generally EF-1α, RPB1, or RPB2) in doing so. In the context of other information, data from a single locus are often but not always sufficient to provide a reasonably high-confidence identification. The utility of a sequence match result depends on a variety of factors, including the quality of the sequence being used as a query, the degree to which the diversity of fusaria is represented in the database, and the various levels of known and actual DNA sequence diversity within species. All results must be interpreted with care, and precise conclusions may require additional data and analyses, including phylogenetic analysis.
Prior to conducting BLAST searches, it is essential that the sequences be edited using a software program such as DNA Chromatogram Explorer Lite (freeware available from HeracleSoftware, Lilienthal, Germany) or Sequencher (Gene Codes, Ann Arbor, MI) to trim low-quality ends and reconcile any differences between overlapping sequences. Whether using a partial EF-1α, RPB1, or RPB2 gene sequence as the query, an exact match to one of the human pathogenic species in the FUSARIUM-ID database generally can be interpreted as a definitive species-level identification. It is important for researchers to be aware, however, that several plant-pathogenic Fusarium species, including those that cause economically devastating diseases, such as fusarium head blight (FHB) of cereals (FSAMSC) (53, 59, 84) and soybean sudden death syndrome (SDS) (FSSC) (55), cannot be distinguished using DNA sequences of the three genes included in FUSARIUM-ID. For this reason, it is prudent to check the top sequences producing significant alignments to make sure that two or more species do not share the same allele. In addition to the FHB and SDS fusaria noted above, this problem may be encountered with a small number of clinically relevant species within the FSSC and FIESC. In anticipation of this problem, published sequence data comprising the three-locus typing schemes (EF-1α, RPB2, and ITS+large subunit [LSU] rDNA) for these two complexes (56, 57) have been incorporated into the FUSARIUM-ID database. It should be noted that, compared to the partial EF-1α, RPB1, and RPB2 gene sequences, the ITS+LSU rDNA possesses relatively little phylogenetic signal within Fusarium (8, 54).
Although the ITS rDNA region has been adopted widely by the fungal community as the universal locus for DNA barcoding of fungi (72), use of this marker within Fusarium for inferring evolutionary relationships is complicated by the presence of ITS2 paralogs (origin by gene duplication) or xenologs (origin by horizontal gene transfer) whose phylogenetic distribution does not track with the species phylogeny (50). Similarly, the discovery of highly divergent paralogs and low sequence divergence among orthologs of the mitochondrial cytochrome oxidase 1 gene in Fusarium (23), a locus widely promoted for barcoding diverse organisms (http://www.barcoding.si.edu/DNABarCoding.htm ), indicates potential problems in using this locus for phylogenetics and identification of fusaria.
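The earlier advice to check the top alignments for alleles shared by two or more species can be partly automated when BLAST results are available as text. The Python sketch below assumes the 12-column tabular layout produced by NCBI BLAST+ with `-outfmt 6`; whether the FUSARIUM-ID or CBS Web servers export results in this form is not stated in the text. The subject identifiers, the identifier-to-species lookup, and the scores are hypothetical.

```python
import csv
import io
from typing import Dict, Set

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def species_sharing_best_hit(tabular: str, species_of: Dict[str, str]) -> Set[str]:
    """Return every species whose sequence ties for the highest bit score."""
    rows = list(csv.DictReader(io.StringIO(tabular), fieldnames=FIELDS, delimiter="\t"))
    if not rows:
        return set()
    best = max(float(row["bitscore"]) for row in rows)
    return {species_of[row["sseqid"]] for row in rows if float(row["bitscore"]) == best}


# Hypothetical hits: two database isolates tie for the best score.
hits = "\n".join("\t".join(fields) for fields in [
    ["query1", "iso_A", "100.000", "632", "0", "0", "1", "632", "1", "632", "0.0", "1168"],
    ["query1", "iso_B", "100.000", "632", "0", "0", "1", "632", "1", "632", "0.0", "1168"],
    ["query1", "iso_C", "98.101", "632", "12", "0", "1", "632", "1", "632", "0.0", "1101"],
])
lookup = {"iso_A": "species X", "iso_B": "species Y", "iso_C": "species Z"}

tied = species_sharing_best_hit(hits, lookup)
print(sorted(tied))  # ['species X', 'species Y']
print("ambiguous" if len(tied) > 1 else "unique best match")
```

A result containing more than one species signals that the queried locus alone cannot separate them and that data from additional loci are needed.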
A single-locus sequence query to the database may provide exact matches to isolates of one or more multilocus sequence types (MLSTs) defined in previous studies. Users are reminded that such an exact match reflects only the single locus used as a query; their isolate of interest may differ from the MLST matches at other loci and thus not fit into that MLST as heretofore defined. As discussed previously (19), when a query sequence does not perfectly match anything in the database, the unknown may represent a novel allele of a species in the database or a novel Fusarium species. For the majority of the queries, it is reasonable to assume that the unknown sequence represents a novel allele of a species represented in the database, given the database's dense taxon sampling. Assuming that this is the case, then the top match or matches should confirm that the variant allele is nested within a species previously characterized by GCPSR. However, when the unknown appears to be closely related to more than one species in the database, we recommend that additional sequence data be generated to take advantage of the wealth of multilocus DNA sequence data generated in published GCPSR-based studies of the FDSC (70), FSSC (49, 56, 85), FOSC (51, 58), GFSC (50), FCSC, and FIESC (57). Though representatives of the FTSC and FSAMSC are included in the current database, they are very rarely encountered as etiologic agents of fusarioses (57). If the BLAST results indicate that the query sequence represents a novel species not currently represented in FUSARIUM-ID, then the test isolate's genealogical exclusivity should be evaluated via GCPSR analyses, using the appropriate multilocus typing scheme and including two or more isolates, if available, to assess their monophyly via bootstrapping. 
In practice, we recognize such isolates as phylogenetically distinct species only if they are resolved as reciprocally monophyletic in the majority of the bootstrapped individual gene partitions, as well as in the combined data set, and their monophyly is not contradicted by bootstrapping of any individual partition (14, 56, 57).
While some phylogenetic species are known to possess little or no allelic variation within the major diagnostic markers employed, others are far more variable. Isolates of F. proliferatum, for example, may differ by as much as 2.1% at the EF-1α locus and 1.7% at the RPB2 locus (D. M. Geiser and K. O'Donnell, unpublished data). In most cases, a moderately divergent, single-locus best match (e.g., 96 to 98% identity at the EF-1α locus) would likely represent a species that is not represented in the database, but data from additional loci and phylogenetic analyses would be necessary to determine that.
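The interpretive guidance above can be summarized as a rough triage of a single-locus best match. The Python sketch below is a reading aid only: the function name and the advisory wording are invented here, the 96 to 98% band is taken from the EF-1α example in the text, and identities outside the cases discussed are deliberately returned as indeterminate rather than assigned a meaning.

```python
def triage_single_locus_match(percent_identity: float) -> str:
    """Give a cautious, advisory reading of a single-locus best match."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity == 100.0:
        return "exact match: confirm that only one species carries this allele"
    if 96.0 <= percent_identity <= 98.0:
        return ("moderately divergent: possibly a species absent from the "
                "database; sequence additional loci")
    return ("indeterminate: novel allele or novel species; use additional "
            "loci and phylogenetic analysis")


print(triage_single_locus_match(100.0))
print(triage_single_locus_match(97.0))

# Within-species divergence of 2.1% (the F. proliferatum EF-1α example)
# corresponds to 97.9% identity, which also falls in the 96 to 98% band.
print(triage_single_locus_match(100.0 - 2.1))
```

The last call illustrates why the text asks for additional loci: a percent identity in the moderately divergent band does not by itself distinguish a variable species from an unrepresented one.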
It is worth mentioning that the current version of FUSARIUM-ID possesses notable similarities and differences with TrichoKEY (http://www.isth.info/tools/molkey/index.php ), a Web-accessible site dedicated to identification of Trichoderma spp. (16). Like FUSARIUM-ID, TrichoKEY provides a BLAST function to identify unknowns using DNA sequence data. FUSARIUM-ID has been updated to incorporate two useful features of TrichoKEY: (i) BLAST queries using three-locus DNA sequence data and (ii) the ability to download sequence alignments for subsequent phylogenetic analyses. These two databases differ primarily in that ITS rDNA has been reported to be more useful than partial EF-1α and RPB2 sequences for identifications within Trichoderma (12, 32), whereas EF-1α, RPB1, and RPB2 appear to contain roughly equal levels of phylogenetic signal useful for species-level identifications within Fusarium. We also point out that the 5-to-7 region of RPB2 often can be used alone, without including sequence data from the 7-to-11 region, to identify an unknown to the species level.
The utility of the fusariosis-associated portion of the FUSARIUM-ID data set is expected to grow as new validated sequences/sequence chromatograms and cultures are accessioned, especially as researchers and journals recognize the necessity for molecularly based identifications of fusarial pathogens. The Web-accessible FUSARIUM-ID database and the CBS database will continue to be updated with phylogenetically diverse partial EF-1α, RPB1, and RPB2 sequences, thereby enabling researchers to accurately identify virtually all unknowns to the species level, as well as allowing them to precisely place novel pathogens within a robust phylogenetic framework. Molecular phylogenetic clarification of human opportunistic fusarial species limits represents a significant conceptual and technological advance, which should help facilitate the long-term goals of epidemiologic studies directed at identifying the spectrum of etiologic agents and their environmental reservoirs, especially among hospitalized patients at greatest risk for acquiring nosocomial infections. Through the elucidation of species boundaries in the human-pathogenic fusaria, it should be possible to develop DNA sequence-independent methods for their rapid identification, such as microsphere (54, 55, 84) and oligonucleotide arrays (30).
ACKNOWLEDGMENTS
Special thanks are due to Stacy Sink and Jean H. Juba for excellent technical assistance, Nathane Orwig for generating the DNA sequences in NCAUR's DNA core facility, and the culture collections and individuals who supplied isolates used in this study.
The mention of trade products or firm names does not imply that they are recommended by the U.S. Department of Agriculture over similar products or other firms not mentioned. In addition, the findings and conclusions in this article are those of the author(s) and do not necessarily represent the views of the CDC.
FOOTNOTES
- Received 17 May 2010.
- Returned for modification 1 July 2010.
- Accepted 27 July 2010.
- Copyright © 2010 American Society for Microbiology