Journal of Clinical Microbiology, June 2002, p. 2095-2100, Vol. 40, No. 6
0095-1137/02/$04.00+0 DOI: 10.1128/JCM.40.6.2095-2100.2002
Copyright © 2002, American Society for Microbiology. All Rights Reserved.
Sezione Microbiologia Applicata, Dipartimento di Biologia Vegetale e Biotecnologie Agroambientali,1 Sezione Microbiologia,2 Sezione Clinica Malattie Infettive, Dipartimento di Medicina Sperimentale e Scienze Biochimiche, University of Perugia, Perugia, Italy3
Received 31 July 2001/ Returned for modification 17 December 2001/ Accepted 7 March 2002
In this work a comparison of three systems for the analysis of typing data is presented. These consist of two commercial packages, specifically designed to interpret data from electrophoretic banding patterns, and one system under development at the Industrial Yeasts Collection of the Dipartimento di Biologia Vegetale di Perugia (DBVPG). These procedures have been employed separately in a multicenter comparison carried out on the same digitalized pictures obtained with three of the most widely used molecular procedures (random amplified polymorphic DNA [RAPD] analysis, contour-clamped homogeneous electric field [CHEF] analysis, and [GACA]4 analysis) carried out on 12 strains of Cryptococcus neoformans and one reference strain of Saccharomyces cerevisiae. The first aim of this work is to determine whether different analytical systems produce dendrograms with significantly diverse topologies. The second aim is to ascertain how these systems estimate the distances among the strains under study and whether they recognize their geographic and clinical source of isolation.
TABLE 1. Strains employed in this study
DNA for RAPD experiments was extracted and purified according to an existing method (3); chromosomal-grade DNA was extracted according to the procedure of Cardinali et al. (5). In both methods, the only modification was to double the amount of biomass employed in order to compensate for the presence of the large capsule typical of C. neoformans. Extraction of genomic DNA for (GACA)4 analysis was also performed as previously described (13).
RAPD analyses. Primers ZP19 (AAGAGCCCGT) and ZP20 (GCGATCCCCA), currently used to type Escherichia coli (also referred to as 1247 and 1283, respectively [9, 14]), were employed in the RAPD analyses. Reactions were carried out in a 40-µl reaction volume containing 2 µl of primer (20 pmol), 4 µl of 10x buffer (10 mM Tris-HCl, 150 mM KCl, 1.5 mM MgCl2, 0.1% Triton X-100 [pH 8.8]), 1.5 U of DyNAzyme II (Finnzymes), and 500 ng of DNA, with a hot-start program consisting of an initial denaturation of 2 min at 94°C, followed by 40 cycles each of 60 s at 94°C, 60 s at 36°C, and 60 s at 72°C, with a final 5-min elongation at 72°C. Amplification products were subjected to 1% agarose gel electrophoresis in 0.5x TAE (20 mM Tris-acetate, 0.5 mM EDTA) buffer. Pictures of the ethidium bromide-stained gels were captured with a black-and-white camera (Kappa GmbH) coupled with a NuVista image card, digitalized in TIFF format, and delivered to all laboratories.
(GACA)4 analyses. Amplification reactions were performed using 100 pmol of the repetitive oligonucleotide (GACA)4 as single primer, a 100 µM concentration of each deoxynucleoside triphosphate (Boehringer Mannheim GmbH, Mannheim, Germany), 3 mM MgCl2 (Perkin-Elmer Italia, Monza, Italy), 10x reaction buffer (500 mM KCl, 100 mM Tris-HCl [pH 8.3]; Perkin-Elmer), 0.5 U of AmpliTaq DNA polymerase (Perkin-Elmer), and 400 ng of DNA sample. PCR was performed in a GeneAmp PCR system 2400 thermal cycler (Perkin-Elmer), with an initial cycle of 5 min at 94°C; 38 cycles of 30 s at 94°C, 30 s at 47°C, and 60 s at 72°C; and a final cycle of 5 min at 72°C. Amplification products were subjected to 1.4% agarose gel electrophoresis in 1x TBE (0.089 M Trizma base, 0.089 M boric acid, 0.002 M EDTA [pH 8.4]; Sigma-Aldrich, Milan, Italy). Pictures of the ethidium bromide-stained gels were captured digitally in TIFF format as described above and delivered to all laboratories.
CHEF procedure. Chromosomal DNA was subjected to pulsed-field gel electrophoresis (1% agarose gel) in 1x TAE buffer at 14°C, employing a CHEF apparatus (Bio-Rad Laboratories) with a multistate program consisting of a first 72-h block at 2 V/cm, with the switching time linearly ramping from 20 to 24 min, and a second 7-h block at 6 V/cm, with the switching time linearly ramping from 25 to 145 s. Pictures of the ethidium bromide-stained gels were captured with a black-and-white camera (Kappa GmbH) coupled with a NuVista image card, digitalized in TIFF format, and delivered to all laboratories.
DAS. The analysis system under development at DBVPG (the DBVPG analysis system [DAS]) includes public-domain applications available from the Internet and linking applications run in Excel to ensure multiplatform usage. The system consists of the following operations: (i) measurement of the migration distance of each band from the well; (ii) calculation of the molecular weight of each band using a regression equation obtained with the migration distance and molecular weight values of a reference pattern, always included in the gels; (iii) classification of the molecular weight data; and (iv) statistical or phylogenetic analysis.
(i) Migration distance measures. Migration distances were measured with the freeware package NIH-Image 1.62 (available with full instructions, in both Macintosh and PC versions, from its web page: http://rsb.info.nih.gov/nih-image/index.html) by using either the automatic reading of densitometric peaks in the gel macro or the crosshair tool. In our hands, both procedures yielded very similar data series; the second was, however, preferred because it allows detection of the bands directly on the gel.
(ii) Regression analysis. Regression analysis was carried out with the MacCurve Fit 1.5 software using a third-degree polynomial regression-fitting algorithm in order to avoid biphasic trends of the curves. The resulting equation was then used to convert the migration distance data (from NIH-Image) into the corresponding molecular weight values. Only regressions with an R value not lower than 0.99 were processed.
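The conversion in step (ii) can be sketched as follows. This is a minimal illustration, not the MacCurve Fit procedure itself: it fits a third-degree polynomial by ordinary least squares and applies the R >= 0.99 acceptance threshold described above. The function names, the marker values, the choice to regress molecular size directly on migration distance, and the computation of R as the square root of the coefficient of determination are all assumptions made for the example.

```python
from math import sqrt

def fit_cubic(x, y):
    """Least-squares cubic fit; returns coefficients [c0, c1, c2, c3]."""
    n = 4
    # Normal equations A c = b with A[j][k] = sum(x^(j+k)), b[j] = sum(y * x^j).
    a = [[sum(xi ** (j + k) for xi in x) for k in range(n)] for j in range(n)]
    b = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coef[k] for k in range(row + 1, n))
        coef[row] = (b[row] - tail) / a[row][row]
    return coef

def predict(coef, xi):
    """Evaluate the fitted polynomial at migration distance xi."""
    return sum(c * xi ** power for power, c in enumerate(coef))

def correlation(coef, x, y):
    """R between observed and fitted values (assumed definition)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return sqrt(max(0.0, 1.0 - ss_res / ss_tot))

# Hypothetical reference lane: migration distance (cm) and fragment size (bp).
marker_distance_cm = [1.2, 1.85, 2.4, 3.1, 3.75, 4.5]
marker_size_bp = [3000, 2000, 1500, 1000, 700, 400]

coef = fit_cubic(marker_distance_cm, marker_size_bp)
if correlation(coef, marker_distance_cm, marker_size_bp) >= 0.99:
    # Convert the migration distance of an unknown band into a size estimate.
    print(round(predict(coef, 2.8)))
```

Solving the normal equations directly is adequate for a handful of marker bands; a dedicated numerical library would be the more robust choice for larger or poorly scaled data sets.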
(iii) Classification of molecular weight data. An Excel-based macro, Classify 1.3 (freely available from the corresponding author), was used to classify the molecular weight data. The upper limit of the molecular weight classes was defined according to the following formula: $\mathrm{max}_{i+1} = \mathrm{max}_{i} \times (100 - CA)/CA$, where $\mathrm{max}_{i+1}$ is the upper limit of the $(i+1)$th class, $\mathrm{max}_{i}$ is the upper limit of the $i$th class, and $CA$ is the class amplitude.
Lower limits were simply as follows: $\mathrm{min}_{i+1} = \mathrm{max}_{i} + 1$, where $\mathrm{min}_{i+1}$ represents the lower value of the $(i+1)$th class. CA values normally range from 5 to 10.
With this algorithm the class amplitude decreases as the absolute molecular weight decreases, giving narrower classes for the lighter bands and wider classes for the heavier bands. This produces very precise assignments of the low-molecular-weight data, which correspond to the best-resolved bands in the lower part of the gel.
Classified values were arranged automatically in a matrix of discrete values (1 = presence of the band in the class; 0 = absence), referred to hereafter in this work as the binary matrix, representing the output of the third step.
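The construction of the binary matrix can be illustrated with a short sketch. It is not the Classify 1.3 macro: the class limits are simply passed in as a list, and the function names, strain labels, and numeric values are hypothetical.

```python
from bisect import bisect_left

def binary_row(band_weights, upper_limits):
    """Return the 0/1 row for one lane.

    upper_limits holds the ascending upper bounds of the classes; class k
    covers values above upper_limits[k - 1] up to and including upper_limits[k].
    """
    row = [0] * len(upper_limits)
    for weight in band_weights:
        k = bisect_left(upper_limits, weight)  # first class whose upper limit >= weight
        if k < len(upper_limits):              # bands above the last class are ignored
            row[k] = 1
    return row

def binary_matrix(lanes, upper_limits):
    """lanes maps a strain label to its list of band molecular weights."""
    return {strain: binary_row(bands, upper_limits) for strain, bands in lanes.items()}

# Hypothetical class limits and band sizes, for illustration only.
limits = [500, 1000, 1500, 2000]
lanes = {"strain_A": [450, 1200], "strain_B": [480, 1900]}
matrix = binary_matrix(lanes, limits)
print(matrix)  # {'strain_A': [1, 0, 1, 0], 'strain_B': [1, 0, 0, 1]}
```

Two bands of slightly different size (450 and 480 here) fall in the same class and are therefore scored as the same character, which is the purpose of the classification step.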
Binary matrices were imported into the SPSS program (SPSS Inc., Chicago, Ill.) for the statistical cluster analysis, carried out according to the Ward algorithm, using Euclidean distance.
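For readers without access to SPSS, the clustering step can be approximated with the sketch below. It is a naive implementation of Ward's minimum-variance criterion, which at each step merges the two clusters whose fusion causes the smallest increase in the within-cluster sum of squared Euclidean distances. It is not the SPSS routine, and the function name and the example matrix are assumptions.

```python
def ward_clustering(matrix):
    """Agglomerative clustering by Ward's criterion.

    matrix is a list of numeric rows (one per strain). Returns the merges in
    order, each as (sorted member indices, increase in within-cluster sum of
    squares caused by that merge).
    """
    # Each cluster is (member indices, centroid).
    clusters = [([i], [float(v) for v in row]) for i, row in enumerate(matrix)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                sq_dist = sum((u - v) ** 2 for u, v in zip(ca, cb))
                # Ward cost: size-weighted squared distance between centroids.
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * sq_dist
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        size = len(ma) + len(mb)
        centroid = [(len(ma) * u + len(mb) * v) / size for u, v in zip(ca, cb)]
        merges.append((sorted(ma + mb), cost))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append((ma + mb, centroid))
    return merges

# Hypothetical binary matrix: four strains scored over four classes.
example = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
for members, cost in ward_clustering(example):
    print(members, round(cost, 3))
```

The search over all cluster pairs is cubic in the number of strains, which is acceptable for the small strain sets typical of a typing gel.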
Analyses with commercial packages. Comparison of fingerprinting patterns was performed with the Diversity One (DO) software V 1.3 (PDI, Huntington Station, N.Y.) and with the Molecular Analyst Fingerprint (MA) software (Bio-Rad). The DO software calculates the molecular weight of each band of the different fingerprints, using the appropriate molecular weight size standards as a reference and correcting gel distortion in order to minimize mistakes in the molecular weight calculation. According to their molecular weights, bands were then matched with 5% tolerance. The overall comparison of the different fingerprints was then performed by defining a similarity matrix and applying the neighbor-joining method to construct the dendrogram.
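The idea of matching bands within a tolerance can be sketched as below. How DO applies its 5% tolerance internally is not described here, so the rule used (two bands match when their sizes differ by no more than 5% of the larger one) and the greedy one-to-one pairing are assumptions, as are the function names and band sizes.

```python
def bands_match(mw_a, mw_b, tolerance=0.05):
    """True when the two sizes differ by at most `tolerance` of the larger one."""
    return abs(mw_a - mw_b) <= tolerance * max(mw_a, mw_b)

def shared_bands(lane_a, lane_b, tolerance=0.05):
    """Count bands of lane_a that pair with a distinct band of lane_b."""
    unused = sorted(lane_b)
    shared = 0
    for band in sorted(lane_a):
        for candidate in unused:
            if bands_match(band, candidate, tolerance):
                unused.remove(candidate)  # each band is matched at most once
                shared += 1
                break
    return shared

# Hypothetical band sizes (bp) for two lanes.
print(shared_bands([1000, 500], [1040, 700]))  # 1
```

The count of shared bands is the quantity that a similarity coefficient then turns into a pairwise similarity between fingerprints.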
Gel analyses by MA fingerprinting (Bio-Rad) were carried out following the automatic routine using 4% drift (tolerance) and the Dice coefficient to calculate distances among strains. Dendrograms were generated with the unweighted pair group method with arithmetic mean algorithm.
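The combination of the Dice coefficient with the unweighted pair group method with arithmetic mean (UPGMA) can be sketched as follows. This is a plain illustration working on binary rows, not the MA routine: the conversion of similarity to distance as 1 minus the Dice coefficient, the function names, and the example profiles are assumptions.

```python
def dice_distance(row_a, row_b):
    """1 - Dice coefficient for two binary band profiles."""
    shared = sum(1 for a, b in zip(row_a, row_b) if a and b)
    total = sum(row_a) + sum(row_b)
    if total == 0:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - 2.0 * shared / total

def upgma(names, rows):
    """Return merges as (sorted strain names, average distance at the merge)."""
    n = len(rows)
    dist = {(i, j): dice_distance(rows[i], rows[j])
            for i in range(n) for j in range(i + 1, n)}

    def average(ca, cb):
        # UPGMA: mean of all pairwise distances between the two clusters.
        pairs = [dist[min(i, j), max(i, j)] for i in ca for j in cb]
        return sum(pairs) / len(pairs)

    clusters = [[i] for i in range(n)]
    merges = []
    while len(clusters) > 1:
        candidates = [(average(a, b), p, q)
                      for p, a in enumerate(clusters)
                      for q, b in enumerate(clusters) if p < q]
        d, p, q = min(candidates)
        merged = clusters[p] + clusters[q]
        merges.append((sorted(names[i] for i in merged), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return merges

# Hypothetical profiles: A and B are replicates with identical bands.
names = ["A", "B", "C", "D"]
rows = [[1, 1, 1, 0], [1, 1, 1, 0], [1, 0, 0, 1], [0, 0, 0, 1]]
for members, d in upgma(names, rows):
    print(members, round(d, 3))
```

In this toy data set the two replicates join at distance 0, which is the behavior expected of any system that recognizes identical patterns.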
Organization of the multicenter comparison. RAPD, (GACA)4, and CHEF analyses were performed separately in three laboratories: DBVPG (Perugia), Dipartimento di Medicina Sperimentale e Scienze Biochimiche (Perugia Hospital), and Istituto di Igiene e Medicina Preventiva (Milan); upon digitalization, the same pictures of the four gels were distributed to the three centers for separate analysis in order to exclude experimental variability due to the molecular procedures. According to this scheme, variations in the final dendrograms should be exclusively attributed to the differences in the analytical systems. In order to allow an easy, visual comparison, all dendrograms were graphically processed at DBVPG to normalize their dimensions, leaving untouched both the shape (topology) and the proportional lengths of the branches.
Binary (1 = present; 0 = absent) matrices reporting the data from a gel can be produced by two approaches, which use either the migration distance or the molecular weight values to assign the bands to predefined classes. In principle, both approaches should perform equally well, although some drawbacks apparently affect the routine based on migration distances (4).
The two commercial packages employed in this study classify bands according to their migration distance, while DAS uses molecular weight values.
FIG. 1. Pictures of gels analyzed to produce the dendrograms in Fig. 2. (a) RAPD obtained with the primer ZP19; (b) RAPD analysis obtained using the primer ZP20; (c) gel with (GACA)4 patterns; (d) electrokaryotypes (CHEF). Strain labels refer to Table 1.
FIG. 2. Dendrograms obtained with each of the three analytical procedures under comparison. Dendrograms of RAPD obtained with primer ZP19 (a), RAPD obtained with primer ZP20 (b), (GACA)4 analysis (c), and CHEF banding patterns (d). Dendrogram dimensions were graphically normalized to allow an easy visual comparison of the topologies and of the relative distances among strains. See Table 1 for a list of strains used.
(GACA)4 dendrograms. PCR analysis with (GACA)4 (GACAGACAGACAGACA) primers has been extensively used (7, 10) to differentiate among varieties and serotypes of C. neoformans because of its ability to produce patterns with several well-resolved bands.
Dendrograms (Fig. 2c) obtained from the gel with the (GACA)4 products (Fig. 1c) show that the DO system again fails to detect the identity between the two PG 1934 replicates, whereas the overall similarity among strains MI 1956A, MI 1956F, MI 1988A, and MI 1988C and the close relatedness between the PG 1934 and CBS 6995 strains are recognized by all three systems.
CHEF dendrograms. Dendrograms (Fig. 2d) from CHEF profiles (Fig. 1d) show a close relationship between the MI 17 and MI 1988C strains in all three systems. DO fails to recognize the identities of the two PG 1934 and MI 15 replicates but is the only system to recognize the S. cerevisiae strain as the outgroup.
DAS produces a dendrogram formed by two clusters of strains, of which one includes strains isolated from individuals in Milan (with the exception of PG 1883) and the other includes strains isolated from individuals in Perugia (with the exception of MI 1956F). This observation confirms the ability of DAS to discriminate the Cryptococcus strains according to their geographic origin as already noted in the case of the analysis of the patterns obtained from ZP19 products (Fig. 2a).
Merging different data sets in one matrix. An everyday situation in molecular epidemiology is the impossibility of accommodating in a single gel all strains studied by a given technique (e.g., RAPD or CHEF analysis). This hampers the analysis of banding patterns by methods that use as input the migration distances, because it is virtually impossible for the same band to migrate exactly the same distance in different electrophoretic runs. The above problems can be solved with DAS by preparing a single matrix with all molecular weight data obtained from separate gels. Another advantage of this system is the possibility of merging in a single matrix all data sets obtained by applying different techniques (RAPD and CHEF analyses, etc.) to the same strains to obtain a single dendrogram from all available information.
In order to test this feature of the DAS, we merged in a single matrix all data sets derived from the four molecular analyses described above, to draw a dendrogram according to Ward's algorithm calculated on the basis of the Euclidean distances. Results in Fig. 3 show that the strains considered are separated according to the geographic distribution, with only the exception of PG 1883, which falls in the cluster containing the isolates from Milan.
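The merging operation amounts to concatenating, strain by strain, the binary rows obtained from each technique. A minimal sketch is given below; the function name, strain labels, and values are hypothetical, and restricting the merged matrix to strains present in every data set is an assumption about how missing strains would be handled.

```python
from math import dist

def merge_matrices(*matrices):
    """Concatenate per-technique binary rows for strains present in all inputs.

    Each matrix maps a strain label to its binary row for one technique.
    """
    common = set.intersection(*(set(m) for m in matrices))
    return {strain: [value for m in matrices for value in m[strain]]
            for strain in sorted(common)}

# Hypothetical binary rows for two strains and two techniques.
rapd = {"S1": [1, 0], "S2": [0, 1]}
chef = {"S1": [1, 1, 0], "S2": [1, 0, 1]}

merged = merge_matrices(rapd, chef)
print(merged["S1"])                      # [1, 0, 1, 1, 0]
print(dist(merged["S1"], merged["S2"]))  # Euclidean distance: 2.0
```

Because every column is a molecular weight class rather than a gel position, columns from different gels or techniques can sit side by side in the same matrix before the distance calculation.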
FIG. 3. Comprehensive dendrogram calculated with DAS by pooling all data (RAPD with primers ZP19 and ZP20, [GACA]4, and CHEF) in a single matrix. Details of the procedure are illustrated in the text.
The use of migration distances precludes the comparison of data coming from different laboratories, because the positions of the same bands in different gels vary even if care is taken to regulate the running conditions. Some commercial packages have tried to solve the problem by an artificial video alignment of the corresponding bands of identical molecular weight marker used in all gel runs. We have argued that this operation is legitimate only when the regression curve between migration distances and corresponding molecular weight values is linear, as in the case of many RAPD gels, but not when it has a hyperbolic shape, as in pulsed-field gel electrophoresis gels (4). These observations lead to the conclusion that the transformation of the migration distances into the corresponding molecular weights is an essential issue of the standardization process.
The choice of the algorithm to use in different cases is not always clear, although some procedures should be used prudently because they are not necessarily appropriate to the biological situation under study. The unweighted pair group method with arithmetic mean, for instance, should not be used with organisms coming from environments with different (or unknown) selective pressures, because it assumes a constancy of variation rates (8). Even the algorithms to calculate statistical distances should be chosen carefully, considering that not all were designed for the purpose for which we tend to employ them. This can be the case of the Dice coefficient, introduced by Dice in 1945 for a study on species associations, which gives a weight of 2 to the double presence of bands and does not consider cases of double absence (6). It could be argued that in a RAPD profile both double presence and double absence of bands indicate similarity and that the presence of two bands of the same molecular weight is not a guarantee that indeed those two fragments of DNA represent the same DNA sequence. These few examples of criticism aim not to open a controversy but rather to stress the need for consensus analytical systems able to produce results immediately comparable by all investigators of the same scientific area and not affected by major statistical problems. Results presented in this paper suggest that all steps of each analytical routine need to be explicitly defined and standardized, according to the best performance found. In our opinion there are at least two strong requirements to achieve this goal: a wide discussion on the algorithms and an effective standardization of the software application employed.
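The point about double absence can be made concrete with a small worked comparison. The symbols and the numbers below are introduced only for illustration: for two lanes, let a be the number of classes with a band in both, b and c the numbers with a band in only one lane, and d the number with a band in neither. The simple matching coefficient is used here as an example of a measure that counts double absence; it is not one of the coefficients employed in this study.

```latex
S_{\mathrm{Dice}} = \frac{2a}{2a + b + c}, \qquad
S_{\mathrm{SM}} = \frac{a + d}{a + b + c + d}

% Example with a = 3, b = 1, c = 1, d = 5:
S_{\mathrm{Dice}} = \frac{6}{8} = 0.75, \qquad
S_{\mathrm{SM}} = \frac{8}{10} = 0.80

% Raising d to 15 leaves Dice unchanged but raises simple matching:
S_{\mathrm{Dice}} = \frac{6}{8} = 0.75, \qquad
S_{\mathrm{SM}} = \frac{18}{20} = 0.90
```

The Dice value is therefore independent of how many classes are empty in both lanes, whereas a coefficient that counts double absence depends on the number of classes defined.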
During the review process of this article, in a paper dealing with comparisons between commercial software packages, we found that there are evident discrepancies between the three systems employed and that none provides an indisputably correct analysis (11). These findings confirm our conclusions and call for active research to solve the problem of the effective significance obtained with these systems of analysis.
Work is in progress in our laboratory to develop new procedures, aiming to give the investigator more choices of analysis to improve the overall flexibility of the system and to allow an effective comparison between different approaches. These procedures will be included in free software programs designed to interact with some of the most popular statistical or phylogenetic packages such as PHYLIP, PAUP, SPSS, ADE4, and Le Progiciel.
This work was partially supported by grant ISS 1998-# 50A 0.01 of the Istituto Superiore di Sanità.