Journal of Clinical Microbiology, February 2007, p. 358-363, Vol. 45, No. 2
0095-1137/07/$08.00+0 doi:10.1128/JCM.01848-06
Biotechnology Core Facility Branch, Division of Scientific Resources, National Center for Preparedness, Detection, and Control of Infectious Diseases, Coordinating Center for Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia
Received 6 September 2006/ Returned for modification 22 November 2006/ Accepted 5 December 2006
Variola virus (VARV), a member of the genus Orthopoxvirus, is recognized as the most significant poxvirus that infects humans and is the cause of smallpox. VARV is divided into two phenotypic subtypes based on case fatality. A fatality rate ranging up to 40% was recorded for the variola major subtype in unvaccinated human populations; rates of approximately 1% characterized the other subtype, termed variola minor (13). It is estimated that smallpox has killed more humans than all other infectious diseases combined (15). In 1979, the World Health Organization (WHO) certified the eradication of smallpox following successful vaccination worldwide. Since then, the known smallpox virus samples have been stored in repositories at the Centers for Disease Control and Prevention (Atlanta, Georgia) and the State Research Center of Virology and Biotechnology (Koltsovo, Novosibirsk region, Russia).
Although significant advances in the control and treatment of smallpox virus have been made, the issue of reemergence and potential zoonotic poxvirus infection in the future remains a serious public health concern. Furthermore, the specter of bioterrorism using native or genetically engineered variola virus has arisen as a vital concern, exacerbated by the facts that routine vaccinations were stopped after eradication and that a large population worldwide is thus completely unprotected from variola virus infection.
Genomic diversity within and between populations of an organism is caused by single-nucleotide mutations, changes in repetitive sequences, recombination, deletions, and insertions. For decades, Sanger's dideoxy termination method has been the method of choice for genomic sequencing, including sequencing of human-pathogenic bacteria and viruses. With the advent of high-throughput, high-density resequencing GeneChip technology developed by Affymetrix (Santa Clara, CA), it is now possible to rapidly identify genetic variation across genomes. This highly parallel tool has also been successful in gene expression and genotyping studies of various viruses, such as dengue virus, murine cytomegalovirus, rabbitpox virus, reovirus, and human immunodeficiency virus (1, 2, 18, 24, 25).
In a recent study, we successfully characterized the complete genomes of two strains of an emerging infectious human pathogen, severe acute respiratory syndrome (SARS) coronavirus, by hybridizing Affymetrix SARS resequencing GeneChips (21). The present study was carried out with the major objective of being prepared to characterize the smallpox virus genome rapidly should a malicious release occur. This study describes the resequencing of 14 variola virus strains by use of the high-throughput smallpox virus resequencing GeneChip set.
TABLE 1. Genomic sequences of VARV tiled as reference in smallpox virus resequencing GeneChip set and strains resequenced
Pooling and quantitation of PCR amplicons. The PCR amplicons were purified using a QIAquick cleanup kit and eluted in 50 µl elution buffer (QIAGEN, Valencia, CA). The concentration of each PCR product was measured at a 260-nm absorbance in a NanoDrop-1000 spectrophotometer (NanoDrop Technology, Rockland, DE). To cover the complete genome, seven different sets of PCR products (equimolar amounts) representing overlapping genomic segments were pooled before fragmentation according to instructions in the GeneChip CustomSeq array manual (Affymetrix).
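The equimolar pooling step can be illustrated with a short calculation. The following is a minimal sketch, not the procedure from the GeneChip CustomSeq array manual: it assumes an average mass of 660 g/mol per base pair of double-stranded DNA, and the amplicon names, concentrations, lengths, and target amount in the usage note are hypothetical.

```python
# Approximate average mass of one double-stranded DNA base pair, in g/mol.
# This is a common rule-of-thumb value, assumed here for illustration.
AVG_BP_MASS = 660.0


def pmol_per_ul(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA concentration from ng/ul to pmol/ul."""
    # ng -> g is a factor of 1e-9 and mol -> pmol is 1e12, so the net factor is 1e3.
    return conc_ng_per_ul * 1e3 / (length_bp * AVG_BP_MASS)


def equimolar_volumes(
    amplicons: dict[str, tuple[float, int]], pmol_each: float
) -> dict[str, float]:
    """Return the volume (ul) of each amplicon that contains `pmol_each` pmol.

    `amplicons` maps a name to (concentration in ng/ul, length in bp).
    """
    return {
        name: pmol_each / pmol_per_ul(conc, length)
        for name, (conc, length) in amplicons.items()
    }
```

For example, `equimolar_volumes({"A": (66.0, 10000), "B": (33.0, 5000)}, 0.05)` asks for 0.05 pmol of each of two hypothetical amplicons; the longer amplicon needs twice the mass for the same number of molecules, so the two volumes come out equal.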
DNA fragmentation. One microgram of pooled PCR products was fragmented using the GeneChip fragmentation reagent (Affymetrix) in a thermocycler preheated to 37°C with a single cycle of 37°C for 15 min, 95°C for 15 min, and a 4°C hold. The fragmented samples were visualized on a 4 to 20% Novex Tris-borate-EDTA gel (Invitrogen, Carlsbad, CA) to ensure that an optimal range of fragment sizes (20 to 200 bases) was achieved. Subsequently, the fragmented samples were end labeled using the GeneChip labeling reagent (Affymetrix) while incubating at 37°C for 2 h, and this was followed by inactivation at 95°C for 15 min. Samples were then cooled on ice and stored at -20°C until the hybridization was completed.
GeneChip hybridization. The set of smallpox virus resequencing GeneChips was equilibrated at room temperature for at least 15 min before hybridization and prehybridized by filling each GeneChip with 200 µl of prehybridization buffer (10 mM Tris [pH 7.8], 0.01% Tween 20). Subsequently, these GeneChips were placed in a GeneChip hybridization oven (Affymetrix) set at 45°C and rotated at 60 rpm for 15 min. Meanwhile, the hybridization solution (3 M tetramethyl ammonium chloride, 10 mM Tris [pH 7.8], 0.01% Tween 20, 500 µg/ml acetylated bovine serum albumin [BSA], 100 µg/ml herring sperm DNA, and 0.26 µg of fragmented and labeled 7.5-kb DNA serving as the positive hybridization control) was prepared, denatured by placing the tubes at 95°C for 5 min, and then equilibrated at 45°C for 5 min. The prehybridization buffer was removed from the GeneChips and replaced with equilibrated hybridization solution, and the GeneChips were placed in a GeneChip hybridization oven (Affymetrix) for 16 h at 45°C with rotation at 60 rpm. After completion of hybridization, the hybridization solution was removed from the GeneChips and saved at -20°C, and the GeneChips were completely filled with 200 µl of nonstringent wash buffer A (6x SSPE, 0.01% Tween 20) (1x SSPE is 0.18 M NaCl, 10 mM NaH2PO4, and 1 mM EDTA [pH 7.7]).
Washing, staining, and scanning. The hybridized GeneChips were washed and stained using a GeneChip FS-450 fluidics station (Affymetrix). Staining of the GeneChips was performed twice with a solution containing 6x SSPE, 0.01% Tween 20, 2 mg/ml acetylated BSA, and 10 µg/ml of streptavidin-R-phycoerythrin conjugate. One additional cycle was also performed with antibody wash (6x SSPE, 0.01% Tween 20, 2 mg/ml acetylated BSA, 3 µg/ml biotinylated anti-streptavidin, and 100 µg/ml goat immunoglobulin G) to remove excess streptavidin-R-phycoerythrin conjugate. The hybridized, washed, and stained smallpox virus GeneChips were then scanned by a GeneChip scanner (Affymetrix). The Affymetrix GeneChip operating software program was employed to operate the fluidics station as well as the scanner.
Data acquisition and analysis. Scanned, raw pixel array data files (.DAT files) were converted into cell intensity values by GeneChip operating software, generating the cell intensity files (.CEL files). These files were used to determine the base calls and the quality scores. The data were initially analyzed by GeneChip DNA analysis software (GDAS), which Affymetrix provides to automate the base calls. This software implements the 2001 ABACUS algorithm of Cutler et al. (4). The data were also examined by RATools, another implementation of the ABACUS algorithm which provides a rigorous framework for the analysis of resequencing GeneChips (28). In order to find the differences between the whole-genome sequences generated by the resequencing GeneChips and those generated by conventional sequencing of any VARV strain, we used the MUMmer program (5).
Design of smallpox virus resequencing GeneChips.
The smallpox virus resequencing GeneChips (a set of seven) were designed based on the whole-genome sequences of 24 variola virus strains. Multiple alignments of the sequences were carried out to summarize the sequence conservation and variation among these strains by use of the CLUSTAL_X and DnaSP programs (20, 23). Although our smallpox virus resequencing GeneChip design contains 408 instructions across the seven-chip set, the conserved genomic region among all the strains was tiled only once; additional unique sequences for all strain-specific variations were also tiled (Table 2). The nucleotide sequence of the CHN48_horn strain (GenBank accession no. DQ437582) was tiled as the major reference sequence, and the computational alignment features (insertions or deletions) were calculated relative to this sequence. These GeneChips, containing 240,000 different types of 25-mer oligonucleotides with a 20- by 25-µm feature size, were fabricated by Affymetrix. Each position in the reference sequence was represented by four nearly identical 25-mer probes that differed only at the 13th position. Each of the seven GeneChips belonging to a set was designed to analyze a divergent segment of approximately 30,000 bases of the smallpox virus genome. The maximum theoretical coverage of the reference sequence (CHN48_horn) in these resequencing GeneChips was approximately 99%, as 1,779 bases represented repeats, which were not tiled. The high-density smallpox virus resequencing GeneChips were fabricated using proprietary photolithography and solid-phase DNA synthesis by Affymetrix (8, 14).
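The four-probe tiling described above can be sketched in a few lines of code. This is a simplified illustration, not the Affymetrix design software: it builds sense-strand 25-mers only (it does not take complements or tile the opposite strand), and the function name and the toy reference sequence in the usage note are hypothetical.

```python
PROBE_LEN = 25
CENTER = 12  # zero-based index of the 13th base in a 25-mer


def probe_set(reference: str, pos: int) -> dict[str, str]:
    """Return the four 25-mers that interrogate reference[pos].

    The probes share the 12 flanking bases on each side and differ only
    at the central (13th) position, which is substituted with A, C, G, or T.
    The dictionary key is the substituted base.
    """
    start = pos - CENTER
    if start < 0 or start + PROBE_LEN > len(reference):
        raise ValueError("position too close to a sequence end for a full 25-mer")
    window = reference[start:start + PROBE_LEN]
    return {
        base: window[:CENTER] + base + window[CENTER + 1:]
        for base in "ACGT"
    }
```

With a toy reference such as `"ACGT" * 10`, `probe_set(reference, 20)` returns four 25-mers, one of which matches the reference exactly; on a real array, the probe giving the strongest hybridization signal suggests the base present in the sample.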
TABLE 2. The 22 long-range PCR products pooled and hybridized with seven resequencing GeneChips to cover the smallpox virus genomes
For data analysis, we used two implementations of the base-calling ABACUS algorithm, GDAS and RATools (4). ABACUS scores a base as uncalled (ambiguous call or N call) when confidence values for that call fall below a user-determined threshold. We defined a "discordant call" as a base call obtained from the chips that differed from the call generated by dideoxy sequencing, defined the call rate as the percentage of the nucleotide sequence across the genome called with high confidence, and calculated accuracy from the total number of correct calls, excluding uncalled bases (Table 3). For RATools-based analysis, the total threshold was set at 30, and a strand minimum of 2 was used (4). A comparable default setting for GDAS-based analysis was selected. The calling patterns of both programs were essentially similar, though in this paper we report results as obtained with RATools.
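These summary metrics can be made concrete with a small sketch. It is not part of GDAS or RATools; it assumes the chip calls and the dideoxy sequence are already aligned to equal length, that uncalled positions are written as N, and that accuracy is expressed as the percentage of called bases that agree with dideoxy sequencing (one reading of the definition above). The function name and the sequences in the usage note are hypothetical.

```python
def summarize_calls(chip_calls: str, sanger: str) -> dict:
    """Summarize chip base calls against an aligned dideoxy sequence.

    Returns the call rate (% of positions with a non-N call), the number
    of discordant calls (called bases that differ from the dideoxy base),
    and the accuracy (% of called bases that agree).
    """
    if len(chip_calls) != len(sanger):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only positions where the chip made a call.
    called = [
        (c, s) for c, s in zip(chip_calls.upper(), sanger.upper()) if c != "N"
    ]
    n_called = len(called)
    discordant = sum(1 for c, s in called if c != s)
    return {
        "call_rate": 100.0 * n_called / len(sanger) if sanger else 0.0,
        "discordant": discordant,
        "accuracy": 100.0 * (n_called - discordant) / n_called if n_called else 0.0,
    }
```

For instance, `summarize_calls("ACGTNACGTA", "ACGTAACGTT")` describes a 10-base toy region with one uncalled position and one discordant call among the nine called bases.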
TABLE 3. Call rates observed across the hybridized set of seven smallpox virus resequencing GeneChips for 14 variola virus strains
The analysis revealed a distinctly conserved pattern across the VARV genomes characterized. Of the 14 VARV strains resequenced, the nucleotide sequences generated by smallpox virus GeneChips and conventional capillary sequencing were identical for six of the genomes; a similar result was evident for the major reference strain (CHN48_horn), which was hybridized in duplicate with two different sets of the GeneChips. Five genomes differed from the Sanger sequence by a single base, two genomes differed by 3 bases, and the SOM77_ali genome differed by 12 bases (Table 4).
TABLE 4. Variation between calls generated by capillary-based sequencing and array-based smallpox virus resequencing GeneChips
The current standard for de novo genome characterization, automated dideoxy sequencing by capillary electrophoresis, consumes considerable time and proprietary reagents. Resequencing by hybridization of DNA microarrays is not a replacement for de novo sequencing but a complement. Of course, resequencing presupposes the existence of at least one guide (reference) sequence. Although Affymetrix's implementation of GeneChip technology is designed to detect single-nucleotide polymorphisms at each locus, if more-extensive variants of the guide sequence, such as deletions or insertions, are to be detected, these too must be present on the chip as supplementary tiles (24). Furthermore, the effort and cost involved in the design and manufacture of complex chip sets are significant. In spite of these facts, GeneChip-based genome characterization has been found to be advantageous over conventional sequencing, which requires a larger amount of genomic material (21). The set of smallpox virus resequencing GeneChips used in this study incorporates all variants known from de novo sequencing of 24 VARV strains. Accommodating this information required 408 tiles spread across seven GeneChips, each containing approximately 60,000 features. In contrast, the SARS resequencing GeneChip that was previously reported contained only four tiles on one GeneChip (21).
Offsetting the cost and effort of construction of resequencing GeneChips, however, is the potential of the characterization assay for speed and sensitivity. These attributes would seem to make resequencing on microarrays ideal for public health and bioterrorism preparedness settings in which reemergence, either natural or malicious, of genomically well-characterized pathogens is of concern.
Thus, from a tool that was originally conceived to facilitate single-nucleotide polymorphism identification in the human genome (3, 4, 7, 26) has evolved a tool to enable the rapid genomic characterization of important pathogens. For example, it has been possible to resequence several strains of a biowarfare pathogen, Bacillus anthracis, as well as to understand the population genetic structure of that organism (28), and a resequencing GeneChip based on the SARS virus has been validated to characterize the genomes of two human-pathogenic strains of this virus (21).
In addition to these developments, DNA-based microarrays have been designed to interrogate subsets of genes responsible for phenotypic differences in pathogenesis, virulence, evolution, and genetic variation (7, 17, 19, 22). A microchip containing 15 oligonucleotide probes for species-specific detection of OPVs (including 59 strains of variola, vaccinia, monkeypox, cowpox, and camelpox viruses) pathogenic to humans and animals was developed based on the CrmB gene (12). A similar approach was later described for rapid detection of and discrimination between four human-pathogenic OPVs, including VARV, based on the C23L/B29R gene (11).
This report describes the design of a GeneChip capable of resequencing the entire VARV genome and validates the accuracy of resequencing by hybridization using strains of VARV of both major and minor subtypes. The repeatability of ABACUS base calls was determined by one fully replicating experiment; independent amplification of the same genome sample (the reference strain CHN48_horn) was followed by hybridization on two different chip sets. The total number of calls made in replicate experiments was 377,781 (189,590 for replicate 1 and 188,191 for replicate 2). The number of discordant base calls was zero. The call rate in all experiments reported here was 96%, comparable to those for GeneChips previously fabricated by Affymetrix and employed for Bacillus anthracis resequencing (28) and human genotyping studies (4). Since achieving a high call rate is a major issue for successful hybridization, attempts were made to further increase the call rates by modifying default parameters of the base-calling algorithm. No improvement could be achieved over the default values for a total threshold of 30 and for a strand threshold of 2 (data not shown). These values were used in all data analysis presented here as well as in some previous studies (4, 28) and thus seem appropriate.
The resequencing chip set we have used here is not publicly available, though the sequences on which it is based are now in the public domain (6). Thus, our intent in publishing this study is not to propagate the use of this particular chip set but to suggest that there are likely many situations in which DNA microarray-based resequencing is appropriate and that one can have considerable confidence in the sequence data generated by them.
In conclusion, we believe that possession of the smallpox virus resequencing GeneChips positions us to respond rapidly with an accurate sequence identification of VARV should a suspected case of smallpox be discovered. Furthermore, the preliminary results of our ongoing studies suggest that large portions of a closely related, non-VARV orthopoxvirus genome (monkeypox virus) would be revealed by hybridization to this GeneChip set (data not shown). In such a case, even observation of hybridization with gaps and a low call rate (55%) would provide (i) good evidence that the infection in question is not authentic VARV and (ii) a useful guide to conventional sequencing primers which might be used to close the gap(s). Thus, in the future the smallpox resequencing GeneChips may be used to detect possible human manipulations of the genome of this virus, to differentiate it from other orthopoxviruses, and as a general tool to understand the evolution of the virus as it has spread into different geographic locations and populations.
Published ahead of print on 20 December 2006.
Copyright © 2009 by the American Society for Microbiology.