Journal of Clinical Microbiology, April 2006, p. 1209-1218, Vol. 44, No. 4
0095-1137/06/$08.00+0 doi:10.1128/JCM.44.4.1209-1218.2006
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Dominic Suciu,1,
Mark Elliott,1
Axel G. Stover,1
Marty Ross,1
Marcelo Caraballo,1
Kim Dix,1
James Crye,1
Richard J. Webby,2
Wanda J. Lyon,3
David L. Danley,1 and
Andrew McShea1
CombiMatrix Corporation, Mukilteo, Washington 98275,1 St. Jude Children's Research Hospital, Memphis, Tennessee 38105,2 AFIOH/SDE, Brooks City AFB, San Antonio, Texas 78235,3
Received 12 October 2005/ Returned for modification 24 November 2005/ Accepted 14 January 2006
Reference, clinical, and military laboratories must evaluate antigenic drift in the common influenza virus strains circulating each year so that these changes can be addressed in vaccine development. Also, avian influenza virus isolates must be monitored on a worldwide basis to detect virulent isolates that have the potential to infect humans and produce a future epidemic or pandemic. These needs have led to the creation of a global surveillance program to monitor outbreaks (13). The goal of surveillance is to gather information on the influenza virus subtypes that are circulating in human and animal populations so that recommendations can be made on the content of vaccines for the next season. This information is important, because genetic changes in certain influenza virus subtypes can occur rapidly. For example, variability of the H3N2 subtype has required 19 changes in the vaccine component over 29 years (from 1972) (13). New antigenic variants that require revisions in vaccine components can arise with a frequency of one every 1 to 2 years; and thus, diagnostic assays that are sensitive, specific, and accurate are required (13).
In many situations, identification of the circulating subtype is not sufficient and a specific gene sequence is required. For example, genotype Z, the dominant avian H5N1 influenza virus genotype currently circulating in Vietnam and Thailand, contains a mutation that is associated with resistance to amantadine and rimantadine (8, 36). Antiviral therapies generally should be given within 48 h of the onset of illness to be effective against human influenza (36). Thus, rapid and specific identification of this subtype and the availability of accurate sequence information are crucial for proper treatment.
Identification of influenza virus subtypes is routinely accomplished with viral detection (cell culture) and serological techniques, such as complement fixation, hemagglutination, hemagglutination inhibition assays, and immunofluorescence methods (3, 5, 25, 31). Traditional methods are generally effective but are labor-intensive and require highly trained personnel. Because of their speed, specificity, and sensitivity, genomic assays are ideal complements to serological assays for the identification of the genotype of an unknown specimen, especially in cases where antigenic tests are not specific enough to differentiate closely related groups (10, 20, 27, 29, 33, 37). Reverse transcription-PCR (RT-PCR) is widely used for virus identification (1, 14, 30). However, a positive amplification can be verified only by subsequent assays that provide sequence information. By overcoming this limitation, microarrays and biosensors have become valuable tools for viral discovery, detection, and genotyping (5, 6, 10, 15, 19, 20, 21, 28, 33, 34).
Microarrays that contain several thousand different DNA sequences (probes) can theoretically identify several thousand different organisms. However, microarrays that rely on standard thermal hybridization have their own issues: hybridization conditions that cannot be optimal for every probe, mismatches that are difficult to detect (G-G or G-T, for example), and mismatches that reside in a context (GC-rich) that makes detection problematic. Even well-designed probes can display differences in maximal hybridization capacity of 2 orders of magnitude under different hybridization conditions (7); thus, it is difficult to find one set of conditions that is optimal for all probes on an array (16, 32). New technologies are needed to overcome the difficulties associated with traditional thermal hybridization of microarrays, to provide rapid and less expensive target labeling systems, and, eventually, to replace expensive slide scanning devices (9).
Here we describe an improved assay that combines the sensitivity and specificity of enzymatic reactions with the cost-effective strategy of using a labeled common oligonucleotide primer that is extended to the site of the match or mismatch. This assay system can be used to identify influenza A virus subtypes and sequence the subtype of interest and requires only a 1.0- to 1.5-h hybridization and enzymatic extension-ligation step.
Broad scan array probe design. Viral sequence data were obtained from the GenBank database and from the Influenza Sequence Database (23). For HA serotypes, 1,614 animal isolates and 1,937 human isolates were selected; and for NA serotypes, 552 human isolates and 831 animal isolates were selected. Both data sets were treated in the same way by use of a modification of the method of Wang et al. (34), in which probe uniqueness was based on subtype differences. For each sequence, nonoverlapping appended primers were made, tiling the entire sequence. These oligonucleotides were designed to have similar annealing stabilities, as judged by a nearest-neighbor thermodynamic model (2), and were designed to have a melting temperature (Tm) of 50°C. Probes that had significant secondary structures (Tm > 40°C) were taken out of the set. Finally, probes from only the first 500 bp of sequence were used (bp 50 to 500). After tiling and culling, 23,568 HA probes and 15,191 NA probes were left. Each sequence and probe was grouped and labeled by its serotype, and databases were generated from the compiled HA and NA sequences of the isolates. Probes were selected to be exclusive to a given subtype, as judged by pairwise BLASTN search (4). Figure 1 shows the overall scope of the design. It shows the number of sequences in the initial database, the number of probes designed, and the final number of probes selected for each subtype. A poly(T)10 spacer was added to the 3' ends of all probes to avoid surface inhibition. Probe design files for array synthesis were generated with Layout Designer (CombiMatrix Corp., Mukilteo, WA). Oligonucleotide microarrays were synthesized on semiconductor microchips containing over 12,000 independently addressable electrodes (CustomArray; CombiMatrix Corp.).
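The tiling step described above can be illustrated with a minimal sketch. It is not the authors' software: the function names are hypothetical, the Wallace rule (2°C per A/T, 4°C per G/C) is used as a simple stand-in for the nearest-neighbor thermodynamic model, and the secondary-structure filter and BLASTN uniqueness check are omitted. The sketch assumes that "bp 50 to 500" is 1-based and inclusive and that each probe is grown one base at a time until its estimated Tm reaches the 50°C target.

```python
def wallace_tm(seq: str) -> float:
    """Rough Tm estimate (Wallace rule); a stand-in for a nearest-neighbor model."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def tile_probes(sequence, first_bp=50, last_bp=500, target_tm=50.0,
                spacer="T" * 10):
    """Tile bp first_bp..last_bp (1-based) into nonoverlapping probes.

    Each probe is extended one base at a time until its estimated Tm
    reaches target_tm. A trailing fragment that never reaches target_tm
    is dropped. The poly(T) spacer is appended to the 3' end.
    """
    region = sequence[first_bp - 1:last_bp].upper()
    probes = []
    i = 0
    while i < len(region):
        j = i + 1
        while j <= len(region) and wallace_tm(region[i:j]) < target_tm:
            j += 1
        if j > len(region):
            break  # remaining bases are too few to reach the target Tm
        probes.append(region[i:j] + spacer)
        i = j  # the next probe starts where this one ended (no overlap)
    return probes
```

Growing each probe to a common Tm, rather than cutting fixed-length windows, is one way to obtain the similar annealing stabilities mentioned in the text; a production design would substitute a nearest-neighbor Tm calculation and apply the filters left out here.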
FIG. 1. Number of sequences in the initial database, number of probes designed, and final number of probes selected for each subtype. BinStr, subtype identity for each probe after BLASTN searches.
FIG. 2. Diagram of the strategy for microarray DNA sequencing. Four target-specific (antisense) probes, one set for each base of desired sequence, are identical except for the 5'-terminal residue (the probe terminates in bases A, C, G, and T). After hybridization of the probes, target DNA, and Cy3-labeled primer, a mixture of enzymes, buffer, and deoxynucleoside triphosphates is added to the array. The labeled primer is extended and ligated to the matching probe, and finally, the array is washed with 0.1 N NaOH and scanned for fluorescence. High fluorescence intensity indicates that the labeled primer is covalently bound to the probe that is a perfect match to the target. The sequence can be extracted by joining bases that are associated with an elevated signal.
TABLE 1. Reference antigens for HA and NA subtypes
Standard hybridization conditions for the biotinylated target included preblocking for 15 min at 45°C with 6x SSPE (1x SSPE is 0.18 M NaCl, 10 mM NaH2PO4, and 1 mM EDTA [pH 7.7]) containing 0.05% Tween 20 (SSPET), 2.0 mM EDTA, 5x Denhardt's solution, and 0.05% sodium dodecyl sulfate, followed by hybridization of the biotinylated target in preblock solution for 1 h at 45°C. The arrays were then washed once for 5 min with 6x SSPET at 45°C and then for 30 s each with 3x SSPET, 0.5x SSPET, 2x PBST, and 2x PBS at room temperature. The hybridized array was then blocked with 5x casein-PBS buffer (BioFX Laboratories, Owings Mills, MD) for 15 min at room temperature and labeled for 30 min with Cy5-streptavidin (GE Healthcare, Amersham Biosciences, Piscataway, NJ) diluted 1:1,000 in 5x casein-PBS buffer. The arrays were scanned after they were washed twice with 2x PBST and twice with 2x PBS.
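Because the wash series is specified as multiples of 1x SSPE, the working concentrations follow by simple scaling. The short sketch below (names are hypothetical) converts the 1x composition given above into millimolar values for any fold strength; it does not account for the Tween 20 or for the additional 2.0 mM EDTA in the preblock solution.

```python
# 1x SSPE composition from the text, expressed in mM.
SSPE_1X_MM = {"NaCl": 180.0, "NaH2PO4": 10.0, "EDTA": 1.0}


def sspe_concentrations(fold: float) -> dict:
    """Return component concentrations (mM) for a given fold strength of SSPE."""
    return {component: conc * fold for component, conc in SSPE_1X_MM.items()}


for fold in (6, 3, 0.5):
    print(f"{fold}x SSPE:", sspe_concentrations(fold))
```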
Data analysis. The microarrays were scanned for probe and target fluorescence intensity with a GenePix 4000B optical scanner (Molecular Devices). Image intensities were quantified with Microarray Imager (CombiMatrix Corp.) and graphed with Microsoft Excel software. Subtype identification was accomplished by averaging HA and NA subtype intensity values and then plotting the results in Excel software. HA and NA subtype DNA sequence information was generated from sequencing array intensity data with an Excel routine designed to interrogate units of 4 datum points and then associate the most intense signal with the nucleotide represented by that probe. Sequence strings were then used to search the GenBank nonredundant database with the BLASTN program.
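The two analysis steps described above were carried out in Excel; the following Python sketch shows the same logic under stated assumptions. The function names are hypothetical, the four probes in each unit are assumed to be ordered A, C, G, T, and the called nucleotide is taken to be the one the brightest probe represents. Ties are resolved in favor of the first probe in the unit, which is an arbitrary choice.

```python
from statistics import mean

BASES = "ACGT"  # assumed order of the four probes within each unit


def call_subtype(probe_intensities):
    """Average intensities per subtype and return (best subtype, averages).

    probe_intensities maps a subtype label (e.g. "H2") to a list of
    probe intensities for that subtype.
    """
    averages = {s: mean(v) for s, v in probe_intensities.items()}
    return max(averages, key=averages.get), averages


def call_bases(intensities, bases=BASES):
    """Call one nucleotide per consecutive unit of four probe intensities."""
    if len(intensities) % 4 != 0:
        raise ValueError("expected intensities in units of four")
    called = []
    for i in range(0, len(intensities), 4):
        unit = intensities[i:i + 4]
        called.append(bases[unit.index(max(unit))])  # brightest probe wins
    return "".join(called)
```

The string returned by `call_bases` corresponds to the sequence string that was then submitted to BLASTN.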
Statistical analysis of influenza virus subtype. The data used for analysis comprised measurements carried out with 15 HA subtypes represented by 7,354 probes and 9 NA subtypes represented by 4,646 probes. Data were collected in three steps: after hybridization, after an enzymatic method was used to covalently attach the perfectly matched sequence at each probe, and after a stringent wash procedure was performed to remove spurious signals. In total, there were 177,264 observations. Each observation comprised four categorical variables, hybridization label (hyblabel), hybridization type (wash), protein type (code prot), and predicted bound sequence (imlabel), and seven continuous variables, sum of intensity (sumhit), average intensity (avg), standard deviation (standev), predicted number of BLAST hits (prednumhit), predicted number of probes not hit by BLAST (predno), energy of hybridization (dG), and correlation of dG with average intensity (correl). The data were normalized and reported as standard scores (Z values). A binary variable (call) was coded as a correct influenza virus association (a value of 1) or an incorrect influenza virus association (a value of 0).
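For reference, a standard score expresses each value as its distance from the mean in units of standard deviation, z = (x - mean) / SD. A minimal sketch follows; the function name is hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not say which was used.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the standard score of each value: (x - mean) / SD."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD; an assumption, see lead-in
    if sigma == 0:
        raise ValueError("all values are identical; Z scores are undefined")
    return [(v - mu) / sigma for v in values]
```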
FIG. 3. Graphed probe intensity values for an array hybridized with an H2N2 target. When the data were extracted immediately after hybridization (A), abundant cross hybridization was visible. After enzymatic reactions and stringent washing, the background signal was drastically reduced to expose specific hybridizations (B). Regions of the array that were populated with subtype-specific probes are shown (C). Data were sorted by gene (HA or NA), and the greatest intensities were identified (D).
FIG. 4. Identification of unknown human influenza A virus isolates. Three influenza A virus isolates of unknown subtype were amplified with a universal forward and pooled reverse primers, followed by one-way amplification with the same pooled reverse primers; and then the isolates were hybridized to the broad-scan influenza virus array. After the array was scanned, the extracted fluorescence intensity data for each subtype probe group were averaged and plotted. The correct subtypes were indicated by the highest average signal. The background signal (BG) was calculated from the average signal of the quality control probes.
FIG. 5. Visual identification of HA and NA subtypes for H2N2 (top) and H15N9 (bottom). Arrays were hybridized with reference subtype target, labeled primer was extended and ligated, and the arrays were finally washed and scanned. The schematic below the arrays shows regions populated with subtype-specific probes.
FIG. 6. Mean values of probe hybridization intensities for hemagglutinin reference subtypes 1 through 15. For each individual graph, probe signal intensity values are indicated on the left and specific subtype probes are indicated at the bottom. An asterisk indicates that standard hybridizations with biotinylated targets were used instead of the extension-ligation method and that labeling was with streptavidin-Cy5.
FIG. 7. Mean values of probe hybridization intensities for neuraminidase reference subtypes 1 through 9. For each individual graph, probe signal intensity values are indicated on the left, and specific subtype probes are indicated at the bottom. An asterisk indicates that standard hybridizations were used and that labeling was with streptavidin-Cy5.
FIG. 8. Comparison of the number of probes per HA (A) and NA (B) subtype with the average signal intensity for each subtype. In many cases, the two values show an inverse relationship due to the averaging of large numbers of probe intensity values.
FIG. 9. Combined histogram of normalized, average probe intensity values (Z scores). The correct subtype prediction is indicated (positive), and all negative values are bracketed (negative). Data statistics are a mean of 0.01785, a standard deviation of 0.12136, and a sample number of 177,264.
FIG. 10. Example of sequencing of the neuraminidase gene for target DNA from H9N2 isolate HK/NT16/99 with probes designed from the same sequence (A/Chicken/Hong Kong/NT16/99 [H9N2]). Graphed probe signal intensities (A) and the results of a BLASTN search of the GenBank database with the resulting DNA sequence (B) are shown. The sequence was obtained with software designed to interrogate the signal intensities for each set of four probes and to associate a specific nucleotide with the highest signal and then string the sequence together in series.
For this study we have developed an influenza A virus array that contains specific probes for each of the 15 HA subtypes and 9 NA subtypes and then hybridized targets generated from all 15 HA and 9 NA subtypes by RT-PCR amplification of influenza A reference virus RNA. The array was developed with nonoverlapping probes with similar annealing stabilities that were generated from the influenza virus sequence database, which consists of over 3,000 HA sequences and over 1,000 NA sequences. Subtype-specific probes were then selected from a pool of over 23,000 HA sequences and 15,000 NA sequences and then compared to the database to ensure that each probe was unique to the respective subtype and would hybridize to the maximum number of variant sequences. Our goal is to have maximum coverage of all known influenza virus strains and provide useful information even when a novel strain is encountered. By shifting the burden of bioinformatics to the beginning of the process rather than relying on sequencing and a subsequent BLAST search and bioinformatic analysis, this system can be used by less sophisticated users or users in the field, where access to complex analysis tools may be limited.
All HA and NA subtypes were correctly identified with this assay platform. Weak average intensity profiles for some subtypes were due to dilution of the positive signal: subtype probes that did not hybridize, or that hybridized only weakly, to the subtype target lowered the average for that subtype. By eliminating unnecessary or cross-reacting probes and limiting the probe number to approximately 100 to 200 of the most universal sequences for each subtype, we can correct this artifact of the array. This approach would also reduce the number of probes so that multiple assays can be run on an array that is divided into several sectors, which could reduce the cost per assay with a minimal loss in capability. A second approach is to subdivide the probes for each subtype into similar clusters and thus concentrate the positive probes, which would increase the average signal for a positive identification. These approaches would also produce positive probe sequences that are tiled across the viral sequence of interest and should result in an approximation or a best-fit sequence for the unknown subtype.
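The dilution effect can be made concrete with a small numerical sketch. The intensities below are invented purely for illustration and are not data from this study; the function name and the choice of which probes to keep are likewise hypothetical.

```python
from statistics import mean


def subset_mean(intensities, keep):
    """Average only the probes whose indices are listed in keep."""
    return mean(intensities[i] for i in keep)


# Hypothetical subtype with 200 probes: 20 hybridize strongly, 180 do not.
signals = [5000] * 20 + [100] * 180
universal = range(20)  # indices of probes selected in advance (assumed)

full_average = mean(signals)                       # 590
reduced_average = subset_mean(signals, universal)  # 5000
```

In this toy case the positive signal is present in both calculations, but averaging over all 200 probes reduces it almost 10-fold, which is the artifact that restricting each subtype to its most universal probes is intended to remove.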
In addition, with the subtype sequencing array and protocol presented here, we are able to sequence approximately 500 or more nucleotides of the HA and NA genes after the subtypes have been identified with our broad-scan subtyping array. Our strategy for sequencing probe selection is based upon several criteria, including Tm, length, and location in the HA and NA gene sequences. Published structural analysis and antigenic epitope mapping indicate that the HA receptor-binding structure and surface antigenic epitopes are predominantly located within the 5' 720 bp (12, 17, 18).
With this sequencing assay, approximately 95% or more of the bases were accurately called (998 of 1,043 bases). Miscalls were predominantly due to strong secondary structures, which can be predicted and avoided before the assay is carried out, and to the ligated mismatches A-G, A-A, T-G, G-G, and T-T. These mismatches are generally more difficult to detect because of their low ΔG values. For example, sequence errors resulting from A-G mismatches accounted for 1.4% of all bases called (15 of 1,043) or 6% of the potential A-G mismatches (15 of 251). Strong secondary structures (hairpins and palindromes) interfere with probe-target hybridization and result in a reduced signal. These sequencing arrays will eventually contain either a consensus subtype sequence or a known subtype sequence that lacks a high degree of secondary structure. Replicate probes for each base of sequence should reduce artifacts due to difficult mismatches by averaging out the mismatch signal. Because microarray-based sequencing is based on probe-target hybridization, the target sequence cannot diverge significantly from the arrayed sequence. However, under nonstringent hybridization conditions, internal mismatches between the probe and the target sequences do not have as great an impact on hybridization and sequencing. This technique is best suited for the sequencing of similar viruses, such as seasonal quasispecies complexes, or for surveys for mutations in an isolate over time.
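The proportions quoted above can be checked with a few lines of arithmetic. The variable names are illustrative; the counts are those given in the text. Note that 15 A-G miscalls correspond to about 1.4% of the 1,043 bases called.

```python
called_correctly = 998
total_bases = 1043
ag_miscalls = 15
ag_sites = 251

accuracy_pct = 100 * called_correctly / total_bases   # about 95.7
total_miscalls = total_bases - called_correctly       # 45
ag_per_site_pct = 100 * ag_miscalls / ag_sites        # about 6.0
ag_per_base_pct = 100 * ag_miscalls / total_bases     # about 1.4

print(f"accuracy: {accuracy_pct:.1f}%")
print(f"miscalled bases: {total_miscalls}")
print(f"A-G miscalls per potential A-G mismatch: {ag_per_site_pct:.1f}%")
print(f"A-G miscalls per base called: {ag_per_base_pct:.1f}%")
```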
We have shown that influenza A virus HA subtypes 1 through 15 and NA subtypes 1 through 9 can be rapidly and specifically identified and sequenced by using oligonucleotide microarrays with a protocol that requires less than 1 h for target hybridization. This assay eliminates the need for traditional target labeling systems and integrates an enzyme-based procedure that overcomes many of the shortfalls of traditional thermal hybridizations, such as suboptimal hybridization conditions and difficult mismatch detection (28). The influenza virus subtyping array is also compatible with traditional labeling, hybridization, and washing protocols that can be completed within 1.0 to 1.5 h. These methods lack the enzymatic step and have slightly reduced single-base mismatch discrimination, but they allow the array to be "stripped" and reused multiple times, since there is no covalent coupling of the label to the array. In addition, this array can also benefit from the sectoring approach mentioned above to further bring costs to a minimum. This platform is a viable alternative to RT-PCR because of the combination of assay speed; array sectoring, which would allow multiple assays on one array; the potential to strip and reuse the chip up to five times (for conventional hybridizations); and the adaptability to inexpensive electrochemical scanning devices.
The target sample preparation system used here is similar to that used for standard RT-PCR-based methods, except that it uses a very redundant consensus priming system that maximizes the chance that novel strains of influenza virus will be amplified and thus minimizes false-negative results. The chip contains multiple probes that correspond to key distinguishing elements of each HA or NA subtype. The probes are laid out in a pattern that can be read by eye for quick identification as well as analyzed with more advanced algorithms. The system can also distinguish rare from more commonly seen genetic variants based on the organization of subtype-specific probes (i.e., the probes are arranged in order from more universal to more specific for each subtype).
Rapid identification of the HA and NA subtypes followed by sequencing from critical regions of the HA and NA genes, such as surface antigenic epitopes, will significantly decrease the time and cost for the identification of potentially lethal virus strains. This study and studies in other laboratories are demonstrating that the detection, identification, and sequencing of viral genomes in samples by using oligonucleotide microarray technology in combination with electrochemical detection is a viable, rapid approach that can complement traditional methods.
We thank Philip Miller for help with statistical analysis of data and Luisa Dougan for microarray quality control tests.
M.J.L. and D.S. contributed equally to this study.