Departments of Medicine,1 Pathology,3 Epidemiology, Johns Hopkins Medical Institutions, Baltimore, Maryland,4 Southwest Hospital, Third Military Medical University, Chongqing, People's Republic of China2
Received 24 December 2003/ Returned for modification 23 March 2004/ Accepted 7 June 2004
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
HCV sequence variation is informative for two reasons: HCV replicates at high levels with an error-prone polymerase and appears to tolerate sequence variation, and infected individuals apply various selective pressures to the virus. This information will be most valuable if examined longitudinally and comprehensively. Such studies of simian immunodeficiency virus-infected macaques have revealed evidence of immune escape. These studies depended in part on an inbred population of animals. In studies of humans, larger regions of viral genomes must be examined to provide adequate surveillance for change, combined with regions under less selection pressure to serve as controls. In addition, studies linking changes in separate parts of the genome (covariation) may yield important information regarding functional or structural constraints on sequence variation.
The diversity, or average genetic distance, among variants in the HCV quasispecies depends on the genomic region studied. The N terminus of the E2 protein, termed the first hypervariable region (HVR1), has the highest diversity in studies of chronically infected adults (16, 30). Higher diversity of this region has been associated with progression of liver disease in some studies and resistance to interferon therapy in others, though these results have not been consistent (4, 7, 12, 20). Variation in other regions, such as the internal ribosomal entry site at the 5' end of the genome (13, 31), the Core gene located between the internal ribosomal entry site and the envelope genes (17), and the NS3 region in the middle portion of the genome (28), has also been examined with regard to HCV pathogenesis, but it has not previously been feasible to combine these regions in a single amplicon for controlled comparisons of variation.
A previous study employing cDNA cloning of a 1-kb amplicon from acutely infected adults demonstrated that diversity in the E1 and E2 regions differed between persons with clearance versus persistence of viremia (26). There was greater nonsynonymous (amino acid sequence-changing) diversity in the more conserved E1 region in persons with clearance, and there was more nonsynonymous HVR1 diversity in those who went on to develop chronic viremia. A subsequent study using a smaller amplicon showed similar results for HVR1 but did not include the E1 region (9). Examination of the E1 and E2 regions in specimens obtained from chimpanzees during serial acute-phase HCV passage demonstrated that each animal harbored a quasispecies but that the HVR1 protein sequence remained remarkably stable despite high-level replication, even during chronic infection. This indicates that HCV sequence variation is not simply due to replication but results from an interaction between the host and the virus, supporting the hypothesis that the immune response drives sequence variation (25). Furthermore, the rate of nonsynonymous sequence variation in the E1 and E2 genes tends to be lower in persons with rapid progression of human immunodeficiency virus disease (22). In each of these studies, comparison of substitution rates among different regions enhanced the analytical power.
We hypothesized that amplification of larger genomic segments would be feasible, would increase analytical power of future studies, and might reveal greater complexity if primers could be designed that represent highly conserved binding domains. A previous report of RT-PCR amplification of 5-kb segments of the HCV genome was only successful with a single serum specimen that had an HCV RNA titer of 10^6 equivalents/ml (27). In this study we show that amplification of a large region comprising more than half the HCV genome is feasible, that it is sensitive enough for studies of acute HCV infection when viral RNA concentrations are more variable than during chronic infection, and that this technique preserves the high complexity of the viral quasispecies.
| MATERIALS AND METHODS |
|---|
Four additional specimens were used. Plasma from subject H was obtained via plasmapheresis during acute posttransfusion non-A, non-B hepatitis on 12 July 1977 (1) and was generously provided by Robert Purcell of the National Institutes of Health. Plasma from subject SR03 was obtained 35 weeks after the date of an occupational exposure that has been described previously (33). Acute-phase serum specimens were also obtained from experimentally infected chimpanzees x304 and x361 as described previously (25).
All specimens were obtained and handled in accordance with the requirements of the institutional review and animal care boards of the respective institutions.
Sequence alignment and primer design. For design of highly conserved primers, a reference alignment of nearly full-length HCV genotype 1a and 1b sequences was assembled. GenBank was searched via the Entrez interface (http://www.ncbi.nlm.nih.gov/entrez) with the search string "hcv AND 8000[SLEN]:10000[SLEN]". Duplicate and misclassified sequences were removed, and the remaining sequences were aligned using ClustalX version 1.81 (ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX/) and the alignment was corrected by hand in BioEdit version 5.0.9 (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). For primer design, a 90% consensus sequence was generated with N's in the variable positions. To reduce the effect of errors made in amplification and sequencing of those genomes, we did not use a 100% consensus sequence. This 90% consensus sequence was used as input to the Primer3 server (http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi) for primer design, and settings were matched to our reaction conditions. By using the "Max #N's" option in Primer3 analyses, we were able to control the number of variable residues in our primers.
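The consensus step described above can be sketched in a few lines. This is an illustration only, not the procedure actually run in BioEdit: the function names `consensus_with_n` and `count_n` are hypothetical, and the treatment of gaps (always written as N) is an assumption made for the example.

```python
from collections import Counter


def consensus_with_n(aligned_seqs, threshold=0.90):
    """Build a consensus from equal-length aligned sequences.

    A column keeps its most common base only if that base is A, C, G, or T
    and occurs in at least `threshold` of the sequences; otherwise the
    column is written as 'N'.
    """
    if not aligned_seqs:
        raise ValueError("alignment is empty")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(base.upper() for base in column)
        base, count = counts.most_common(1)[0]
        # Assumption for this sketch: gaps and ambiguity codes are never conserved.
        if base in "ACGT" and count / len(column) >= threshold:
            consensus.append(base)
        else:
            consensus.append("N")
    return "".join(consensus)


def count_n(consensus, start, primer_len):
    """Count variable (N) positions in a candidate primer window."""
    return consensus[start:start + primer_len].count("N")


# Toy alignment of 10 sequences: column 1 is invariant, column 2 is 90%
# conserved, and column 3 is 80% conserved (below the 90% cutoff).
toy_alignment = ["ACG"] * 8 + ["ACA", "ATA"]
toy_consensus = consensus_with_n(toy_alignment)
print(toy_consensus)
print(count_n(toy_consensus, 0, 3))
```

Counting N's in a candidate window plays the same role here as capping the number of variable residues per primer; a 100% threshold would instead mark every column that contains even a single sequencing or amplification error.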
Generation of a 6-kb HCV RNA transcript for assay development. To generate a reagent for subsequent optimization of the long-template RT-PCR assay, we cloned and then transcribed a 6-kb cDNA clone. RNA was extracted with the QIAamp viral RNA mini kit (QIAGEN, Valencia, Calif.) from a serum specimen known to contain HCV RNA at high titer. RT was performed with an RNase H-deficient mutant of Moloney murine leukemia virus reverse transcriptase (SuperScript II; Invitrogen, Carlsbad, Calif.), and PCR was performed using high-fidelity Taq polymerase (Platinum Taq DNA Polymerase High Fidelity; Invitrogen). The primers we used are listed in Table 1. We purified and cloned the second-round PCR product by using the TOPO XL PCR Cloning kit (Invitrogen).
RNA extraction. HCV RNA was extracted from serum or plasma (10 to 280 µl) with the QIAamp viral RNA mini kit (QIAGEN) per the manufacturer's recommendations, with the following modifications: phosphate-buffered saline was used to adjust small specimen volumes to 140 µl prior to extraction, solutions containing viral RNA were gently mixed and never vortexed after virion lysis, and viral RNA was eluted from the spin column into a tube containing 80 U of RNasin (Promega, Madison, Wis.), with a final volume of 60 µl. In preliminary experiments, the efficacy of the QIAGEN QIAamp viral RNA kit was compared to that of extraction with the guanidinium isothiocyanate-based TRIzol reagent (Invitrogen) following the manufacturer's recommendations.
RT and nested PCR amplification of 5.2-kb cDNA. RT from the NS4 region of HCV followed by nested PCR was used to amplify a 5.2-kb segment of the genome. Three RT primers are listed in Table 1. The first (H77-6095a16) was the most highly conserved and resulted in amplification of most specimens. When that amplification failed, primers H77-6093a16T (generally more effective for subtype 1a specimens) and H77-6093a16C (generally more effective for subtype 1b specimens) were used. For RT reactions, 10.5 µl of extracted RNA was reverse transcribed with 200 U of SuperScript II reverse transcriptase (Invitrogen) in the presence of strand buffer (Invitrogen), 0.1 to 0.2 µM primers (Table 1), 0.5 mM deoxynucleoside triphosphates (dNTPs), 10 mM dithiothreitol, and 20 U of RNasin in a 20-µl reaction volume. Template RNA, dNTPs, and primer were heated to 65°C for 5 min and cooled to 42°C; the remaining reaction components (strand buffer, dithiothreitol, RNasin, and SuperScript II enzyme) were preheated to 45°C and then added, and the mixture was incubated for 60 min. After RT, the temperature was increased to 70°C for 15 min, and then 2 U of RNase H (Invitrogen) was added and the reaction mixture was incubated at 37°C for 20 min.
To generate the 5.2-kb amplicon, nested PCR was performed. First-round (outer) PCR amplification was carried out in a 50-µl reaction volume containing 2 to 5 µl of cDNA, 1x PCR buffer (Invitrogen), 0.2 µM each primer (Table 1), 0.2 mM dNTPs, 2 mM MgCl2, and 2 U of Platinum Taq Polymerase High Fidelity (Invitrogen). Amplification was performed with the following cycling conditions: initial denaturation at 94°C for 2 min; 15 cycles of 20 s at 94°C and 6 min at 68°C; and then 15 cycles of 20 s at 94°C and 6 min at 68°C, with the 68°C step increasing by 10 s per cycle. The second (inner) round of PCR was performed with 2 µl of first-round product in a 50-µl reaction volume containing the same PCR mixture as the first round except for the primers (Table 1).
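To make the cycling program concrete, the sketch below lists the 68°C extension time for each of the 30 cycles in one round and totals the programmed hold time. The helper names are hypothetical, and two details are assumptions rather than statements from the protocol: the first incremented cycle is taken to be 6 min 10 s, and temperature ramping between steps is ignored.

```python
def extension_times_s(base_s=360, fixed_cycles=15, ramp_cycles=15, step_s=10):
    """Return the 68°C extension time (seconds) for every cycle of one PCR round."""
    fixed = [base_s] * fixed_cycles
    # Assumption: the increment is applied starting with the first ramped cycle.
    ramped = [base_s + step_s * (i + 1) for i in range(ramp_cycles)]
    return fixed + ramped


def programmed_time_s(initial_denaturation_s=120, denaturation_s=20):
    """Total programmed hold time for one round, excluding ramping between steps."""
    extensions = extension_times_s()
    return initial_denaturation_s + sum(denaturation_s + ext for ext in extensions)


times = extension_times_s()
print(len(times), times[0], times[-1])
print(round(programmed_time_s() / 3600, 2), "h per round")
```

Under these assumptions each round holds for roughly three and a half hours, which is worth knowing when scheduling two nested rounds on one thermal cycler.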
Cloning of cDNA. For cloning the 5.2-kb amplicons, PCR products were electrophoresed in SeaKem GTG low-melting-point agarose (FMC Bioproducts, Rockland, Maine) followed by gel purification, ligation, and transformation utilizing the TOPO XL PCR cloning kit (Invitrogen). Transformed cells were added to Luria-Bertani agar plates containing 50 µg of kanamycin/ml and were grown overnight at 37°C. Forty colonies chosen randomly were cultured overnight in 300 µl of Luria-Bertani broth supplemented with 50 µg of kanamycin/ml. From these cultures, 100 µl was lysed with 0.1 N NaOH and shaking for 60 min at 25°C. The base was neutralized by addition of 0.1 N HCl.
HDA+SSCP analysis. Heteroduplex mobility assay combined with single-stranded conformational polymorphism (HDA+SSCP) analysis was performed as previously described (35) to identify clonotypes (a clonotype is defined as a group of cDNA clones with identical gel shift patterns). Briefly, PCR amplification of a 453-nucleotide region including HVR1 was performed in a 25-µl PCR containing 1 µl of alkaline lysis product, 0.4 µM each primer (Table 1), 1.5 mM MgCl2, 0.2 mM dNTPs, and 0.625 U of Platinum Taq Polymerase (Invitrogen). Incubation for 2 min at 94°C was followed by 35 cycles of 10 s at 94°C, 15 s at 62°C, and 30 s at 72°C. PCR products were visualized on 1.5% agarose gels. Thirty-three positive PCR products (2.5 µl each) were each mixed with a driver (2.5 µl of PCR product from clone 40) and 1 µl of Triple Dye loading buffer (FMC Bioproducts), heated to 95°C for 5 min, and then plunged into an ice water bath. These samples were then subjected to electrophoresis in 1x MDE gel solution (FMC Bioproducts) according to the manufacturer's protocol, except for the addition of 10% (wt/vol) urea (previously found to increase resolution [35]) in a Protean IIxi (19 by 16 by 0.1 cm) cell (Bio-Rad Laboratories, Hercules, Calif.). Electrophoresis was carried out at 130 V for 4,500 V·h (approximately 33 h). Gels were stained with SYBR green II (1:10,000 dilution; FMC Bioproducts) at 4°C for at least 30 min. Gels were documented with an Alpha Imager 2200 gel imaging system utilizing a SYBR green filter (Alpha Innotec Corp., San Leandro, Calif.).
A clonotype was defined as two or more cloned cDNAs that have indistinguishable patterns of electrophoretic migration by HDA+SSCP. In an earlier study, the mean (± standard deviation) genetic diversity of cloned cDNAs belonging to the same clonotype (intraclonotype diversity) was 0.6% (± 0.9%), with 98.7% differing by less than 2% (35). The complexity of the quasispecies was characterized by the clonotype ratio, calculated as the number of clonotypes divided by 33, which is the number of cloned cDNAs examined. The clonotype ratio therefore varies from 0.03 (homogeneous) to 1 (highly complex).
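The clonotype ratio is simple enough to compute directly. The following sketch assumes each clone has already been assigned a gel-shift pattern label by eye; the labels and the function name `clonotype_ratio` are hypothetical and serve only to illustrate the arithmetic.

```python
def clonotype_ratio(pattern_labels):
    """Number of distinct gel-shift patterns divided by the number of clones examined."""
    if not pattern_labels:
        raise ValueError("at least one clone is required")
    return len(set(pattern_labels)) / len(pattern_labels)


# 33 clones that all share one pattern: the homogeneous extreme.
homogeneous = ["A"] * 33
# 33 clones with 33 different patterns: the most complex extreme.
fully_complex = [f"pattern_{i}" for i in range(33)]
# A made-up intermediate case with three patterns.
mixed = ["A"] * 20 + ["B"] * 10 + ["C"] * 3

print(round(clonotype_ratio(homogeneous), 2))
print(clonotype_ratio(fully_complex))
print(round(clonotype_ratio(mixed), 2))
```

With 33 clones the minimum possible value is 1/33, which is where the lower bound of 0.03 comes from.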
Sequencing. Nucleotide sequences were determined from cDNA clones by using a PRISM version 3100 automated sequencer (ABI, Foster City, Calif.). Sequences were assembled and analyzed by using BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html), with alignment performed using the ClustalW algorithm (15). Primer sequences were removed prior to analysis.
Nucleotide sequences. Nucleotide sequences described in this report have been submitted to GenBank and have been assigned accession numbers AY725959 through AY725976.
| RESULTS |
|---|
RT-PCR optimization. Conserved and compatible primers for a 5.2-kb amplicon were found in the 5' noncoding region and near the NS3-NS4A junction (Table 1). Optimization of reaction conditions was performed with a 6-kb RNA template generated via in vitro transcription of a cDNA clone generated during pilot experiments.
Three reverse transcriptase enzymes were evaluated over the manufacturers' recommended ranges of incubation temperatures, and yield was highest with SuperScript II at 42 to 45°C. Maintaining high temperature during RT reaction setup and inclusion of RNase H digestion were required for efficient and reproducible long-amplicon RT-PCR (data not shown). Chilling RT reaction mixture components prior to incubation, a common feature of RT protocols, reduced yield significantly, as others have reported (14). As illustrated with human serum (Fig. 1A), adjustment of primer concentration to template concentration was important, and the optimal RT primer length was 16 nucleotides.
High-yield amplification of HCV hemigenomes from acute-phase serum. The optimized RT-PCR protocol was applied to 52 RNA-positive acute-phase specimens obtained within 3 months of onset of HCV viremia and spanning a range of HCV RNA levels (Table 2). Two aspects of assay sensitivity were examined: generation of a visible band of the expected size on an agarose gel from serum specimens of various HCV RNA concentrations and generation of diverse cDNA clones from a serum specimen containing a diverse quasispecies. RNA was extracted from 10 to 280 µl of frozen serum, reverse transcribed, and amplified by using the 5.2-kb nested PCR protocol. Amplification was successful for 49 of 52 (94%) specimens overall and was concentration dependent: it succeeded for all 26 (100%) specimens with HCV RNA levels greater than 10^5 IU/ml, 15 of 16 (94%) at 10^4 to 10^5 IU/ml, 6 of 7 (86%) at 10^3 to 10^4 IU/ml, and 2 of 3 (67%) at less than 10^3 IU/ml.
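The per-stratum success rates quoted above can be rechecked from the counts alone. The short sketch below uses only the numbers given in the text; the stratum labels are written as powers of ten, and the helper names are hypothetical.

```python
# (HCV RNA stratum in IU/ml, specimens amplified, specimens tested), from the text
STRATA = [
    (">10^5", 26, 26),
    ("10^4 to 10^5", 15, 16),
    ("10^3 to 10^4", 6, 7),
    ("<10^3", 2, 3),
]


def percent(successes, total):
    """Success rate as a whole-number percentage."""
    return round(100 * successes / total)


def summarize(strata):
    """Return per-stratum rows and the overall (amplified, tested, percent) tuple."""
    rows = [(label, ok, n, percent(ok, n)) for label, ok, n in strata]
    total_ok = sum(ok for _, ok, _ in strata)
    total_n = sum(n for _, _, n in strata)
    return rows, (total_ok, total_n, percent(total_ok, total_n))


rows, overall = summarize(STRATA)
for label, ok, n, pct in rows:
    print(f"{label:>14} IU/ml: {ok}/{n} ({pct}%)")
print(f"overall: {overall[0]}/{overall[1]} ({overall[2]}%)")
```

The strata sum to the 49 of 52 specimens reported overall, and the rate falls monotonically with titer, although the lowest stratum contains only three specimens.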
| DISCUSSION |
|---|
Whereas high complexity in the HCV quasispecies has been described based on small amplicons (less than 0.5 kb), particularly during chronic infection, the high level of complexity detected by long-amplicon RT-PCR was somewhat unexpected. First, the validity of some quasispecies complexity estimates has been questioned based on methodological concerns (29). Second, the loss of sensitivity of RT-PCR and biases due to amplification efficiency are generally thought to increase rapidly with the size of the amplicon. The results reported here address both of these concerns.
Quasispecies diversity versus misincorporation artifact. Smith et al. raise appropriate concerns about inferences of quasispecies diversity based on sequences from cDNA clones (29). In this study, we performed RT followed by two rounds of 30-cycle PCR using increased-fidelity thermostable polymerase. While estimation of the error rates of thermostable polymerases is not standardized, if one assumes an error rate of 1 in 10,000 nucleotides incorporated per cycle, then after 60 cycles 6 per 1,000 sites would be expected to contain a misincorporated residue. Because approximately two-thirds of sites are nonsynonymous, this would result in a rate of amino acid error of approximately 0.004. For the 31 amino acids in 18 clones analyzed for Table 5, that would predict between 2 and 3 erroneous amino acid inferences. The two clones that were not represented in the Farci et al. (10) analysis each differed at one amino acid residue from clones that were present at high frequency. While at least one of these two clones also contained two changes in the binding site for the external antisense primer used by Farci et al., possibly explaining its absence from that analysis, amplification-based misincorporations could also explain this minor discrepancy. If Farci et al. used a polymerase with a similar error rate, the 104 sequences of 31 amino acids in the HVR1 region they reported would be predicted to contain 13 misidentified residues. The HVR1 sequences of 89 of those clones were represented among the 18 5.2-kb clones we examined; of the 15 not represented, 14 differed from a more frequent clone at only one position and 1 differed at three positions. Presuming the latter clone was truly missed by our analysis of 18 clones, the 14 clones differing by one residue come remarkably close to the predicted 13.
Therefore, it seems prudent to recommend awareness of this rate of error but also to recognize that the effect on estimated complexity is not large, reducing the estimated complexity by 5 to 10% in the data summarized here. Others have also suggested that the impact of misincorporation is reduced for diverse populations of sequences (6).
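The expected-error arithmetic in this section can be reproduced in a few lines. The sketch follows the simplification used in the text (a per-cycle error rate multiplied by the number of cycles, then by the nonsynonymous fraction, and applied per amino acid position); it is not a rigorous model of polymerase fidelity, and the function name and defaults are hypothetical.

```python
def expected_aa_errors(n_sequences, n_residues, cycles=60,
                       error_rate_per_cycle=1e-4, nonsynonymous_fraction=2 / 3):
    """Expected number of erroneous amino acid inferences under the text's simplification."""
    per_site = cycles * error_rate_per_cycle           # about 6 per 1,000 sites
    per_residue = per_site * nonsynonymous_fraction    # about 0.004
    return n_sequences * n_residues * per_residue


# 18 clones of 31 HVR1 amino acids (this study)
print(round(expected_aa_errors(18, 31), 1))
# 104 sequences of 31 HVR1 amino acids (Farci et al.)
print(round(expected_aa_errors(104, 31), 1))
```

The two results fall between 2 and 3 and close to 13, respectively, which matches the predictions stated above; changing the assumed error rate scales both linearly.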
To address the issue of misincorporation methodologically, direct sequencing of PCR products obtained by limiting dilution has been recommended (5, 29). That approach is not presently feasible with large amplicons, because the PCR product is limiting and because, in poorly conserved regions, direct sequencing often fails. Additionally, direct sequencing of less conserved regions might be expected to introduce bias due to selective amplification. Another suggested approach, biological cloning (5), is not feasible for HCV because no efficient in vitro model is available. Instead, sequencing of cDNA clones is necessary. Others have shown that extremely low error rates can be achieved with thermostable polymerases under the appropriate conditions (8). Clonal frequency analysis can be performed as described here to permit selection of clones that are representative of the quasispecies. In addition, measures to reduce the effect of misincorporation, such as sequencing multiple variants or pooling selected clones prior to sequencing, should be considered.
Another potential source of artifactual sequence heterogeneity is PCR-mediated recombination. We took steps to limit the effect of this phenomenon by using prolonged polymerase extension times in each cycle (24) and by sequencing multiple separate cDNA clones from the same clonotype. Nevertheless, long-amplicon RT-PCR would be expected to provide an ideal environment for such events, and researchers must consider repeating experiments or testing unexpected results in other ways to avoid misinterpretation.
Amplicon size versus sensitivity and template resampling. While the sensitivity of RT and PCR decreases as the length of the target amplicon increases, the form of this relationship (linear, exponential, etc.) is not clear. Advances, such as thermostable polymerases with 3' to 5' exonuclease activity, have increased the range of amplicon sizes. Therefore, the sensitivity we report here can be attributed, at least in part, to reagents that are now available. However, we still found that a reproducible amplification protocol required multiple customizations of the manufacturers' recommendations, particularly for the RT step. The optimized technique is sensitive enough even for specimens with HCV RNA levels of 10^3 to 10^5 IU/ml (Table 2), as occurs frequently during acute infection and less frequently during untreated chronic viremia.
An important consideration in assessing the complexity of a quasispecies is template resampling, as described by Liu et al. (19). When HCV RNA concentration is near the limit of detection, this phenomenon can result in artifactual homogeneity of recovered amplicons. This effect is a likely explanation for the lower complexity obtained when 40 µl of serum was used with a viral titer of 4,170 IU/ml (Table 4). Extrapolation from methodologic dilution from the viral titer suggests that the result for 40 µl of input was generated from 70 IU of HCV RNA input to the RT reaction, from which one-fourth of the cDNA was used for first-round PCR. Estimates of complexity must therefore be viewed with caution if the limit of detection of the assay is not known for a particular specimen (19). On the other hand, it is not surprising that, even under circumstances of limiting template, identification of major clonotype(s) is reproducible.
We report highly complex HCV variants recovered by using long-amplicon RT-PCR primers located in highly conserved regions of the HCV genome. The high complexity of variants obtained during acute infection is consistent with some reports for which shorter amplicons were used, and detailed analysis did not support the contention that such variability is primarily due to artifactual misincorporation of nucleotides during amplification. Comprehensive and genotype-independent analysis of whole-genome amplicons could take advantage of the high conservation of the 5' and 3' termini of HCV, but that goal remains elusive.
| ACKNOWLEDGMENTS |
|---|
National Institutes of Health grant R01-DK57998 supported this research.
| FOOTNOTES |
|---|
| REFERENCES |
|---|