ABSTRACT
Hepatitis C virus (HCV) exists as a swarm of genetically distinct but related variants, or a quasispecies, whose complexity and sequence evolution are critical to studies of viral pathogenesis. Because most studies of the HCV quasispecies have focused on a relatively small genomic segment, the first hypervariable region of the E2 gene, it is possible that viral complexity is occasionally underestimated (due to primer mismatch) and that sequence evolution is misperceived due to unrecognized covariation. This report describes a sensitive and reproducible method to amplify most of the HCV genome as a single 5.2-kb amplicon by using primers directed at relatively conserved genomic segments. Using 52 specimens obtained during acute infection over a range of viral RNA concentrations, the overall rate of successful amplification was 94% and varied in a concentration-dependent manner, with successful amplification in 26 of 26 (100%) specimens at greater than 10⁵ IU/ml, 15 of 16 (94%) at 10⁴ to 10⁵ IU/ml, 6 of 7 (86%) at 10³ to 10⁴ IU/ml, and 2 of 3 (67%) at less than 10³ IU/ml. Quasispecies complexity, determined by using this novel long-amplicon method followed by heteroduplex mobility assay combined with single-stranded conformational polymorphism (HDA+SSCP) analysis, was very high, even during acute HCV infection, when 10 to 21 (median, 16) different HDA+SSCP patterns were detected among 33 cDNA clones examined. Replicate analyses indicate that this diversity is not dominated by random errors generated during amplification. Therefore, the HCV quasispecies is highly complex even during acute infection and is accurately represented in amplicons representing more than half of the viral genome.
Hepatitis C virus (HCV) is a medically important human pathogen, infecting about 170 million persons worldwide and representing the single most common cause of liver disease requiring liver transplantation in the United States (2, 3, 34). The viral or host factors that determine the course of HCV infections are unclear, and studies of these factors have been hampered by the lack of a well-characterized small-animal model. Observational studies of infected humans are feasible because of the high prevalence of HCV infection but are limited by the small amplicons obtained by conventional reverse transcription-PCR (RT-PCR) methods. Amplification of small genomic segments of HCV by RT-PCR is commonplace and demonstrates the presence of a viral quasispecies, or a swarm of genetically related but distinct variants (23).
HCV sequence variation is informative, because HCV replicates at high levels while employing its error-prone polymerase and appears to tolerate sequence variation and also because infected individuals apply various selective pressures on the virus. This information will be most valuable if examined longitudinally and comprehensively. Such studies of simian immunodeficiency virus-infected macaques have revealed evidence of immune escape. These studies depended in part on an inbred population of animals. In studies of humans, larger regions of viral genomes must be examined to provide adequate surveillance for change, combined with regions under less selection pressure to serve as controls. In addition, studies linking changes in separate parts of the genome (covariation) may yield important information regarding functional or structural constraints on sequence variation.
The diversity, or average genetic distance, among variants in the HCV quasispecies depends on the genomic region studied. The N terminus of the E2 protein, termed the first hypervariable region (HVR1), has the highest diversity in studies of chronically infected adults (16, 30). Higher diversity of this region has been associated with progression of liver disease in some studies and resistance to interferon therapy in others, though these results have not been consistent (4, 7, 12, 20). Variation in other regions, such as the internal ribosomal entry site at the 5′ end of the genome (13, 31), the Core gene located between the internal ribosomal entry site and the envelope genes (17), and the NS3 region in the middle portion of the genome (28), has also been examined with regard to HCV pathogenesis, but it has not previously been feasible to combine these regions in a single amplicon for controlled comparisons of variation.
A previous study employing cDNA cloning of a 1-kb amplicon from acutely infected adults demonstrated that diversity in the E1 and E2 regions differed between persons with clearance versus persistence of viremia (26). There was greater nonsynonymous (amino acid sequence-changing) diversity in the more conserved E1 region in persons with clearance, and there was more nonsynonymous HVR1 diversity in those who went on to develop chronic viremia. A subsequent study using a smaller amplicon showed similar results for HVR1 but did not include the E1 region (9). Examination of the E1 and E2 regions in specimens obtained from chimpanzees during serial acute-phase HCV passage demonstrated that each animal harbored a quasispecies but that the animals had remarkable stability of the HVR1 protein sequence despite high-level replication, even during chronic infection, indicating that HCV sequence variation is not simply due to replication but results from an interaction between the host and the virus, supporting the hypothesis that it is the immune response which drives sequence variation (25). Furthermore, the rate of nonsynonymous sequence variation in the E1 and E2 genes tends to be lower in persons with rapid progression of human immunodeficiency virus disease (22). In each of these studies comparing the E1 and E2 regions, comparison of substitution rates among different regions enhanced the analytical power.
We hypothesized that amplification of larger genomic segments would be feasible, would increase analytical power of future studies, and might reveal greater complexity if primers could be designed that represent highly conserved binding domains. A previous report of RT-PCR amplification of 5-kb segments of the HCV genome was only successful with a single serum specimen that had an HCV RNA titer of 10⁶ equivalents/ml (27). In this study we show that amplification of a large region comprising more than half the HCV genome is feasible, that it is sensitive enough for studies of acute HCV infection when viral RNA concentrations are more variable than during chronic infection, and that this technique preserves the high complexity of the viral quasispecies.
MATERIALS AND METHODS
Subjects. We studied participants in a Baltimore-based longitudinal cohort of persons who acknowledged recent injection drug use (11). Serum or plasma was collected monthly, immediately separated from cells, and stored at −80°C. Specimens were tested for the presence of HCV antibodies by using an enzyme immunoassay (HCV EIA 3.0; Ortho Diagnostic System, Raritan, N.J.). To confirm newly HCV-positive results for a subject, specimens were retested in duplicate alongside the participant's previously negative samples. In addition, study participants positive for HCV antibodies were tested for the presence of HCV RNA by utilizing a quantitative RT-PCR assay (COBAS Amplicor HCV Monitor version 2.0; Roche Molecular Systems, Branchburg, N.J.).
Four additional specimens were used. Plasma from subject H was obtained via plasmapheresis during acute posttransfusion non-A, non-B hepatitis on 12 July 1977 (1) and was generously provided by Robert Purcell of the National Institutes of Health. Plasma from subject SR03 was obtained 35 weeks after the date of an occupational exposure that has been described previously (33). Acute-phase serum specimens were also obtained from experimentally infected chimpanzees ×304 and ×361 as described previously (25).
All specimens were obtained and handled in accordance with the institutional review and animal care boards of the respective institutions.
Sequence alignment and primer design. For design of highly conserved primers, a reference alignment of nearly full-length HCV genotype 1a and 1b sequences was assembled. GenBank was searched via the Entrez interface (http://www.ncbi.nlm.nih.gov/entrez ) with the search string “hcv AND 8000[SLEN]:10000[SLEN]”. Duplicate and misclassified sequences were removed, the remaining sequences were aligned using ClustalX version 1.81 (ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalX/ ), and the alignment was corrected by hand in BioEdit version 5.0.9 (http://www.mbio.ncsu.edu/BioEdit/bioedit.html ). For primer design, a 90% consensus sequence was generated with N's in the variable positions. To reduce the effect of errors made in amplification and sequencing of those genomes, we did not use a 100% consensus sequence. This 90% consensus sequence was used as input to the Primer3 server (http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi ) for primer design, and settings were matched to our reaction conditions. By using the “Max #N's” option in Primer3 analyses, we were able to control the number of variable residues in our primers.
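The consensus step can be illustrated with a minimal sketch. It assumes that a "90% consensus" means a column keeps its most common base only when that base is present in at least 90% of the aligned sequences and is otherwise masked with N; the function name, the handling of gap columns, and the toy alignment are illustrative assumptions, not the procedure actually used in BioEdit.

```python
from collections import Counter


def threshold_consensus(aligned_seqs, threshold=0.90, ambiguous="N"):
    """Return a threshold consensus of equal-length aligned sequences.

    A column keeps its most common base only if that base accounts for at
    least `threshold` of the sequences; otherwise the column is masked with
    `ambiguous`. Columns dominated by gaps are masked as well (assumption).
    """
    if not aligned_seqs:
        raise ValueError("no sequences given")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_seqs):
        base, count = Counter(b.upper() for b in column).most_common(1)[0]
        if base in "ACGT" and count / len(column) >= threshold:
            consensus.append(base)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


# Toy alignment of 10 sequences (hypothetical data):
# column 1 is invariant, column 2 is 90% C, column 3 is only 80% G.
toy_alignment = ["ACG"] * 8 + ["ATA", "ACA"]
toy_consensus = threshold_consensus(toy_alignment)
```

Counting the N's in a candidate primer window of such a consensus (for example with `window.count("N")`) gives the quantity that a limit on variable residues would constrain.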
Generation of a 6-kb HCV RNA transcript for assay development. To generate a reagent for subsequent optimization of the long-template RT-PCR assay, we cloned and then transcribed a 6-kb cDNA clone. RNA was extracted with the QIAamp viral RNA mini kit (QIAGEN, Valencia, Calif.) from an HCV RNA-positive serum specimen known to contain HCV RNA at high titer. RT was performed with an RNase H− mutant of Moloney murine leukemia virus reverse transcriptase (SuperScript II; Invitrogen, Carlsbad, Calif.), and PCR was performed using high-fidelity Taq polymerase (Platinum Taq DNA Polymerase High Fidelity; Invitrogen). The primers we used are listed in Table 1. We purified and cloned the second-round PCR product by using the TOPO XL PCR Cloning kit (Invitrogen).
TABLE 1. Primers used in this study
For runoff transcription the 6-kb cDNA clone was linearized by digestion with HindIII for 6 h at 37°C. Restriction products were electrophoresed in low-melting-point agarose (SeaKem GTG), visualized with crystal violet, excised from the gel, and purified as described below for 5.2-kb products. RNA was transcribed by using T7 RNA Polymerase (Invitrogen) for 6 h at 37°C, digested with 10 U of DNase (DNase I, RNase-free; Roche Molecular Biochemical, Mannheim, Germany), and then purified by phenol-chloroform extraction and precipitation. RNA was resuspended in diethyl pyrocarbonate-treated water and was spectrophotometrically quantitated. The product appeared as a single sharp band of the appropriate size on electrophoretic analysis (data not shown).
RNA extraction. HCV RNA was extracted from serum or plasma (10 to 280 μl) with the QIAamp viral RNA mini kit (QIAGEN) per the manufacturer's recommendations, with the following modifications: phosphate-buffered saline was used to adjust small specimen volumes to 140 μl prior to extraction, solutions containing viral RNA were gently mixed and never vortexed after virion lysis, and viral RNA was eluted from the spin column into a tube containing 80 U of RNasin (Promega, Madison, Wis.), for a final volume of 60 μl. In preliminary experiments the efficacy of the QIAGEN QIAamp viral RNA kit was compared to that of extraction with the guanidinium isothiocyanate-based TRIzol reagent (Invitrogen) following the manufacturer's recommendations.
RT and nested PCR amplification of 5.2-kb cDNA. RT from the NS4 region of HCV followed by nested PCR amplification was utilized to amplify a 5.2-kb segment of the genome. Three RT primers are listed in Table 1. The first (H77-6095a16) was the most highly conserved and resulted in amplification of most specimens. When that amplification failed, primers H77-6093a16T (generally more effective for subtype 1a specimens) and H77-6093a16C (generally more effective for subtype 1b specimens) were used. For RT reactions, 10.5 μl of extracted RNA was reverse transcribed with 200 U of SuperScript II reverse transcriptase (Invitrogen) in the presence of strand buffer (Invitrogen), 0.1 to 0.2 μM primers (Table 1), 0.5 mM deoxynucleoside triphosphates (dNTPs), 10 mM dithiothreitol, and 20 U of RNasin in a 20-μl reaction volume. Template RNA, dNTPs, and primer were heated to 65°C for 5 min and cooled to 42°C; the remaining reaction components (strand buffer, dithiothreitol, RNasin, and SuperScript II enzyme), preheated to 45°C, were then added, and the mixture was incubated for 60 min. After RT, the temperature was increased to 70°C for 15 min, and then 2 U of RNase H (Invitrogen) was added to the reaction mixture and incubated at 37°C for 20 min.
To generate the 5.2-kb amplicon, nested PCR was performed. First-round (outer) PCR amplification was carried out in a 50-μl reaction volume containing 2 to 5 μl of cDNA, 1× PCR buffer (Invitrogen), 0.2 μM each primer (Table 1), 0.2 mM dNTPs, 2 mM MgCl₂, and 2 U of Platinum Taq Polymerase High Fidelity (Invitrogen). Amplification was performed with the following cycling conditions: initial denaturation at 94°C for 2 min; 15 cycles of 20 s at 94°C and 6 min at 68°C; and then 15 cycles of 20 s at 94°C and 6 min at 68°C, with the 68°C step increasing by 10 s per cycle. The second (inner) round of PCR was performed with 2 μl of first-round product in a 50-μl reaction volume containing the same PCR mixture as the first round except for the primers (Table 1).
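The first-round cycling conditions can be written out as data to make the auto-extension explicit. This is only a sketch: it assumes that the 10-s increment applies to the 68°C step and that the first of the last 15 cycles already carries one increment, and it ignores ramp times; the function names are hypothetical.

```python
def first_round_program(base_extension_s=360, increment_s=10):
    """List (denaturation_s, extension_s) holds for the 30 first-round cycles."""
    # Cycles 1 to 15: fixed 20 s at 94 C and 6 min at 68 C.
    cycles = [(20, base_extension_s) for _ in range(15)]
    # Cycles 16 to 30: the 68 C step grows by 10 s per cycle.
    # Assumption: the first of these cycles is already extended by 10 s.
    cycles += [(20, base_extension_s + increment_s * k) for k in range(1, 16)]
    return cycles


def programmed_minutes(cycles, initial_denaturation_s=120):
    """Total programmed hold time in minutes, ignoring ramp times."""
    return (initial_denaturation_s + sum(d + e for d, e in cycles)) / 60


program = first_round_program()
total_minutes = programmed_minutes(program)
```

Under these assumptions the final cycle holds at 68°C for 510 s and one round occupies roughly 3.5 h of programmed hold time, which is worth keeping in mind when scheduling two nested rounds.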
Cloning of cDNA. For cloning the 5.2-kb amplicons, PCR products were electrophoresed in SeaKem GTG low-melting-point agarose (FMC Bioproducts, Rockland, Maine) followed by gel purification, ligation, and transformation utilizing the TOPO XL PCR cloning kit (Invitrogen). Transformed cells were added to Luria-Bertani agar plates containing 50 μg of kanamycin/ml and were grown overnight at 37°C. Forty colonies chosen randomly were cultured overnight in 300 μl of Luria-Bertani broth supplemented with 50 μg of kanamycin/ml. From these cultures, 100 μl was lysed with 0.1 N NaOH and shaking for 60 min at 25°C. The base was neutralized by addition of 0.1 N HCl.
HDA+SSCP analysis. Heteroduplex mobility assay combined with single-stranded conformational polymorphism (HDA+SSCP) analysis was performed as previously described (35) to identify clonotypes (a clonotype is defined as a group of cDNA clones with identical gel shift patterns). Briefly, PCR amplification of a 453-nucleotide region including HVR1 was performed in a 25-μl PCR containing 1 μl of alkaline lysis product, 0.4 μM each primer (Table 1), 1.5 mM MgCl₂, 0.2 mM dNTPs, and 0.625 U of Platinum Taq Polymerase (Invitrogen). Incubation for 2 min at 94°C was followed by 35 cycles of 10 s at 94°C, 15 s at 62°C, and 30 s at 72°C. PCR products were visualized on 1.5% agarose gels. Thirty-three positive PCR products (2.5 μl each) were each mixed with a driver (2.5 μl of PCR product from clone 40) and 1 μl of Triple Dye loading buffer (FMC Bioproducts), heated to 95°C for 5 min, and then plunged into an ice water bath. These samples were then subjected to electrophoresis in 1× MDE gel solution (FMC Bioproducts) according to the manufacturer's protocol, except for the addition of 10% (wt/vol) urea (previously found to increase resolution [35]), in a Protean IIxi (19 by 16 by 0.1 cm) cell (Bio-Rad Laboratories, Hercules, Calif.). Electrophoresis was carried out at 130 V for 4,500 V·h (∼33 h). Gels were stained with SYBR green II (1:10,000 dilution; FMC Bioproducts) at 4°C for at least 30 min. Gels were documented with an Alpha Imager 2200 gel imaging system utilizing a SYBR green filter (Alpha Innotec Corp., San Leandro, Calif.).
A clonotype was defined as two or more cloned cDNAs that have indistinguishable patterns of electrophoretic migration by HDA+SSCP. In an earlier study, the mean (± standard deviation) genetic diversity of cloned cDNAs belonging to the same clonotype (intraclonotype diversity) was 0.6% (± 0.9%), with 98.7% differing by less than 2% (35). The complexity of the quasispecies was characterized by the clonotype ratio, calculated as the number of clonotypes divided by 33, which is the number of cloned cDNAs examined. The clonotype ratio therefore varies from 0.03 (homogeneous) to 1 (highly complex).
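The clonotype ratio is simple enough to express as a short sketch. It assumes that each cloned cDNA has been assigned a label for its gel shift pattern and that every distinct pattern, including one seen in a single clone, counts toward the numerator; the labels and function names are hypothetical.

```python
from collections import Counter


def clonotype_ratio(pattern_labels):
    """Number of distinct gel shift patterns divided by clones examined."""
    if not pattern_labels:
        raise ValueError("no clones examined")
    return len(set(pattern_labels)) / len(pattern_labels)


def clonal_frequencies(pattern_labels):
    """Patterns ordered from most to least frequent, with clone counts."""
    return Counter(pattern_labels).most_common()


# Hypothetical examples with 33 clones each.
homogeneous = ["A"] * 33                            # one pattern only
all_distinct = [f"P{i}" for i in range(33)]         # every clone differs
sixteen_patterns = [f"P{i % 16}" for i in range(33)]

ratios = [round(clonotype_ratio(x), 2)
          for x in (homogeneous, all_distinct, sixteen_patterns)]
```

With 33 clones, a single shared pattern gives 1/33 (about 0.03) and 33 distinct patterns give 1, matching the range stated above; 16 patterns, the median reported in the abstract, correspond to a ratio of about 0.48.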
Sequencing. Nucleotide sequences were determined from cDNA clones by using a PRISM version 3100 automated sequencer (ABI, Foster City, Calif.). Sequences were assembled and analyzed by using BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html ), with alignment performed using the ClustalW algorithm (15). Primer sequences were removed prior to analysis.
RESULTS
Multikilobase HCV RT-PCR amplicons obtained from subjects with acute HCV infection. Initial efforts demonstrated the feasibility of multikilobase amplification (i.e., amplification with amplicons that are each more than 1 kb in length) by using serum from subjects with acute HCV infection. Complexity was higher in 5-kb than in 1-kb amplicons from the same specimen (21). These results demonstrated the feasibility of this approach but were inconclusive, because most specimens were not amplifiable. Therefore, the most likely points of failure were addressed prior to extending this approach to more specimens.
RT-PCR optimization. Conserved and compatible primers for a 5.2-kb amplicon were found in the 5′ noncoding region and near the NS3-NS4A junction (Table 1). Optimization of reaction conditions was performed with a 6-kb RNA template generated via in vitro transcription of a cDNA clone generated during pilot experiments.
Three reverse transcriptase enzymes were evaluated over the manufacturers' recommended ranges of incubation temperatures, and yield was highest with SuperScript II at 42 to 45°C. Maintaining high temperature during RT reaction setup and inclusion of RNase H digestion were required for efficient and reproducible long-amplicon RT-PCR (data not shown). Chilling RT reaction mixture components prior to incubation, a common feature of RT protocols, reduced yield significantly, as others have reported (14). As illustrated with human serum (Fig. 1A), adjustment of primer concentration to template concentration was important, and the optimal RT primer length was 16 nucleotides.
FIG. 1. Optimization of 5.2-kb nested RT-PCR conditions. (A) Effect of RT primer length and concentration on detection of HCV RNA by 5.2-kb nested RT-PCR and agarose gel electrophoresis of PCR products. Primer lengths and the RT primer concentrations are indicated. The concentration of HCV RNA in a human serum specimen was determined, followed by RNA extraction, dilution in water, and addition to RT reactions in the indicated amounts. (B) Effect of number of PCR cycles on detection of synthetic HCV RNA (transcribed from a 6-kb cDNA clone) using nested RT-PCR and agarose gel electrophoresis. Estimated RNA copies are indicated across the top. RT and PCR were performed as described in Materials and Methods, with the number of PCR cycles in the two nested rounds adjusted as indicated on the left, 20/30 indicating 20 cycles of PCR in the first (outer) round and 30 cycles in the second (inner) round. (−), Negative (water) control; nt, nucleotide.
To determine the number of PCR cycles necessary to achieve the greatest sensitivity in generating 5.2-kb amplicons, cDNA reverse transcribed from the in vitro-transcribed RNA was used as template. In preliminary experiments, 30 to 40 cycles of single-round PCR were not sufficient (data not shown), so a nested PCR assay was used. The number of cycles of PCR in each nested round was varied, and 30 cycles for both the first and second round consistently amplified 10 RNA genome equivalents (Fig. 1B). Additional experiments using the additives betaine, dimethyl sulfoxide, and trehalose did not demonstrate the enhanced amplification that others have reported (27, 32).
High-yield amplification of HCV hemigenomes from acute-phase serum. The optimized RT-PCR protocol was applied to 52 RNA-positive acute-phase specimens obtained within 3 months of onset of HCV viremia and spanning a range of HCV RNA levels (Table 2). Two aspects of assay sensitivity were examined: generation of a visible band of the expected size on an agarose gel from serum specimens of various HCV RNA concentrations and generation of diverse cDNA clones from a serum specimen containing a diverse quasispecies. RNA was extracted from 10 to 280 μl of frozen serum, reverse transcribed, and amplified by using the 5.2-kb nested PCR protocol. Amplification was successful for 49 of 52 (94%) specimens overall, and the success rate was concentration dependent: 26 of 26 (100%) specimens with HCV RNA levels greater than 10⁵ IU/ml, 15 of 16 (94%) at 10⁴ to 10⁵ IU/ml, 6 of 7 (86%) at 10³ to 10⁴ IU/ml, and 2 of 3 (67%) at less than 10³ IU/ml.
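As a quick arithmetic check, the stratum-specific and overall success rates can be recomputed from the counts given in the text. The sketch below uses only those counts; the variable names and the rounding to whole percentages are illustrative choices.

```python
# (HCV RNA stratum in IU/ml, specimens amplified, specimens tested),
# taken from the counts reported in the text.
strata = [
    (">10^5", 26, 26),
    ("10^4 to 10^5", 15, 16),
    ("10^3 to 10^4", 6, 7),
    ("<10^3", 2, 3),
]


def percent(successes, total):
    """Success rate rounded to the nearest whole percent."""
    return round(100 * successes / total)


by_stratum = {label: percent(ok, n) for label, ok, n in strata}
total_ok = sum(ok for _, ok, _ in strata)
total_n = sum(n for _, _, n in strata)
overall = percent(total_ok, total_n)
```

The stratum counts sum to 49 of 52 specimens, and the rounded percentages agree with those reported.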
TABLE 2. Sensitivity of 5.2-kb RT-PCR in specimens obtained during acute HCV infection
High complexity of amplified hemigenomes. To confirm the preliminary finding of high complexity in products of long-amplicon RT-PCR, seven specimens were subjected to both the 5.2-kb protocol and the previously described 1-kb E1/E2 amplification protocol. As illustrated in Fig. 2, the 5.2-kb clones were at least as complex as 1-kb clones.
FIG. 2. Complexity of cDNA clones obtained by two methods. We amplified 1- or 5.2-kb amplicons from specimens from five study subjects with acute HCV infection. Amplicons were cloned, and clonotypes were identified by using the HDA+SSCP method. Clonotype ratio is the number of clonotypes divided by the number of clones examined. ID, identity.
Reproducible detection of high diversity. To ensure that the HDA+SSCP method was reproducible, cDNA clones from three specimens were each examined by using two different driver sequences, with very similar results (Table 3). To evaluate the method's overall reproducibility, one human specimen was examined four times, once using the 1-kb RT-PCR and three times using the 5.2-kb RT-PCR and varying the amount of plasma (Table 4). Operators A and B obtained similar values for complexity (clonotype ratio), and the clonal frequency distributions were similar for estimated input RNA amounts greater than 100 IU. In addition, HVR1 sequences from one randomly selected clone representing the majority clonotype from each of these replicates yielded sequences that differed by no more than 1 nucleotide (data not shown).
TABLE 3. Clonal frequency analysis by HDA+SSCP: effect of driver clone on clonotype distribution in acute-phase specimens
TABLE 4. Reproducibility of clonal frequency analysis in a low-titer specimen, SR03, obtained 35 weeks after occupational exposure
To compare the results of the 5.2-kb RT-PCR protocol to a benchmark, long-amplicon RT-PCR was applied to plasma specimen H77. Specimen H77 contains what is perhaps the best-characterized HCV quasispecies, because it has been distributed widely as a reference reagent, was the source of the sequence for the first infectious HCV clone (18), and was previously examined in detail by small-amplicon RT-PCR and cloning (10). In the latter study, 573 nucleotides spanning HVR1 were amplified and cloned; 104 clones were selected at random and were sequenced. The 5.2-kb method, followed by sequencing of 18 randomly selected clones, yielded the sequences displayed in Table 5. Of these 18 clones, 16 had sequences that were also observed by Farci et al. (10); these sequences represent 86% of the clones they examined, with a similar distribution. The clones observed by Farci et al. (10) but not detected by long-amplicon RT-PCR had clonal frequencies of 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, and 1 out of the 104 clones; none was a frequently observed sequence (<2% in all cases), and all of the singleton clones were one substitution different from a more frequent clone, suggesting that some of them may have been sporadic substitutions (29). It is also likely that we would have detected more of these low-frequency variants had we examined more clones. Similarly, the two clones we observed that were not observed by Farci et al. (10) differed from more frequently observed clones at one position each and may also have been artifactual; however, sequencing of the binding region for the primer that Farci et al. used for RT and first-round PCR identified two mismatches (at positions 2 and 6 from the 3′ end of the primer) between the primer and one of these two clones, possibly explaining the lack of detection (data not shown). The binding region of the sense primers was not sequenced.
TABLE 5. Sequences of HVR1 in cDNA clones representing H77 acute-phase plasma
DISCUSSION
This report describes detection of the high complexity of HCV quasispecies during acute infection by using a method for amplifying most of the viral genome from human serum by nested RT-PCR at a level of sensitivity that makes it practical for studying acute and chronic HCV infection (Table 2). Additionally, the results obtained were reproducible by multiple laboratory personnel when more than 100 IU of input RNA was used (Table 4) and were comparable to results obtained with smaller amplicons in other laboratories (Table 5). The primers listed in Table 1 were designed based on available genotype 1 sequences (as described in Materials and Methods). Inspection of the primers and limited application to other genotypes suggest that modification is necessary to amplify other genotypes (data not shown). Genotypes other than 1 constitute approximately 5% of acute infections in our cohort (data not shown).
Whereas high complexity in the HCV quasispecies has been described based on small amplicons (less than 0.5 kb), particularly during chronic infection, the high level of complexity detected by long-amplicon RT-PCR was somewhat unexpected. First, the validity of some quasispecies complexity estimates has been questioned based on methodological concerns (29). Second, the loss of sensitivity of RT-PCR and biases due to amplification efficiency are generally thought to increase rapidly with the size of the amplicon. The results reported here address both of these concerns.
Quasispecies diversity versus misincorporation artifact. Smith et al. raise appropriate concerns about inferences of quasispecies diversity based on sequences from cDNA clones (29). In this study, we performed RT followed by two rounds of 30-cycle PCR using increased-fidelity thermostable polymerase. While estimation of the error rates of thermostable polymerases is not standardized, if one assumes an error rate of 1 in 10,000 nucleotides incorporated per cycle, then after 60 cycles 6 per 1,000 sites would be expected to contain a misincorporated residue. Because approximately two-thirds of sites are nonsynonymous, this would result in a rate of amino acid error of approximately 0.004. For the 31 amino acids in each of the 18 clones analyzed for Table 5, that would predict between 2 and 3 erroneous amino acid inferences. The two clones that were not represented in the Farci et al. (10) analysis each differed at one amino acid residue from clones that were present at high frequency. While at least one of these two clones also contained two changes in the binding site for the external antisense primer used by Farci et al., possibly explaining its absence from that analysis, amplification-based misincorporations could also explain this minor discrepancy. If Farci et al. used a polymerase with a similar error rate, the 104 sequences of 31 amino acids in the HVR1 region they reported would be predicted to contain 13 misidentified residues. The HVR1 sequences of 89 clones were represented among the 18 5.2-kb clones we examined; of the 15 clones not represented, 14 differed from a more frequent clone at only one position and 1 differed at three positions. Presuming the latter clone was truly missed by our analysis of 18 clones, the 14 clones differing by one residue come remarkably close to the predicted 13.
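The back-of-the-envelope estimate in this paragraph can be reproduced with a few lines. The sketch follows the same simplifications as the text (errors accumulate linearly over 60 cycles, two-thirds of substitutions are nonsynonymous, and the resulting rate of about 0.004 is applied per HVR1 residue); the function and parameter names are assumptions made for illustration, not a validated error model.

```python
def expected_erroneous_residues(
    n_clones,
    residues_per_clone=31,      # HVR1 amino acids analyzed per clone
    cycles=60,                  # two nested rounds of 30 cycles
    error_rate=1e-4,            # assumed misincorporations per site per cycle
    nonsynonymous_fraction=2 / 3,
):
    """Expected number of erroneous amino acid inferences across all clones."""
    per_site = cycles * error_rate                   # about 0.006
    per_residue = per_site * nonsynonymous_fraction  # about 0.004
    return n_clones * residues_per_clone * per_residue


this_study = expected_erroneous_residues(18)     # 18 clones from the 5.2-kb method
farci_et_al = expected_erroneous_residues(104)   # 104 clones, same assumed rate
```

Under these assumptions the expectation is about 2.2 residues for 18 clones and about 12.9 for 104 clones, in line with the "between 2 and 3" and "13" quoted above.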
Therefore, it seems prudent to recommend awareness of this rate of error but also to recognize that the effect on estimated complexity is not large, reducing the estimated complexity by 5 to 10% in the data summarized here. Others have also suggested that the impact of misincorporation is reduced for diverse populations of sequences (6).
To address the issue of misincorporation methodologically, direct sequencing of PCR products obtained by limiting dilution has been recommended (5, 29). That approach is not presently feasible with large amplicons, because the PCR product is limiting and because, in poorly conserved regions, direct sequencing often fails. Additionally, direct sequencing of less conserved regions might be expected to introduce bias due to selective amplification. Another suggested approach, biological cloning (5), is not feasible for HCV because no efficient in vitro model is available. Instead, sequencing of cDNA clones is necessary. Others have shown that extremely low error rates can be achieved with thermostable polymerases under the appropriate conditions (8). Clonal frequency analysis can be performed as described here to permit selection of clones that are representative of the quasispecies. In addition, measures to reduce the effect of misincorporation, such as sequencing multiple variants or pooling selected clones prior to sequencing, should be considered.
Another potential source of artifactual sequence heterogeneity is PCR-mediated recombination. We took steps to limit the effect of this phenomenon by using prolonged polymerase extension times in each cycle (24) and by sequencing multiple separate cDNA clones from the same clonotype. Nevertheless, long-amplicon RT-PCR would be expected to provide an ideal environment for such events, and researchers must consider repeating experiments or testing unexpected results in other ways to avoid misinterpretation.
Amplicon size versus sensitivity and template resampling. While the sensitivity of RT and PCR is inversely related to the length of the target amplicon, the order of this relationship (linear, exponential, etc.) is not clear. Advances, such as thermostable polymerases with 3′ to 5′ exonuclease activity, have increased the range of amplicon sizes. Therefore, the sensitivity we report here can be attributed, at least in part, to reagents that are now available. However, we still found that a reproducible amplification protocol required multiple customizations of the manufacturers' recommendations, particularly for the RT step. The optimized technique is sensitive enough even for specimens with HCV RNA levels of 10³ to 10⁵ IU/ml (Table 2), as occurs frequently during acute infection and less frequently during untreated chronic viremia.
An important consideration in assessing the complexity of a quasispecies is template resampling, as described by Liu et al. (19). When HCV RNA concentration is near the limit of detection, this phenomenon can result in artifactual homogeneity of recovered amplicons. This effect is a likely explanation for the lower complexity obtained when 40 μl of serum was used with a viral titer of 4,170 IU/ml (Table 4). Extrapolating from the viral titer through the dilutions inherent in the method suggests that the result for 40 μl of input was generated from 70 IU of HCV RNA input to the RT reaction, from which one-fourth of the cDNA was used for first-round PCR. Estimates of complexity must therefore be viewed with caution if the limit of detection of the assay is not known for a particular specimen (19). On the other hand, it is not surprising that, even under circumstances of limiting template, identification of major clonotype(s) is reproducible.
We report highly complex HCV variants recovered by using long-amplicon RT-PCR primers located in highly conserved regions of the HCV genome. The high complexity of variants obtained during acute infection is consistent with some reports for which shorter amplicons were used, and detailed analysis did not support the contention that such variability is primarily due to artifactual misincorporation of nucleotides during amplification. Comprehensive and genotype-independent analysis of whole-genome amplicons could take advantage of the high conservation of the 5′ and 3′ termini of HCV, but that goal remains elusive.
ACKNOWLEDGMENTS
We are grateful to Robert Lanford for providing specimens from chimpanzees ×304 and ×361, to Robert Purcell for generously providing the 1977 plasma specimen from subject H, and most of all to subject H and other study subjects for their generous contributions.
National Institutes of Health grant R01-DK57998 supported this research.
FOOTNOTES
- Received 24 December 2003.
- Returned for modification 23 March 2004.
- Accepted 7 June 2004.
- Copyright © 2004 American Society for Microbiology