Journal of Clinical Microbiology, July 2003, p. 3265-3272, Vol. 41, No. 7
0095-1137/03/$08.00+0 DOI: 10.1128/JCM.41.7.3265-3272.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
Department of Immunology/Microbiology, Rush Medical College, Chicago, Illinois,1 Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, Maryland,2 New England Research Institute, Inc., Watertown, Massachusetts,3 Pediatrics, University of Medicine and Dentistry of New Jersey, New Jersey Medical School, Newark, New Jersey4
Received 26 December 2002/ Returned for modification 20 February 2003/ Accepted 23 March 2003
The need to edit sequence data can be influenced by several factors. Editing is typically required when there is some discrepancy between the data obtained from individual sequencing primers covering the same sequence region. This process is complicated by the fact that HIV drug resistance mutations are often present in clinical samples as mixtures (e.g., mutant plus wild type), reflecting either emergence or fading of the resistant viral variants in a genetically heterogeneous viral population. Visual inspection of sequence data from bidirectional primers is required to confirm the presence of nucleotide mixtures in clinical samples. Different laboratories use different criteria to confirm the presence of mixtures. A variety of technical problems can also produce peaks in sequencing electropherograms that suggest the presence of nucleotide mixtures. These artifacts can usually be identified by visual inspection of the sequence data from individual primers. Once identified, such artifactual mixtures can be removed from a consensus sequence before a resistance report is generated. In some instances, the generation of suboptimal sequence data is related to inherent characteristics of the HIV template or assay performance. For instance, poor sample preparation, inefficient reverse transcription or PCR amplification, nonspecific binding of a sequencing primer to an alternate region of the sequencing template, inadequate purification of sequencing products prior to electrophoresis, and problems in gel preparation, sample loading, or electrophoresis of sequencing gels can all contribute to poor-quality data (from Comparative PCR Sequencing: a Guide to Sequencing-Based Mutation Detection, 1995; Perkin-Elmer Corporation, Applied Biosystems, Foster City, Calif.). When the quality of data is poor, nucleotide mixtures may be introduced into a sequence, requiring the sequence editor to decide whether the observed mixtures are artifactual or real. 
The number of bases edited in a given sequence can also be influenced by the amount of experience a person has in evaluating electropherograms, as well as the strategy chosen to assemble and evaluate sequence data from individual primers to generate a consensus sequence. It is important that the data generated from the laboratories performing these assays accurately reflect the presence of true mixtures, especially for base positions linked to antiretroviral drug resistance.
In a previous study, two panels, each containing individual plasma samples from three HIV-1-infected persons, were sent to 10 laboratories participating in the Pediatric AIDS Clinical Trials Group Sequencing Working Group. The laboratories used a research-use-only genotyping system to genotype the samples. The consensus sequences generated for each plasma sample by the individual laboratories showed a very high concordance to a group consensus sequence, ranging from 98.0 to 100.0% for protease (PR) (297 bases) and 97.3 to 100.0% for reverse transcriptase (RT) (960 bases). However, the laboratories varied widely in the percentages of codons edited: 2.0 to 87.8% of PR codons and 4.7 to 63.6% of RT codons were edited (D. Huang, J. Bremer, D. Brambilla, S. Eshleman, R. Nutter, S. Hart, M. Wantman, and P. Palumbo, Abstr. 7th Conf. Retrovir. Opportunistic Infect., abstr. 792, 2000).
In this study, the influence of the editing process was assessed independently by requesting laboratories to produce consensus sequences directly from sequence data sets provided to them from the Virology Quality Assurance (VQA) Laboratory (Chicago, Ill.). The amount of editing used to form a consensus sequence for each sequence data set, the concordance of the consensus sequences among the laboratories, and successful identification of resistance mutations were examined.
FIG. 1. Diagram of the positions of sequencing primers in the HGS. The positions of selected nucleotides in PR and RT are shown. The blunt end of the arrow is positioned at the approximate nucleotide start site of the primer.
Using these guidelines, as well as in-house guidelines, each laboratory aligned and edited the sequence data to generate a single consensus sequence for each sample. The HGS software documents the editing of individual nucleotides in the consensus sequence by using a lowercase letter. Unedited bases in the consensus sequence remain in uppercase letters. Positions of mixed nucleotides are indicated with the appropriate International Union of Biochemistry (IUB) codes. Both the lowercase and the IUB designations for mixed bases are preserved in the FASTA (a standardized text format for nucleic acid sequences) text file when saved. Each laboratory submitted the following files to a central laboratory for analysis: (i) the edited "project" file, showing the alignment and editing of individual sequences; (ii) the consensus sequence for each sequence data set, saved in FASTA format; and (iii) a list of the mutations (variations from the reference sequence) identified by the software after editing.
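The lowercase convention described above makes the amount of editing measurable directly from a saved consensus sequence. The following is a minimal sketch in Python, not part of the HGS software; the function name `percent_edited_codons` is hypothetical, and the sketch assumes the consensus begins in frame at the first base of the gene and treats any codon containing at least one lowercase base as edited.

```python
def percent_edited_codons(consensus: str) -> float:
    """Percentage of complete codons that contain at least one edited base.

    Assumption: edited bases are lowercase and the sequence starts in frame.
    Whitespace is ignored and a trailing partial codon is dropped.
    """
    seq = "".join(consensus.split())
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return 0.0
    edited = sum(1 for codon in codons if any(base.islower() for base in codon))
    return 100.0 * edited / len(codons)


# Hypothetical four-codon fragment in which only the second codon was edited.
print(percent_edited_codons("CCT caa ATC ACT"))
```

Because IUB mixture codes are also stored in uppercase or lowercase, the same test applies to mixed positions without special handling.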
Analysis of data from 10 testing laboratories. The 10 testing laboratories submitted their data to Frontier Science and Technology Research Foundation, Buffalo, N.Y., where the data were collated for analysis. The combined data set was then sent to the VQA Statistical Center at New England Research Institute, Watertown, Mass., for further analyses. These analyses included (i) identification of codons that differed among the 10 testing laboratories (discrepant codons) and (ii) determination of the percentage of codons edited by each laboratory. The consensus sequences submitted for each sequence data set were aligned by using Align Plus (Scientific Educational Software, Durham, N.C.) to form a group consensus sequence. The overall concordance of each laboratory's consensus sequence with the group consensus sequence (homology) was then determined. Mutations associated with antiretroviral drug resistance were identified by using the HGS software. Mutations identified by the 10 testing laboratories were tabulated.
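The two analyses listed above, discrepant codons and concordance with a group consensus, can be illustrated with a short Python sketch. It is not the procedure performed with Align Plus: the function names are hypothetical, the sequences are assumed to be already aligned and of equal length, the group consensus is assumed to be the most frequent base at each position, and edited (lowercase) bases are compared without regard to case.

```python
from collections import Counter


def group_consensus(seqs):
    """Most frequent base at each position (an assumed rule, for illustration)."""
    upper = [s.upper() for s in seqs]
    if len({len(s) for s in upper}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*upper))


def percent_identity(seq, reference):
    """Percentage of positions at which seq matches reference, ignoring case."""
    if len(seq) != len(reference) or not seq:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq.upper(), reference.upper()))
    return 100.0 * matches / len(seq)


def discrepant_codons(seqs):
    """1-based codon numbers at which the laboratories did not all agree."""
    upper = [s.upper() for s in seqs]
    n_codons = min(len(s) for s in upper) // 3
    return [i + 1 for i in range(n_codons)
            if len({s[3 * i:3 * i + 3] for s in upper}) > 1]


# Hypothetical two-codon fragments from three laboratories.
labs = ["CCCCTC", "cccCTC", "CCCYTC"]
consensus = group_consensus(labs)
print(consensus, discrepant_codons(labs))
print(round(percent_identity(labs[2], consensus), 1))
```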
Analysis of data from a questionnaire. After completing the analysis of the sequence data sets, participants from each of the 10 testing laboratories completed a questionnaire about their editing strategy. Each person who analyzed the sequence data sets completed the questionnaire. Answers to the questionnaire were tabulated.
TABLE 1. Regions of sequence ambiguity in sequence data sets
TABLE 2. Percent PR gene homology among laboratories
TABLE 3. Percent RT gene homology among laboratories
FIG. 2. Frequency of editing of PR (99 codons) (A) and RT (320 codons) (B) sequences. The percentages of the edited codons for the sequence data sets were compared. Codons that contained any bases designated by lowercase letters, indicating editing, were tabulated for each of the 10 laboratories. The percentages of edited codons for the samples were determined for each laboratory.
TABLE 4. Identification of antiretroviral drug resistance mutations
FIG. 3. Examples of sequence ambiguity for which discordant results were obtained after editing in the test laboratories. (A to D) Examples of sequence ambiguity requiring different editing strategies (see the text). The data on the left in each panel show the unedited sequence interpretation (preediting), including any nucleotide mixtures identified by the HGS software. The data on the right in each panel show an example of the same sequence file after editing. Note that different editing strategies were used by different laboratories; a single example is shown for each panel. The positions of PR and RT codons are indicated at the top of each panel (e.g., codons 96 to 99 for PR in panel A). Immediately below the codon positions are the amino acid interpretations for the positions in the consensus sequence produced. Two nucleotide sequences are shown below the codon sequences above each set of electropherograms. The upper nucleotide sequence is a reference sequence provided in the software for comparison. The lower nucleotide sequence is the consensus sequence derived from analysis of data from individual electropherograms (before or after editing). Unedited bases are shown in uppercase letters. Edited bases are shown in lowercase letters. Nucleotide mixtures are indicated with standard IUB codes (e.g., C + T = Y). The orientation of each electropherogram is indicated. Sequences that were trimmed by either the software or the user during editing are shaded; trimmed sequences are not used in the base-calling process. Ambiguous nucleotide positions are underlined in the consensus sequences. In panels A, C, and D, arrows indicate positions in the electropherograms that were interpreted differently by the testing laboratories (discordant results). In panel B, arrows indicate the positions of bases inserted and deleted during editing. The nucleotides reported by the testing laboratories at the ambiguous positions are shown below each set of electropherograms. 
The number of laboratories that provided each interpretation is indicated. Interpretations of the edited sequences are described in the text.
(ii) Panel B: sequence data set 03rg01, RT codons 244 to 247. The unedited consensus sequence at RT codons 244 to 247 was ATM RKY TGC-CA. This sequence encodes isoleucine (I) at codon 244; a mixture of serine (S), isoleucine (I), glycine (G), and valine (V) at position 245; cysteine (C) at position 246; and a frameshift at position 247 (single base deletion). Eight of the 10 laboratories edited the sequence to ATc agt ctg cCA, which encodes the amino acid sequence isoleucine (I), serine (S), leucine (L), and proline (P) at codons 244 to 247 and corrects the reading frame. The postediting panel indicates the position of a c, shown in the consensus line, that was added in editing and the position of a G, shown in the reference line, that was deleted in editing. The other two laboratories edited the sequence incorrectly. One laboratory edited codons 244 to 247 to ATa gtt TGC cCA. This sequence encodes isoleucine (I) at codon 244, valine (V) at position 245, cysteine (C) at position 246, and proline (P) at position 247. For this sequence, the editor did not include the inserted base, c, in codon 244 but placed an extra c, not present in the electropherogram, in codon 247, which corrected the frameshift. The other laboratory edited the codons to ATc agt TGC cCA. This sequence encodes isoleucine (I), serine (S), cysteine (C), and proline (P) at codons 244 to 247. This laboratory correctly inserted a c in codon 244 but also placed an extra base, c, in codon 247.
(iii) Panel C: sequence data set 03rg02, PR codon 10. The unedited consensus sequence at PR codons 9 and 10 was CYY HTC (Y = C + T; H = A + C + T). This sequence encodes a mixture of proline (P) and leucine (L) at codon 9 and a mixture of isoleucine (I), leucine (L), and phenylalanine (F) at codon 10. All 10 laboratories edited codon 9 from CYY to Ccc, which encodes proline (P). Six of the 10 laboratories edited codon 10 from HTC to cTC, which encodes the wild-type amino acid leucine (L). This strategy was consistent with the editing guidelines (see Materials and Methods), which require that mixed bases be observed in both forward and reverse directions. Three laboratories first modified the sequence by trimming in that region, generating the consensus sequence CTC. This alternative editing strategy produced the same result as the strategy used by the other six laboratories mentioned above. The last laboratory edited codon 10 from HTC to yTC, which encodes a mixture of the wild-type amino acid leucine (L) and the mutant amino acid phenylalanine (F). The mutation L10F is associated with lopinavir resistance (10).
(iv) Panel D: sequence data set 03rg04, PR codon 71. The unedited consensus sequence at PR codons 71 and 72 was GBT MTA (B = C + G + T; M = A + C). This sequence encodes a mixture of alanine (A), glycine (G), and valine (V) at codon 71 and a mixture of isoleucine (I) and leucine (L) at codon 72. All 10 laboratories edited MTA at codon 72 to aTA, which encodes the wild-type amino acid isoleucine (I). Four laboratories edited GBT at codon 71 to GyT, which encodes a mixture of the wild-type amino acid alanine (A) and the mutant amino acid valine (V). The mutation A71V is associated with resistance to lopinavir, nelfinavir, and indinavir (10). Three laboratories trimmed the sequence, generating the same result (GYT = alanine + valine) at codon 71. Two laboratories edited GBT to GcT, which encodes alanine only, and one laboratory did not edit the codon, leaving the sequence GBT, which encodes the A71V mutation as well as the A71G mutation. The A71G mutation has not been reported to be associated with resistance.
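The amino acid mixtures quoted for panels B to D follow mechanically from expanding each IUB code into its component bases and translating every resulting codon with the standard genetic code. The Python sketch below reproduces that reasoning; it is an illustration and not the HGS software's implementation, and the function name `amino_acids` is hypothetical.

```python
from itertools import product

# IUB/IUPAC nucleotide codes and the bases each one represents.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, listed with the bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO)
}


def amino_acids(codon):
    """Sorted one-letter amino acids encoded by a codon that may contain mixtures."""
    codon = codon.upper()  # edited (lowercase) bases translate like unedited ones
    if len(codon) != 3:
        raise ValueError("a codon must have exactly three bases")
    expansions = product(*(IUB[base] for base in codon))
    return sorted({CODON_TABLE["".join(triplet)] for triplet in expansions})


for codon in ("GBT", "GyT", "GcT", "MTA", "HTC"):
    print(codon, amino_acids(codon))
```

Run on PR codon 71, the sketch shows why the editing choice matters: GBT yields A, G, and V; GyT yields A and V; and GcT yields A alone.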
Some of the variability in the editing and interpretation of this sequence data set resulted because the laboratories differed in which primer sequences were used to generate the consensus sequence prior to editing individual base positions. In this genotyping system (Fig. 1), the two forward primers, A and D, serve as alternate primers for the PR gene. Either primer may be used in sequence interpretation. The editor has the option to use only one of these two primer sequences for interpretation. This choice may simplify the editing of individual bases if one of the sequences is of lesser quality. In editing this sequence data set, three laboratories used both primer A and primer D sequences, four laboratories used only the primer A sequence, and three laboratories deleted both primer A and primer D sequences, leaving only the reverse primer F sequence for interpretation in this region.
When both primer A and primer D sequences were removed, the unedited consensus sequence generated for this codon was GYT (rather than GBT, as shown in Fig. 3D); none of the three laboratories edited this codon. The codon GYT encodes a mixture of the wild-type amino acid alanine and the mutant amino acid valine. The mutation A71V is associated with resistance to lopinavir, ritonavir, nelfinavir, and indinavir. The decision not to edit the GYT codon by these three laboratories was not consistent with the editing guidelines (see Materials and Methods), which state that a nucleotide mixture (e.g., Y) can be confirmed only if it is present in both forward and reverse sequences. Since only the reverse primer F sequence was used to form the consensus sequence, this sequence should have been edited to the wild-type codon GcT, which encodes alanine.
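The bidirectional rule applied in this example can be written as a simple check. The Python sketch below is an assumption-laden illustration rather than the guideline's formal wording: the function name `mixture_confirmed` is hypothetical, each list holds the base calls at one position from the reads retained in that direction, and a mixture is treated as confirmed only when the same mixed code appears in at least one forward and at least one reverse read.

```python
def mixture_confirmed(code, forward_calls, reverse_calls):
    """True if the mixed-base code is seen in both forward and reverse reads.

    Assumption: one call per retained read at the position of interest;
    calls are compared without regard to case.
    """
    code = code.upper()
    seen_forward = any(call.upper() == code for call in forward_calls)
    seen_reverse = any(call.upper() == code for call in reverse_calls)
    return seen_forward and seen_reverse


# Forward primers A and D deleted, only reverse primer F retained:
print(mixture_confirmed("Y", [], ["Y"]))
# Mixture visible in a forward read and in the reverse read:
print(mixture_confirmed("Y", ["Y", "C"], ["Y"]))
```

With no forward read retained, the check cannot succeed, which is why the guidelines call for editing the Y at codon 71 to a single base in that situation.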
Analysis of questionnaire on editing strategies. A questionnaire on editing practices was distributed to the 10 participating laboratories (Fig. 4). Analysis of the questionnaires revealed variability in editing strategies among the laboratories. The majority of the respondents indicated that they prescreened sequences prior to editing and trimmed sequences further after editing had begun if this was deemed necessary. In general, the quality of data from an individual primer was considered to be more important than the direction of the sequence (question 4). The majority of the respondents also would not edit a single base to a mixture unless the mixture was clearly present in both directions (question 5a). These strategies seemed to be applied by most of the respondents in most of those instances (e.g., in the examples shown in Fig. 3). Questions 6a and 6b describe examples similar to that shown in Fig. 3D. From their answers, 7 (58%) of 12 respondents would have changed the mixed base B or Y to c in codon 71, but only 3 (25%) of 12 would have made this change all of the time. However, in practice, only 2 (20%) of 10 respondents changed the mixed base to a pure base. In question 6b, the first and third situations are applicable to the example shown in Fig. 3D. The majority of laboratories indicated that they would have changed codon 71 to GcT; however, in practice, most did not.
FIG. 4. Questions regarding evaluation of mixtures and tabulated responses. A questionnaire regarding editing strategy was sent with the sequence data sets. Every person (n = 12) who submitted edited data was asked to complete a questionnaire postediting. Only questions relevant to the examples shown in Fig. 3 are shown; the numbers of responses are summarized.
We examined the editing step performed in a research-use-only HIV-1 genotyping assay previously produced by Applied Biosystems, the HGS. This system provided software for analysis and editing of sequence data. By providing the laboratories with identical electronic sequence data sets, we were able to exclude any variations in interpretation due to technical issues associated with other steps in the assay (e.g., HIV-1 RNA isolation, reverse transcription, PCR amplification, cycle sequencing, or electrophoresis of sequencing products). We found a high level of concordance among the laboratories with regard to their final genotyping interpretations of sequence data sets from six selected samples. Interestingly, we found significant variability in the strategies used by the laboratories for editing sequencing data and in the percentages of codons edited. In some cases, differences in sequence editing influenced the identification of mutations associated with antiretroviral drug resistance. This finding may have implications for the clinical application of genotypic resistance data in the management of HIV-1-infected patients.
It was interesting to observe that relatively little editing was performed for the sequence data set generated from the plasmid-derived sample compared to the data sets generated from clinical plasma samples. The mixtures in the plasmid-derived sample were presumably all artifactual, since the template used for sequencing was clonal. In some cases, laboratories prepare quality-controlled reagents for analysis of mixtures by mixing defined proportions of homogeneous templates containing genetically engineered mutations at specific sites (7, 12). While this approach may provide some useful information, our data suggest that simple plasmid-derived mixtures may not be complex enough to thoroughly test proficiency in the performance of HIV-1 genotyping assays.
The variability among laboratories in the perception of what constitutes an "acceptable" sequence (prior to editing) was unexpected. Our data suggest that individuals within and between laboratories may have different perceptions of data quality. In order to evaluate a laboratory's genotyping performance, it will be necessary to define what decision-making factors are used in sequence analysis and the extent to which they need to be or can be monitored. It may be difficult to establish methods that account for an individual's perception of data quality.
In addition, evaluation of a questionnaire about editing was enlightening. The individuals performing the assay clearly knew the guidelines for editing that were presented during their training and apparently used these guidelines most of the time. However, among the 12 respondents from 10 laboratories, 9 developed their own set of guidelines for editing. The questionnaire also indicated that the guidelines developed for editing by the manufacturer were used inconsistently. The overall concordance of the sequence data indicates that the application of multiple sets of in-house editing guidelines, overlaid on those learned during training, was not generally detrimental to the consistency and quality of the data.
Editing decisions are inherently made during the use of all commercially available genotyping systems. The genotyping system described here has been modified in the Celera Diagnostics ViroSeq HIV-1 Genotyping System, recently granted 510K approval by the U.S. Food and Drug Administration for clinical use. Editing still must be performed despite the incorporation of a more simplified editing process and improved algorithms for base calling. Generally, the discrepancies noted in data interpretation indicate the need to form more specific editing guidelines that can be applied consistently to sequence-based genotyping assays currently in use. Future studies are needed to evaluate the impact of editing on each available sequence-based HIV genotyping system.
Our study suggests that global guidelines should be designed to help make editing practices consistent for all laboratories, regardless of the assay used. Editing strategies and evaluation of data quality are an integral part of any commercial or in-house sequence-based genotyping assay (5, 11, 12). The technical performance of genotyping assays, as well as inherent sample variability, is also likely to influence the amount of editing required for data interpretation. Editing procedures and practices should be evaluated as part of any proficiency testing, quality control, or quality assurance program for HIV-1 genotyping.
We thank Applied Biosystems for providing the HGS with software version 2.1.
Members of the Pediatric AIDS Clinical Trials Group Sequencing Working Group include Grace Aldrovandi, University of Alabama at Birmingham; Donald J. Brambilla, New England Research Institute, Inc.; Clark Brown, Applied Biosystems; Susan H. Eshleman, Johns Hopkins Medical Institutions; Susan Fiscus, University of North Carolina; Lisa Frenkel, University of Washington; Hasnah Hamdan, Nichols Institute; Stephen Hart, Frontier Science and Technology Research Foundation; Diana D. Huang, Rush Medical College; Andrea Kovacs, University of Southern California; Paul Krogstad, University of California at Los Angeles; Phillip LaRussa, Columbia University; Paul E. Palumbo, University of Medicine and Dentistry of New Jersey; Walter Scott, University of Miami; Stephen Spector, University of California at San Diego; John Sullivan, University of Massachusetts; Adriana Weinberg, University of Colorado Health Sciences Center; and Yu Qi Zhao, Northwestern University.