Journal of Clinical Microbiology, August 2002, p. 2973-2980, Vol. 40, No. 8
0095-1137/02/$04.00+0 DOI: 10.1128/JCM.40.8.2973-2980.2002
Copyright © 2002, American Society for Microbiology. All Rights Reserved.
Department of Virology, Regional Public Health Laboratory, Groningen,¹ Department of Virology, University Medical Center Utrecht, Utrecht, The Netherlands,² Department of Molecular Biology, Laboratory Dr. Schiwara and Partners, Bremen, Germany,³ Department of Virology, Manchester Royal Infirmary, Manchester,⁴ The Public Health Laboratory, Leeds,⁵ Chimerabio Ltd., Dundee, United Kingdom⁶
Received 4 September 2001/ Returned for modification 24 November 2001/ Accepted 23 May 2002
5,000 copies/ml were reported positive in only 71% and 77% of the cases with panel 1 and panel 2, respectively. Adequate or better scores on qualitative results (all results correct or only the low-positive samples missed) were obtained in 84% (panel 1) and 80% (panel 2) of the data sets. In the analysis of quantitative results, 60% (panel 1) and 73% (panel 2) of the data sets obtained an adequate or better score (≥80% of the positive results within the range of the geometric mean ± 0.5 log10). Our results indicate that considerable improvements in molecular detection and quantitation of HCV have been achieved, particularly through the use of commercial assays. However, the lower detection limits of many assays are still too high, and further standardization is still needed. Finally, this study underlines the importance of proficiency panels for monitoring the quality of diagnostic laboratories.
Obviously, laboratories performing HCV RNA tests should report accurate and reliable results regardless of the type of assay used. One of the best ways to assess the performance of individual laboratories is to distribute proficiency panels and to evaluate all the test results. Early proficiency studies on the detection of HCV RNA showed high percentages of laboratories with specificity and sensitivity problems (1, 19). Similar problems have been reported for the molecular detection of hepatitis B virus (HBV) (9) and Mycobacterium tuberculosis (8).
An external quality assessment program for the evaluation of currently employed nucleic acid amplification methods was established by the members of the European Union (EU) Quality Control Concerted Action of Nucleic Acid Amplification in Diagnostic Virology (QCCA). Between 1997 and 2000, proficiency panels were distributed for the detection of enterovirus RNA (16), herpes simplex virus DNA (L. Schloss, P. Cinque, G. M. Cleator, J.-E. Echevarria, K. I. Falk, P. E. Klapper, J. Schirm, B. F. Vestergaard, H. G. M. Niesters, T. Popow-Kraupp, W. G. V. Quint, A. M. van Loon, and A. Linde, unpublished data), cytomegalovirus DNA, Chlamydia trachomatis DNA (R. P. Verkooyen, G. T. Noordhoek, P. E. Klapper, J. Reid, J. Schirm, G. M. Cleator, and G. Hoddevik, unpublished data), human immunodeficiency virus RNA (A. M. van Loon, J. Schirm, E. Valentine-Thon, J. Reid, P. E. Klapper, and G. M. Cleator, unpublished data), HBV DNA (15), and HCV RNA.
The present report describes the qualitative and the quantitative results obtained with the two QCCA HCV RNA panels distributed in 1999 and 2000. The results demonstrate that the quality of HCV RNA detection has clearly improved, particularly through the use of commercial assays. However, comparison of the quantitative results was hampered by the lack of standardization between different test types. Moreover, the lower detection limits of some of the quantitative assays used were too high for optimal monitoring of HCV-infected patients.
Composition. Each panel consisted of eight coded samples. Six samples contained HCV RNA with approximate target levels of 2 × 10² to 5 × 10⁵ copies/ml. Two samples contained no virus and served as negative controls. To evaluate interassay reproducibility, four samples were included in both panels: 2 × 10² copies/ml (subtype 1), 5 × 10³ copies/ml (subtype 1), 5 × 10⁴ copies/ml (subtype 3), and 5 × 10⁵ copies/ml (subtype 1). To assess a possible effect of HCV subtype, each panel contained pairs of samples with identical viral loads but different subtypes.
Distribution. All panels were distributed on dry ice by courier service from a central facility in Paris, France. Instructions for storage at -20°C or below and processing of the samples were enclosed, and a questionnaire was added in order to obtain technical information on the procedures employed by individual participants. The participating laboratories were asked to report receipt of the panel immediately by fax and to return the results as soon as possible but within 7 weeks to the Neutral Office, University of Manchester, Manchester, United Kingdom. If the panel did not arrive in good condition, a second shipment was made. A code number, known only to the Neutral Office, identified each laboratory. Laboratories participating in both proficiency studies were assigned the same code for both panels. Immediately after the closing dates, each participating laboratory was sent the code with target HCV RNA levels for individual performance assessment.
Analysis. All results were analyzed anonymously at the Department of Virology, Regional Public Health Laboratory, Groningen, The Netherlands. The overall evaluation for each panel was sent to the participants within a few months.
Analysis of qualitative results.
The results from the quantitative data sets were converted to qualitative data (i.e., positive/negative) and considered together with the truly qualitative data sets. To assess the performances of individual participants, the following scoring system was applied. One point was given for each correct result, and one point was deducted for each false-positive or false-negative result, with the exception of false-negative results on the relatively weak positive samples (target levels of ≤5 × 10³ copies/ml). Thus, the maximum possible qualitative score was 8 points. Scores of 7 and 6 points were considered adequate and mediocre, respectively, while scores of ≤5 points were considered poor.
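The qualitative scoring rules described above can be sketched in Python as follows. This is a minimal illustration, not the software used in the study: the eight-sample panel, the function names, and the label "good" for the maximum score of 8 points are assumptions introduced here for clarity.

```python
# Minimal sketch of the qualitative scoring rules (not the study's software).
WEAK_POSITIVE_MAX = 5_000  # copies/ml; target levels <= 5 x 10^3 are "weak positives"


def qualitative_score(targets, reported_positive):
    """Score one data set.

    targets: target load of each sample in copies/ml (0 for a negative control).
    reported_positive: True if the sample was reported positive, else False.
    """
    if len(targets) != len(reported_positive):
        raise ValueError("exactly one result per sample is required")
    score = 0
    for target, positive in zip(targets, reported_positive):
        truly_positive = target > 0
        if positive == truly_positive:
            score += 1  # one point for each correct result
        elif truly_positive and target <= WEAK_POSITIVE_MAX:
            pass  # missed weak positive: no point, but no deduction either
        else:
            score -= 1  # false positive, or false negative on a stronger sample
    return score


def qualitative_rating(score, n_samples=8):
    """Map a score to the rating bands; "good" for the maximum is an assumed label."""
    if score == n_samples:
        return "good"
    if score == n_samples - 1:
        return "adequate"
    if score == n_samples - 2:
        return "mediocre"
    return "poor"


# Hypothetical eight-sample panel (copies/ml): six positives and two negatives.
PANEL = [500_000, 50_000, 0, 5_000, 200, 0, 50_000, 5_000]
all_correct = [t > 0 for t in PANEL]
missed_weakest = [t > 200 for t in PANEL]  # only the 200 copies/ml sample missed
one_false_positive = [t > 0 or i == 2 for i, t in enumerate(PANEL)]

for name, results in [("all correct", all_correct),
                      ("missed weakest", missed_weakest),
                      ("one false positive", one_false_positive)]:
    s = qualitative_score(PANEL, results)
    print(name, s, qualitative_rating(s))
```

Missing only a weak-positive sample costs the point for that sample but carries no deduction, which is why such a data set still reaches the adequate band.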
Analysis of quantitative results. Although the HCV RNA target levels of the samples in each panel were expressed in copies per milliliter, some participants expressed their quantitative results in either genome equivalents per milliliter (only laboratories with the Quantiplex bDNA version 2.0 test) or international units per milliliter. According to a statement from the manufacturer of the bDNA test, 1 genome equivalent/ml equals 1 copy/ml (D. Hendricks, Bayer, 1999, personal communication). The four laboratories expressing their results in international units per milliliter (in the 2000 panel 2 only) all used a new version of the Cobas/Amplicor HCV Monitor version 2.0 assay. This system has different conversion factors from international units per milliliter to copies per milliliter for different lot numbers, in practice ranging between 0.6 and 3.8 (11). Unfortunately, the individual conversion factors were very difficult to obtain. For these reasons, we decided, for the sake of simplicity, to consider all quantitative data in this study to be expressed in copies per milliliter. Consequently, all calculations below are based on copies per milliliter.
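To give a sense of the error introduced by treating international units per milliliter as copies per milliliter, the following sketch converts the reported range of lot-specific conversion factors (0.6 to 3.8) into log10 offsets. It assumes that copies/ml = IU/ml × factor; this direction and the helper name are illustrative assumptions, not details given in the text.

```python
import math

# Reported range of lot-specific factors for converting IU/ml to copies/ml.
FACTOR_MIN, FACTOR_MAX = 0.6, 3.8


def log10_offset(factor):
    """log10(copies/ml) - log10(IU/ml), assuming copies/ml = IU/ml * factor.

    This is the amount by which a result would be misplaced on the log10
    scale if its IU/ml value were read directly as copies/ml.
    """
    return math.log10(factor)


low = log10_offset(FACTOR_MIN)
high = log10_offset(FACTOR_MAX)
span = high - low
print(f"offset range: {low:+.2f} to {high:+.2f} log10 (span {span:.2f})")
```

Under that assumption the offsets run from about -0.22 to +0.58 log10, so the largest factor alone slightly exceeds the 0.5 log10 half-width of the acceptable range used for the quantitative scores.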
For evaluation of the quantitative data sets, the copies per milliliter results were first converted to log10 values, and then the overall geometric mean (GM) in log10 copies/ml and the standard deviation (SD) were calculated for each (positive) sample from all reported quantitative positive results. To assess the performances of individual participants, we calculated what percentage of the reported positive results of each data set was within the acceptable range of GM ± 0.5 log10. This range was chosen because viral load differences of <0.5 log10 are usually not considered clinically relevant. In addition, the SDs calculated for each sample in the present study were on average 0.45 log10 (see below), which is very close to the chosen acceptable range. When at least 80% of the positive results reported by one quantitative data set were within the acceptable range, the quantitative performance was qualified as good (100%) or adequate (80 to 100%). Data sets with 60 to 80% acceptable quantitative results were considered mediocre, and <60% was qualified as poor.
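The quantitative evaluation (log10 transformation, overall GM and SD per sample, share of results within GM ± 0.5 log10, and the rating bands) can be sketched as below. This is a hedged illustration: the sample IDs and values are hypothetical, the use of the sample SD (n - 1 denominator) and the inclusive window boundary are assumptions, and the study's actual calculations may have differed in detail.

```python
import math
import statistics

WINDOW_LOG10 = 0.5  # acceptable range: overall GM +/- 0.5 log10


def log10_gm_and_sd(positive_results):
    """Overall GM (as a log10 value) and SD for one sample, computed from all
    positive quantitative results (copies/ml) reported for that sample.
    Assumption: sample SD with an n - 1 denominator."""
    logs = [math.log10(v) for v in positive_results]
    return statistics.mean(logs), statistics.stdev(logs)


def count_within_window(data_set, gm_by_sample):
    """Return (hits, total) for one data set.

    data_set: hypothetical mapping of sample ID -> reported positive result (copies/ml).
    gm_by_sample: mapping of sample ID -> overall GM in log10 copies/ml.
    Assumption: a result exactly 0.5 log10 from the GM counts as acceptable.
    """
    hits = sum(
        1
        for sample, value in data_set.items()
        if abs(math.log10(value) - gm_by_sample[sample]) <= WINDOW_LOG10
    )
    return hits, len(data_set)


def quantitative_rating(hits, total):
    """100% good, 80 to <100% adequate, 60 to <80% mediocre, <60% poor."""
    if total == 0:
        raise ValueError("the data set contains no positive results to score")
    if hits == total:
        return "good"
    if 5 * hits >= 4 * total:  # at least 80%
        return "adequate"
    if 5 * hits >= 3 * total:  # at least 60%
        return "mediocre"
    return "poor"


# Hypothetical usage.
gm, sd = log10_gm_and_sd([1_000, 10_000, 100_000])
hits, total = count_within_window({"s1": 20_000, "s2": 600_000}, {"s1": 4.0, "s2": 5.0})
print(f"GM = {gm:.2f} log10, SD = {sd:.2f} log10; {hits}/{total} within range:",
      quantitative_rating(hits, total))
```

Integer comparisons are used for the 80% and 60% cut-offs so that data sets lying exactly on a boundary are not misclassified by floating-point rounding.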
TABLE 1. Methods used to detect HCV RNAᵃ
Analysis of qualitative results. (i) Panel 1. A total of 80 qualitative data sets were available for analysis, 45 truly qualitative data sets and 35 derived from quantitative data sets. One of the two negative samples was reported positive in two data sets (Table 2), both produced with the quantitative Bayer bDNA test version 2.0 (viral loads of 238,000 and 257,000 copies/ml). The other negative sample was reported negative in all data sets. Consequently, 2 of 160 (1.3%) of all tests performed on negative samples were false-positive (0% of the qualitative tests and 2.9% of the quantitative tests). The four high-positive samples (1, 2, 4, and 8) were correctly reported positive in 97% of all data sets. The sample with a target level of 5,000 copies/ml (sample 7) was missed in 11 data sets obtained by six of the eight laboratories with the quantitative bDNA method, one of three quantitative in-house PCR assays, and 4 of 14 noncommercial qualitative methods. The weakly positive sample (target level of 200 copies/ml) was correctly reported positive in 45 of 80 data sets. The 35 data sets with negative results on this sample were obtained with qualitative Roche assays (3 of 28), other qualitative methods (9 of 17), and most of the quantitative test systems: 14 of 23 Roche Monitor assays, 7 of 8 bDNA assays, and 2 of 4 quantitative in-house PCR assays.
TABLE 2. Overall qualitative results
≤5 points.
TABLE 3. Performance scores for qualitative results
The three samples containing target levels of 5,000 copies/ml (2, 5, and 8) were correctly identified by 94.7%, 94.7%, and 92.0%, respectively, of the qualitative data sets and by 72.9%, 77.1%, and 72.9%, respectively, of the quantitative data sets. Most of the negative results reported for these three positive samples were obtained with the quantitative bDNA assay (32 of 33 tests), qualitative in-house nested PCRs (12 of 39), and the quantitative Roche Monitor assays (4 of 102). Comparison of samples 2, 5, and 8 shows little difference between the qualitative results for HCV genotypes 1 and 4. The weak-positive sample (target level of 200 copies/ml) was correctly reported positive in 59 of 123 data sets. The 64 data sets with negative results on this sample were obtained with qualitative Roche assays (5 of 50), other qualitative methods (16 of 23), and most of the quantitative test systems: 30 of 34 Roche Monitor assays, all bDNA assays (11 of 11), and all quantitative in-house PCR assays (2 of 2).
Table 3 shows that a total of 52 data sets (42.2%) obtained the maximum qualitative performance score of 8 points. These data sets included 49 of 75 (65%) of the data sets obtained with qualitative methods and 3 of 48 (6%) of the qualitative data sets derived from quantitative results. An additional 46 (37.3%) data sets had a score of 7 points, 8 data sets (6.5%) had a score of 6 points, and 17 data sets (13.8%) had a score of ≤5 points.
HCV genotyping. Although HCV genotyping was not requested in our proficiency study, three and four laboratories performed HCV genotyping in 1999 and 2000, respectively. Altogether, 21 typing results were produced for the five samples containing HCV genotype 1. All results indicated the presence of genotype 1. Similarly, the seven typing results performed on the samples containing HCV genotype 3 were also correct. However, the genotyping of the samples containing HCV genotypes 2 and 4 was not entirely correct. For both samples, one laboratory incorrectly identified the virus as HCV genotype 1. This laboratory used an in-house multiplex RT-PCR method.
Analysis of quantitative results. (i) Panel 1. Quantitative HCV data were reported in 35 data sets, mostly (91%) obtained with commercial kits. The viral loads reported for the positive samples are summarized in Fig. 1 and Table 4. For each sample, the overall GM (log10) and SD were calculated from the positive results obtained with all assays. The GMs for the different samples were all somewhat higher (0.17 to 0.65 log10) than the target levels, especially for two of the samples with 50,000 copies/ml (samples 2 and 4, neither of which was used in panel 2). Table 4 shows that the percentage of positive results within the acceptable range of GM ± 0.5 log10 varied from 63% to 97%. When, in addition, the GMs were calculated for different methods separately (23 Roche data sets, 8 Bayer bDNA data sets, and the remaining 4 data sets taken together), most of the method-specific GMs differed from one another by less than 0.5 log10. However, for the strongest positive sample (sample 1), relatively low viral loads were obtained with the Roche methods. In addition, the Bayer bDNA method gave consistently higher results (+0.62 to 1.63 log10) with the relatively weak positive sample 7 (genotype 1) and with HCV genotype 3 (sample 8) (data not shown). Figure 1 also shows that the coefficients of variation (CV) for the different samples varied from 4.9% for one of the high-positive samples to 26.3% for the lowest positive sample. For all samples tested, the CVs were smaller with the bDNA test (1.9 to 2.9%) than with the Roche assays (4.6 to 8.2%).
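The text reports CVs without defining them. One possible reading, given that the GMs and SDs are expressed in log10 units, is CV = 100 × SD / mean of the log10 values; the sketch below assumes that definition and uses hypothetical data only.

```python
import math
import statistics


def cv_percent_log10(results_copies_per_ml):
    """Coefficient of variation (%) of log10-transformed viral loads.

    Assumption: CV = 100 * SD / mean, both computed on log10 values with
    the sample SD (n - 1 denominator); the study does not define its CV.
    """
    logs = [math.log10(v) for v in results_copies_per_ml]
    return 100 * statistics.stdev(logs) / statistics.mean(logs)


# Hypothetical results with the same log10 spread (SD = 0.5) at two levels.
high_level_cv = cv_percent_log10([10 ** x for x in (5.0, 5.5, 6.0)])  # log10 mean 5.5
low_level_cv = cv_percent_log10([10 ** x for x in (2.0, 2.5, 3.0)])   # log10 mean 2.5
print(f"high level: {high_level_cv:.1f}%  low level: {low_level_cv:.1f}%")
```

Because the log10 mean appears in the denominator, the same log10 spread yields a larger CV at lower viral loads under this definition.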
FIG. 1. GM (log10), SD, and CV with various amplification methods for HCV RNA, panel 1.
TABLE 4. Summary of quantitative resultsᵃ
≥80%. These data sets included 83% of the data sets obtained with Roche assays, 25% of the data sets obtained with bDNA tests, and none of the data sets obtained with other methods. Consequently, 17% of the laboratories using Roche assays and the vast majority of all the other laboratories need to improve their performance of quantitative HCV RNA testing.
TABLE 5. Quantitative performances of HCV RNA data sets
TABLE 6. Interpanel reproducibility
≥80%. These data sets comprised 91% of the data sets obtained with Roche assays, including three of four data sets reporting in international units per milliliter, 36% of the data sets obtained with bDNA tests, and none of the data sets obtained with other methods. Consequently, 9% of the laboratories using Roche assays and the vast majority of all the other laboratories need to improve their performance of quantitative HCV RNA testing.
Reproducibility. Intrapanel reproducibility could be evaluated by comparison of the results for samples 5 and 8 of panel 2, both containing target levels of 5,000 copies of HCV genotype 1 per ml. Table 2 shows that the percentages of correct qualitative results for samples 5 and 8 were 95% and 92%, respectively, for the true qualitative methods and 77% and 73%, respectively, for the quantitative assays. Table 4 shows that the GMs (± SD) of the quantitative results obtained for samples 5 and 8 were 3.93 ± 0.50 log10 and 3.78 ± 0.40 log10, with 84% and 89% of the data in the range GM ± 0.50 log10, respectively.
Interpanel reproducibility could be evaluated from the results obtained with four positive samples included in both panels (Table 6). The results obtained with these samples on both occasions were similar, although the weak-positive sample (target level of 200 copies/ml) was detected by fewer laboratories in the panel 2 group (10.4%) than in the panel 1 group (34.3%) (P = 0.008). The percentages of quantitative results within ± 0.5 log10 of the GM were slightly higher in panel 2 for all four samples.
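The text gives P = 0.008 for the difference in detection of the weak-positive sample but does not name the statistical test. As a rough cross-check only, the sketch below assumes that the two percentages correspond to 12 of 35 (34.3%) and 5 of 48 (10.4%) data sets and applies an uncorrected Pearson chi-square test; both the counts and the choice of test are assumptions.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]].

    Returns (chi2, p) with 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Assumed counts per group: detected, not detected.
chi2, p = pearson_chi2_2x2(12, 35 - 12, 5, 48 - 5)
print(f"chi2 = {chi2:.2f}, P = {p:.3f}")
```

With these assumed counts the sketch gives a chi-square of about 7.08 and P of about 0.008, in line with the reported value; a continuity-corrected or exact test would give somewhat different P values.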
Comparison of laboratory performances on panel 1 and panel 2. Sixty laboratories submitted either qualitative data sets (n = 33) or quantitative data sets (n = 27) for both panels, and their scores on the two panels were compared. Poor qualitative scores were obtained by 5 of 33 (15.2%) and 0 of 33 (0%) of the qualitative data sets returned for panel 1 and panel 2, respectively (data not shown). Two laboratories, both changing from in-house methods with panel 1 to Roche methods with panel 2, improved their qualitative scores substantially, from -1 to 8 and from 3 to 8. In contrast, the percentage of poor qualitative scores obtained with the quantitative data sets increased from 11.1% (3 of 27) with panel 1 to 25.9% (7 of 27) with panel 2. Finally, the percentage of participants with poor quantitative scores increased slightly from 25.9% (7 of 27) with panel 1 to 33.3% (9 of 27) with panel 2.
The same approach was used for an early proficiency study on the detection of HBV DNA, where 39 laboratories submitted 43 data sets on 12 plasma samples (seven positive and five negative). In that study, 15 (35%) data sets showed false-positive results, 16 (37%) showed false-negative results, and 12 (28%) showed correct results for all samples (9). Similar problems with sensitivity and specificity were reported some years ago for the detection of Mycobacterium tuberculosis DNA (8). All these studies clearly demonstrated large numbers of laboratories with sensitivity and specificity problems. In none of these studies was nucleic acid quantitated.
Compared with the studies mentioned above, the present study on HCV RNA shows a much better specificity: only 2.5% of the 80 data sets for panel 1 and 1.6% of the 123 data sets for panel 2 showed false-positive results. Similar low false-positive rates have recently been found in the EU QCCA proficiency studies on HBV DNA (15), human immunodeficiency virus RNA (A. M. van Loon, J. Schirm, E. Valentine-Thon, J. Reid, P. E. Klapper, and G. M. Cleator, unpublished data), enterovirus RNA (16), and Chlamydia trachomatis DNA (R. P. Verkooyen, G. T. Noordhoek, P. E. Klapper, J. Reid, J. Schirm, G. M. Cleator, and G. Hoddevik, unpublished data). In contrast, a recent EU QCCA proficiency study on the detection of herpes simplex virus DNA (L. Schloss, P. Cinque, G. M. Cleator, J.-E. Echevarria, K. I. Falk, P. E. Klapper, J. Schirm, B. F. Vestergaard, H. G. M. Niesters, T. Popow-Kraupp, W. G. V. Quint, A. M. van Loon, and A. Linde, unpublished data) still showed relatively large numbers of data sets with false-positive results (8% in 1999 and 18% in 2000). This may have been related to high viral loads in some of the test samples, which may have caused contamination in the laboratories of some participants, and the fact that all herpes simplex virus DNA testing was still performed by in-house methods. The low false-positivity rates in the other recent studies probably reflect the greater expertise of the participating laboratories in addressing the contamination issue compared to several years ago, coupled with the availability of commercial kits.
For the identification of HCV-infected patients, qualitative HCV RNA detection methods which are as sensitive as possible should be used. For quantitative HCV RNA detection, used for monitoring the efficacy of antiviral therapy, precision and reproducibility are considered most important. Nevertheless, there is also an increasing demand for more sensitive quantitative HCV RNA tests. Unfortunately, the rates of false-negative results are difficult to determine and to compare between studies because of the large variation in the detection limits of the various assays used and the different viral loads of the samples used in the studies. Of the 80 data sets for panel 1, no less than 46% showed negative results on the (low) positive samples (24% of the true qualitative data sets and 73% of the quantitative data sets). Of the 123 data sets for panel 2, the percentage of negative results on (low) positive samples was 58% (35% of the true qualitative data sets and 83% of the quantitative data sets).
This increase in the negativity rate on (low) positive samples and the resulting decrease in overall performance were most probably related to the lower average viral loads in panel 2 combined with the relatively high detection limits of some of the quantitative assays used. The bDNA version 2.0 assay, for example, has a lower detection limit as high as 200,000 copies/ml, so that only the highest positive sample (target load of 500,000 copies/ml) should be positive. However, it is conceivable that, owing to run-to-run variation of the lower detection limit and the lack of standardization (see below), samples with target loads of 50,000 copies/ml can still be detected in the bDNA test. Indeed, 6 of 8 and 9 of 11 of the bDNA data sets on panel 1 and panel 2, respectively, detected all the samples with target levels of 50,000 copies/ml. Even in the samples with target levels of 5,000 copies/ml, HCV RNA was detected in 2 of 8 and 1 of 33 tests performed by users of the bDNA test. However, considering the high lower detection limit of the bDNA version 2.0 test, these three positive results might perhaps be considered false positives.
In the present study, no penalty points were given when positive samples with ≤5,000 copies/ml were reported negative. Therefore, a result of <200,000 copies/ml for a sample containing 5,000 copies/ml was not penalized, although in our opinion a lower detection limit of 200,000 copies/ml will not meet all the critical requirements of a modern diagnostic laboratory. Fortunately, this opinion is now also shared by industry, since most of the recently developed commercial test systems, including the HCV RNA bDNA version 3.0 test, have lower detection limits of ≤1,000 copies/ml.
The primary purpose of proficiency testing is to determine whether a laboratory is capable of providing reliable results, not whether it is able to carry out a particular (commercial) test adequately. Although, in principle, the type of method used is less relevant, results of proficiency testing may also yield useful information on the performance of the particular methods used. However, it should be noted that there are some important restrictions on the interpretation of such comparisons. This is due to the composition of the panel, the relatively small number of samples, and the small number of laboratories using some of the assays. The interpretation of our quantitative data was inevitably biased because the vast majority of the data sets were obtained with only one test system, the Roche (Cobas)Amplicor Monitor assays. Nevertheless, we consider our quantitative scoring system the most useful and practical approach at the moment, since in practice the most widely used assay will more or less serve as the de facto standard. It does underline, however, the strong need for standardization of all quantitative HCV RNA detection methods, for instance, with the recently introduced World Health Organization standard (2, 11). Interestingly, this World Health Organization standard contains HCV genotype 1 only. It remains to be seen whether standard preparations for other HCV genotypes will also be necessary.
In the present study, the bDNA version 2.0 method gave consistently higher quantitative results than the (Cobas)Amplicor Monitor assays. This is in accordance with earlier data (14) and can be explained by the lack of standardization (11). It was also reported that whereas the Roche Monitor version 2.0 and the bDNA version 2.0 tests showed equal sensitivity for the most common HCV genotypes (7), HCV genotypes other than genotype 1 had been underestimated by earlier versions of the Roche tests (7, 14). This may explain why, in our panel 1, the largest difference between the bDNA test and the Roche tests was found for the sample containing HCV genotype 3.
Notwithstanding the lack of standardization, the intrapanel and interpanel reproducibility of our samples was excellent, except for the detection rate of the weak-positive sample, which was much higher on panel 1 than on panel 2 (Table 6). We cannot explain this difference. Our results also showed that the CVs of the results obtained with the bDNA method were significantly smaller than those obtained with the other amplification methods. This is in concordance with studies on human immunodeficiency virus RNA indicating that signal amplification methods are less susceptible to variation than target amplification methods (13, 17; A. M. van Loon, J. Schirm, E. Valentine-Thon, J. Reid, P. E. Klapper, and G. M. Cleator, unpublished data).
Our study involved large and increasing numbers of participants (57 and 81 in panels 1 and 2, respectively) and data sets (80 and 123, respectively). The percentage of data sets with good or adequate qualitative scores decreased slightly from 84% in panel 1 to 80% in panel 2, which was probably due to the lower average viral loads in panel 2. In contrast, the percentage of data sets with good or adequate quantitative scores increased from 60% to 73%. This may be due to the more extensive use of later versions of commercial test systems. Laboratories using in-house methods often had poor results, and laboratories changing from in-house methods to commercial methods improved their performance.
In conclusion, our study indicates that considerable improvement of the molecular detection of HCV RNA has been achieved in recent years, particularly through the use of commercial assays. However, further standardization is still needed, and most laboratories should use more sensitive quantitative assays. Finally, the present study underlines that proficiency panels are important tools for monitoring the quality of diagnostic laboratory tests.
We thank all participating laboratories and reference laboratories (Roche Diagnostics and Bayer Diagnostics) for their contribution to the HCV RNA proficiency study. We thank Dirk S. Luijt for helping with evaluation of the data and Kees van Slochteren for statistical analyses.