Journal of Clinical Microbiology, June 2003, p. 2358-2366, Vol. 41, No. 6
0095-1137/03/$08.00+0 DOI: 10.1128/JCM.41.6.2358-2366.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
United States Military Academy, West Point, New York,1 Armed Forces Research Institute of Medical Sciences, Bangkok,2 Vector-Borne Disease Control Office 1, Phrabuddhabat, Saraburi, Thailand,4 Naval Medical Research Center Detachment, Lima,3 Hospital de Apoyo, Iquitos, Peru,7 Walter Reed Army Medical Center, Washington, D.C.,5 Walter Reed Army Institute of Research, Silver Spring, Maryland,6 The Toronto General Hospital and University of Toronto, Toronto, Ontario, Canada,10 Brooke Army Medical Center, San Antonio,8 Wilford Hall Medical Center, Lackland Air Force Base, Texas9
Received 6 January 2003/ Returned for modification 13 February 2003/ Accepted 17 March 2003
500/µl. The specificity for the exclusion of P. falciparum was 93%. For P. vivax, the overall sensitivity was 87% for the final 1999 prototype. The sensitivities calculated for different levels of P. vivax parasitemia were 99% for parasite densities of >5,000/µl, 92% for parasite densities of 1,001 to 5,000/µl, 94% for parasite densities of 501 to 1,000/µl, and 55% for parasite densities of 1 to 500/µl. The specificity for the exclusion of P. vivax was 87%. The areas under the receiver operating characteristic curves for the diagnostic performance of the assay for the detection of P. falciparum and P. vivax were 0.8907 and 0.8522, respectively. These findings indicate that assays for rapid diagnosis have the potential to enhance diagnostic capabilities in those instances in which skilled microscopy is not readily available.
The leading MRDD technology is the immunochromatographic strip (ICS) format, into which antigen-capture immunoassay methodologies are integrated. One of the earliest successful MRDDs, the ParaSight F test (Becton Dickinson Diagnostic Systems, Cockeysville, Md.) (32), used monoclonal-polyclonal antibodies directed against Plasmodium falciparum histidine-rich protein 2 (HRP-2) immobilized on a nitrocellulose strip. The ParaSight F test strip has been evaluated in a number of field studies and typically demonstrated clinically useful sensitivities and specificities (2, 4, 5, 7, 8, 10, 12, 15, 25; I. Traore, O. Koita, and O. Doumbo, Am. J. Trop. Med. Hyg., abstr. 502, p. 272, 1997). The initial capabilities of MRDDs were limited to the detection of P. falciparum, i.e., a single-species format. Second-generation MRDD prototypes have retained the ability to distinguish this most clinically significant cause of malaria, while they have added the capability to identify other Plasmodium species (16, 17, 28, 33, 34, 37). Such refinement offers an enhanced potential for clinical utility and commercial success.
Published field studies of MRDDs for the detection of multiple species are growing in number and vary widely in their designs and methodologies. These differences significantly hinder interstudy comparisons. Protocols are designed and executed with different objectives. The characteristics of the enrolled populations vary, often substantially. Many field studies fail to document rigorous standards for reference microscopy, a particularly disconcerting lapse, given the critical role of microscopy in determining true diagnostic outcomes. Study populations are generally limited in size, and studies with small populations lack the statistical power to discriminate significant differences in device performance. The latter consideration limits studies in their ability to correlate performance characteristics with defined levels of parasitemia. Furthermore, it is frequently unclear which prototype of a device has been evaluated in a study and under what controls the manufacturing process was regulated. The continual evolution of nonmicroscopic assays for the diagnosis of malaria has led to the availability of multiple test prototypes, often indistinguishable on the basis of packaging, labeling, or production date. Inconsistencies in the reported performances of devices bearing common trade names may be attributable, at least in part, to variances in manufacturing and modifications inherent in iterative product development. These issues of protocol design and control, variance in production quality, and prototype identity have major implications for interpretations of device performance.
This report presents data for three distinct ParaSight F+V prototype assays that detect both P. falciparum HRP-2 and a Plasmodium vivax-specific antigen in a single ICS assay. These prototypes were products of the manufacturer's internal research and development initiatives. The assays differed in systematic modifications to assay materials, reagent formulations, and device assembly. All three prototypes were manufactured under process-validated controls in a manner consistent with U.S. Food and Drug Administration (FDA) guidelines for the production of in vitro diagnostic devices. The study protocol was specifically designed to evaluate device performance with samples from large populations of symptomatic patients enrolled from geographically distinct sites and to use a rigorously defined and controlled standard for the performance of diagnostic microscopy.
38°C), (ii) history of fever over the past 72 h, or (iii) headaches. In 1998, patients providing a history of antimalarial drug therapy within the previous 2 weeks were excluded from the study. This exclusion criterion was dropped for enrollment in 1999. Consenting participants donated 2 to 4 ml of venous blood (anticoagulated with EDTA) at the time of enrollment. Three slides per participant, each containing a thick smear and a thin smear, were prepared from the initial blood specimen. One slide was provided to the clinic personnel for their clinical diagnostic use. Clinic personnel, not study personnel, were responsible for staining and interpreting the results for the slide. The study's investigators retained the other two slides for staining as described previously (10).
Device prototypes. Three different ParaSight F+V prototype devices (Becton Dickinson Diagnostic Systems) were evaluated over the course of the 2-year study. In the first year, a single prototype, referred to as FV98 in this report, was available for evaluation with 2,986 patient specimens (Peru, n = 836; Thailand, n = 2,150). Two additional prototypes were evaluated in the second year and are referred to as FV99-1 and FV99-2 in this report. The FV99-1 prototype was available at the outset of patient enrollment and data collection efforts in 1999 and was evaluated with 1,017 patient specimens (Peru, n = 351; Thailand, n = 666). The FV99-2 prototype became available later in the 1999 enrollment phase and was tested with a subsequent group of 870 patient specimens (Peru, n = 393; Thailand, n = 477). Assay procedures and interpretation of the results were performed in strict compliance with the instructions provided by the manufacturer. The ICS methodology used for the ParaSight F+V assay involved wicking of a whole-blood lysate onto a nitrocellulose strip and the resulting capture of malarial antigens by antibodies immobilized in the nitrocellulose matrix.
Immobilized antibodies were directed against recombinant P. falciparum HRP-2- and P. vivax-specific antigens. The captured antigens were subsequently visualized by the addition of liposome-encapsulated dye-conjugated secondary ligands. The dye detection system produced a pink test line for bound P. falciparum antigen and a blue test line for bound P. vivax antigen. The test lines were located approximately 5 mm apart on the assay strip to help with visual recognition and to aid in the interpretation of positive results. Each test strip also contained two internal process-control dotted lines that appeared as positive confirmation of procedure and reagent viabilities. The ParaSight F+V test had five possible outcomes on the basis of the presence or absence of test and control lines: negative for Plasmodium, positive for P. falciparum, positive for P. vivax, positive for P. falciparum and P. vivax, or uninterpretable (i.e., the assay failed to produce a valid positive or negative outcome). Any detectable test line, no matter how faint, was read as a positive result. In addition to being read as a positive result, visible test lines were also compared to a series of photographic standards provided by the manufacturer to determine the relative intensity of the positive test line. Line intensity was scored as 0 (no visible line, a negative test result), 0.25 (the faintest visible line), 0.5, 1, 2, 3, or 4 (the darkest, most intense line color).
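The reading rules described above can be illustrated with a minimal sketch. It assumes a reader records one intensity score per test line plus two flags for strip validity; the function name `classify_strip`, the `Outcome` enumeration, and the flag names are hypothetical and do not come from the manufacturer's instructions. The validity checks follow the definition of an uninterpretable result given in the quality assurance section (no control line, or a dark background that obscures the test lines).

```python
from enum import Enum

# Line-intensity scale quoted in the text; 0 means no visible line.
INTENSITY_SCALE = (0, 0.25, 0.5, 1, 2, 3, 4)


class Outcome(Enum):
    NEGATIVE = "negative for Plasmodium"
    PF = "positive for P. falciparum"
    PV = "positive for P. vivax"
    PF_AND_PV = "positive for P. falciparum and P. vivax"
    UNINTERPRETABLE = "uninterpretable"


def classify_strip(pf_intensity, pv_intensity, controls_visible=True,
                   background_obscured=False):
    """Map one strip reading to one of the five outcomes (hypothetical helper)."""
    # A missing control line or an obscuring background invalidates the strip.
    if not controls_visible or background_obscured:
        return Outcome.UNINTERPRETABLE
    for value in (pf_intensity, pv_intensity):
        if value not in INTENSITY_SCALE:
            raise ValueError(f"intensity {value!r} is not on the scoring scale")
    # Any visible line, however faint, counts as positive.
    pf_positive = pf_intensity > 0
    pv_positive = pv_intensity > 0
    if pf_positive and pv_positive:
        return Outcome.PF_AND_PV
    if pf_positive:
        return Outcome.PF
    if pv_positive:
        return Outcome.PV
    return Outcome.NEGATIVE
```

In this sketch the intensity score is kept separate from the positive/negative call, which mirrors the text: the score was recorded for later analysis, while any visible line was read as positive.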
Reference microscopy. Technicians serving as study microscopists were prospectively certified as skilled through documentation of training, experience, and past performance (instituted in 1998) and were further required to successfully pass a prestudy microscopy competency examination. For the microscopists to pass this prestudy test, they were required to correctly identify the presence or absence of parasitemia and species for a minimum of 14 of 15 well-characterized Giemsa-stained blood smears (instituted in 1999). One of the two slides of Giemsa-stained blood smears prepared at the time of patient enrollment was independently examined by two study microscopists, as described previously (10). The final diagnostic end point for all positive smears was a calculated level of species-specific parasitemia, expressed as the number of asexual parasites per microliter of whole blood. Each microscopist reviewed 200 oil-immersion high-power fields of a smear before interpreting it as parasite negative. Microscope optics were standardized at both field sites by using microscopes of identical models (model CH-2; Olympus America, Inc., Melville, N.Y.), with identical objectives (x100 DPlan 100 1.25 oil 160/0.17), and with equivalent eyepieces (x10 WHK 10x/20L or WHK 10x/20L-H).
Quality assurance and data integrity. (i) Blinding of results. Results from each phase of the study were blinded throughout the data collection and analysis period. Separate teams were organized to conduct patient enrollment, laboratory (MRDD and leukocyte [WBC]) testing, diagnostic microscopy, data entry, and data analysis. In all cases, the results of the ParaSight F+V tests were determined prior to the completion of diagnostic microscopy, and the technicians examining the stained smears were strictly blinded to the rapid test results.
(ii) Microscopy. In order to ensure a rigorous reference standard, our study team instituted minor changes in the control and validation of microscopy, based on observations from the first year of the study. In 1998, to yield a final microscopic interpretation, the two study microscopists needed to independently agree on three criteria: (i) on the presence or absence of asexual stages of Plasmodium; (ii) on the species of Plasmodium, when one was present; and (iii) on the calculated level of parasitemia within a factor of 2. The last criterion was refined by two modifications in 1999. The first change was that results were considered concordant if both study microscopists determined asexual parasitemia levels ≤100/µl, even if the differences between their counts were more than twofold. The second change required that even if counts differed by less than twofold, and so were otherwise concordant, they must also differ by ≤50,000/µl in absolute terms. The mean for the levels of parasitemia counted by the two microscopists was accepted as the true diagnostic outcome for all slides with concordant results. Results discordant by one or more of these criteria were discarded. Both study slides prepared from that sample were then examined by a third, senior microscopist in a blinded fashion. The cumulative results from both slides, as interpreted by the referee microscopist, were accepted as the true diagnostic outcome. In addition to discordant slide results, the referee microscopists also reviewed 5% of all concordantly interpreted slides for quality assurance purposes (instituted in 1999).
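The concordance rules for the two study microscopists can be sketched as follows. This is an illustration under stated assumptions, not the study's own procedure: each reading is represented as a hypothetical `(species, density)` pair with `species` set to `None` for a negative smear, the twofold criterion is treated as inclusive, and the 100/µl and 50,000/µl thresholds are also treated as inclusive because the comparison symbols are not recoverable with certainty from the text.

```python
def densities_agree(d1, d2, year, low_cutoff=100, max_abs_diff=50_000):
    """Criterion (iii): agreement of two parasite densities (per microliter)."""
    lo, hi = sorted((d1, d2))
    within_twofold = hi <= 2 * lo          # assumed inclusive
    if year == 1998:
        return within_twofold
    # 1999, first change: two low counts agree even if more than twofold apart.
    if hi <= low_cutoff:                   # assumed inclusive
        return True
    # 1999, second change: counts within twofold must also be close in
    # absolute terms.
    return within_twofold and (hi - lo) <= max_abs_diff


def reads_concordant(read1, read2, year):
    """Apply criteria (i) to (iii) to two (species, density) readings."""
    species1, density1 = read1
    species2, density2 = read2
    if (species1 is None) != (species2 is None):
        return False                       # (i) presence or absence
    if species1 is None:
        return True                        # both negative
    if species1 != species2:
        return False                       # (ii) species
    return densities_agree(density1, density2, year)


def final_density(read1, read2, year):
    """Mean of two concordant counts; None means refer to the referee."""
    if not reads_concordant(read1, read2, year):
        return None
    return (read1[1] + read2[1]) / 2
```

For example, readings of 40/µl and 95/µl for the same species are discordant under the 1998 rule (more than twofold apart) but concordant under the 1999 rule, whereas readings of 120,000/µl and 200,000/µl are concordant in 1998 and discordant in 1999.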
(iii) WBC determinations. The Thai study team determined WBC counts using a semiautomated cell counter (Coulter Ac-T 10; Beckman-Coulter, Inc., Fullerton, Calif.). A three-level commercial hematology control material (4C Plus; Beckman-Coulter, Inc.) was analyzed daily, and replicate testing was conducted to monitor the reproducibility between two identical analyzers. At the Iquitos, Peru, test station, a centrifugal hematology system (QBC II; Becton Dickinson Diagnostic Systems) was used to provide WBC counts. The Peruvian study team performed daily instrument calibration checks prior to specimen analysis to ascertain analyzer function within the parameters defined by the manufacturer. Manual WBC counts (UNOPETTE; Becton Dickinson Clinical Laboratory Solutions, Franklin Lakes, N.J.) were performed weekly at both study sites to maintain a backup method for WBC counts as well as to apply an external measure of analyzer function.
(iv) Assays with prototype MRDDs. Known positive and negative control materials were supplied by the device manufacturer and were used to verify assay performance on a daily basis. Positive control material consisted of a proprietary formulation known to react at both the P. falciparum and P. vivax test lines in a manner analogous to that for the targeted malarial antigens. The results for patient specimens were reported as negative, positive for P. falciparum, positive for P. vivax, positive for P. falciparum and P. vivax, or uninterpretable, according to the specific instructions provided in the test kits. The assay procedures and interpretation of the results were essentially identical for each of the prototypes. Tests with results reported as uninterpretable, i.e., failure to observe a control line or the presence of a darkly stained background that obscured the test lines, were repeated in an attempt to resolve the discrepant event. Repeatedly uninterpretable results were excluded from calculations of device performance. Investigators and technicians were trained and certified by the study's principal investigator prior to the initiation of testing of patient specimens and operated within a system of daily review and oversight. The ParaSight F+V prototype assays evaluated in this study were manufactured in a process compliant with present good manufacturing practices standards established by the FDA to govern the production of in vitro diagnostic devices.
(v) Protocol review and approval. The study protocol was reviewed by the Institutional Review Board, Walter Reed Army Institute of Research, and the Human Subjects Safety Review Board, U.S. Army Medical Research and Materiel Command, and was approved as Walter Reed Army Institute of Research Protocol 687 (version 2.21, 1998; version 2.3, 1999). The study protocol was approved annually by the Ministry of Health (MINSA) in Iquitos, Peru, and was performed under the direction of the Direccion de Salud de Loreto. Similarly, the study protocol was approved by the Thai Ministry of Public Health for each year of implementation and operated with oversight from Vector-Borne Disease Control Office No. 1, Phrabuddhabat, Thailand.
Data analysis. The "true" diagnostic outcome was defined as the final microscopic interpretation of peripheral blood smears, with microscopy performed as described above. These results were the basis for calculating device sensitivity and specificity. Standard definitions of sensitivity, which was equal to the number of true-positive results/(number of true-positive results + number of false-negative results), and specificity, which was equal to number of true-negative results/(number of true-negative results + number of false-positive results), were used. With regard to MRDD assays interpreted as positive for both P. falciparum and P. vivax for specimens subsequently found to contain a single species by microscopy, the results were treated as true positive for the species identified by microscopy and false positive for the species not identified by microscopy. Device sensitivity was also determined for each of several prospectively defined ranges of parasitemia: >0 to 500, 501 to 1,000, 1,001 to 5,000, and >5,000 parasites/µl. Receiver operating characteristic (ROC) curves were constructed and analyzed to determine the relative accuracy of each prototype at diagnostic thresholds corresponding to test line intensity for positive MRDD assay results (13, 43, 44). ROC curves plot the fraction of false-positive results of each prototype (1 minus the specificity; the ability to distinguish true-negative results from false-positive results) against the fraction of true-positive results (sensitivity; the ability to distinguish true-positive results from false-negative results) for each of the six possible positive line intensity values (0.25 through 4). This analysis displayed trade-offs between sensitivity and specificity that occurred as the result of incremental changes in positive test line intensity. 
The overall accuracy of each device was quantified with an area under the curve (AUC) value with the standard error and an estimate of its 95% confidence interval (CI95) (9, 38). Each prototype assay's performance for the detection of single species was further evaluated by comparison of the optimal diagnostic cutoff values. The optimal cutoff point was represented as the test line intensity with the maximum sum of sensitivity and specificity for each prototype. Diagnostic accuracy was interpreted by considering a combination of greater AUC values and lower diagnostic thresholds, i.e., the optimized relationship between test line intensity and a correct diagnostic result. Documentation was maintained to ensure that all study data, interpretations, and calculated values were traceable to original source material. Statistical significance was set at a P value <0.05, and AccuROC (version 2.5) software (Accumetric Corporation, Montreal, Quebec, Canada) was used to analyze ROC curve data.
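The definitions of sensitivity and specificity, the scoring of dual-positive device results, and the parasitemia strata can be sketched as below. The data layout is an assumption made for illustration: each specimen is a hypothetical tuple of (set of species called by the device, single species found by microscopy or `None`, parasite density per microliter), and the specimens listed are invented, not study data. Non-integer mean densities between strata are assigned to the next stratum up, which is also an assumption.

```python
from collections import Counter


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


def stratum(density):
    """Assign a positive density to one of the four prospectively defined ranges."""
    if density <= 0:
        raise ValueError("strata apply to parasite-positive specimens only")
    if density <= 500:
        return ">0-500"
    if density <= 1000:
        return "501-1,000"
    if density <= 5000:
        return "1,001-5,000"
    return ">5,000"


def score_specimen(device_species, microscopy_species, target):
    """Score one specimen for one target species.

    A dual-positive device result on a single-species specimen comes out as a
    true positive for the species seen by microscopy and a false positive for
    the other species, as described in the text.
    """
    device_positive = target in device_species
    truly_positive = microscopy_species == target
    if truly_positive:
        return "TP" if device_positive else "FN"
    return "FP" if device_positive else "TN"


def performance(specimens, target):
    """Return (sensitivity, specificity) for one target species."""
    counts = Counter(score_specimen(dev, truth, target)
                     for dev, truth, _density in specimens)
    return (sensitivity(counts["TP"], counts["FN"]),
            specificity(counts["TN"], counts["FP"]))


def sensitivity_by_stratum(specimens, target):
    """Sensitivity within each parasitemia range that contains specimens."""
    tally = {}
    for dev, truth, density in specimens:
        if truth != target:
            continue
        hits, total = tally.get(stratum(density), (0, 0))
        tally[stratum(density)] = (hits + (target in dev), total + 1)
    return {name: hits / total for name, (hits, total) in tally.items()}


# Invented specimens for illustration only.
specimens = [
    ({"Pf"}, "Pf", 12000),
    ({"Pf", "Pv"}, "Pf", 800),   # TP for P. falciparum, FP for P. vivax
    (set(), "Pf", 300),          # FN for P. falciparum
    ({"Pv"}, "Pv", 2500),
    (set(), None, 0),
    ({"Pf"}, None, 0),           # FP for P. falciparum
]
```

With these invented specimens, `performance(specimens, "Pf")` gives a sensitivity and a specificity of 2/3 each, and the dual-positive specimen lowers the P. vivax specificity to 0.8 without affecting P. falciparum sensitivity.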
Diagnostic microscopy. Microscopic examination of 4,873 peripheral blood smears showed that a total of 2,051 (42.1%) were positive for malaria, with the remaining 2,822 (57.9%) determined to be negative for asexual parasite stages (Table 1). Species representation varied slightly between enrollment years, with the largest variation noted in the number of P. falciparum-positive blood smears detected in the Peruvian cohort: 115 (13.8%) in 1998 and 63 (8.5%) in 1999. The only P. malariae-positive blood smears were found in the Thai cohort, and both of these were detected in participants enrolled in 1998. A total of 50 (1.0%) mixed infections with both P. falciparum and P. vivax were detected and were found in 31 (1.0%) specimens obtained in 1998 and 19 (1.0%) specimens obtained in 1999. Study microscopists examining blood smears reported interpretations concordant for all three microscopic criteria (presence of parasitemia, species, and parasite density within twofold) for 93.8% of smears in 1998 and for 95.7% of smears in 1999.
TABLE 1. Summary of microscopic findings
TABLE 2. Cross-tabulation of ParaSight F+V results for P. falciparum against microscopy results
The sensitivity of the FV98 prototype for detection of P. falciparum relative to the reference microscopy results was 98% (CI95, 97 to 100%) (Table 3). The FV99-1 and FV99-2 assays demonstrated overall sensitivities of 94% (CI95, 89 to 97%) and 98% (CI95, 94 to 100%), respectively. Both the FV99-1 and FV99-2 assays were 100% sensitive in detecting P. falciparum when the level of parasitemia exceeded 500/µl. There were no significant differences in device sensitivities between the Peruvian and Thai cohorts for any of the parasitemia ranges examined or among any of the three prototypes. Each of the three assays showed progressively decreasing sensitivities as the parasite concentrations decreased. The specificity of FV98 for exclusion of the presence of P. falciparum was 22% (CI95, 20 to 24%), with values of 83% (CI95, 80 to 86%) and 93% (CI95, 91 to 95%) for FV99-1 and FV99-2, respectively. The specificities of the FV98 prototype differed significantly at the Peru and Thai study sites.
TABLE 3. Performance characteristics of ParaSight F+V prototypes for detection of P. falciparum
TABLE 4. Cross-tabulation of ParaSight F+V results for P. vivax against microscopy results
TABLE 5. Performance characteristics of ParaSight F+V prototypes for detection of P. vivax
FIG. 1. ROC curves for diagnostic performance of ParaSight F+V prototype assays. (A) Prototype performance for detection of P. falciparum. (B) Prototype performance for detection of P. vivax.
TABLE 6. Diagnostic accuracy of ParaSight F+V prototypes determined by ROC analysis
The findings generated over the course of 2 years show the progressive refinement of the ParaSight F+V assay to the point, exemplified by prototype FV99-2, at which it provided an accurate means of detection of both P. falciparum and P. vivax in two distinct study cohorts, one in Asia and the other in South America. Compared to the reference microscopy results, the FV99-2 prototype device correctly identified asexual parasites in 98% of all P. falciparum-positive peripheral blood specimens and 87% of all P. vivax-positive peripheral blood specimens. These sensitivity values were accompanied by specificities of 93 and 87% for P. falciparum and P. vivax, respectively. Of further note, the FV99-2 device achieved 100% sensitivity with peripheral blood samples (n = 140) containing P. falciparum parasites at >500/µl. Such high sensitivity at a relatively low level of parasitemia suggests that, when the result of the initial test is negative in a patient suspected of having malaria, perhaps due to a very low level of parasitemia, testing of serially collected blood samples, perhaps at 12- to 24-h intervals, would yield a highly sensitive diagnosis. Such a testing strategy would mirror that already used by competent microscopists for serial examination of blood smears (14). This level of diagnostic performance exceeds that of clinical microscopy reported in a variety of settings and has the potential to significantly enhance diagnostic capabilities in those instances in which skilled microscopy is not readily available (1, 6, 21, 23, 26).
It is not entirely clear why performance differences were observed in some instances between the two study sites. Considerable rigor was exerted in standardizing the execution of the protocol between the two study sites and in validating the performance parameters. Confounding variables, such as tester-dependent bias or subtle differences in sample handling, may have played a role in the variation in the results obtained between the Thai and Peruvian study sites.
The P. falciparum portion of the ParaSight F+V prototypes was based on a predicate device, the ParaSight F test, which was manufactured and distributed for several years prior to the development of the ParaSight F+V format. As reported previously, the ParaSight F device demonstrated a high level of performance under study conditions similar to those used in the present study (10). The development of a multispecies prototype assay based on the predicate assay for a single species likely required refinements in materials, device construction, and manufacturing processes and optimization of reagent formulations. While these refinements are the proprietary property of the developer, it is clear that the incorporation of additional test lines into the assay matrix affected assay performance during the progressive development of the device. The presence of additional antibody-impregnated test and control lines on the nitrocellulose dipstick and the resulting impacts on sample flow and secondary ligand binding were factors in the observed performances of early prototypes. The precise modifications to the FV99-2 prototype that alleviated the deleterious effects of false-positive results because of low line intensities are not known to investigators in the field, but the findings of the present study demonstrate a marked improvement in the performance of this final prototype device.
It was difficult to reliably assess the ability of the ParaSight F+V prototypes to accurately discriminate true mixed infections. The lack of a precise method to determine the contribution of each species to the total parasitemia in a mixed infection and the small number of microscopically detected mixed infections in each cohort prompted the exclusion of mixed infections from calculations of device performance. A better reference method for identifying mixed infections and discriminating the proportional contribution of each individual species to the total parasitemia is necessary to fully assess MRDD assay performance in the detection of mixed infections. This is an area of investigation that may benefit from the use of a molecular diagnostic technology to augment skilled microscopy as a reference standard (24, 29-31, 35, 41, 42).
The data presented in Tables 2 and 4 demonstrate an increase in device sensitivities, i.e., the ability to detect Plasmodium when it was present, with increased parasite concentrations. It is not clear why a small number of specimens with parasitemias >5,000 parasites/µl tested falsely negative, although this observation is not without precedent in either HRP-2-based assays (4, 10, 15, 19, 22, 28, 39) or MRDD assays that exploit other malarial antigens (18, 19, 27). While the number of specimens with false-negative results and high levels of parasitemia was small (<1.0% for each prototype), it is an area of continuing concern and suggests the interaction of other factors that may influence test outcome for a small subpopulation of symptomatic patients.
ROC analysis provides insight into the performances of the prototypes by illustrating the relationship between line intensity and diagnostic accuracy. The AUCs calculated from the ROC curves in Fig. 1 are indicative of the accuracies of the devices across the full range of potential test line intensities. Initial assessment of AUCs suggested that the FV98 MRDD was the most accurate device for the detection of P. falciparum in the enrolled patient population. Closer scrutiny of maximized diagnostic cutoffs revealed that the FV98 assay was optimized at a line intensity of 1 and, consequently, experienced significant performance degradation at the 0.25 and 0.5 line intensities. The FV99-1 prototype experienced a similar pattern of performance optimized at an intensity cutoff of 1. Although the AUC for the FV99-2 prototype was lower than that for the FV98 prototype, the FV99-2 prototype achieved an optimal cutoff at a line intensity of 0.5, indicating an improvement in diagnostic performance in association with lower-intensity test lines. A similar observation was made with respect to test accuracy for the detection of P. vivax, with the FV99-2 assay exhibiting optimized performance at a line intensity cutoff of 0.25, the faintest line visible to test readers. An MRDD test system that can be accurately and reliably interpreted as positive with the presence of any visible test line, regardless of intensity, enjoys a decided advantage in a point-of-care diagnostic setting. Conversely, an assay that offers an equivocal outcome at low line intensities will erode end-user confidence. The analysis of optimized line intensity is not intended to suggest that rapid diagnostic assays be interpreted on the basis of line intensity. Rather, these data illustrate the trade-offs between sensitivity and specificity and demonstrate the manufacturer's success in optimizing performance across the full spectrum of assay reactivities.
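The relationship between line intensity cutoffs, AUC, and the optimal cutoff can be illustrated with a short sketch. It is not a reproduction of the AccuROC analysis: the AUC here is a simple trapezoidal estimate, the readings are invented, and the function names are hypothetical. Each reading pairs a test line intensity with the microscopy result for the target species, and a test counts as positive at a given cutoff when its intensity is at or above that cutoff.

```python
# The six possible positive line intensities, used as candidate cutoffs.
CUTOFFS = (0.25, 0.5, 1, 2, 3, 4)


def roc_points(readings):
    """Return (cutoff, sensitivity, specificity) for each intensity cutoff.

    readings: iterable of (intensity, truly_positive) pairs.
    """
    positives = [i for i, truth in readings if truth]
    negatives = [i for i, truth in readings if not truth]
    points = []
    for cutoff in CUTOFFS:
        sens = sum(i >= cutoff for i in positives) / len(positives)
        spec = sum(i < cutoff for i in negatives) / len(negatives)
        points.append((cutoff, sens, spec))
    return points


def auc(points):
    """Trapezoidal area under the curve of sensitivity vs. 1 - specificity."""
    # Anchor the curve at (0, 0) and (1, 1) before integrating.
    xy = sorted([(0.0, 0.0), (1.0, 1.0)]
                + [(1 - spec, sens) for _cutoff, sens, spec in points])
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(xy, xy[1:]))


def optimal_cutoff(points):
    """Cutoff with the maximum sum of sensitivity and specificity.

    Ties resolve to the lowest cutoff because points are in ascending order.
    """
    return max(points, key=lambda p: p[1] + p[2])[0]


# Invented readings for illustration only: (line intensity, microscopy positive).
readings = [
    (4, True), (2, True), (1, True), (0.25, True), (0, True),
    (0, False), (0, False), (0, False), (0.25, False), (0.5, False),
]
points = roc_points(readings)
```

In this invented data set, faint false-positive lines at 0.25 and 0.5 push the optimal cutoff up to an intensity of 1, which is the pattern the text describes for the FV98 and FV99-1 prototypes; a device whose faint lines are reliable would instead optimize at 0.25 or 0.5.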
The present study documents the performance of three distinct prototypes resulting from the iterative development and refinement of the ParaSight F+V assay. We have demonstrated the benefits of using large-scale, multisite field trials both to assess product refinement and to evaluate products. This investigation facilitated a critical assessment of the diagnostic performances of multiple generations of prototypes with specimens from a large population of symptomatic patients. The findings presented herein identify the ParaSight F+V test, in the FV99-2 format, as a mature MRDD assay capable of clinically acceptable diagnostic performance in a field setting. Notwithstanding the performances of the prototypes evaluated in this study, the manufacturer has no plans at present to complete the development and commercialization of the ParaSight F+V MRDD test.