Journal of Clinical Microbiology, October 2004, p. 4749-4758, Vol. 42, No. 10
0095-1137/04/$08.00+0 DOI: 10.1128/JCM.42.10.4749-4758.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Health Sciences Center, Louisiana State University, New Orleans, Louisiana,1 University of San Francisco, San Francisco,2 San Joaquin County Public Health Department, San Joaquin, California,4 University of Alabama at Birmingham, Birmingham, Alabama,3 Johns Hopkins University, Baltimore,5 National Institute of Allergy and Infectious Disease, National Institutes of Health, Bethesda, Maryland6
Received 11 October 2003/ Returned for modification 7 January 2004/ Accepted 5 July 2004
To address this problem, investigators applied an alternative target amplification assay to the putative false-positive results (discrepancy analysis). There was no alternative at the time, as tests with similar sensitivity were not available for use in a composite infected-patient definition along with culture. This approach later was shown to be biased towards overestimating both sensitivity and specificity (6). Though the extent of bias appeared to be minimal, the ensuing controversy resulted in a lack of acceptance of this approach for determining the performance characteristics of new NAATs (5, 10, 11, 12).
Now that multiple NAATs are available and cleared for clinical use by the Food and Drug Administration, it is possible to design protocols to assess the performance of newer NAATs for the detection of C. trachomatis and Neisseria gonorrhoeae without the use of culture. Johnson et al. have shown that a single NAAT substituted for culture significantly improved performance estimates of another NAAT (8). In this study the combined results of two NAATs were used to estimate the performance of a third NAAT. It is now recognized that the problem with this approach is that variation in the sensitivity and specificity of the comparator NAATs could significantly influence the performance estimates of the other test.
Recently a multicenter trial was carried out to determine the performance of the APTIMA Combo 2 (Combo 2) transcription-mediated amplification assay (Gen-Probe Incorporated, San Diego, Calif.) for detection of C. trachomatis and N. gonorrhoeae in endocervical swabs, male urethral swabs, and urine from both men and women (3). Both the Abbott LCx ligase chain reaction (Abbott Laboratories Inc., Abbott Park, Ill.) and the Roche Amplicor PCR (Roche Diagnostic Systems, Indianapolis, Ind.) assays were used to devise a comparator standard for C. trachomatis that did not include culture. Women were defined as infected if any two of four comparator test results (endocervical swab or urine by PCR or ligase chain reaction) were positive. For men, a PCR urethral swab was not obtained, so men were defined as infected if any two of the three comparator test results were positive. While these definitions seemed rational, they were not evidence based.
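The any-two-positive rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `is_infected`, the dictionary keys, and the two example patients are hypothetical and are not taken from the trial data.

```python
from typing import Mapping


def is_infected(comparator_results: Mapping[str, bool], min_positive: int = 2) -> bool:
    """Return True when at least `min_positive` comparator results are positive."""
    return sum(bool(r) for r in comparator_results.values()) >= min_positive


# Hypothetical woman with four comparator results (PCR and LCR on swab and urine).
woman = {"swab_pcr": True, "urine_pcr": False, "swab_lcr": True, "urine_lcr": False}
# Hypothetical man with three comparator results (no PCR urethral swab was collected).
man = {"urine_pcr": True, "swab_lcr": False, "urine_lcr": False}

print(is_infected(woman))  # two of four comparator results are positive
print(is_infected(man))    # only one of three comparator results is positive
```

The same function covers both sexes because the rule depends only on the count of positive comparator results, not on how many comparators were available.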
Since these data were derived from a large trial in which patients were tested with three different NAATs and two different specimens were tested in most cases, they provided a unique opportunity to look at the effect of varying the infected-patient definition on the performance estimates of a third NAAT. Therefore, we performed an analysis to better understand the use of NAATs as the infected-patient gold standard for measuring the performance of new C. trachomatis diagnostic assays. Additionally, the data provided an opportunity for a head-to-head comparison of the performance of Combo 2 with both Amplicor and LCx.
Male and female patients provided 25 ml of a first-catch urine. Three urethral swab specimens were obtained from males for the following assays: N. gonorrhoeae culture, Combo 2, and LCx. Swab specimens were collected before the first-catch urine specimen. Females provided four endocervical swab specimens for one N. gonorrhoeae culture and all three NAAT assays (Combo 2, LCx, and Amplicor). For women, the first-catch urine specimen was collected before the swab specimens. For men and women, the N. gonorrhoeae culture swab was collected first, and the collection order of the subsequent swabs was randomized.
Collection, storage, and transport of the GC culture swab followed site-specific protocols. All other specimens were collected, stored, and transported to the laboratory according to each assay manufacturer's instructions. Male swabs were not collected for Amplicor testing. Only the chlamydia data are analyzed here.
Main analysis. The effect of reducing the available NAATs used to define the infected patient from four tests to three tests to two tests was explored. The details of the definitions used are provided in Table 1. With these definitions, curves were constructed by plotting sensitivity on the y axis against 1 - specificity on the x axis. It should be noted that the resultant curves resemble receiver-operator curves but they are distinct. Receiver-operator curves are based on a single gold standard test and display the effect of changing the definition of positive for an evaluated test. What we have done in this study is different. Here, for each family of curves, we are using a single "evaluated" test which has a predetermined definition of positive to assess multiple different gold standard definitions. In a sense, our analysis is the opposite of a receiver-operator curve analysis. This distinction matters because the points on our curves (each representing a different infected-patient gold standard) are not equally accurate, and one of the points is likely to be more accurate than the others. However, it may not be possible to determine from our analysis which definition truly is the best.
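As a rough illustration of how one point on such a curve could be computed, the sketch below classifies each patient as infected when at least `min_positive` comparator results are positive and then tabulates the evaluated test against that definition. The at-least-k rule is an assumed simplification of the definitions in Table 1, and the eight patients are invented toy data, not results from the trial.

```python
from typing import List, Sequence, Tuple


def performance_point(
    evaluated: Sequence[bool],
    comparators: Sequence[Sequence[bool]],
    min_positive: int,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of the evaluated test.

    A patient counts as infected when at least `min_positive` of the
    comparator results are positive (an assumed form of the definition).
    """
    tp = fp = fn = tn = 0
    for test_positive, comparator_results in zip(evaluated, comparators):
        infected = sum(bool(r) for r in comparator_results) >= min_positive
        if infected and test_positive:
            tp += 1
        elif infected:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    # Guard against an empty infected or uninfected group.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Invented toy data: one evaluated result and three comparator results per patient.
evaluated = [True, True, True, True, False, False, False, False]
comparators = [
    (True, True, True), (True, True, False), (True, False, False), (False, False, False),
    (True, False, False), (False, False, False), (False, False, False), (False, False, False),
]

# One point per definition: x = 1 - specificity, y = sensitivity.
curve: List[Tuple[float, float]] = []
for k in (3, 2, 1):
    sens, spec = performance_point(evaluated, comparators, k)
    curve.append((1 - spec, sens))
    print(f"at least {k} of 3 positive: sensitivity={sens:.2f}, 1 - specificity={1 - spec:.2f}")
```

In this toy data set, relaxing the definition from three of three positive to one of three positive raises the specificity estimate and lowers the sensitivity estimate, even though the evaluated test results never change.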
TABLE 1. Definitions
FIG. 1. Amplicor female swab specimen performance curves. The legend to the right shows which specimens and assays were used as comparators for each curve. sam, swab by Amplicor; uam, urine by Amplicor; slc, swab by LCx; ulc, urine by LCx; scb, swab by Combo 2; ucb, urine by Combo 2. The individual points in each curve were determined as described in Table 1. The letters refer to the specific infected-patient definitions used to calculate each point on the curves as detailed in Table 1.
FIG. 6. Combo 2 female urine specimen performance curves. Each figure represents the evaluation of a single specimen by LCx, Amplicor or Combo 2. The legend to the right shows which specimens and assays were used as comparators for each curve. sam, swab by Amplicor; uam, urine by Amplicor; slc, swab by LCx; ulc, urine by LCx; scb, swab by Combo 2; ucb, urine by Combo 2. The individual points in each curve were determined as described in Table 1. The letters refer to the specific infected-patient definitions used to calculate each point on the curves as detailed in Table 1.
FIG. 7. Amplicor male urine specimen performance curves. The legend to the right shows which specimens and assays were used as comparators for each curve. sam, swab by Amplicor; uam, urine by Amplicor; slc, swab by LCx; ulc, urine by LCx; scb, swab by Combo 2; ucb, urine by Combo 2. The individual points in each curve were determined as described in Table 1. The letters refer to the specific infected-patient definitions used to calculate each point on the curves as detailed in Table 1.
Figures 1, 2, 3, 4, 5, and 6 show the families of curves generated by calculating sensitivity and specificity for each assay with a decreasing number of comparator assay results and/or different combinations of specimens to define the infected female patient. Each point on each curve represents a different definition of the infected patient (Table 1). There appear to be only four points on the four comparator curves for Combo 2 because the sensitivity and specificity for definitions c and d are exactly the same. Only one family of curves could be generated for males (Amplicor urine evaluation), since only two urethral swabs were obtained (Fig. 7). The results of this analysis closely matched those derived from the female data.
FIG. 2. Amplicor female urine specimen performance curves. The legend to the right shows which specimens and assays were used as comparators for each curve. sam, swab by Amplicor; uam, urine by Amplicor; slc, swab by LCx; ulc, urine by LCx; scb, swab by Combo 2; ucb, urine by Combo 2. The individual points in each curve were determined as described in Table 1. The letters refer to the specific infected-patient definitions used to calculate each point on the curves as detailed in Table 1.
FIG. 3. LCx female swab specimen performance curves. Each figure represents the evaluation of a single specimen by LCx, Amplicor, or Combo 2. The legend to the right shows which specimens and assays were used as comparators for each curve. sam, swab by Amplicor; uam, urine by Amplicor; slc, swab by LCx; ulc, urine by LCx; scb, swab by Combo 2; ucb, urine by Combo 2. The individual points in each curve were determined as described in Table 1. The letters refer to the specific infected-patient definitions used to calculate each point on the curves as detailed in Table 1.
FIG. 4. LCx female urine specimen performance curves. The legend to the right shows which specimens and assays were used as comparators for each curve. sam, swab by Amplicor; uam, urine by Amplicor; slc, swab by LCx; ulc, urine by LCx; scb, swab by Combo 2; ucb, urine by Combo 2. The individual points in each curve were determined as described in Table 1. The letters refer to the specific infected-patient definitions used to calculate each point on the curves as detailed in Table 1.
FIG. 5. Combo 2 female swab specimen performance curves. The legend to the right shows which specimens and assays were used as comparators for each curve. sam, swab by Amplicor; uam, urine by Amplicor; slc, swab by LCx; ulc, urine by LCx; scb, swab by Combo 2; ucb, urine by Combo 2. The individual points in each curve were determined as described in Table 1. The letters refer to the specific infected-patient definitions used to calculate each point on the curves as detailed in Table 1.
Interestingly, the any-two-positive-of-three-results definition (definition h) appears to perform as well as definitions b and c, which require four comparator assay results. However, there are two different ways of creating the infected-patient definitions with three comparators; one swab and two urine comparators or two swabs and one urine comparator could have been used. As can be seen in Fig. 8 and 9, this choice does have an effect on the sensitivity and specificity estimates. The highest combined estimates of swab sensitivity and specificity are derived by using two swab specimens and one urine specimen as comparators. Similarly, the highest combined estimates of urine performance are provided by two urine specimens and one swab specimen as comparators.
FIG. 8. Effect on performance estimates for swab specimens of varying the definition of the infected patient by changing the mix of urine and swab specimens used as comparators. All curves were constructed with the three-comparator definitions f, g, h, and i (see Table 1). Solid lines represent data based on two swabs and one urine comparator. Dashed lines represent data based on one swab and two urine comparators. Solid square, swab by Combo 2 versus swab by LCx, urine by LCx, and swab by Amplicor; open square, swab by Combo 2 versus swab by LCx, urine by LCx, and urine by Amplicor; solid circle, swab by LCx versus swab by Amplicor, urine by Amplicor, and swab by Combo 2; open circle, swab by LCx versus swab by Amplicor, urine by Amplicor, and urine by Combo 2; solid triangle, swab by Amplicor versus swab by LCx, urine by LCx, and swab by Combo 2; open triangle, swab by Amplicor versus swab by LCx, urine by LCx, and urine by Combo 2.
FIG. 9. Effect on performance estimates for urine specimens of varying the definition of the infected patient by changing the mix of urine and swab specimens used as comparators. All curves were constructed with the three-comparator definitions f, g, h, and i (see Table 1). Solid lines represent data based on two swabs and one urine comparator. Dashed lines represent data based on one swab and two urine comparators. Solid square, swab by Combo 2 versus swab by LCx, urine by LCx, and swab by Amplicor; open square, swab by Combo 2 versus swab by LCx, urine by LCx, and urine by Amplicor; solid circle, swab by LCx versus swab by Amplicor, urine by Amplicor, and swab by Combo 2; open circle, swab by LCx versus swab by Amplicor, urine by Amplicor, and urine by Combo 2; solid triangle, swab by Amplicor versus swab by LCx, urine by LCx, and swab by Combo 2; open triangle, swab by Amplicor versus swab by LCx, urine by LCx, and urine by Combo 2.
TABLE 3. Comparison of sensitivity and specificity of the Combo 2 and Amplicor assays for female and male urine specimens and female swab specimens
It is known that in some infected women, C. trachomatis can be found only in the endocervix, while in others it can be detected only in the urine specimen (9). Therefore, exclusive use of multiple swab specimens or multiple urine specimens could significantly bias performance estimates of a new test. The dilemma of what constitutes the best definition of a NAAT-based infected-patient gold standard then arises. If it is necessary to use both a swab and a urine specimen to define the infected-patient gold standard, is a single Food and Drug Administration-cleared NAAT adequate for these tests? If so, should it be required that both tests be positive, or is only one positive of the two adequate? If two specimens tested by only one NAAT are inadequate and more than one assay is to be used to define the infected-patient gold standard, is it necessary to test both urine and swab specimens by both assays? For males, the more urethral swabs required by a clinical protocol, the more difficult it is to recruit study subjects. Could an adequate male infected-patient gold standard be created by testing a urine sample by two different Food and Drug Administration-cleared NAATs plus a single urethral swab specimen tested by only one of the methods?
In this study we attempted to answer these questions by examining the effect of varying the number of comparator assays and specimen types used to define the infected patient. These results are summarized in Fig. 1 through 7. Theoretically, the more points available to construct such curves, the more reliable the results. Based on this consideration, the curves generated with four available comparator results would be considered the standard for comparison with curves that are constructed with fewer comparators. As can be seen from the figures, curves with only three comparator results closely approximated the curves with four comparator results. On the other hand, using only two comparator results to define the infected patient does not appear to be adequate. Requiring that two of two assays be positive (definitions j and i) biases results towards low specificity, while requiring that only one of two be positive (definitions k and m) has the opposite effect.
If three or four comparator results are used to formulate the infected-patient gold standard definition, requiring that all comparator results be positive (definitions a and f) biases performance estimates towards high sensitivity and low specificity. Requiring only one test to be positive (definitions e and i) has the opposite effect. An ideal infected-patient gold standard would result in estimates of 100% sensitivity and 100% specificity for a perfect C. trachomatis test. It follows that infected-patient gold standard definitions resulting in estimates that are nearest to the ideal might be the most accurate. With four comparator results, the points on the curves defined by infected-patient gold standard definitions b and c appear to be closest to meeting this criterion. Using three comparator results and defining the infected patient as any two positives of the three possible results (definition h) appears to provide estimates for both the sensitivity and specificity between those of definitions b and c. A three-component infected-patient gold standard would be less costly than a four-component infected-patient gold standard.
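One way to make "nearest to the ideal" concrete is to measure the straight-line distance from each estimate to the point of 100% sensitivity and 100% specificity. The text does not specify a distance measure, so the Euclidean distance used below is an assumption, and the three estimates are hypothetical numbers chosen only to show the calculation.

```python
import math
from typing import Dict, List, Tuple


def distance_to_ideal(sensitivity: float, specificity: float) -> float:
    """Euclidean distance from an estimate to (sensitivity=1, specificity=1)."""
    return math.hypot(1.0 - sensitivity, 1.0 - specificity)


# Hypothetical (sensitivity, specificity) estimates for three gold standard definitions.
estimates: Dict[str, Tuple[float, float]] = {
    "all comparators positive": (0.99, 0.90),
    "two or more comparators positive": (0.95, 0.98),
    "any one comparator positive": (0.85, 0.995),
}

# Rank the definitions from nearest to farthest from the ideal point.
ranked: List[str] = sorted(estimates, key=lambda name: distance_to_ideal(*estimates[name]))
for name in ranked:
    print(f"{name}: distance {distance_to_ideal(*estimates[name]):.3f}")
```

A different weighting of sensitivity against specificity would be an equally defensible choice and could change the ranking, which is one reason the comparison in the text is described as visual rather than statistical.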
If three comparator amplification assay results are adequate for defining the infected patient, is there a difference if two urine results and one swab result are used as opposed to one urine and two swab results? The curves in Fig. 8 and 9 suggest that there is. If swab specimens are being evaluated, using two swabs and one urine specimen as the comparators will result in higher combined sensitivity and specificity estimates than one swab and two urine comparators. Similarly, for evaluating urine specimens, two urine specimens and one swab specimen as the comparators result in higher combined sensitivity and specificity results than the converse.
Based on these observations, we recommend the following approach to evaluation of new diagnostic tests for C. trachomatis in women. Since multiple endocervical swabs are not difficult to obtain, a swab specimen result for the evaluated assay in women can be compared to one urine and two swab results with two different Food and Drug Administration-cleared NAATs. The evaluated test's urine result would be compared to one swab and two urine results. Any two positive results out of the possible three comparator results would define the infected-patient gold standard (definition h). Based on our analysis, this algorithm appears to provide estimates for a new diagnostic test's performance with both female swab and urine specimens that are as good as or better than those of any other combination of assays and specimens.
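The recommended scheme for women can be sketched as follows, assuming two comparator NAATs that are each run on both a swab and a urine specimen. The assay labels `naat_a` and `naat_b`, the function name, and the example patient are hypothetical. Which assay supplies the single result on the other specimen type is not specified in the text, so it is left as a parameter.

```python
from typing import Dict, Tuple

# Keys are (specimen, assay); values are True for a positive result.
ComparatorResults = Dict[Tuple[str, str], bool]


def infected_two_of_three(
    results: ComparatorResults,
    evaluated_specimen: str,
    single_assay: str,
) -> bool:
    """Apply an any-two-of-three rule matched to the specimen being evaluated.

    Uses both comparator results on the evaluated specimen type plus one
    result on the other specimen type, taken from `single_assay`.
    """
    if evaluated_specimen not in ("swab", "urine"):
        raise ValueError("evaluated_specimen must be 'swab' or 'urine'")
    other_specimen = "urine" if evaluated_specimen == "swab" else "swab"
    # Two results on the same specimen type as the evaluated test, one per assay.
    same_type = [pos for (specimen, _), pos in results.items() if specimen == evaluated_specimen]
    if len(same_type) != 2:
        raise ValueError("expected exactly two comparator results for the evaluated specimen type")
    # One result on the other specimen type, from the chosen comparator assay.
    chosen = same_type + [results[(other_specimen, single_assay)]]
    return sum(chosen) >= 2


# Hypothetical patient: both swabs positive, both urines negative.
patient: ComparatorResults = {
    ("swab", "naat_a"): True,
    ("swab", "naat_b"): True,
    ("urine", "naat_a"): False,
    ("urine", "naat_b"): False,
}

print(infected_two_of_three(patient, "swab", "naat_a"))   # two swabs + one urine
print(infected_two_of_three(patient, "urine", "naat_a"))  # two urines + one swab
```

Here the same hypothetical patient counts as infected for the swab evaluation but not for the urine evaluation, because each evaluation draws on a different set of three comparator results.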
For male studies the infected-patient gold standard could also be defined by three comparators, including a swab and urine run by one Food and Drug Administration-cleared NAAT and urine run by another. This strategy would result in optimal performance estimates for new urine tests. However, our data indicate that the swab performance estimates with this approach will be slightly lower than they would be if one urine and two swab specimens were used to formulate the infected male patient definition. This is a reasonable trade-off given the fact that it is difficult to obtain more than two urethral swab specimens from men for the purposes of a clinical trial.
Of course these recommendations are not based on a rigorous statistical analysis of the data. Given the novelty of our analysis, there do not appear to be well-established mathematical approaches to the data. It has been suggested that the latent class model approach could be applied, but this is relatively new, and it is not clear how it would be applied to our data or whether the end result would lead to conclusions that would be any more acceptable than the opinions offered above. Our data are available to anyone with an interest in developing such analytic approaches. In the meantime it is our hope that the graphic presentation of the data shown here will enable anyone with an interest in developing new diagnostic tests for C. trachomatis, other sexually transmitted diseases, and possibly infectious diseases in general to gain a sense of how differences in NAAT-based infected-patient gold standard definitions affect sensitivity and specificity estimates.
Comparison of the performance curves for the Combo 2 assay to those for the Amplicor and LCx assays in Fig. 8 and 9 suggested that Combo 2 might be more sensitive but less specific than these other two tests. Direct comparisons of Combo 2 and LCx were done with the two Amplicor results. Similarly, Combo 2 was compared to the Amplicor assay with the LCx results. While the estimates derived from these comparisons suffer from bias towards lower sensitivity and higher specificity, as discussed above, relative performance comparisons between any two assays with a third assay remain valid. On this basis, the APTIMA Combo 2 does appear to be a more sensitive test. The lower specificity may reflect the greater sensitivity of the assay (the infected-patient definition is missing some truly infected cases) or could reflect more false-positive results. Testing of specimens that appeared false positive by Combo 2 in a transcription-mediated amplification assay that targets alternative nucleic acid sequences suggests that the former is the case (3). This suggests that the true specificity of APTIMA Combo 2 is higher than that reflected by the analyses shown here.
TABLE 2. Comparison of sensitivity and specificity of the Combo 2 and LCx assays for female urine and swab specimens