Diagnostic Performance of a Molecular Test versus Clinician Assessment of Vaginitis

ABSTRACT Vaginitis is a common complaint, diagnosed either empirically or using Amsel's criteria and wet mount microscopy. This study sought to determine the performance characteristics of an investigational test (a molecular test for vaginitis), compared to reference methods, for detection of bacterial vaginosis, Candida spp., and Trichomonas vaginalis. Vaginal specimens from a cross-sectional study were obtained from 1,740 women (≥18 years old) with vaginitis symptoms during routine clinic visits (across 10 sites in the United States). Specimens were analyzed using a commercial PCR/fluorogenic probe-based investigational test that detects bacterial vaginosis, Candida spp., and Trichomonas vaginalis. Clinician diagnosis and in-clinic testing (Amsel's test, potassium hydroxide preparation, and wet mount) were also employed to detect the three vaginitis causes. All testing methods were compared to the respective reference methods (Nugent Gram stain for bacterial vaginosis, detection of the Candida gene its2, and Trichomonas vaginalis culture). The investigational test resulted in significantly higher sensitivity and negative predictive value than clinician diagnosis or in-clinic testing. In addition, the investigational test showed a significantly higher overall percent agreement with each of the three reference methods than did clinician diagnosis or in-clinic testing. The investigational test also showed significantly higher sensitivity than did clinician diagnosis for detecting vaginitis involving more than one cause. Taken together, these results suggest that a molecular investigational test can facilitate accurate detection of vaginitis.

In addition, the majority of real-world diagnoses are empirical, and fewer than half of all treatments are based on objective assays (8), which can result in incorrect diagnosis and treatment (9). Molecular assays that target bacterial vaginosis, Candida spp., and Trichomonas vaginalis have the potential to improve diagnostic accuracy and reduce time to result compared to culture (10). This may be especially important for bacterial vaginosis, which involves multiple organisms of the vaginal microbiota (11).
The Food and Drug Administration-approved BD MAX vaginal panel (investigational test; Becton, Dickinson and Company, BD Life Sciences-Diagnostic Systems), using the BD MAX system, involves amplification-based DNA detection for all three common causes of vaginitis. This article provides results of additional analysis from a research study that was previously described by Gaydos et al. (12), in which sensitivity and specificity of at least 90% and 85%, respectively, were reported for bacterial, fungal, and protozoan causes. The work by Gaydos et al. compared the investigational test to the reference methods for diagnostic performance in detection of vaginitis causes and included performance for both clinician-collected and self-collected samples. In contrast, this study compared the clinician-collected investigational test, in-clinic testing, and clinician diagnosis to reference methods defined as Nugent score for bacterial vaginosis and culture for both Candida spp. (followed by bidirectional sequencing) and Trichomonas vaginalis.

MATERIALS AND METHODS
Study design. The STARD statement was used to ensure accurate reporting in this article (13). The study design was a diagnostic accuracy, cross-sectional study that has been previously described (12). All eligible subjects were recruited consecutively between May and September 2015 if they reported symptoms of vaginitis (at least one of the following symptoms: abnormal vaginal discharge; painful or frequent urination; vaginal itching, burning, or irritation; painful or uncomfortable intercourse; and vaginal odor) during routine clinic visits. Institutional review board approval was obtained by all 10 participating centers, which were either academic medical center clinics or community clinics. Only specimens meeting study inclusion criteria were included in analyses (Fig. 1).
Data collection. The following vaginal swabs were obtained by a predetermined, rotating order of collection: one investigational test swab (BD MAX specimen collection swab; Becton, Dickinson and Company, BD Life Sciences-Diagnostic Systems; Sparks, MD), one cotton swab each for wet mount and Trichomonas vaginalis culture (Puritan Medical Products, Guilford, ME), and one BD liquid Amies elution swab collection and transport system (Becton, Dickinson and Company, BD Life Sciences-Diagnostic Systems) for Nugent scoring and Candida culture.
Data collection for this study was planned prior to performance of the index and reference tests. Results indicating test positivity for the reference methods were prespecified and were based on the presence or absence of vaginitis causes determined by the three assays described below. As in the parent study (12), the reference method for bacterial vaginosis was Nugent scoring (4), the accepted gold standard, with scores of 0 to 3 indicating normal flora and scores of 7 to 10 indicating bacterial vaginosis. For these analyses, only positive or negative scoring was considered; intermediate scores were not considered because the investigational test reports no corresponding intermediate result. Therefore, Amsel's criteria, used to resolve intermediate Nugent scores (4 to 6) (12), were not analyzed. For vulvovaginal candidiasis (all Candida spp. were combined), the reference method was culture, the current clinical standard for diagnosis, followed by bidirectional sequencing of the its2 gene (from purified isolates). The InPouch TV culture system (Biomed Diagnostics, Inc.; White City, OR), incubated for 5 to 7 days (the latest recommended incubation time), was used as the reference method for trichomoniasis (14,15).
Investigational test. The investigational test swab was stored in buffer during specimen transport to the laboratory. The investigational test (BD MAX vaginal panel; Becton, Dickinson and Company, BD Life Sciences-Diagnostic Systems) is a molecular test and was performed with the BD MAX system (Becton, Dickinson and Company, BD Life Sciences-Diagnostic Systems). The assay uses real-time PCR for amplification of specific DNA targets, followed by fluorogenic, target-specific probes to differentially detect bacterial vaginosis markers, including Lactobacillus spp. (Lactobacillus crispatus and L. jensenii), Gardnerella vaginalis, Atopobium vaginae, bacterial vaginosis-associated bacterium 2 (BVAB-2), and Megasphaera 1; Candida group (Candida albicans, C. tropicalis, C. parapsilosis, and C. dubliniensis), C. glabrata, and C. krusei; and Trichomonas vaginalis. For the purposes of this analysis, Candida group, C. glabrata, and C. krusei were combined (Candida spp.). Specimen turnaround time was approximately 3 h (including upfront processing time of less than 20 min) from initiation of testing to result. For bacterial vaginosis, the proprietary algorithm determined a positive or negative status based on the presence and concentration of each of the markers mentioned. For Candida spp. and Trichomonas vaginalis, test positivity was prespecified and determined by the presence or absence of target DNA (results reported as positive or negative).
In-clinic tests. As described previously (12), in-clinic tests were performed for bacterial vaginosis, Candida spp., and Trichomonas vaginalis. For bacterial vaginosis, Amsel's criteria were used (vaginal pH > 4.5; clue cells seen on wet mount microscopy; "whiff test"; and thin, homogeneous, grayish, or off-white vaginal discharge) (3). On wet mount microscopy, which was read by physicians or nurse practitioners (depending on the site), visualization of pseudohyphae or budding yeast was considered positive for vulvovaginal candidiasis. Visualization of motile trichomonads on wet mount microscopy was used to identify Trichomonas vaginalis. Further details of these methods have been described previously (12).
Clinician diagnosis. Overall, clinician diagnosis was based on clinical assessment of subject history, signs, and symptoms and was recorded in case report forms; findings from in-clinic testing (all in-clinic tests were utilized for the diagnosis) were also utilized.
Clinician investigators performed all the in-clinic tests and provided their diagnosis before obtaining any results from the reference methods or the investigational test. Results of the investigational test and reference methods were unknown, respective to each other, prior to their completion. The full study protocol may be accessed by contacting the corresponding author.
Statistics. Sensitivity, specificity, overall percent agreement (OPA), positive predictive value (PPV), and negative predictive value (NPV) were calculated according to standard equations. The confidence intervals (CI) were calculated using the Wilson score method (16). For the investigational test and in-clinic testing, within the 1,677 eligible specimens, results that were not compliant for bacterial vaginosis (6.1%), Candida spp. (3.6%), and Trichomonas vaginalis (4.3%), or that were indeterminate/failed for bacterial vaginosis (1.1%), Candida spp. (0.1%), and Trichomonas vaginalis (0.1%), were excluded from data analysis (Fig. 1). Logistic modeling was performed to determine whether sensitivity and specificity were statistically different between the investigational test and either in-clinic testing or clinician diagnosis, relative to the reference method. A significance level of 0.05 was used. The statistical difference for OPA values was determined using Cohen's kappa coefficient. The Wald confidence intervals are provided for the kappa statistic (17). A kappa statistic of > 0.90 indicates almost perfect agreement, 0.80 to 0.90 indicates strong agreement, 0.60 to 0.79 indicates moderate agreement, 0.40 to 0.59 indicates weak agreement, 0.21 to 0.39 indicates minimal agreement, and 0 to 0.20 indicates no agreement between the two populations being studied (beyond chance) (18). Test accuracy was determined by the formula (prevalence of vaginitis cause × test sensitivity) + ([1 − prevalence of vaginitis cause] × test specificity) (19). Sample size for this study was based on determinations described by Gaydos et al. (12).
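The "standard equations" named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names and the 2×2 counts in the usage lines are hypothetical, and the interval uses the textbook Wilson score formula at a 95% level.

```python
import math

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (95% level by default)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def diagnostic_metrics(tp, fp, fn, tn):
    """Point estimates from a 2x2 table of index test versus reference method."""
    n = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "opa": (tp + tn) / n,  # overall percent agreement, as a fraction
    }

def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: agreement between test and reference beyond chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of the 2x2 table.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
counts = dict(tp=90, fp=10, fn=10, tn=90)
metrics = diagnostic_metrics(**counts)
kappa = cohen_kappa(**counts)
sens_ci = wilson_interval(counts["tp"], counts["tp"] + counts["fn"])
print(metrics, round(kappa, 2), [round(x, 3) for x in sens_ci])
```

The Wilson interval is applied here to sensitivity only; the same call works for any of the proportions by passing the matching numerator and denominator.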

RESULTS
A total of 1,763 subjects were enrolled in the study, with 1,740 subjects completing study procedures (Fig. 1). Reasons for exclusion of 23 subjects included subject withdrawal (13), incorrect informed-consent process (7), enrollment of asymptomatic subject (2), and previous enrollment of subject in this study (1). Demographic data were described previously (12). Of the 1,740 subjects completing study procedures, 1,677 had evaluable specimens for at least one cause of vaginitis. For bacterial vaginosis, 1,338 subjects had a Nugent score of 0 to 3 or 7 to 10 (reference method). Results for all four of Amsel's criteria were available for 1,301 subjects. For Candida spp., 1,613 subjects had an evaluable result with the reference method. Potassium hydroxide preparation results were also available for 1,598 of these subjects. For Trichomonas vaginalis, 1,600 subjects had culture (reference method) and wet mount results that were compliant and reportable. The age range for subjects in this study was 18 to 81 years (see Table S1 in the supplemental material).
Performance of the investigational test, Amsel's criteria (modified or original), and clinician diagnosis for detection of bacterial vaginosis was assessed by comparing their results to Nugent scores 0 to 3 and 7 to 10 (Table 1 and Fig. 2). The investigational test had a significantly higher sensitivity than the original Amsel's test (92.7% for the investigational test versus 75.6% for the original Amsel's test; P < 0.0001), with specificities of 91.5% and 94.1%, respectively. The modified Amsel's test (2/3), which omits discharge as a diagnostic criterion, had the highest sensitivity of all in-clinic tests but had a lower sensitivity (82.0%; P < 0.0001) than the investigational test; the modified Amsel's test (2/3) had a specificity (90.6%) that was similar to that of the investigational test. The investigational test had an OPA of 92.2% and a kappa value of 0.84. The original Amsel's test had a lower OPA of 83.3% (P < 0.0001) and a kappa value of 0.67; the modified Amsel's test (2/3; no discharge) also had a lower OPA (85.6%; P < 0.0001) than the investigational test and a kappa value of 0.71. The sensitivity of clinician diagnosis for bacterial vaginosis (77.3%) was significantly lower than that of the investigational test (P < 0.0001), whereas the specificities were similar (92.3% for clinician diagnosis). Clinician diagnosis had a lower OPA (83.6%; P < 0.0001) and kappa value (0.67) than did the investigational test (Table 1).
Consistent with the relatively high sensitivity for the investigational test, the NPV for the investigational test was 90.1%, which was higher than those of the original Amsel's test (73.5%; P < 0.0001), the modified Amsel's test (2/3; no discharge) (78.4%; P < 0.0001), and clinician diagnosis (74.5%; P < 0.0001) (Table 1). The PPV of the investigational test was higher than those of the other two methods, but no statistically significant difference was found. The prevalence of bacterial vaginosis in this study was 58%. Figure S1A contains likelihood ratios for comparison of PPV and NPV for the investigational test versus clinician diagnosis and in-clinic testing for bacterial vaginosis. Figure 3A shows the change in accuracy of the investigational test, clinician diagnosis, and in-clinic testing as disease prevalence increases (from 0% to 100%). Clinician diagnosis and in-clinic testing show a decrease in accuracy with increasing prevalence, whereas the accuracy for the investigational test remains relatively constant. At very low disease prevalence, clinician diagnosis has a relatively high accuracy, which falls as prevalence values exceed 5%.
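The prevalence dependence described above can be recomputed from the point estimates reported for bacterial vaginosis. The following Python sketch is illustrative only: it assumes sensitivity and specificity stay fixed across prevalence, it uses the rounded values quoted in the text, and the function names are hypothetical. At the study prevalence of 58%, the recomputed accuracy for the investigational test is about 92.2% and that for clinician diagnosis is about 83.6%, in line with the reported OPA values.

```python
def accuracy(prevalence, sensitivity, specificity):
    """Accuracy = prevalence x sensitivity + (1 - prevalence) x specificity."""
    return prevalence * sensitivity + (1 - prevalence) * specificity

def predictive_values(prevalence, sensitivity, specificity):
    """PPV and NPV implied by sensitivity, specificity, and prevalence."""
    tp = prevalence * sensitivity
    fn = prevalence * (1 - sensitivity)
    tn = (1 - prevalence) * specificity
    fp = (1 - prevalence) * (1 - specificity)
    return tp / (tp + fp), tn / (tn + fn)

# (sensitivity, specificity) for bacterial vaginosis, rounded as reported in the text.
BV_TESTS = {
    "investigational test": (0.927, 0.915),
    "original Amsel's test": (0.756, 0.941),
    "clinician diagnosis": (0.773, 0.923),
}

for name, (se, sp) in BV_TESTS.items():
    for prev in (0.05, 0.58, 0.95):
        ppv, npv = predictive_values(prev, se, sp)
        print(f"{name:22s} prev={prev:.2f} "
              f"accuracy={accuracy(prev, se, sp):.3f} ppv={ppv:.3f} npv={npv:.3f}")
```

Because the inputs are rounded, recomputed predictive values can differ from the tabulated ones in the last digit.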
Performance of potassium hydroxide preparation (in-clinic test), clinician diagnosis, and the investigational test for detection of Candida spp. was assessed against the reference method (Table 2 and Fig. 2). The investigational test had a higher sensitivity (90.7% versus 57.5%, respectively; P < 0.0001) and a higher specificity (93.6% versus 89.4%, respectively; P < 0.0005) than did potassium hydroxide preparation. The sensitivity (56.8%; P < 0.0001) and specificity (89.2%; P = 0.0002) of clinician diagnosis for Candida spp. were significantly lower than those of the investigational test. The investigational test had a higher OPA (92.7%) than both potassium hydroxide preparation (79.0%; P < 0.0001) and clinician diagnosis (78.7%; P < 0.0001); the investigational test also had a higher corresponding kappa value (0.84) than did both potassium hydroxide preparation (0.50) and clinician diagnosis (0.49). As shown in Table 2 and in Fig. S1B, the PPV and NPV of the investigational test (87.2% and 95.5%, respectively) were significantly greater than those for potassium hydroxide preparation (72.2% [P < 0.0001] and 81.4% [P < 0.0001], respectively) and clinician diagnosis (71.5% [P < 0.0001] and 81.1% [P < 0.0001], respectively). Potassium hydroxide preparation and clinician diagnosis accuracy both decreased with increasing prevalence, whereas the accuracy of the investigational test remained high (Fig. 3B). The population prevalence of Candida spp. in the study was 32%.
Table 3 shows the comparative performance of wet mount (in-clinic test), clinician diagnosis, and the investigational test for detection of Trichomonas vaginalis compared to that of culture (reference method). The investigational test had a sensitivity of 96.7% for Trichomonas vaginalis, which was significantly greater than that for wet mount (69.7%; P < 0.0001) and clinician diagnosis (68.9%; P < 0.0001 [Fig. 2]), whereas no statistical difference was found for the specificity of the investigational test (99.1%) versus wet mount (99.5%; P = 0.1336) or clinician diagnosis (99.1%; P = 0.8273). The investigational test had a significantly greater OPA (98.9%, versus 97.2% and 96.8%, respectively; P < 0.0001 for both comparisons) and a higher kappa value (0.92 versus 0.78 and 0.75, respectively) than wet mount and clinician diagnosis.
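To read the reported kappa values against the agreement bands given in Materials and Methods, a small lookup such as the following Python sketch can help. It is a hypothetical helper, not part of the study's analysis; it assumes that values falling between the published band edges (for example, 0.795) belong to the lower band and that negative values count as no agreement.

```python
def interpret_kappa(kappa):
    """Map a kappa value to the agreement bands quoted in Materials and Methods."""
    if kappa > 0.90:
        return "almost perfect"
    if kappa >= 0.80:
        return "strong"
    if kappa >= 0.60:
        return "moderate"
    if kappa >= 0.40:
        return "weak"
    if kappa > 0.20:
        return "minimal"
    return "none"  # assumption: 0.20 or below, including negative values

# Kappa values reported in the text for Trichomonas vaginalis.
for label, value in [("investigational test", 0.92),
                     ("wet mount", 0.78),
                     ("clinician diagnosis", 0.75)]:
    print(f"{label}: kappa={value:.2f} -> {interpret_kappa(value)} agreement")
```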
As shown in Table 3 and in Fig. S1C, the PPV for the investigational test was 89.4%, compared to 86.6% for clinician diagnosis and 91.4% for wet mount. The NPV for the investigational test (99.7%) was greater than for wet mount (97.5%; P < 0.0001) and clinician diagnosis (97.5%; P < 0.0001). Wet mount and clinician diagnosis accuracy both decreased with increasing prevalence, whereas the investigational test accuracy remained relatively high and constant over increasing prevalence (Fig. 3C). The population prevalence of Trichomonas vaginalis in the study was 8%.
Table 4 shows the percentages of coinfection cases detected by the investigational test and by clinician diagnosis. The investigational test had greater sensitivity than did clinician diagnosis for vaginitis involving more than one cause.
FIG 3 Test accuracy as a function of prevalence for bacterial vaginosis (A), Candida spp. (B), and Trichomonas vaginalis (C). Change in test accuracy is plotted (y axis; 0% to 100%) as population prevalence changes (x axis; 0% to 100%). The actual prevalence in this study for each of the three causes in panels A to C is indicated with a vertical red line. The vertical blue line in panel A indicates the prevalence for bacterial vaginosis found in the study of Gaydos et al. (Nugent scoring 0 to 3 and 7 to 10 plus modified Amsel's criteria 2/3 without discharge for Nugent scoring 4 to 6; compared to Nugent in this study using 0 to 3 and 7 to 10) (12).

Molecular Test versus Clinician-Based Vaginitis Diagnosis
Journal of Clinical Microbiology

Figures S1D to G show the likelihood ratios of the investigational test compared to clinician diagnosis and reflect the consistently high sensitivity of the investigational test compared to that of clinician diagnosis (Fig. 2).
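For readers less familiar with likelihood ratios, the standard definitions can be sketched as follows. This Python fragment is an illustration only and does not reproduce Fig. S1: the function name is hypothetical, and the inputs are the rounded single-cause Trichomonas vaginalis sensitivity and specificity reported above, not the coinfection data.

```python
import math

def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    # LR+ = P(test positive | disease) / P(test positive | no disease)
    lr_pos = math.inf if specificity == 1 else sensitivity / (1 - specificity)
    # LR- = P(test negative | disease) / P(test negative | no disease)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

# Rounded values reported in the text for Trichomonas vaginalis.
for label, se, sp in [("investigational test", 0.967, 0.991),
                      ("clinician diagnosis", 0.689, 0.991)]:
    lr_pos, lr_neg = likelihood_ratios(se, sp)
    print(f"{label}: LR+={lr_pos:.1f} LR-={lr_neg:.3f}")
```

With equal specificity, the higher sensitivity of the investigational test shows up mainly as a much smaller negative likelihood ratio, meaning that a negative result rules out infection more convincingly.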

DISCUSSION
The investigational molecular test used in this study is the first Food and Drug Administration-cleared nucleic acid amplification test for detection of the three major causes of vaginitis: bacterial vaginosis, Candida spp., and Trichomonas vaginalis. For these three causes, the investigational test consistently outperformed in-clinic testing and clinician diagnosis for sensitivity, with no depreciation in specificity (Tables 1 to 3 and Fig. 2 and 3). Importantly, the investigational test had the highest OPA with the reference test and better NPV for all causes compared to in-clinic testing and clinician diagnosis. Finally, the investigational test resulted in high diagnostic accuracy and likelihood ratios across all three vaginitis causes.
Traditionally, a diagnosis of vaginitis has been performed through clinical findings, medical history, and in-clinic testing, with the last representing an essential component for the establishment of a clinician diagnosis. For bacterial vaginosis, some combination of the Amsel's criteria is the mainstay for standard of care diagnosis in the clinic. CDC guidelines (5) suggest that three out of four Amsel's criteria should be positive (Amsel's original). However, Amsel's criteria are known to be highly subjective and open to interpretation (20,21). In the current study, of all Amsel's components, pH had the highest sensitivity, while the whiff test had the highest specificity (Table 1). Other studies have reported that the presence of clue cells is the key pathognomonic feature of bacterial vaginosis, but this requires high technical expertise and appropriate laboratory infrastructure (22). Also, previous data showed better agreement between the Nugent score and Amsel's criteria when the latter did not include vaginal discharge as a criterion (12,23). Our findings confirm this, as we showed that removing discharge as a criterion and looking for two out of three positive Amsel's criteria (modified Amsel 2/3 without discharge) improved test sensitivity, NPV, and OPA compared to Amsel's original testing. In this study, clinician diagnosis reported an OPA that matched better with the Amsel's original test than the modified Amsel 2/3 without discharge (Table 1). This suggests that clinician diagnosis in our study likely involved Amsel's original test. When considering applicability, it should be noted that Amsel's criteria were applied during this study within a highly controlled research environment involving consistent prestudy and ongoing training and quality monitoring; this may not accurately reflect the empirical nature of Amsel's criteria performance as typically used in clinical practice.
We determined the accuracy of the three testing methods for detecting vaginitis. The empirical accuracy for all three diagnostic methods depends on several factors, including test performance, prevalence, and the actual cause of vaginitis. For all three vaginitis causes, this report shows that as prevalence values increase, the accuracy of the investigational test remains relatively high and constant, while the accuracy for clinician diagnosis decreases (Fig. 3). However, this conclusion assumes that the operating characteristics of in-clinic testing and clinician diagnosis do not change at high prevalence values for vaginitis causes, which may not be the case.
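The direction of these trends follows from the accuracy formula given in Materials and Methods. As a short worked derivation, under the same assumption that sensitivity (Se) and specificity (Sp) are fixed, and with p denoting prevalence (the symbols are introduced here for illustration):

```latex
\[
\mathrm{Acc}(p) \;=\; p\,\mathrm{Se} + (1-p)\,\mathrm{Sp}
\;=\; \mathrm{Sp} + p\,(\mathrm{Se}-\mathrm{Sp}),
\qquad
\frac{d\,\mathrm{Acc}}{dp} \;=\; \mathrm{Se}-\mathrm{Sp}.
\]
```

Accuracy is therefore a straight line in p that runs from Sp at p = 0 to Se at p = 1, and it falls with prevalence whenever Se < Sp. Using the rounded bacterial vaginosis estimates reported in Results, the slope is 0.927 − 0.915 = +0.012 for the investigational test (nearly flat), 0.756 − 0.941 = −0.185 for the original Amsel's test, and 0.773 − 0.923 = −0.150 for clinician diagnosis.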
Consistent with diagnosis of vaginitis by single causes, the investigational test outperformed clinician diagnosis of vaginitis that was due to multiple causes. The investigational test was more sensitive and had relatively high likelihood ratios for vaginitis with multiple causes (Table 4; see also Fig. S1D to G). It has previously been observed that the sensitivity of Amsel's criteria is diminished when Trichomonas vaginalis or Candida spp. are also present (24). This could explain the drastic drop in diagnostic performance that occurred for clinician diagnosis from single to multiple causes. Our results suggest that the investigational test may be resistant to reductions in diagnostic sensitivity when multiple causes of vaginitis are present.
This study had limitations that prevent an exact interpretation of the findings. Several analyses presented here involved observations for each type of infection that were excluded due to noncompliance or inability to report. It is possible, for example, that listing these types of observations as "not compliant" or "not reportable" for the investigational test, in lieu of "failure to correctly diagnose," may have artificially improved its operating characteristics. Other limitations include the fact that the investigational assay may have resulted in an overdiagnosis of vaginitis, as it cannot distinguish nonpathogenic colonization from pathogenic growth, a distinction that clinician diagnosis would take into account (25,26). However, the clinical cutoff for the investigational test was set by the current reference standard for diagnosing Candida spp. (positive fungal culture report), and therefore, the results are consistent with everyday practice. Moreover, bacterial vaginosis may be detected by the Nugent score (7 to 10) but also be asymptomatic (27). The investigational test showed the best agreement with the Nugent score, which is the gold standard, but may have included asymptomatic bacterial vaginosis. The bacterial vaginosis algorithm for the investigational test was set by the composite reference method of concordant positive and negative Nugent and Amsel's criteria. Therefore, only unambiguous specimens for bacterial vaginosis status were used to develop the algorithm. Additionally, this study employed a cross-sectional design that did not evaluate clinical outcomes for patients with discordant reference method results and clinician diagnosis. Only clinics with expertise and resource availability for detection of the four Amsel's criteria and wet mount procedures were chosen as study sites. Therefore, clinician diagnosis benefited from reliability of in-clinic results in a way that might not occur under real-life conditions.
Thus, the actual difference between clinician diagnosis and the investigational test is likely greater than that seen in this study. Finally, in this study we omitted the intermediate values for Nugent scoring (4 to 6, as described in Materials and Methods), whereas Gaydos et al. used the composite reference method of Nugent score combined with the modified Amsel 2/3 criteria without discharge to discriminate intermediate Nugent scoring (4 to 6). We may have missed some cases of bacterial vaginosis, the exclusion of which could have led to either an over- or underestimation of performance in the investigational test. However, the prevalence of bacterial vaginosis in this study (58%) was very close to that reported by Gaydos et al. (55.8%) (12).
The results from the current study support the potential utility of the investigational test in the differential diagnosis of vaginitis (28). While some laboratory tests take 2 to 7 days to provide results, the investigational test results are generally available within 24 h. Although future work is required to establish the cost/benefit ratio regarding the application of this investigational test in a practical setting, its high sensitivity, specificity, and accuracy (across a large spectrum of disease prevalence) should impart benefits and decrease the chance of needless treatment of patients who are negative for the disease (29). This may prove especially important with cases of vaginitis that involve multiple causes, where the sensitivity of clinician diagnosis may be limited.