Chagas Disease Serological Test Performance in U.S. Blood Donor Specimens

Chagas disease affects an estimated 300,000 individuals in the United States. Diagnosis in the chronic phase requires positive results from two different IgG serological tests. Three enzyme-linked immunosorbent assays (ELISAs) (Hemagen, Ortho, and Wiener) and one rapid test (InBios) are FDA cleared, but comparative data in U.S. populations are sparse. We evaluated 500 seropositive and 300 seronegative blood donor plasma samples.

ing the need for reliable diagnostic testing for both individual and public health needs in the United States (11).
In the chronic phase, confirmed diagnosis requires positive results by two serological tests for IgG antibodies to T. cruzi, preferably based on different antigens (12). Currently, four serological assays are cleared by the FDA for diagnostic use (13): the Ortho T. cruzi enzyme-linked immunosorbent assay (ELISA) (Ortho Clinical Diagnostics, Raritan, NJ), the Hemagen Chagas' kit ELISA (Hemagen Diagnostics, Inc., Columbia, MD), the Wiener Chagatest Recombinante v.3.0 ELISA (Wiener Laboratories, Rosario, Argentina), and the InBios Chagas Detect Plus (CDP) rapid test (InBios International, Inc., Seattle, WA). The Ortho and Hemagen ELISAs are based on native parasite proteins (14)(15)(16), whereas the Wiener and InBios assays are based on recombinant proteins. The Wiener ELISA uses trypomastigote-shed acute-phase antigens (SAPA) and recombinant epimastigote antigens 1, 2, 13, 30, and 36 (17). The InBios test is based on the recombinant multiepitope fusion antigen ITC8.2 (18). All four assays report high sensitivity and specificity in their FDA 510(k) clearance applications (reported percent sensitivity/specificity: Ortho, 98.9/99.99; Hemagen, 100/98.7; Wiener, 99.3/98.7; InBios, 95 to 100/87 to 98). However, comparative performance data are lacking for at-risk populations in the United States and for those in Mexico and Central America, the predominant regions of origin of U.S. immigrants (19). Emerging evidence suggests that test sensitivity varies by geographic location and that discordance between serological test results is frequent, particularly in Mexico (20)(21)(22)(23). Comprehensive studies are needed to provide the basis for developing reliable testing algorithms. In this study, we compared the performances of the four FDA-cleared serological tests in specimens from U.S. blood donors to provide the first systematic evidence to improve laboratory diagnosis of Chagas disease in the United States.

MATERIALS AND METHODS
Ethical approval. This study was approved by the American Red Cross (ARC) institutional review board and was deemed exempt from review by the Human Research Protection Program at the University of California, San Francisco (UCSF).
Sample selection and preparation. We evaluated archived plasma samples from 800 blood donations (BDs) collected by the ARC between September 2006 and June 2018. Specimen selection was based on confirmed T. cruzi infection status in ARC BD testing algorithms at the time of blood donation (8). ARC provided a list of 1,091 seropositive specimens, defined by repeat reactive results generated by an FDA-licensed screening test (Ortho ELISA or Abbott PRISM [Abbott Laboratories, Abbott Park, IL]) followed by confirmed-positive results generated by a supplemental test (radioimmunoprecipitation assay [RIPA], performed by Quest Diagnostics [Chantilly, VA], or Abbott enzyme strip assay [ESA]) (8). We prioritized selection of BD-positive specimens with country-of-birth data; the remainder of the 500 BD-positive specimens were selected at random. A random sample of 300 specimens was compiled from a list of 3,938 seronegative blood donations, frequency matched by region of donation to the BD-positive specimen set. No country-of-birth data were available for seronegative specimens.
Plasma units were frozen at -20°C within 24 h of collection. Plasma units retrieved from ARC collections for research purposes were thawed in a temperature-controlled water bath, divided into aliquots in multiple tubes, and refrozen. Aliquots tested by the Hemagen, Wiener, and InBios assays at UCSF were thawed and refrozen only once. For the current analysis, the Ortho ELISA was rerun on all 800 specimens in 2019; aliquots used for this testing were thawed and refrozen twice.
Ortho ELISA testing for this study was conducted at Innovative Blood Resources, Minneapolis, MN, using the fully automated Ortho Summit system (24). The Ortho ELISA has FDA approval for blood donation screening and clearance for diagnostic use but is not yet marketed for the latter. For the Ortho ELISA, a signal-to-cutoff (S/CO) ratio of 1.00 or greater is considered reactive; in the blood donation screening algorithm, each reactive unit is retested two more times, and a donation is considered repeat reactive if at least 2 of the 3 results are reactive.
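The screening interpretation above can be sketched as a small decision function. This is a minimal illustration, not the Ortho Summit software: it assumes that a single result is reactive at S/CO ≥ 1.00 and that the "3 results" are the initial test plus the two retests. The function names are hypothetical.

```python
from typing import Optional, Tuple

REACTIVE_CUTOFF = 1.00  # assumed: an S/CO ratio >= 1.00 is reactive


def is_reactive(s_co: float, cutoff: float = REACTIVE_CUTOFF) -> bool:
    """A single Ortho result is reactive when its S/CO ratio meets the cutoff."""
    return s_co >= cutoff


def screening_call(initial: float, retests: Optional[Tuple[float, float]] = None) -> str:
    """Hypothetical classifier for one donation in the screening algorithm."""
    if not is_reactive(initial):
        return "nonreactive"  # no retesting needed
    if retests is None:
        return "initially reactive"  # two retests still pending
    n_reactive = sum(is_reactive(v) for v in (initial, *retests))
    # Repeat reactive: at least 2 of the 3 results (initial + 2 retests) are reactive.
    return "repeat reactive" if n_reactive >= 2 else "not repeat reactive"
```

For example, `screening_call(1.8, (0.7, 1.2))` returns `"repeat reactive"`, because the initial result and one retest are reactive.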
Hemagen ELISA, Wiener ELISA, and InBios rapid tests were performed at UCSF. Plasma samples were thawed at 4°C and centrifuged at 2,300 × g for 10 min to pellet any precipitate. Samples were divided into aliquots and placed at randomly assigned positions in 96-deep-well plates to blind readers performing the InBios rapid test. Plasma aliquots of 10 µl (Hemagen and Wiener) or 5 µl (InBios) were tested and interpreted in accordance with the package inserts, using the kit reagents, an ELx405 Select microplate washer (BioTek, Winooski, VT), and a SpectraMax Plus 384 microplate reader (Molecular Devices, San Jose, CA). The InBios package insert defines any visible test line as a positive result. To quantify the results of this assay, a set of 7 quality control samples was used to construct a semiquantitative scale ranging from 0 (negative) to 6 (strongly positive) (see Fig. S1 in the supplemental material). InBios test results were scored by two independent readers blind to other assay results. The only deviation from package insert protocols was the use of plasma for Hemagen tests; the manufacturer recommends serum only.
Data analysis. We conducted three analyses to assess diagnostic test performance. Two analyses compared assay results to different reference standards: classification in prior BD testing (8) and a consensus classification based on positive results by two or more diagnostic assays in the current study. For InBios testing, reader 1 scores were used for performance calculations, and reader 2 scores were used to calculate interreader agreement statistics. The Hemagen and Wiener kits both include an indeterminate zone; results in this zone were counted as positive in the performance analyses, because they would necessitate confirmatory testing in real-world scenarios. This definition may overestimate the sensitivity or underestimate the specificity of these two tests, depending on whether the gray-zone results predominantly correspond to seropositive or seronegative specimens. Exact binomial 95% confidence intervals (CI) were calculated for each performance parameter. Analyses were conducted in SAS 9.4 and R version 3.5.2.
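A minimal Python sketch of the consensus reference standard, the gray-zone-as-positive rule, and exact binomial confidence intervals follows. It assumes that the exact interval is the Clopper-Pearson interval, obtained here by bisection on the binomial CDF rather than through a statistics library. The result labels and function names are hypothetical; the study's own calculations were done in SAS and R.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical result labels; gray-zone ("indeterminate") results count as positive.
POSITIVE_CALLS = {"positive", "indeterminate"}


def is_positive(call: str) -> bool:
    return call in POSITIVE_CALLS


def consensus_reference(calls: Sequence[str], min_positive: int = 2) -> bool:
    """Consensus status: positive if at least two assays are positive."""
    return sum(is_positive(c) for c in calls) >= min_positive


def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    p = min(max(p, 1e-15), 1.0 - 1e-15)  # keep log() finite at the edges
    log_p, log_q = math.log(p), math.log1p(-p)
    return sum(math.exp(math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                        + i * log_p + (n - i) * log_q) for i in range(k + 1))


def _bisect(f, target: float, increasing: bool, iters: int = 60) -> float:
    """Find p in (0, 1) with f(p) == target for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact (Clopper-Pearson) interval for x successes in n trials."""
    lower = 0.0 if x == 0 else _bisect(lambda p: 1.0 - _binom_cdf(x - 1, n, p), alpha / 2, True)
    upper = 1.0 if x == n else _bisect(lambda p: _binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


def performance(index_calls: Sequence[str],
                reference: Sequence[bool]) -> Dict[str, Tuple[float, float, float]]:
    """(estimate, lower, upper) for sensitivity and specificity vs. a reference."""
    pairs = list(zip(index_calls, reference))
    tp = sum(1 for c, r in pairs if r and is_positive(c))
    fn = sum(1 for c, r in pairs if r and not is_positive(c))
    tn = sum(1 for c, r in pairs if not r and not is_positive(c))
    fp = sum(1 for c, r in pairs if not r and is_positive(c))
    return {"sensitivity": (tp / (tp + fn), *clopper_pearson(tp, tp + fn)),
            "specificity": (tn / (tn + fp), *clopper_pearson(tn, tn + fp))}
```

The same `performance` call works with either reference standard: pass BD status as booleans, or the output of `consensus_reference` for each specimen.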
The third performance assessment consisted of a latent class analysis (LCA). LCA comprises a group of mathematical modeling techniques developed to evaluate diagnostic tests in the absence of a true gold standard (25)(26)(27)(28). We assumed two latent classes and conditional independence of test outcomes. We used bootstrapping to generate multiple samples from the data set and then applied an expectation-maximization (EM) algorithm to estimate sensitivity and specificity for each test. The distributions of the bootstrapped estimates were used to generate 95% CIs. We tested the robustness of the two-class assumption by comparing fit between models assuming two versus three latent classes, using the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). The latent class analysis was conducted in R version 3.5.2 and RStudio version 1.1.463 using the BayesLCA package (29).
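The two-class model with conditional independence can be illustrated with a plain EM fit on 0/1 results, one column per assay. This Python sketch is not the BayesLCA implementation the study used. The following are all assumptions of the sketch: the starting values, the rule that the class with the higher mean positivity is labeled "infected," the percentile bootstrap, and the textbook definitions AIC = 2k − 2 log L and BIC = k log n − 2 log L (lower is better, with k = (K − 1) + K·J free parameters for K classes and J tests). Values produced this way may not follow the same convention as those reported in the Results.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Row = Sequence[int]  # one specimen: 0/1 result for each assay


def lca_em(data: Sequence[Row], n_classes: int = 2, max_iter: int = 500,
           tol: float = 1e-8, seed: int = 0):
    """Latent class model with conditionally independent binary tests, fit by EM."""
    rng = random.Random(seed)
    n, n_tests = len(data), len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Spread starting positivity rates across classes, with a little jitter.
    theta = [[0.2 + 0.6 * k / max(n_classes - 1, 1) + rng.uniform(-0.05, 0.05)
              for _ in range(n_tests)] for k in range(n_classes)]
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: posterior class membership of each specimen.
        resp, ll = [], 0.0
        for row in data:
            joint = []
            for k in range(n_classes):
                p = weights[k]
                for t, y in enumerate(row):
                    p *= theta[k][t] if y else 1.0 - theta[k][t]
                joint.append(p)
            total = sum(joint)
            ll += math.log(total)
            resp.append([q / total for q in joint])
        # M-step: update class weights and per-class positivity rates.
        for k in range(n_classes):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            for t in range(n_tests):
                rate = sum(r[k] * row[t] for r, row in zip(resp, data)) / nk
                theta[k][t] = min(max(rate, 1e-6), 1.0 - 1e-6)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, theta, ll  # ll is the log-likelihood at the last E-step


def two_class_summary(weights, theta) -> Dict[str, List[float]]:
    """Label the class with higher mean positivity as 'infected'."""
    pos = max(range(2), key=lambda k: sum(theta[k]))
    neg = 1 - pos
    return {"prevalence": [weights[pos]],
            "sensitivity": list(theta[pos]),
            "specificity": [1.0 - p for p in theta[neg]]}


def information_criteria(ll: float, n: int, n_classes: int, n_tests: int) -> Tuple[float, float]:
    k = (n_classes - 1) + n_classes * n_tests  # free parameters
    return 2 * k - 2 * ll, k * math.log(n) - 2 * ll


def bootstrap_ci(data: Sequence[Row], n_boot: int = 200, alpha: float = 0.05, seed: int = 1):
    """Percentile bootstrap CIs for per-test sensitivity and specificity."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        sample = [data[rng.randrange(len(data))] for _ in data]
        w, th, _ = lca_em(sample)
        draws.append(two_class_summary(w, th))
    lo_i = math.floor(alpha / 2 * (n_boot - 1))
    hi_i = math.ceil((1 - alpha / 2) * (n_boot - 1))
    ci = {}
    for key in ("sensitivity", "specificity"):
        ci[key] = []
        for t in range(len(draws[0][key])):
            vals = sorted(d[key][t] for d in draws)
            ci[key].append((vals[lo_i], vals[hi_i]))
    return ci
```

Comparing `information_criteria` for fits with `n_classes=2` and `n_classes=3` mirrors the two- versus three-class check described above.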

RESULTS
California and the southeastern states accounted for nearly three-quarters of the blood donations included in the study (Table 1). BD-positive specimens were significantly more likely than BD-negative ones to be from donors who identified themselves as Hispanic. Among 282 positive donors with country-of-birth data, 33% were from Mexico, 31% from Central America, and 26% from South America. Approximately 10% of donors with country-of-birth data were born in the United States, but the sources of their infections likely represented a mixture (congenital, travel, or locally acquired); this group of donations was not included in the analyses that were based on birth country.
The three analyses (BD status, consensus, and LCA) yielded similar results, with overlapping 95% CIs for each parameter across analyses of the same test (Table 2). The highest sensitivity estimates resulted from the LCA and the lowest from the BD comparisons; the reverse trend was seen for specificity. The 2-class LCA showed better fit than a 3-class model by both AIC (−2,059.089 versus −2,027.283) and BIC (−2,101.25 versus −2,092.867).
Table 1 footnotes: (a) Seronegative specimens were frequency matched to seropositive specimens by donation region. (b) Blood donors with positive test results were significantly more likely to report Hispanic ethnicity (P < 0.0001). (c) Data were available for 282 blood donors identified as seropositive in blood donation testing; no data were available for 218 seropositive and 300 seronegative specimens.
In all three analyses, InBios CDP had the highest sensitivity (97% to 99%) but the lowest specificity (88% to 92%). Reader agreement on InBios scores was high (weighted kappa = 0.9315; 95% CI, 0.9209 to 0.9420). Agreement on positive (scores 1 to 6) versus negative (score 0) calls exceeded 99% (795/800 [99.4%]; kappa = 0.9865; 95% CI, 0.9746 to 0.9983). There were only five discordant results: two specimens positive by reader 1 and negative by reader 2, and three with the converse outcome. The majority of apparent false-positive InBios results had intensity scores of 1 (87% in the BD analysis, 83% in the consensus analysis). Hemagen displayed the lowest sensitivity (88% to 92%) but high specificity (99% to 100%). Eleven specimens had Hemagen readings in the indeterminate zone; all were BD positive. Sensitivity for the Wiener ELISA ranged from 94% to 97%, with specificity ranging from 97% to 99%. Six specimens, four BD positive and two BD negative, had indeterminate Wiener results. Among the 500 specimens classified as confirmed positive in BD testing, those with negative results by current assays (apparent false negatives) had significantly lower median Ortho S/CO values in prior BD testing than those with positive results in current testing (apparent true positives) (see Fig. S2 in the supplemental material).
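Cohen's kappa for the two InBios readers can be computed from paired scores. The study does not state which weighting scheme was used for the 0-to-6 scale, so this sketch offers linear or quadratic disagreement weights as assumptions. It also applies the package-insert rule that any visible line (score ≥ 1) is positive. The function names are hypothetical.

```python
from typing import Hashable, Sequence


def cohen_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable],
                categories: Sequence[Hashable], weights: str = "none") -> float:
    """Cohen's kappa for two raters; weights: 'none', 'linear', or 'quadratic'."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and paired")
    m, n = len(categories), len(r1)
    index = {c: i for i, c in enumerate(categories)}
    observed = [[0.0] * m for _ in range(m)]  # joint proportions, reader 1 x reader 2
    for a, b in zip(r1, r2):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(m)]
    col = [sum(observed[i][j] for i in range(m)) for j in range(m)]

    def disagreement(i: int, j: int) -> float:
        if weights == "linear":
            return abs(i - j) / (m - 1)
        if weights == "quadratic":
            return ((i - j) / (m - 1)) ** 2
        return 0.0 if i == j else 1.0  # unweighted kappa

    d_obs = sum(disagreement(i, j) * observed[i][j] for i in range(m) for j in range(m))
    d_exp = sum(disagreement(i, j) * row[i] * col[j] for i in range(m) for j in range(m))
    return 1.0 - d_obs / d_exp


def binarize(score: int) -> int:
    """InBios reading: any visible line (score 1 to 6) counts as positive."""
    return int(score >= 1)
```

For the positive/negative agreement statistic, pass `[binarize(s) for s in scores]` for each reader with `categories=[0, 1]`; for the weighted statistic, pass the raw scores with `categories=list(range(7))`.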
Ortho ELISA sensitivity ranged from 92% to 97% in the current analysis, with specificity of 99% to 100%. Of 500 BD-positive specimens, 489 had positive Ortho results in BD testing; 11 specimens were positive by Abbott PRISM and a supplemental test (RIPA and/or Abbott ESA) but negative by Ortho in BD testing. Four of the 11 previously Ortho-negative specimens had positive results in the current Ortho testing, but 31 previously Ortho-positive specimens had negative results. Current Ortho S/CO values were a median of 15.9% lower than in BD testing (P < 0.001). Specimens from earlier-collected donations showed a smaller decline in S/CO values than more recent ones (Y = 0.007334X − 1.534; R² = 0.05758; P < 0.001 [linear regression of percent decline in S/CO versus specimen age in months]).
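The counts above can be reconciled with simple arithmetic, and the storage-time analysis is an ordinary least-squares fit. Taking the 489 prior Ortho positives, subtracting the 31 that turned negative, and adding the 4 that turned positive gives 462 of 500 (92.4%). This appears consistent with the lower end of the reported Ortho sensitivity range. In the sketch, percent change is defined so that a decline is negative. That sign convention, the helper names, and the use of plain OLS are assumptions.

```python
from typing import Sequence, Tuple

# Reconciling Ortho calls among the 500 BD-positive specimens (counts from the text).
BD_POSITIVE = 500
bd_ortho_positive = 489  # Ortho positive in original BD testing
lost = 31                # previously Ortho positive, negative on retest
gained = 4               # previously Ortho negative (4 of 11), positive on retest
current_ortho_positive = bd_ortho_positive - lost + gained
sensitivity_vs_bd = current_ortho_positive / BD_POSITIVE


def percent_change(prior_s_co: float, current_s_co: float) -> float:
    """Percent change in S/CO between BD testing and retesting (decline < 0)."""
    return 100.0 * (current_s_co - prior_s_co) / prior_s_co


def ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot
```

In the study's setting, `x` would be specimen age in months and `y` the per-specimen percent change. Under the assumed sign convention, a positive slope would mean that older specimens lost less reactivity, as the text reports.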
Finally, we stratified results by region of birth to explore geographic variation in test sensitivity (Table 3). Compared to BD or consensus status, sensitivity for Ortho, Wiener, and Hemagen tended to be lowest in specimens from those born in Mexico and highest in those from South America, with Central American specimens showing intermediate results. Analyses of antibody reactivity were consistent with these results, with the lowest reactivity seen in the specimens from Mexico (Fig. 1).

DISCUSSION
Our data provide initial evidence for an appropriate diagnostic algorithm for Chagas disease in the United States. The direct comparison of the four FDA-cleared tests demonstrated a range of sensitivity and specificity estimates across tests as well as consistent variation in sensitivity by country of origin. On the basis of these findings, we can develop preliminary guidance for optimal use of these tests, anticipate associated challenges, and identify where improvements are needed.
In common with recommendations for syphilis and early algorithms for HIV (30,31), definitive diagnosis of chronic T. cruzi infection requires positive results by two distinct tests (3,4). This algorithm was developed to address issues of both sensitivity and specificity. Simultaneous use of two tests optimizes both parameters and may be cost-effective in high-prevalence settings. However, when low prevalence is anticipated, universal testing by two assays is impractical. Most programs will use one test as a screen and analyze only the screen positives by the second assay. In these circumstances, the order is crucial; a high-sensitivity screening test is essential to minimize the risk of missing true infections (Fig. 2). At the same time, if specificity is not high, an assay will produce many false positives, potentially undermining confidence in testing. For example, in a setting of 1.5% prevalence (32), any specificity lower than 98.5% will result in more false-positive than true-positive results.
Table 3 footnotes: (a) Data represent blood donors born in El Salvador (n = 67), Guatemala (n = 10), Honduras (n = 7), Costa Rica (n = 1), Nicaragua (n = 1), or an unspecified location in Central America (n = 2). (b) Data represent donors born in Bolivia (n = 32), Argentina (n = 13), Chile (n = 5), Paraguay (n = 2), Uruguay (n = 1), Brazil (n = 6), Colombia (n = 9), Ecuador (n = 2), or an unspecified location in South America (n = 3).
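The 1.5% prevalence example can be checked with expected counts. This sketch assumes 10,000 people tested. The break-even specificity comes from setting expected false positives, (1 − prevalence)(1 − specificity), equal to expected true positives, prevalence × sensitivity. The serial-testing helper assumes conditional independence of the two tests given infection status. The example accuracy values are illustrative picks within the ranges reported in the Results, not study estimates.

```python
from typing import Tuple


def expected_counts(prevalence: float, sensitivity: float, specificity: float,
                    n_tested: int = 10_000) -> Tuple[float, float]:
    """Expected true-positive and false-positive counts for a single test."""
    infected = prevalence * n_tested
    true_pos = sensitivity * infected
    false_pos = (1.0 - specificity) * (n_tested - infected)
    return true_pos, false_pos


def break_even_specificity(prevalence: float, sensitivity: float = 1.0) -> float:
    """Specificity at which expected false positives equal expected true positives."""
    # (1 - p)(1 - sp) = p * se  =>  sp = 1 - p * se / (1 - p)
    return 1.0 - prevalence * sensitivity / (1.0 - prevalence)


def serial_accuracy(se_screen: float, sp_screen: float,
                    se_confirm: float, sp_confirm: float) -> Tuple[float, float]:
    """Screen, then confirm screen positives; assumes conditional independence."""
    sensitivity = se_screen * se_confirm  # both tests must detect the infection
    specificity = 1.0 - (1.0 - sp_screen) * (1.0 - sp_confirm)  # both must err
    return sensitivity, specificity


# Illustrative values only, chosen within the ranges reported in the Results.
tp, fp = expected_counts(0.015, 1.0, 0.98)      # 150 true vs 197 false positives
print(break_even_specificity(0.015))            # about 0.9848
print(serial_accuracy(0.98, 0.90, 0.95, 0.99))  # about (0.931, 0.999)
```

With perfect sensitivity, the break-even specificity is about 98.48%, which rounds to the 98.5% cited. A screen with 98% specificity would already yield more false than true positives. The serial helper also shows the trade-off in Fig. 2: confirming only screen positives raises specificity, but overall sensitivity can be no higher than that of the screening test.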
No single test had optimal performance characteristics in our data, despite the high sensitivity and specificity figures reported in their FDA 510(k) clearance applications and package inserts (16,17,33,34). In part, this may reflect the difference between performance in a setting closer to real-world diagnostic testing and performance in the more controlled setting of a clinical trial. However, a major issue with the available data is that many of the current diagnostic tests were developed using specimen sets from the Southern Cone, where discrete typing units (DTUs) TcII, TcV, and TcVI are most prevalent (17,33,34). Only the Ortho evaluations reported results in specimens from at-risk populations in Mexico, Guatemala, and the United States during test development (16,24). Published data confirm high rates of discordance and false-negative results by other assays in Mexico (21,23), and the lower antibody reactivity seen in our data poses a challenge to achieving adequate sensitivity. Given the high proportion of U.S. T. cruzi infections with Mexican origins, investigating and addressing the underlying cause of this phenomenon will be central to the effort to improve diagnostic test performance in the United States. TcI, the predominant T. cruzi DTU in Mexico, is widely distributed throughout the Americas (35) and also predominates in human infections in northern South America and Central America (36). Thus, the lower reactivity in Central American than in South American specimens may be linked to differences between TcI and TcII, TcV, and TcVI; however, because TcI also predominates in Central America, TcI predominance alone cannot explain the markedly lower reactivity in Mexican specimens. Poorly understood strain differences within the TcI DTU may also be responsible for the observed geographic variability in immune response (20,22).
On the basis of the performances reflected in our data, the Wiener Recombinante v.3.0 and Ortho ELISAs showed the best balance of sensitivity and specificity, but both had suboptimal sensitivity in Mexican specimens. The InBios rapid test had the best sensitivity, which remained high even in Mexican specimens, but its low specificity would produce a substantial number of false positives requiring confirmatory testing. The low sensitivity of Hemagen, especially in Mexican specimens, raises the risk of false negatives and concerns about its use as a screening test. In all cases, discordant results between screening and confirmatory testing should prompt the use of a third test as a tie-breaker, such as the IgG trypomastigote excreted-secreted antigen (TESA) blot or the Abbott ESA, the latter having received FDA licensure for confirmatory use in the blood donor screening algorithm.
The use of surplus blood donation specimens has both limitations and advantages. Blood donor populations are not representative of the general U.S. population; donors are younger and healthier than the population at large, and although the rate of donation by Hispanics has increased markedly over the past decade, this group remains underrepresented (37,38). However, given the design of the study, these differences should not affect the validity of the test performance estimates. Although three of the four tests are validated for both serum and plasma, the Hemagen package insert specifies the use of serum; we had only plasma available, which may have affected our estimates for this assay. However, the other T. cruzi serology kits and other similar infectious disease assays have reported equivalent results between serum and plasma (39,40).
The decrease in reactivity by the Ortho ELISA in current versus prior BD testing is perplexing. Length of storage was inversely related to the magnitude of the decline, making antibody degradation an unlikely explanation. The Ortho ELISA uses cultured parasite lysate as its antigen source, which may introduce biological variability.
A critical review of diagnostic studies suggests that a double-blinded prospective cohort provides the optimal study design, because testing of positive and negative groups selected on the basis of prior test results introduces a bias toward overestimates of performance characteristics, especially if discordant specimens are excluded (41). However, prospective testing by multiple assays in a very-low-prevalence population would incur prohibitive costs. Ortho ELISA was the BD screening test for many of the specimens and was also part of the evaluation; this was unavoidable, given that the assay is approved for both applications, but raises the potential for selection bias. We attempted to minimize bias by incorporating specimens that were discordant in BD testing and specimens across the entire range of antibody responses and by using two different comparators (BD and consensus) and a latent class analysis. Our results demonstrate how performance estimates may vary depending on the comparator and analysis method. Our study design was strengthened by the large sample size, which included specimens across the spectrum of reactivity levels and infections acquired in different geographic regions, characteristics difficult to replicate in the United States in the absence of a large, well-funded multicenter study. Our results do not preclude such a study. On the contrary, additional rigorous analyses of data from robust specimen sets with broad geographic coverage are essential to better understand and improve the performance of the available tests in U.S. populations at risk of T. cruzi infection.
Conclusion. In an analysis of U.S. blood donor specimens, the InBios Chagas Detect Plus rapid test had the highest sensitivity but the lowest specificity, while the Hemagen assay had the lowest sensitivity among the FDA-cleared tests. The Hemagen, Ortho, and Wiener ELISAs all had similarly high specificity. Sensitivity of the Ortho, Wiener, and Hemagen ELISAs was lowest in specimens from donors born in Mexico, intermediate for those born in Central America, and highest for those born in South America, consistent with differences in the distributions of antibody reactivity in these groups. Use of a high-sensitivity screening test followed by a second, higher-specificity test offers the best current algorithm for diagnostic screening in the United States.
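The recommended sequence can be written as a small decision function. It covers a high-sensitivity screen, a higher-specificity confirmation, and a third tie-breaker test (for example, TESA blot or Abbott ESA) run only on discordant results. This is a hypothetical sketch: each argument is one test's positive/negative call (`None` when the test has not yet been run), and the choice of assay for each role is left to the laboratory.

```python
from typing import Optional


def diagnose(screen: bool, confirm: Optional[bool] = None,
             tiebreak: Optional[bool] = None) -> str:
    """Serial algorithm: high-sensitivity screen, higher-specificity confirmation,
    and a third (tie-breaker) test only when the first two disagree."""
    if not screen:
        return "negative"  # screen negatives are not retested
    if confirm is None:
        return "pending confirmatory test"
    if confirm:
        return "positive"  # two concordant positive tests
    if tiebreak is None:
        return "discordant: tie-breaker test needed"
    # A positive tie-breaker gives two positives out of three; otherwise negative.
    return "positive" if tiebreak else "negative"
```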
SUPPLEMENTAL FILE 1, PDF file, 2.4 MB.
This study was supported by the Mundo Sano Foundation. C.B. receives partial salary support from the Mundo Sano Foundation. The participation of J.D.W. was supported in part by a grant from the National Heart, Lung, and Blood Institute of the National Institutes of Health under award number R38 HL143581. The participation of C.A.B., E.L.G., and J.A.S. was supported in part by the Bill and Melinda Gates Foundation under award number OPP1017584. The funding sources had no role in the study design, collection, analysis and interpretation of the data, preparation of the manuscript, or the decision to submit for publication. The contents of this publication are solely our responsibility and do not necessarily represent the official views of the sponsors.