Molecular Diagnosis of Orthopedic-Device-Related Infection Directly from Sonication Fluid by Metagenomic Sequencing

ABSTRACT Culture of multiple periprosthetic tissue samples is the current gold standard for microbiological diagnosis of prosthetic joint infections (PJI). Additional diagnostic information may be obtained through culture of sonication fluid from explants. However, current techniques can have relatively low sensitivity, with prior antimicrobial therapy and infection by fastidious organisms influencing results. We assessed if metagenomic sequencing of total DNA extracts obtained direct from sonication fluid can provide an alternative rapid and sensitive tool for diagnosis of PJI. We compared metagenomic sequencing with standard aerobic and anaerobic culture in 97 sonication fluid samples from prosthetic joint and other orthopedic device infections. Reads from Illumina MiSeq sequencing were taxonomically classified using Kraken. Using 50 derivation samples, we determined optimal thresholds for the number and proportion of bacterial reads required to identify an infection and confirmed our findings in 47 independent validation samples. Compared to results from sonication fluid culture, the species-level sensitivity of metagenomic sequencing was 61/69 (88%; 95% confidence interval [CI], 77 to 94%; for derivation samples 35/38 [92%; 95% CI, 79 to 98%]; for validation samples, 26/31 [84%; 95% CI, 66 to 95%]), and genus-level sensitivity was 64/69 (93%; 95% CI, 84 to 98%). Species-level specificity, adjusting for plausible fastidious causes of infection, species found in concurrently obtained tissue samples, and prior antibiotics, was 85/97 (88%; 95% CI, 79 to 93%; for derivation samples, 43/50 [86%; 95% CI, 73 to 94%]; for validation samples, 42/47 [89%; 95% CI, 77 to 96%]). High levels of human DNA contamination were seen despite the use of laboratory methods to remove it. Rigorous laboratory good practice was required to minimize bacterial DNA contamination. We demonstrate that metagenomic sequencing can provide accurate diagnostic information in PJI. 
Our findings, combined with the increasing availability of portable, random-access sequencing technology, offer the potential to translate metagenomic sequencing into a rapid diagnostic tool in PJI.

of arthroplasties performed worldwide, PJI are a significant health care burden and cause of expense. For individual patients, PJI often require multiple surgeries, intensive, long-term antimicrobial therapy, and a prolonged period of rehabilitation. Fast, accurate, and reliable diagnosis of PJI is necessary to inform treatment choices, particularly for antibiotic-resistant organisms. Culture of multiple periprosthetic tissue (PPT) samples remains the gold standard for microbial detection (2)(3)(4). However, culture can be relatively insensitive, with only 65% of causative bacteria detected in infections even when multiple PPT samples are collected (2,5). Infections with fastidious organisms or infections in a patient who has received prior antimicrobial treatment are often culture negative.
Culture of sonication fluid from explanted prostheses may improve microbiological yield in PJI by disrupting the bacterial biofilm. Since sonication was first applied to explanted hip prostheses in 1998 (6), several clinical studies have reported the improved sensitivity of sonication fluid culture over PPT culture for the diagnosis of hip, knee, and shoulder PJI (7,8), and sonication has been adopted by many centers, either alone or in combination with PPT culture. Additionally, several molecular assays have been investigated to improve the sensitivity of PJI diagnosis. PCR assays using DNA extracted from sonication fluid (9,11,12,40) have reported sensitivities ranging from 70% to 96%. However, this approach can identify only pathogens in a predefined multiplex panel and thus may miss atypical or rare pathogens not targeted in the assay design. Other studies identify pathogens by amplification and sequencing of the universal bacterial 16S rRNA gene (10,13,14). A drawback of these methods is the potential for generating false-positive results from contaminating bacterial DNA.
The potential of high-throughput sequencing as a diagnostic tool for infectious diseases is widely recognized (15)(16)(17). Metagenomic sequencing offers the possibility to detect all DNA in a clinical sample, which can then be compared to reference genome databases to identify pathogens. Additionally, a profile of common laboratory and kit contaminants can be generated from negative controls sequenced concurrently, and this information can be taken into account (18,19). In addition to diagnostic data, whole-genome sequencing can also simultaneously provide characterization of infection outbreaks (20,21), track transmission (22)(23)(24), and predict antimicrobial resistance (25)(26)(27)(28). At present, most whole-genome sequencing studies rely on sequencing DNA extracted from a cultured isolate, and extending these approaches to metagenomic sequencing data is an active area of research. An advantage offered by sequencing is the speed at which it can deliver genetic information (29) compared to that of traditional microbiological culture and antimicrobial susceptibility testing, which can take days to weeks depending on the pathogen. By removing a culture step and sequencing directly from clinical samples, the time taken to diagnosis can be reduced further (30), and pathogens not identified by conventional methods can be detected (31)(32)(33). Here, we investigated if metagenomic sequencing of total DNA extracts obtained directly from sonication fluid can provide an alternative rapid and sensitive tool for diagnosis of PJI, without the need for a culture step.
isolated from 22% of sonication fluids and 29% of PPTs, and Staphylococcus epidermidis, isolated from 13% of sonication fluids and 25% of PPTs, were the two most frequently cultured species (Table 1).
The 97 sonication fluid samples passing sequencing quality-control checks were obtained predominantly from knee (42/97, or 43%) and hip (32, or 33%) PJI, with other samples from ankle (6, or 6%) and shoulder (3, or 3%) PJI and other orthopedic device infections (14, or 14%) (Table 1). The median sonication fluid volume was 200 ml (IQR, 100 to 400 ml; range, 15 to 400 ml) (see Table S1 in the supplemental material). On culture, 35 (36%) sonication fluid samples had no growth or less than 50 CFU of an organism not considered to be highly pathogenic (skin and oral flora), 55 (57%) samples had a single organism isolated, and 7 (7%) samples had two organisms isolated. Greater than 10^6 reads were achieved in 91/97 (94%) samples. Taxonomic classification by Kraken identified a median of 0.07% (IQR, 0.01 to 0.41%; range, <0.01% to 24.0%) of reads as bacterial, with <1% of bacterial reads in 84/97 (87%) samples. Human reads accounted for >90% of reads in 94/97 (97%) of samples. Six test samples were processed with and without the NEBNext microbiome DNA enrichment kit. Use of the kit did not reduce the amount of human DNA sequenced. The mean proportion of reads classified as human was 98.4% with the enrichment kit and 98.2% without it (P = 0.06) (Table S2).
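As a rough illustration of how the bacterial and human read proportions reported above might be derived, the following Python sketch reads the tab-separated summary produced by Kraken's report tool (columns: percentage, clade reads, directly assigned reads, rank code, taxonomy ID, name). The function name `read_fractions` and the choice of all reads (classified plus unclassified) as the denominator are assumptions for illustration; the source does not state the exact calculation used.

```python
import csv
import io


def read_fractions(report_text):
    """Return (bacterial_fraction, human_fraction) from kraken-report style text.

    Assumption: the denominator is every read in the sample, i.e. the
    'unclassified' clade count plus the 'root' clade count.
    """
    total = 0
    bacterial = 0
    human = 0
    for row in csv.reader(io.StringIO(report_text), delimiter="\t"):
        if len(row) < 6:
            continue  # skip blank or malformed lines
        clade_reads = int(row[1])
        name = row[5].strip()  # names are indented by taxonomic depth
        if name in ("unclassified", "root"):
            total += clade_reads
        elif name == "Bacteria":
            bacterial = clade_reads
        elif name == "Homo sapiens":
            human = clade_reads
    if total == 0:
        raise ValueError("report contains no reads")
    return bacterial / total, human / total
```

A sample in which 7 of 10,000 reads fall under Bacteria would give a bacterial fraction of 0.07%, the median reported here.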
Optimal thresholds for determining if samples contained low-level contamination or true infection were determined by numerical optimization, choosing thresholds that maximized the sensitivity and specificity of sequencing (Fig. 2). The final thresholds chosen to determine the presence of true infection were ≥1,150 reads from a single species or ≥125 reads from a single species if ≥15% of the total bacterial reads also belonged to that same species.
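The two-part rule can be written as a small predicate. The Python sketch below is illustrative only: the function and constant names are hypothetical, and it assumes that species-level read counts and the total bacterial read count are already available from the taxonomic classification.

```python
# Final thresholds reported in the text; names are hypothetical.
UPPER_READ_CUTOFF = 1150   # threshold 1: reads from a single species
LOWER_READ_CUTOFF = 125    # threshold 2: reads from a single species ...
MIN_BACTERIAL_SHARE = 0.15  # ... that also make up this share of bacterial reads


def species_called(species_reads, total_bacterial_reads,
                   a=UPPER_READ_CUTOFF, b=LOWER_READ_CUTOFF,
                   c=MIN_BACTERIAL_SHARE):
    """Return True if a species meets either threshold for true infection."""
    if species_reads >= a:
        return True
    if total_bacterial_reads <= 0:
        return False
    share = species_reads / total_bacterial_reads
    return species_reads >= b and share >= c
```

For example, 125 reads out of 800 bacterial reads (15.6%) would be called, whereas 200 reads out of 2,000 (10%) would not.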
Samples extracted and sequenced as replicates showed good reproducibility.
in triplicate. One of the three replicates (176a) had an apparent contaminating species identified (also not found in sonication fluid or PPT culture). Table 2 compares sonication culture results with metagenomic sequencing findings, applying our sequencing data thresholds. PPT culture results and the consensus microbiology diagnosis based on both sonication and PPT samples are also given for comparison. Compared to sonication fluid culture, metagenomic sequencing had an overall species-level sensitivity of 61/69 (88%; 95% CI, 77 to 94%). Sensitivity was 35/38 (92%; 95% CI, 79 to 98%) in the derivation samples and 26/31 (84%; 95% CI, 66 to 95%) in the validation samples. Three samples were identified to the genus level only. Hence, overall genus-level sensitivity was 64/69 (93%; 95% CI, 84 to 98%). Of the other five samples where the species cultured was not identified on sequencing, two samples cultured a coagulase-negative Staphylococcus not identified on tissue culture, one sample was polymicrobial (where several species found in sonication fluid or tissue were identified, but not all), and the remaining two samples were negative for a pathogen found in sonication fluid and tissue.
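The proportions above are reported with 95% confidence intervals. The source does not state which interval method was used; the Python sketch below assumes an exact binomial (Clopper-Pearson) interval and solves for the bounds by bisection using only the standard library. Results may differ slightly from the reported intervals if a different method was used.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def _solve_decreasing(f, target):
    """Find p in [0, 1] with f(p) = target, where f decreases as p grows."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # p is still too small
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n trials."""
    # Lower bound: P(X >= k | p) = alpha/2, i.e. cdf(k - 1) = 1 - alpha/2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha/2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper
```

Calling `exact_ci(61, 69)` gives an interval around the overall species-level sensitivity of 88%.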
Overall species-level specificity was 78/97 (80%; 95% CI, 71 to 88%). However, of 19 samples where additional species were identified on sequencing compared to results with sonication culture, three (samples 400, 414, and 502) had the same species found in tissue culture but not in sonication fluid (or the level was <50 CFU).

FIG 2
Sequencing data filtering calibration heat maps. Two thresholds (threshold 1 and threshold 2) and three parameters (parameter a, parameter b, and parameter c) were used to determine true infection. Samples meeting either threshold were determined to be true infection. The final parameter values were chosen by maximizing the Youden index, calculated as follows: (sensitivity + specificity) − 1. For threshold 1, samples with more reads from a given species than an upper-read cutoff (parameter a; plotted on each x axis) were included. For threshold 2, samples with more species-specific reads than a lower-read cutoff (parameter b; the six panels show six different values for parameter b: 50, 100, 125, 150, 200, and 250, which are indicated within each y-axis title) and with the percentage of species-specific reads as a proportion of all bacterial reads present above a percentage cutoff (parameter c, plotted on each y axis) were included.

wise identified. In some cases these were clearly laboratory contaminants, e.g., sample 219 contained Achromobacter xylosoxidans reads, and an A. xylosoxidans culture-positive sample was sequenced in the same batch from a concurrent study. Notably, P. acnes was a common contaminant occurring in 7/97 (7%) samples overall. Adjusting for plausible fastidious causes of infection, species found in concurrently obtained PPT samples, and prior antibiotics, i.e., assuming these samples were actually genuinely positive for the species found on sequencing, species-level specificity was 85/97 (88%; 95% CI, 79 to 93%) overall, 43/50 (86%; 95% CI, 73 to 94%) in the derivation samples, and 42/47 (89%; 95% CI, 77 to 96%) in the validation samples. Figure 3 shows the relationship between the proportion of sequence reads obtained that were classified as bacterial, the sonication fluid culture CFU counts, and the concordance between sonication fluid culture and sequencing. Sequencing false-positive results were more likely when cultures were negative.
Simpler thresholds based on a single cutoff for determining true infection performed less well. Within the derivation samples, using a single cutoff for the proportion of bacterial reads from a given species, irrespective of the absolute numbers of bacterial reads present, the optimal cutoff value was 25%. Using this threshold, species-level sensitivity was 57/69 (83%) and adjusted specificity was 80/97 (82%). Similarly, using only a single absolute read number cutoff, the optimal value was 410 reads from a single species, with sensitivity of 54/69 (78%) and adjusted specificity of 87/97 (90%).
Sequencing results were also compared to a consensus microbiology diagnosis based on guidelines of the Infectious Diseases Society of America (IDSA) (4), considering any species isolated twice or any virulent species isolated as a cause of infection, combining sonication and PPT culture results (Table S1). These results showed that 66/97 (68%) samples demonstrated complete agreement between the consensus species list from culture and sequencing, 14/97 (14%) samples had a partial match with at least one species found on culture also found on sequencing, 15/97 (15%) samples had none of the species cultured found on sequencing, and 2/97 (2%) samples had a plausible additional species found on sequencing not found on culture. The sensitivity of sonication fluid sequencing compared to that of combined sonication fluid and PPT culture was 67/99 (68%), and specificity was 80/97 (82%); as above, specificity, adjusting for plausible fastidious causes of infection and prior antibiotics, was 85/97 (88%).

DISCUSSION
Diagnosis of PJI by culture of sonication fluid and PPT is not always conclusive and may take up to 10 to 14 days for slow-growing organisms. Here, we assess, for the first time, the use of metagenomic sequencing of total DNA extracts obtained directly from sonication fluid in the diagnosis of PJI. We developed a novel filtering strategy to ensure that low-level contaminating DNA is successfully ignored while infections are detected accurately. Compared to sonication fluid culture, metagenomic sequencing achieved a species-level sensitivity of 88% and specificity of 88%, after adjusting for plausible fastidious causes of infection, species found in concurrently obtained PPT samples, and prior antibiotic use. Importantly we demonstrated similar performance of our method and a filtering algorithm in the subset of samples that formed an independent validation set, with sensitivity of 84% and adjusted specificity of 89%.
Sequencing failed to identify an organism cultured from sonication fluid for eight samples. For two samples, a coagulase-negative Staphylococcus was cultured but only from sonication fluid and not from tissue samples. These isolates, therefore, could plausibly have been plate contaminants and not present in the DNA sequenced. For three other samples, identification to the genus level was possible. One sample contained Staphylococcus condimenti, which was not included in our custom Kraken database, highlighting the limitation that, despite including 2,786 bacterial genomes, this approach is only as good as the database that is used. Another sample was identified as a Bacillus spp. both on culture and by sequencing, and the third was identified by sequencing as Staphylococcus spp. in the context of a mixed Staphylococcus infection. For the three remaining samples, sequencing failed to identify a pathogen found on culture.
Sequencing was also able to detect potential pathogens not identified by culture of sonication fluid. For three samples we identified additional species from sequencing that were supported by the tissue culture findings, suggesting that in some settings sequencing may be more sensitive than sonication fluid culture alone (without PPT culture), although this might also be explained by the additional centrifugation step performed prior to sequencing to ensure sufficient DNA yields, which was not performed prior to culture. Perhaps as expected, PPT cultures identified pathogens not found on sonication fluid culture or sonication fluid sequencing; the sensitivity of sequencing of sonication fluid compared to the consensus species found combining sonication fluid and PPT cultures was only 68%. Using sequencing, we also identified four examples of probable anaerobic pathogens not identified by routine anaerobic culture of sonication fluid or PPT: Fusobacterium nucleatum, Veillonella parvula, Finegoldia magna, and Parvimonas micra. It is possible that these organisms would have been cultured had fastidious anaerobe agar been used; we used Columbia blood agar (CBA) plates for anaerobic culture, as previously described (7). We were also able to identify a plausible pathogen in two patients who had received prior antibiotics where the routine microbiology was uninformative.
Controlling for contamination during sampling and culture is a major challenge in investigating PJI and underlies why using multiple independent PPT samples remains the gold standard for diagnosis. Contamination is an even greater concern in molecular diagnostic assays, including metagenomic sequencing, given the additional potential for DNA contamination. There are published reports demonstrating the potential for contamination leading to misinterpretation of sequencing data from clinical specimens (34,35). In our laboratory, samples were handled in laminar flow hoods and extracted in a dedicated pre-PCR extraction laboratory. DNA was handled in a PCR hood, and sequencing libraries were manipulated in a dedicated post-PCR sequencing laboratory. Despite these measures, we still observed contamination in some of our samples. During the derivation phase of our study, it is likely that one or more of the reagents used became contaminated with DNA from other sequencing projects in our laboratory. Although we were able to account for this in our analysis and then validate our findings in a separate set of samples having addressed this specific form of contamination, contamination remained a concern during the validation phase, as evidenced by an adjusted specificity of only 89% and by contamination of one of the negative controls leading to a batch of samples being discarded. This demonstrates that rigorous laboratory practice would be key to deploying our method. There may also be a role for sealed systems that perform DNA extraction and sequencing in a separated environment. Our experience also reinforces the requirement that negative controls are included in each sequencing batch, as is routine in molecular microbiology diagnostic assays, to ensure that contamination is detected if it does occur. A limitation of our study is that the saline used for sonication was not PCR grade, and this could be considered in future work.
Excluding the specific issue of contamination by other sequencing projects, P. acnes was the most common apparent contaminant. It affected one of the negative controls during the validation phase, and, overall, false-positive results for P. acnes were found in 7% of samples. Species-specific filtering may be required to address this; our one true-positive sample with P. acnes present on culture had >10^5 P. acnes reads. However, data sets larger than ours are required to address this definitively. In the meantime, even with molecular diagnostics, the value of multiple samples per patient remains.
Sonication fluid can be a large-volume sample, typically 50 to 400 ml. As a result, the microbial cells released from the orthopedic device during sonication are likely to be heavily diluted. This, coupled with the simultaneous release of any human cells from the prosthesis and transfer of blood along with the device, results in a sonication fluid sample that is both low in bacterial cells and high in contaminating host cells. An effective microbial DNA extraction protocol is necessary to isolate as much bacterial DNA as possible while limiting the amount of host DNA in the final extract. Our results demonstrate that despite efforts to filter out human cells or remove human DNA postextraction, host DNA accounted for >90% of reads in the majority of samples sequenced. Use of a specialist microbiome enrichment kit did not improve bacterial DNA yield. However, if the efficiency of human DNA removal can be improved in the future, this might significantly add to the precision of metagenomic sequencing as more sequencing efforts would be appropriately directed toward potential pathogens.
In addition to the issues around contamination with bacterial and human DNA, a further limitation of our study as designed is that it undertakes a laboratory-level comparison of sonication fluid culture and metagenomics sequencing. As this study was conducted as laboratory method development, we made use of information available to the microbiology laboratory only at the time of sampling and did not review patient notes, and so we were unable to compare sonication fluid sequencing to the presence of a final overall diagnosis of infection. Future studies should consider how sequencing might contribute to the overall diagnosis of PJI as part of an assessment that jointly considers clinical, histological, and microbiological data.
This study demonstrates as a proof of principle that metagenomic sequencing can be used in the culture-free diagnosis of PJI directly from sonication fluid. Improvements to the method of human DNA removal from direct samples before sequencing are ongoing, and if these are successful, this is likely to greatly improve the efficiency, and therefore accuracy, of metagenomic sequencing. Generating greater numbers of bacterial reads directly from clinical specimens may make prediction of antimicrobial susceptibilities directly from samples possible, as has been achieved from whole-genome sequencing of cultured organisms (25)(26)(27)(28). If this can be achieved reliably and if contamination from human and other bacterial DNA can be minimized, it is possible that sequencing can offer a complete microbiology diagnosis without the need for culture. The increasing availability of portable, rapid, random-access strand sequencing technology offers the potential that in the future sequencing may become a same-day diagnostic tool. Applications of rapid sequencing in PJI might include perioperative microbiological diagnosis to guide the use of local intraoperative antimicrobials, for example, in cement or beads. Earlier diagnosis may also ensure that postoperative antimicrobials are more focused, improving antimicrobial stewardship, while treating resistant organisms effectively. Earlier diagnosis may also reduce hospital stays and therefore reduce costs. Sequencing is also likely to be helpful in situations where multiple samples containing the same commensal species are identified. Sequencing will be able to determine whether these are clonal, suggesting true infection rather than contamination, instead of having to rely on current proxies such as antimicrobial susceptibility profiles, which only imperfectly distinguish nonclonal isolates. Ultimately, same-day sequencing may significantly improve the precision, efficiency, and cost of PJI care.
This study provides a foundation for further development toward this goal.

MATERIALS AND METHODS
Sample collection and processing. Intraoperative samples from the Nuffield Orthopaedic Centre (NOC) in Oxford University Hospitals (OUH), United Kingdom, between June 2013 and January 2017 were investigated. The NOC is a tertiary-level specialist musculoskeletal hospital, including a dedicated Bone Infection Unit, undertaking approximately 200 revision arthroplasties annually. A subset of samples submitted was chosen at random following culture to provide a ratio of approximately 2:1 bacterial culture-positive samples to culture-negative samples. For this study, no ethical review was required, because the study was a laboratory method development study focusing on bacterial DNA extracted from discarded samples identified only by laboratory numbers, with no personal or identifiable data. Sequencing reads identified as human on the basis of Kraken were counted and immediately permanently discarded.
Prosthetic joint implants and metalwork, received into the OUH microbiology laboratory following revision arthroplasty and operative management of other orthopedic device-related infection, were placed directly into single-use sterile polypropylene containers (Lock & Lock brand) and covered with between 10 ml and 400 ml of sterile 0.9% saline solution (Oxoid, Ltd., Basingstoke, United Kingdom) depending on the size of the prosthesis/device, with sufficient fluid to cover at least 90% of the prosthesis/device, up to a maximum of 400 ml. Sonication was performed as described previously (7) with minor modifications. Briefly, the implant was vortexed for 30 s, subjected to sonication for 1 min, followed by additional vortexing for 30 s. Sonication was performed in a Bransonic 5510 ultrasonic water bath (Branson, Danbury, CT, USA) at a frequency of 40 kHz. The resulting sonication fluid was plated in 0.1-ml aliquots onto Columbia blood agar (CBA) and chocolate agar plates (Oxoid, Ltd., Basingstoke, United Kingdom) for aerobic incubation and on CBA plates for anaerobic incubation. Aerobic incubation was performed at 35 to 37°C with 5% CO2 for up to 5 days. Anaerobic incubation was performed at 35 to 37°C for 10 days. All cultured microorganisms were identified by matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry on a Microflex LT using Biotyper, version 3.1 (Bruker Daltonics, Billerica, MA, USA). Samples were considered culture positive when growth of ≥50 CFU/ml was observed and additionally when growth of a highly pathogenic organism (including Staphylococcus aureus and Enterobacteriaceae) at <50 CFU/ml was observed.
Periprosthetic tissue samples were also collected during surgery, at the start of each procedure and using different surgical instruments for each sample, and processed by the microbiology laboratory. Briefly, Bactec bottles were inoculated with 0.5 ml of an inoculum generated by vortexing each tissue sample in 3 ml of 0.9% saline with sterile Ballotini balls for 15 s. Bottles were incubated under aerobic (Plus Aerobic/F culture vials) and anaerobic (Lytic/10 Anaerobic/F culture vials) conditions in a BD Bactec FX system (BD Biosciences, Sparks, MD, USA) for up to 10 days. Any bottles that flagged positive were subcultured onto agar plates and processed as described above to determine species.
Bacterial DNA extraction from sonication fluid. Prior to DNA extraction, sonication fluids were concentrated by centrifugation. Forty milliliters of fluid was transferred to a sterile, disposable 50-ml polypropylene tube and centrifuged at 15,000 × g in a Sorvall RC5C Plus centrifuge (SLA-1500 rotor with custom-made inserts) for 1 h at 16°C. Samples with a <40-ml starting volume of sonication fluid were made up to 40 ml with the same saline used for sonication. All but approximately 1 ml of the supernatant was discarded, and the pellet was resuspended in this volume of fluid before being passed through a 5-μm-pore-size syringe filter to deplete the number of human cells present and, therefore, the amount of human DNA in the final extract. Bacterial cells passing through the filter were pelleted, washed with 1 ml of 0.9% saline, and resuspended in 500 μl of molecular-biology-grade water before being mechanically lysed in Pathogen Lysis tubes (S) (Qiagen, Hilden, Germany) with a FastPrep 24 tissue homogenizer (MP Biomedicals, Santa Ana, CA, USA) (three times for 40 s at 6.5 m/s). DNA was extracted by ethanol precipitation, using GlycoBlue (Life Technologies, Paisley, United Kingdom) as a coprecipitant, and resuspended in 50 μl of 1× Tris-EDTA (TE) buffer. DNA was purified using AMPure XP solid-phase reversible immobilization (SPRI) beads (Beckman Coulter, High Wycombe, United Kingdom) and eluted in 26 μl of TE buffer. DNA concentration was measured using a Qubit 2.0 fluorometer (Life Technologies, Paisley, United Kingdom). A subset of samples was treated with an NEBNext microbiome DNA enrichment kit (New England BioLabs, Ipswich, MA, USA) for human DNA removal before an additional purification step using AMPure XP SPRI beads and final elution in 15 μl of TE buffer. Samples were extracted in batches, with a negative control of sterile 0.9% saline prepared alongside each batch using this same protocol.
Library preparation and Illumina MiSeq sequencing. DNA extracts quantified as ≥0.2 ng/μl were sequenced on a MiSeq desktop sequencer (Illumina, San Diego, CA, USA). Libraries were prepared as previously described, using a variation of the Illumina Nextera XT protocol (36). Briefly, 1 ng of DNA was prepared for sequencing following the Illumina Nextera XT protocol, with the modification of 15 cycles during the index PCR. Libraries were quantified using a Qubit 2.0 fluorometer, and their average sizes were determined with an Agilent 2200 TapeStation (Agilent Technologies, Santa Clara, CA, USA) before being manually normalized. Libraries were prepared and sequenced together in the same batch. Paired-end sequencing was performed using a 600-cycle MiSeq reagent kit (version 3), and samples were sequenced in batches of between 1 and 13 on a single flow cell.
Bioinformatics analysis. Raw sequencing reads were adapter trimmed using BBDuk (https://sourceforge.net/projects/bbmap/) and the adapter sequence file provided within the BBMap package; the following parameters were used: minlength, 36; k, 19; ktrim, r; hdist, 1; mink, 12. Taxonomic classification of trimmed reads was performed using Kraken (37) and a bespoke database constructed from all bacterial genomes deposited in the NCBI RefSeq database as of January 2015 (updated January 2017 for the validation set; see below), with default parameters and no k-mer removals. Where no RefSeq genome was available for an organism cultured from a PJI at OUH since June 2013, whole-genome assemblies available in NCBI were also added to the database. Additionally, the Genome Reference Consortium Human genome build 38 (GRCh38) was included in the database to allow detection of host DNA. An optimum filtration threshold, using a Kraken filter that balanced false-positive removal and sensitivity, was determined using simulated data sets of reference genomes. Reference genomes representative of common pathogenic species were used to generate simulated Illumina MiSeq data sets and analyzed with Kraken using different filtration thresholds. A threshold value of 0.15 provided optimum read classification sensitivity while minimizing spurious results. Kraken output was visualized using Krona (38).
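The trimming, classification, and filtering steps could be assembled as command lines roughly as follows. This Python sketch only builds the argument lists and does not run anything. The file names, the `build_commands` helper, the database path, and the exact Kraken invocation are assumptions for illustration; only the BBDuk parameter values and the 0.15 filter threshold come from the text.

```python
def build_commands(r1, r2, sample, kraken_db,
                   adapters="adapters.fa", threshold=0.15):
    """Return argument lists for BBDuk, Kraken, and kraken-filter.

    All paths are hypothetical placeholders; `adapters` stands for the
    adapter sequence file shipped with the BBMap package.
    """
    trimmed1 = f"{sample}_trimmed_R1.fastq"
    trimmed2 = f"{sample}_trimmed_R2.fastq"
    raw_calls = f"{sample}.kraken"

    # Adapter trimming with the parameter values stated in the text.
    bbduk = [
        "bbduk.sh",
        f"in1={r1}", f"in2={r2}",
        f"out1={trimmed1}", f"out2={trimmed2}",
        f"ref={adapters}",
        "minlength=36", "k=19", "ktrim=r", "hdist=1", "mink=12",
    ]
    # Paired-end classification against the bespoke database.
    kraken = [
        "kraken", "--db", kraken_db, "--paired", "--fastq-input",
        "--output", raw_calls, trimmed1, trimmed2,
    ]
    # Confidence filtering of the raw classifications.
    kfilter = [
        "kraken-filter", "--db", kraken_db,
        "--threshold", str(threshold), raw_calls,
    ]
    return bbduk, kraken, kfilter
```

Each list can be passed to `subprocess.run` once the tools and database are installed; the output of `kraken-filter` would normally be redirected to a file for reporting.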
Statistical analysis. The performance of metagenomic sequencing was assessed by comparing the species identified from sequencing data with the species isolated from sonication fluid samples considered culture positive (i.e., ≥50 CFU/ml or growth of a highly pathogenic organism at <50 CFU/ml). In order to correct for samples which may contain small numbers of contaminating and nonspecific bacterial reads, a threshold was determined to identify the presence of true infection, using the first 50 samples sequenced as a derivation set. Two thresholds (1 and 2), and three parameters (a to c), were used to determine true infection: (i) samples with more reads from a given species than an upper-read cutoff (a) were included; (ii) samples with more species-specific reads than a lower-read cutoff (b) and with the percentage of species-specific reads as a proportion of all bacterial reads present above a percentage cutoff (c) were also included. Parameter values were selected by numerical optimization, using R, version 3.3.2, comparing sequencing results to sonication fluid culture results and maximizing the value of the Youden index (39) (sensitivity + specificity − 1). Sensitivity was calculated taking each species identified from each culture-positive sonication sample as a separate data point; thus, culture-negative samples did not contribute to the denominator, culture-positive samples with a single species contributed once, and culture-positive samples with two species contributed twice. Specificity was calculated using the total number of sonication samples as the denominator; as such, samples contaminated by more than one species were counted as one false positive.
To ensure that read cutoff parameters were chosen without a penalty for potentially difficult to culture anaerobic species, the specificity value optimized was adjusted. Potential false-positive sequencing results with plausible fastidious anaerobic causes of infection (including Fusobacterium nucleatum, Propionibacterium acnes, and Veillonella parvula) in culture-negative samples were excluded when the specificity value used for parameter optimization was calculated.
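The adjustment described above can be expressed as a filter applied before a sample is counted as a false positive. In this Python sketch the species set contains only the three organisms named in the text; because the source says "including", the full list used in the study may be longer. The function name is hypothetical.

```python
# Only the species explicitly named in the text; the study's list may be longer.
FASTIDIOUS_ANAEROBES = {
    "Fusobacterium nucleatum",
    "Propionibacterium acnes",
    "Veillonella parvula",
}


def is_false_positive(cultured, called, adjust=True):
    """Return True if sequencing called a species that culture did not find.

    With adjust=True, plausible fastidious anaerobes called in
    culture-negative samples are not counted against specificity.
    """
    extra = set(called) - set(cultured)
    if adjust and not cultured:
        extra -= FASTIDIOUS_ANAEROBES
    return bool(extra)
```

Under this rule, a culture-negative sample in which only Veillonella parvula is called would not count as a false positive, whereas the same call in a culture-positive sample still would.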
Where bacterial reads were detected over the thresholds described above in a negative control, that sample was deemed to be contaminated. In the derivation set, in order to maximize the number of sequences available for analysis, only samples with evidence of the same contaminating organisms were excluded from each contaminated batch, rather than discarding the whole batch. During the derivation phase of the study, several batches of samples were found to be contaminated with DNA from other studies performed concurrently in the same research laboratory. Six of eight saline negative-control extracts displayed contamination with a single or multiple species at read numbers exceeding the determined diagnostic thresholds. All samples within these batches that displayed similar contamination levels were excluded from subsequent analysis if Kraken classification resulted in >100 reads corresponding to the majority of the contaminating species. A total of 22 samples (in addition to the 50 successfully sequenced) were excluded on this basis (Fig. 1). In batches 4 and 5 the negative controls were contaminated with Staphylococcus aureus, Escherichia coli, and P. acnes, and 15 samples were excluded with >100 reads from ≥2/3 species; in batch 6 the negative control was contaminated with Serratia marcescens, Klebsiella pneumoniae, E. coli, and P. acnes, and 2 samples with >100 reads from ≥3/4 species were excluded; in batches 2, 9, and 10 the negative control was contaminated with P. acnes, and 5 samples were excluded with >100 P. acnes reads. To address this issue, prior to the validation phase of the study, all pipettes, laminar flow and PCR hoods, and laboratory benches used for DNA extraction and library preparation were deep-cleaned with Virkon disinfectant and RNase Away surface decontaminant (Thermo Fisher Scientific, Waltham, MA, USA) in order to remove any possible sources of microbial or DNA contamination.
All DNA extraction and library preparation reagents were replaced and used in preprepared per-batch aliquots used exclusively for this study. Sonication fluid samples were handled one at a time in the laminar flow hood, which was cleaned as above between each sample. Fresh gloves were worn each time a new sample was handled during the DNA extraction phase of the protocol. Having implemented these changes, for the validation phase, a more stringent quality control standard was applied, requiring the negative control to be contamination free for any of the samples in a batch to be analyzed.
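The derivation-phase exclusion rule (more than 100 reads for the majority of the species contaminating a batch's negative control) can be sketched as below. The function name and the interpretation of "majority" as strictly more than half are assumptions, chosen because they reproduce the ≥2/3, ≥3/4, and 1/1 criteria given in the text.

```python
def exclude_from_derivation(sample_reads, contaminants, min_reads=100):
    """Return True if a sample shares a batch's contamination signature.

    sample_reads: dict of species name -> read count for the sample.
    contaminants: species found above threshold in the batch negative control.
    """
    if not contaminants:
        return False  # clean negative control: nothing to exclude
    hits = sum(1 for sp in contaminants if sample_reads.get(sp, 0) > min_reads)
    return hits > len(contaminants) / 2  # strict majority of contaminant species
```

In the validation phase the rule was stricter: any contamination of the negative control led to the whole batch being discarded, so no per-sample check of this kind would apply.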
Technical replicates. To ensure sequencing reproducibility, one DNA sample was sequenced twice, and biological replicates (DNA extraction process repeated) were sequenced for six samples (four in duplicate and two in triplicate).