Viruses in Vietnamese Patients Presenting with Community-Acquired Sepsis of Unknown Cause

Community-acquired (CA) sepsis is a major public health problem worldwide, yet the etiology remains unknown for >50% of the patients. Here we applied metagenomic next-generation sequencing (mNGS) to characterize the human virome in 492 clinical samples (384 sera, 92 pooled nasal and throat swabs, 10 stools, and 6 cerebrospinal fluid samples) from 386 patients (213 adults and 173 children) presenting with CA sepsis who were recruited from 6 hospitals across Vietnam between 2013 and 2015.

In a recent etiological study of 1,578 patients with CA sepsis, conducted by the Southeast Asia Infectious Disease Clinical Research Network, the etiology (viruses, bacteria, and parasites) was established for only 48% (3). While this diagnostic yield is comparable to that of previous reports, the unknown etiology for >50% of the patients may be attributed to the low sensitivity of current diagnostic tests and/or the diversity of the causative agents that may be responsible for this important clinical condition. Furthermore, Southeast Asia is one of the major hot spots for the emergence of novel pathogens, as illustrated by the emergence of Nipah virus, severe acute respiratory syndrome (SARS) coronavirus, avian influenza virus A (H5N1), avian influenza virus A (H7N9), enterovirus A71 (EV-A71), and, more recently, Zika virus (4,5).
Improving our knowledge about the causative agents of CA sepsis can inform clinical management, while active surveillance for novel pathogens in this region is of public health significance. In this study, we use mNGS to characterize the viral contents of clinical samples collected from patients enrolled in an etiological study of sepsis of unknown etiology across Southeast Asia between 2013 and 2015 (3).

MATERIALS AND METHODS
Clinical specimens and patient data. The clinical specimens and patient data used for mNGS analysis were derived from an etiological study of CA sepsis conducted at multiple hospitals across Indonesia (n = 3), Thailand (n = 4), and Vietnam (n = 6) between 2013 and 2015 (3). Hospitalized patients with suspected or documented CA infections, fulfilling the diagnostic criteria for sepsis of the 2012 Surviving Sepsis Campaign (adults) (15) or the definitions of the Pediatric Sepsis Consensus Conference (16), were enrolled within 24 h of admission (3). A total of 1,582 patients were enrolled (750 each from Vietnam and Thailand; 82 from Indonesia) (Fig. 1). Per the study protocol, serum samples were collected from all patients; additional samples, including pooled nasal and throat swabs, cerebrospinal fluid (CSF), and stools, were collected when clinically indicated. After collection, all clinical samples were stored at -80°C. Additionally, information about the demographics, clinical entities, and outcomes of the patients was retrieved from a publicly available data set of the original study that was deposited at https://figshare.com/articles/Data_set_-_Causes_and_outcomes_of_sepsis_in_southeast_Asia_a_multinational_multicentre_cross-sectional_study_NCT02157259_/3486866/1.
Of 749 patients from Vietnam, 402 (54%) had no etiology identified via extensive clinical and reference laboratory workups in the original study (Fig. 1; see also Table S1 in the supplemental material); of these, 386 (96%) had clinical materials available for additional etiological investigation and were thus included for viral metagenomic analysis in this study (Fig. 1) (3). In total, 492 samples (6 CSF samples, 92 pooled nasal and throat swabs, 384 serum samples, and 10 stool samples) from these 386 patients with sepsis of unknown etiology were included in the analysis. Depending on the availability of the materials, most samples were analyzed individually (n = 458); the remainder were analyzed in pools of multiple samples (n = 8) (Fig. 2).
Sample pretreatments and NA isolation. Prior to nucleic acid (NA) isolation, 100 µl of clinical sample was treated with 2 U/µl of Turbo DNase (Ambion, Life Technology, Carlsbad, CA, USA) and 0.4 U/µl RNase I (Ambion) at 37°C for 30 min. Viral NA was then isolated from nuclease-treated materials using a QIAamp viral RNA kit (Qiagen GmbH, Hilden, Germany) and was recovered in 50 µl of elution buffer.
dsDNA synthesis and sequencing. Double-stranded DNA (dsDNA) was synthesized from isolated viral NA using a set of 96 nonribosomal random primers (17), amplified by PCR, and sequenced on an Illumina MiSeq platform (Illumina, San Diego, CA, USA) as described previously (18,19). In brief, 10 µl of extracted viral NA was converted to dsDNA using FR26RV-Endoh primers (19), SuperScript III enzyme (Invitrogen, Carlsbad, CA, USA), RNaseOUT (Invitrogen), exo-Klenow fragment (Ambion, Life Technology, Carlsbad, CA, USA), and RNase H (Ambion). Subsequently, the synthesized dsDNA was randomly amplified using the FR20RV primer (5′-GCCGGAGCTCTGCAGATATC-3′). The random PCR product obtained was then purified with the use of Agencourt AMPure XP beads (Beckman Coulter) and was quantified with a Qubit dsDNA HS (high-sensitivity) kit (Invitrogen). Finally, 1 ng of purified product was subjected to library preparation using a Nextera XT sample preparation kit (Illumina) and was sequenced using a MiSeq reagent kit, v3 (600 cycles) (Illumina), on a MiSeq platform (Illumina).
mNGS data analysis. The mNGS data were analyzed using an in-house viral metagenomic pipeline running on a 36-node Linux cluster to identify the presence of viral sequences in the tested specimens as described previously (20). In brief, after duplicate reads and reads belonging to human or bacterial genomes were filtered out, the remaining reads were assembled de novo. The resulting contigs and singlet reads were then aligned against a customized viral proteome database using a BLAST (Basic Local Alignment Search Tool)-based approach. Next, the candidate viral reads were aligned against a nonredundant nonvirus protein database to remove any false-positive reads (i.e., reads with expected [E] values higher than those against viral protein databases). Any virus-like sequence with an E value of ≤10⁻⁵ was considered a significant hit. Finally, a reference-based mapping approach was employed to assess the levels of identity and genome coverages of the corresponding viruses.
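The hit-filtering logic described above can be sketched in Python as follows. This is a minimal illustration and not the in-house pipeline itself (20): the `Hit` record, its field names, and the functions `is_significant_viral_hit` and `filter_hits` are hypothetical, and the sketch assumes that a candidate is discarded when its best nonviral match has an E value equal to or lower than that of its best viral match.

```python
from dataclasses import dataclass
from typing import List, Optional

E_VALUE_CUTOFF = 1e-5  # significance threshold stated in the text


@dataclass
class Hit:
    """Best BLAST-style matches for one contig or singlet read (hypothetical record)."""
    query_id: str
    viral_evalue: float               # best E value against the viral proteome database
    nonviral_evalue: Optional[float]  # best E value against the nonviral database, None if no match


def is_significant_viral_hit(hit: Hit, cutoff: float = E_VALUE_CUTOFF) -> bool:
    """Return True if the query is kept as a virus-like sequence."""
    # Step 1: the viral match itself must be significant.
    if hit.viral_evalue > cutoff:
        return False
    # Step 2 (assumed rule): drop the candidate if a nonviral protein
    # explains it at least as well as the viral protein does.
    if hit.nonviral_evalue is not None and hit.nonviral_evalue <= hit.viral_evalue:
        return False
    return True


def filter_hits(hits: List[Hit]) -> List[Hit]:
    """Keep only the significant virus-like hits, preserving input order."""
    return [h for h in hits if is_significant_viral_hit(h)]


if __name__ == "__main__":
    # Hypothetical values, not data from the study.
    example = [
        Hit("contig_1", 1e-30, None),   # strong viral match, no nonviral match
        Hit("contig_2", 1e-8, 1e-20),   # better explained by a nonviral protein
        Hit("read_3", 1e-3, None),      # viral match too weak
    ]
    print([h.query_id for h in filter_hits(example)])
```

Keeping the two checks separate makes it easy to change the cutoff or the tie-breaking rule independently if the actual pipeline differs from these assumptions.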
PCR confirmation of viral reads. Because of the focus of the present study, specific PCRs were used to confirm the mNGS hits for viral species that are known to be infectious to humans and for recently discovered viruses that have been reported in human tissues previously but remain of uncertain tropism. Depending on the availability of the clinical materials, virus-specific PCRs were carried out either on leftover NA after mNGS experiments or on newly extracted NA. An mNGS result was considered positive only if it was subsequently confirmed by a corresponding viral PCR analysis of original NA materials derived from the corresponding individual samples. All PCR primers and probes used were either derived from previous publications or newly designed based on the sequences generated by mNGS (see Table S2 in the supplemental material).
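The confirmation rule can be expressed compactly. The Python sketch below uses hypothetical sample identifiers and a hypothetical function name, `confirmed_detections`, purely for illustration; only the rule itself (an mNGS hit counts as positive only when the virus-specific PCR on the same individual sample is also positive) comes from the text.

```python
def confirmed_detections(mngs_hits, pcr_positive):
    """Return the mNGS hits that were confirmed by virus-specific PCR.

    mngs_hits:    iterable of (sample_id, virus) pairs reported by mNGS
    pcr_positive: set of (sample_id, virus) pairs with a positive PCR result
    """
    # A hit is retained only if the same virus was PCR positive in the same sample.
    return sorted(hit for hit in set(mngs_hits) if hit in pcr_positive)


if __name__ == "__main__":
    # Hypothetical identifiers, not data from the study.
    mngs = [("S001", "enterovirus"), ("S002", "rotavirus A"), ("S003", "enterovirus")]
    pcr = {("S001", "enterovirus"), ("S003", "enterovirus")}
    print(confirmed_detections(mngs, pcr))
```

Matching on the (sample, virus) pair rather than on the virus alone reflects the requirement that confirmation be done on material from the corresponding individual sample.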
Phylogenetic analysis. Sequence alignment and phylogenetic tree reconstructions of the sequences obtained were carried out using ClustalW alignment and maximum likelihood methods available within Geneious 8.1.5 (Biomatters) and IQ-TREE (21), respectively.
Ethical statement. The study was reviewed and approved by the Institutional Review Boards of collaborating hospitals in Vietnam and the Oxford Tropical Research Ethics Committee (OxTREC), University of Oxford, Oxford, United Kingdom.
Accession number(s). The metagenomics data obtained in this study have been deposited in GenBank, and the accession numbers can be found via BioProject accession number PRJNA526981.

RESULTS
Demographics, clinical features, and outcomes for patients with sepsis of unknown origin. The baseline characteristics and 28-day mortality data of all patients (including the 386 patients included in the mNGS analysis) from Vietnamese sites enrolled in the original study are presented in Table 1. Retrospectively, 129 (34.4%) adult patients (including 54 of the 213 with undiagnosed cases [25%]) had SOFA (Sequential Organ Failure Assessment) scores of ≥2, fulfilling the diagnostic criteria presently used for sepsis in adults as defined by Sepsis-3 (22). For pediatric sepsis, no harmonized criteria similar to those for sepsis in adults have been established (23).
[Fig. 1 legend, fragment: see Table S1 in the supplemental material for more details; #, the causative agents detected are detailed in the report of the original study (3); $, more details about the analysis of those 386 patients can be found in Fig. 2.]
There was considerable homogeneity between the group of patients included and the group not included in the mNGS analysis (Table 1). Among the 386 patients with sepsis of unknown cause whose data were included in the mNGS analysis, the most frequent clinical entity was acute respiratory infection (n = 158 [41%]), followed by systemic infection (n = 152 [39.5%]), central nervous system (CNS) infection (n = 40 [10.5%]), and diarrhea (n = 36 [9.3%]) (Table 1) (3). Ten of these patients (8 adults and 2 children) were recorded as deceased by day 28, accounting for 2.6% of total patients.
Overview of virus-like sequences detected by mNGS. In total, 466 samples were sequenced in five MiSeq runs, generating a total of >26 million reads (median reads per sample, 432,682; range, 540 to 1,916,732) (see Fig. S1 in the supplemental material). Despite the inclusion of a nuclease digestion step prior to NA isolation, viral reads accounted for only a small proportion of total reads, ranging from 168,028 (2.5%) to 287,307 (8.4%) reads/run. Evidence of sequences related to 47 viral species belonging to 21 families was detected in 358/386 (93%) patients. The viruses detected included those known to cause human infections, those with unknown pathogenicity, and viruses that have been reported previously to be contaminants found in mNGS data sets or that have not been reported in human samples, as detailed below. Additionally, codetection of ≥2 viruses in the same samples/patients was recorded for 13 patients (see Table S3). None of the 10 fatal cases had a viral etiology identified by mNGS.
(i) Detection of viruses known to cause human infections. NA sequences of 21 viral species known to be infectious to humans were detected in 137 of 466 (29%) clinical samples from 125 of 386 (32%) individuals by viral metagenomics. After specific PCR confirmation, the detection rate fell to 13.4% (52/386) of the patients included in the mNGS analysis. There was a significant difference in the number of viral reads generated by mNGS between the groups of samples that were subsequently found to be PCR positive or negative (see Fig. S2 in the supplemental material). [...] (Fig. 3). Detailed information about the numbers of viral reads and genome coverage is summarized in Table S7 in the supplemental material.
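The detection rates quoted above follow directly from the counts. As a quick check, the Python sketch below recomputes them; the function name `rate_percent` and the dictionary labels are introduced here for illustration only, and the computed values may differ from the figures in the text in the last decimal place because of rounding.

```python
def rate_percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator expressed as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


# Counts taken from the text; the labels are illustrative.
COUNTS = {
    "samples with mNGS hits": (137, 466),
    "patients with mNGS hits": (125, 386),
    "patients with PCR-confirmed hits": (52, 386),
}

if __name__ == "__main__":
    for label, (n, total) in COUNTS.items():
        print(f"{label}: {n}/{total} = {rate_percent(n, total):.1f}%")
```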
(ii) Detection of sequences related to viruses with unknown pathogenicity. Sequences related to four recently discovered viruses (gemycircularviruses, WU polyomavirus, human pegivirus 2 [HPgV-2], and cyclovirus-VN) whose pathogenicity or tropism remains unknown, but whose genetic materials have been reported in human samples previously, were identified by mNGS in 3.4% of the samples from the 386 patients. [...] Because these viruses are common nonpathogenic infectious agents, they were not subjected to subsequent PCR confirmation testing.
(iii) Detection of sequences related to contaminants and/or viruses not previously reported in human samples. Sequences related to common contaminants of mNGS data sets (including a parvovirus-like hybrid virus [24] and Kadipiro virus [25]) were detected in 96 and 5 samples, respectively (see Table S4 in the supplemental material). Additionally, sequences related to numerous viruses that have not been reported in human tissues previously were also found (Table S4). Here we focus our analysis on viruses that have been reported in human tissues.
Viral detection by mNGS followed by PCR confirmation testing in different sample types. The detection rates for human viruses or viruses reported in human tissues were 8% (32/384) for sera, 41% (38/92) for nasal-throat swabs, and 50% (5/10) for stool samples, while all 6 CSF samples available from 40 patients presenting with CNS infection were negative. More viruses were found in pooled nasal-throat swabs than in samples of other types (Fig. 4). In the sera tested, 12 different viral species were detected, including the well-established human pathogens hepatitis B virus (HBV) (n = 9), enterovirus (EV) (n = 8), rotavirus A (n = 3), dengue virus (DENV) (n = 2), hepatitis C virus (HCV) (n = 2), human parechovirus (n = 1), human rhinovirus (HRV) (n = 1), and human immunodeficiency virus (HIV) (n = 1) (Fig. 4).
Viral detection in different patient groups and clinical entities by mNGS followed by PCR confirmation testing. The frequencies of different viral species detected in different clinical entities and patient groups are shown in Fig. 5 and Fig. S3 in the supplemental material. Regardless of the clinical sample type, the highest proportion of distinct viral infections was recorded in patients presenting with CNS infections (see Table S5 in the supplemental material).
Among the patients presenting with CNS infections, picornaviruses were the most common viruses detected (see Table S6 in the supplemental material); these included enterovirus, accounting for 7 of 15 (47%) viruses detected (6 in sera and 1 in a pooled nasal-throat swab), and HRV, detected in a serum sample. Rotavirus, a well-known cause of diarrhea, was detected in the blood of three diarrhea patients (two children and one adult).
In terms of age groups, EV and other respiratory viruses (e.g., respiratory syncytial virus [RSV] and HRV) were detected more frequently in children than in adults (Fig. 5). In contrast, blood-borne viruses (HIV, HCV, and HBV) were found more often in adults than in children (Fig. 5). Parechovirus, an established cause of pediatric infections, was detected in one adult presenting with a systemic infection.
Genetic characterization of EV and HBV. Excluding anellovirus-related sequences, mNGS generated sufficient sequence data for informative genetic characterization and phylogenetic inference of EV and HBV in 14 samples, including seven complete viral capsid protein 1 (VP1) sequences of enterovirus and seven complete HBV genomes. Phylogenetically, the seven EVs were classified into six different serotypes of enteroviruses A and B (echovirus 3, echovirus 6, echovirus 9, echovirus 16, coxsackievirus A2, and coxsackievirus A6), while the HBV strains belonged to genotypes B and C (see Fig. S4 and S5 in the supplemental material), supporting reports about circulating enterovirus serotypes and HBV genotypes in Vietnam (26–28).
For other viruses, due to the small numbers of genomic sequences recovered (two for DENV, two for gemycircularvirus, and one each for RSV, influenza B virus, HCV, measles virus, WU polyomavirus, and cyclovirus-VN), similar phylogenetic inference was deemed uninformative.

DISCUSSION
We present the results of mNGS for exploration of the human virome in 386 patients presenting with CA sepsis of unknown cause who were enrolled in a multicenter observational study across Vietnam from 2013 to 2015. We identified 21 viral species known to be infectious to humans in 52 (13.4%) of 386 patients presenting with CA sepsis of unknown cause. The study, however, cannot directly impute sepsis causation involving the viruses identified. More specifically, on several occasions, viral detection in nonsterile materials, such as respiratory samples (e.g., detection of Epstein-Barr virus [EBV] and cytomegalovirus [CMV]) and stool samples, may simply reflect the carriage of such viruses in those bodily compartments rather than a clinical association. Similarly, viral detection (e.g., enterovirus) in the blood of patients with asymptomatic infections has been reported previously (29). Additionally, the detection of blood-borne viruses, such as HBV, HIV, and HCV, in serum samples might represent underlying diseases and not the causative pathogens leading to the hospital admission, although the detection of HIV RNA in a serum sample of a patient presenting with systemic infection may suggest an acute HIV infection. However, together with the clinical and epidemiologic data, the results present a provocative argument for a wide range of viral pathogens that might be associated with CA sepsis in Vietnam.
Epidemiologically, our results support previous findings regarding the frequent detection of common viruses in corresponding clinical entities and age groups. Parechoviruses are a well-known cause of disease in children, ranging from acute gastrointestinal/respiratory infections to meningitis, but have increasingly been reported to cause infections in adults (30). Nonpolio enteroviruses, such as EV-A71 and EV-D68, have become serious global threats. In fact, EV-A71 has overwhelmed countries of the Asia-Pacific region (including Vietnam) with large outbreaks of severe hand-foot-and-mouth disease since 1997 (31,32). Recently, EV-D68 has emerged and caused large outbreaks of respiratory infections in the United States; this virus is epidemiologically linked with acute flaccid myelitis (33). The data presented here, together with the results of the original report (3), expand our knowledge about the clinical burden posed by nonpolio enteroviruses (HRV and particularly diverse EV serotypes) and parechoviruses in Vietnam.
mNGS detected several recently discovered viruses (Saffold virus, salivirus A, WU polyomavirus, gemycircularvirus, and HPgV-2), representing their first detection in Vietnam and adding to the growing literature about the geographic distribution of these newly identified viruses. Salivirus A has been linked to gastrointestinal infection, and Saffold virus has been reported in gastrointestinal and respiratory infection patients (34–37). Saffold virus has also been reported to be associated with myocarditis and aseptic meningitis (38,39). Additionally, using a mouse model, studies have shown the neurotropic potential of Saffold virus (39–41). The pathogenicity of WU polyomavirus, gemycircularvirus, and HPgV-2 remains unresolved. Likewise, it is imperative to conduct follow-up studies to determine whether the detected sequences that are related to viruses not previously reported in human tissues are derived from other sources and whether the respective viruses are infectious to humans.
The results of the present investigation also emphasize the utility of serum samples for assessing the etiology of sepsis. Indeed, viruses of the families Picornaviridae (enterovirus, rhinovirus, and parechovirus), Flaviviridae (DENV), and Reoviridae (rotavirus) were detected by mNGS in the sera included in this study. Notably, as per the design of the original etiological study, sera were not tested for these viruses by PCR (3). Likewise, while it remains unknown why the original study failed to detect common causes of respiratory/enteric infections (influenza A virus, influenza B virus, EV, etc.) in pooled nasal swabs by multiplex PCR assays (3), a slightly lower sensitivity of the multiplex PCR assays used than that of the respective monoplex PCR assays has been reported elsewhere (42).
Virus detection by mNGS is based on the detection of matching viral reads regardless of their number or the resulting genome coverage. While few metagenomic studies published to date have reported the use of specific PCR to verify metagenomic results subsequently, the failure of virus-specific PCR to confirm the original mNGS detections for many patients in the present study may be a consequence of cross talk (bleedthrough) contamination occurring as part of the sequencing procedure, a well-documented phenomenon (10,43,44). An alternative explanation is the low sensitivity, likely attributed to nucleotide mismatches, of some of the PCR primers used to confirm infection.
The absence of human viral pathogens in 87% of 386 patients may be attributed to the low sensitivity of our mNGS approach, especially in cases where the number of reads obtained was supposedly insufficient (Fig. S1 in the supplemental material), as suggested by the difference in the number of reads obtained between the groups of samples with and without a virus identified. Clearly, future research should address the question of what level of sequencing depth mNGS-based approaches need to achieve in order to reach the required sensitivity while maintaining cost-effectiveness. It is equally important to identify the factors (e.g., sample types and library preparation/sequencing methods) that may affect sequencing depth (i.e., the number of reads obtained) and assay sensitivity. Additional possibilities include the presence of the sepsis pathogen in nonanalyzed tissues, the presence of nonviral pathogens (e.g., bacteria and parasites) in tested specimens, and/or the inclusion of patients with no infection (e.g., those with conditions caused by toxicity whose clinical presentations mimic infections) in the study.
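One way to reason about the relationship between sequencing depth and sensitivity is a simple sampling model. The Python sketch below assumes, as a simplification that is not part of this study, that each read is drawn independently and is viral with a fixed probability `viral_fraction`; the number of viral reads among `depth` reads is then approximated by a Poisson distribution with mean `depth * viral_fraction`. The function names, the default detection threshold of one read, and the 95% target are all hypothetical choices made for illustration.

```python
import math


def detection_probability(depth: int, viral_fraction: float, min_reads: int = 1) -> float:
    """Probability of observing at least `min_reads` viral reads.

    Poisson approximation with mean depth * viral_fraction
    (reasonable when viral_fraction is small).
    """
    if depth < 0 or not 0.0 <= viral_fraction <= 1.0 or min_reads < 1:
        raise ValueError("invalid arguments")
    mean = depth * viral_fraction
    # P(X < min_reads) = sum over k < min_reads of exp(-mean) * mean**k / k!
    term = math.exp(-mean)
    below = 0.0
    for k in range(min_reads):
        below += term
        term *= mean / (k + 1)
    return 1.0 - below


def required_depth(viral_fraction: float, target: float = 0.95, min_reads: int = 1) -> int:
    """Smallest depth whose detection probability reaches `target`."""
    if not 0.0 < viral_fraction <= 1.0 or not 0.0 < target < 1.0:
        raise ValueError("invalid arguments")
    # Double the upper bound until it reaches the target, then bisect.
    hi = 1
    while detection_probability(hi, viral_fraction, min_reads) < target:
        hi *= 2
    lo = hi // 2  # lo never reaches the target; hi always does
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if detection_probability(mid, viral_fraction, min_reads) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Hypothetical viral fractions, not values measured in the study.
    for fraction in (1e-3, 1e-4, 1e-5):
        print(fraction, required_depth(fraction))
```

Under these assumptions the required depth scales inversely with the viral fraction, which is why samples with low viral loads or high host background are the ones most likely to be missed at a fixed depth. A real assay would also need to account for contamination thresholds and library complexity, which this sketch ignores.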
In summary, we report the application of mNGS for patients presenting with CA sepsis of unknown etiology. Our results highlight challenges in identifying possible viral culprits in patients with CA sepsis and show that diverse viral agents might be responsible for such devastating conditions in tropical settings such as Vietnam. Therefore, rigorous testing for a wide range of viral pathogens in samples from different body compartments collected early after symptom onset, when viral loads are usually highest, is likely to have the greatest yield. Under these circumstances, mNGS is a promising approach because of its capacity to simultaneously detect and genetically characterize viral pathogens in patient samples without the need for prior knowledge of genomic information about the targeted pathogens, thus enhancing the ability to identify infectious etiologies of sepsis and facilitating optimal targeted management.
Blood Systems Research Institute and the National Heart, Lung, and Blood Institute (grant R01 HL105770). We thank Le Kim Thanh for logistical support. We are indebted to the patients for their participation in this study. The