ABSTRACT
A viral whole-genome sequencing (WGS) strategy, based on PCR amplification followed by next-generation sequencing, was used to investigate a nosocomial respiratory syncytial virus-B (RSV-B) outbreak in a hematology-oncology and stem cell transplant unit. RSV-B genomes from 16 patients and health care workers (HCWs) suspected to be involved in the outbreak were compared to RSV-B genomes that were acquired from outpatients during the same time period but epidemiologically unrelated to the outbreak. Phylogenetic analysis of the whole genome identified a cluster of 11 patients and HCWs who had an identical RSV-B strain which was clearly distinct from strains recovered from individuals unrelated to the outbreak. Sequence variation of the glycoprotein (G) gene alone was insufficient to distinguish the outbreak strains from the outbreak-unrelated strains, thereby demonstrating that WGS is valuable for local outbreak investigation.
INTRODUCTION
Respiratory syncytial virus (RSV) is well known to cause significant morbidity and mortality in pediatric populations, especially in premature or very young infants, in patients with chronic heart or lung disease, and in the immunosuppressed (1–3). In elderly and high-risk adults, the overall burden of RSV infection is similar to that of influenza A illness (4). In transplant recipients, RSV is the second leading cause of respiratory virus infections (5). Progression of RSV to involve the lower respiratory tract is commonly associated with immunocompromised status. In allogeneic stem cell transplant recipients, lower respiratory tract RSV infection has been reported to have a mortality rate of 20% to 70% (6, 7).
Transmission of RSV can occur via direct viral inoculation of the eye and/or nose, by inhalation of respiratory droplets, or by indirect inoculation after contact with contaminated fomites (8). Outbreaks of RSV have occurred in a variety of health care settings, including infant wards, adult hematology and transplant units, and outpatient cancer centers (9, 10). Infected visitors or health care workers (HCWs) and patients with active illness or prolonged viral shedding can serve as sources for an outbreak. A recent study showed that prolonged viral shedding over 30 days in patients with hematological disorders was more commonly associated with RSV than with other respiratory viral pathogens (11).
Timely identification of the outbreak source is critical to allow implementation of infection control measures. Whole-genome sequencing (WGS) with next-generation sequencing technology has been increasingly applied to assess the epidemiological link between bacteria implicated in outbreaks (12–14). In the studies with bacterial pathogens, the targeted pathogen was first recovered by culture, and then whole-genome sequencing was performed with the nucleic acid extracted from the pure isolates. Using WGS for investigation of viral molecular epidemiology is more difficult due to less frequent use of viral culture for virus detection. A few studies have reported analysis of genetic diversity of global and local strains of influenza and RSV with WGS technology (15, 16). Only one recent study reported an investigation of hospital transmission of human parainfluenza virus 3 on a general medicine unit (17). In this study, we characterized an RSV-B outbreak in an adult stem cell transplant unit with WGS using the samples collected for respiratory virus PCR. The sequence data were used to identify genetic variations among patient- and HCW-associated strains, and in turn to define transmission pathways.
RESULTS AND DISCUSSION
Phylogeny analysis. Sixteen of 19 nasopharyngeal samples generated sufficient RSV-B PCR product for sequencing. Eleven samples were from patients (R1, R2, R5, R8, R10, R11, R12, R13, R14, R15, R16), and five samples (R3, R4, R6, R7, R9) were collected from HCWs who were in contact with the RSV-positive patients during the outbreak. A viral WGS strategy, based on PCR amplification followed by shotgun sequencing, was used to determine the phylogenetic relatedness of the 16 RSV-B strains from the suspected outbreak cluster and 8 strains (RC1 to RC8) amplified from outpatient samples that tested positive for RSV-B during the outbreak period. The reference genome, human respiratory syncytial virus wild-type strain B1 (AF013254), was >99% similar to all genomes recovered in this study, with 142 to 151 differences per genome. The number of mapped reads and genome recovery (ranging from 87% to 99%) are listed in Table 1. Phylogenetic analysis was also performed using only variant nucleotide positions within the glycoprotein (G) gene for the 16 RSV-B strains from the suspected outbreak cluster and seven outpatient strains. Sequence coverage of the G gene was insufficient for outpatient sample RC7, and thus the strain was not included in the G gene phylogenetic analysis.
Number of mapped reads and genome recovery
Phylogenetic analysis of the whole genome identified a cluster of 11 RSV-B strains with identical genomes (Fig. 1A). These strains were recovered from eight patients (R1, R2, R5, R8, R10, R12, R14, R15) and three HCWs (R4, R6, R9). The rest of the patients (R11, R13, R16) and two HCWs (R3 and R7) thought to be involved in the outbreak were infected with strains that were distinct and unrelated to the outbreak strain. As expected, the eight RSV-B strains recovered from the outpatients (RC1 to RC8) were diverse and were different from both the outbreak strain and the other strains circulating on the stem cell unit.
Phylogenetic trees reflecting the relationships of RSV-B strains based on whole-genome resequencing and analysis (244 positions) (A) and G gene sequence analysis only (16 positions) (B). The trees were generated as described in the text, and the scale bars represent the number of nucleotide differences per sequence. A cluster of 11 identical RSV-B strains, as described in the text, is indicated with an arrow. Nodes supported by either neighbor-joining (NJ) or maximum likelihood (ML) bootstrap values of >70% are indicated by values adjacent to each node, with NJ bootstrap values shown first and ML bootstrap values shown second. Nodes with black dots are supported by Bayesian analyses, with posterior probabilities of >95%.
G protein phylogeny for RSV is known to be robust but did not offer satisfactory resolving power for discrimination of outbreak strains in our study. Strains from three patients (R11, R13, R16) and four out of eight outpatients (RC3, RC4, RC5, RC8) were distinct, while the rest of the strains were indistinguishable (Fig. 1B). Only 16 nucleotide positions were variable across the data set within the G gene.
Epidemiologic investigation. The outbreak and patient characteristics were described previously (18). A summary of the outbreak investigation (Fig. 2) and a spot map (Fig. 3) of the 36-bed stem cell transplant unit with patients infected with RSV are shown. All eight patients who shared the outbreak strain (R1, R2, R5, R8, R10, R12, R14, and R15) were from the unit's west wing. The three patients with RSV-B strains distinct from the outbreak strain (R11, R13, and R16) were from the unit's east wing. While three HCWs involved in the outbreak carried the outbreak strain (R4, R6, and R9), the other two HCWs had unrelated nonoutbreak strains (R3 and R7). These data support the hypothesis that this outbreak included eight patients from the west wing of the unit and three HCWs. The other three patients from the east wing of the unit and two HCWs investigated in the outbreak were not part of the outbreak and likely acquired their RSV from the community outbreak occurring at the time.
Summary of the RSV-B outbreak investigation. The outbreak included eight patients from the west wing of the HSCT unit and three HCWs working in the unit. The other three patients were from the east wing of the unit, and two HCWs implicated in the outbreak were determined not to be part of the outbreak.
Diagram of the 36 beds of the stem cell transplant unit showing patients infected with RSV-B along with the time of symptom onset. Patients infected with the RSV-B outbreak strain are marked with ○. Patients with distinct RSV-B strains are represented with ◽, △, and ♢. WK indicates the week number in the outbreak in which each patient developed symptoms.
Traditionally, RSV genotyping was focused on analysis of a complete or partial sequence of the attachment glycoprotein (G) (19). The G protein mediates virus binding to cells. During natural propagation of RSV in infected patients, sequence changes accumulate quickly, especially in the C terminus, the second hypervariable region of the protein (20). Complete or partial G gene sequences are commonly utilized to distinguish the two RSV groups (A and B) and the various genotypes within each group. G gene sequence-based RSV genotyping has been widely used for global epidemiological study of RSV. Evidence indicates that analysis of genetic diversity across the entire RSV genome is necessary for investigation of transmission of RSV strains over short periods of time. Agoti et al. compared RSV genomes with identical G genes and suggested that genotyping based on the whole genome distinguished the RSV strains with identical G genes and increased the sensitivity for tracking RSV transmission over a short period of time (15). Results from our study were consistent with this finding. Variation of G gene sequences was insufficient to distinguish the outbreak strains from the outbreak-unrelated strains. Phylogenetic analysis of G gene sequences was able to identify RSV-B strains from only three patients (R11, R13, R16) and four out of eight outpatients (RC3, RC4, RC5, RC8) as distinct strains; the rest of the strains were indistinguishable. In contrast, phylogenetic analysis of the whole-genome sequences clearly separated the outbreak strain from outbreak-unrelated outpatient strains. The mutation rate of the G gene is 2.78 × 10⁻³ to 3.5 × 10⁻³ changes/nucleotide/year, while the RSV-B genome evolves at an estimated rate of 6.81 × 10⁻⁴ to 8.62 × 10⁻⁴ changes/nucleotide/year (21, 22). Even though the G gene evolves faster than the whole genome, the difference in mutation rate cannot be detected in strains collected over a short period of time.
The whole-genome analysis is based on more positions for evolution to act upon and therefore has greater phylogenetic resolution.
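The argument above can be made concrete with a rough calculation. The following Python sketch multiplies the substitution rates quoted in the text by a sequence length and by the 8-week outbreak window to give the expected number of changes along a single lineage. The G gene and genome lengths used here (about 900 and 15,200 nucleotides) are approximate values assumed for illustration and are not taken from this study; the calculation also ignores rate variation among sites.

```python
# Expected number of substitutions along one lineage during the outbreak window.
WEEKS = 8
YEARS = WEEKS / 52.0

# Rate ranges quoted in the text (changes/nucleotide/year).
G_GENE_RATE = (2.78e-3, 3.5e-3)
GENOME_RATE = (6.81e-4, 8.62e-4)

# ASSUMPTION: approximate sequence lengths, not reported in this study.
G_GENE_LENGTH = 900
GENOME_LENGTH = 15_200


def expected_changes(rate, length, years):
    """Expected substitutions = rate x number of sites x elapsed time."""
    return rate * length * years


def expected_range(rates, length, years=YEARS):
    """Apply expected_changes to the low and high ends of a rate range."""
    low, high = rates
    return (expected_changes(low, length, years),
            expected_changes(high, length, years))


g_low, g_high = expected_range(G_GENE_RATE, G_GENE_LENGTH)
wg_low, wg_high = expected_range(GENOME_RATE, GENOME_LENGTH)

print(f"G gene:       {g_low:.2f} to {g_high:.2f} expected changes")
print(f"Whole genome: {wg_low:.2f} to {wg_high:.2f} expected changes")
```

Under these assumed lengths, fewer than one change is expected in the G gene over 8 weeks, whereas roughly 1.5 to 2 changes are expected across the whole genome. The lower per-site rate of the genome is outweighed by its much larger number of sites.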
In our study, comparison of RSV-B genomes from 16 patients and HCWs suspected of being involved in the nosocomial RSV-B outbreak in a hematology-oncology and stem cell transplant unit with the RSV-B genomes from patients epidemiologically unrelated to the outbreak clearly identified a cluster of 11 patients and HCWs with an identical RSV-B strain and distinguished them from the patients unrelated to the outbreak. Investigation of patient geographic location provided additional evidence to support the genetic relatedness of the RSV-B genome revealed by the WGS typing, demonstrating person-to-person transmission. The information indicated that five patients and HCWs were misclassified as part of the original outbreak.
The patient contact history of each HCW during the outbreak was not available. The five HCWs implicated in the outbreak included a social worker, a stem cell coordinator, a food service worker, a palliative care physician, and a nurse. They likely worked on both the west and east sides of the unit. Three of these HCWs (the social worker, the stem cell coordinator, and the nurse) were infected with the outbreak strain. Although the index case for the outbreak could not be determined, our study showed that HCWs were important vectors for pathogen spread. Many hospitals have policies to restrict HCWs with acute respiratory symptoms from contact with immunocompromised and other at-risk patients (10, 23). This practice is not sufficient, however, to prevent transmission of pathogens from asymptomatic carriers. Further investigation is required to identify additional practical approaches to reduce the risk of transmission from colonized personnel in a high-risk patient population.
The purpose of the study was to determine whether WGS would be able to separate patients with cases of RSV-B transmission from patients with strains unrelated to the transmission in an outbreak over 8 weeks. We showed that WGS is a valuable tool for a local outbreak investigation compared to the traditional G gene-based analysis. Accurately identifying transmission and defining outbreak boundaries provide critical information that allows implementation of appropriate infection control and prevention measures. Delay in detection of patients and HCWs involved in transmission due to lack of symptoms often results in propagation of the outbreak. In this outbreak, RSV-B transmission occurred in eight patients in spite of single-patient rooms and stringent infection control measures. We showed that WGS provides genotyping with an increased resolution and can be used to screen a large number of patients and HCWs once a small cluster of patients appears.
MATERIALS AND METHODS
Ethics statement. This was a retrospective study. Samples used for the study were residual clinical specimens not specifically collected for study purposes. Patient identifiers were removed and study numbers were assigned. The study did not involve subject participation or clinical information collection; thus, subject consent was waived by the Northwestern University Institutional Review Board.
Setting. Northwestern Memorial Hospital (NMH) is an 894-bed tertiary care academic medical center in Chicago, IL, offering health care for adults. The hematology/oncology and stem cell transplant unit consists of 36 single-occupancy rooms. Most patients in the unit are admitted before or after hematopoietic stem cell transplantation (HSCT). These include patients undergoing conditioning for HSCT, receiving HSCT, and undergoing monitoring during the preengraftment period. Patients with HSCT-associated conditions, such as graft-versus-host disease, as well as patients with hematologic malignancies or autoimmune diseases who are undergoing induction or consolidation chemotherapy before HSCT, are also frequently admitted to this unit. The unit is divided into east and west wings separated by a hallway and two sets of double doors. Nursing staff may care for patients on both sides of the unit during the course of a week. Other HCWs care for patients from either side as needed.
Outbreak and sample collection. The outbreak was detected in 2015 when an infection preventionist (IP) recognized a cluster of three patients in the HSCT unit who tested positive for RSV-B with GenMark's respiratory viral panel. Turnaround time for the test in our hospital is 24 h to 48 h. Upon further investigation, 11 additional patients with a diagnosis of RSV-B infection were identified during or shortly after admission to the unit over an 8-week period; these cases occurred during a nationwide and community outbreak of RSV. Screening of all asymptomatic HCWs and unit staff for RSV-B by nasopharyngeal swab with GenMark's respiratory viral panel identified five HCWs carrying RSV-B; thus, 19 individuals with RSV-B were identified during this outbreak investigation.
Nucleic acid was extracted from 200 μl of viral transport medium containing nasopharyngeal samples from the 14 RSV-B-positive patients, 5 HCWs, and 8 outpatients submitted for routine clinical testing for respiratory viruses. Extraction was performed with the QIAsymphony automated extraction system and the QIAsymphony DSP virus/pathogen kit (Qiagen, Inc., Hilden, Germany).
Library construction and DNA sequencing. Twenty-five overlapping pairs of primers (Table 2) were designed to amplify 600- to 700-bp amplicons based on the complete genome sequence of the human respiratory syncytial virus wild-type strain B1 (GenBank accession number AF013254.1). Reverse transcription (RT)-PCR was performed with the SuperScript III RT-PCR system containing Platinum Taq DNA polymerase with random primers (Invitrogen, Carlsbad, CA). Each fragment was amplified with RSV-specific primers under the following conditions: 94°C for 3 min, followed by 30 cycles of 94°C for 30 s, 55°C for 30 s, and 68°C for 30 s. The 25 PCR products from each sample were quantified and pooled for next-generation sequencing. Pooled amplicons were prepared for sequencing using the Nextera XT DNA library preparation kit (Illumina, San Diego, CA), according to the manufacturer's instructions. Barcoded libraries were pooled and sequenced using an Illumina MiSeq sequencer, employing V2 chemistry with paired-end 2 × 250-base reads. Demultiplexing of the sequence data was performed on the instrument. Best practices recommended by Illumina were followed.
Primers used for RSV-B genome amplification
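The tiling logic behind an overlapping amplicon scheme can be sketched as follows. This Python example spaces 25 equal-length windows evenly across a genome so that the first window starts at the first base and the last window ends at the last base. It is an illustration only and is not the design procedure used in this study: the genome length of 15,225 nucleotides and the 650-bp amplicon length are assumed values, the function name `tile_amplicons` is hypothetical, and real primer design must also account for melting temperature, sequence conservation, and primer binding sites.

```python
def tile_amplicons(genome_length, n_amplicons, amplicon_length):
    """Return evenly spaced (start, end) windows, 0-based and end-exclusive.

    The first window starts at 0 and the last window ends at genome_length.
    """
    if n_amplicons < 2:
        raise ValueError("need at least two amplicons to tile")
    if n_amplicons * amplicon_length < genome_length:
        raise ValueError("amplicons are too short to cover the genome")
    span = genome_length - amplicon_length  # start position of the last window
    windows = []
    for i in range(n_amplicons):
        start = (i * span) // (n_amplicons - 1)  # integer arithmetic, no rounding drift
        windows.append((start, start + amplicon_length))
    return windows


# ASSUMPTION: illustrative values, not taken from Table 2.
GENOME_LENGTH = 15_225
windows = tile_amplicons(GENOME_LENGTH, n_amplicons=25, amplicon_length=650)

# Overlap between each pair of neighbouring windows.
overlaps = [prev_end - next_start
            for (_, prev_end), (next_start, _) in zip(windows, windows[1:])]

print(f"first window: {windows[0]}, last window: {windows[-1]}")
print(f"overlap between neighbours: {min(overlaps)} to {max(overlaps)} bp")
```

Every position in the genome falls inside at least one window as long as all overlaps are positive, which is the property that lets pooled amplicons be assembled against the reference.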
Data analysis. Sequence data were processed with the software package SPANDx using an RSV genome (AF013254) as a reference (24). SPANDx is a comparative genomic analysis tool for haploid organisms. It incorporates Burrows-Wheeler aligner (BWA) for read alignment mapping, SAMtools for read filtering and parsing, BEDTools for genetic locus presence/absence (P/A) determination, Picard (see http://picard.sourceforge.net) for data filtering, the Genome Analysis Toolkit (GATK) for realignment around insertion-deletion (indel) regions, base quality score recalibration, variant determination, data filtering, and improved indel calling, VCFtools for single-nucleotide polymorphism (SNP) and indel matrix construction, and SnpEff for variant annotation (25–30).
The final output from the SPANDx pipeline was a nucleotide matrix derived from genomic positions which varied in at least one of the 24 genomes. This sequence matrix, consisting of 244 nucleotide positions, was subsequently used for phylogeny, employing neighbor-joining (NJ), maximum likelihood (ML), and Bayesian analyses. For the G gene-only analysis, only 23 strains could be compared, with 16 variable positions. A neighbor-joining phylogenetic tree was constructed with the sequence matrix, with distances expressed as the number of base differences per sequence. The robustness of both NJ and ML inferred tree topologies was evaluated by 1,000 bootstrap resamplings of the data. The phylogenetic trees were compared, and nodes that were not supported by bootstrap values of 70% or higher for at least one of the methods were treated as polytomies. In addition, Bayesian analyses were performed on the aligned sequence data by running five simultaneous chains (four heated, one cold) for one million generations, sampling every 1,000 generations. The selected model was the general time reversible (GTR) model, using empirical base frequencies and estimating the shape of the gamma distribution and proportion of invariant sites from the data. A resulting 50% majority-rule consensus tree (after discarding the burn-in of 25% of the generations) was determined to calculate the posterior probabilities for each node. The split-differential at 1 million generations was below 0.01. NJ and ML phylogenetic analyses were performed using the software package MEGA6 (31), and Bayesian analyses were performed using the software package MrBayes v3.1.2 (32). SPANDx analysis was performed in UIC's Center for Research Informatics (CRI).
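The two simplest steps described above, reducing aligned genomes to a matrix of variable positions and counting base differences between strains, can be illustrated with a short Python sketch. It is not the SPANDx or MEGA6 implementation; the function names and the toy sequences (labeled ref, s1, s2, and s3) are hypothetical, and ambiguous or missing bases are simply skipped.

```python
from collections import defaultdict
from itertools import combinations

MISSING = "N-"  # characters treated as uncalled bases


def variable_positions(aligned):
    """Indices at which at least two different called bases are observed."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    positions = []
    for i in range(lengths.pop()):
        called = {seq[i] for seq in aligned.values() if seq[i] not in MISSING}
        if len(called) > 1:
            positions.append(i)
    return positions


def snp_matrix(aligned, positions):
    """Keep only the variable positions for every strain."""
    return {name: "".join(seq[i] for i in positions)
            for name, seq in aligned.items()}


def pairwise_differences(matrix):
    """Number of base differences for every pair of strains."""
    return {
        (a, b): sum(1 for x, y in zip(matrix[a], matrix[b])
                    if x != y and x not in MISSING and y not in MISSING)
        for a, b in combinations(matrix, 2)
    }


def identical_groups(matrix):
    """Groups of two or more strains with identical variable-site profiles."""
    groups = defaultdict(list)
    for name, profile in matrix.items():
        groups[profile].append(name)
    return [names for names in groups.values() if len(names) > 1]


# Toy alignment (hypothetical data, 10 positions).
toy = {
    "ref": "ACGTACGTAC",
    "s1":  "ACGTTCGTAC",
    "s2":  "ACGTTCGTAC",
    "s3":  "ACCTACGTAA",
}

positions = variable_positions(toy)
matrix = snp_matrix(toy, positions)
distances = pairwise_differences(matrix)

print(positions)                 # [2, 4, 9]
print(distances[("s1", "s2")])   # 0
print(identical_groups(matrix))  # [['s1', 's2']]
```

In this toy example, s1 and s2 share an identical profile and would form a cluster, in the same way that identical genomes define the outbreak cluster in the whole-genome tree. Bootstrap support could be approximated by resampling the columns of the matrix with replacement and repeating the distance calculation, which is omitted here for brevity.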
Accession number(s). The sequences reported in this paper have been deposited in the NCBI Sequence Read Archive under BioProject number PRJNA371804.
ACKNOWLEDGMENT
This work was performed at Northwestern Memorial Healthcare in Chicago, IL.
FOOTNOTES
- Received 1 March 2017.
- Returned for modification 6 April 2017.
- Accepted 3 July 2017.
- Accepted manuscript posted online 26 July 2017.
Supplemental material for this article may be found at https://doi.org/10.1128/JCM.00360-17.
- Copyright © 2017 American Society for Microbiology.