Journal of Clinical Microbiology
Virology

Comprehensive Human Virus Screening Using High-Throughput Sequencing with a User-Friendly Representation of Bioinformatics Analysis: a Pilot Study

Tom J. Petty, Samuel Cordey, Ismael Padioleau, Mylène Docquier, Lara Turin, Olivier Preynat-Seauve, Evgeny M. Zdobnov, Laurent Kaiser
M. J. Loeffelholz, Editor
Tom J. Petty
aDepartment of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
bSwiss Institute of Bioinformatics, Geneva, Switzerland
Samuel Cordey
cDivision of Infectious Diseases, Laboratory of Virology and Division of Laboratory Medicine, University Hospitals of Geneva, Geneva, Switzerland
dDepartment of Medicine, University of Geneva Medical School, Geneva, Switzerland
Ismael Padioleau
aDepartment of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
bSwiss Institute of Bioinformatics, Geneva, Switzerland
Mylène Docquier
eUniversity of Geneva Medical School and Genomics Platform, Geneva, Switzerland
Lara Turin
dDepartment of Medicine, University of Geneva Medical School, Geneva, Switzerland
Olivier Preynat-Seauve
fLaboratory of Immunohematology, Hematology Unit, Department of Genetic and Laboratory Medicine, University Hospitals of Geneva, University of Geneva, Geneva, Switzerland
Evgeny M. Zdobnov
aDepartment of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
bSwiss Institute of Bioinformatics, Geneva, Switzerland
gImperial College London, South Kensington Campus, London, United Kingdom
Laurent Kaiser
cDivision of Infectious Diseases, Laboratory of Virology and Division of Laboratory Medicine, University Hospitals of Geneva, Geneva, Switzerland
dDepartment of Medicine, University of Geneva Medical School, Geneva, Switzerland
M. J. Loeffelholz
Roles: Editor
DOI: 10.1128/JCM.01389-14

ABSTRACT

High-throughput sequencing (HTS) provides the means to analyze clinical specimens in unprecedented molecular detail. While this technology has been successfully applied to virus discovery and other related areas of research, HTS methodology has yet to be exploited for use in a clinical setting for routine diagnostics. Here, a bioinformatics pipeline (ezVIR) was designed to process HTS data from any of the standard platforms and to evaluate the entire spectrum of known human viruses at once, providing results that are easy to interpret and customizable. The pipeline works by identifying the most likely viruses present in the specimen given the sequencing data. Additionally, ezVIR can generate optional reports for strain typing, can create genome coverage histograms, and can perform cross-contamination analysis for specimens prepared in series. In this pilot study, the pipeline was challenged using HTS data from 20 clinical specimens representative of those most often collected and analyzed in daily practice. The specimens (5 cerebrospinal fluid, 7 bronchoalveolar lavage fluid, 5 plasma, 2 serum, and 1 nasopharyngeal aspirate) were originally found to be positive for a diverse range of DNA or RNA viruses by routine molecular diagnostics. The ezVIR pipeline correctly identified 14 of 14 specimens containing viruses with genomes of <40,000 bp, and 4 of 6 specimens positive for large-genome viruses. Although further validation is needed to evaluate sensitivity and to define detection cutoffs, results obtained in this pilot study indicate that the overall detection success rate, coupled with the ease of interpreting the analysis reports, makes it worth considering using HTS for clinical diagnostics.

INTRODUCTION

Over the last decade, high-throughput sequencing (HTS) has provided unprecedented opportunities for advancement in the field of virology (1–3). To date, HTS has been widely used for microbiome analysis (4–7), whole-genome sequencing (8–12), quasispecies population analysis (13–16), and the discovery of novel viruses (17–21). As a result, various methods and web-based services, such as VIDISCA-454 (22), Pathoscope (23), MetaVir (24), VirusHunter (25), or VirusFinder (26), have emerged. However, each is specific to a particular HTS platform, and interpretation of results remains convoluted to non-experts in bioinformatics. Yet it is reasonable to consider that this technology will soon be adopted for use in routine clinical diagnostics, as HTS has the potential to improve standard diagnostics in many ways, such as providing the means to identify unexpected pathogens, increasing assay sensitivity, and detecting viruses for which no assay exists. Even commercial HTS-based virus typing assays are beginning to emerge, such as the PathAmp FluA reagent (Life Technologies). An additional motivating factor for using HTS is that infectious disease specialists must often handle cases where viral origin is highly suspected, but where all microbiological test results are negative or inconclusive. HTS can provide the means for an alternative diagnostic tool to analyze such specific, potentially life-threatening cases.

To this end, the commonly used Roche-454 and Illumina HTS platforms were recently evaluated for their capacity to detect viruses either using artificially spiked specimens (27) or with specific preselected clinical specimens (22, 28). However, in the latter studies, the analysis and interpretation were focused on the capacity of HTS to detect the known target(s). Consequently, there was little evaluation of the capacity of these methodologies for clinical diagnostics, a situation where “background” sequences play an important role in how results are interpreted. For example, clinical specimens often contain common circulating viruses, such as Epstein-Barr virus (EBV), human herpesvirus 6 (HHV-6), and torque teno virus (TTV), whose presence and quantity vary with specimen type (e.g., blood versus cerebrospinal fluid) and can potentially mask the signal of the causative pathogen.

Therefore, there is a need to benchmark the ability of HTS to provide clear and unbiased viral diagnostic results using a set of diverse, clinically relevant positive specimens. The current lack of diagnostics-oriented HTS investigations in the literature is likely due to the difficulty in displaying and interpreting the complicated data. Often, such results are incomprehensible to non-specialists in bioinformatics (29). Furthermore, both the high cost of and the time needed for the computational analysis remain nontrivial aspects that must be overcome. This point is important, since performance (i.e., total amount of sequence data obtained, sequence length, etc.) provided by the different platforms has greatly improved in recent years, but the minimal cost for users seems to have reached a ceiling. In other words, while the standard cost of sequencing remains the same (e.g., approximately $1,000 to $1,500 for one paired-end sequencing run using an Illumina HiSeq 2500 as in this pilot study), more information can now be generated for the same price. It is therefore possible to analyze multiple specimens in the same sequencing run (termed multiplexing), which significantly reduces the cost per specimen while still generating sufficient amounts of sequence data per specimen. However, HTS procedures need to be validated, as has been done for other virological diagnostic assays, while accounting for the specificity of this technology. To this end, the specimens analyzed in this pilot study encompass a wide range of human DNA and RNA viruses of various genome lengths that are representative of common viral infections (Table 1).

TABLE 1

Specimen information and HTS summary statistics

In this pilot study, we present ezVIR as a proof of principle for using HTS for clinical diagnostics. We aimed to demonstrate that HTS can be effectively used in a clinical setting to help non-specialists in bioinformatics make more informed decisions, as all detection results are easy to understand. This bioinformatics pipeline was developed using positive clinical specimens and is designed to process HTS data and provide easy-to-interpret results independently of the HTS platform used. Given the elevated sensitivity of HTS technology, we also designed a tool to identify cross-contamination in situations where specimens were prepared simultaneously. The results are reported in two phases to enable rapid identification and subsequent typing of the identified viruses, including the ability to customize the reports.

MATERIALS AND METHODS

Clinical specimen selection. Cerebrospinal fluid (CSF), bronchoalveolar lavage (BAL) fluid, serum, plasma, and nasopharyngeal aspirate (NPA) analyzed in this pilot study were selected from specimens submitted between January and September 2013 to the Laboratory of Virology from the University Hospitals of Geneva, Switzerland. The patients' ages ranged from 10 months to 88 years (median, 42 years old). Specimens were randomly selected with the criteria of obtaining one representative of different clinically relevant virus species and genome properties per specimen type, each with an average viral load determined by semiquantitative (e.g., threshold cycle [CT]) and quantitative (quantitative PCR [qPCR]) molecular testing (Table 1). Specimens were identified to be positive for either human rhinovirus (HRV), echovirus (EchoV; an enterovirus), parechovirus (PeV), influenza A virus (IAV), measles virus (MeV), human metapneumovirus (hMPV), human parainfluenza type 1 (hPIV-1), hPIV-3, hPIV-4, human immunodeficiency virus type 1 (HIV-1), hepatitis C virus (HCV), parvovirus B19 (ParvoB19), herpes simplex virus 1 (HSV-1), HSV-2, varicella-zoster virus (VZV; also known as human herpesvirus 3), cytomegalovirus (CMV; human herpesvirus 5), Epstein-Barr virus (EBV; human herpesvirus 4), BK virus (BKV), or JC virus (JCV) by in-house assays or commercial real-time PCR (RT-PCR). After routine screening, each specimen was immediately stored at −20°C (blood specimens) or −80°C (CSF, BAL fluid, NPA). The ethics committee of the Geneva University Hospitals approved this study and determined that no informed consent was required.

Viral nucleic acid extraction. For each specimen analyzed, 110 μl was centrifuged at 10,000 × g for 10 min. One hundred microliters of cell-free supernatant was collected and treated with 20 U of Turbo DNase (Ambion, Rotkreuz, Switzerland) to degrade non-particle-protected DNA. Two nucleic acid extraction procedures were then used for RNA and DNA virus genome extraction. RNA virus genome extraction was performed with TRIzol according to the manufacturer's instructions (Invitrogen, Carlsbad, CA, USA). The RNA pellet was resuspended in 20 μl of RNase-free water (Promega, Dübendorf, Switzerland). DNA virus genome extraction was performed with a NucliSens easyMAG magnetic bead system (bioMérieux, Geneva, Switzerland) according to the manufacturer's instructions, using an elution volume of 25 μl. Subsequent double-stranded DNA synthesis was performed as described by De Vries et al. (22, 28) with some modifications. A 120-μl mixture containing 5 U of a 3′-5′ exo− Klenow fragment (New England BioLabs, Ipswich, MA, USA), 3 μg of random hexamers (Invitrogen), 0.4× Escherichia coli ligase buffer (Invitrogen), 1.8 mM MgCl2, 0.75 mM dithiothreitol (DTT), 0.3 mM dNTPs, and 16 μg/ml RNase A (Sigma-Aldrich, Buchs, Switzerland) was added to 20 μl of extracted DNA, incubated 1.5 h at 37°C, and subjected to a phenol-chloroform extraction and ethanol precipitation. The DNA pellet was resuspended in 20 μl of RNase-free water (Promega).

High-throughput DNA sequencing (DNA-seq) library preparation. Nine specimens were prepared. The volume of the specimens was reduced to 5 μl to measure the concentration. For 8 of the 9 specimens, the amount of starting material was 1 ng (representing approximately one-third of the total amount of material). The totality of specimen 09 was used, since the concentration was below the limit of detection. Libraries were prepared using the Illumina Nextera XT protocol (12 PCR cycles). Library concentrations were measured with a Qubit (Life Technologies, Carlsbad, CA, USA). Only the most concentrated library (specimen 03) was detectable by a 2200 TapeStation (Agilent, Santa Clara, CA, USA). Fragments of 150 to 450 bp were obtained.

High-throughput RNA sequencing (RNA-seq) library preparation. Eleven specimens were prepared. The starting amount was unknown (below the limit of detection) for 10 of the 11 specimens. For specimen 17, the starting material was 80 ng. The rRNA was removed using a Ribo-Zero Gold kit (Epicentre, Madison, WI, USA) according to the manufacturer's protocol. rRNA-depleted specimens were purified on Zymo columns. For specimen 20, a poly(A) depletion using a TruSeq Stranded mRNA kit (Illumina, San Diego, CA, USA) was performed after removal of rRNA. Libraries were prepared with the low-throughput TruSeq total RNA preparation protocol from Illumina (San Diego, CA, USA) using 15 PCR cycles. Library concentrations were measured with a Qubit. Size distribution of fragments was estimated with a 2200 TapeStation. Fragments of 200 to 450 bp were obtained.

High-throughput sequencing. All specimens were sequenced (paired end [PE]) using the 100-bp protocol with indexing on a HiSeq 2500 (Illumina) sequencer in pools of two specimens per lane. RNA-seq libraries were loaded at 8 pM. DNA-seq libraries were loaded at 20 pM or lower for low-concentrated libraries (specimen 01, specimen 02, and specimen 08 were loaded at 10, 13, and 16 pM, respectively). The specimen 03 library size was used to calculate molarity.

Virus database. Complete mammalian and avian virus genome sequences were collected from EMBL, ViralZone, and NCBI databases, as no single comprehensive collection of full-length virus genome sequences currently exists (for more detail, see Table S1 in the supplemental material). Briefly, all genomes listed in both EMBL virus (http://www.ebi.ac.uk/genomes/virus.html) and ViralZone (http://viralzone.expasy.org/all_by_species/678.html) collections were merged into one database. Complete virus sequences from any missing virus families were then downloaded from NCBI and combined with the EMBL and ViralZone sequences. All duplicates and any unverified genomes were removed. Furthermore, any genomes labeled as “recombinant” or “clone” were carefully inspected and conserved only if pertinent.
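The merge-and-clean procedure described above could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `GenomeRecord` fields, the rule that two entries are duplicates when their sequences are identical, and the keyword matching on the description are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomeRecord:
    # Hypothetical record layout; the real database schema is not given in the text.
    accession: str
    description: str
    sequence: str
    source: str  # e.g. "EMBL", "ViralZone", "NCBI"


def merge_collections(*collections):
    """Merge record lists into one set of genomes.

    Drops entries flagged as unverified and entries whose sequence was
    already seen (assumed definition of a duplicate). Records described as
    "recombinant" or "clone" are set aside for manual inspection.
    Returns (kept, needs_review).
    """
    seen = set()
    kept, needs_review = [], []
    for records in collections:
        for rec in records:
            text = rec.description.lower()
            if "unverified" in text:
                continue
            key = rec.sequence.upper()
            if key in seen:
                continue
            seen.add(key)
            if "recombinant" in text or "clone" in text:
                needs_review.append(rec)
            else:
                kept.append(rec)
    return kept, needs_review
```

Records returned in `needs_review` would still require the manual inspection the text describes; the sketch only separates them out.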

Virus genome bin size selection. A histogram of genome lengths for all genomes in the ezVIR database was generated in order to determine optimal bin size values for the data point dot circumference (genome size) groups shown on the ezVIR plots (see Fig. S1 in the supplemental material) (Fig. 1C). The default bin cutoff values are 4,000, 20,000, and 40,000 bp; however, these cutoff values can be customized by the user in order to modify the appearance of viruses on result plots, depending on the aim of the particular study or application.
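As an illustration of how the default cutoffs partition genomes into size groups, a small sketch is given below. The function name and the handling of lengths that fall exactly on a cutoff (assigned to the larger group) are assumptions; the text specifies only the cutoff values and that they are user-adjustable.

```python
from bisect import bisect_right

# Default cutoffs from the text, in bp.
DEFAULT_CUTOFFS = (4_000, 20_000, 40_000)


def genome_size_group(genome_length, cutoffs=DEFAULT_CUTOFFS):
    """Return the index (0 = smallest) of the size group for a genome length in bp.

    With three cutoffs there are four groups. A length equal to a cutoff is
    placed in the larger group (an assumption for this sketch).
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return bisect_right(sorted(cutoffs), genome_length)
```

For example, a genome of about 7,200 bp falls in group 1 and one of about 152,000 bp in group 3; passing a different `cutoffs` tuple changes the grouping without touching the rest of the code.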

FIG 1

ezVIR pipeline, metrics, and reporting. (A) ezVIR pipeline overview. DB, database. (B) Three metrics were reported for each detected virus. Percent genome coverage reports all regions of the genome (blue) covered by at least one read (purple) divided by genome length. Maximum (max) depth of coverage refers to the average number of reads covering the genome in a 50-bp sliding window; the window slides along the genome and the maximum value is reported. Total covered length is the total number of bases detected for the genome. Simple equations demonstrate how each is calculated (results are in green). (C) ezVIR reporting features, including the type of data contained in each report, analysis options, data point information, and clinically relevant virus family colors. ID, identification number. (D) Examples of phase 1 reports (using specimens 01 and 16, which were found to be positive for herpes simplex virus 1 and HRV-66 by traditional routine clinical diagnostics, respectively). Plots depict six metrics per identified virus: (i) virus type, (ii) virus family (color of dot and label), (iii) percent genome coverage, (iv) maximum coverage depth, (v) total covered length in base pairs (represented by the area of the colored dot), (vi) genome size group (gray outer ring).

Bioinformatics pipeline. The specimen libraries were bar-coded and sequenced in a multiplexed reaction, and the resulting PE reads (100 nucleotides each) were demultiplexed (libraries were separated by their index). To remove human sequences, reads are first mapped (Bowtie2) (30) to the human genome (NCBI GRCh37). In virus identification phase 1, the remaining nonhuman reads are mapped (Bowtie2) to a comprehensive, manually curated database containing 11,018 complete virus sequences (see Table S1 and Fig. S1 in the supplemental material). To increase the sensitivity of detection, all mate-pair mappings, for all reads and for every genome, are retained. By exposing each genome to all reads during this stage, the pipeline is able to determine the most likely viruses (as the best-mapped genome for each virus species [genus for herpesviruses]) given the sequence data. After the mapping stages, the pipeline computes genome detection metrics (defined below), summarizes data, and generates reports at two levels of sensitivity: phase 1, general virus identification (positive targets representing the strongest signal from each virus species [genus for herpesviruses] detected); and phase 2, targeted strain detection and genome coverage statistics (Fig. 1A).
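The two mapping stages might be driven from Python along the following lines. The exact Bowtie2 parameters used by ezVIR are not stated in the text, so the options here are assumptions: `--un-conc` is used to keep read pairs that do not align concordantly to the human index, and `-a` is used as one way to retain all alignments for every genome. Index names and file names are placeholders.

```python
def human_filter_cmd(human_index, reads_1, reads_2, nonhuman_pattern, threads=8):
    """Bowtie2 call that writes read pairs NOT aligning concordantly to the human index.

    nonhuman_pattern should contain '%', which Bowtie2 replaces with the
    mate number (1 or 2) when naming the output files.
    """
    return [
        "bowtie2", "-p", str(threads),
        "-x", human_index,
        "-1", reads_1, "-2", reads_2,
        "--un-conc", nonhuman_pattern,
        "-S", "/dev/null",  # the human alignments themselves are not needed here
    ]


def virus_mapping_cmd(virus_index, nonhuman_pattern, out_sam, threads=8):
    """Bowtie2 call that maps the nonhuman pairs to the virus index, reporting all alignments."""
    return [
        "bowtie2", "-p", str(threads),
        "-a",          # report every valid alignment (assumed setting)
        "--no-unal",   # omit reads that align to no virus genome
        "-x", virus_index,
        "-1", nonhuman_pattern.replace("%", "1"),
        "-2", nonhuman_pattern.replace("%", "2"),
        "-S", out_sam,
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where Bowtie2 and the two indexes are available; building the commands as lists keeps the sketch testable without running the aligner.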

After mapping to all genomes in the database, three metrics are calculated for every detected virus genome: percent genome coverage, maximum depth of coverage (using a sliding window of 50 bp), and total covered length in bp (Fig. 1B). The percent genome coverage is calculated as the total length of all regions along the genome that are covered by at least one read, divided by genome length. This serves as an intrinsic normalization, as the lengths of virus genomes vary by more than 100-fold. The maximum depth of coverage reflects the relative “signal strength” of a particular virus and is calculated as the maximum of the average number of reads in a sliding window (default size of 50 bp) along the genome. The sliding window provides the means to highlight viruses with slightly more genome coverage in cases where multiple viruses have the same upper limit of mapped reads (for more detail, see Fig. S2 in the supplemental material). The total covered length indicates the total number of nucleotides detected for each virus genome and is represented (on the reports) by colored dots of varied circumference (larger circumferences correspond to more nucleotides covered). Phase 1 plots display the best-scoring representative (highest percent coverage and greatest maximum depth of coverage) for each detected virus species. For each virus identified in phase 1, phase 2 provides genome coverage information on the level of strains, genotype, serotype, or lineage, depending on the virus identified. For this pilot study, while all specimens were mapped to the complete database (and therefore retained potential mappings to nonhuman viruses), only known human viruses are shown in the reports. Aside from mapping, all data processing, analysis, statistics calculations, and reporting are coded in, and performed with, BEDtools (31), R (32), python (http://www.python.org), and Linux (bash) shell scripts.
Regarding data storage, the largest files are the initial raw sequencing results (“fastq” files) in the range of 4 to 20 gigabytes (GB) per specimen. All ezVIR analysis files (including all mapping results) are significantly smaller (on the order of megabytes [MB]) and can easily be stored on desktop systems. The code for this pilot version of the ezVIR pipeline and supporting documentation is available at http://cegg.unige.ch/ezvir.
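The three detection metrics defined above can be illustrated with a short sketch that works from a per-base depth vector for one genome (for instance, the kind of output a per-position coverage tool produces). It is not the ezVIR implementation. It assumes that "total covered length" means the number of genome positions covered by at least one read, and it shrinks the window for genomes shorter than 50 bp.

```python
def detection_metrics(depth, window=50):
    """Compute (percent_coverage, max_depth, total_covered_length) for one genome.

    depth: list of per-base read depths, one entry per genome position.
    max_depth is the largest mean depth over any window of `window` bp.
    """
    genome_length = len(depth)
    if genome_length == 0:
        raise ValueError("depth vector is empty")

    covered = sum(1 for d in depth if d > 0)
    percent_coverage = 100.0 * covered / genome_length

    w = min(window, genome_length)
    window_sum = sum(depth[:w])
    best = window_sum
    # Slide the window one base at a time, updating the running sum.
    for i in range(w, genome_length):
        window_sum += depth[i] - depth[i - w]
        best = max(best, window_sum)

    return percent_coverage, best / w, covered
```

Dividing the covered length by genome length is what lets a 3-kb genome and a 150-kb genome share the same x axis, while the windowed maximum gives a signal value that is not diluted by uncovered regions.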

CCA. Genome mapping results for all specimens are compared in a pairwise manner for all possible specimen pairs (specimen 01 versus specimen 02, specimen 01 versus specimen 03, specimen 02 versus specimen 03, etc.). Per pair, any virus genomes that are detected in both specimens are stored as an intersect set with corresponding ezVIR detection metrics (percent genome coverage, maximum depth of coverage, total covered length, and genome length, as explained above) per genome. These intersect sets can be queried in phase 2 on a per-virus-species basis. To perform the cross-contamination analysis (CCA), once any detected virus appearing on the report plot is selected, the cross-contamination module will create a bar plot displaying the “signal” (the log10 maximum coverage depth) for that particular virus genome in all specimens (Fig. 2C) (see Fig. S4 in the supplemental material). The CCA plots serve to guide interpretation of ezVIR analysis results and help determine if a detected virus was present in the original specimen or could be a contaminant from a neighboring specimen or the laboratory environment.
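A rough sketch of the pairwise comparison and of the per-virus signal shown on a CCA bar plot is given below. The input layout (a dictionary of maximum coverage depths per specimen and genome) and both function names are assumptions for illustration; only the maximum depth is carried along here, whereas the text stores all detection metrics per shared genome.

```python
from itertools import combinations
from math import log10


def pairwise_intersects(detections):
    """Return {(specimen_a, specimen_b): {genome: (depth_a, depth_b)}} for every specimen pair.

    detections: {specimen: {genome_id: max_depth}}
    """
    result = {}
    for a, b in combinations(sorted(detections), 2):
        shared = detections[a].keys() & detections[b].keys()
        result[(a, b)] = {g: (detections[a][g], detections[b][g]) for g in shared}
    return result


def cca_signal(detections, genome_id):
    """Return {specimen: log10(max_depth)} for one genome across all specimens that contain it."""
    return {
        specimen: log10(genomes[genome_id])
        for specimen, genomes in sorted(detections.items())
        if genomes.get(genome_id, 0) > 0
    }
```

Plotting the values returned by `cca_signal` as bars, one per specimen, makes a large difference between neighboring specimens easy to spot: a gap of 3 on the log10 scale corresponds to a 1,000-fold difference in maximum coverage depth.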

FIG 2

ezVIR phase 2 strain identification and cross-contamination analysis. (A) The phase 2 report highlights details for any selected virus family identified in phase 1; this example shows the HRV signal from specimen 16 (phase 1 report shown in Fig. 1D). Phase 2 reports show mapping results from all HRV genotypes in the database. The read mapping histograms can be used to help discriminate among genomes in situations where multiple viruses have a similar percentage of genome coverage and depth of coverage. In this example, the genome coverage of HRV-66 is compared to that of HRV-77. (B) Case of coinfection. The most prominent strain of measles is confirmed to correspond to the vaccinal Edmonston strain. (C) Cross-contamination plots can be used to confirm the presence of identical virus strains in other specimens prepared in the same experimental series. In this example, the strongest signal in the specimen 17 report is for rhinovirus (at a maximum coverage depth of 65). However, the CCA plot reveals that the neighboring specimen 16 contains a roughly 3,000-fold stronger signal for the same HRV-66 strain (∼197,000 coverage depth).

Workflow. Viral nucleic acids are extracted directly from clinical specimens without particle enrichment steps and then processed as described above for HTS (Fig. 1A). The input to ezVIR is the sequence data (“fastq” files, the output from all standard HTS platforms), and the default output is the phase 1 report. Based on the viruses identified in this phase, the user can then ask for more detailed information (in phase 2) about each particular virus, including read coverage histograms, strain typing, and cross-contamination reports.

How to interpret reports. The reports are designed to allow rapid and intuitive comparison of all viruses detected regardless of large differences in genome lengths. Only the best-mapped (in terms of genome coverage and maximum depth of coverage) genomes from each virus species (genus for herpesviruses) are presented in the initial phase 1 reports (Fig. 1). Genus- or strain-specific information can be viewed in phase 2 reports. For each detected virus, a dot appears on the plot, reflecting the percent genome coverage on the x axis (how much of the genome was detected) and the signal strength on the y axis (the most reads mapped calculated as the maximum depth of coverage). Ideally, the causative and/or most prominent virus in a specimen will appear in the upper right corner, representing 100% genome coverage and the strongest signal (y axis) relative to those of other potential viruses in the same specimen. Generally, when 100% genome coverage is not observed, the best candidate(s) is represented by the dot corresponding to the clinically relevant virus species or genus that combines the highest percent genome coverage and maximum depth of coverage. Furthermore, the metrics corresponding to the total covered length indicated in phase 1 reports (colored dots of varied circumference) also help to highlight the viruses of interest in such cases. The y-axis scale is dynamic and is automatically adjusted according to the virus with the strongest signal in each specimen (Fig. 1B). Although we do not define a lower limit of detection (cutoff), any virus appearing with a low y-axis value (based on our observations here, a value of <10) must be considered with caution (refer to “Dealing with cross-contamination” below).
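The selection of one representative genome per virus species, and the caution attached to weak signals, might be sketched as follows. The text does not say how percent genome coverage and maximum depth are combined when they disagree, so ranking by coverage first and depth second is an assumption, as are the field names. The value of 10 is the cautionary y-axis value mentioned above, not a validated cutoff.

```python
LOW_SIGNAL_CUTOFF = 10  # y-axis value below which the text advises caution


def phase1_representatives(hits):
    """Pick the best-mapped genome per species and flag weak signals.

    hits: iterable of dicts with keys "species", "genome",
          "percent_coverage", and "max_depth".
    Returns the representatives, strongest first, each with a "low_signal" flag.
    """
    best = {}
    for hit in hits:
        score = (hit["percent_coverage"], hit["max_depth"])
        current = best.get(hit["species"])
        if current is None or score > (current["percent_coverage"], current["max_depth"]):
            best[hit["species"]] = hit

    ranked = sorted(
        best.values(),
        key=lambda h: (h["percent_coverage"], h["max_depth"]),
        reverse=True,
    )
    return [dict(h, low_signal=h["max_depth"] < LOW_SIGNAL_CUTOFF) for h in ranked]
```

The first entry of the returned list corresponds to the dot closest to the upper right corner of a phase 1 plot; entries with `low_signal` set would be the ones to check against the cross-contamination plots before drawing conclusions.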

The data points (colored dots) that appear on the reports are intended to provide multiple useful measurements in the same location (Fig. 1C, Data point information). The color of the inner dot and label name corresponds to the virus family (key on right side of reports). The outer gray ring represents the size class (genome length) of the virus. The size of the inner dot indicates approximately how much of that genome was detected (in terms of mapped nucleotides). The different dot sizes help to compare viruses of vastly different sizes on the same plot. Additionally, a table in the form of a comma-separated text file (for use with standard data display software, such as Microsoft Excel) that contains all values for all detected viruses accompanies each graphical report (Fig. 3).

FIG 3

Masking undesired viruses. The “blacklist” option allows users to remove any viruses from plots. This is useful in situations where high levels of irrelevant virus might reduce visibility of other viruses present in the specimen. For example, the human TT viruses found in these clinical cases are not relevant and can be removed with this option. By default, a corresponding table is created with each graphical report to list all viruses found in the specimen, regardless of whether they are blacklisted. Even if a virus is not clearly visible in the default report, it is easily found in the corresponding table. Cov., coverage.
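The blacklist behavior described in this caption, where masked viruses disappear from the plot but remain in the accompanying table, can be sketched as below. The column names and the case-insensitive name matching are assumptions; the text states only that the table is a comma-separated file listing all detected viruses.

```python
import csv
import io

# Hypothetical column names for the accompanying table.
COLUMNS = ["virus", "percent_coverage", "max_depth", "covered_length"]


def build_report(rows, blacklist=()):
    """Return (plotted_rows, csv_text).

    plotted_rows excludes blacklisted viruses; csv_text lists every detected
    virus, blacklisted or not.
    """
    blocked = {name.lower() for name in blacklist}
    plotted = [r for r in rows if r["virus"].lower() not in blocked]

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return plotted, buffer.getvalue()
```

Keeping the table independent of the blacklist means that masking a virus changes only what is drawn, so nothing detected in the specimen is lost from the record.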

Cost and practicality. In this pilot study, each specimen was sequenced (Illumina HiSeq platform) using standard technology with a standard paired-end protocol. The cost of sequencing (including library preparation) was approximately $1,500 per paired-end run. While the first part of the pilot ezVIR pipeline (mapping to the human genome and then to all virus genomes) (Fig. 1A) took ∼4 days per specimen using Bowtie2 (30) on a multicore computer (∼100 central processing units [CPUs]), the use of alternate alignment software (e.g., SNAP) (33) can reduce the analysis time to less than 1 day. The speed of the mapping stage depends on the mapping software used, the number of computing cores, and the number of nonhuman reads. After mapping, all report generation and the phase 2 analysis can be performed on a desktop or laptop computer in a matter of seconds to minutes.

RESULTS

Bioinformatics pipeline. With the goal of identifying viruses (those for which genome sequences exist) in clinical specimens, our bioinformatics tools are designed to remove human “background” sequences, identify virus genus or species, subsequently identify particular strains, and report findings in a comprehensible manner (Fig. 1, 2, and 3) (see Fig. S3 in the supplemental material). The pipeline generates reports in two stages in order to simplify interpretation without compromising the presentation of significant microbiological findings. The bioinformatics tools presented here have been developed to enable better discrimination of the “true-positive” human viral sequences within the “noise” of multiple background sequences. The main steps involve (i) mapping the HTS data to the human genome, (ii) mapping all nonhuman reads to a comprehensive database of virus genomes, and (iii) computing mapping metrics (see Materials and Methods) and then organizing and summarizing the results into user-friendly and comprehensible graphical reports (Fig. 1A).

To design and subsequently challenge the pipeline, 20 clinical specimens (5 CSF, 7 BAL fluid, 1 NPA, 5 plasma, and 2 serum specimens) found to be positive for either DNA or RNA viruses by routine molecular diagnostics were analyzed (Table 1). After mapping the HTS data from each specimen to the human genome, the number of human reads per specimen was found to vary (from 0.65% to ∼84%) according to the clinical specimen. For each specimen, all nonhuman reads were mapped to our database, metrics calculated, data analyzed, and results presented in two phases using the ezVIR tools described below.

Analysis of positive clinical specimens. Eighteen of the 20 specimens used to assess the robustness of the pipeline presented here were correctly analyzed: 16 specimens in phase 1, and 2 specimens in phase 2 (specimen 12 and specimen 17). Results were inconclusive at the genus level for the remaining 2 specimens (specimen 03 and specimen 04) (Table 1) (see Fig. S3 in the supplemental material). Of note, as the specimens studied here represent a broad range of those often found in routine clinical situations, we also gained valuable insight into areas for improvement and the potential limitations (described in the following sections) of using HTS in a clinical setting.

The pipeline generates reports in two phases (Fig. 1A). The phase 1 report serves as the default representation and indicates the strongest signal from each detected virus species (Fig. 1D). To reduce background signals (viruses with very low genome coverage) and improve the interpretation of the results, users have the option to define a threshold value for “percent genome coverage” (for example, the threshold can be set to display only those viruses with more than 5% genome coverage). Since the y-axis scale is dynamic, this option may be useful to better differentiate partially overlapping dots (including the associated labels). The phase 2 report provides identification of particular strains, genotypes, serotypes, or lineages as well as detection statistics, including genome coverage histograms (available for any user-selected data point appearing in the plot) to enable a detailed assessment and comparison of identified viruses (Fig. 2A). For 16 specimens, the viruses were clearly indicated in the phase 1 report. Overall, clear information was obtained for most specimens (Table 1), and strong virus signals (both percent genome coverage and maximum coverage depth) were observed (see Fig. S3 in the supplemental material), providing results ready for interpretation by microbiologists. Furthermore, all of the viruses that were detected by specific RT-PCR in specimen 20 by routine screening (PeV, PIV-1, PIV-3, PIV-4, and MeV) were also highlighted in the phase 1 report (Table 1 and Fig. 2B), demonstrating the capacity of this pipeline to identify coinfections (all detected viruses had genome coverage of ≥83% and a relatively strong signal [maximum coverage of ≥70]). Specimen 20 (NPA) was collected from a 10-month-old child 8 days after measles vaccination. In agreement with traditional Sanger sequencing of the MeV N gene, the phase 2 analysis reported the vaccinal Edmonston strain as the most likely candidate (Fig. 2B).
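
The phase 1 selection described above (one strongest signal per species, followed by an optional percent genome coverage threshold) can be sketched as follows. This is a minimal sketch under stated assumptions: the `Hit` record, the function `phase1_view`, and the rule that "strongest" means highest percent coverage with maximum depth as a tie-breaker are all hypothetical choices for this example, and the values are invented.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    species: str
    strain: str
    percent_coverage: float
    max_depth: int


def phase1_view(hits: List[Hit], min_percent_coverage: float = 0.0) -> List[Hit]:
    """Keep the strongest hit per species, then apply the coverage threshold."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.species)
        # Assumption: "strongest" = highest percent coverage,
        # with maximum depth used only to break ties.
        key = (hit.percent_coverage, hit.max_depth)
        if current is None or key > (current.percent_coverage, current.max_depth):
            best[hit.species] = hit
    # Display only viruses with MORE than the threshold coverage.
    kept = [h for h in best.values() if h.percent_coverage > min_percent_coverage]
    return sorted(kept, key=lambda h: h.percent_coverage, reverse=True)


# Invented values for illustration only.
hits = [
    Hit("Virus A", "strain a1", 92.0, 150),
    Hit("Virus A", "strain a2", 61.0, 40),
    Hit("Virus B", "strain b1", 3.2, 2),
]
for h in phase1_view(hits, min_percent_coverage=5.0):
    print(h.species, h.strain, h.percent_coverage)
```

With a 5% threshold, only the strongest "Virus A" hit is displayed; with the default threshold of 0, the low-coverage "Virus B" hit is shown as well.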

Two specimens (specimens 12 and 17) had multiple strong signals that made it difficult to pinpoint one specific virus in phase 1. However, phase 2 reports and histograms made it possible to distinguish the target virus from other background viruses. For example, specimen 17 was positive for influenza A virus but negative for human rhinovirus (HRV) by specific RT-PCR (routine laboratory screening), yet HRV represented the strongest signal in the phase 1 report for specimen 17 (Fig. 2C). The phase 2 cross-contamination analysis (CCA) can help to clarify results in such situations. In the CCA bar plot shown in Fig. 2C, neighboring specimen 16 was shown to contain 3 orders of magnitude (roughly 1,000-fold) more of the same HRV genotype, HRV-66. As these two specimens were prepared alongside each other in the same run of experiments, the pipeline indicated that specimen 17 was most likely contaminated by specimen 16. While we could not definitively exclude the possibility of coinfection with the same virus genotype, the lower signal observed for specimen 17 made this highly improbable (Fig. 2C). Of note, the robustness of the phase 2 report (presence of the HRV-66 genotype in specimen 16) was confirmed by classical sequencing methods based on an analysis of the VP4-VP2 region and 5′ untranslated region (UTR) (data not shown). The same approach could be used to rule out the very weak signals from JCV in specimen 01, HRV in specimen 10, and measles virus in specimen 12 (see Fig. S4 in the supplemental material). Of note, while these cross-contaminating virus signals were extremely low (only 2 to 8 total mapped reads per virus) and could be considered background signal, the CCA module was still able to detect them.
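
The reasoning behind the CCA (the same strain appearing in several specimens of one run, with a much stronger signal in one of them) can be illustrated with a short sketch. It is not the ezVIR CCA module: the function `flag_cross_contamination`, the use of mapped read counts as the signal, and the 100-fold cutoff `min_fold` are assumptions for this example, and the counts are invented.

```python
from typing import Dict, List, Tuple


def flag_cross_contamination(
    reads_by_specimen: Dict[str, int], min_fold: float = 100.0
) -> List[Tuple[str, str, float]]:
    """Return (suspect, likely_source, fold_difference) for one virus strain.

    reads_by_specimen maps specimen ID -> reads mapped to that strain.
    min_fold is a hypothetical cutoff, not a validated threshold.
    """
    flagged: List[Tuple[str, str, float]] = []
    positive = {s: n for s, n in reads_by_specimen.items() if n > 0}
    if not positive:
        return flagged
    # The specimen with the strongest signal is treated as the likely source.
    source, source_reads = max(positive.items(), key=lambda item: item[1])
    for specimen, reads in positive.items():
        if specimen == source:
            continue
        fold = source_reads / reads
        if fold >= min_fold:
            flagged.append((specimen, source, fold))
    return flagged


# Invented read counts for a single strain across one sequencing run.
run = {"specimen_A": 50_000, "specimen_B": 40, "specimen_C": 0}
print(flag_cross_contamination(run))
```

A flag of this kind only suggests contamination; as noted above, coinfection with the same genotype cannot be formally excluded from signal strength alone.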

The ezVIR results from two specimens (specimens 03 and 04) were inconclusive. These were a CSF specimen and a BAL specimen in which VZV and CMV (both herpesviruses), respectively, were detected in routine analysis. While the presence of these herpesviruses was detected in phase 1 reports, the low depth and percent genome coverage limited our ability to determine the exact type of herpesvirus present in the specimen with the current version of our pipeline (see Fig. S3 in the supplemental material). However, the ezVIR phase 1 reports for both of these specimens supported the presumption that a herpesvirus was present, which is valuable information for physicians nonetheless. Statistically, lower genome coverage can be expected for large (>40,000-bp) virus genomes, such as those of herpesviruses, than for small virus genomes, making the former difficult to highlight. In blood specimens from immunocompromised patients, the interpretation of results can be further hindered by the considerable number of “background” viruses resulting from viral reactivation infections (EBV, TTV, HHV-6, etc.). Despite the relatively large number (>1,000) of reads mapping to these long genomes, the percent genome coverage and depth of coverage may remain low. This, in turn, can place a detected large-genome virus in the same region of the plot as significantly smaller background viruses detected with low depth and percent coverage. In such cases, the larger data point size (which reflects the total genome nucleotides mapped) helps to distinguish the detected target virus from background virus signals (Fig. 1C). Taken together, these observations indicate that improving both the sensitivity and the specificity for Herpesviridae is a key point that needs to be addressed in the next version of ezVIR.
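
The statistical point about genome size can be made concrete with the classical Lander-Waterman approximation, in which the expected fraction of a genome covered by at least one read is 1 − exp(−N·L/G) for N reads of length L on a genome of length G. The sketch below is an illustration only: the approximation assumes reads placed uniformly at random, and the read length (100 bp) and the two genome lengths are assumed values, not figures from this study.

```python
import math


def expected_fraction_covered(n_reads: int, read_length: int, genome_length: int) -> float:
    """Lander-Waterman estimate of the genome fraction covered by >= 1 read.

    Assumes reads land uniformly at random, which real data only approximate.
    """
    mean_depth = n_reads * read_length / genome_length
    return 1.0 - math.exp(-mean_depth)


# Illustrative numbers only: 1,000 reads of 100 bp on a small and a large genome.
for label, genome_length in [("small genome (7,500 bp)", 7_500),
                             ("large genome (150,000 bp)", 150_000)]:
    fraction = expected_fraction_covered(1_000, 100, genome_length)
    print(f"{label}: {100 * fraction:.1f}% expected coverage")
```

Under these assumptions, the same 1,000 reads that essentially saturate the small genome would be expected to cover only about half of the large one, which is consistent with the low percent coverage observed for herpesviruses.
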
The “blacklist” option can also help to clarify reports by removing any potential non-clinically relevant viruses that may mask the signal of other detected viruses (for example, removing the TT viruses in specimens 06 and 07), as the user can specify viruses that should not appear on the plots (Fig. 3). Of note, regardless of which viruses are displayed on the plots, all analysis metrics for all detected viruses are retained in a corresponding comma-separated data file.
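
The behavior of the blacklist option described above (hide selected viruses from the plots while keeping every detected virus in the comma-separated file) can be sketched as follows. The function `split_for_report`, the column names, and the values are hypothetical and chosen only for illustration; they are not the ezVIR file format.

```python
import csv
import io
from typing import Dict, Iterable, List, Set, Tuple


def split_for_report(
    rows: Iterable[Dict[str, str]], blacklist: Set[str]
) -> Tuple[List[Dict[str, str]], str]:
    """Return (rows_to_plot, csv_text); the CSV keeps every detected virus."""
    all_rows = list(rows)
    # Blacklisted viruses are hidden from the plot only.
    to_plot = [r for r in all_rows if r["virus"] not in blacklist]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["virus", "percent_coverage"])
    writer.writeheader()
    writer.writerows(all_rows)
    return to_plot, buffer.getvalue()


# Invented values for illustration only.
detected = [
    {"virus": "TT virus", "percent_coverage": "12.0"},
    {"virus": "Virus X", "percent_coverage": "88.0"},
]
to_plot, csv_text = split_for_report(detected, blacklist={"TT virus"})
print([r["virus"] for r in to_plot])
print(csv_text)
```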

Dealing with cross-contamination. Interspecimen contamination is a known consequence of the increased sensitivity of HTS technology (27, 34, 35). Despite exercising the highest degree of precaution during specimen preparation, we nevertheless observed the (albeit weak) presence of viruses (DNA and RNA viruses) from neighboring specimens in 4 specimens (specimen 01 contaminated by specimen 02, specimen 10 by specimen 11, specimen 12 by specimen 13, and specimen 17 by specimen 16) (see Fig. S4 in the supplemental material). As previously discussed for specimens 12 and 17, while ezVIR correctly identified the presence of the target viruses, the reports also revealed the presence of viruses not in the original specimens. Cross-contamination during specimen preparation is the most likely cause, as the contaminating virus was always the same as the target virus in neighboring specimens (see Fig. S4). A clear example is the presence of the same HRV-66 genotype in specimens 16 and 17 (Fig. 2C). It is highly improbable that both patients were independently infected with the same virus, given the large variety of circulating rhinovirus genotypes (11). As these two specimens were prepared alongside each other, the HRV-66 detected in specimen 17 is most likely a result of cross-contamination from specimen 16 rather than the presence of a coinfection. This is supported by the fact that only IAV, not HRV, was identified in specimen 17 by the specific RT-PCR used daily in routine screening. These observations underscore an obstacle that stems directly from the inherent sensitivity of HTS technology and that needs to be fully addressed in the future.

DISCUSSION

Although various HTS-based virus detection methods currently exist, each is designed to function with a particular HTS platform, and the results remain cryptic for non-experts in bioinformatics. In this proof-of-principle study, a total of 20 clinically relevant positive specimens containing a wide range of viruses (both DNA and RNA) were selected in order to conduct a pilot validation of our HTS-based virus detection tools. The design of this pilot study allowed us to assess the potential effectiveness of using HTS to analyze a representative selection of routine specimens previously characterized by conventional real-time PCR (RT-PCR) (Table 1). While the sample size used in this pilot phase is not sufficient to provide a final validation (i.e., sensitivity and specificity), our results indicate that the success rate and ease of interpretation of results provided by the ezVIR pipeline make HTS worth considering as an alternative method for investigation of selected cases. Indeed, 18 of the 20 specimens were correctly analyzed with ezVIR, while 2 specimens remained inconclusive despite the fact that for both specimens, the presence of herpesvirus was clearly detected in phase 1 reports. Nevertheless, as previously mentioned, improving the sensitivity and specificity for Herpesviridae members is a key issue that needs to be addressed in the next release of ezVIR. One approach would be to consider the percentage of similarity and to provide information concerning those reads that specifically correspond to each herpesvirus. Although this pilot study did not validate the sensitivity of HTS data analysis with ezVIR, our results suggest that it may reach a threshold close to that of real-time PCR. The advantage here is that all known viruses can be detected at once, in contrast to virus-specific PCR methods.
The variety of clinical specimens used (BAL fluid, CSF, NPA, plasma, and serum) and the spectrum of viruses tested are representative of those seen in daily practice. In a further step, a larger set of specimens will be needed to corroborate these results. For now, the current pipeline should not be considered a validated diagnostic tool for clinical care.

While the phase 1 report provides rapid species (genus for herpesviruses) identification, the phase 2 report is useful for virus typing (Fig. 2A) and highlighting contamination (e.g., observing an identical strain in multiple specimens with similar patterns in read coverage histograms) and coinfections (Fig. 2B). The reliability of the phase 2 typing reports was demonstrated for 3 specimens that were each confirmed (by classical PCR-based sequencing) to be positive for the correct HRV-66 and MeV B3 genotypes and the MeV Edmonston strain (specimens 16, 13, and 20, respectively). A comparative analysis with a larger set of specimens would be necessary to confirm the robustness of phase 2 typing reports. Although physicians in general may not need typing results for most viruses, this information can be extremely useful in certain situations, such as in the case of immunosuppressed individuals, travelers, vaccinations, or antiviral therapy. An example in this pilot study is the identification of measles virus in a nasopharyngeal aspirate shortly after MeV vaccination with a live vaccine (specimen 20). In such cases, it is important for both physicians and epidemiologists to define whether symptoms are related to the vaccinal or circulating strains. The same situation, although rare, is also observed for travelers developing yellow fever meningitis shortly after vaccination.

Because HTS generates millions of sequence reads per specimen, a commonly described consequence of this sensitivity is interspecimen or “environmental” contamination (27, 35). Therefore, current HTS users should be extremely vigilant regarding the viruses present in neighboring specimens analyzed in the same series. Despite implementation of the highest precautions to reduce any potential cross-contamination in this investigation, interspecimen signals were nevertheless observed for four specimens (see Fig. S4 in the supplemental material). However, the presence of such contamination did not affect interpretation of phase 1 reports for two specimens (specimens 01 and 10), and the phase 2 CCA made it possible to correctly interpret results for specimens 12 and 17. When interpreting reports, one must also consider the situation where an unexpected identified virus may indeed be present in the specimen but where the patient was asymptomatic for that particular virus.

The most obvious way to eliminate this issue is to prepare one specimen at a time, a solution that is not only time consuming but also expensive and impractical for use in routine analysis. Of note, the protocols used in HTS technology are based mainly on individual methods and utilize manual kits. The optimization of all such presequencing procedures (sample preparation, nucleic acid extraction, and library preparation) is a key issue that will likely be greatly improved by automation. While automated solutions currently exist, they remain cost prohibitive for most laboratories. Such solutions may also greatly reduce the contamination detected by HTS methods. In the future, it will be important to define the sample preparation methods best adapted for each routine application (e.g., DNA versus RNA viruses). Currently, the issue of contamination may be partially resolved by considering the virus identification in the context of results from the neighboring specimens, as demonstrated with the ezVIR cross-contamination analysis (see Fig. S4 in the supplemental material). The CCA aims to facilitate the identification of such contaminants based on significant differences in signal strength (percent coverage and depth of coverage) of the same virus between specimens. This leads to a paradigm shift in clinical microbiology: a result can still be validated despite background contamination if the bioinformatics pipeline provides reliable analysis tools. The set of specimens used in this pilot study included one documented case of coinfection (specimen 20). Since ezVIR analysis could efficiently detect all viruses previously detected by routine analysis, our results suggest that this pipeline can reach a sufficient level of sensitivity to identify coinfection cases. The next challenge will be to validate and define a minimal threshold (cutoff values) and statistical means to discriminate with certainty between coinfection and cross-contamination. In this respect, the clinical history will also need to be taken into account.

In summary, a close collaboration among infectious disease specialists, clinical microbiologists, bioinformaticians, and HTS platform technology experts allowed us to harness HTS technology for effective and user-friendly virus detection in clinical specimens. The pipeline was designed to identify viruses using a comprehensive manually curated database containing more than 11,000 complete virus genomes, and most importantly, it automatically generates clear and concise (customizable) representations of the results for non-specialists in HTS. This is an important step toward routine use of HTS for clinical virology.

Outlook. While this pilot study was designed around a comprehensive sampling of clinical specimen types and viruses, there are many options for further advancement and validation, as this collection of specimens is obviously neither complete nor 100% representative. The ezVIR pipeline is designed to be modular and customizable (e.g., HTS data from various sequencing platforms can be used as input, and different custom databases can be built from a list of reference nucleotide sequences in a straightforward manner). While HTS is not yet optimal for use in clinical routine diagnostics, we suggest that it soon can be.

ACKNOWLEDGMENTS

We thank Lia van der Hoek (University of Amsterdam, Netherlands) for advice and useful discussions.

This study was supported by the Swiss National Science Foundation (grant 32003B_146993 to L.K.), the Louis-Jeantet Foundation (E.M.Z., L.K., S.C., and O.P-S.) and the Faculty of Medicine, Geneva.

Part of the computations were performed at the Vital-IT (http://www.vital-it.ch) Center for high-performance computing of the SIB Swiss Institute of Bioinformatics.

FOOTNOTES

    • Received 14 May 2014.
    • Returned for modification 16 June 2014.
    • Accepted 3 July 2014.
    • Accepted manuscript posted online 9 July 2014.
  • Supplemental material for this article may be found at http://dx.doi.org/10.1128/JCM.01389-14.

  • Copyright © 2014, American Society for Microbiology. All Rights Reserved.

REFERENCES

  1. Barzon L, Lavezzo E, Costanzi G, Franchin E, Toppo S, Palu G. 2013. Next-generation sequencing technologies in diagnostic virology. J. Clin. Virol. 58:346–350. doi:10.1016/j.jcv.2013.03.003.
  2. Radford AD, Chapman D, Dixon L, Chantrey J, Darby AC, Hall N. 2012. Application of next-generation sequencing technologies in virology. J. Gen. Virol. 93:1853–1868. doi:10.1099/vir.0.043182-0.
  3. Capobianchi MR, Giombini E, Rozera G. 2013. Next-generation sequencing technology in clinical virology. Clin. Microbiol. Infect. 19:15–22. doi:10.1111/1469-0691.12056.
  4. Handley SA, Thackray LB, Zhao G, Presti R, Miller AD, Droit L, Abbink P, Maxfield LF, Kambal A, Duan E, Stanley K, Kramer J, Macri SC, Permar SR, Schmitz JE, Mansfield K, Brenchley JM, Veazey RS, Stappenbeck TS, Wang D, Barouch DH, Virgin HW. 2012. Pathogenic simian immunodeficiency virus infection is associated with expansion of the enteric virome. Cell 151:253–266. doi:10.1016/j.cell.2012.09.024.
  5. Lysholm F, Wetterbom A, Lindau C, Darban H, Bjerkner A, Fahlander K, Lindberg AM, Persson B, Allander T, Andersson B. 2012. Characterization of the viral microbiome in patients with severe lower respiratory tract infections, using metagenomic sequencing. PLoS One 7:e30875. doi:10.1371/journal.pone.0030875.
  6. Lecuit M, Eloit M. 2013. The human virome: new tools and concepts. Trends Microbiol. 21:510–515. doi:10.1016/j.tim.2013.07.001.
  7. De Vlaminck I, Khush KK, Strehl C, Kohli B, Luikart H, Neff NF, Okamoto J, Snyder TM, Cornfield DN, Nicolls MR, Weill D, Bernstein D, Valantine HA, Quake SR. 2013. Temporal response of the human virome to immunosuppression and antiviral therapy. Cell 155:1178–1187. doi:10.1016/j.cell.2013.10.034.
  8. Batty EM, Wong TH, Trebes A, Argoud K, Attar M, Buck D, Ip CL, Golubchik T, Cule M, Bowden R, Manganis C, Klenerman P, Barnes E, Walker AS, Wyllie DH, Wilson DJ, Dingle KE, Peto TE, Crook DW, Piazza P. 2013. A modified RNA-Seq approach for whole genome sequencing of RNA viruses from faecal and blood samples. PLoS One 8:e66129. doi:10.1371/journal.pone.0066129.
  9. Kundu S, Lockwood J, Depledge DP, Chaudhry Y, Aston A, Rao K, Hartley JC, Goodfellow I, Breuer J. 2013. Next-generation whole genome sequencing identifies the direction of norovirus transmission in linked patients. Clin. Infect. Dis. 57:407–414. doi:10.1093/cid/cit287.
  10. Lin Z, Wang X, Strong MJ, Concha M, Baddoo M, Xu G, Baribault C, Fewell C, Hulme W, Hedges D, Taylor CM, Flemington EK. 2013. Whole-genome sequencing of the Akata and Mutu Epstein-Barr virus strains. J. Virol. 87:1172–1182. doi:10.1128/JVI.02517-12.
  11. Tapparel C, Cordey S, Junier T, Farinelli L, Van Belle S, Soccal PM, Aubert JD, Zdobnov E, Kaiser L. 2011. Rhinovirus genome variation during chronic upper and lower respiratory tract infections. PLoS One 6:e21163. doi:10.1371/journal.pone.0021163.
  12. Sun M, Gao L, Liu Y, Zhao Y, Wang X, Pan Y, Ning T, Cai H, Yang H, Zhai W, Ke Y. 2012. Whole genome sequencing and evolutionary analysis of human papillomavirus type 16 in central China. PLoS One 7:e36577. doi:10.1371/journal.pone.0036577.
  13. Cordey S, Junier T, Gerlach D, Gobbini F, Farinelli L, Zdobnov EM, Winther B, Tapparel C, Kaiser L. 2010. Rhinovirus genome evolution during experimental human infection. PLoS One 5:e10588. doi:10.1371/journal.pone.0010588.
  14. Solmone M, Vincenti D, Prosperi MC, Bruselles A, Ippolito G, Capobianchi MR. 2009. Use of massively parallel ultradeep pyrosequencing to characterize the genetic diversity of hepatitis B virus in drug-resistant and drug-naive patients and to detect minor variants in reverse transcriptase and hepatitis B S antigen. J. Virol. 83:1718–1726. doi:10.1128/JVI.02011-08.
  15. Dybowski JN, Heider D, Hoffmann D. 2010. Structure of HIV-1 quasi-species as early indicator for switches of co-receptor tropism. AIDS Res. Ther. 7:41. doi:10.1186/1742-6405-7-41.
  16. Beerenwinkel N, Zagordi O. 2011. Ultra-deep sequencing for the analysis of viral populations. Curr. Opin. Virol. 1:413–418. doi:10.1016/j.coviro.2011.07.008.
  17. Zaki AM, van Boheemen S, Bestebroer TM, Osterhaus AD, Fouchier RA. 2012. Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia. N. Engl. J. Med. 367:1814–1820. doi:10.1056/NEJMoa1211721.
  18. Palacios G, Druce J, Du L, Tran T, Birch C, Briese T, Conlan S, Quan PL, Hui J, Marshall J, Simons JF, Egholm M, Paddock CD, Shieh WJ, Goldsmith CS, Zaki SR, Catton M, Lipkin WI. 2008. A new arenavirus in a cluster of fatal transplant-associated diseases. N. Engl. J. Med. 358:991–998. doi:10.1056/NEJMoa073785.
  19. Xu B, Liu L, Huang X, Ma H, Zhang Y, Du Y, Wang P, Tang X, Wang H, Kang K, Zhang S, Zhao G, Wu W, Yang Y, Chen H, Mu F, Chen W. 2011. Metagenomic analysis of fever, thrombocytopenia and leukopenia syndrome (FTLS) in Henan Province, China: discovery of a new bunyavirus. PLoS Pathog. 7:e1002369. doi:10.1371/journal.ppat.1002369.
  20. Tan LV, van Doorn HR, Nghia HD, Chau TT, Tu LTP, de Vries M, Canuti M, Deijs M, Jebbink MF, Baker S, Bryant JE, Tham NT, BKrong NTTC, Boni MF, Loi TQ, Phuong LT, Verhoeven JT, Crusat M, Jeeninga RE, Schultsz C, Chau NVV, Hien TT, van der Hoek L, Farrar J, de Jong MD. 2013. Identification of a new cyclovirus in cerebrospinal fluid of patients with acute central nervous system infections. mBio 4:e00231-13. doi:10.1128/mBio.00231-13.
  21. Chiu CY. 2013. Viral pathogen discovery. Curr. Opin. Microbiol. 16:468–478. doi:10.1016/j.mib.2013.05.001.
  22. de Vries M, Oude Munnink BB, Deijs M, Canuti M, Koekkoek SM, Molenkamp R, Bakker M, Jurriaans S, van Schaik BD, Luyf AC, Olabarriaga SD, van Kampen AH, van der Hoek L. 2012. Performance of VIDISCA-454 in feces-suspensions and serum. Viruses 4:1328–1334. doi:10.3390/v4081328.
  23. Francis OE, Bendall M, Manimaran S, Hong C, Clement NL, Castro-Nallar E, Snell Q, Schaalje GB, Clement MJ, Crandall KA, Johnson WE. 2013. Pathoscope: species identification and strain attribution with unassembled sequencing data. Genome Res. 23:1721–1729. doi:10.1101/gr.150151.112.
  24. Roux S, Faubladier M, Mahul A, Paulhe N, Bernard A, Debroas D, Enault F. 2011. Metavir: a web server dedicated to virome analysis. Bioinformatics 27:3074–3075. doi:10.1093/bioinformatics/btr519.
  25. Zhao G, Krishnamurthy S, Cai Z, Popov VL, Travassos da Rosa AP, Guzman H, Cao S, Virgin HW, Tesh RB, Wang D. 2013. Identification of novel viruses using VirusHunter—an automated data analysis pipeline. PLoS One 8:e78470. doi:10.1371/journal.pone.0078470.
  26. Wang Q, Jia P, Zhao Z. 2013. VirusFinder: software for efficient and accurate detection of viruses and their integration sites in host genomes through next generation sequencing data. PLoS One 8:e64465. doi:10.1371/journal.pone.0064465.
  27. Cheval J, Sauvage V, Frangeul L, Dacheux L, Guigon G, Dumey N, Pariente K, Rousseaux C, Dorange F, Berthet N, Brisse S, Moszer I, Bourhy H, Manuguerra CJ, Lecuit M, Burguiere A, Caro V, Eloit M. 2011. Evaluation of high-throughput sequencing for identifying known and unknown viruses in biological samples. J. Clin. Microbiol. 49:3268–3275. doi:10.1128/JCM.00850-11.
  28. de Vries M, Deijs M, Canuti M, van Schaik BD, Faria NR, van de Garde MD, Jachimowski LC, Jebbink MF, Jakobs M, Luyf AC, Coenjaerts FE, Claas EC, Molenkamp R, Koekkoek SM, Lammens C, Leus F, Goossens H, Ieven M, Baas F, van der Hoek L. 2011. A sensitive assay for virus discovery in respiratory clinical samples. PLoS One 6:e16118. doi:10.1371/journal.pone.0016118.
  29. Barzon L, Lavezzo E, Militello V, Toppo S, Palu G. 2011. Applications of next-generation sequencing technologies to diagnostic virology. Int. J. Mol. Sci. 12:7861–7884. doi:10.3390/ijms12117861.
  30. Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9:357–359. doi:10.1038/nmeth.1923.
  31. Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841–842. doi:10.1093/bioinformatics/btq033.
  32. R Core Team. 2013. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
  33. Zaharia M, Bolosky WJ, Curtis K, Fox A, Patterson D, Shenker S, Stoica I, Karp RM, Sittler T. 2011. Faster and more accurate sequence alignment with SNAP. ArXiv 1111:5572. http://arxiv.org/abs/1111.5572.
  34. Callejas S, Alvarez R, Benguria A, Dopazo A. 2014. AG-NGS: a powerful and user-friendly computing application for the semi-automated preparation of next-generation sequencing libraries using open liquid handling platforms. Biotechniques 56:28–35.
  35. Naccache SN, Greninger AL, Lee D, Coffey LL, Phan T, Rein-Weston A, Aronsohn A, Hackett J Jr, Delwart EL, Chiu CY. 2013. The perils of pathogen discovery: origin of a novel parvovirus-like hybrid genome traced to nucleic acid extraction spin columns. J. Virol. 87:11966–11977. doi:10.1128/JVI.02323-13.
Comprehensive Human Virus Screening Using High-Throughput Sequencing with a User-Friendly Representation of Bioinformatics Analysis: a Pilot Study
Tom J. Petty, Samuel Cordey, Ismael Padioleau, Mylène Docquier, Lara Turin, Olivier Preynat-Seauve, Evgeny M. Zdobnov, Laurent Kaiser
Journal of Clinical Microbiology Aug 2014, 52 (9) 3351-3361; DOI: 10.1128/JCM.01389-14


Print ISSN: 0095-1137; Online ISSN: 1098-660X