TABLE 1

Tools and programs for analysis of HTS data used in the COMPARE virus proficiency test^a

Program (reference) | Application | Description/relevance for viral HTS | URL
BWA (10) | Alignment (nucleotide) | Burrows-Wheeler Aligner for efficient alignment of short sequencing reads against a large reference genome. Based on string matching with the Burrows-Wheeler transform. | http://bio-bwa.sourceforge.net/
DIAMOND (14) | Alignment (protein) | Double-index alignment of NGS data. Shown to be up to 20,000 times faster than BLASTX on short reads, with high sensitivity. | http://ab.inf.uni-tuebingen.de/software/diamond/
FastQC (9) | Quality control | Reports per-base quality scores, sequence content, sequence length distributions, duplicate or overrepresented sequences, and adapter and k-mer content. | https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Kmerfinder (40) | Taxonomic assignment | K-mer-based species prediction. The online user interface also allows prediction of human and vertebrate viruses. | https://cge.cbs.dtu.dk//services/KmerFinder/
Kraken (15) | Alignment (nucleotide) | Uses only exact k-mer alignments for its taxonomic classification, which makes it very fast. | https://ccb.jhu.edu/software/kraken/
MetaPhlAn | Taxonomic assignment | Metagenomic Phylogenetic Analysis; assigns taxa to microbial communities using clade-specific marker genes. High accuracy and speed are achieved by accepting only high-confidence matches. Such approaches can assign about 25,000 microbial reads per second but may fail with viral genomes, which often lack common marker genes. | https://bitbucket.org/biobakery/metaphlan2
MGMapper (41) | Pipeline | Online tool for processing, taxonomically assigning, and analyzing HTS reads. | https://bitbucket.org/genomicepidemiology/mgmapper
MIRA | De novo assembly | Mimicking Intelligent Read Assembly; an overlap-layout-consensus (OLC) assembler for metagenomic data from several sequencing platforms. Among the de novo assembly programs compared, it produced the most contigs and the longest contigs, as well as the highest number of contigs that could be assigned to a viral taxon. | https://sourceforge.net/projects/mira-assembler/
NCBI BLAST (16) | Alignment (nucleotide and protein) | Basic Local Alignment Search Tool. Offers very sensitive online and stand-alone alignment of nucleotide, translated nucleotide, and protein sequences. | https://blast.ncbi.nlm.nih.gov/Blast.cgi
One Codex (42) | Taxonomic assignment | Web-based data platform for k-mer-based taxonomic classification. Very high sensitivity and specificity, even for highly divergent and mutated sequences. | https://www.onecodex.com/
PAIPline (20) | Pipeline | Pipeline for metagenomic analysis of HTS data. | https://gitlab.com/rki_bioinformatics/paipline
QuasR (43) | Pipeline | Combination of several R packages and external software for HTS read analysis; part of the Bioconductor project. | http://www.bioconductor.org/packages/release/bioc/html/QuasR.html
RIEMS (18) | Pipeline | Pipeline for metagenomic sequence analysis that combines several established programs and tools for pathogen detection into one automated workflow. The workflow is split into an accurate and fast "basic analysis" and a more sensitive "further analysis." | https://www.fli.de/en/institutes/institute-of-diagnostic-virology-ivd/laboratories-working-groups/laboratory-for-ngs-and-microarray-diagnostics/
Skewer (44) | Quality control, trimming | Trims primer and adapter sequences, with a focus on the characteristics of paired-end and mate-pair reads. A statistical scheme based on quality values allows accurate trimming of adapters that contain mismatches. | https://sourceforge.net/projects/skewer/
SNAP (45) | Alignment (nucleotide) | As much as 10 to 100 times faster than similar alignment programs, while offering greater sensitivity owing to its higher tolerance for errors. | http://snap.cs.berkeley.edu/
SPAdes, metaSPAdes (12) | De novo assembly | De Bruijn graph assembler. metaSPAdes specifically addresses the challenges posed by complex metagenomic data. | http://cab.spbu.ru/software/spades/
Taxonomer (46) | Taxonomic assignment | Web-based tool for nucleotide- and protein-based read assignment, with user-friendly, interactive visualization of results. Nucleotide-based assignment relies on exact k-mer matching with low error tolerance and reaches speeds of up to ∼32 million reads/min. Protein-based read identification additionally allows detection of divergent viral sequences, although it also relies on exact k-mer matching without error allowance. | https://www.taxonomer.com/
Trimmomatic (8) | Quality control, trimming | Removes technical sequences, such as adapters and primers, and low-quality bases from (paired-end) sequence reads. Has been shown to improve downstream analyses considerably, for example, de novo assembly (increasing contig size by up to 77%) and alignment (increasing alignment rates from 7% to 78%). | www.usadellab.org/cms/index.php?page=trimmomatic
USEARCH (17) | Alignment (protein) | Exceptionally fast alignment of protein or translated nucleotide reads. Its sensitivity is comparable to that of NCBI protein BLAST, but it is ∼350 times faster. | https://www.drive5.com/usearch/
Velvet (13) | De novo assembly | De novo assembler for short HTS reads based on the de Bruijn graph algorithm. A de novo assembly with Velvet can be completed in as little as 14 min. | https://www.ebi.ac.uk/~zerbino/velvet/
^a Listed in alphabetical order.
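The BWA entry describes alignment "based on string matching with the Burrows-Wheeler transform." As a toy illustration of that idea only (not BWA's code), the Python sketch below builds the transform by sorting rotations and counts exact occurrences of a pattern with FM-index backward search. The function names are hypothetical; real aligners use sampled occurrence tables, a suffix array to locate hit positions, and inexact matching, none of which is shown here.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (adequate for toy inputs)."""
    if "$" in text:
        raise ValueError("text must not contain the sentinel '$'")
    s = text + "$"  # '$' sorts before A/C/G/T in ASCII
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count exact occurrences of pattern using FM-index backward search."""
    # C[c] = number of characters in the text that sort before c
    C = {}
    total = 0
    for c in sorted(set(bwt_str)):
        C[c] = total
        total += bwt_str.count(c)

    def occ(c: str, i: int) -> int:
        # Occurrences of c in bwt_str[:i]; a linear scan here, sampled in real indexes
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current suffix-array interval [lo, hi)
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `count_occurrences(bwt("banana"), "ana")` returns 2, because backward search counts overlapping matches.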
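Several taxonomic assignment tools in the table (Kraken, One Codex, Taxonomer, Kmerfinder) rely on exact k-mer matching. The Python sketch below is a deliberately simplified, hypothetical classifier: it indexes reference k-mers by label and assigns a read to the label with the most exact hits. It ignores k-mers shared by several references, whereas Kraken maps such k-mers to their lowest common ancestor in a taxonomy. Reverse complements and the reference labels used in testing are likewise assumptions left out of or invented for this sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, Optional, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield all overlapping k-mers of seq (uppercased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each reference k-mer to the set of labels whose sequence contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for label, seq in references.items():
        for km in kmers(seq, k):
            index[km].add(label)
    return index


def classify(read: str, index: Dict[str, Set[str]], k: int) -> Optional[str]:
    """Assign the read to the label with the most exact k-mer hits, or None.

    K-mers found in more than one reference are skipped (a crude stand-in
    for Kraken's lowest-common-ancestor handling).
    """
    hits: Counter = Counter()
    for km in kmers(read, k):
        labels = index.get(km)
        if labels is not None and len(labels) == 1:
            hits[next(iter(labels))] += 1
    if not hits:
        return None
    return hits.most_common(1)[0][0]
```

Because only exact matches count, a read carrying many sequencing errors or strong divergence can end up unassigned, which is the trade-off between speed and error tolerance noted for these tools.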
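SPAdes/metaSPAdes and Velvet are listed as de Bruijn graph assemblers. The Python sketch below shows only the core idea under simplifying assumptions (error-free reads, a single strand, no coverage information): each k-mer becomes an edge between its (k-1)-mer prefix and suffix, and maximal non-branching paths are spelled out as contigs. The function names are hypothetical, and the real assemblers add error correction, graph simplification, and repeat resolution on top of this.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Build a de Bruijn graph: an edge prefix -> suffix for each distinct k-mer."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph: Dict[str, Set[str]]) -> List[str]:
    """Spell out maximal non-branching paths as contigs.

    Isolated cycles in which every node has one incoming and one outgoing
    edge are not reported by this simple version.
    """
    indeg: Dict[str, int] = defaultdict(int)
    for dsts in graph.values():
        for d in dsts:
            indeg[d] += 1
    nodes = set(graph) | set(indeg)

    def one_in_one_out(n: str) -> bool:
        return indeg[n] == 1 and len(graph.get(n, ())) == 1

    contigs = []
    for n in nodes:
        if one_in_one_out(n):
            continue  # interior node: it is reached from a path start
        for nxt in graph.get(n, ()):
            path = n + nxt[-1]
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            contigs.append(path)
    return contigs
```

Two overlapping reads such as `ACGTTGCA` and `GTTGCATT` collapse into one contig, `ACGTTGCATT`, with k = 4. A branch in the graph, caused for example by a variant or a sequencing error, ends the current contig instead.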
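Trimmomatic and Skewer remove low-quality bases before assembly or alignment. A common strategy is sliding-window trimming. The Python sketch below is a simplified version of that idea and makes several assumptions. It assumes Phred+33 quality encoding, and its window size of 4 and mean-quality threshold of 20 are illustrative choices, not settings taken from the proficiency test. Trimmomatic's SLIDINGWINDOW step follows the same principle, but its exact cut-point handling may differ. The function names are hypothetical.

```python
from typing import List, Tuple


def phred33(qual: str) -> List[int]:
    """Decode a FASTQ quality string (assumed Phred+33 encoding) to integer scores."""
    return [ord(ch) - 33 for ch in qual]


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        threshold: float = 20) -> Tuple[str, str]:
    """Scan 5' -> 3' and cut the read at the start of the first window whose
    mean quality falls below threshold; return the trimmed (seq, qual)."""
    scores = phred33(qual)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < threshold:
            return seq[:start], qual[:start]
    return seq, qual  # no window fell below the threshold
```

For instance, the read `ACGTACGTAC` with qualities `IIIIIII###` (Phred 40 followed by three bases of Phred 2) is cut to `ACGTAC`. The window starting at position 6 is the first with a mean below 20. Averaging over a window, instead of cutting at the first single poor base, makes the trimming less sensitive to isolated low-quality calls.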