LETTER
Whole-genome sequencing (WGS) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to a better understanding of the virus’s origin, transmission, and evolution (1–6). Multiplex amplicon sequencing for viral WGS is a preferred approach to library preparation since it is simple, sensitive, cost-effective, and scalable (7, 8). However, balancing multiplexed primers to achieve high sensitivity and even coverage can be difficult (8), and superlative analytical sensitivity is not assured. Here, we evaluated the Swift Biosciences’ single-tube SARS-CoV-2 multiplex amplicon sequencing panel for the recovery of genomes from low-viral-load samples (threshold cycle [CT] > 26 on a Hologic Panther Fusion system).
This work was approved by the University of Washington Institutional Review Board (proposal no. STUDY00000408). Libraries were constructed from double-stranded cDNA using Swift Biosciences’ Normalase amplicon SARS-CoV-2 panel. The resulting libraries were sequenced on 2 × 300-bp MiSeq runs, and a median of 605,654 reads were obtained for each library. Genomes were assembled with a custom pipeline, TAYLOR (https://github.com/greninger-lab/covid_swift_pipeline). Briefly, sequence reads were trimmed using Trimmomatic v0.38, aligned to the Wuhan-Hu-1 genome (NCBI accession no. NC_045512.2) using BBMap v38.70 (https://sourceforge.net/projects/bbmap/), and trimmed of PCR primers using Primerclip (https://github.com/swiftbiosciences/primerclip). Consensus genomes were called using bcftools v1.9.
The 61 samples sequenced had CT values ranging from 26.04 to 37.93. We recovered genomes from all samples with a CT of ≤32.16 and from a sample with a CT value of 36.77, equivalent to approximately 4.24 copies input (Fig. 1A) (9–11). For samples with a CT value between 32.01 and 34.00, we recovered genomes from 8/10 (80%) of the samples, and for samples with a CT value between 34.01 and 36.00, we recovered genomes from 4/10 (40%) of the samples. While we recovered genomes from just 3 of the samples with a CT value between 36.01 and 38.00, we were able to recover partial genomes for the other 7 samples (median genome covered, 36.0%; range, 4.9 to 73.7%).
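The distinction between complete and partial genomes can be illustrated with a short sketch. It applies the >95% genome coverage criterion given in the figure legend to a list of per-position read depths. The function names and the minimum depth of 1 read used to count a position as covered are assumptions made for illustration; they are not taken from the TAYLOR pipeline.

```python
from typing import Sequence


def fraction_covered(depths: Sequence[int], min_depth: int = 1) -> float:
    """Return the fraction of reference positions with depth >= min_depth.

    `depths` holds one read-depth value per reference position.
    `min_depth` is an assumed cutoff, not a value stated in the letter.
    """
    if len(depths) == 0:
        raise ValueError("depths must contain at least one position")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def classify_genome(depths: Sequence[int],
                    complete_threshold: float = 0.95,
                    min_depth: int = 1) -> str:
    """Label a sample 'complete' if more than 95% of positions are covered."""
    if fraction_covered(depths, min_depth) > complete_threshold:
        return "complete"
    return "partial"


if __name__ == "__main__":
    # Toy depth profiles over a 100-position "genome" (illustrative only).
    well_covered = [50] * 98 + [0] * 2
    poorly_covered = [5] * 36 + [0] * 64
    print(classify_genome(well_covered))
    print(classify_genome(poorly_covered))
```

In practice the depth values would come from a per-position depth report on the aligned reads, and a stricter minimum depth may be appropriate for consensus calling.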
FIG 1 Evaluation of the Swift Biosciences’ SARS-CoV-2 multiplex amplicon sequencing panel. (A) Complete genomes were recovered from all samples with a CT value of ≤32.16 and a CT value as high as 36.77. Samples for which complete genomes (>95% genome coverage) were recovered are highlighted in purple. Partial genomes are highlighted in gold. (B) SARS-CoV-2 sequences were highly enriched in the sequencing libraries as measured by the percentage of reads mapping to the reference genome for SARS-CoV-2 (NCBI accession no. NC_045512.2). Complete genomes were recovered for samples highlighted in purple, while partial genomes were recovered for those highlighted in gold. (C) The genome coverage between nucleotides 201 and 29741 of the SARS-CoV-2 reference genome is even. The 5th and 95th percentiles of coverage at each position across the 41 samples with a mean depth of >100× are plotted in purple. A 250-nucleotide window moving average is represented in gold. (D) The 46 SARS-CoV-2 samples with complete genomes belong to both major SARS-CoV-2 lineages. A phylogenetic tree with the 46 SARS-CoV-2 genomes recovered in this report and 109 other global strains was constructed with FastTree version 2.1.1. Strains belonging to lineage A are highlighted in purple, while those belonging to lineage B are highlighted in gold. Those genomes sequenced in this report are circled in black. SNV, single nucleotide variation.
The libraries produced with the Swift SARS-CoV-2 amplicon panel were highly enriched for SARS-CoV-2 reads. Samples with CT values ranging from 26.01 to 32.00 had a median on-target percentage of 98.5% (range, 93.1 to 99.0%) after removal of reads attributed to primer dimer formation (Fig. 1B). Among samples with CT values from 32.01 to 38.00, the median on-target percentage was 92.4% (range, 16.5 to 98.7%) (Fig. 1B).
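As a rough illustration of the on-target metric, the sketch below computes the percentage of reads mapping to the reference after primer-dimer reads are excluded from the denominator, and then summarizes a group of samples by median and range. The function names, the read counts, and the exact handling of primer-dimer reads are assumptions; the letter does not specify how these reads were identified or removed.

```python
from statistics import median
from typing import Iterable, Tuple


def on_target_percent(mapped_reads: int, total_reads: int,
                      primer_dimer_reads: int = 0) -> float:
    """Percentage of non-primer-dimer reads that map to the reference.

    Assumes primer-dimer reads are removed from the total before the
    percentage is taken (one plausible reading of the text).
    """
    usable = total_reads - primer_dimer_reads
    if usable <= 0:
        raise ValueError("no reads remain after primer-dimer removal")
    return 100.0 * mapped_reads / usable


def summarize(percentages: Iterable[float]) -> Tuple[float, float, float]:
    """Return (median, minimum, maximum) of on-target percentages."""
    values = list(percentages)
    if not values:
        raise ValueError("at least one value is required")
    return median(values), min(values), max(values)


if __name__ == "__main__":
    # Hypothetical read counts: (mapped, total, primer dimer).
    samples = [(985, 1100, 100), (930, 1000, 0), (990, 1050, 50)]
    pcts = [on_target_percent(m, t, p) for m, t, p in samples]
    print(summarize(pcts))
```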
We also assessed the coverage distribution from samples with an average depth of >100×. The coverage across the genome for the 41 samples analyzed was highly even (Pielou’s evenness, 0.988) (Fig. 1C). To assess reproducibility, we performed 8 separate library preparations on a single sample. All 8 preparations yielded identical consensus sequences, demonstrating the high reproducibility of the Swift SARS-CoV-2 amplicon panel.
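For readers unfamiliar with the metric, Pielou's evenness is the Shannon entropy of a distribution divided by its maximum possible value, J' = H' / ln(S), where H' = -Σ p_i ln(p_i). The sketch below applies it to read depth, treating each genome position (or amplicon) as one category and p_i as its share of total depth. Whether the authors computed evenness per position or per amplicon is not stated, so the input granularity here is an assumption, as is the function name.

```python
import math
from typing import Sequence


def pielou_evenness(depths: Sequence[float]) -> float:
    """Pielou's evenness J' = H' / ln(S) for a vector of depths.

    S is the number of categories (positions or amplicons) and H' is the
    Shannon entropy of the depth proportions. Returns a value in [0, 1],
    where 1 means depth is spread perfectly evenly.
    """
    s = len(depths)
    total = sum(depths)
    if s < 2:
        raise ValueError("at least two categories are required")
    if total <= 0:
        raise ValueError("total depth must be positive")
    shannon = 0.0
    for d in depths:
        if d > 0:  # zero-depth categories contribute nothing to H'
            p = d / total
            shannon -= p * math.log(p)
    return shannon / math.log(s)


if __name__ == "__main__":
    uniform = [100] * 50                 # perfectly even coverage
    skewed = [1000] * 5 + [10] * 45      # a few dominant amplicons
    print(pielou_evenness(uniform))
    print(pielou_evenness(skewed))
```

A value close to 1, such as the 0.988 reported above, indicates that depth is distributed almost uniformly across the categories considered.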
Lastly, we performed a phylogenetic analysis of the 46 strains with complete genomes and 109 randomly selected global SARS-CoV-2 strains. The 46 strains belonged to both major lineages defined by pangolin (https://github.com/cov-lineages/pangolin) (12) and reflected the genomic diversity currently circulating in the SARS-CoV-2 population (Fig. 1D).
In summary, the Swift SARS-CoV-2 amplicon panel is a simple, highly sensitive approach for recovering SARS-CoV-2 genomes. The panel has allowed for the study of genomic rearrangements and mutations that are uniquely associated with low-viral-load samples (13, 14).
Data availability. Sequencing data are available under NCBI BioProject no. PRJNA610428 (Table 1). Code for assembling consensus FASTA genomes from FASTQ files is also available online (https://github.com/greninger-lab/covid_swift_pipeline).
TABLE 1 Assembly and sequencing read accession numbers for strains sequenced in this study
ACKNOWLEDGMENT
Swift Biosciences provided reagent for optimization of this protocol but had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
FOOTNOTES
- Accepted manuscript posted online 12 October 2020.
Supplemental material is available online only.
- Copyright © 2020 American Society for Microbiology.