Copy Number Heterogeneity of JC Virus Standards

ABSTRACT Quantitative PCR is a diagnostic mainstay of clinical virology, and accurate quantitation of viral load among labs requires the use of international standards. However, the use of multiple passages of viral isolates to obtain sufficient material for international standards may result in genomic changes that complicate their use as quantitative standards. We performed next-generation sequencing to obtain single-nucleotide resolution and relative copy number of JC virus (JCV) clinical standards. Strikingly, the WHO international standard and the Exact v1/v2 prototype standards for JCV showed 8-fold and 4-fold variation in genomic coverage between different loci in the viral genome, respectively, due to large deletions in the large T antigen region. Intriguingly, several of the JCV standards sequenced in this study with large T antigen deletions were cultured in cell lines immortalized using simian virus 40 (SV40) T antigen, suggesting the possibility of transcomplementation in cell culture. Using a cutoff of 5% allele fraction for junctional reads, 7 different rearrangements were detected in the JC virus sequences in the WHO standard across multiple library preparations and sequencing runs. Neither the copy number differences nor the rearrangements were observed in a clinical sample with a high copy number of JCV or in a plasmid control. These results were also confirmed by quantitative real-time PCR (qPCR), droplet digital PCR (ddPCR), and Sanger sequencing of multiple rearrangements. In summary, targeting different regions of the same international standard can result in up to an 8-fold difference in quantitation. We recommend the use of next-generation sequencing to validate standards in clinical virology.

dards in virology often relies on growing up large amounts of virus in cell culture to recapitulate the viral particle in extraction and to ensure future preparations are not required. However, passaging a virus several times can create selection pressures that do not recapitulate viral biology in vivo. In this study, we used next-generation sequencing and confirmatory quantitative real-time PCR (qPCR) and droplet digital PCR (ddPCR) to show that several of the standards used for the JCV quantitative PCR mixture contain multiple rearrangements, some with large deletions in the T antigen region, that incur large variability in measured copies depending on the locus measured. We hypothesize that these deletions are due to passage of the virus in simian virus 40 (SV40) T antigen immortalized cell lines.

RESULTS
Based on our previous discovery of copy number heterogeneity in the BK virus international standard, we performed next-generation sequencing on a variety of JCV standards present in our laboratory. We obtained a provisional WHO standard for JCV, two different versions of the Exact (v1 and v2) standard, a urine specimen with a high copy number of JCV present, the ATCC 1397 JC virus strain, and the original 1980 plasmid with the Mad-1 strain of JC virus cloned into it. All strains were sequenced using Nextera XT libraries on an Illumina MiSeq. qPCR and ddPCR using the Focus PCR primers targeting the VP2/3 region and the University of Washington (UW) Virology clinical primers targeting the T antigen region were performed as well (refer to Fig. 1B for primers).
Deep sequencing of the JC virus standards resulted in between 0.4% and 80% of the sequencing reads mapping to JC virus (Table 1). Sequencing of DNA extracted from a urine specimen that was known to contain a high copy number of JC virus as well as DNA extracted from a plasmid containing cloned JC virus both yielded the lowest variation in coverage (Fig. 1B). qPCR and ddPCR quantitation of the clinical urine specimen indicated equal copy numbers at the VP2/3 and T antigen loci of these materials.
The WHO international standard contained an approximate 8-fold variation in coverage between structural genes and much of the T antigen region and had the largest coefficient of variation in coverage among the strains at 98.4% (Fig. 1B and D). Analysis of junctional reads present in the WHO international standard at an allele frequency above 5% revealed 7 different rearrangements between loci more than 50 bp apart. The Exact v1 and v2 standards and the ATCC 1397 standard contained a complex rearrangement that resulted in the same duplication of the C terminus of the VP2/3 gene inserted into a C terminus of the T antigen (Fig. 2). Multiple rearrangements in these strains that were detected by deep sequencing were confirmed by PCR and Sanger sequencing (Fig. 2). Of note, both of the Exact strains as well as the ATCC 1397 strain were grown in Cos cell lines (J. Boonyarantanakornkit, Exact Diagnostics, personal communication). The Exact v1 and v2 standards both yielded an approximate 4- to 5-fold difference in copy number between structural genes and T antigen, while the ATCC 1397 strain had a 1.5-fold difference in copy number between these loci (Fig. 1C and D; see also Table S1 in the supplemental material).
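The two summary statistics used above, the coefficient of variation of coverage and the fold difference in coverage between two loci, can be illustrated with a minimal Python sketch. The function names, the region coordinates, and the toy depth values are hypothetical and are not taken from the study; the sketch also assumes the population standard deviation is used for the coefficient of variation, which the text does not specify.

```python
from statistics import mean, pstdev


def coverage_cv(depths):
    """Coefficient of variation of per-base depth, in percent.

    Assumes the population standard deviation; the study does not
    state which estimator was used.
    """
    return 100.0 * pstdev(depths) / mean(depths)


def fold_difference(depths, region_a, region_b):
    """Ratio of mean depth in region_a to mean depth in region_b.

    Regions are hypothetical (start, end) tuples, 0-based and half-open.
    """
    a = mean(depths[region_a[0]:region_a[1]])
    b = mean(depths[region_b[0]:region_b[1]])
    return a / b


# Toy data only: 50 positions at depth 800 followed by 50 at depth 100.
toy_depths = [800] * 50 + [100] * 50
toy_fold = fold_difference(toy_depths, (0, 50), (50, 100))
toy_cv = coverage_cv(toy_depths)
```

With real data, the two regions would be chosen to match the loci being compared, for example the amplicon targets of two assays.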

DISCUSSION
Using deep sequencing and confirmatory qPCR, ddPCR, and Sanger sequencing, we show that four different qPCR standards for JC virus contain strains with large, distinct deletions in the T antigen region. These deletions result in significant copy number differences between different loci in the JC virus genome that may lead to radically different sensitivities and quantities being reported by various JC virus clinical qPCR assays if these standards are adopted by the clinical virology community. In contrast, deep sequencing of a clinical specimen with JC virus and of a plasmid containing the JC virus sequence does not show the large differences in copy number found in the cell culture-adapted strains. These results mirror those we recently reported in an international standard for BK virus (41).

FIG 1 (caption fragment) The y axis is normalized such that 1 designates the average coverage across the viral genome to highlight relative differences in coverage. Three of the six standards include a large deletion in the T antigen region that constitutes a greater than 4-fold difference in copy number relative to the structural genes. Reductions in coverage in the regulatory repeat region are due both to small deletions and to sequence divergence relative to the JC virus reference genome. Primers for the Focus PCR analyte-specific reagent targeting the VP2/3 region are shown on the JC virus genome in red, while the pep primers targeting the T antigen region are shown in blue. (C, D) Confirmation of the copy number differences seen by sequencing was performed with qPCR (C) and ddPCR (D) using PCR primers against the VP2/3 gene (red) and T-ag gene (blue). Ten-fold dilutions of each of the standards depicted were quantitated, and the discrepancy in cycle threshold (CT) and absolute quantitation (D) between the VP2/3 and T-ag assays are depicted (green) for each standard. ΔCT, change in CT.

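To relate a ΔCT between two qPCR assays to a fold difference in copy number, the standard exponential amplification relationship can be used. The sketch below assumes both assays amplify with the same efficiency and defaults to perfect doubling per cycle (efficiency of 1.0); the function names and the efficiency parameter are illustrative and are not part of the study's analysis.

```python
import math


def fold_from_delta_ct(delta_ct, efficiency=1.0):
    """Fold difference in template implied by a CT difference.

    Assumes both assays share the same amplification efficiency;
    efficiency=1.0 means the product doubles every cycle.
    """
    return (1.0 + efficiency) ** delta_ct


def delta_ct_from_fold(fold, efficiency=1.0):
    """CT difference expected for a given fold difference in template."""
    return math.log(fold) / math.log(1.0 + efficiency)


# Under perfect doubling, an 8-fold copy number difference corresponds
# to about 3 cycles, and a 4-fold difference to about 2 cycles.
cycles_for_8_fold = delta_ct_from_fold(8.0)
cycles_for_4_fold = delta_ct_from_fold(4.0)
```

Real assays rarely reach perfect efficiency, so an observed ΔCT should be interpreted against each assay's own standard curve.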
Of note, most of the JC virus standards sequenced in this study were cultured in the Cos cell line. The Cos cell line is derived from a primary monkey cell line that was transformed with SV40 T antigen and expresses both large T antigen and small t antigen (15). Given that the deletions seen in this study occurred in the T antigen region of the JC virus, that they were seen in approximately 90% of the JC virus strains present, and that T antigen is known to be required for polyomavirus replication, we hypothesize that the SV40 T antigen in the Cos cell line is providing nonstructural functions, such as DNA binding, unwinding, viral replication, and transformation in trans, to the JC virus strains with the T antigen deletions sequenced in this study. Indeed, SV40 T antigen has been shown to be required for archetype JC virus replication in Cos cells, as JC virus is unable to replicate in the untransformed Cos parental cell line CV-1 (16). The presence of SV40 T antigen also allows primary human fetal glial cells to support the growth of JC virus, which otherwise does not grow (17). Thus, SV40 T-antigen-transformed cell lines may have been specifically chosen for their ability to support high levels of JC virus growth given the need for large amounts of highly concentrated virus to provide qPCR standards to multiple labs around the world, albeit with the unforeseen consequence that the cell line would yield JC virus strains with large deletions.
The many functions of the polyomavirus large T antigen include binding multiple host proteins (including p68 DNA polymerase/primase, p53, Rb, hsc70, and replication protein A), binding to itself to form a hexamer, binding the origin of replication, unwinding DNA, and translocating to the nucleus (18,19). Of these, only origin binding would be potentially compromised by complementation with a different origin. The BK, JC, and SV40 viruses have nearly identical sequences in their origin of replication and surrounding sequences (20). They differ only in the single nucleotide flanking the pentanucleotide GAGGC motif that is not involved in hairpin formation (SV40, gaggcCgaggc; JCV, gaggcGgaggc; BKV, gaggcAgaggc) (1). Previous experiments with SV40 T antigen have shown that it is capable of binding to the JC virus and BK virus origin sequence and initiating replication in vitro and in vivo (21,22). Indeed, the SV40 T antigen is noted to have an even broader sequence-binding capability than that of JC virus (23).
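The single-nucleotide difference between the three origin motifs quoted above can be checked directly. This is a small illustrative sketch using only the three sequences given in the text; the helper name is hypothetical.

```python
# Origin motif sequences as quoted in the text.
ORIGIN_MOTIFS = {
    "SV40": "gaggcCgaggc",
    "JCV": "gaggcGgaggc",
    "BKV": "gaggcAgaggc",
}


def differing_positions(seqs):
    """Return 0-based positions at which equal-length sequences differ."""
    upper = [s.upper() for s in seqs]
    length = len(upper[0])
    if any(len(s) != length for s in upper):
        raise ValueError("sequences must have equal length")
    return [i for i in range(length) if len({s[i] for s in upper}) > 1]


variable_sites = differing_positions(list(ORIGIN_MOTIFS.values()))
# Both pentanucleotide copies flanking the variable base are identical.
flanks = {(s[:5].upper(), s[6:].upper()) for s in ORIGIN_MOTIFS.values()}
```

Only the base at index 5, between the two GAGGC copies, varies across the three viruses.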
The recovery of the JC and BK virus sequences with large deletions in cell culture is likely due to the unique biology and sequence identity of the polyomaviruses. Encapsidation of viral DNA is thought to be dependent on viral structural proteins with no contribution from viral nonstructural proteins beyond DNA replication (24). The amino acid sequence conservation between SV40 and the JC and BK virus T antigen is approximately 73%. As described above, the origin DNA sequence is nearly identical between different polyomaviruses, with the nucleotide sequence required for T antigen binding being absolutely conserved. The JC virus T antigen J domain can replace the SV40 J domain and retain replication activity (25). Chimeras of JC virus and SV40 showed that viruses containing JC virus regulatory sequences and SV40 coding regions can replicate, which is consistent with the JC virus recovered in this study (26). Thus, polyomavirus nonstructural proteins show the ability to complement each other (27,28). Indeed, different deletions were recovered in three of the cell culture-associated BK and JC polyomaviruses sequenced, suggesting that each of these deletions arose independently in culture.
Our study shows the importance of deep sequencing of standards to validate reagent integrity before they are scaled internationally (29). This study adds to the many uses of next-generation sequencing in the clinical virology laboratory (30,31). The main limitation of our study is the use of short-read sequencing to sequence the strains, as we are thus unable to link rearrangements across the multiple viruses present in each standard (32). Deep sequencing provides single-nucleotide resolution of the sequences present and the relative copy numbers of loci across the genome. Deep sequencing of these standards demonstrated the presence of multiple viral species with radically different copy numbers due to deletions, as well as single-nucleotide changes that may affect PCR primer binding and overall quantitation, making growth of BK and JC viruses in SV40-transformed cell lines a potentially suboptimal choice for a qPCR international standard.

MATERIALS AND METHODS
Next-generation sequencing. Extracted DNA from the standards was diluted to 0.1 to 0.2 ng/μl and was used for dual-indexed Nextera XT sequencing library preparation followed by 18 cycles of amplification (36). Strains from the WHO, Exact v1, pMAD1 plasmid, as well as a clinical strain from urine were sequenced on a single-end 185-bp run on an Illumina MiSeq. JCV 1397 from the ATCC, WHO, Exact v1, Exact v2, and the urine clinical strain were also sequenced on a 2 × 260-bp run on an Illumina MiSeq. Sequencing reads were adapter and quality filtered (Q30) using cutadapt and aligned to the JC virus reference genome (GenBank accession number NC_001699) using the Geneious v9.1.4 mapper with structural variant detection enabled (37–39). Coverage maps were produced from bam files generated in Geneious using the genomecov option from bedtools (40).
To determine the locus used in the Focus Diagnostics analyte-specific reagent, an amplicon created via real-time PCR with the Focus primers was subjected to a half reaction of end repair and dA tailing, followed by adapter ligation and PCR amplification with TruSeq adapters using a Kapa HyperPlus kit. The indexed amplicon was sequenced on a 2 × 500-bp MiSeq run, and reads were adapter/quality trimmed and mapped to the JC virus reference genome as above. The location of the primers is depicted in Fig. 1B.
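As an illustration of how a coverage map with a mean of 1 could be derived from per-base depth, the following Python sketch parses three-column, tab-separated text (reference name, 1-based position, depth), which is the layout produced by `bedtools genomecov -d`. Whether the `-d` option was used in this study is an assumption, the function names are hypothetical, and the example rows are toy values rather than study data.

```python
def parse_genomecov_d(lines):
    """Parse per-base depth rows of the form: reference<TAB>position<TAB>depth.

    Assumes a single reference sequence, so only the depth column is kept.
    """
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _ref, _pos, depth = line.split("\t")
        depths.append(int(depth))
    return depths


def normalize_to_mean(depths):
    """Scale depths so that 1.0 equals the average coverage."""
    avg = sum(depths) / len(depths)
    return [d / avg for d in depths]


# Toy rows only; not taken from the sequencing data.
toy_rows = ["NC_001699\t1\t10", "NC_001699\t2\t30", "NC_001699\t3\t20"]
toy_normalized = normalize_to_mean(parse_genomecov_d(toy_rows))
```

Normalizing to the mean in this way makes coverage profiles from libraries with very different read counts directly comparable.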