Validation and Implementation of Clinical Laboratory Improvements Act-Compliant Whole-Genome Sequencing in the Public Health Microbiology Laboratory

ABSTRACT Public health microbiology laboratories (PHLs) are on the cusp of unprecedented improvements in pathogen identification, antibiotic resistance detection, and outbreak investigation by using whole-genome sequencing (WGS). However, considerable challenges remain due to the lack of common standards. Here, we describe the validation of WGS on the Illumina platform for routine use in PHLs according to Clinical Laboratory Improvements Act (CLIA) guidelines for laboratory-developed tests (LDTs). We developed a validation panel comprising 10 Enterobacteriaceae isolates, 5 Gram-positive cocci, 5 Gram-negative nonfermenting species, 9 Mycobacterium tuberculosis isolates, and 5 miscellaneous bacteria. The genome coverage range was 15.71× to 216.4× (average, 79.72×; median, 71.55×); the limit of detection (LOD) for single nucleotide polymorphisms (SNPs) was 60×. The accuracy, reproducibility, and repeatability of base calling were >99.9%. The accuracy of phylogenetic analysis was 100%. The specificity and sensitivity inferred from multilocus sequence typing (MLST) and genome-wide SNP-based phylogenetic assays were 100%. The following objectives were accomplished: (i) the establishment of the performance specifications for WGS applications in PHLs according to CLIA guidelines, (ii) the development of quality assurance and quality control measures, (iii) the development of a reporting format for end users with or without WGS expertise, (iv) the availability of a validation set of microorganisms, and (v) the creation of a modular template for the validation of WGS processes in PHLs. The validation panel, sequencing analytics, and raw sequences could facilitate multilaboratory comparisons of WGS data. Additionally, the WGS performance specifications and modular template are adaptable for the validation of other platforms and reagent kits.


Description of the assay
Whole genome sequencing (WGS) is an analytical process that determines the complete DNA sequence of an organism's genome in a single reaction. WGS provides the most comprehensive information on genetic variation within and between species, because it can identify low-frequency variants and genome rearrangements that may be missed by other molecular methods. Sequencing the entire microbial genome is important for microbial identification, subtyping, and other comparative genomic studies. The Illumina sequencing technology used in the MiSeq desktop sequencer leverages clonal array formation and reversible terminator chemistry for rapid and accurate large-scale WGS.
First, genomic DNA (gDNA) is extracted from bacterial cells, and the concentration of double-stranded DNA is quantified fluorometrically. Next, the gDNA is enzymatically fragmented into a library of small fragments that can be uniformly and accurately sequenced in multiple parallel reactions, and adapters are ligated to both ends of the DNA fragments. The Nextera XT DNA Sample Preparation Kit uses an engineered transposome to simultaneously fragment and tag ("tagment") the input DNA, adding adapter sequences in the process:
A) The Nextera XT transposome carrying adapters is combined with template DNA.
B) Tagmentation fragments the DNA and adds the adapters.
C) Limited-cycle PCR adds sequencing primer sequences and indices.
Sequencing cycles are repeated to determine the sequence of bases in each fragment, one base at a time. The resulting short reads are then reassembled either by reference-guided assembly (using a known genome as a scaffold) or by de novo assembly (in the absence of a reference genome). Next-generation sequencing is a universal tool that enables analysis of the genome of any biological organism, including any bacterial species.

Regulations
Whole genome sequencing is considered a laboratory-developed test; it is not cleared or approved by the FDA.
"Each laboratory that … introduces a test system not subject to FDA clearance or approval (including methods developed in-house and standardized methods such as textbook procedures), or uses a test system in which performance specifications are not provided by the manufacturer must, before reporting patient test results, establish for each test system the performance specifications for the following performance characteristics, as applicable." This validation examines the following performance characteristics, with definitions for NGS applications:

Accuracy
The degree of agreement between the nucleic acid sequences derived from the assay and a reference sequence.

Precision
The degree to which repeated sequence analyses give the same result: repeatability (within-run precision) and reproducibility (between-run precision).

Analytical sensitivity (Limit of detection)
The likelihood that the assay will detect the targeted sequence variations, if present.

Analytical specificity
The probability that the assay will not detect a sequence variation when none is present (the false-positive rate is a useful measurement for sequencing assays).

Specimens not acceptable: other Biosafety Risk Group 3 organisms
Select agents cannot be processed by personnel who are not cleared for handling select agents. In the case of an exceptional public health emergency, this policy can be overridden by the Laboratory Director.

Negative control for sequencing process
To control for contamination during library preparation and sequencing, an index combination that does not correspond to any sample in the current sequencing run should be added at the index demultiplexing step. The index combination for the negative control should correspond to one of the index combinations used in the previous sequencing run; this way, it will capture carry-over contamination with library fragments generated in the previous run.
Thresholds for the negative control:
• Number of reads after trim: <10,000
• N50 for de novo assembled reads: <1,000
• Highest coverage of de novo assembled contigs: <10x
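These negative control thresholds can be checked automatically once trimmed read counts and de novo assembly statistics are available. The sketch below is illustrative and not part of the validated pipeline; the function and variable names are hypothetical:

```python
def n50(contig_lengths):
    """N50: length L such that contigs of length >= L contain
    at least half of the total assembled bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

def negative_control_passes(read_count, contig_lengths, contig_coverages):
    """Apply the per-run negative control thresholds from this section."""
    return (read_count < 10_000
            and n50(contig_lengths) < 1_000
            and max(contig_coverages, default=0) < 10)

# Example: a clean negative control with few short, low-coverage contigs
print(negative_control_passes(1_250, [800, 600, 400], [3.2, 2.1, 1.4]))  # True
```

A control exceeding any one of the three thresholds would fail, triggering the carry-over contamination investigation described above.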

Negative control for sequencing analysis
In the case of epidemiological typing, an unrelated strain of the same species as the pathogen causing the potential outbreak should be included in the analysis as a negative control. A sequence for an epidemiologically unrelated negative control can be acquired from the NCBI SRA database (http://www.ncbi.nlm.nih.gov/sra/) or the European Nucleotide Archive (http://www.ebi.ac.uk/ena).

Quality assurance & Quality control
Preliminary quality thresholds were selected based on literature data and previous sequencing experience. These thresholds were used to determine when to exclude failed validation sequencing results from the analysis and repeat sequencing. The preliminary quality metrics must be reviewed after validation is completed and adjusted in accordance with the acquired data.

Each run QC
At every run, QC must be performed by monitoring the following quality metrics: a) quality of the input DNA for all tested samples; b) quantity of the input DNA for all tested samples; c) DNA library size distribution for representative samples; d) DNA library concentration for all tested samples; e) quality of the sequences for all tested samples; f) quality of the spiked-in positive PhiX control sequences; g) quality of the negative sequencing process/analysis control.
Ensure proper record keeping for quality assurance. Record all manipulations performed during sample processing, from receipt of the cultures/DNA samples to completion of the sequencing analysis, by filling out the Core Lab Tracking form and the MiSeq Data log. Indicate any changes that have been introduced to the protocol. Note whether any quality metrics were not met and any corrective measures undertaken, in the Comments section of the corresponding chapter of the Tracking form (metrics a-d) or in the corresponding field of the MiSeq Data log (metrics e-g).

Input DNA quality and quantity metrics
The quality of the input DNA for all tested samples is estimated via the ratio of absorbance at 260 nm to absorbance at 280 nm. The 260/280 absorbance ratio must be >1.7.
The quantity of the input DNA for all tested samples is measured using a Qubit fluorometer and the corresponding reagent kit. The DNA concentration should be ≥1 ng/µl.
If the abovementioned quality parameters are not met, repeat and troubleshoot the DNA isolation step.

DNA library quality and quantity metrics
The DNA library size distribution for representative samples must be measured using a BioAnalyzer instrument and the corresponding reagent kits. Representative samples should include the following variety (if present within the run): different species, Gram-positive and Gram-negative bacteria, and species with different GC content. The average library size must be within the range of 300 bp to 3 kb.
The DNA library concentration is measured using a Qubit fluorometer and the corresponding reagent kit and must be ≥1 nM.
If the abovementioned quality parameters are not met, repeat and troubleshoot the tagmentation/amplification/post-PCR cleanup steps of library preparation.
In some cases, a BioAnalyzer run failure leads to missing or shifted library peaks; in this case, repeat the BioAnalyzer run.
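The input DNA and library acceptance criteria above can be expressed as simple gates before proceeding to pooling and sequencing. This is a minimal sketch assuming the measurements are collected per sample; the function names are illustrative and not part of the SOP:

```python
def input_dna_ok(a260_a280: float, conc_ng_per_ul: float) -> bool:
    """Input gDNA QC: 260/280 ratio > 1.7 and concentration >= 1 ng/ul."""
    return a260_a280 > 1.7 and conc_ng_per_ul >= 1.0

def library_ok(mean_size_bp: float, conc_nM: float) -> bool:
    """Library QC: average fragment size within 300 bp-3 kb
    and library concentration >= 1 nM."""
    return 300 <= mean_size_bp <= 3000 and conc_nM >= 1.0

# A sample passing input QC but failing library size QC would be sent back
# to the tagmentation/amplification/cleanup troubleshooting step.
print(input_dna_ok(1.85, 4.2))  # True
print(library_ok(250, 2.0))     # False: average size below 300 bp
```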

Quality metrics of the tested sample sequences
Quality metrics of tested samples must meet the following [preliminary] quality thresholds:
• Percent of bases with a quality score >Q30 for the run must be ≥50%.
• Q30 score for generated genome sequences must be ≥75% for at least 80 bp of the read length.
• Average depth of coverage must be ≥10x across the whole genome.
If the abovementioned quality parameters are not met, repeat and troubleshoot library preparation and library pooling/loading.
[Preliminary] quality threshold for the spiked-in positive control:
• PhiX error rate must be <6%.
If the abovementioned quality parameter is not met, verify the expiration date of the PhiX stock and its storage conditions, and check that the denatured 20 pM PhiX library used was within 3 weeks of preparation. Troubleshoot library denaturation and library pooling/loading. If the problem persists, contact Illumina tech support, since a high PhiX error rate might be caused by a MiSeq hardware malfunction.

Quality metrics of negative control sequences for the sequencing process
The negative control for the sequencing process is an index combination that does not correspond to any sample in the current sequencing run but matches one of the index combinations used in the previous sequencing run. If the negative control does not meet the [preliminary] quality parameters below, this indicates possible carry-over contamination with library fragments generated in the previous run:
• Number of reads after trim must be <10,000.
• N50 for de novo assembled reads must be <1,000.
• The highest coverage of de novo assembled contigs must be <10x.
In case of failure to meet the abovementioned quality parameters, perform a template line wash of the MiSeq instrument with 0.01% sodium hypochlorite and clean all working surfaces with 10% bleach.
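The per-read Q30 metric can be computed directly from FASTQ quality strings. A small sketch, assuming standard Illumina 1.8+ Phred+33 ASCII encoding; it is illustrative only and not part of the validated pipeline:

```python
def percent_q30(quality_string: str) -> float:
    """Percent of bases in one read with Phred quality >= Q30 (Phred+33)."""
    if not quality_string:
        return 0.0
    scores = [ord(c) - 33 for c in quality_string]
    return 100.0 * sum(s >= 30 for s in scores) / len(scores)

# '?' encodes Q30, 'I' encodes Q40, '#' encodes Q2.
read_quals = "IIII????##"
print(percent_q30(read_quals))  # 80.0
```

Aggregating this value across all reads in a run gives the "percent of bases with quality score >Q30" metric used in the thresholds above.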

Quality metrics of negative control for sequencing analysis
The epidemiologically unrelated negative control should not cluster with the tested samples on the phylogenetic tree. If the epidemiologically unrelated negative control clusters together with the tested samples: a) repeat the analysis; b) ensure that the coverage of all samples meets the minimum quality parameters (see chapter 7.1.3); c) use a different reference sequence for mapping, if available; d) add an additional epidemiologically unrelated negative control for the sequencing analysis.

Monthly positive QC control
Perform monthly QC testing by including the control Escherichia coli ATCC 25922 genome in the library pool as the last sample. Perform assembly of the E. coli ATCC 25922 genome. Sequences for the monthly QC control must meet the following [preliminary] metrics:
• Average coverage of the genome must be ≥10x.
• Q30 score for the E. coli ATCC 25922 genome sequence must be ≥75% for at least 80 bp of the read length.
The following assay results must be acquired for the monthly QC control:
• The MLST allelic profile acquired during the run must correspond to the known ST73 profile described for this strain.

Monthly negative QC control
Perform monthly QC testing by including a no-template (water) control as a negative quality control for detection of reagent contamination. Start with the DNA isolation step; for DNA resuspension, use the Tris-HCl buffer that was used in previous runs, or prepare new buffer using the MilliQ water source in the lab. Measure the DNA concentration in the negative sample using NanoDrop and Qubit. Proceed through all library preparation steps. Measure the library concentration and size distribution using Qubit and the Bioanalyzer.
The negative quality control sample must meet the following parameters:
• Genomic DNA concentration in the negative sample according to NanoDrop and Qubit must be 0 ± 0.1 ng/µl.
• Library concentration for the negative sample according to Qubit must be <0.5 ng/µl.
• The Bioanalyzer must show no peak for the negative sample.
If the library concentration for the negative sample is >0.5 ng/µl and/or the Bioanalyzer shows a peak on the electropherogram, load the sample for sequencing. For pooling, take the volume of the negative sample that corresponds to the average volume of the other samples. Quality metrics of the sequencing data for the negative sample must meet the following [preliminary] parameters:
− Number of reads after trim must be <10,000.
− N50 for de novo assembled reads must be <1,000.
− The highest coverage of de novo assembled contigs must be <10x.
Record the values for the monthly positive and negative controls into both the MiSeq data log for the run and the "Monthly QC log". Start a new Monthly QC log spreadsheet every calendar year.

Equipment calibration
The following equipment must be calibrated every 6 months:
• NanoDrop: see Appendix 20 of Standard Operating Procedure CORE-PROC_WGS_001.
• PCR thermocycler: see Appendix 21 of Standard Operating Procedure CORE-PROC_WGS_001.
The following equipment must be calibrated at every run:
• Qubit fluorometer: by using the 2 standards included in the kit.

Test Rejection Criteria
Test results should be rejected, and the test must be troubleshot and repeated, if:
• The read length at which 75% of bases have a quality score ≥Q30 is less than 80 bp.
• The average depth of coverage is <10x across the whole genome.
• The calculated % error value for the spiked-in PhiX is >6%.
• The percent of bases with a quality score >Q30 for the run is <50%.
• The negative control met two or more of the following parameters: ≥10,000 reads after trim, N50 ≥1,000, or highest coverage of de novo assembled contigs >10x.
Data need to be re-analyzed if:
• The epidemiologically unrelated negative control clusters with the tested samples on the phylogenetic tree.
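The rejection criteria above can be combined into a single run acceptance check. A sketch with hypothetical metric names, mirroring the thresholds listed in this section:

```python
def run_rejected(len_75pct_q30_bp, avg_depth, phix_error_pct,
                 pct_bases_over_q30, neg_ctrl_failures):
    """Return a list of reasons the run must be rejected (empty = accept).

    neg_ctrl_failures: count of negative control parameters exceeded
    (reads after trim, N50, highest contig coverage).
    """
    reasons = []
    if len_75pct_q30_bp < 80:
        reasons.append("read length at 75% >=Q30 below 80 bp")
    if avg_depth < 10:
        reasons.append("average depth of coverage below 10x")
    if phix_error_pct > 6:
        reasons.append("PhiX error rate above 6%")
    if pct_bases_over_q30 < 50:
        reasons.append("less than 50% of run bases above Q30")
    if neg_ctrl_failures >= 2:
        reasons.append("negative control failed two or more parameters")
    return reasons

print(run_rejected(95, 42.0, 0.8, 85.0, 0))  # [] -> run accepted
```

Returning the list of failed criteria, rather than a bare pass/fail, supports the record-keeping requirement that corrective measures be noted against specific metrics.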

Validation samples
For the WGS validation, ATCC bacterial strains or strains previously sequenced by CDC were sequenced by the MDL Core laboratory (further referred to as the Core lab) using the Illumina MiSeq sequencer. Reference strains with whole genome sequences available from NCBI or CDC were included in the validation in order to compare validation sequencing results with reference sequencing results. Reference strains used in the validation of WGS by the Core lab are referred to as "validation samples". The validation samples included 34 bacterial samples for whole genome sequencing: 10 Enterobacteriaceae samples, 5 Gram-positive cocci samples, 5 Gram-negative non-fermenting samples, 9 Mycobacterium tuberculosis samples, and 5 representatives of miscellaneous bacterial species. Sequences generated by the Core laboratory by performing WGS of the validation samples are referred to in this document as "validation sequences" (as opposed to "reference sequences" acquired from NCBI or CDC). The phylogenetic tree generated from the validation sequences is referred to as the "validation tree". Validation samples repeated within/between runs are referred to as "validation replicates".

Reference materials
Complete reference sequences of standard strains of the representative species were downloaded from the publicly available website of the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/genome/) to serve as reference materials.
Reference materials in this validation study are represented by:
1) When the genome of the strain sequenced by the laboratory is available from the NCBI database, the complete genome of that same ATCC strain was used.
2) When the genome of the strain sequenced by the laboratory is NOT available from the NCBI database, the complete genome of a strain belonging to the same species as the tested isolate, though not identical to the strain sequenced by the Core lab, was used.
3) Raw reads generated by CDC for the same strains which were also sequenced in the Core lab.
In cases #1 and #2, the reference was downloaded from the NCBI Genome database and used as a reference for mapping of the Core lab-generated sequences. SNP differences between laboratory replicates are shown in both cases. SNP differences between the reference sequence and the Core lab sequences are analyzed only for case #1. In case #3, the CDC reference raw reads were trimmed and mapped to the same complete genome scaffold that was used for mapping of the Core lab sequences. The number of SNP differences between the laboratory sequences and the CDC-generated sequences was determined. Thirty-four validation samples were sequenced in triplicate (within and between runs). For the between-run reproducibility assessment, all replicates were generated starting from fresh culture (exception: replicates for Mycobacterium tuberculosis samples were generated starting from DNA). Between-run replicates were processed on different days, alternating two operators (see testing schedule in Appendix 7). For within-run replicates, one DNA extract was used, but independent library preparations were done, with the final samples included in one sequencing run.

Data analysis and confirmatory testing
Data analysis is described in Appendix 5.
Confirmatory testing must be performed if the 16S rRNA ID or MLST results for validation samples do not match the reference results, by performing Sanger sequencing of the 16S rRNA gene or the MLST housekeeping genes, correspondingly. Confirmatory testing should be performed if the results of antibiotic resistance gene detection by WGS do not match the reference results, by performing antibiotic susceptibility testing via micro-broth dilution or disc diffusion methods. All discrepancies must be recorded.

Accuracy
The accuracy of WGS can be divided into three components: platform accuracy, assay accuracy, and accuracy of the bioinformatics pipeline.

MiSeq Platform Accuracy
Platform accuracy is assessed as the accuracy of identification of individual base pairs in a bacterial genome. We determined the MiSeq Illumina platform accuracy by comparing base calling results with the reference sequence.
We also validated the quality parameters that affect platform accuracy and determined the ranges that allow accurate identification of individual base pairs. The preliminary quality thresholds were adjusted based on validation data to match the most stringent values of the quality parameters detected during the validation (±5%). In several cases, the threshold was left at a level even more stringent than any of the detected values.

Quality parameters affecting platform accuracy
We identified the quality parameters of the sequencing data affecting platform accuracy and established quality parameter thresholds that provide ≥90% accuracy of base calling. The following types of errors affect the accuracy of the MiSeq Illumina platform: a) sequence errors introduced by the DNA library preparation technique (e.g., amplification-introduced errors); and b) base calling accuracy of the sequencer. A set of quality parameters was established to account for the corresponding types of errors.

a) Sequence errors introduced by DNA library preparation technique:
The first type of sequencing errors, introduced by PCR errors during library amplification, is stochastic, and independently performed library preps are not likely to contain the same errors. High depth and good uniformity of coverage reduce the impact of sequencing errors [25]. For that reason, the thresholds for depth and uniformity of coverage that provide accurate variant calling were determined empirically during this validation:
− Average depth of coverage must be ≥15x across the genome. The minimum coverage of 15x was achieved for the targeted areas used in gene-specific analysis: the MLST scheme genes and the 16S rRNA gene. If the minimum coverage threshold is not achieved for targeted areas, an alternate method such as Sanger sequencing should be used for sequencing of the given genome region.
The second type of errors is determined by the base calling accuracy of the sequencer. The parameter used to estimate the accuracy of base calling by the platform is the Phred quality score, which reflects the probability of an incorrect base call. For the Illumina sequencing platform, a Phred score of Q30 is generally used; it corresponds to a probability of 1 incorrect base call in 1,000 [26]. The MiSeq sequencer specifications cited on the manufacturer's website suggest the following base calling accuracy, measured by the Phred quality score (Q score): >70% of bases in 300 bp-long fragments should have a Phred score higher than Q30, while it is noted that "actual performance parameters may vary based on sample type, sample quality, and clusters passing filter".
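The relation between a Phred score and the probability of an incorrect base call is Q = −10·log10(p), so Q30 corresponds to p = 0.001 (1 error in 1,000 calls). A quick sketch of the conversion in both directions:

```python
import math

def phred_to_error_prob(q: float) -> float:
    """Probability of an incorrect base call for a given Phred score."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Phred score for a given error probability."""
    return -10 * math.log10(p)

print(phred_to_error_prob(30))     # 0.001 -> 1 incorrect call in 1,000
print(error_prob_to_phred(0.001))  # 30.0
```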
While the error rate of the spiked-in PhiX sequence is another quality metric reflecting the base calling accuracy of the sequencer, it is not recommended for use as the sole quality control for platform accuracy [27]. For that reason, we used the PhiX error rate in addition to other metrics in order to monitor platform base calling accuracy.
To account for the base calling accuracy of the sequencer, we evaluated the following quality metrics:
Accuracy of base calling: sequence reads must have ≥Q30 for more than 75% of bases for at least 86 bp of the read length. The average read length after trimming and discarding base pairs with a quality score <Q30 should be >109 bp.
PhiX error rate: the PhiX error % reflects sequencing accuracy for the positive control PhiX spiked into each run. The PhiX error rate for the run should be <4.9%.
Sequencer run metrics were assessed to establish the optimal performance of the platform:
Percent of bases with a quality score >Q30 for the run (the MiSeq sequencer base calling accuracy metric for the run): should be >57% for the 600-cycle MiSeq reagents.
Cluster density for the run (the density of clusters formed by clonally amplified library fragments on the flow cell surface): should be >800 K/mm², with a maximum of 1,700 K/mm²; it is preferable to keep cluster density within the range of 800-1,100 K/mm².
Clusters passing filter for the run (the percentage of clusters that pass the quality filter for signal purity): should be >72%.
The ranges established for the corresponding data metrics are shown below. The graphs represent the distribution of values by the number of samples with a certain value.

Quality parameters of positive and negative controls.
Quality thresholds of positive and negative controls have been revised based on data acquired during validation.

Negative control for sequencing process (each-run QC)
The preliminary metrics are more stringent than the acquired data (the results for negative controls included in the validation runs are summarized in Appendix 12). The preliminary thresholds were left in place:
• Number of reads after trim must be <10,000.
• N50 for de novo assembled reads must be <1,000.
• The highest coverage of de novo assembled contigs must be <10x.

Spiked in positive PhiX control
The positive spiked-in control PhiX error rate should be <4.9%.

Negative monthly control
The preliminary thresholds were adequate and were not changed:
• Genomic DNA concentration in the negative sample according to NanoDrop and Qubit must be 0 ± 0.1 ng/µl.
• Library concentration for the negative sample according to Qubit must be <0.5 ng/µl.
• The Bioanalyzer must show no peak for the negative sample.
If the library concentration for the negative sample is >0.5 ng/µl and/or the Bioanalyzer shows a peak on the electropherogram, load the sample for sequencing. For pooling, take the volume of the negative sample that corresponds to the average volume of the other samples. Quality metrics of the sequencing data for the negative sample must meet the following [preliminary] parameters:
− Number of reads after trim must be <10,000.
− N50 for de novo assembled reads must be <1,000.
− The highest coverage of de novo assembled contigs must be <10x.

Positive monthly control
The positive monthly control E. coli ATCC 25922 quality thresholds were adjusted in accordance with the collected data:
• Average coverage of the genome must be ≥15x.
• Q30 score for the E. coli ATCC 25922 genome sequence must be ≥75% for at least 86 bp of the read length.
The following assay results must be acquired for the monthly QC control:
• The MLST allelic profile acquired during the run must correspond to the known ST73 profile described for this strain.
• The 16S rRNA sequence must be identified as Escherichia coli.
• No antibiotic resistance genes should be found with ResFinder analysis.
• The following virulence genes should be found using VirulenceFinder analysis with 100% ID and 100% query length coverage:

Accuracy of base calling against reference sequence
The accuracy of the platform was established by determining the closeness of agreement between base calls made by the MiSeq sequencer (the measured value) and the NCBI reference sequence (the true value). Reference sequences were available for 18 strains in the validation set: 10 Enterobacteriaceae samples, 4 Gram-positive cocci samples, 2 Gram-negative non-fermenting samples, and 2 representatives of miscellaneous bacterial species.
We determined the MiSeq Illumina platform accuracy by mapping the generated reads to the corresponding reference sequence and identifying single nucleotide polymorphisms (SNPs). Certain variations in the sequences are expected due to the possibility of mutation accumulation in the genomes of reference strains during cultivation. This could result in a number of SNP differences detected between the reference and the Core lab-generated validation sequences. For this reason, when comparing validation sequencing results against the reference sequence, the within- and between-run triplicate sequences of the validation samples were taken into account. The reference sequence and sequences generated during 5 independent library preparations were compared. A SNP detected between the reference and validation sequences should be considered a sequencing error only when the SNP is detected in fewer than all 5 replicates. If a SNP detected between the reference and validation sequences is identical in all 5 validation replicates, the SNP is not considered a sequencing error and is instead considered a possible mutation in the reference strain. Accuracy here was estimated as percent agreement with the reference sequence. For the purposes of platform accuracy measurement, each nucleotide call was considered an independent test. For this reason, the accuracy of base calling was estimated in relation to the number of base pairs in the reference genome sequence. According to CLIA regulations: Accuracy = # of correct results/total # of results x 100% [20]. For WGS, the "# of correct results" equals the number of base pairs sequenced correctly, and the "total # of results" corresponds to the total number of base pairs in the reference genome. However, as is often the case with resequencing, not the whole genome is covered by reads, and the assembly contains numerous gaps where no sequence was generated. Hence, only reference areas covered by the reads should be taken into account.
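The replicate-support rule described above can be sketched as follows: a SNP is treated as a sequencing error only when it is absent from at least one of the 5 replicates. The function and data structure are illustrative, not part of the validated pipeline:

```python
def classify_snps(snp_support: dict, n_replicates: int = 5):
    """Split SNPs into sequencing errors vs. probable reference-strain mutations.

    snp_support maps each SNP (position, alternate base) to the number of
    validation replicates in which it was observed.
    """
    errors, mutations = [], []
    for snp, support in snp_support.items():
        if support < n_replicates:
            errors.append(snp)       # not reproducible -> sequencing error
        else:
            mutations.append(snp)    # in all replicates -> likely true variant
    return errors, mutations

# Hypothetical calls: two SNPs seen in all 5 replicates, one seen in only 2
support = {(120345, "A"): 5, (987654, "T"): 5, (15002, "G"): 2}
errors, mutations = classify_snps(support)
print(len(errors), len(mutations))  # 1 2
```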
For the Illumina technology, lack of genome coverage may occur due to: 1) secondary structures or AT abundance in the DNA template, leading to failure to generate library fragments representing "difficult" genome regions; 2) multiple repeat sequences, which fail to assemble correctly; 3) insufficient library representation on the sequencing flow cell. Low genome coverage may represent a significant problem for data analysis and may lead to incorrect results. The first two cases are often difficult to overcome; however, they did not appear to be a problem for any of the species/assay combinations used during this validation. The percentage of genome covered was within the range of 86-100% (median, 98.4%). It is for each user to establish whether the achievable coverage across the genome is sufficient for the particular application used in the laboratory. The third issue can be troubleshot for the majority of bacterial species handled in the laboratory, e.g., by increasing the proportion of the sample loaded onto the flow cell.
In addition to absent read coverage due to a failure to sequence certain parts of a genome, other parts of the genome were masked from the between-genome comparison; e.g., mobile elements can skew the hqSNP-based phylogeny and are often excluded from the analysis. Since masking of mobile elements was performed in some cases during the validation presented here, this was also taken into account by calculating only the length of the portion of the genome covered by sequencing, instead of using the number of base pairs in the whole reference genome.
The formula used for the platform accuracy calculation:
Accuracy (%) = (length of the reference genome covered by reads − # of sequencing errors) / (length of the reference genome covered by reads) × 100
See the data interpretation chapter of Appendix 5 for instructions on retrieving the "Total # of SNP differences with the reference" and the "# of sequencing errors (SNP is supported by only 4 or fewer validation replicates)". Retrieve genome size information from the NCBI Genome site; see instructions in Appendix 3.
Several validation samples differed from the reference genome by several SNPs. However, 99% (324 out of 327) of those SNPs were reproducible among all 5 replicates sequenced for each sample. Since sequencing errors are random between different library preparations and it is unlikely that the same erroneous SNP will occur in all 5 replicates, we can conclude that those discrepancies were not caused by sequencing errors but most likely resulted from accumulation of mutations in the reference strains or from previous sequencing mistakes in the reference sequences. This was confirmed by Sanger sequencing (see Appendix 14).
In both cases, whether we take into account all SNPs detected between the validation and reference sequences, or only those SNPs that do not appear in all of the replicates (true sequencing errors), we observed >99.999% agreement of the generated whole genome sequences with the reference sequences for each tested sample:
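Under the definition above, platform accuracy can be computed from the covered reference length and the number of non-reproducible SNPs. A sketch with illustrative values (not actual validation data):

```python
def platform_accuracy(covered_bases: int, sequencing_errors: int) -> float:
    """Percent agreement with the reference over the covered genome portion.

    covered_bases: reference base pairs covered by reads (after any masking,
    e.g., of mobile elements); sequencing_errors: SNPs supported by fewer
    than all validation replicates.
    """
    return 100.0 * (covered_bases - sequencing_errors) / covered_bases

# Illustrative only: a 4.5 Mb covered genome with 3 non-reproducible SNPs
acc = platform_accuracy(4_500_000, 3)
print(f"{acc:.5f}%")  # 99.99993%
```

Even a handful of residual errors over a multi-megabase genome keeps the computed accuracy above 99.999%, consistent with the results reported here.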

Assay accuracy
Assay accuracy was determined as the agreement of the assay result for the validation samples sequenced by the Core lab with the assay result for the reference sequences of the same strains. Four applications of WGS were used to validate the accuracy of the assay: an in silico multilocus sequence typing (MLST) assay, a 16S rRNA gene species identification (ID) assay, an assay for detection of antibiotic resistance (ABR) genes, and a genotyping assay using high quality single nucleotide polymorphisms (hqSNPs). Sequences generated by the Core lab and the corresponding reference sequences were analyzed in the same manner to extract and compare information about the sequence type (ST), species ID, and ABR genes present, or to generate a phylogenetic tree, correspondingly. One has to determine what to consider a single test for each specific assay performed in the laboratory. Assay accuracy could be measured only for those validation samples with reference genomes available from the NCBI database or CDC.
The platform accuracy and assay accuracy are interconnected, but it is important to make a distinction between these parts of the WGS validation. For the assay accuracy, we focus only on the areas of the genome targeted by the assay (the 16S rRNA gene in the case of species identification, or several housekeeping genes in the case of MLST), while for the platform accuracy, the ability to generate a correct base call across the genome is evaluated. High quality SNP genotyping across the genome can be used as the main assay to validate the platform accuracy, since it allows validation of the accuracy of base calling throughout the genome. Even though in all of the WGS-based assays we ultimately evaluate the accuracy of a single nucleotide base call made by the platform, which directly affects the results of the above-mentioned assays, WGS assays may tolerate a certain platform error rate and still yield accurate results, as long as the assay was validated with the given platform. This is especially true when it is possible to reach sufficient depth of genome coverage in the particular area of the genome targeted by the assay. Certain base calling errors of the platform can also be removed using bioinformatics (56-58). According to previously published studies [23,28,29], SNP analysis or assays like whole-genome MLST should be done with at least 30x depth of genome coverage. We determined the optimal depth of coverage to be ≥60x, based on the accuracy of SNP detection at various simulated genome coverages. However, a 15x coverage threshold was sufficient for the other WGS assays (MLST, 16S ID, ABR gene detection), and 15x was determined as the minimum acceptance criterion for raw data to be considered for the mentioned types of analysis. If the presence/absence of certain genes is a key diagnostic feature, the corresponding WGS assay should be added to the validation panel.
Validation of the specific assays in turn allows determination of the threshold for platform base-calling accuracy required to generate accurate and reproducible assay results.

Accuracy of in silico MLST assay
Illumina-generated whole-genome sequencing data were previously shown to be of sufficient quality to allow high rates of allele identification by in silico MLST analysis. In a published study of 80 strains with known MLST sequence types, concordance was 98.4% (551 out of 560 alleles) [30].
MLST is a bacterial genotyping method that involves sequencing 6 to 7 housekeeping genes distributed throughout the bacterial genome. Sequence variants (alleles) of these target genes are used to establish the genetic relatedness of isolates. The combination of known alleles of the target genes (e.g., aroC-dnaN-hemD-hisD-purE-sucA-thrA = 116-7-12-9-5-9-2) allows assignment of a sequence type number, e.g., ST328. There are two options for defining a single test in the case of MLST: 1) consider the final sequence type result as a single test result; or 2) evaluate each allele identification separately and consider each allele call as a separate test. We believe that in complex assays it is reasonable to consider the detection of each of multiple genetic determinants as a separate test, especially when sequence variation in each determinant changes the end result of the assay. To illustrate with MLST: any sequence variation in an MLST allele changes the allele identification number and produces a new allele profile, which in turn changes the sequence type; therefore, each allele identification was counted as a separate test. A correct result for MLST corresponds to the correct identification of each of the MLST alleles in the validation sequence.
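The mapping from allele profile to sequence type can be sketched as a simple lookup. Only the example profile from the text (116-7-12-9-5-9-2 → ST328) is included; a real implementation would query the full organism-specific MLST scheme:

```python
# Hypothetical sketch of allele-profile-to-ST assignment. The single
# ST_TABLE entry is the example from the text; real schemes contain
# thousands of profiles.

ST_TABLE = {
    (116, 7, 12, 9, 5, 9, 2): "ST328",
}

def assign_st(allele_profile):
    """Each allele call is an independent test; changing any single
    allele yields a new profile and therefore a different ST."""
    return ST_TABLE.get(tuple(allele_profile), "unknown/novel ST")

print(assign_st([116, 7, 12, 9, 5, 9, 2]))   # ST328
print(assign_st([116, 7, 12, 9, 5, 9, 3]))   # one allele changed -> not ST328
```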
Fifteen validation sequences and the corresponding reference sequences for the same strains were analyzed by in silico MLST. Only isolates with matching reference sequences available online and an available MLST typing scheme were analyzed. MLST profiles were identified with the CGE tool included in the Core lab analysis pipeline. The protocol for in silico MLST analysis and the results for all samples are in Appendix 6.

Detection and correct identification of each of the MLST alleles in the typing scheme represents an independent test. Accuracy is the percentage of agreement between the alleles detected in the validation samples and those in the reference sequences. For all validation samples, each of the sequences of the 7 housekeeping genes used in the typing scheme (or 6 genes for Aeromonas hydrophila) was identified correctly, resulting in 100% allele identification accuracy.
All results for the validation samples matched the reference results, so no confirmatory testing was required.

Accuracy of 16S rRNA gene identification assay
The 16S rRNA ID assay detects variation in only one gene, so the species ID result as a whole (e.g., "Escherichia coli") was considered a single test. A test result is correct when the identity of the 16S rRNA sequence extracted from the validation sample matches the identity of the 16S rRNA sequence extracted from the reference sequence.
16S rRNA sequences were extracted from annotated NCBI or CDC reference sequences and from annotated sequences of 18

Accuracy of Genotyping assay
The accuracy of the genotyping assay is its ability to correctly determine the genetic relatedness of isolates. The high-quality SNP genotyping validated here is based on mapping the validation sequencing reads to a reference genome, followed by genome-wide SNP calling against that reference; the identified SNPs are used to build a phylogenetic tree and thereby determine the genetic relatedness of the tested isolates. The topology of the phylogenetic tree reflects the genetic distances between isolates: shorter branches indicate more closely related isolates, and longer branches indicate less related isolates. In other words, genetically related isolates cluster closer together than unrelated isolates. To assess the accuracy of the genotyping assay, phylogenetic trees were built using reference sequences and validation sequences, and the resulting trees were compared. Trees were created for 25 samples belonging to 2 species of the Enterobacteriaceae family, 2 species of Gram-positive bacteria, and one species of nonfermenting bacteria. See details of sampling and data analysis in Appendix 1.
For a better comparison, we suggest using at least 5 strains of the same species to build a tree. If the set of microorganisms used for validation does not include 5 representative strains of the same species, genomes available from NCBI can be added to the analysis alongside the reference or validation sequences. For example, if the validation set contains 3 strains of E. coli, one can download 2 more E. coli genomes from NCBI and use them to build two trees: a) a reference tree, containing the sequences of the 3 reference strains from NCBI plus the 2 additional NCBI genomes for comparison; and b) a validation tree, built from the 3 genomic sequences generated for the reference strains during validation plus the same 2 additional NCBI genomes. See an example in the figure below.
The accuracy of the genotyping test was determined using two approaches: 1) topological similarity between the reference tree and the validation tree, measured statistically with the Compare2Trees software, which provides a percentage of topological similarity between two trees; and 2) comparison of the clustering patterns of the validation tree and the reference tree. The definition of a correct result for the genotyping assay is as follows: the clustering pattern obtained from the phylogenetic comparison of samples sequenced by the Core lab must match the pattern generated from the clustering of the reference sequences. In other words, the conclusions about the relatedness of the isolates drawn from the validation tree and the reference tree should be the same. During this validation, phylogenetic trees were generated for 5 microorganisms. All 5 validation trees had matching clustering patterns and 100% topological similarity with the corresponding reference trees. The concluded accuracy of the genotyping test was 100%. Interpretation of phylogenetic analysis results may vary between species because of differences in the clonality of bacterial populations or in mutation rates.
This raises the question of whether phylogenetic analysis needs to be validated for each species, or perhaps even each serotype, individually. This question needs further investigation, but in our opinion, as long as the SNP calling and phylogenetic tree building algorithms are the same for the array of routinely tested species, it should be sufficient to test the phylogenetic pipeline on representative species only. If a different phylogenetic analysis approach is required for certain species that the laboratory routinely works with, it has to be validated separately.
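The clustering-pattern comparison can be illustrated with a simplified sketch (this is not the Compare2Trees algorithm; trees are nested tuples, and sample names are hypothetical). Two trees have the same clustering pattern when the leaf set under every internal node is the same in both:

```python
# Simplified clade-set comparison of two trees given as nested tuples,
# e.g. (("A", "B"), ("C", ("D", "E"))). Leaf order within a clade does
# not matter; only cluster membership does.

def leaves(tree):
    """Set of leaf names under a node."""
    if isinstance(tree, tuple):
        return frozenset().union(*(leaves(t) for t in tree))
    return frozenset([tree])

def clades(tree):
    """All clades (leaf sets of internal nodes) in the tree."""
    if not isinstance(tree, tuple):
        return set()
    result = {leaves(tree)}
    for sub in tree:
        result |= clades(sub)
    return result

def same_clustering(reference, validation):
    return clades(reference) == clades(validation)

ref = (("Ecoli_1", "Ecoli_2"), ("Ecoli_3", ("Ecoli_4", "Ecoli_5")))
val = (("Ecoli_2", "Ecoli_1"), (("Ecoli_5", "Ecoli_4"), "Ecoli_3"))
print(same_clustering(ref, val))  # True: same clusters, different leaf order
```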

Accuracy of antibiotic resistance genes detection assay
To calculate the accuracy of the ABR gene detection assay, we detected the presence or absence of antibiotic resistance genes using the ResFinder database. Two sets of sequences were analyzed: 1) ATCC reference bacterial strains designated for use as antibiotic susceptibility controls, sequenced by the laboratory performing the validation; and 2) reference sequences acquired from the FDA-CDC Antimicrobial Resistance Isolate Bank for in silico testing.

Detection of resistance genes in the ATCC strains using ResFinder
Five ATCC reference strains (designated for use as antibiotic susceptibility controls) were sequenced, and the generated sequences were analyzed with ResFinder. Negative controls were chosen among strains described in the CLSI M100-S24 document as susceptible, with no known antibiotic resistance genes. Positive controls were chosen among strains that, according to CLSI M100-S24, carry known resistance determinants. The ResFinder detection results were compared to the resistance genes known to be present in the ATCC strains. See details of the analysis and the results reports for each sample in Appendix 9.
The comparison of a tested sequence against each entry in the database of ABR genes was considered an independent test. At the time of validation, the ResFinder database contained sequences of 1719 antibiotic resistance genes, against which each validation sample was compared, resulting in a total of 1719 tests per validation sample. A correct result for the ABR gene detection assay was defined as follows: in negative control samples, all 1719 tests must give negative results; in the positive controls chosen here, 1 of the 1719 tests must give a positive result and the rest must remain negative. All of these criteria were met for all validation samples; therefore, the accuracy of the ABR gene detection assay was 100%.
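The test-counting scheme above can be sketched as follows (gene names are placeholders, not results from the validation):

```python
# Sketch: comparison of one sample against each of the 1719 ResFinder
# entries is an independent test; a false positive or false negative
# each costs one correct test.

DB_SIZE = 1719  # ResFinder entries at the time of validation

def correct_tests(detected, expected):
    """Number of correct tests out of DB_SIZE for one sample, given the
    sets of detected and expected resistance gene names."""
    false_pos = len(detected - expected)
    false_neg = len(expected - detected)
    return DB_SIZE - false_pos - false_neg

# Positive control: the single expected gene is detected.
print(correct_tests({"blaTEM-1"}, {"blaTEM-1"}))  # 1719
# Negative control with one spurious hit.
print(correct_tests({"blaTEM-1"}, set()))         # 1718
```

Overall accuracy is then the sum of correct tests divided by DB_SIZE times the number of samples, matching the footnote below.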
Footnote: The total number of tests in this case corresponds to the number of database entries multiplied by the number of samples, because the sequencing data of each sample were compared against each entry in the database.
All results for the validation samples matched the reference results, so no confirmatory testing was required.

In silico detection of resistance genes in sequences from the FDA-CDC AR Isolate Bank using ResFinder
Reference sequences were acquired from the FDA-CDC Antimicrobial Resistance Isolate Bank (https://www.cdc.gov/drugresistance/resistance-bank/) for in silico testing. The fastq files of the corresponding isolates were downloaded from the NCBI Short Read Archive (SRA) and subjected to quality trimming, de novo assembly, and ResFinder analysis following the standard procedure described earlier. Thirteen isolates (seven Gram-negative and six Gram-positive) carrying various resistance genes were analyzed (see the results of the comparison for each sample in Appendix 15).
The analyzed isolates harbored a total of 83 resistance genes (representing 57 different alleles), which had been detected by the CDC both by PCR-based methods (for the main resistance types) and by ResFinder.
Two types of comparison were performed, for Gram-negative and for Gram-positive isolates. In the case of Gram-negative bacteria, all discrepancies were caused by additional genes detected during validation compared with the CDC ResFinder results ("false-positive" results). Upon further investigation, several ResFinder database updates had been carried out between the two analysis dates, which explains the discrepancies.
In the case of Gram-positive bacteria, several discrepancies were likewise caused by additional genes detected by us, but 2 genes present in the CDC results were missing from our results ("false-negative" results). Both false-negative genes were in fact detected by us but had <99% identity or an incomplete sequence and were therefore excluded from the final result.

Accuracy of bioinformatics pipeline
As suggested previously, in addition to the data generated by in-house sequencing, simulated data were used for optimization of variant and genotype calling [31]. The accuracy of the bioinformatics pipeline used for hqSNP genotyping was assessed by performing phylogenetic analysis on raw WGS reads of bacterial isolates from well-characterized outbreaks and comparing the validation results to previously published phylogenetic results. Two studies presenting phylogenetic analyses of outbreaks, one caused by a Gram-positive pathogen [12] and one by a Gram-negative pathogen [8] (at least 6 isolates per study), were used for validation of the bioinformatics pipeline. Raw sequences of representative outbreak isolates were available from public databases. Isolates from both studies were of human origin. The relatedness of the isolates to the outbreaks had been established in the original studies by whole-genome sequencing and confirmed by epidemiological data. Epidemiologically unrelated isolates presented in the studies were also analyzed for comparison. Bioinformatics pipeline accuracy criterion: the clustering suggested by the previous investigators must match the clustering obtained with the Core lab validation bioinformatics pipeline.

Clustering reproduced for Study 1
Below is the figure of the phylogenetic tree of outbreak isolates published in Study 1 by Harris et al. [PMID: 23158674] ("Study 1 tree"). The isolates from the study that were picked for validation are indicated by arrows and were assigned numbers 1 through 7 for the purposes of validation. The tree to the right of the Study 1 tree is a copy of the phylogenetic connections between the chosen isolates from the original study tree. The phylogenetic tree generated using the Core lab bioinformatics pipeline is called "Validation tree 1" and is presented below. The same isolates are marked with the same numbers in the original tree and in the validation tree.

Study 1 tree (Harris et al., PMID: 23158674):
Validation tree 1: The topology of the trees may differ; however, the clustering of the validation tree completely replicates the clustering of the Study 1 tree. For example, isolates 4 and 5 were identical and clustered together according to Study 1, and the same result is shown in the validation tree, with isolates 4 and 5 sharing the same node. All conclusions regarding the genetic relatedness of the isolates that can be drawn from the Study 1 tree can also be made from the analysis of validation tree 1.
The group of related isolates from Study 1 was also compared with the epidemiologically unrelated isolates suggested by the same study (no tree is available from the publication by Harris et al.). Phylogenetic analysis using the Core lab bioinformatics pipeline showed that the epidemiologically unrelated isolates did not cluster with the group of outbreak isolates and appeared genetically distant. Thus, the phylogenetic tree produced by the Core lab bioinformatics pipeline showed complete concordance with the epidemiological data.

Clustering reproduced for Study 2
The isolates from Study 2 that were picked for validation are marked with green node circles and were assigned numbers 1 through 11 for the purposes of validation. The tree to the right of the Study 2 tree is a copy of the phylogenetic connections between the chosen isolates from the original study tree in the publication by Leekitcharoenphon et al. Nine of the isolates selected for validation represent 4 independent outbreaks, and two isolates are epidemiologically unrelated controls:
Outbreak A: isolates 1, 2
Outbreak B: isolates 3, 4, 5
Outbreak C: isolates 6, 7
Outbreak D: isolates 9, 10
Isolate 8: epidemiologically unrelated
Isolate 11: epidemiologically unrelated
The phylogenetic tree generated using the Core lab bioinformatics pipeline is called "Validation tree 2". The same isolates are marked with the same numbers in the tree from Study 2 and in the validation tree.
The clustering of the validation tree is identical to the clustering of the Study 2 tree. For example, isolates 6 and 7 were part of the same outbreak, while isolate 8 is an epidemiologically unrelated control used in the study. In accordance with the epidemiological data and the Study 2 tree, the validation tree showed that isolates 6 and 7 cluster together, but not with isolate 8. All conclusions regarding the genetic relatedness of the isolates that can be drawn from the Study 2 tree by Leekitcharoenphon et al. can also be made from the analysis of validation tree 2.
In summary, based on the analysis of the data from both studies, the accuracy of the pipeline for phylogenetic analysis was 100%.

Inter- and Intra-assay Agreement
Inter- and intra-assay precision are also referred to as between-run and within-run precision, or reproducibility and repeatability, respectively.
Repeatability (precision within run) is established by sequencing the same samples multiple times under the same conditions and evaluating the concordance of the assay results and performance.
Reproducibility (precision between runs) is assessed as the consistency of the assay results and performance characteristics for the same sample sequenced under different conditions, such as between different runs and different sample preparations.
Thirty-four validation samples (as above) were each sequenced 3 times in the same run (for repeatability) and 3 times in different runs (for reproducibility). For the between-run reproducibility assessment, all replicates were processed on different days, alternating two operators, and were generated starting from fresh culture (exception: replicates for Mycobacterium tuberculosis samples were generated starting from DNA), as suggested by the CLSI MM11A document [32]. For the within-run replicates, one DNA extract was used, but independent library preparations were performed, with the final samples included in a single sequencing run. For each sample: number of intra-assay replicates = 3; number of inter-assay replicates = 3; total number of repeated results = 5, as explained in the figure below (red: between-run replicates; blue: within-run replicates).
Repeatability and reproducibility were assessed for 3 assays (genotyping, MLST, and 16S rRNA gene ID). The precision of the quality metrics among all replicates was evaluated as well. Concordance of performance was evaluated based on the following quality control metrics: depth of coverage, uniformity of coverage, and accuracy of base calling (Q score). All quality parameters remained relatively constant within and between runs. See all quality metrics of the sequences generated for all replicates in Appendix 8. The ranges established for the corresponding data metrics for all replicates are shown below; for each metric, two graphs were plotted.

(Figure: cluster density for the run, K/mm2, plotted for within-run and between-run replicates.)

Repeatability and Reproducibility of Genotyping assay
Repeatability and reproducibility of the genotyping assay can be assessed as the intra- and inter-assay precision of single nucleotide variant detection. All validation replicates were mapped against the same reference sequence in order to identify SNPs differing between the replicates.
Two methods of evaluating precision were used: 1) Evaluation of absolute inter-and intra-assay precision per replicate 2) Evaluation of precision relative to the genome size.
The first method considers the whole genome of one replicate as a single test, meaning that any number of single nucleotide changes in 1 of the 3 replicates is regarded as a 33.3% disagreement for that validation sample.
The second method of precision estimation is preferable for WGS validation. Each nucleotide call in WGS should be considered an independent test. For that reason, the precision of variant calling should be estimated in relation to the genome size of the sample: a single SNP in one of the replicates is counted as a fraction of the number of base pairs called in the genome sequence of the validation sample.
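The per-base-pair calculation can be sketched as follows (the 5 Mbp genome size is illustrative, not a value from the validation):

```python
# Sketch of the per-base-pair precision calculation: a discordant SNP
# between replicates is expressed relative to the number of base pairs
# called in the genome.

def precision_per_bp(discordant_snps, called_bases):
    """Percent of base calls that agree between replicates."""
    return (1 - discordant_snps / called_bases) * 100

print(f"{precision_per_bp(1, 5_000_000):.5f}%")  # one SNP in ~5 Mbp: 99.99998%
print(precision_per_bp(0, 5_000_000))            # no discordant SNPs: 100.0
```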

Precision per replicate:
One of the 3 within-run replicates of isolate C50 Pseudomonas aeruginosa ATCC 27853 had a 1-SNP difference from the other within-run replicates. All validation samples except C50 yielded identical whole-genome sequences for all 3 within-run replicates. The intra-assay precision per replicate was 99.02%.
Three validation samples each had one between-run replicate differing from the other between-run replicates. Sample C47 Staphylococcus epidermidis ATCC 12228 had one between-run replicate with a 2-SNP difference from the other replicates. Samples C49 Streptococcus pneumoniae ATCC 6305 and C55 Escherichia coli ATCC 25922 each had one between-run replicate differing from the other replicate sequences by 1 SNP. The inter-assay precision per replicate was 97.05%.
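The per-replicate arithmetic behind these figures can be reproduced directly: with 34 samples, a discordant replicate contributes a 1/3 disagreement to its sample.

```python
# Per-replicate precision as described in the text: any number of SNP
# changes in 1 of 3 replicates counts as a 33.3% disagreement for that
# validation sample, averaged over 34 samples.

N_SAMPLES = 34

def precision_per_replicate(discordant_replicates):
    """Percent agreement across all samples."""
    return (1 - (discordant_replicates / 3) / N_SAMPLES) * 100

print(round(precision_per_replicate(1), 2))  # within-run: 99.02
print(round(precision_per_replicate(3), 2))  # between-run: 97.06 (the text rounds to 97.05)
```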

Precision per base pair (calculated for each sample in the table below):
The precision per base pair was calculated for each sample, and then the average value was computed (see below). The estimated precision per base pair (relative to the covered genome size) was >99.99999% for both within-run and between-run replicates.

Repeatability and Reproducibility for in silico MLST assay
The repeatability and reproducibility of nucleotide base calling for the in silico MLST assay were assessed. For this, the sequences of the replicates of 21 samples were de novo assembled and used for MLST typing with the CGE MLST online tool (as described above). A single change in the nucleotide sequence of any of the housekeeping genes used in the MLST scheme leads to a change in the allele number and a corresponding change in sequence type (ST). Within- and between-run replicates must have reproducible sequences of all housekeeping genes, resulting in a reproducible ST assignment.
Detection and correct identification of each of the MLST alleles in the typing scheme represents an independent test. For each replicate, 7 allele identification tests were performed (except for sample C2, which had 6 tests, and C53, which had 8 tests, owing to differences in typing schemes). The total number of alleles analyzed for either within- or between-run replicates was 441. Repeatability here reflects the precision of MLST allele detection for within-run replicates; reproducibility reflects the precision of MLST allele detection for between-run replicates. Every allele in every validation sample was identified consistently among within- and between-run replicates. The within- and between-run precisions of allele detection were both 100%.

Repeatability and Reproducibility in 16S rRNA gene ID assay
The repeatability and reproducibility of the 16S rRNA gene identification assay were assessed. 16S rRNA gene sequences were extracted from the genomes of the validation replicates of 34 samples and identified using the RDP database as mentioned above. Repeatability here reflects the precision of 16S rRNA gene identification for within-run replicates; reproducibility reflects the precision of 16S rRNA gene identification for between-run replicates. See the 16S rRNA ID results for all replicates in Appendix 11.
Within- and between-run replicates had reproducible 16S rRNA gene sequences and resulted in reproducible species identifications. The within- and between-run precisions of species identification were both 100%.

Approach
The analytical sensitivity (limit of detection, LOD) of SNP calling was estimated by modeling different mapping coverages and determining the minimum coverage that allows accurate SNP calling (LODSNP). Mapped bam files were downsampled to achieve different coverage values (60x, 50x, 40x, 30x, 20x, 15x, 10x, 5x) for each of 9 samples representing different species. The original mapping coverage was estimated from the bam file, and the bam files were then downsampled to the desired coverage using a fraction F, where F is the desired coverage divided by the original coverage.
The number of SNPs detected between the reference sequence and the downsampled bam file was compared to the number of SNPs detected between the reference and the sample at its original coverage.
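The downsampling logic can be illustrated in pure Python (the validation itself downsampled bam files with command-line tools; read names here are placeholders):

```python
# Illustration of coverage-proportional downsampling: F is the desired
# coverage divided by the original coverage, and each read is kept
# independently with probability F.

import random

def downsample_fraction(original_cov, target_cov):
    """F = desired coverage / original coverage."""
    return target_cov / original_cov

def subsample_reads(reads, fraction, seed=42):
    """Keep each read independently with probability `fraction`."""
    rng = random.Random(seed)
    return [r for r in reads if rng.random() < fraction]

reads = [f"read_{i}" for i in range(100_000)]
F = downsample_fraction(original_cov=120, target_cov=60)
kept = subsample_reads(reads, F)
print(F)                                 # 0.5
print(round(len(kept) / len(reads), 2))  # ~0.5
```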

Sample C3_2b
For the rest of the samples, only the results for the lowest "accurate" coverage and the highest "inaccurate" coverage are shown in Appendix 16.
Below is a summary of SNP detection at different coverages. The LODSNP was established at 60x, as this was the lowest coverage that yielded accurate SNP detection in all samples.

Approach
To determine the analytical specificity, we generated sequencing files in silico containing a mixture of reads from 2 different samples, thus mimicking contamination. The effect of potentially interfering sequencing reads on mapping metrics (% of reads mapped/not mapped, % of reference covered, etc.) and on SNP detection was estimated. Sample C3 Escherichia coli ATCC 8739 was selected as the uncontaminated base sample, and equal parts of reads from different species were merged with it to generate mixed fastq files. "Contaminated" and "not-contaminated" reads were mapped to the same reference genome of E. coli ATCC 8739 from NCBI. Subsequent SNP calling analysis was performed using the standard pipeline under validation.
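The in silico contamination step can be sketched as follows (a minimal illustration; read contents are placeholders, and inputs are lists of fastq lines rather than files):

```python
# Sketch of in silico contamination: merge equal parts of reads from two
# fastq read sets into one "contaminated" set. Fastq records are 4 lines
# each (header, sequence, separator, qualities).

def fastq_records(lines):
    """Group fastq lines into 4-line records."""
    return [lines[i:i + 4] for i in range(0, len(lines), 4)]

def mix_reads(base_lines, contaminant_lines):
    """Return base reads plus an equal number of contaminant reads."""
    base = fastq_records(base_lines)
    cont = fastq_records(contaminant_lines)[:len(base)]
    return [line for record in base + cont for line in record]

base = ["@r1", "ACGT", "+", "IIII", "@r2", "TTTT", "+", "IIII"]
cont = ["@s1", "GGGG", "+", "IIII", "@s2", "CCCC", "+", "IIII",
        "@s3", "AAAA", "+", "IIII"]
mixed = mix_reads(base, cont)
print(len(mixed) // 4)  # 4 records: 2 base + 2 contaminant
```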

Mapping metrics
The following mapping quality parameters were estimated for the original not-contaminated Escherichia coli sample C3 and for different combinations of E. coli C3 with contaminating samples. As expected, contaminating reads led to a decrease in the percentage of mapped reads and an increase in the proportion of unmapped reads. The percentage of reads in pairs also decreased in samples containing contaminating reads. These parameters should be monitored in future routine samples to detect potential contamination.

Specificity of SNP calling
The specificity of SNP calling in the samples containing potentially interfering reads was estimated by comparing a) the number of SNPs detected between the "contaminated" sequence and the reference with b) the number of SNPs detected between the "contaminated" and "not-contaminated" sequences. The difference between the original not-contaminated C3 E. coli sequences (all 5 replicates) and the reference NCBI genome was 22 SNPs. C1 E. coli and C57 M. tuberculosis contaminating reads did not cause any change in the called SNPs. Contamination with any of the other reads led to additional SNPs being called, both between the compared samples and against the reference. In the sample contaminated with C75 S. enterica, in addition to the nonspecific SNPs, one of the previously detected SNPs was missed. The bioinformatics pipeline thus had a certain tolerance to contaminating reads, depending on the nature of the contamination. Monitoring mapping metrics will be the best way to track contamination.

Diagnostic sensitivity and Diagnostic specificity
Diagnostic sensitivity: the likelihood that an assay will detect a sequence variation when it is present within the analyzed genomic region (this value reflects the false negative rate of the assay) [21].
Diagnostic specificity: the probability that an NGS assay will not detect sequence variation(s) when none are present within the analyzed genomic region (this value reflects the false positive rate of the assay) [21].
Diagnostic sensitivity is also referred to as the "limit of detection", which for molecular tests normally designates the lowest amount of nucleic acid that can be detected by the assay. The limit of detection in this sense is not applicable to WGS applications, which use pure bacterial culture as the starting material, since the amount of DNA that goes into the reaction is strictly standardized and the DNA concentration of each sample is measured fluorometrically before each assay. The starting DNA input for all validation samples in this protocol was 1 ng, at an input DNA extract concentration of 0.2 ng/µl. Samples with a concentration of less than 1 ng/µl should not be processed for library preparation.
Diagnostic sensitivity and specificity of WGS were estimated for two assays: MLST and genotyping.

Diagnostic sensitivity and specificity of in silico MLST assay
As described above, the sequence types of the validation sequences and their reference sequences were determined using organism-specific MLST databases. The identities of all alleles defining the sequence type (ST) were confirmed.
For MLST, the number of true positive results corresponds to the number of alleles correctly identified in the validation samples. For the true negative results, we compared validation sequences against MLST databases for non-matching species, e.g., searching for alleles of the C1 Escherichia coli validation sample against the MLST database for Salmonella enterica. In the latter case, the MLST assay should not identify any alleles, and certainly should not assign a sequence type; otherwise, it would be counted as a false positive result. None of the negative controls was assigned to any ST, and none of the alleles in the negative controls was identified.

Diagnostic sensitivity and specificity of Genotyping assay
To estimate the diagnostic sensitivity and specificity of WGS-based genotyping, the hqSNP differences between the strains were established to build a phylogenetic tree. Trees generated from the validation sequences were compared to trees generated from the reference sequences for the same strains. See Appendix 1 for the results. Genotyping using WGS-derived hqSNPs was performed as described above in the chapter "Accuracy of Genotyping assay".
Diagnostic sensitivity in the genotyping assay. Diagnostic sensitivity for the genotyping test is the likelihood that all SNPs differing between the isolates will be detected. In WGS-based genotyping, each sequence variation (SNP) that is not detected represents a false negative result. Missed sequence variations manifest as a decrease in the genetic distance between tested strains and make samples appear closer to each other than in the clustering of the same samples in the reference tree. See the figure below. Explanation: effect of a false negative SNP on genotyping results. Example: samples A, B, and C are identical to each other, and samples D and E are identical to each other, while the two groups differ by a certain number of SNPs (e.g., 1 SNP, for simplicity) and form two clusters (red and green). If sample D has false negative SNPs, meaning that the SNPs that distinguished it from the "red cluster" A-B-C were missed, the distance between sample D and the A-B-C cluster would be artificially reduced. In an outbreak situation, a false negative SNP call may lead to the erroneous inclusion of a sample in the outbreak cluster.
For the genotyping test performed here, a correct result is one in which the clustering of whole-genome sequences generated by the Core lab results in a phylogenetic tree analogous to the one generated from the reference sequences available for these samples. Samples that clustered in separate groups must replicate their cluster division and genetic distances, indicating that all SNPs were detected and none were missed.
For example, Saur_sample4 in the figure below is Staphylococcus aureus ATCC 25923. In the tree on the left, Saur_sample4 is represented by a sequence generated in the Core laboratory as part of the validation; in the tree on the right, it is represented by the reference sequence from NCBI. The other samples in both trees are NCBI sequences. False negative SNP calls in the Saur_sample4 sequence generated by the Core lab would lead to under-detection of SNPs in the validation sample. This in turn would reduce the number of SNPs differing between Saur_sample4 and Saur_sample5 or other samples. A reduction in the SNP difference would make Saur_sample4 in the Core lab tree appear closer to the other isolates than in the NCBI reference tree. The comparison below shows that the clustering of Saur_sample4 relative to the other samples is identical in both trees, suggesting that no false negative SNP calls affected the genotyping results. Moreover, in the case of false negative SNP calls, the topological score of the validation tree would change. According to the validation-versus-reference tree comparison using the Compare2Trees software tool, all validation trees matched the reference trees at 100%, indicating the absence of false negative results in the genotyping test.
Diagnostic specificity in the genotyping assay. Diagnostic specificity for the genotyping test is the likelihood that no variation (SNP) between the isolates will be detected when none is present. In WGS-based genotyping, changes introduced into the DNA sequence during library preparation or sequencing, or data analysis errors, can result in false positive sequence variations.
False positive SNP calls lead to an increase in the genetic distance between validation strains compared with the distance between the reference genomes. Consequently, samples would appear further apart from each other than in the reference tree:

Explanation:
Effect of a false positive SNP on genotyping results. Example: identical samples A, B, and C form the red cluster, and another group of identical isolates, D and E, forms the green cluster. If sample D has false positive SNPs, it will appear more distant from sample E and the other samples than in the reference tree. In an outbreak situation, a false positive SNP call may lead to the incorrect exclusion of a sample from the outbreak cluster. For the genotyping test, the clustering of sequences generated during validation should repeat the clustering of the reference sequences. In other words, samples that cluster as a single group according to the reference sequences must fall into one group after clustering of the validation sequences. If one or more samples appear to be excluded from the group upon resequencing, this indicates the presence of false positive SNPs. For example, Smalt_sample4 in the figure below is Stenotrophomonas maltophilia ATCC 13637. In the tree on the left, Smalt_sample4 is represented by a sequence generated in the Core laboratory as part of the validation; in the tree on the right, it is represented by the reference sequence from NCBI. The other samples in both trees are NCBI sequences. False positive SNP calls in the Smalt_sample4 sequence generated by the Core lab would likely increase the number of SNPs differing between Smalt_sample4 and Smalt_sample5. A false positive SNP would make Smalt_sample4 in the Core lab tree appear further away from Smalt_sample5 than in the NCBI reference tree, or Smalt_sample4 would not cluster with Smalt_sample5 at all.
From the comparison below, we can see that the clustering of Saur_sample4 with the other samples is identical in both trees; in particular, Saur_sample4 remains clustered with Saur_sample5. Thus, no false-positive SNP calls affected the genotyping results. False-positive SNP calls would have changed the topological score of the validation tree. In the validation-versus-reference tree comparisons performed with the Compare2Trees software tool, all validation trees matched their reference trees with a topological score of 100%, indicating the absence of false-positive results in the genotyping test. All generated validation trees reproduced the reference clustering, with 100% topological similarity to the corresponding reference trees. Both the diagnostic sensitivity and the diagnostic specificity of the hqSNP-based genotyping assay were 100%.
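The clade-level agreement check described above can be sketched in code. The following is a simplified stand-in for the Compare2Trees topological score, not the tool's actual algorithm: trees are encoded as nested tuples of hypothetical leaf names, and similarity is the fraction of reference clades recovered in the validation tree.

```python
def clades(tree):
    """Collect every internal clade (set of leaf labels) of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):          # leaf
            return frozenset([node])
        group = frozenset().union(*(walk(child) for child in node))
        found.add(group)
        return group

    walk(tree)
    return found

def topological_similarity(validation_tree, reference_tree):
    """Percentage of reference clades that reappear in the validation tree."""
    ref = clades(reference_tree)
    val = clades(validation_tree)
    return 100.0 * len(ref & val) / len(ref)

# Identical topologies: validation resequencing reproduced the reference clustering.
reference = ((("A", "B"), "C"), ("D", "E"))
validation = ((("A", "B"), "C"), ("D", "E"))
print(topological_similarity(validation, reference))  # 100.0

# Sample D pulled out of the D-E cluster, as false-positive SNPs would do.
misplaced = ((("A", "B"), "C"), "D", "E")
print(topological_similarity(misplaced, reference))   # 75.0
```

A score below 100% flags a sample whose placement changed between the reference and validation trees, which is the signature of false-positive SNP calls described above.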

Reportable range for WGS
CLIA defines the reportable range as "the span of test result values over which the laboratory can establish or verify the accuracy of the instrument or test system measurement response". For NGS used in the diagnosis of human genetic disease, the reportable range has previously been defined as the portion of the genome for which sequence information can be reliably derived for a defined test system [21]. The validation presented here evaluated the following portions of the genome, which can be included in the reportable range:
• Genome-wide hqSNPs
• Housekeeping genes used in MLST schemes
• The 16S rRNA gene
• Antibiotic resistance genes included in the ResFinder database
A reference range is not applicable to WGS, since the method is not quantitative.

Conclusion: The whole-genome sequencing assay was validated by the Core laboratory. The following applications were chosen to evaluate the ability of the WGS assay to perform accurate, reproducible, specific, and sensitive base calling: hqSNP-based genotyping, in silico MLST, 16S rRNA identification, and antibiotic resistance gene detection. The LOD was established at 60× coverage. Contaminating reads may reduce the analytical specificity of SNP calling; therefore, mapping quality metrics must be monitored. The WGS assay was shown to have >99.9% accuracy, >99.9% reproducibility/repeatability, and 100% specificity and sensitivity, which meets CLIA requirements for laboratory-developed tests. The WGS method can be implemented by the MDL for the tested applications.
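As an illustration of how the 60× LOD could be enforced downstream, here is a minimal sketch that restricts SNP calling to positions meeting the validated coverage threshold. The per-position depths are invented example values, not data from the validation.

```python
LOD_COVERAGE = 60  # validated limit of detection for SNP calling, in x coverage

def reportable_positions(depths, lod=LOD_COVERAGE):
    """Return 0-based positions whose read depth meets the validated LOD.

    `depths` is a per-position coverage list, e.g. parsed from the output
    of a depth-reporting tool; positions below the LOD are excluded from
    SNP calling and from the reportable range.
    """
    return [pos for pos, depth in enumerate(depths) if depth >= lod]

depths = [72, 65, 58, 80, 61, 15]  # hypothetical coverage at six positions
print(reportable_positions(depths))  # [0, 1, 3, 4]
```

Positions 2 and 5 fall below 60× and would be masked rather than reported, consistent with the established LOD.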

Status: FINAL
Discrepancies in SNPs detected during the validation between the generated sequences and the NCBI genomes were selectively confirmed by Sanger sequencing. No discrepancies were detected for the 16S rRNA identification, MLST, or resistance gene detection assays; therefore, confirmatory testing was not required.

Summary of quality assurance (QA) and quality control (QC) measures developed during validation
The quality assurance (QA) and quality control (QC) measures were developed as a result of the validation to ensure the high quality and consistency of subsequent routine testing on the Illumina MiSeq platform. QC must be performed during the pre-analytical (DNA isolation, library preparation), analytical (quality metrics of the sequencing run), and post-analytical (data analysis) steps of WGS. At the data-analysis stage, QC includes three steps: raw-read QC, mapping-quality QC (and/or de novo assembly QC), and variant-calling QC.
The laboratory should use the WGS validation to establish thresholds for the quality parameters, which can then be used during routine testing to filter out poor-quality samples and data and thereby minimize the chance of false results.
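A QC gate of the kind described above might be sketched as follows. The metric names and threshold values here are hypothetical placeholders, not the laboratory's validated cutoffs; each laboratory must derive its own thresholds from its validation data.

```python
# Illustrative QC thresholds -- placeholder values, not validated cutoffs.
QC_THRESHOLDS = {
    "mean_coverage": 60.0,      # x coverage; matches the validated LOD
    "pct_q30_bases": 80.0,      # % of bases with quality score >= Q30
    "pct_reads_mapped": 90.0,   # % of reads mapped to the reference
}

def qc_pass(metrics, thresholds=QC_THRESHOLDS):
    """Return (passed, failures) for one sample's run metrics.

    `metrics` maps metric name -> observed value; any metric below its
    minimum (or missing entirely) is recorded as a failure.
    """
    failures = [name for name, minimum in thresholds.items()
                if metrics.get(name, 0.0) < minimum]
    return (not failures, failures)

ok, failed = qc_pass({"mean_coverage": 75.3, "pct_q30_bases": 91.2,
                      "pct_reads_mapped": 96.8})
print(ok, failed)  # True []

ok, failed = qc_pass({"mean_coverage": 42.0, "pct_q30_bases": 91.2,
                      "pct_reads_mapped": 96.8})
print(ok, failed)  # False ['mean_coverage']
```

Samples failing the gate are held back from analysis and reporting, which is exactly the filtering role the validated thresholds are meant to play.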
We suggest using spiked-in positive and negative controls for routine testing, as well as more comprehensive monthly positive and negative controls. Since traditional CLIA rules require positive and negative controls to pass through all pre-analytical steps, including DNA isolation, a laboratory may choose to follow this guidance and perform DNA isolation and sequencing of a positive and a negative control in each run or, alternatively, implement an Individualized Quality Control Plan (IQCP) [as per 42 CFR 493.1250] and use a more economical spiked-in control instead. The type and complexity of the positive and negative controls should be determined by each laboratory individually, based on the specifics of its workflow (the most probable sources of contamination) and the types of microorganisms and assays most commonly used.
Passing quality thresholds for the controls, particularly the acceptable level of contamination in negative controls (which is unavoidable), must also be established. Our analytical specificity analysis showed that the SNP-calling bioinformatics pipeline can tolerate a certain degree of contamination and still produce accurate assay results. Per CLIA, failure of either the positive or the negative control to meet the minimum quality thresholds requires rejection and repetition of the entire sequencing run. Laboratories processing large volumes of samples, especially when libraries are prepared in 96-well plate format, should consider including a mix-up control to verify the correct positioning of a known sample.
Here are the highlights of the proposed QA&QC procedures:
1. Per-run controls:
i. To control for contamination during library preparation and sequencing, an index combination that does not correspond to any sample in the current sequencing run can be added at the index demultiplexing step. The index combination for the negative control should correspond to one of the index combinations used in the previous sequencing run; this way, it will capture carry-over contamination with library fragments generated in the previous run. If the negative control does not meet the established quality parameters, this indicates possible carry-over contamination with library fragments from the previous run.
ii. In the case of genotyping, an epidemiologically unrelated strain of the same species as the pathogen causing the potential outbreak should be included in the analysis as a negative control. The epidemiologically unrelated negative control should not cluster with the tested samples on the phylogenetic tree.
2. Monthly QC should be performed using the following positive and negative controls, which must be carried through the entire WGS process, from the DNA extraction step through data analysis:
a. Select a well-characterized strain, e.g., Escherichia coli ATCC 25922, as the monthly positive control. If different DNA extraction protocols are used, it is recommended to include additional positive controls for those procedures (e.g., Listeria monocytogenes for the Gram-positive DNA extraction protocol and Mycobacterium tuberculosis for the TB-specific DNA extraction protocol). The additional DNA extraction controls do not have to be sequenced if the DNA quality is assessed. Monitor all template DNA and library quantity/quality parameters and the sequencing quality parameters applied to the routinely tested samples, and ensure that the sequencing analysis results match the true values. In the case of E. coli ATCC 25922:
i. The MLST allelic profile acquired during the run must correspond to the known ST73 profile described for this strain.
ii. The 16S rRNA sequence must be identified as Escherichia coli.
iii. No antibiotic resistance genes should be found by ResFinder analysis.
3. Contamination-control and housekeeping measures:
a. It is good practice to check low-coverage (<15×) contigs that are ≥200 nt for contamination by BLAST search, looking for DNA fragments belonging to other bacterial species.
b. Irradiate the biosafety cabinets/hoods used for DNA and library preparation with UV for 20-30 min after each use.
c. Perform weekly cleaning of the pre- and post-PCR biosafety cabinets/laminar flow cabinets, including their contents, and mop the laboratory floors using a 10% bleach solution. Clean all laboratory working areas with 10% bleach monthly.
d. Perform a line wash on the MiSeq instrument with 0.01% sodium hypochlorite at least monthly.
e. Follow good pipetting and sample-handling practices to prevent carry-over.
f. Separate the pre-PCR and post-PCR areas, and ensure unidirectional sample flow.
g. Rotate index combinations between runs according to a schedule.
4. Establish semiannual proficiency testing (PT) for the laboratory and annual competency testing for laboratory staff (semiannual during the first year after training):
a. It is recommended that the testing set (selected from previously characterized isolates) contain at least four isolates of the same species for phylogenetic analysis.
b. Both Gram-negative and Gram-positive isolates should be included in the PT or rotated periodically.
c. Sample IDs should be blinded by the supervisor or by personnel from outside the laboratory prior to testing, except for the species names (to allow the staff to choose the appropriate DNA isolation method).
d. One operator should start from DNA isolation and proceed through all steps of library preparation, sequencing, and data analysis (unless the laboratory workflow is routinely divided among different staff members, in which case each person should be evaluated on the procedures he or she performs).
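The carry-over check for the negative-control index combination described above could be automated along these lines. The index combinations and the 0.05% read-fraction threshold are illustrative assumptions only; the acceptable contamination level must be established by each laboratory during validation.

```python
def negative_control_ok(read_counts, control_index, max_fraction=0.0005):
    """Flag possible carry-over contamination from the previous run.

    The index combination assigned to the negative control (unused in the
    current run, matching one used in the previous run) should capture
    almost no reads after demultiplexing.

    `read_counts` maps index combination -> demultiplexed read count;
    `max_fraction` is a hypothetical placeholder threshold (0.05% here).
    """
    total = sum(read_counts.values())
    control_reads = read_counts.get(control_index, 0)
    return control_reads / total <= max_fraction

# Hypothetical demultiplexing summary for one run.
counts = {"ATCACG+TTAGGC": 1_200_000,
          "CGATGT+GATCAG": 1_150_000,
          "NEG:ACAGTG+CTTGTA": 150}
print(negative_control_ok(counts, "NEG:ACAGTG+CTTGTA"))  # True: only trace reads

contaminated = dict(counts, **{"NEG:ACAGTG+CTTGTA": 5_000})
print(negative_control_ok(contaminated, "NEG:ACAGTG+CTTGTA"))  # False: investigate carry-over
```

A failing check would trigger the investigation and run rejection described in the QA&QC procedures above.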

Results reporting
The following statement will accompany all WGS reports: The results represent ___average depth of coverage X (min coverage X - max coverage X)___ coverage of the bacterial genomes, which allows identification of genes and intergenic regions in over ___% of the ___species___ genome. Any ambiguities in the sequencing results should be resolved by Sanger sequencing.

High quality (hq) Single Nucleotide Polymorphisms (SNPs) Genotyping
The phylogenetic tree was generated by identifying hqSNPs in each isolate and performing a genome-wide comparison of hqSNPs between test isolates (___tree-building algorithm used___). Individual ___species___ isolates are represented as nodes on the phylogenetic tree. A group of isolates closely related to each other (connected by shorter branches) forms a clade in the phylogenetic tree, suggesting a possible epidemiological link between the isolates. The point where the branches split, called an internal node, designates the most recent common ancestor and potentially indicates a common transmission event. The phylogenetic tree is supplemented with a distance matrix table showing a pairwise comparison of all isolates. The value at each intersection shows the number of SNP differences between two isolates, with smaller differences indicating closer relatedness.
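The distance matrix that accompanies the tree is a table of pairwise SNP counts. A minimal sketch, using invented allele strings at a handful of shared hqSNP positions rather than real validation data:

```python
def snp_distance(profile_a, profile_b):
    """Number of shared core positions where two isolates' alleles differ."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)

def distance_matrix(profiles):
    """Pairwise SNP distance table from per-isolate allele strings."""
    names = sorted(profiles)
    return {(x, y): snp_distance(profiles[x], profiles[y])
            for x in names for y in names}

# Hypothetical alleles at five shared hqSNP positions.
profiles = {"isolate_A": "ACGTA",
            "isolate_B": "ACGTA",
            "isolate_C": "ACTTG"}
dm = distance_matrix(profiles)
print(dm[("isolate_A", "isolate_B")])  # 0 -> indistinguishable at these sites
print(dm[("isolate_A", "isolate_C")])  # 2 -> more distant
```

Smaller values in the matrix correspond to the shorter branches and tighter clades described above.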

In silico Multilocus Sequence Typing (MLST)
In silico MLST results show the genetic relatedness of the isolates based on sequence variations (alleles) in target housekeeping genes (MLST scheme for ___species/genus___ via the MLST 1.8 web tool at the Center for Genomic Epidemiology (CGE)). The combination of known alleles of the target genes allows the assignment of a sequence type number, e.g., ST9. Overall, MLST provided significantly lower resolution among the test isolates than WGS.

In silico Antimicrobial Resistance Gene Detection
The antimicrobial resistance genes were identified by a database search of known antibiotic resistance determinants (ResFinder 2.1 web tool at CGE). This approach does not cover chromosomal mutations conferring antibiotic resistance (e.g., high-level ciprofloxacin resistance). Phenotypic susceptibility tests are required to confirm antimicrobial resistance.

In silico Detection of Virulence genes
The virulence genes were identified by a database search of known virulence determinants (VirulenceFinder 1.4 web tool at CGE). Unknown or novel virulence elements cannot be detected by this approach.

Data Release
The nucleotide sequence data from this analysis has not been released or deposited in any database.
Test results must be reported using the following form template: text in green is subject to change; substitute the correct value and change the font to black before submitting the report.

Template form
Whole-genome sequencing of _# of isolates_ ___species___ isolates was performed by the MDL Core laboratory using Illumina sequencing chemistry with 300-bp × 2 paired-end reads on an Illumina MiSeq sequencer. Phylogenetic analysis, 16S rRNA identification, in silico MLST typing, and antibiotic resistance and virulence gene detection results are included in this report. Please find the report summary on page X.

Isolate Details
Use one of the two following tables to record isolate details, depending on the available metadata: Isolate details

Isolate ID
1) The dark green color indicates a perfect match for a given gene: the %identity is 100, and the sequence in the genome covers the entire length of the resistance gene in the database.
2) The grey color indicates a warning due to a non-perfect match: the length of the aligned sequence is shorter than the resistance gene length, and the %ID can be 100% or less.
3) The light green color indicates a warning due to a non-perfect match: the %ID is <100%, while the length of the aligned sequence equals the resistance gene length.
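The color-coding rules above amount to a simple classification on two values: the percent identity of the hit and whether the alignment spans the full gene. A sketch (the gene lengths used in the examples are arbitrary):

```python
def classify_match(pct_identity, aligned_length, gene_length):
    """Reproduce the report's color scheme for a ResFinder-style hit.

    dark green  : perfect match (100% identity over the full gene length)
    grey        : aligned sequence shorter than the gene (%ID may still be 100)
    light green : full-length alignment but %ID below 100
    """
    if aligned_length < gene_length:
        return "grey"
    if pct_identity == 100.0:
        return "dark green"
    return "light green"

print(classify_match(100.0, 861, 861))  # dark green
print(classify_match(100.0, 790, 861))  # grey
print(classify_match(98.7, 861, 861))   # light green
```

Checking the alignment length first matches the rules above, since a truncated hit is flagged grey even when its identity is 100%.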

Comments:

Virulence factors encoded by the detected genes:
celb - Endonuclease colicin E2
gad - Glutamate decarboxylase
lpfA - Long polar fimbriae
senB - Plasmid-encoded enterotoxin
sigA - Shigella IgA-like protease homologue
1) The dark green color indicates a perfect match for a given gene: the %identity is 100, and the sequence in the genome covers the entire length of the virulence gene in the database.
2) The grey color indicates a warning due to a non-perfect match: the length of the aligned sequence is shorter than the virulence gene length, and the %ID can be 100% or less.
3) The light green color indicates a warning due to a non-perfect match: the %ID is <100%, while the length of the aligned sequence equals the virulence gene length.

DISCLAIMER
The WGS results in this report are provided based on a laboratory-developed test (LDT) using the Illumina MiSeq sequencer and chemistry. The test has not been validated for clinical use; therefore, it is intended for investigational use only (IUO) or research use only (RUO). Additional investigation is necessary to establish epidemiological links.

Coverage and Gene Identification
The results represent ___average depth of coverage X (min coverage X - max coverage X)___ coverage of the bacterial genomes, which allows identification of genes and intergenic regions in over ___% of the ___species___ genome. Any ambiguities in the sequencing results should be resolved by Sanger sequencing.

High quality (hq) Single Nucleotide Polymorphisms (SNPs) Genotyping
The phylogenetic tree was generated by identifying hqSNPs in each isolate and performing a genome-wide comparison of hqSNPs between test isolates (___tree-building algorithm used___). Individual ___species___ isolates are represented as nodes on the phylogenetic tree. A group of isolates closely related to each other (connected by shorter branches) forms a clade in the phylogenetic tree, suggesting a possible epidemiological link between the isolates. The point where the branches split, called an internal node, designates the most recent common ancestor and potentially indicates a common transmission event. The phylogenetic tree is supplemented with a distance matrix table showing a pairwise comparison of all isolates. The value at each intersection shows the number of SNP differences between two isolates, with smaller differences indicating closer relatedness.

In silico Multilocus Sequence Typing (MLST)
In silico MLST results show the genetic relatedness of the isolates based on sequence variations (alleles) in target housekeeping genes (MLST scheme for ___species/genus___ via the MLST 1.8 web tool at the Center for Genomic Epidemiology (CGE)). The combination of known alleles of the target genes allows the assignment of a sequence type number, e.g., ST9. Overall, MLST provided significantly lower resolution among the test isolates than WGS.

In silico Antimicrobial Resistance Gene Detection
The antimicrobial resistance genes were identified by a database search of known antibiotic resistance determinants (ResFinder 2.1 web tool at CGE). This approach does not cover chromosomal mutations conferring antibiotic resistance (e.g., high-level ciprofloxacin resistance). Phenotypic susceptibility tests are required to confirm antimicrobial resistance.

In silico Detection of Virulence genes
The virulence genes were identified by a database search of known virulence determinants (VirulenceFinder 1.4 web tool at CGE). Unknown or novel virulence elements cannot be detected by this approach.

Data Release
The nucleotide sequence data from this analysis has not been released or deposited in any database.

Example of the report
Header: Microbial Disease Laboratory / Core laboratory / Whole Genome Sequencing report. Whole-genome sequencing of 54 Shigella sonnei isolates was performed by the MDL Core laboratory using Illumina sequencing chemistry with 300-bp × 2 paired-end reads on an Illumina MiSeq sequencer. Phylogenetic analysis, 16S rRNA identification, and in silico MLST typing results are included in this report. Please find the report summary on page 5.

1) The dark green color indicates a perfect match for a given gene: the %identity is 100, and the sequence in the genome covers the entire length of the resistance gene in the database.
2) The grey color indicates a warning due to a non-perfect match: the length of the aligned sequence is shorter than the resistance gene length, and the %ID can be 100% or less.
3) The light green color indicates a warning due to a non-perfect match: the %ID is <100%, while the length of the aligned sequence equals the resistance gene length.

Comments:
Virulence genes found:

High quality (hq) Single Nucleotide Polymorphisms (SNPs) Genotyping
The phylogenetic tree was generated by identifying hqSNPs in each isolate and performing a genome-wide comparison of hqSNPs between test isolates (Maximum Likelihood tree-building algorithm). Individual Shigella sonnei isolates are represented as nodes on the phylogenetic tree. A group of isolates closely related to each other (connected by shorter branches) forms a clade in the phylogenetic tree, suggesting a possible epidemiological link between the isolates. The point where the branches split, called an internal node, designates the most recent common ancestor and potentially indicates a common transmission event. The phylogenetic tree is supplemented with a distance matrix table showing a pairwise comparison of all isolates. The value at each intersection shows the number of SNP differences between two isolates, with smaller differences indicating closer relatedness.

In silico Multilocus Sequence Typing (MLST)
In silico MLST results show the genetic relatedness of the isolates based on sequence variations (alleles) in target housekeeping genes (MLST scheme for Escherichia coli/Shigella via the MLST 1.8 web tool at the Center for Genomic Epidemiology (CGE)). The combination of known alleles of the target genes allows the assignment of a sequence type number, e.g., ST9. Overall, MLST provided significantly lower resolution among the test isolates than WGS.

In silico Antimicrobial Resistance Gene Detection
The antimicrobial resistance genes were identified by a database search of known antibiotic resistance determinants (ResFinder 2.1 web tool at CGE). This approach does not cover chromosomal mutations conferring antibiotic resistance (e.g., high-level ciprofloxacin resistance). Phenotypic susceptibility tests are required to confirm antimicrobial resistance.

In silico Detection of Virulence genes
The virulence genes were identified by a database search of known virulence determinants (VirulenceFinder 1.4 web tool at CGE). Unknown or novel virulence elements cannot be detected by this approach.

Data Release
The nucleotide sequence data from this analysis has not been released or deposited in any database.

Comparison with NCBI reference tree
Validation sequences generated by the Core laboratory were clustered together with additional comparison sequences of the same species from the NCBI database (total number of clustered samples = 5). The reference sequences available from NCBI for the validation samples were clustered in an analogous way with the comparison sequences. The tree comprising only NCBI sequences (the "NCBI tree") matched the tree comprising the combination of Core laboratory validation sequences and comparison NCBI sequences (the "Core tree") in all cases, meaning that the sequences generated by the Core laboratory for the validation strains were concordant with the NCBI reference sequences for the same strains.
To cluster the NCBI reference sequences, full genomes were downloaded and converted into paired-read format. The following command-line tool was used for mock read generation:
wgsim -N1000000 -r 0 -R 0 -X 0 -e 0 NCBI_sequence.fasta Mock_reads_NCBI_sequences.R1.fastq Mock_reads_NCBI_sequences.R2.fastq
Afterwards, the NCBI mock reads and the Core laboratory-generated reads were processed in identical ways, as described in the SOP.

Comparison with CDC reference tree
Similarly, strains previously sequenced by the CDC were used for tree comparison. The clustering of sequences generated by the CDC was compared with the clustering of the Core laboratory validation sequences for the same strains. The tree generated from the CDC sequences (the "CDC tree") matched the tree comprising the validation sequences generated by the Core laboratory (the "Core tree").
The CDC sequences were obtained as raw reads from the SRA database and processed alongside the Core laboratory sequences.
Comparison trees were built for two species of the family Enterobacteriaceae, two species of Gram-positive bacteria, and one species of nonfermenting bacteria. The list of sequenced isolates and references used for comparison is shown below: