Characterization and Clinical Significance of Natural Variability in Hepatitis B Virus Reverse Transcriptase in Treatment-Naive Chinese Patients by Sanger Sequencing and Next-Generation Sequencing

Mutations in hepatitis B virus (HBV) reverse transcriptase (RT) are associated with nucleos(t)ide analogue (NA) resistance during long-term antiviral treatment. However, the characterization of mutations in HBV RT in untreated patients has not yet been well illustrated. The objective of this study was to investigate the characterization and clinical significance of natural variability in HBV RT in treatment-naive patients.

H epatitis B virus (HBV) is the most common cause of acute and chronic liver disease in China. Patients infected by HBV during infancy or early childhood are likely to develop into chronic hepatitis B (CHB) and have an increased risk of progression to liver cirrhosis (LC) and hepatocellular carcinoma (HCC) (1). HBV is an enveloped partially double-stranded DNA virus with an approximately 3.2-kb genome encoding four open reading frames (ORFs), pre-S/S, pre-C/C, HBX, and polymerase. The polymerase gene consists of four domains, as follows: the terminal protein, spacer, RNase H, and reverse transcriptase (RT). The HBV genome is replicated through RT using pregenomic RNA (pgRNA) as the template. However, RT lacks proofreading capacity in the process of viral replication, resulting in various genomic variants. Under the selective pressure of antiviral agents and the host immune defense, the fittest variants tend to survive (2).
Nucleos(t)ide analogue (NA) resistance is closely associated with mutations in the RT region. For instance, M204V/I in RT (rtM204V/I) is a classical lamivudine (LMV) resistance mutation, which also greatly reduces susceptibility to telbivudine (LdT); rtA181T/V is 10 min. The PCR products were purified using a QIAquick gel extraction kit (Qiagen, Germany) and directly sequenced (Beijing Genomics Institute, Shenzhen, China).
Next-generation sequencing. The HBV RT gene was amplified by nested PCR using first-round primer pairs (forward, 5=-TAGGACCCCTGCTCGTGTTA-3=; reverse primer, 5=-GCTAGGAGTTCCGCAGTATG G-3=) and second-round primer pairs (forward, 5=-GGGCTTTCCCCCACTGTY-3=; reverse, 5=-GRGCAACGGG GTAAAGGK-3=). Raw data were obtained from the Illumina MiSeq sequencing platform with 2 ϫ 300-bp dual-end mode sequencing, and the average depth was more than 1,500ϫ. Reads with an average quality of ϾQ20 were able to be used for subsequent analysis. After removing the adapter sequences in the reads, small fragments with a length of less than 25 bp were discarded, with the aim to avoid the contamination of PCR primers. Subsequently, clean data were obtained using the BWA software for the final mutation analysis. The real mutation frequency threshold was 0.889%, according to Yang et al. (18). In brief, any mutation detected from the HBV wild-type (WT) plasmid template by NGS was due to errors. Therefore, after aligning the sequencing read of the HBV WT plasmid template by NGS with that by Sanger sequencing, the mean overall error rate was 0.741% (standard deviation, 0.074%) per base. The frequency threshold was set to two standard deviations higher than the mean error rate, i.e., 0.889%, to differentiate a real mutation from a sequencing error.
Sequence alignment and HBV genotyping. Mutations were analyzed by aligning HBV RT sequences with the consensus sequence generated based on the HBV RT sequences in this study and the reference sequences in previous studies according to published literature (6,19). In brief, all of the sequences in this study were aligned with the previous published HBV wild sequences (20) to obtain the consensus sequence as the reference sequence, and then each of the sequences was aligned with the reference sequence described above to analyze the mutation. Shannon entropy was applied as a measure of variation in RT and calculated by the online tool Entropy (https://www.hiv.lanl.gov/content/ sequence/ENTROPY/entropy.html) (21).
Liver inflammation grading and fibrosis staging. Liver biopsies were performed by clinicians. Then, pathological tissues from the biopsy specimens were analyzed by experienced pathologists. The severity of liver disease was evaluated according to grade of inflammation (G) and stage of fibrosis (S). The results were obtained from the electronic medical records (EMR) system. Statistical analysis. Statistical difference was evaluated by t test, Mann-Whitney U test, Kruskal-Wallis test, chi-square test (Fisher's exact test was used when needed), one-way analysis of variance, or multiple comparisons with IBM SPSS Statistics software (version 22.0.0; IBM, Armonk, NY, USA) where appropriate. All P values were two-tailed. A P value of Ͻ0.05 was considered to be statistically significant.

RESULTS
Characterization of mutations in HBV reverse transcriptase gene by Sanger sequencing. The HBV RT region from rt145 to rt289, covering parts of the A-B interdomain, domain B, B-C interdomain, domain C, C-D interdomain, domain D, D-E interdomain, domain E, and parts of the E-RNA H interdomain, was sequenced by Sanger sequencing in 427 treatment-naive CHB patients, including 57 HBV carriers (HC), 265 chronic hepatitis B (CHB) patients, and 105 patients with advanced liver disease (ALD). Among these 427 patients, the 57 HBV carriers consisted of 39 immunotolerant carriers and 18 inactive carriers, and the 105 ALD patients consisted of 83 LC patients and 22 HCC patients. The characteristics of these patients are shown in Table 1. As previously reported (6), 12 genotype-dependent amino acid sites were confirmed in this study ( Table 2). The results revealed that the presence of tyrosine or phenylalanine at rt151, tyrosine or phenylalanine at rt221, alanine or threonine at rt222, alanine or serine at rt223, valine or isoleucine at rt224, histidine or asparagine at rt238, glutamine or leucine at rt267, and methionine or glutamine at rt271 was significantly correlated (P Ͻ 0.0001) with genotype B or C, respectively. Isoleucine at rt191 and  histidine at rt226 were significantly more associated with genotype B than genotype C (P Ͻ 0.0001), respectively, and cysteine at rt256 was significantly more associated with genotype C than genotype B (P ϭ 0.003), though valine, asparagine, or serine was predominant at rt191, rt226, or rt256 for both genotypes, respectively. Isoleucine was predominant at rt269 for genotype B (P Ͻ 0.0001), while isoleucine (36/69) and leucine (33/69) were comparable at rt269 for genotype C. In addition, although methionine was predominant at rt271 for genotype B, leucine at rt271 was significantly more associated with genotype B than with genotype C (P Ͻ 0.001). According to previous investigations (6,19), the consensus amino acid residue at each of above-described sites was genotype dependent and regarded as the reference. The other residues at these sites present at a low frequency were regarded as spontaneous mutations in this study.
On the basis of the consensus amino acid residue mentioned above, mutations in this study were classified into 4 categories: primary resistance mutations, secondary resistance mutations, putative resistance mutations, and pretreatment mutations, according to previous studies (6,22). Primary and secondary resistance mutations are well known as classical antiviral resistance mutations which have been confirmed by phenotypic experiments in vitro and vivo, while putative resistance mutations and pretreatment mutations remained to be proven by experiments. Moreover, putative resistance mutations were related to long-term NA therapy, and pretreatment mutations were mainly found in treatment-naive patients. The results in this study showed that mutations in the RT were found in 36.53% ( ). In addition, except for only 1 patient with a mutation at rt181 (A181T) in the ALD group, there was not any primary resistance mutation (i.e., I169T, T184A/C/F/G/I/L/M/S, A194T, S202C/G/I, M204I/V/S, N236T, or M250I/L/V) and secondary resistance mutation (i.e., V173L or L180M) found in treatment-naive patients, while 9 putative resistance mutations and 51 pretreatment mutations were detected in these patients.
Mutation distribution and frequency in different RT sections. To testify the difference of the mutation distribution among 3 groups (HC group, CHB group, and ALD group), the mutation frequency for each section of the RT region was calculated, referring to the previous study (6). For instance, the mutation frequency of domain D in the CHB group was calculated with the following formula: 11 mutations detected/(5 studied sites ϫ 265 isolates) ϫ 100% ϭ 11/1,325 ϫ 100% ϭ 0.83%. The detailed results for each section of the RT region in different groups are shown in Fig. 1. In total, the mutations analyzed in this study occurred mainly in domain C, the C-D interdomain, and domain E. In the HC group, the mutation frequency of domain C (2.92%) was highest (P Ͻ 0.05). In the CHB group, significantly higher mutation frequencies were found in domain E (1.38%) and the C-D interdomain (1.20%) than in other sections (P Ͻ 0.01). In the ALD group, domain C showed the higher mutation frequency (1.90%), though this difference was not statistically significant. Notably, among these 3 groups, the mutation frequency of domain C was highest in the HC group, while it was lowest in the CHB group (P Ͻ 0.05).

Correlation between HBV RT mutations and clinical features.
To investigate the clinical influence of HBV mutations in treatment-naive patients, the clinical characteristics of patients were compared between 271 patients without mutations and 156 Pretreatment mutations d (Continued on next page) a Well-known NA resistance mutations (primary and secondary) with phenotypic data. No secondary resistance mutations were found. b Putative mutations that were relevant to NA resistance but not experimentally confirmed.
c Genotype-dependent amino acid polymorphic positions identified in this study.
patients comprising 115 patients with a single mutation and 41 patients with multiple mutations. The results showed that patients with mutations, especially with multiple mutations, were significantly older and had significantly lower HBeAg-positive ratio, HBV DNA loads, and levels of HBsAg quantification than those without mutations (Table 4). Additionally, to investigate the relationship between the severity of liver disease and RT mutations, the results of liver biopsies from 102 patients including 72 patients without RT mutations, 22 patients with a single RT mutation, and 8 patients with multiple RT mutations were analyzed. The severity of liver disease was evaluated according to the grade of inflammation (G) and the stage of fibrosis (S) (Fig. 2). The results showed that there was no significantly difference in the grade of inflammation among nonmutation group, single-mutation group, and multimutation group. Although no significant difference was found between the nonmutation group and single-mutation group, a significant difference in the stage of fibrosis was found between the nonmutation group and multimutation group (P ϭ 0.030), indicating that patients with multiple mutations in the RT region had a greater risk of developing more serious liver fibrosis.
Comparison of mutant characterization in HBV reverse transcriptase gene by next-generation sequencing and Sanger sequencing. Due to the limited sensitivity of Sanger sequencing, next-generation sequencing was performed on 66 treatment-  naive patients with chronic HBV infection from rt201 to rt335 (Table 1), and Sanger sequencing was also performed on 63 of these 66 patients for comparison (Fig. 3). The results tested by NGS were consistent with those by the conventional Sanger sequencing (Fig. 3 and 4). It showed that mutations mainly clustered in rt212 to ϳrt228 located in the C-D interdomain, rt247 to ϳrt270 located in domain E, and rt295 to ϳrt335 located in the E-RNA H interdomain ( Fig. 3A and 4A1 to C1). Moreover, the mutations seemingly tended to occur in patients with negative HBeAg, genotype B HBV, or CHB (Fig. 3B to D and 4A1 to C1).

DISCUSSION
With the widespread use of NAs, there has been increasing numbers of nucleot(s)ide resistance (NA r ) mutations reported in the HBV RT region. However, the existence of drug resistance mutations in treatment-naive patients remains controversial. In this study, except rtA181T, the classical primary drug resistance mutations (i.e., I169T, A181T/V, T184A/C/F/G/I/L/M/S, A194T, S202C/G/I, M204I/V/S, N236T, and M250I/L/V) were not detected by Sanger sequencing in treatment-naive patients (Table 3), which was consistent with previous reports (6,25,26). Considering that Sanger sequencing is a population-based sequencing approach used to detect mutations with an intrahost rate of more than 20% (6,27), the minor mutations with an intrahost rate of less than 20% were likely to be ignored by Sanger sequencing. In this study, using nextgeneration sequencing, mutations were found at rt202 (rtS202R/I/N/C/G), rt204 (rtM204L/R/I), rt236 (rtN236T/K/H), and rt250 (rtM250I/L) ( Table 5). Although it was reported that naturally occurring NA r mutations were not dominant in the treatmentnaive patients, the effect of minor quasispecies with primary NA r mutations on the subsequent antiviral treatment was worth being further investigated.
In the range of amino acid sequences from rt145 to rt289 tested by Sanger  sequencing in this study, 12 genotype-dependent amino acid polymorphic positions (rt151, rt191, rt221, rt222, rt223, rt224, rt226, rt238, rt256, rt267, rt269, and rt271) were identified for genotypes B and C; this was important for the definition of genotypic mutation. For example, rtA (alanine) 222T (threonine) was considered to be a novel mutation, according to a previous report (28). However, in this study, alanine and threonine at rt222 were identified to be genotype-dependent wild-type amino acids for genotypes B and C, respectively. Moreover, a previous document showed that in the polymorphic positions from rt145 to rt289 which were analyzed in this study, rt221, rt224, rt238, and rt256 were identified to be the genotype-dependent amino acid polymorphic positions for genotypes B and C in patients from Beijing, China (6). Compared with this result described above, besides rt221, rt224, rt238, and rt256, another 8 amino acid positions (rt151, rt191, rt222, rt223, rt226, rt267, rt269, and rt271) were identified as being polymorphic sites in patients from Fujian, China. The difference implied that even in different areas of the same country, there were also some differences in the HBV wild-type sequence. Therefore, instead of using the same reference sequence in different areas, it might be more appropriate for genotypic mutation analysis that the consensus sequence was obtained from the alignment for HBV nucleotide sequences among local treatment-naive patients and regarded as the reference sequence. Currently, the well-known classical NA resistance mutations are mainly located in domains B, C, D, and E, such as rtI169T, rtA181T/V, and rtT184A/C/F/G/I/L/M (located in domain B), rtS202C/G/I and rtM204I/V/S (located in domain C), rtN236T (located in domain D), and rtM250I/L/V (located in domain E). Therefore, mutations in these domains have a greater risk of NA resistance. A previous study had shown that the prevalence of mutations in the A-B interdomain was higher than in the RT domain or non-A-B interdomains (6). However, the difference in prevalence of mutations between each other of sections in RT was still unknown. Therefore, we have analyzed a total of 46 mutations in domain B, the B-C interdomain, domain C, the C-D interdomain, domain D, the D-E interdomain, and domain E from rt164 to rt270 to address that issue and assess the risk of NA resistance. The results showed that mutant hot spot fractions were different among the HC, CHB, and ALD groups. Given that some patients in the CHB group might be eligible for antiviral treatment, mutations in domain E with significantly higher mutation frequency in the CHB group (Fig. 1), such as rtS/C256G, which is associated with ETV treatment, should be noticed (29).
In this study, we compared the clinical characteristics of patients with and without RT mutations. We found that people with RT mutations, especially with multiple mutations, were apparently older than those without mutations and had a significantly lower positive ratio of HBeAg, lower HBV DNA load, and lower HBsAg level (Table 4). Recently, it was documented that HBV seemed to be under host immune pressure in chronic HBV infection, particularly in the HBeAg-negative status (30). Therefore, in this study, patients with RT mutations who had lower positive ratios of HBeAg might be under greater immune pressure. In turn, the greater immune pressure could result in the decreased HBV DNA load and HBsAg level. Also, under the greater immune pressure, more mutations might be selected for and accumulate for immune escape over time. In fact, the results in this study have shown that many mutations in RT sites (36/56 [64.29%]) were found to change the B or T cell epitopes in the RT or S protein (Table 3). Interestingly, although many mutations could change the epitopes, mutations found in the D-E interdomain and domain E appeared not to change the B or T cell epitopes in the RT or S protein, implying that mutations in the D-E interdomain and domain E were not responsible for immune escape. In addition, the correlation between RT mutations and stage of liver fibrosis was analyzed in this study. The results have shown that patients with multiple RT mutations have more severe liver fibrosis than do those without mutations (Fig. 2), implying that mutations in RT might be involved in the progression of liver disease. However, the mechanism of the relationship between multiple RT mutations and liver fibrosis was still unclear.
Further, next-generation sequencing was performed on 66 treatment-naive patients, and Sanger sequencing was also performed on 63 of those patients at the same time. We found that more mutations occurred in HBeAg-negative patients, genotype B-infected patients, and CHB patients than in HBeAg-positive patients, genotype C-infected patients, and HC or ALD patients, respectively ( Fig. 3A and 4A1 to C1), which might be attributed to the higher immune pressure. Indeed, Desmond et al. (30) defined viral adaptation as the presence of the relevant HLA and the escaped amino acid, and they found that the adaptation was significantly lower in HBeAg-positive than in HBeAg-negative patients among genotype B-infected patients, while this result was not significant among those infected with genotype C HBV. Furthermore, it should be noticed that more mutations occurred at rt251, rt266, rt274, rt280, rt283, rt284, and rt286 located in domain E and the E-RNA H interdomain than in the other RT sections in ALD patients (Fig. 4C2), indicating that those mutations were associated with liver disease progression. However, the phenotypic mechanisms of those mutations remain unclear and deserve to be elucidated.
In conclusion, through Sanger sequencing and NGS, the present study realized the analysis of mutations in the RT region in different liver disease stages, providing a possible explanation to illuminate the mechanisms of their evolution and selection under varied levels of immune pressure. Classical primary drug resistance mutations were found at a low rate by next-generation sequencing in the treatment-naive patients but were not detected by Sanger sequencing. It was worth investigating the effect of those NA r mutations in a low rate on the subsequent treatment. In addition, the correlation was also discussed between the accumulation of RT mutations and clinical indexes, including HBeAg, HBV DNA, HBsAg, age, and stage of liver fibrosis, emphasizing the clinical significance of the overall effect of naturally occurring RT mutations on the treatment-naive patients. The detailed functional mechanisms of particular mutations need to be further explored.