ABSTRACT
Influenza A viruses cause yearly epidemics, in part, due to their ability to overcome immunity from previous infections through acquisition of mutations. Amino acid sequences encoded by genes 4 (HA), 6 (NA), 7 (M), and 8 (NS) from 77 H3N2 influenza A isolates, collected between November 2003 and March 2005, were analyzed to determine the extent to which the viruses mutated within epidemic periods and between the epidemics. Nucleotide and amino acid sequences were stable throughout the epidemics but experienced substantial changes between epidemics. Major changes occurred in the HA gene in 5 to 7 amino acids and the NA gene in 11 to 13 amino acids and changes of 5 amino acids occurred in the M and NS genes. In the HA gene, changes occurred in sites known to be epitopes that determine the hemagglutination inhibition reactivity, and these were shown to be associated with a change of strain from A/Fujian/411/2002-like to A/California/7/2004-like viruses. Our findings indicate that genotype determination promises to be a rapid approach for detecting new strains of influenza A viruses in a population.
Influenza viruses are well-established etiological agents of yearly epidemics which occur in temperate climates during the colder seasons. The viruses are predominantly classified as influenza A and influenza B, of which the former is generally more prevalent, with greater morbidity, and manifests higher genetic variation over time (22). Influenza A is further classified into subtypes based on antigenicity of the hemagglutinin (HA) and neuraminidase (NA) proteins. Although there are currently 16 HA types and 9 NA types of influenza A virus recognized, only H3N2, H1N1, and occasionally, H1N2 viruses are currently in circulation (3, 10). The H1 and H3 subtypes are subject to regular antigenic changes, as measured by the hemagglutination inhibition test with reference antisera (5). Over the past decade, four strains have arisen, namely, A/Sydney/5/97, A/Panama/2007/99, A/Fujian/411/2002, and A/California/7/2004. During the subsequent influenza season, the introduction of new strains has often been associated with increased influenza activity or poor efficacy of the vaccine lacking the new strain (17). The public health response to influenza consists mainly of vaccination, making the timely and accurate identification of the circulating strain essential.
Identification of new strains of influenza A by the hemagglutination inhibition assay and virus neutralization, although well established, is relatively time-consuming. The process requires the demonstration that a specific reference antiserum inhibits hemagglutination by the virus in question to a higher titer than sera to antecedent viruses (15). Moreover, for newly emergent strains, such reference antisera are not likely to be available. Analysis of the inferred amino acid sequence of the hemagglutinin encoded by gene 4 of the influenza virus has shown that there are 5 sites on the globular domain, designated A through E, where amino acid changes may lead to the expression of new epitopes identified by the hemagglutination inhibition assay. Amino acid changes at these sites are associated with strain changes, as measured by the hemagglutination inhibition assay. By tracking these changes, it may therefore be possible to obtain an early insight into the arrival of novel strains of influenza A viruses (9, 14, 20, 21). In fact, it was shown that a change of at least 4 amino acids in two or more antigenic sites can generate a new strain to which previous immunity is no longer effective (6).
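As an illustration of how the rule of at least 4 amino acid changes in two or more antigenic sites might be applied to sequence data, the following minimal Python sketch maps changed HA positions onto sites A through E and tests the threshold. The site positions are those listed in the Results of this report; the function names and the example input are assumptions introduced here for illustration and are not part of the methods used in this study. The example positions are those at which the April 2004 isolate is reported to differ from A/Wyoming/03/2003, which is a reference strain rather than the antecedent circulating virus, so the outcome should be read as a demonstration of the bookkeeping only.

```python
# Antigenic site positions in the HA, as listed in the Results of this report.
ANTIGENIC_SITES = {
    "A": set(range(133, 147)),                          # 133 to 146
    "B": set(range(155, 161)) | set(range(186, 198)),   # 155 to 160 and 186 to 197
    "C": {50, 52, 273, 277},
    "D": {172, 201, 205, 207, 217, 226, 229, 247},
    "E": {57, 83},
}


def site_of(position):
    """Return the antigenic site letter for an HA position, or None."""
    for site, positions in ANTIGENIC_SITES.items():
        if position in positions:
            return site
    return None


def meets_drift_rule(changed_positions, min_changes=4, min_sites=2):
    """Check the 'at least 4 changes in two or more sites' rule.

    Returns (rule_met, hits), where hits maps each affected site to the
    changed positions that fall inside it.
    """
    hits = {}
    for pos in changed_positions:
        site = site_of(pos)
        if site is not None:
            hits.setdefault(site, []).append(pos)
    total = sum(len(v) for v in hits.values())
    return total >= min_changes and len(hits) >= min_sites, hits


# Positions at which the April 2004 isolate differed from A/Wyoming/03/2003
# (illustrative input taken from the Results text).
april_2004 = [128, 145, 159, 186, 189, 219, 227]
rule_met, hits = meets_drift_rule(april_2004)
print(rule_met, hits)
```

In this example, positions 128, 219, and 227 fall outside the listed sites and are ignored, while the remaining four positions fall in sites A and B.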
Likewise, gene 6 of influenza A viruses, which encodes the neuraminidase protein, has been reported to undergo substantial variation over time (1). Subtyping of viruses based on the neuraminidase is less well developed and is not as consistently used. Less is known of the longitudinal variation of genes 7 and 8 which encode internal proteins (M, M2) and nonstructural proteins (NS1, NS2) of the virus and, hence, are not considered to be under significant antigenic pressure to mutate. We wish to report on genotype determination from nucleic acid sequences of genes 4, 6, 7, and 8, which, respectively, encode proteins HA, NA, M, and NS in influenza A isolates, from 2003 to 2005 and the identification of the influenza A/California/7/2004-like viruses in our setting from as early as 1 April 2004 onward.
MATERIALS AND METHODS
Viruses and cells. Influenza viruses used in these studies were obtained from a subset of clinical specimens from British Columbia submitted for respiratory virus diagnosis between 7 November 2003 and 31 March 2005. As part of the diagnostic workup, the specimens were inoculated into tube cultures of primary rhesus monkey kidney cells, and isolates from cultures showing cytopathic effects were passaged in MDCK cell cultures grown in HyQ SFM4 MegaVir medium (HyClone, Logan, Utah) in the presence of l-1-tosylamido-2-phenylethyl chloromethyl ketone-treated trypsin (7). All of the isolates were typed by the hemagglutination inhibition test (15).
Nucleic acid extraction and amplification. Genomic influenza RNA was extracted from MDCK cell lysates using the QIAGEN MinElute virus spin kit (QIAGEN, Mississauga, ON, Canada). Reverse transcription (RT)-PCR master mixes were prepared using the QIAGEN one-step RT-PCR kit with 1/10 dilutions of the RNA extracts. Primers (0.6 μM) used in the amplification step of our study are shown in Table 1 along with the PCR cycling conditions used to amplify genes 4 (HA), 6 (NA), 7 (M2), and 8 (NS) (13). The position of the primers in the sequence of the respective genes is indicated.
Primers used for amplification and sequencing studies
Cycle sequencing. cDNA amplicons were purified from the RT-PCR preparation using the QIAQuick PCR purification kit (QIAGEN, Mississauga, ON, Canada) and sodium acetate-ethanol precipitation and subjected to cycle sequencing. Two microliters of Big Dye Terminator V3.1 was added to 8 μl of 2.5× Tris MgCl2 sequencing buffer, 3 pmol of one of the primers stated in Table 1, and approximately 20 to 40 ng of the template cDNA. The total volume was made up to 20 μl with PCR-grade water. Cycle sequencing conditions were as follows: 25 cycles of 94°C denaturation step for 20 s, 50°C annealing step for 30 s, and 72°C extension time for 4 min. Sequencing was performed on an Applied Biosystems ABI Prism 3100 genetic analyzer. Sequence analysis was performed using DNAStar and Clustal W software, allowing the multiple alignment and comparison of gene sequences obtained with the sequence of influenza A/Wyoming/03/2003.
Nucleotide sequence accession numbers. Sequences have been deposited into GenBank under accession numbers DQ227423 to DQ227454.
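The comparison of an isolate sequence with the reference can be pictured with a short Python sketch that walks two aligned amino acid sequences and reports each substitution in the notation used in this report (for example, Asn126Asp). This is a minimal sketch under stated assumptions and not the DNAStar or Clustal W procedure: the function name `list_substitutions`, the `first_position` parameter, and the toy fragments are hypothetical, and the code assumes the two sequences are already aligned to equal length with no gaps in the reference, so that alignment columns correspond to residue numbers.

```python
# Standard one-letter to three-letter amino acid codes.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def list_substitutions(reference, isolate, first_position=1):
    """List substitutions between two aligned amino acid sequences.

    Both sequences use one-letter codes and must have equal length.
    Columns containing a gap ('-') or an unknown residue ('X') are skipped.
    Positions count alignment columns starting at first_position.
    """
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    for offset, (ref_aa, iso_aa) in enumerate(zip(reference, isolate)):
        if ref_aa == iso_aa:
            continue
        if ref_aa in "-X" or iso_aa in "-X":
            continue
        position = first_position + offset
        changes.append(f"{THREE_LETTER[ref_aa]}{position}{THREE_LETTER[iso_aa]}")
    return changes


# Toy fragments invented for illustration (not real HA sequence);
# the first residue is treated as position 125.
toy_reference = "KNSTA"
toy_isolate = "KDSTT"
print(list_substitutions(toy_reference, toy_isolate, first_position=125))
# prints ['Asn126Asp', 'Ala129Thr']
```

Counting the length of the returned list for each gene would give the number of amino acid differences from the reference; the same loop applied to nucleotide sequences, without the three-letter lookup, would give the nucleotide count.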
RESULTS
Because the comprehensive sequence information for all 4 genes of A/Fujian/411/2002 was not readily available nor could the virus itself be readily obtained, we sequenced the relevant portions of genes encoding the HA, NA, M, and NS of A/Wyoming/03/2003, an A/Fujian-like strain, which was made available to us by A. Klimov, Centers for Disease Control and Prevention. The virus was passaged once in embryonated eggs, and the viral RNA extracted from the allantoic fluid was sequenced.
A total of 77 influenza isolates, all subtyped as H3N2, were analyzed by comparing the sequences of their HA, NA, M, and NS genes. These targets were chosen as representatives of genes hypothesized to be under greater antigenic pressure, in the case of the HA and NA genes, and lesser antigenic pressure, in the case of M and NS genes. In addition, only a portion of gene 7 encoding the M2 protein was analyzed, since mutations in this region have been correlated with amantadine resistance (12). During the period of this study, the laboratory diagnosed a total of 1,285 cases of influenza from the province of British Columbia. In months of high influenza activity, a subset of 5 to 10 isolates was sequenced (for a total of 67 isolates) and analyzed, while in the interepidemic periods, all influenza isolates were sequenced (10 isolates). Changes in the 4 genes at the amino acid and nucleotide levels in reference to the A/Wyoming/03/2003 strain are shown in Table 2. Throughout both the 2003 to 2004 and 2004 to 2005 epidemics, the genomes of the isolates remained highly conserved. Compared to A/Wyoming/03/2003, the HA of the 2003 to 2004 epidemic isolates differed by 14 to 21 nucleotides and 6 to 7 amino acids. Likewise, the NA gene differed by 34 to 38 nucleotides and 10 to 11 amino acids, and the M and NS genes, respectively, differed by 10 to 11 and 8 to 9 nucleotides and 3 to 4 amino acids.
Changes in sequence of the HA, NA, M, and NS genes between epidemic and interepidemic periodsa
Following a period of minimal influenza activity during February and March 2004, sporadic cases of influenza were diagnosed in long-term care facility outbreaks between April and September (isolates 04-09238 and 04-12419 in Table 3). Sequence analyses of the 4 genes of these interepidemic isolates showed that, in reference to A/Wyoming/03/2003, they differed by 17 to 24 nucleotides and 8 to 10 amino acids in the HA gene, 15 to 19 nucleotides and 3 to 5 amino acids in the NA gene, and 0 to 3 nucleotides and 0 to 1 amino acid in both the M and NS genes. This genotype remained relatively conserved in isolates from the subsequent epidemic period between November 2004 and March 2005. The isolates were typed by the hemagglutination inhibition test. They were reported as A/Fujian/411/2002-like until early in 2005. After the A/California/7/2004 strain was recognized and typing sera became available, the 2004 to 2005 isolates were reclassified as A/California/7/2004-like.
Changes in amino acid sequence encoded by the HA gene
When the sequences of the HA gene were compared at the amino acid level to A/Wyoming/03/2003 and A/Fujian/411/2002, it was observed that the genetic drift occurring in the interepidemic period was localized to specific sites. The most relevant of these are sites A through E, each known to contain epitopes recognized by antibodies reactive in the hemagglutination inhibition assay, which form the basis of strain characterization (19). These include amino acids 133 to 146 of site A; 155 to 160 and 186 to 197 of site B; 50, 52, 273, and 277 of site C; 172, 201, 205, 207, 217, 226, 229, and 247 of site D; and 57 and 83 of site E. As shown in Table 3, in the 2003 to 2004 epidemic period, only 2 amino acid changes from A/Fujian/411/2002, namely Asn126Asp and Glu479Gly, were noted in the hemagglutinin. The Asn126Asp change resulted in the loss of a potential N-linked glycosylation site at position 126. However, the HA of these 15 isolates and A/Fujian/411/2002 differed from A/Wyoming/03/2003 at Ala128Thr, Val186Gly, Tyr219Ser, and Ile226Val. The April 2004 isolate represented a marked change in the HA sequence, with additional amino acid differences from the reference A/Wyoming/03/2003 strain at Lys145Asn, Tyr159Phe, Ser189Asn, and Ser227Pro. In addition, the changes at Ala128Thr, Val186Gly, and Tyr219Ser persisted as with A/Fujian and the 2003 isolates, but there were reversions to the A/Wyoming/03/2003 sequence at Ile226Val and Asp126Asn, of which the latter would have restored the potential N-linked glycosylation site. Subsequent interepidemic isolates retained these changes but also underwent an additional change at Thr361Ile. During the 2004 to 2005 epidemic period, sporadic changes of Asn312Ser and Arg142Gly were noted as well. The genotypes first noted 3 June and 25 October 2004 were predominant and were present in over half of the isolates analyzed.
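The loss and restoration of a potential N-linked glycosylation site at position 126 follow from the general sequon rule: an asparagine is a potential site when it is followed by any residue other than proline and then by serine or threonine (Asn-X-Ser/Thr). A minimal Python sketch of that check is given below. The function name and the toy fragments are hypothetical and were invented for illustration; they are not taken from the sequences determined in this study.

```python
def glycosylation_sequons(protein, first_position=1):
    """Return positions of Asn residues within an N-X-S/T motif (X is not Pro).

    protein is a one-letter amino acid string; first_position is the residue
    number assigned to its first character.
    """
    sites = []
    for i in range(len(protein) - 2):
        is_asn = protein[i] == "N"
        not_pro = protein[i + 1] != "P"
        ser_or_thr = protein[i + 2] in ("S", "T")
        if is_asn and not_pro and ser_or_thr:
            sites.append(first_position + i)
    return sites


# Toy fragments invented for illustration; the first residue is numbered 124.
with_asn = "SKNWTGV"   # Asn at 126, then W and T: sequon present
with_asp = "SKDWTGV"   # Asp at 126: sequon absent

print(glycosylation_sequons(with_asn, first_position=124))
print(glycosylation_sequons(with_asp, first_position=124))
```

Replacing the asparagine with aspartic acid removes the motif, and the reverse change restores it, which is the pattern described for Asn126Asp and Asp126Asn.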
The amino acid sequence encoded by the NA gene, shown in Table 4, remained stable during the 2003 to 2004 epidemic period. Ten to 11 amino acid differences were noted between this sequence and that of A/Wyoming/03/2003, and of these, 7 were also present in A/New York/61A/2003. However, in the April 2004 isolate, 10 amino acid changes were noted from the isolates of the just-completed epidemic. These were substantially more similar to the sequence of the A/Wyoming/03/2003 strain from which this virus differed by only 3 amino acids, namely Asn93Asp, Glu199Lys, and Gln432Glu. During the subsequent interepidemic period, additional changes were noted at Lys221Arg, Lys221Glu, and Ile307Thr. This genotype persisted into the 2004 to 2005 epidemic period, with a change at Asp151Gly. In total, 31 isolates carried the NA genotype first detected in July 2004 and 17 carried the NA genotype first detected in October 2004; both genotypes persisted throughout the subsequent epidemic.
Changes in amino acid sequence encoded by the NA gene
Likewise, the amino acid sequences encoded by the M gene were virtually unchanged from the A/New York/12/2003 sequence during the 2003 to 2004 epidemic period, as shown in Table 5, but exhibited changes in the interepidemic and subsequent 2004 to 2005 epidemic isolates which were consistent with those present in the corresponding A/Wyoming sequence. In the region of gene 7 whose sequence was analyzed, the changes included Asn23Ser, Ile51Val, Arg56Lys, and in single isolates, Val28Ile and Ala30Val. In addition, the Asp88Asn change from the A/Wyoming/03/2003 sequence was noted early in the 2004 to 2005 epidemic.
Amino acid changes encoded by the M gene
Analysis of the NS gene sequences shown in Table 6 again illustrated the similarity of the 2003 to 2004 epidemic isolates and the A/New York/12/2003 strain, which differed in 3 amino acids from the reference A/Wyoming/03/2003 strain. The subsequent April 2004 isolate NS gene had a sequence indistinguishable from the A/Wyoming/03/2003 strain which persisted throughout most of the subsequent interepidemic and epidemic periods. A single further change, Met116Ile, was noted in October 2004 in 2 isolates, and in November 2004, a further change of Met116Thr was seen in 9 subsequent isolates. Finally, beginning in March 2005, a Glu96His change was noted in 4 isolates.
Amino acid changes encoded by the NS gene
Phylogenetic trees, shown in Fig. 1 and 2, were constructed to determine the genetic relationship among the epidemic and the interepidemic isolates of 2003 to 2005, based on the HA and the NA genes, using the consensus sequence of A/Panama/2007/99 as the phylogenetic root and consensus sequences of A/Wyoming/03/2003, with or without A/Fujian/411/2002, as a reference strain for each year's epidemic. Two clusters were noted in both cases, with the interepidemic and 2004 to 2005 epidemic isolates clustering together, though distinctly separate from those of 2003 to 2004 epidemic isolates at the level of both genes.
Phylogenetic tree of the HA gene rooted by A/Panama/2007/99 as determined by the CLUSTAL W algorithm of MegAlign (version 5.01; DNAStar).
Phylogenetic tree of the NA gene rooted by A/Panama/2007/99 as determined by the CLUSTAL W algorithm of MegAlign (version 5.01; DNAStar).
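Clustering of this kind rests on pairwise comparison of aligned sequences. As a minimal sketch, and not a reproduction of the MegAlign or CLUSTAL W procedure used for Fig. 1 and 2, the Python code below computes the proportion of differing sites (p-distance) for every pair of sequences in an alignment. The function names, the labels, and the toy nucleotide fragments are assumptions invented for illustration and do not represent isolate data from this study.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns in which either sequence has a gap ('-') are ignored.
    """
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(alignment):
    """Return {(name1, name2): p-distance} for every pair, names sorted."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Toy aligned fragments with hypothetical labels (not real isolate data).
toy_alignment = {
    "root": "ATGAAGACTA",
    "epi_2003": "ATGAAGACCA",
    "epi_2004": "ATGAGGATCA",
}

for pair, dist in distance_matrix(toy_alignment).items():
    print(pair, round(dist, 2))
```

A tree-building method would then group the sequences with the smallest distances first; in the toy data, the two labeled epidemic fragments each lie closer to one another or to the root than the most divergent pair does, which is the kind of pattern a cluster analysis makes visible.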
DISCUSSION
The correlation of the genetic drift of influenza A viruses with the corresponding antigenic changes has recently been reviewed for the H3N2 viruses from 1968 to 2003 (19). Our studies have examined the genomic evolution of influenza viruses that circulated in our setting over the past 2 years. This longitudinal analysis of the sequence of genes 4, 6, 7, and 8 of representative influenza isolates from the 2003 to 2004 and 2004 to 2005 epidemics and the interepidemic period has provided valuable insights into the manner in which influenza A undergoes the genomic variation that is characteristic of this virus. Similar findings, though less comprehensive, have been reported previously (1, 11). Recently reported studies of an outbreak of influenza in Nepal in 2004, describing amino acid changes from the A/Fujian/411/2002 strain in the hemagglutinin, showed changes consistent with our observations for positions 145, 189, 226, and 227 (8). Based on antigenic analyses, the concept of antigenic drift arising from mutations in specific epitopes of the globular domain of the hemagglutinin is a well-recognized phenomenon (9). The changes in amino acids from successive epidemics have been analyzed in detail previously (3). In these studies, 18 codons in the HA gene were identified which are positively selected with antigenic drift. In our analysis, codons for amino acids 142 and 145 at the antibody binding site and the codon for amino acid 226 at the sialic acid receptor binding site (18) were shown to be among these positively selected codons.
However, our findings show that, in our setting, it was not only the HA gene sequence that underwent such changes but the NA gene sequence showed an even greater degree of change, as did the sequences of the M and NS genes which encode proteins not believed to be under the same antigenic pressure as the spike proteins responsible for receptor interaction. The NA, M, and NS gene sequences of the isolates that followed the 2003 to 2004 epidemic were more closely related to the A/Wyoming/03/2003 strain. We would therefore hypothesize that the strain detected in our setting in April 2004, which became dominant in the 2004 to 2005 epidemic period, was unlikely to have evolved locally from the just-ended 2003 to 2004 epidemic but was more likely introduced de novo into our population from an outside source. This strain was possibly an A/Wyoming/03/2003-like parent strain that acquired a substantially different HA gene either through mutation or reassortment with a yet unknown strain. It fully replaced the A/Fujian/411/2002-like and related A/New York/61A/2003-like strains which were present in our 2003 to 2004 epidemic.
Our findings support the observations previously reported that the sequence of the NA gene encoding the neuraminidase can vary substantially from year to year (1). In these studies, changes in amino acid sequence of the neuraminidase gene were noted to occur at approximately 1% per year within the A/Sydney/97 strain of influenza and at over 3% between the first year of the A/Sydney 97 strain and A/Beijing 89 strain (11). In our setting, the amino acid sequence encoded by the NA gene remained relatively constant throughout the epidemic periods and experienced a substantial change in the interepidemic period. Since the NA has a less prominent role than the HA in initiating the infection, it may not be under the same antigenic pressures to mutate. However, in our findings, the antigenic variation of the NA was found to be as complex as that of the HA.
Likewise, the M and NS genes, generally expected to be under lower antigenic pressures, exhibited considerable changes at the amino acid level, namely, 3 to 5 amino acids. In our analysis, the sequence of gene 7 was shown to have undergone mutations known to be associated with amantadine resistance: Ala30Val in a specimen from November 2003 and Ser31Asn in a specimen from July 2004. In the period after the termination of our study, namely, the 2005 to 2006 influenza season, it has been reported that the proportion of influenza A isolates that are amantadine resistant has increased substantially (2).
These sequence analyses showed that, from a genotypic perspective, influenza remained a very stable virus population throughout the epidemic periods. In contrast to the epidemic periods during which only a minimal number of changes in sequence occurred, substantial genetic heterogeneity was observed in the interepidemic period. Hence, sequences from the isolates from April 2004 onward differed substantially from those from the preceding epidemic isolates in all 4 genes tested. These findings support the concept that the novel strain of influenza A virus that entered our population during the interepidemic period was likely an A/Wyoming-like parent strain that may have arisen as a reassortant with A/Fujian/2002-like clades rather than having evolved from the virus that remained endemic to our area in the preceding epidemic period. Such reassortments have been described previously (16, 22).
Sequencing of viral genes from a subset of isolates throughout the year has obvious epidemiological implications. In our study, we were able to identify the presence of a new strain of influenza from isolates in April 2004. These isolates were initially typed as A/Fujian/411/2002-like by hemagglutination inhibition using the reference sera available at the time. It was only when the 2004 to 2005 epidemic was under way that it was recognized that the strain that was circulating was different and, as such, was designated A/California/7/2004 (4). From these observations, it can be concluded that it would be appropriate to monitor influenza isolates, especially during the interepidemic period, by genome sequence analysis to detect the presence of a newly evolved viral genotype entering the population. This would greatly augment our ability to monitor the antigenic drift and to detect a more rarely occurring antigenic shift of the virus virtually on a real-time basis. If such a genotype exhibited amino acid changes at critical epitopes on the hemagglutinin, the isolate could readily be used to produce reference antibody which would allow for the identification of a new strain by hemagglutination inhibition testing, and depending on the time of the year it was identified, it may even have an impact on the design of the following year's vaccine. Finally, by identifying and monitoring the appearance of such novel genotypes, it may be possible to develop models that would predict the activity of influenza for the coming season.
ACKNOWLEDGMENTS
This work was supported in part by funds from the Province of British Columbia, Canada, and the Michael Smith Foundation for Health Research.
The sequencing of the A/Wyoming/03/2003 genes was performed by R. Chow.
FOOTNOTES
- Received 14 December 2005.
- Returned for modification 1 March 2006.
- Accepted 24 July 2006.
- Copyright © 2006 American Society for Microbiology