Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan,¹ Department of Neurobiology and Developmental Sciences, College of Medicine, University of Arkansas for Medical Sciences,² Central Arkansas Veterans Healthcare Center, Little Rock, Arkansas³
Received 11 May 2005/ Returned for modification 16 June 2005/ Accepted 5 July 2005
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
Differences among M. tuberculosis clinical isolates in restriction fragment length polymorphism patterns determined using PGRS probes and Western analysis patterns determined using cross-reactive antibodies demonstrate variation in size of both the PE_PGRS genes and the expressed PE_PGRS proteins (1, 14, 17), suggesting a possible role in antigenic variation. Comparison of the PE and PE_PGRS gene sequences between H37Rv and CDC1551 showed that all 37 of the PE genes were the same in both strains. However, 39 of the 62 common PE_PGRS genes had differences that would result in the absence of the protein due to frameshift mutations or a difference in size due to insertion-and-deletion events (1). Genetic variation of the PE_PGRS genes is thought to be mediated by deletions or insertions in the glycine-alanine repeats (3).
To date, most of the research on the PE_PGRS of M. tuberculosis has been done on PE_PGRS33 (Rv1818c). PE_PGRS33, which has a transmembrane domain, is associated with the cell wall and expressed on the cell surface during infection (7, 8). Transposon mutagenesis of the BCG homologue of the M. tuberculosis Rv1818c gene resulted in dispersed growth in liquid media and decreased ability to enter into or survive within macrophages. Complementation of the mutant with the wild-type Rv1818c gene restored the wild-type phenotype, suggesting that PE_PGRS33 may play an important role in the interactions with other mycobacterial cells as well as with macrophages (4).
The Epstein-Barr virus nuclear antigen 1, which has homology to the PGRS domain, interferes with antigen processing and presentation through the major histocompatibility complex class I pathway via a small glycine- and alanine-rich peptide (6, 11). DNA vaccine studies suggest that the PGRS domain of PE_PGRS33 may also be involved in the prevention of antigen processing and presentation of the PE domain in M. tuberculosis (7).
In comparison to the PE_PGRS33 of H37Rv, which contains 32 Gly-Gly-Ala-Gly-Gly repeats, the PE_PGRS33 of CDC1551 would show a loss of 30 amino acids and four glycine-alanine repeats at one position and a gain of three amino acids and one glycine-alanine repeat at another (3). However, genetic variation in the PE_PGRS33 gene or any other M. tuberculosis PE_PGRS gene among clinical isolates has not been characterized to date.
Because of the potential role of PE_PGRS in antigenic variation, the genetic diversity of the PE_PGRS genes among clinical isolates is of interest. Also, because PE_PGRS33 may be involved in prevention of antigen processing (3), genetic variation of the PE_PGRS33 gene among clinical isolates could potentially account for some of the differences in their ability to evade the host immune system. In order to gain a better understanding of the genetic basis of the interactions between M. tuberculosis and the host and the role of PE_PGRS in antigenic variation and evasion of the host immune system, we investigated the genetic diversity of the PE_PGRS33 gene among 123 clinical M. tuberculosis isolates.
| MATERIALS AND METHODS |
|---|
Assignment of strains to three principal genetic groups. Genotypic grouping based on single-nucleotide polymorphisms (SNPs) found in the katG and gyrA genes, as described by Sreevatsan et al. (15), was also used to assess the genetic relatedness of the isolates. Codon 463 of the katG gene and codon 95 of the gyrA gene were PCR amplified using the BD Advantage 2 PCR kit (BD Biosciences Clontech, Palo Alto, CA). The primers used to amplify the portion of the katG gene were katGF (5'-AGC CGC CTT TGC TGC TTT CTC TA-3') and katGR (5'-TGC TGG CCA CTG ACC TCT CGC T-3'), located 547 bp and 1,094 bp upstream of the end of the katG gene sequence, respectively. The primers used to amplify the portion of the gyrA gene were gyrAF (5'-AAC CGG TTG ACA TCG AGC AGG AGA-3') and gyrAR (5'-ATT TCC CTC AGC ATC TCC ATC-3'), located 47 bp and 434 bp downstream of the beginning of the gyrA gene sequence, respectively. Each standard 50-µl reaction mixture consisted of 5 µl of 10x reaction buffer, 2 µl of each primer (20 pmol each), 1 µl of a 50x deoxyribonucleoside triphosphate mix, 1 µl of 50x BD Advantage 2 polymerase mix, 4 µl DNA solution containing 40 ng DNA template, and 35 µl PCR-grade water. The thermocycling program used was 1 cycle at 94°C for 1 min; 26 cycles of 94°C for 30 seconds, 68°C for 30 seconds, and 72°C for 1.5 min; and a final cycle at 72°C for 10 min. The PCR products were purified using a QIAquick PCR purification kit according to the manufacturer's instructions (QIAGEN Inc., Valencia, CA) and sequenced using the same primers used for PCR amplification.
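The grouping rule can be written as a small lookup. The sketch below is illustrative only: the codon alleles are the ones generally associated with the scheme of Sreevatsan et al. (15) and are not stated in this section, so they are an assumption here, and the function and variable names are hypothetical.

```python
# Assumed codon alleles for the three principal genetic groups
# (not given in this section; taken as an assumption from the cited scheme).
GROUP_BY_CODONS = {
    ("CTG", "ACC"): 1,  # katG codon 463 Leu, gyrA codon 95 Thr
    ("CGG", "ACC"): 2,  # katG codon 463 Arg, gyrA codon 95 Thr
    ("CGG", "AGC"): 3,  # katG codon 463 Arg, gyrA codon 95 Ser
}


def principal_genetic_group(katg_codon_463, gyra_codon_95):
    """Return 1, 2, or 3, or None if the codon pair is not in the table."""
    key = (katg_codon_463.upper(), gyra_codon_95.upper())
    return GROUP_BY_CODONS.get(key)


# Example with hypothetical sequencing calls for one isolate.
example_group = principal_genetic_group("cgg", "ACC")
```

Returning `None` for an unexpected codon pair keeps unusual isolates visible instead of forcing them into a group.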
PCR of the PE_PGRS33 gene. The PE_PGRS33 gene was PCR amplified for DNA sequencing using the BD Advantage-GC 2 PCR kit (BD Biosciences Clontech, Palo Alto, CA). We selected primers specific for amplification of the PE_PGRS33 gene using the BLAST program of the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/BLAST). The primers were PE_PGRS33F (5'-CTA CGG TAA CCC GTT CAT CCC-3'), located at the end of the PE_PGRS33 gene sequence, and PE_PGRS33R (5'-GCG CCC GCC GAA GTG TAA G-3'), located 152 bp upstream of the beginning of the PE_PGRS33 gene sequence. The inclusion of the 152-bp flanking region, positions 2062675 through 2062826 of the H37Rv complete genome sequence (NC_000962), allowed for further confirmation during sequence analysis that the sequences obtained were specific for the PE_PGRS33 gene. M. tuberculosis H37Rv was used as a positive control, and PCR-grade water was used as a negative control. Each standard 50-µl reaction mixture consisted of 10 µl of 5x reaction buffer, 5 µl of GC melt, 2 µl of each primer (20 pmol each), 1 µl of a 50x deoxyribonucleoside triphosphate mix, 1 µl of 50x BD Advantage 2 polymerase mix, 6 µl DNA solution containing 60 ng DNA template, and 23 µl PCR-grade water. The thermocycling program used was 1 cycle at 94°C for 1 min; 30 cycles of 94°C for 30 seconds, 64°C for 30 seconds, and 72°C for 2.5 min; and a final cycle at 72°C for 10 min. All PCR amplification was performed using a 96-well Perkin-Elmer thermocycler (P-E 960; Applied Biosystems, Foster City, CA). PCR products were examined by 0.8% (wt/vol) agarose gel electrophoresis performed with 1x Tris-borate-EDTA buffer.
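As a quick arithmetic check, the component volumes of the two reaction mixtures described above can be summed. This sketch assumes that each of the two primers is added as its own 2-µl aliquot, which is the reading under which both mixtures total 50 µl; the dictionary and function names are hypothetical.

```python
# Component volumes in microliters, transcribed from the two reaction mixtures.
# Assumption: forward and reverse primers are each added in a separate 2-µl aliquot.
KATG_GYRA_MIX_UL = {
    "10x reaction buffer": 5,
    "forward primer (20 pmol)": 2,
    "reverse primer (20 pmol)": 2,
    "50x dNTP mix": 1,
    "50x polymerase mix": 1,
    "DNA solution (40 ng template)": 4,
    "PCR-grade water": 35,
}

PE_PGRS33_MIX_UL = {
    "5x reaction buffer": 10,
    "GC melt": 5,
    "forward primer (20 pmol)": 2,
    "reverse primer (20 pmol)": 2,
    "50x dNTP mix": 1,
    "50x polymerase mix": 1,
    "DNA solution (60 ng template)": 6,
    "PCR-grade water": 23,
}


def total_volume_ul(mix):
    """Sum the component volumes of one reaction mixture."""
    return sum(mix.values())


katg_gyra_total = total_volume_ul(KATG_GYRA_MIX_UL)
pe_pgrs33_total = total_volume_ul(PE_PGRS33_MIX_UL)
```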
Automated DNA sequencing. PCR products were sequenced to identify any insertions, deletions, or SNPs in the PE_PGRS33 gene sequence. The PCR products used for DNA sequencing were purified using a QIAquick PCR purification kit according to the manufacturer's instructions (QIAGEN Inc., Valencia, CA). DNA sequencing was first performed using the PE_PGRS33F primer and the PE_PGRS33R primer that were used for the PCR. After the completion of the first round of sequence analysis, a second round of sequencing was performed using the PGRS0660R primer (5'-CGG CGG AGA CGG CGG GTT GTT-3'), located 739 bp upstream of the end of the PE_PGRS33 gene sequence, to sequence the end of the PE_PGRS33 gene and also to confirm the SNPs found during the first round of sequencing. The primers PGRS0778F (5'-CAC CAA TAC CGC CCA CCC CAC CAC-3') and PGRS0778R (5'-GTG GTG GGG TGG GCG GTA TTG GTG-3'), located 857 bp upstream of the end of the PE_PGRS33 gene, were also used to confirm SNPs. All SNPs were confirmed by double-strand sequencing. Sequencing was performed in Applied Biosystems DNA sequencers (models 3700 and 3730 sequencers) at the Sequencing Core of the University of Michigan. The PE_PGRS33 gene sequences were compared to that of the M. tuberculosis reference strain H37Rv using the BLAST program of the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/BLAST).
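The PGRS0778F and PGRS0778R primers are listed at the same position, which is consistent with their being reverse complements of each other and so priming the two strands at one site. A minimal check, using only the two primer sequences given above (the helper name is hypothetical):

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Primer sequences from the text, written 5' to 3' with spaces removed.
PGRS0778F = "CACCAATACCGCCCACCCCACCAC"
PGRS0778R = "GTGGTGGGGTGGGCGGTATTGGTG"

primers_are_complementary = reverse_complement(PGRS0778F) == PGRS0778R
```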
Data analysis. A dendrogram illustrating the relationships between the 23 different PE_PGRS33 alleles found among the 123 isolates was constructed using MEGA version 3.0 (10). The dendrogram was generated using the neighbor-joining method, and the distance was calculated using the number of sequence variations found in each of the 23 PE_PGRS33 alleles. Deletions, insertions, and SNPs were included in the analysis, and each was scored as one change in sequence.
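One way to read this scoring is as a pairwise distance equal to the number of sequence variations present in one allele but not the other. The sketch below builds such a distance matrix under that assumption; it does not reproduce the neighbor-joining step performed in MEGA, and the allele names and variation combinations are hypothetical examples rather than entries from Table 2.

```python
from itertools import combinations

# Hypothetical alleles, each described by the set of sequence variations it carries.
# The reference allele (identical to H37Rv) carries none.
ALLELES = {
    "reference": frozenset(),
    "allele_1": frozenset({"A", "N"}),
    "allele_2": frozenset({"A", "N", "D"}),
    "allele_3": frozenset({"Y"}),
}


def variation_distance(a, b):
    """Count variations found in exactly one of the two alleles.

    Each insertion, deletion, or SNP is scored as one change.
    """
    return len(a ^ b)


def distance_matrix(alleles):
    """Return a symmetric dict keyed by (name, name) with pairwise distances."""
    names = sorted(alleles)
    dist = {(name, name): 0 for name in names}
    for x, y in combinations(names, 2):
        d = variation_distance(alleles[x], alleles[y])
        dist[(x, y)] = d
        dist[(y, x)] = d
    return dist


matrix = distance_matrix(ALLELES)
```

A matrix of this form is the kind of input a neighbor-joining implementation expects.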
| RESULTS |
|---|
Genotypic grouping based on the SNPs found in the katG and gyrA genes, as described by Sreevatsan et al. (15), placed 16 (13.0%), 79 (64.2%), and 28 (22.8%) of the 123 isolates into principal genetic groups 1, 2, and 3, respectively. Groups 1 and 2 contained both low- and high-copy-number isolates; in contrast, the group 3 isolates were all high-copy-number isolates.
Genetic diversity of the PE_PGRS33 gene. Relative to the sequence of H37Rv, 84 (68.3%) of the 123 isolates had at least one sequence variation in the PE_PGRS33 gene, and these 84 isolates included both high-copy-number and low-copy-number strains. The 39 (31.7%) isolates that did not have any sequence variations were all high-copy-number strains, harboring more than five copies of IS6110. A total of 25 different sequence variations, designated A through Y, were observed. The observed sequence variations included three insertions, nine deletions, one insertion-and-deletion event, and 12 SNPs (Table 1).
SNPs in the PE_PGRS33 gene. Of the 12 SNPs observed, 9 occurred in the PGRS domain and 3 in the PE domain (Fig. 1). Of the nine SNPs found in the PGRS domain, four were synonymous and five were nonsynonymous. Three of the nine SNPs in the PGRS domain were located within a glycine-alanine repeat unit, defined as Gly-Gly-Ala-Gly-Gly (3). Of these three, one was synonymous, and the other two would result in a loss of a repeat and the gain of a repeat unit, respectively. Of the three SNPs that were found in the PE domain (sequence variations Q, R, and S), two were synonymous and one was nonsynonymous. Two of the SNPs found in the PE domain (sequence variations R and S) were in the same isolate in consecutive positions (Table 1).
Combinations of sequence variations among the 123 isolates. Thirty-nine (31.7%) of the 123 isolates had PE_PGRS33 gene sequences identical to that of H37Rv. Among the remaining 84 (68.3%) isolates, 22 different combinations of sequence variations were observed. Sixty-eight (81.0%) of the 84 isolates had more than one sequence variation (Fig. 2 and Table 2).
Relationships among the PE_PGRS33 sequence variations. Based on the analysis of SNPs in the katG and gyrA genes, M. tuberculosis strains can be placed in three principal genetic groups. Principal genetic group 1 is ancestral to groups 2 and 3, and principal genetic group 2 is ancestral to group 3 (15). A dendrogram illustrating the relationships among the 23 different alleles of the PE_PGRS33 gene shows a clear separation of principal genetic group 1 from group 3 but no separation between groups 1 and 2 or between groups 2 and 3 (Fig. 2).
Based on two SNPs, one in the katG gene and one in the gyrA gene, the M. tuberculosis 210 strain (a widespread member of the Beijing family), CDC1551, and H37Rv belong to principal genetic groups 1, 2, and 3, respectively. The 16 group 1 isolates all had either sequence variation A or N, and 14 (87.5%) of these 16 isolates had both sequence variations A and N, which are found in both the 210 strain and CDC1551. Among the 79 group 2 isolates, 50 (63.3%) shared the A and N sequence variations with the 210 strain and CDC1551, and 20 (25.3%) had PE_PGRS33 gene sequences identical to that of H37Rv. In addition, four (5.1%) of the group 2 isolates had PE_PGRS33 gene sequences identical to that of CDC1551. Of the 28 group 3 isolates, 19 (67.9%) had PE_PGRS33 gene sequences identical to that of H37Rv, and none had either the A or N sequence variations found in the 210 strain and CDC1551 (Fig. 2).
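The proportions reported for the three principal genetic groups can be recomputed directly from the counts given in the text. The sketch below does only that; the helper and variable names are hypothetical.

```python
# Isolate counts per principal genetic group, as reported in the text.
GROUP_SIZES = {1: 16, 2: 79, 3: 28}
TOTAL_ISOLATES = sum(GROUP_SIZES.values())


def percent(count, total):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * count / total, 1)


# Share of all isolates falling in each group.
group_share = {g: percent(n, TOTAL_ISOLATES) for g, n in GROUP_SIZES.items()}

# Within-group proportions quoted in the paragraph above.
group1_with_a_and_n = percent(14, GROUP_SIZES[1])
group2_with_a_and_n = percent(50, GROUP_SIZES[2])
group2_like_h37rv = percent(20, GROUP_SIZES[2])
group2_like_cdc1551 = percent(4, GROUP_SIZES[2])
group3_like_h37rv = percent(19, GROUP_SIZES[3])
```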
Effects of sequence variations on the resulting PE_PGRS33 amino acid sequence. The combination of sequence variations in each isolate was analyzed to examine the overall effect on the PE_PGRS33 gene product (Table 2). Some of the sequence variations occurring with sequence variation D (sequence variations T, P, and A) would not have any effect on the resulting amino acid sequence in these isolates because these sequence variations are downstream of sequence variation D, which results in a premature stop codon. Sequence variations T and P were found exclusively in isolates that also had sequence variation D, and, therefore, these sequence variations would not have any effect on the PE_PGRS33 amino acid sequence in any of the 123 isolates. Eighty-three (98.8%) of the 84 isolates having a sequence variation in the PE_PGRS33 gene would have a resulting PE_PGRS33 amino acid sequence different from that of H37Rv. The isolate containing sequence variation Y was the only isolate that had a sequence variation in the PE_PGRS33 gene but would have a resulting PE_PGRS33 amino acid sequence identical to that of H37Rv.
| DISCUSSION |
|---|
Based on the analysis of SNPs in the katG and gyrA genes, M. tuberculosis strains can be placed in three principal genetic groups. It is proposed that principal genetic group 1, containing the M. tuberculosis 210 strain, is ancestral to groups 2 and 3 and that principal genetic group 2, containing M. tuberculosis CDC1551, is ancestral to group 3, which contains M. tuberculosis H37Rv (15). The dendrogram in Fig. 2 identifies two major branches of the PE_PGRS33 alleles. Genetic group 1 isolates (with one exception) and group 3 isolates each fall exclusively into one of these two allele branches, while group 2 isolates fall into both branches. The emerging genetic group 3 isolates are associated with new PE_PGRS33 alleles that are distant from alleles found in genetic group 1 isolates but still close to alleles found in genetic group 2 isolates. This suggests that genetic group 1 isolates and genetic group 3 isolates are evolutionarily linked through genetic group 2 isolates. The analysis of the genetic relationships among the different PE_PGRS33 alleles lends further support to the three principal genetic groups proposed by Sreevatsan et al. (15), because the analysis is based on all the sequence variations present within the PE_PGRS33 gene among clinical M. tuberculosis isolates, as opposed to previously determined markers based on comparison of the two sequenced genomes.
The positions of the sequence variations found within the PE_PGRS33 gene among the 123 clinical isolates could be informative about the importance of certain regions of the protein. The carboxy-terminal end of the PGRS domain and the transmembrane domain were highly conserved, suggesting that these regions may have an important function. However, although no sequence variations were found within the 258 bp that encode the last 86 amino acids of the PGRS domain, sequence variation D introduces a frameshift and a premature stop codon that would result in the loss of the last 124 amino acids. Sequence variation D was observed in 16 (13.0%) of the 123 isolates; however, the frequency of this sequence variation in the population of M. tuberculosis has not been determined. Future studies that investigate the pathogenicity of isolates with this truncated PGRS domain in comparison to that of isolates without this sequence variation may be informative about the role of the different regions of PE_PGRS33 in pathogen-host interactions.
Both slipped-strand mispairing during replication and homologous recombination of repetitive sequences could potentially account for the high frequency of insertions and deletions found in the PGRS domain of the PE_PGRS33 gene (1, 13). With the exception of the frameshift mutation, the insertions and deletions found within the PGRS domain in this study did not change the reading frame. This suggests that PE_PGRS33 may be important to the survival of M. tuberculosis and that there is a selective pressure to maintain the reading frame of this gene. It is possible that genetic variation of the PE_PGRS33 gene is advantageous to the tubercle bacilli because it aids in escape from the host immune system, but changes that alter the reading frame are selected against because PE_PGRS33 has other essential functions, such as facilitating interactions with and surviving within macrophages.
Future studies are needed to investigate the effect of these PE_PGRS33 alleles on the interaction of the tubercle bacillus with macrophages and evasion of the host immune system. While transposon mutagenesis of the PE_PGRS33 gene results in a decreased ability of M. bovis BCG to enter into or survive within macrophages (4), it is not clear what effect sequence variations in the PE_PGRS33 gene found among M. tuberculosis clinical isolates have on interactions with and survival within the host. Some of the PE_PGRS33 gene sequence variations found among clinical isolates could potentially account for some of the differences among strains in their ability to evade the host immune system and can serve as a basis for future investigations of the differences in function of M. tuberculosis PE_PGRS33 among clinical isolates. Studies of the associations between the genetic polymorphisms of the PE_PGRS33 gene and the clinical phenotypes of the isolates will also generate information useful for the development of new vaccines and diagnostic and therapeutic agents.
| ACKNOWLEDGMENTS |
|---|
We thank Dong Yang for her assistance in DNA preparation and M. tuberculosis culturing.
| FOOTNOTES |
|---|
| REFERENCES |
|---|