Journal of Clinical Microbiology, January 2007, p. 39-46, Vol. 45, No. 1
0095-1137/07/$08.00+0 doi:10.1128/JCM.02483-05
Copyright © 2007, American Society for Microbiology. All Rights Reserved.
Division of Infectious Disease, Department of Medicine and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, New Jersey Medical School, University of Medicine and Dentistry of New Jersey, Newark, New Jersey,1 National Food Safety and Toxicology Center, Michigan State University, East Lansing, Michigan,2 The Institute for Genomic Research, Rockville, Maryland3
Received 30 November 2005/ Returned for modification 26 January 2006/ Accepted 10 October 2006
Other studies have suggested that LSPs do have a critical role in M. tuberculosis pathogenesis. Clinical M. tuberculosis LSPs had low consistency indices when incorporated into a phylogeny composed of single-nucleotide polymorphisms (SNPs), LSPs, and clinical parameters (16). Furthermore, investigations of clinical strains have detected three apparent genomic "hot spots" for insertion of IS6110 and associated chromosomal deletions (13, 14, 24, 30). Genomic analysis indicates that LSPs almost always include segments of open reading frames (16, 23), although this may be due to a paucity of noncoding regions in the M. tuberculosis genome (7). Finally, Yang et al. (33) recently showed that clinical M. tuberculosis isolates with deletions in the plcD gene (one of the known deletion hot spots) are indeed phenotypically different, exhibiting a twofold increased risk of causing extrapulmonary tuberculosis. Taken together, these results indicate that some LSPs have evolved repeatedly in the radiation of M. tuberculosis and suggest that LSP-associated indels provide a selective advantage to certain M. tuberculosis strains.
Unfortunately, the rates of indels underlying M. tuberculosis LSPs cannot be conveniently measured in the laboratory. This makes it difficult to differentiate experimentally between mutations that are UEPs and mutations that have a tendency to occur repeatedly. Phylogenetic analysis makes an alternative approach available. Clinical strains containing a particular indel can be mapped onto a phylogenetic tree and then examined to determine whether or not they can be traced to a single ancestral event. Indels that have arisen independently multiple times in the population may have significant biological roles. Mutations that appear to have a single origin are more likely to represent UEPs that are evolutionarily neutral (2). A variation of this approach was undertaken in prior LSP studies (16, 23). However, these previous studies also used the LSPs themselves as markers in the phylogenetic tree construction. Furthermore, the largest of these studies defined distinct LSPs quite strictly, choosing to analyze LSPs separately if they had different deletion sites, even if the deletions mapped to identical or overlapping genes. In investigating the biology of LSPs, we propose that it is more important to categorize LSPs according to the gene or genes that are deleted (or inserted) rather than by the exact location of the indel site. This is because the effect of an LSP on microbial phenotype is more likely to be due to the genes that are disrupted or otherwise affected by the LSP rather than the exact indel sites where the LSP occurred. Therefore, we have favored a less restricted definition of LSP that is based on the presence or absence of a gene region rather than on the presence or absence of a specific deletion.
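The tree-mapping logic described above can be illustrated with a small parsimony count. This is a minimal sketch, not the procedure used in the study: it assumes a rooted binary tree and applies Fitch parsimony (a technique swapped in here for illustration) to count the minimum number of presence/absence changes needed to explain where a gene region is missing. The tree shape, strain names, and presence/absence data are all hypothetical.

```python
# Toy illustration only: the tree, strain names, and LSP data are hypothetical.

def fitch_changes(tree, has_region):
    """Return (possible_states, min_changes) for a rooted binary tree.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    has_region: dict mapping leaf name -> True/False (gene region present).
    """
    if isinstance(tree, str):
        return {has_region[tree]}, 0
    left_states, left_changes = fitch_changes(tree[0], has_region)
    right_states, right_changes = fitch_changes(tree[1], has_region)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no change needed at this node.
        return shared, left_changes + right_changes
    # Children disagree: one state change must have occurred on this split.
    return left_states | right_states, left_changes + right_changes + 1


TREE = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))

# Region missing only in the sister strains a and b: one ancestral event suffices.
SINGLE_ORIGIN = {s: s not in ("a", "b") for s in "abcdefgh"}
# Region missing in the distantly related strains a and e: at least two events.
RECURRENT = {s: s not in ("a", "e") for s in "abcdefgh"}

print(fitch_changes(TREE, SINGLE_ORIGIN)[1])
print(fitch_changes(TREE, RECURRENT)[1])
```

A count of one is compatible with a UEP, whereas a count of two or more indicates that the region changed state repeatedly. The count is a lower bound and does not by itself distinguish deletions from insertions.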
In this report, we present a phylogenetic analysis of gene deletions found in M. tuberculosis LSPs, using an "unequivocal" phylogenetic tree constructed with synonymous SNP markers. We present phylogenetic evidence that many of the gene regions contained within LSPs have been deleted (or possibly inserted) multiple times as separate events in the history of M. tuberculosis divergence, and we identify several possible mechanisms for these genomic changes. Our results suggest that LSPs represent an important mechanism of genetic variation in M. tuberculosis and indicate that further investigations into the functional relevance of LSPs may provide insights into M. tuberculosis pathogenesis and immunity. LSPs, as defined in our study, that recur independently with high frequency may be precluded as phylogenetic markers.
LSP identification. Eighty-six LSPs larger than 10 base pairs were identified by comparing the genomes of M. tuberculosis strains CDC1551 and H37Rv in a previous investigation (16). Seventeen LSPs were further studied: LSPs 1 through 12 were selected from sequences that were present in CDC1551 but absent from H37Rv; LSPs 13 to 17 were selected from sequences that were present in H37Rv but absent from CDC1551. DNA probes were then prepared for one gene in each LSP by PCR (16). We limited our study to 17 LSPs because of the technical complexity of studying each LSP in large numbers of M. tuberculosis samples. The coordinates for each probe and primer were described in this previous work. Approximately 2 µg of genomic DNA from the clinical M. tuberculosis isolates or CDC1551 and H37Rv was suspended in 2x SSC (1x SSC is 0.15 M NaCl plus 0.015 M sodium citrate) at a final volume of 200 µl. Each sample was boiled for 5 min and then cooled on ice. A multislot hybridization apparatus (Immunoblotter; Immunetics, MA) was assembled as per the manufacturer's recommendations with the modification that the cushion was replaced with five pieces of dry 3-mm Whatman paper underneath one piece of 1-mm Whatman paper soaked in 2x SSC. A prewetted Biotrans Plus nylon membrane (ICN Pharmaceuticals, CA) was placed on top of the thin Whatman paper. The apparatus was assembled, and the cooled genomic DNA was bound in longitudinal strips onto the membrane by rapidly loading the DNA mixture into the apparatus. Bubbles were avoided inside the apparatus by loading a slight excess volume of DNA solution. The apparatus was then disassembled, and the membrane was removed, rinsed in 2x SSC, and then cross-linked with UV light. For identification of the LSPs present in each DNA sample, the membrane was prehybridized for 1 h in Rapid Hyb buffer (Amersham, CT) at 69°C in a hybridization oven.
The still-wet membrane was then reinserted into the multislot hybridization apparatus at 90° from its previous orientation, using the manufacturer's cushion instead of Whatman paper (described above) to seal the apparatus. Each slot was then loaded with approximately 200 µl of boiled and then rapidly ice-cooled hybridization buffer containing 32P-labeled probes for the 17 LSPs. The openings of the apparatus were sealed with Parafilm, and the apparatus was incubated at 69°C with occasional gentle rocking for 2 h. The Parafilm was carefully removed, unhybridized probe was sucked out of each hybridization well using a vacuum attached to the wash device supplied by the manufacturer, and each slot was washed (again using the vacuum wash device) with 2x SSC. The apparatus was then disassembled; the membrane was washed one more time in 2x SSC and three times in 0.1x SSC at 69°C and then exposed on film. Using this protocol, 44 different genomic DNA samples could be slotted in an array consisting of 44 lines extending across the membrane. Hybridizing of probes for each LSP at a 90° angle to this array permitted every probe to come into contact with every genomic DNA sample. The presence of a particular LSP in a DNA sample was determined by examining the developed autoradiogram for dark spots. An example of an LSP blot has been shown previously (16).
SNP identification. We had previously identified six SNP markers that were sufficient to classify a global M. tuberculosis collection into seven phylogenetically distinct "SNP cluster groups" (SCGs) (15). For the current study, we selected a different set of nine SNP markers that enabled us to further subdivide the SCGs into subgroups (SC subgroups), for a total of seven SCGs and five SC subgroups (Table 1). All of the study samples were then tested at the nine SNP loci by using hairpin primer assays as described previously (22) (Table 2), and the alleles were determined.
TABLE 1. SNP set used to assign the SCGs and SC subgroups
TABLE 2. Genome locations and hairpin assay primers used for the nine-SNP set
FIG. 1. Phylogeny of the M. tuberculosis study isolates. M. tuberculosis isolates were assigned to each SCG or SC subgroup based on SNP alleles at nine loci. The SCG and SC subgroup designations had been defined in a previous work (15). The number of study strains and the number of clinical isolates, as defined by identical RFLP patterns, are shown for each location on the tree. The locations of the three M. tuberculosis reference strains (H37Rv, CDC1551, and 210) and one M. bovis strain (M. bovis AF 2122/97) with sequenced genomes are also shown.
We selected 17 M. tuberculosis LSPs from a larger set of previously identified LSPs (16) to study their distribution on the strain phylogeny (Table 3). The distribution of these LSPs has not been previously examined in a set of phylogenetically characterized clinical strains. Three LSPs (LSPs 10, 11, and 13) were located near two IS1547 elements, which are known to be "hot spots" for IS6110 insertions (17). Each M. tuberculosis isolate was examined for the presence or absence of each of the 17 LSPs by probing for an internal DNA sequence. All of the LSPs were then mapped onto the phylogenetic tree. We found that the majority of LSPs did not appear to be UEPs. Unlike the distribution of the selectively neutral SNPs shown in a previous report (2), only four of the 17 LSPs studied (LSPs 1, 9, 13, and 16) (Fig. 2A) were situated on the phylogenetic tree such that their presence could be explained by a single event in a common ancestor. We have called these LSPs group A LSPs in subsequent discussions. Two other LSPs (LSPs 12 and 14) appeared to have occurred independently at least two times (Fig. 2B). We have called these LSPs group B LSPs. The remaining 11 LSPs (LSPs 2, 3, 4, 5, 6, 7, 8, 10, 11, 15, and 17) were situated on the phylogenetic tree such that they could not have arisen from a single common ancestor and must have arisen independently multiple times (Fig. 3). We have called these LSPs group C LSPs.
TABLE 3. LSP groups and their attributes
FIG. 2. Distribution of group A and group B LSPs on the SNP tree. M. tuberculosis strains containing each designated LSP are indicated next to each tree branch. Numbers refer to the total number of strains with the indicated LSP/total number of isolates with the indicated LSP. Thick lines are used to indicate the phylogenetic location of a hypothetical common ancestor in which the LSP first occurred and its progeny. (A) All group A LSPs in the study. (B) All group B LSPs in the study. The locations of the SCG and SC subgroups of these trees as well as the total numbers of strains and isolates present in each SCG and SC subgroup can be found in Fig. 1.
FIG. 3. Distribution of group C LSPs on the SNP tree. M. tuberculosis strains containing group C LSPs in this study are shown. Numbers refer to the total number of strains with the indicated LSP/total number of isolates with the indicated LSP. Thick lines are used to indicate the phylogenetic location of a hypothetical common ancestor in which the LSP first occurred and its progeny. The locations of the SCG and SC subgroups of these trees as well as the total numbers of strains and isolates present in each SCG and SC subgroup can be found in Fig. 1.
The LSPs associated with IS6110 in the reference strains did not occur at a higher frequency than other LSPs. The phylogenetic analysis of each LSP (Fig. 2 and 3) suggested that there were 65 independent LSP events in the 165 M. tuberculosis isolates (Table 3) (this population contained many more LSPs, but a group of phylogenetically related isolates with the same LSP was considered to constitute one LSP event). Approximately one-third (6/17) of the LSPs studied were associated with IS6110, and these LSPs accounted for 27/65 (42%) of the independent LSP events in the population. The approximately two-thirds (11/17) of the LSPs studied that were not associated with IS6110 in the reference strains accounted for at least 38/65 (58%) of the independent LSP events, and the frequencies in the two groups did not differ significantly.
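The comparison above can be reproduced as simple arithmetic on the reported counts. The sketch below computes events per LSP for each group and then applies an exact two-sided binomial test. The statistical test used in the study is not stated in the text, so the choice of test and the null hypothesis (each independent event falls in an IS6110-associated LSP with probability 6/17) are assumptions made here for illustration only.

```python
from math import comb

# Counts taken from the text above.
IS6110_LSPS, OTHER_LSPS = 6, 11          # LSPs studied in each group
IS6110_EVENTS, OTHER_EVENTS = 27, 38     # independent LSP events in each group

total_lsps = IS6110_LSPS + OTHER_LSPS
total_events = IS6110_EVENTS + OTHER_EVENTS

events_per_is6110_lsp = IS6110_EVENTS / IS6110_LSPS
events_per_other_lsp = OTHER_EVENTS / OTHER_LSPS


def binomial_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    return sum(x for x in pmf if x <= pmf[k] * (1 + 1e-9))


# Assumed null: events are spread across LSPs in proportion to LSP counts.
p_null = IS6110_LSPS / total_lsps
p_value = binomial_two_sided_p(IS6110_EVENTS, total_events, p_null)

print(f"IS6110-associated: {events_per_is6110_lsp:.2f} events per LSP")
print(f"Not IS6110-associated: {events_per_other_lsp:.2f} events per LSP")
print(f"Exact binomial p-value under the assumed null: {p_value:.3f}")
```

The binomial model treats the 65 events as independent trials, which is a simplification; it is offered only as one way to check that the two per-LSP rates are of similar magnitude.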
Twelve of the 17 LSPs in this study represent sequences that are absent in H37Rv but present in CDC1551 (although LSP 6 appears to be present in some H37Rv isolates and must therefore have been deleted recently in a subset of H37Rv isolates in experimental use [16]). Each of these LSPs was also found to be missing in at least one clinical isolate, demonstrating that the H37Rv LSPs did not include unique deletion events that might have occurred as a consequence of a prolonged in vitro culture.
Confirmation of LSP identification and variability within IS6110-defined clusters. It was important to ensure that the results of this study were not due to artifacts of the LSP identification process. Inconsistencies in detecting LSPs could make it falsely appear as if LSPs were occurring repeatedly as independent events. Repeated probing of the same strain gave identical LSP results, suggesting that the LSP identification process was sound. We also examined strains that were identical by IS6110 RFLP analysis to determine if these closely related strains contained the same LSPs. We found only six instances, in 17 clusters involving 66 isolates, where two isolates within a cluster did not have exactly the same LSP pattern. In each of these cases, only 1 of the 17 LSPs was discordant between the isolates. Furthermore, all six of the mismatched LSPs were group C LSPs (four were LSP 6, one was LSP 2, and one was LSP 10). These results suggest that the small variation in LSP patterns that we observed within isolates of a cluster is due to the propensity of M. tuberculosis to develop independent deletions in these regions. The exact time frame of LSP generation cannot be deduced from this study because the epidemiological connections among the clustered isolates were not well characterized in our data set. Prior reports suggest that differences in LSP patterns are not observed among RFLP-identical isolates with known epidemiological links (16). However, these results do strongly suggest that different LSPs are generated at different rates.
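The within-cluster comparison described above amounts to checking every pair of isolates in an RFLP-defined cluster for LSPs detected in one isolate but not the other. A minimal sketch follows; the isolate names and LSP patterns are hypothetical, and the function is an illustration rather than the analysis pipeline used in the study.

```python
from itertools import combinations


def discordant_lsps(cluster):
    """Return {(isolate_a, isolate_b): [LSP numbers that differ]} for one cluster.

    cluster: dict mapping isolate name -> frozenset of LSP numbers detected.
    Pairs with identical LSP patterns are omitted from the result.
    """
    result = {}
    for a, b in combinations(sorted(cluster), 2):
        # Symmetric difference: LSPs detected in exactly one of the two isolates.
        differing = sorted(cluster[a] ^ cluster[b])
        if differing:
            result[(a, b)] = differing
    return result


# Hypothetical cluster of three RFLP-identical isolates.
EXAMPLE_CLUSTER = {
    "iso1": frozenset({1, 6, 9}),
    "iso2": frozenset({1, 9}),
    "iso3": frozenset({1, 6, 9}),
}

for pair, lsps in discordant_lsps(EXAMPLE_CLUSTER).items():
    print(pair, "differ at LSP", lsps)
```

In this toy cluster only LSP 6 is discordant, which mirrors the kind of single-LSP mismatch reported above.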
Phylogenetic analysis of M. tuberculosis populations by using LSP markers. LSPs appear to be useful phylogenetic markers for studies of M. tuberculosis (20, 23, 28), especially when the specific identity of each LSP can be confirmed by sequencing the ends of each deletion (23). End sequencing makes it possible to identify which deletions within a similar genome region are, in fact, independent deletions. However, large-scale sequencing of deletion sites is not practical, and even PCR-based identification of specific deletion sites may be difficult if LSPs of similar sizes occur near the same genomic locus. We studied the ability of the LSPs identified in this study to accurately describe phylogenetic relationships among M. tuberculosis isolates. Each of the 163 clinical isolates (H37Rv and CDC1551 were not included in this analysis) was classified into one of 58 LSP types (LSP-Ts), based on the pattern of LSPs that were present (see Table S1 in the supplemental material). Each LSP-T was then located on the SNP tree, and the proximity of all of the isolates with the same LSP-T was examined. LSP-Ts that placed M. tuberculosis isolates together in a manner that was consistent with the SNP tree would be considered good phylogenetic assignments. LSP-Ts that conflicted with the SNP tree would represent inaccurate assignments. Our results showed that LSP-Ts situated most of the M. tuberculosis isolates on the same or an adjoining branch of the SNP tree (Fig. 4). However, six of the LSP-Ts incorrectly grouped isolates together that were more distantly related according to the SNP tree (Fig. 4, LSP-Ts 3, 5, 7, 9, 14, and 15). Many of the LSP-Ts contained only a single M. tuberculosis isolate. We performed a secondary analysis restricted to commonly occurring LSP-Ts by eliminating LSP-Ts that contained fewer than two isolates. This analysis reduced the study to 31 LSP-Ts and 137 isolates.
We found that 6/31 (19%) of the LSP-Ts that contained two or more isolates continued to produce important conflicts with the SNP tree. These results confirm our findings with the total study sample.
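The LSP typing and tree-consistency check described above can be sketched as a grouping step followed by a branch comparison. The example below is illustrative only: the isolate names, the three-probe patterns (the study used 17), the branch labels, and the set of branches treated as adjoining are all hypothetical.

```python
from collections import defaultdict


def lsp_types(isolates):
    """Group isolates that share an identical LSP presence/absence pattern.

    isolates: dict name -> (snp_branch_label, tuple of 0/1 flags, one per probe).
    Returns dict pattern -> list of (name, snp_branch_label).
    """
    groups = defaultdict(list)
    for name in sorted(isolates):
        branch, pattern = isolates[name]
        groups[pattern].append((name, branch))
    return dict(groups)


def conflicting_types(groups, adjoining):
    """Return patterns whose isolates sit on branches that are neither the
    same nor adjoining. adjoining: set of frozenset({branch_a, branch_b})."""
    conflicts = []
    for pattern, members in groups.items():
        branches = {branch for _, branch in members}
        consistent = all(
            a == b or frozenset((a, b)) in adjoining
            for a in branches
            for b in branches
        )
        if not consistent:
            conflicts.append(pattern)
    return conflicts


# Hypothetical data: branch labels and patterns are invented for illustration.
ISOLATES = {
    "A": ("3a", (1, 0, 1)),
    "B": ("3b", (1, 0, 1)),
    "C": ("6", (0, 1, 1)),
    "D": ("3a", (0, 1, 1)),
    "E": ("6", (1, 1, 1)),
}
ADJOINING = {frozenset(("3a", "3b"))}

groups = lsp_types(ISOLATES)
conflicts = conflicting_types(groups, ADJOINING)
# Secondary analysis: keep only types that contain two or more isolates.
common = {p: m for p, m in groups.items() if len(m) >= 2}

print(len(groups), "types;", len(common), "with two or more isolates")
print("types conflicting with the SNP tree:", conflicts)
```

Restricting the analysis to types with two or more isolates removes singletons, which can never conflict with the tree on their own.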
FIG. 4. Locations of LSP-Ts on the SNP tree. The locations of clinical M. tuberculosis strains identified by LSP-T are shown relative to the location of each SCG and SC subgroup on the SNP tree. Colored LSP-Ts and connecting lines indicate LSP-Ts that are present on multiple SNP tree branches. Tree not drawn to scale.
Group C LSPs are much more variable and appear to have been generated by at least two mechanisms. Forty-five percent of the group C LSPs were flanked by IS6110 transposable elements on at least one side of a reference strain. The presence of IS6110 in proximity to LSP regions that are not present (and likely to be deleted) in other isolates suggests that recombination between nearby IS6110 elements produced a deletion, creating the LSP. IS6110 transposition events may be advantageous, neutral, or detrimental to the bacterial cell depending on the genes involved. Yang et al. (33) have shown that plcD deletions (LSP 4, a group C LSP flanked by IS6110 in our study) do indeed affect bacterial phenotype, in this case showing a strong association with extrapulmonary tuberculosis. This work supports the hypothesis that the variation associated with group C LSPs affects bacterial phenotype (although it is unclear whether an extrapulmonary phenotype should be considered selectively advantageous); it also provides further evidence that IS6110 is a contributing force driving genetic diversity in the M. tuberculosis complex. Indeed, as IS6110 may also be present in the clinical isolates at sites where H37Rv and CDC1551 do not contain IS6110, this element may be playing an even more pivotal role. We speculate that the group C LSPs have occurred under positive selective pressure, and these deletions (LSPs) enhance transmission and other virulence features of M. tuberculosis. Day et al. have demonstrated similar events in Shigella strains, where parallel losses of the cadA locus in different lineages of Shigella were found to be pathoadaptive (8). An alternative hypothesis is that group C LSPs represent highly unstable genomic regions that are repeatedly deleted because the genes encompassed by these LSPs are nonfunctional. Under these circumstances, the repeated loss of these genes could reflect a selective advantage for loss of nonfunctional DNA. 
However, observations in other bacteria suggest that deletion of nonfunctional DNA is a progressive occurrence that begins with mutation of nonfunctional genes into pseudogenes and is only later followed by a series of deletion events (21). In M. tuberculosis, there is no evidence that any of the deleted genes have mutated to pseudogenes. One of the group B and three of the group C LSPs were in PPE genes that others have speculated may be involved in immune variation and evasion (6, 9, 12). Their recurrent deletion in different M. tuberculosis lineages is consistent with the hypothesis that these are escape mutants created by silencing these gene products during the course of infection of mammalian hosts.
Our findings do not directly contradict the work of Hirsh et al. (23), which suggested that virtually all LSPs were unique evolutionary events. First, this prior investigation excluded LSPs originating or terminating in PPE genes, whereas these LSPs were included in our study. Second, our investigation included regions that were deleted in H37Rv relative to the genome of CDC1551. Hirsh et al. examined only regions that are missing in clinical isolates relative to the genome of H37Rv. Finally, we used a hybridization-based approach to identify the presence or absence of genomic regions known to be encompassed by LSPs. In contrast, Hirsh et al. sequenced across each end of the LSP, confirming the exact deletion sites and distinguishing among similar deletion events. It is likely that a reanalysis of this previous work would demonstrate that many LSPs overlap, differing only at the specific deletion sites, and would confirm our observation that many genomic regions were likely to be deleted independently.
Other investigators have suggested that LSPs can provide an accurate genetic marker system for molecular epidemiological and evolutionary studies of M. tuberculosis (20, 28). Our results suggest that LSPs may be informative markers in situations where discrimination of strains is the main objective. However, phylogenetic inference will be complicated by the multiple origins and parallel evolution of many LSPs, which will generate incompatibilities with other phylogenetic markers such as SNP loci. The extent to which this problem can be alleviated by direct sequencing of LSP deletion sites requires further study.
In summary, this work demonstrates that LSPs are predominantly genomic deletions that result in an unexpected degree of genomic plasticity in clinical M. tuberculosis isolates. At least one-third of the plasticity in specific genomic regions appears to involve recombination between IS6110 elements in the region. The repeated evolution of some LSPs suggests that these polymorphisms are a critical source of genetic variation that is adaptive and may underlie variation in virulence among M. tuberculosis strains; however, this is difficult to test and warrants future investigational studies of pathogenicity and immunity.
Published ahead of print on 1 November 2006.
Supplemental material for this article may be found at http://jcm.asm.org/.