Journal of Clinical Microbiology, July 2004, p. 3240-3247, Vol. 42, No. 7
0095-1137/04/$08.00+0 DOI: 10.1128/JCM.42.7.3240-3247.2004
Copyright © 2004, American Society for Microbiology. All Rights Reserved.
Pathogen Evolution Group, Centre for DNA Fingerprinting and Diagnostics, Nacharam, Hyderabad 500 076¹; Central JALMA Institute for Leprosy, Tajgunj, Agra 282001²; Mahavir Hospital and Research Centre, Mahavir Marg, Hyderabad 500 004³; Blue Peter Research Centre, Lepra India, Cherlapally, Hyderabad 501 301⁴; Department of Medicine, A.I.I.M.S., New Delhi 110029⁵; Jawaharlal Nehru Centre for Advanced Scientific Research, Jakkur 560 064, India⁸; Dipartimento di Scienze Biomediche, University of Sassari, B07100 Sassari, Sardinia, Italy⁶; Johns Hopkins University School of Public Health, Baltimore, Maryland 21205⁷
Received 9 February 2004/ Returned for modification 15 March 2004/ Accepted 25 March 2004
The study of the molecular epidemiology of tubercle bacilli changed once genome sequence data became publicly available (8). Recently, DNA microarrays have been used for comparative genomics of different M. tuberculosis clones for which clinical and epidemiological information was available (4, 10). The recently described deletion patterns suggest substantial genomic variability among M. tuberculosis genotypes worldwide (13). New genotypes likely exist but go largely unnoticed because high-resolution genomic tools are lacking. Juxtaposing host diversity with the bacterial population structure could shed light on evolutionarily significant genomic events, such as eukaryotic-prokaryotic gene fusion (9). Based on the presence or absence of an M. tuberculosis-specific deletion (TbD1), a new scenario for the evolution of the M. tuberculosis complex and the origin of human tuberculosis has been proposed (7).
The typing systems currently available for molecular epidemiology (3) cannot classify strains on the basis of the whole genome, including various evolutionary changes and random base substitutions. A genome sequence-based (11) classification of the predominant circulating strains is needed, and it would also assist global efforts to control this deadly disease. These issues, together with our continued interest in the evolution of M. tuberculosis genotypic diversity (1, 2, 17), led to the present study, which aimed to revisit the global population diversity of M. tuberculosis. We analyzed the global diversity of tubercle bacilli, with an emphasis on evolutionary genomics, by whole-genome fingerprinting and genotyping of M. tuberculosis isolates from different world populations, followed by analysis of large sequence polymorphisms representing two important deletions, TbD1 and Rd9 (7). We used fluorescent amplified fragment length polymorphism (FAFLP) typing (1, 2, 11, 17), a fluorescent form of AFLP analysis (24), as a stand-alone, portable approach with a valid phylogenetic basis (12, 15, 16). Polymorphisms in band patterns were mapped to specific loci, allowing individual strains and isolates to be genotyped or differentiated on the basis of the alleles they carry. The five major clusters, representing the corresponding FAFLP patterns observed in this study, should help us understand the worldwide spread and partitioning of M. tuberculosis genotypes and the possible evolutionary background of this organism. It should also be possible to identify informative markers for the identification and epidemiology of different M. tuberculosis populations on a global scale. Such markers are likely to be among the most highly resolving tools for studying host and environmental impacts on pathogen evolution and could be developed into genotype-phenotype databases (17).
TABLE 1. Distribution of M. tuberculosis strains according to FAFLP amplitypes
Computer-assisted genotypic analysis of FAFLP data. The M. tuberculosis H37Rv genome sequence (8) was digested in silico with the chosen restriction enzymes, and the predicted fragments were grouped into size categories (in base pairs). Genotyper software (Applied Biosystems) was applied to these categories to compare the FAFLP fragment data for field isolates. Before this, the predicted fragments were used for homology searches at The Institute for Genomic Research server (http://www.tigr.org), and locus information was obtained from the TubercuList database server at the Institut Pasteur (http://genolist.pasteur.fr/TubercuList). Based on the presence or absence of monomorphic and polymorphic bands or peaks, different FAFLP profiles were identified as amplitypes. These amplitypes were color coded, tiled, and superimposed in various ways to estimate the marker size in base pairs, the fluorescence intensity (peak height), the data points on the gel, and the frequency of monomorphic (unique) bands. The bands were sized and genotyped for all of the isolates within the user-defined categories of marker size. The presence or absence of markers within each category was scored by a user-defined Genotyper macro, which produced a binary table for all of the samples as its final output. Phylogenetic trees were generated (1, 2) from the binary data to delineate divergence and relatedness among the amplitypes. All of the amplitypes were deposited in the AmpliBASE MT database (http://210.212.212.4/) (17).
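The in silico digestion and binary scoring described above can be illustrated with a minimal Python sketch. It assumes that only fragments with one EcoRI end and one MseI end are detected (as is typical of AFLP, where the labelled primer anneals to the EcoRI end), ignores overhangs, adapters, and selective bases, and uses hypothetical size bins in place of the authors' Genotyper categories and macro; all function names are illustrative.

```python
import re

# Recognition sequences and top-strand cut offsets: EcoRI G^AATTC, MseI T^TAA.
ENZYMES = {"E": ("GAATTC", 1), "M": ("TTAA", 1)}


def ecori_msei_fragments(seq: str) -> list[int]:
    """Approximate sizes (bp) of EcoRI-MseI fragments from an in silico double digest.

    Simplifications (assumptions): overhangs, adapters and selective bases are
    ignored; only fragments with one EcoRI end and one MseI end are kept.
    """
    seq = seq.upper()
    cuts = []
    for label, (site, offset) in ENZYMES.items():
        # The lookahead makes re.finditer report every match position.
        cuts += [(m.start() + offset, label) for m in re.finditer(f"(?={site})", seq)]
    cuts.sort()
    sizes = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        if {left, right} == {"E", "M"}:
            sizes.append(end - start)
    return sizes


def score_bins(sizes: list[int], bins: list[tuple[int, int]]) -> list[int]:
    """Score 1/0 for each hypothetical size bin (inclusive) that contains a fragment."""
    return [int(any(lo <= s <= hi for s in sizes)) for lo, hi in bins]


def binary_table(genomes: dict[str, str], bins: list[tuple[int, int]]) -> dict[str, list[int]]:
    """Presence/absence matrix: one binary row per isolate, one column per bin."""
    return {name: score_bins(ecori_msei_fragments(seq), bins) for name, seq in genomes.items()}


if __name__ == "__main__":
    bins = [(10, 20), (21, 40)]  # hypothetical size categories in bp
    toy = {
        "isolate_1": "CC" + "GAATTC" + "G" * 10 + "TTAA" + "CC",
        "isolate_2": "CC" + "GAATTC" + "G" * 10 + "TTCA" + "CC",  # MseI site lost
    }
    print(binary_table(toy, bins))  # {'isolate_1': [1, 0], 'isolate_2': [0, 0]}
```

In practice the bins would come from the size categories predicted for H37Rv, and the rows of the resulting table are what a tree-building step consumes.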
Analysis of LSPs by PCR-based identification of TBd1 and Rd9 deletions. Representative samples from different FAFLP clusters were selected for large sequence polymorphism (LSP) analysis by PCR-based genotyping for the presence and absence of the TbD1 and Rd9 regions (7). Two hundred isolates originating from India (n = 96), France (n = 20), Australia (n = 13), Italy (n = 20), Peru (n = 21), and The Netherlands (n = 30) were subjected to deletion typing. PCR amplifications using genomic DNA samples were carried out with flanking as well as internal primers, as described by Brosch et al. (7).
95% band sharing) and were therefore analyzed according to the in silico restricted map of strain H37Rv.
Geographic differences in predominant M. tuberculosis genotypes.
The FAFLP data were analyzed phenetically for regional and geographic affinities, and the distribution of markers among the strains was determined by computer-assisted genotyping. Thirty-six markers derived from the M. tuberculosis H37Rv chromosome were distributed nonuniformly across geographic regions and formed the basis of the phylogenetic clustering. Several additional markers with no available locus information were also amplified, particularly from Indian strains, so the genotypic analysis was extended to score these new markers as well. Representative clustering data are shown in Fig. 1 and 2, and the full set of polymorphic markers is summarized in Table 2. Type A motifs (amplitype A) predominated in strains from The Netherlands but were never seen in the other European strains, from France and Italy. A minority (at least 10%) of the Indian strains also resembled the amplitype A strains from The Netherlands. Type B fragment sets were observed for Indian strains and some European strains. Almost all of the 800 strains tested from Indian patients carried highly similar genotypes and clustered very closely, with an average genetic distance of 20%. Strains from some parts of the Indian states of Gujarat and Rajasthan showed amplitypes divergent from those of the other Indian isolates; however, this did not significantly alter the clustering pattern of type B strains. The type C amplitype represented a more diverse group of strains, mainly from Peru, Australia, Italy, and France, with an average genetic distance within the cluster as high as 50%. The Peruvian isolates formed a distinct subcluster within the type C cluster, and a few of them clustered with Vietnamese strains with IS6110 null genotypes. Canadian and Tanzanian isolates branched out as separate lineages within the type C cluster but were genetically linked to the Australian strains. The Italian strains clustering within type C all carried a single copy of the IS6110 element and were of bovine origin.
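The average genetic distances reported in this section are derived from binary FAFLP profiles. The paper does not state which coefficient was used, so this Python sketch assumes a Dice (band-sharing) distance, d = 1 - 2*n_ab/(n_a + n_b), where n_ab is the number of shared bands and n_a and n_b are the band counts of the two profiles. The profiles in the example are hypothetical.

```python
from itertools import combinations


def dice_distance(a: list[int], b: list[int]) -> float:
    """1 - Dice band-sharing coefficient between two 0/1 FAFLP profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must score the same marker bins")
    shared = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 0.0 if total == 0 else 1.0 - 2.0 * shared / total


def mean_within_cluster_distance(profiles: list[list[int]]) -> float:
    """Average pairwise Dice distance among the isolates of one cluster."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(dice_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    cluster = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 0, 1]]  # hypothetical profiles
    print(f"{mean_within_cluster_distance(cluster):.1%}")  # 13.3%
```

Other coefficients (for example Jaccard) give different absolute values, so the choice of metric should be fixed before clusters from different studies are compared.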
FIG. 1. GeneScan analysis output for five different strains that predominate in different geographic areas of the world. The FAFLP profile of each representative strain was termed an amplitype. The amplitypes were color coded and superimposed or extrapolated to compare the data points representing important loci. The horizontal scale indicates trace sizes in base pairs; the vertical scale indicates the peak heights of the amplification products.
FIG. 2. Phylogenetic clustering of M. tuberculosis isolates. A neighbor-joining tree was generated from differences in Genotyper output read as binary data. The lower scale represents the genetic distances between the isolates and/or amplitypes. MDR, multidrug resistant; TB, tuberculosis.
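Figure 2 was built by neighbor joining. The sketch below implements the standard Saitou-Nei neighbor-joining algorithm in plain Python so that the procedure can be followed step by step. It is not the software the authors used, which is not named here, and it assumes a symmetric distance matrix, for example one built from pairwise band-sharing distances between binary profiles.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Return an unrooted Newick tree built by Saitou-Nei neighbor joining."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]  # working copy
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        # Choose the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        _, i, j = min(
            ((n - 2) * d[i][j] - r[i] - r[j], i, j)
            for i in range(n)
            for j in range(i + 1, n)
        )
        # Branch lengths from the new internal node to i and j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.3f},{nodes[j]}:{lj:.3f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Resolve the final three nodes as a star.
    a, b, c = nodes
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D"]
    matrix = [  # additive toy distances, not study data
        [0, 3, 5, 3],
        [3, 0, 6, 4],
        [5, 6, 0, 4],
        [3, 4, 4, 0],
    ]
    print(neighbor_joining(taxa, matrix))
    # (C:3.000,D:1.000,(A:1.000,B:2.000):1.000);
```

With additive distances, as in the toy matrix, neighbor joining recovers the true branch lengths; with real band-sharing data some branch lengths can come out negative, and many programs set those to zero.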
TABLE 2. Details of polymorphic FAFLP markers used for genome-wide sampling of biogeographic differences among predominant strains from across the world
Correlation between FAFLP analysis and IS6110 typing patterns. Many of the strains of Indian origin had low IS6110 copy numbers, and about 10% carried only a single copy of IS6110. About 80% of the Indian isolates were from North India, mainly Delhi, Chandigarh, and Agra; the remaining 20% were from Western India, mainly Jaipur and Ahmedabad. The division of the Indian cluster (type B) into distinct subclusters of low- and high-copy-number IS6110 strains was particularly striking, and it was based on specific genomic signatures carried by the high- and low-copy-number isolates. IS6110 profiles therefore correlated closely with the overall clustering in the case of the Indian cluster. The microbiological, biochemical, and molecular genetic characteristics of many of the isolates from this cluster have already been published (19, 20).
FAFLP analysis also subdivided the large type C cluster into several subclusters. Many of the strains of this type had one to four copies of the IS6110 element. The branching out of the Canadian, Tanzanian, and Vietnamese strains was particularly notable, as these strains carried one copy or no copies of IS6110. Interestingly, strains of Italian origin carried only a single IS6110 copy, yet 15 of the 20 strains analyzed clustered with French strains that carried 8 to 14 copies of the element. Similarly, most of the strains isolated in The Netherlands had multicopy IS6110 profiles, and these clustered together. This indicates that clustering was driven mainly by the pattern of chromosomal markers with a strong geographic bias, and that IS6110 copy number had little influence on regional clustering. This observation is significant for the molecular epidemiology of M. tuberculosis as it relates to IS6110-based typing methods.
LSPs in different FAFLP clusters. LSP analysis using primers specific for the flanking and internal regions of TbD1 and Rd9 was performed with representative samples from India (amplitype B), Italy (amplitype E), Australia (amplitype C), Peru (amplitype C), and France (amplitype E). Both TbD1 and Rd9 were present in 13 of the 96 strains from India (13.54%) and were likewise intact in 4 (20%) of the 20 French strains analyzed. Despite this similarity in the intactness of evolutionarily significant genomic landmarks (Fig. 3), the French and Indian strains clustered separately, each with geographically closer strains. Four of the 13 (30.77%) human-derived strains of the M. tuberculosis complex from Australia lacked the TbD1 and Rd9 regions. The TbD1 region was also absent from 3 of the 20 (15%) Italian M. bovis strains; these strains, however, did not cluster with the Australian strains. All of the Peruvian and Dutch isolates tested had normal TbD1 and Rd9 profiles, yet these isolates clustered separately by FAFLP profiling. The TbD1 and Rd9 analysis therefore indicates that the repertoire of genomic rearrangements scanned by FAFLP is far more specific than LSPs. LSPs appear to occur rather randomly in some (15 to 30%) strains across continents, irrespective of host status and geographic location.
FIG. 3. TbD1 LSPs in representative Indian and French isolates. PCR products of 500 bp indicate deletion of the TbD1 region from the genome, whereas products of 2,500 bp indicate that the region is intact.
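The product sizes in Fig. 3 suggest a simple decision rule for the flanking-primer PCR, which can be cross-checked against the internal-primer PCR, since an internal product is expected only when the region is present. The Python sketch below encodes that rule. The ±100-bp tolerance and the "repeat PCR" outcome for discordant results are hypothetical choices, and the default sizes apply to TbD1 only, because Rd9 product sizes are not given here.

```python
from typing import Optional


def classify_flanking(product_bp: Optional[int], deleted_bp: int = 500,
                      intact_bp: int = 2500, tol_bp: int = 100) -> str:
    """Interpret a flanking-primer PCR product size (TbD1 sizes from Fig. 3)."""
    if product_bp is None:
        return "no product"
    if abs(product_bp - deleted_bp) <= tol_bp:
        return "deleted"
    if abs(product_bp - intact_bp) <= tol_bp:
        return "intact"
    return "unexpected size"


def call_region(flanking_bp: Optional[int], internal_product: bool) -> str:
    """Combine flanking and internal PCR results into a single deletion call."""
    flank = classify_flanking(flanking_bp)
    if flank == "deleted" and not internal_product:
        return "deleted"
    if flank == "intact" and internal_product:
        return "intact"
    return "repeat PCR"  # hypothetical handling of discordant or failed reactions


if __name__ == "__main__":
    for flank_bp, internal in [(510, False), (2450, True), (510, True), (None, True)]:
        print(flank_bp, internal, call_region(flank_bp, internal))
```

Requiring agreement between the two primer sets guards against a failed long (2,500-bp) amplification being misread as a deletion.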
In this study, differences in band sharing as small as 1 bp could be resolved, indicating the discriminatory power of the technique. DNA sequencing of 36 of the 136 ± 1 FAFLP fragments revealed that the genomic diversity in our collection of strains was due mostly to base substitutions or deletions in important genes, such as those for aconitase, amidase, and permease, several members of the PE and PPE families, putative transcriptional regulators, efflux pumps, hypothetical proteins, and conserved hypothetical proteins (Table 2). Our observation that genotypic differences were indeed due to the gain or loss of EcoRI and MseI sites supports the idea that the rate of silent substitutions may be higher than previously expected and that M. tuberculosis may not be as evolutionarily recent as previously proposed (21).
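How a single base substitution translates into an FAFLP band difference can be shown with a small Python sketch: a substitution inside GAATTC (EcoRI) or TTAA (MseI) abolishes the site, and one that completes a site creates it, so the corresponding fragment disappears or appears. The sequences below are hypothetical and are not taken from H37Rv or the study isolates.

```python
SITES = {"EcoRI": "GAATTC", "MseI": "TTAA"}


def substitute(seq: str, pos: int, base: str) -> str:
    """Return seq with a single-base substitution at 0-based position pos."""
    return seq[:pos] + base + seq[pos + 1:]


def site_changes(ref: str, pos: int, base: str) -> dict[str, tuple[int, int]]:
    """Report (reference count, variant count) for each site whose count changes.

    str.count is sufficient here because neither motif can overlap itself.
    """
    alt = substitute(ref, pos, base)
    changes = {}
    for name, motif in SITES.items():
        before, after = ref.count(motif), alt.count(motif)
        if before != after:
            changes[name] = (before, after)
    return changes


if __name__ == "__main__":
    print(site_changes("CCGAATTCGG", 4, "G"))  # {'EcoRI': (1, 0)}: site abolished
    print(site_changes("CCTTAGCC", 5, "A"))    # {'MseI': (0, 1)}: site created
```

Whether such a change is silent depends on where it falls in the reading frame, which this sketch does not attempt to determine.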
We did not analyze multidrug-resistant strains with candidate gene mutations separately in the context of resistance genotype-phenotype correlations, although such strains predominated in the Indian (19, 20) and Dutch (data not shown) clusters. Instead, they were subjected to FAFLP typing in a blinded manner and later analyzed for amplitype-specific regional representation in the clusters. It would have been of considerable interest if such strains had carried potential resistance-linked markers; however, we do not know how strongly such markers would have influenced the current biogeographic clustering. Many of the strains that we analyzed (16) had already been characterized by IS6110 restriction fragment length polymorphism and were supplied to us as blind-coded DNA samples. The IS6110 clusters of epidemiologically related strains were further subdivided when FAFLP data were used for phylogenetic analyses; the subdivision of Indian clusters containing only one or zero IS6110 copies was particularly notable. One possible reason that geographically related strains cluster together independently of their IS6110 profiles is that the FAFLP amplitype is essentially a sampled output of base substitutions and specific deletions across the whole genome. It therefore reflects the selection pressures that may have acted on the chromosome as a whole, independent of the mobility of the IS6110 element. In its true sense, the amplitype represents the base substitutions (or deletion events) in a given isolate and does not reflect the criteria used to select isolates for the study.
Our analysis of LSPs in representative isolates from different FAFLP clusters suggested that deletion occurs as a random rather than a universal phenomenon. The evolutionary scenario proposed by Brosch et al. (7), however, rests on the assumption that large genomic deletions are highly specific and precisely timed events. This does not appear to be the case, at least for the >150 isolates that we analyzed, a large representative collection from different geographic regions, particularly India, South America, and Australia. LSPs therefore appear to have had a negligible impact on clustering, which was instead dictated by regional geographic forces, including environmental influences and host diversity (1); these forces dominated over the various insertion, deletion, and substitution events across the genome as a whole.
Our "phylogenomic" analyses of strains from diverse populations support the idea that M. tuberculosis may have undergone adaptive evolution as a result of selection at many loci. We suggest that the distinctiveness of M. tuberculosis genotypes in South Asia, Australia, and the West could be due to strain variants that are particularly well adapted to patients in these regions. These variants may have spread through different populations during different time periods and then adapted, and subsequent recombination has not been frequent enough to break up such geographic groupings. In an alternative scenario, genetic rearrangements appear to have occurred on a global scale for most of the motifs studied, such as the 20 variable regions resulting from insertion-deletion events (6) on an evolutionary timescale (7) and some of the candidate gene polymorphisms (21), while the genome sequence has remained broadly conserved at the nucleotide level for some genes. In either case, our data may allow speculation about the age of the M. tuberculosis genotypes in South Asia and the West and about the time and circumstances under which M. tuberculosis became associated with humans.
The genetic differences among genotypes reported here warrant further investigation to determine the exact role of the polymorphisms detected by FAFLP. Much of our understanding of M. tuberculosis genome plasticity, pathogenicity, and dissemination dynamics is based on strains from industrialized countries. In this context, the genomic differences at many loci among isolates from across the world, as highlighted in this study, should encourage large-scale genomic profiling of strains. Isolates indigenous to understudied geographic regions that are relatively unaffected by homogenizing forces such as immigration, international travel, and business tourism should be analyzed as a priority. Such a geographic genomics approach may open new avenues for understanding mycobacterial pathogenesis and the host component. However, since environmental and host factors clearly contribute to the clinical and epidemiologic behavior of strains, these factors must be carefully integrated into the investigative process.
Financial assistance from the Department of Biotechnology, Government of India, the Council for Scientific and Industrial Research (CSIR), The World Health Organization (TDR), and the International Society for Infectious Diseases (ISID) is gratefully acknowledged. This work was supported by core grants to the CDFD from the Department of Biotechnology, Government of India.