Journal of Clinical Microbiology, December 2003, p. 5640-5644, Vol. 41, No. 12
0095-1137/03/$08.00+0 DOI: 10.1128/JCM.41.12.5640-5644.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
MRC Centre for Molecular and Cellular Biology, Department of Medical Biochemistry,1 Department of Paediatrics and Child Health, Faculty of Health Sciences, Stellenbosch University, Tygerberg, South Africa,2 Department of Medicine, McGill University Health Centre, Montreal, Quebec, Canada3
Received 6 May 2003/ Returned for modification 8 July 2003/ Accepted 26 August 2003
It is generally accepted that in order for a marker to be useful for epidemiological tracking, its rate of evolution must be low enough that strains from epidemiologically linked cases will have identical RFLP patterns (recent transmission) (2, 7, 14), while the rate of evolution must be sufficiently rapid to enable the discrimination of strains from closely related cases from those from more distantly related cases (i.e., transmission versus reactivation). Strains with unique RFLP patterns therefore reflect reactivations of dormant infections, strains from an outside community, or strains transmitted from unidentified sources (2, 7, 14). This scenario, while convenient, is an oversimplification, as it ignores the possibility that as an organism multiplies within its host it may give rise to clonal variants of itself (6, 11, 22, 23), characterized by minor changes in the IS6110-based RFLP banding pattern. These evolved strains may, in turn, be transmitted to new hosts (12, 21), leading over time to increasing genetic diversity in the bacterial population.
If a constant rate of mutation is assumed, genetic distance (GD), defined as the number of mutational events separating two strains, may be regarded as an indicator of the evolutionary time since their divergence from a common ancestor. Thus, a high degree of similarity between two strains implies close temporal coupling. Conversely, with a longer evolutionary time since divergence, the probability of accumulation of mutations will be higher and therefore the GD will be greater. Studies of the stability of IS6110 fingerprints have demonstrated a half-life on the order of 2 to 3.2 years (6, 11, 23). Those investigators concluded that this rate of change is sufficiently low to facilitate epidemiological tracking. However, the calculated rate is also high enough to significantly influence the interpretation of relatedness in molecular epidemiological studies, especially those in which the study duration is similar to or longer than the half-life of the marker system. Given this nonzero rate of IS6110 fingerprint variation, clonal variants that appear within a relatively short time frame and whose patterns differ by a few bands may represent recently evolved strains and therefore may be regarded as constituting an ongoing transmission chain (12, 21, 23). In a recent study (21) we found evidence for this and reported that the manifestation of evolution was associated with transmission. We suggested that these events probably reflect the overall evolutionary dynamics of the bacterial population in the study setting. Failure to account for recent evolution in assessing epidemiological transmission may therefore hinder our understanding of the factors driving the epidemic. This is the case for most algorithms used at present, which regard clonal variants as part of independent transmission chains or reactivation events (2, 7, 14, 19).
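The link between a marker's half-life and the chance that a fingerprint changes during a study can be illustrated with a short calculation. This is a minimal sketch that assumes unchanged patterns decay exponentially (an assumption implied by the term half-life but not stated in the text); the function name `fraction_changed` is hypothetical.

```python
def fraction_changed(years: float, half_life_years: float) -> float:
    """Expected fraction of fingerprints that have changed after `years`.

    Assumes exponential decay of unchanged patterns, which is an
    illustrative assumption and not a model taken from the article.
    """
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    return 1.0 - 0.5 ** (years / half_life_years)


# Half-life range quoted in the text (2 to 3.2 years), evaluated over
# two illustrative observation intervals.
for half_life in (2.0, 3.2):
    for interval in (2.0, 5.0):
        changed = fraction_changed(interval, half_life)
        print(f"half-life {half_life} y, interval {interval} y: {changed:.2f}")
```

Under this model half of all patterns have changed after one half-life, which is why a study lasting as long as the half-life or longer cannot treat pattern identity as the only sign of relatedness.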
In the study described here we have examined the effect of evolution within transmission chains on the interpretation of M. tuberculosis molecular epidemiological data. We used a systematic approach based on interstrain GDs to group strains into molecular superclusters representing chains of ongoing transmission. Analysis of these data indicates that the rate of evolution of M. tuberculosis strains remains constant and is largely independent of the IS6110 copy number. We estimated the amount of ongoing transmission to be at least 20% higher than that predicted by more established methods (14). Our results show that the incorporation of evolution in the algorithms quantifying the extent of ongoing transmission may have a profound influence on our understanding of the dynamics of the disease, with consequent implications for epidemiological control strategies.
Study population. From mid-1992 to December 1998, M. tuberculosis isolates were collected from patients residing in and attending health care clinics in two adjacent suburbs in Cape Town, South Africa (3). Approximately 350 new bacteriologically confirmed adult cases of tuberculosis per 100,000 population are reported in this community each year. Prior to 1996, all patients were treated at one of the primary care clinics by directly observed therapy, although there was no systematic surveillance for cure rates. In 1996 the World Health Organization directly observed therapy strategy was implemented with all its attendant requirements, resulting in the availability of cure rates.
DNA fingerprinting. Each isolate was classified by DNA fingerprinting according to the internationally standardized protocol (16, 20). The resulting autoradiographs were scanned and analyzed with the GelCompar II program (Applied Maths, Sint-Martens-Latem, Belgium). M. tuberculosis isolates with fewer than six IS6110 bands were excluded from the study, as it has previously been shown that the IS6110 banding patterns in these strains show very little diversity (21), precluding their use in epidemiological tracking (14). A total of 168 isolates suspected of being cross-contaminants (17) or identified as nontuberculous mycobacteria were also excluded from the study.
GD. The RFLP fingerprints were aligned by use of the GelCompar program to maximize the number of matching bands between each fingerprint pair, with tolerance parameters allowing a 6% shift in each pattern as a whole and a 0.4% variance in individual band positions. This yielded 332 strains (as defined by distinct IS6110 patterns) with more than five IS6110 elements from 708 disease cases. Cases from the same patient were considered distinct if the IS6110 patterns of the strains differed by more than four bands. An exhaustive pairwise comparison of each IS6110 banding pattern was performed by using a band-matching algorithm (GelCompar II) to generate a GD matrix. This consisted of an N-by-N table of the number of mismatched IS6110 bands between each pair of strains. The results were imported into a Microsoft Access database as a table of strain pairs with their corresponding GDs. On the basis of the assumption that recent evolutionary events are represented by a maximum of four banding pattern differences (6, 21), strain pairs with GDs of less than or equal to four were assigned a putative transmission status of source or secondary on the basis of the order of appearance in the community. These assignments were made subsequent to the application of the following filters. (i) Strains occurring only in patients who were <12 years of age or who did not present with pulmonary tuberculosis were excluded as possible sources, as they were considered unlikely to transmit the bacillus (1). (ii) Strain pairs for which the time interval between the last case caused by the source strain and the first case caused by the secondary strain was greater than a defined interval (2 or 5 years) were excluded. These intervals were arbitrarily chosen to represent the minimum and maximum periods within which progression to active disease would be considered recent infection. 
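The pairwise comparison described above can be sketched as follows. This is only an illustration, not the GelCompar II band-matching algorithm: it assumes that each fingerprint has already been aligned and reduced to a set of discrete band positions, so that GD is the size of the symmetric difference between two sets. The names `genetic_distance`, `candidate_pairs`, and the example patterns are hypothetical.

```python
from itertools import combinations

MAX_RECENT_GD = 4  # threshold for recent evolutionary events, from the text


def genetic_distance(bands_a: frozenset, bands_b: frozenset) -> int:
    """Count mismatched bands: bands present in one pattern but not the other."""
    return len(bands_a ^ bands_b)


def candidate_pairs(patterns: dict) -> list:
    """Return (strain_a, strain_b, gd) for every pair with gd <= MAX_RECENT_GD."""
    pairs = []
    for name_a, name_b in combinations(sorted(patterns), 2):
        gd = genetic_distance(patterns[name_a], patterns[name_b])
        if gd <= MAX_RECENT_GD:
            pairs.append((name_a, name_b, gd))
    return pairs


# Hypothetical aligned band positions (arbitrary bin labels, not real data).
patterns = {
    "A": frozenset({1, 3, 5, 7, 9, 11}),
    "B": frozenset({1, 3, 5, 7, 9, 12}),  # one band moved: one loss plus one gain
    "C": frozenset({2, 4, 6, 8, 10, 14}),
}
print(candidate_pairs(patterns))  # [('A', 'B', 2)]
```

The age, disease-site, and time-interval filters would then be applied to these candidate pairs before any source or secondary status is assigned.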
For each remaining secondary strain, the source with the nearest GD (NGD; i.e., the one that was the most similar) was selected as the most probable candidate. When two or more possible source strains had the same NGD, candidacy was equally apportioned between them.
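The selection of the nearest source, with ties shared equally, might look like the sketch below. The input format (tuples of source, secondary, and GD that have already passed the filters) and the function name `apportion_sources` are assumptions made for illustration.

```python
from collections import defaultdict


def apportion_sources(candidates):
    """Pick the nearest source(s) for each secondary strain.

    candidates: iterable of (source, secondary, gd) tuples.
    Returns {secondary: {source: weight}}, where the weights for each
    secondary strain sum to 1 and ties at the NGD are split equally.
    """
    by_secondary = defaultdict(list)
    for source, secondary, gd in candidates:
        by_secondary[secondary].append((gd, source))

    result = {}
    for secondary, options in by_secondary.items():
        ngd = min(gd for gd, _ in options)  # nearest genetic distance
        nearest = sorted(src for gd, src in options if gd == ngd)
        result[secondary] = {src: 1.0 / len(nearest) for src in nearest}
    return result


# Hypothetical candidate pairs, not data from the study.
candidates = [
    ("S1", "X", 1),
    ("S2", "X", 1),  # ties with S1 at NGD = 1
    ("S3", "X", 3),  # further away, so ignored
    ("S1", "Y", 2),
]
print(apportion_sources(candidates))
```

In this example the secondary strain X is attributed half to S1 and half to S2, while Y is attributed entirely to S1.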
To assess the propensity of the result of RFLP analysis with the IS6110 probe to change as a function of the number of insertion elements present in the genome, we determined the number of variant strains produced by source strains, which were categorized by the number of IS6110 bands, as a proportion of all new cases attributable to those source strains.
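One way to tabulate this proportion is sketched below. It simplifies the procedure by treating every attributed new case as one event and by counting an event as a variant whenever its NGD is greater than zero; fractional apportioning between tied sources is ignored. The function name and the example events are hypothetical.

```python
from collections import Counter


def variant_proportion_by_band_count(events):
    """Proportion of attributed events that produced a variant strain.

    events: iterable of (source_band_count, ngd) tuples, one per new case
    attributed to a source strain; ngd == 0 means an identical pattern.
    Returns {band_count: proportion}, ordered by band count.
    """
    totals = Counter()
    variants = Counter()
    for band_count, ngd in events:
        totals[band_count] += 1
        if ngd > 0:
            variants[band_count] += 1
    return {bc: variants[bc] / totals[bc] for bc in sorted(totals)}


# Hypothetical events, not data from the study.
events = [(10, 0), (10, 0), (10, 1), (10, 0), (18, 0), (18, 2)]
print(variant_proportion_by_band_count(events))  # {10: 0.25, 18: 0.5}
```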
Estimation of recent transmission. To calculate the extent of ongoing transmission, strain pairs were linked together into transmission chains on the basis of common source or secondary strain type by using a custom-written Perl script (the source code is available at http://www.sun.ac.za/med_biochem/). This process was performed for each maximum NGD in the range of 0 to 4. Isolates belonging to strain pairs with NGDs less than or equal to the chosen limit were grouped into superclusters. The ongoing transmission rate was determined by the formula (N - S)/T, where N is the number of cases grouped in superclusters, S is the number of superclusters, and T is the total number of cases in the study. Because the probability of detecting a primary index case diminishes with increasing temporal proximity to the study commencement date, we formulated an alternative estimate of ongoing transmission in which the calculation of S was limited to those superclusters initiated after the first 18 months of the study. Thus, ongoing transmission is defined as (NE + NL - SL)/T, where NE is the number of cases in superclusters initiated within the first 18 months of the study, and NL and SL are the number of cases in superclusters initiated after the first 18 months of the study and the number of superclusters into which they are grouped, respectively.
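The chaining step and the two formulas can be sketched as follows. This is not the authors' Perl script; it is a minimal Python illustration that treats superclusters as connected components of the source/secondary link graph. It assumes that a component counts as a supercluster only if it contains at least two cases, and that a supercluster is "initiated" at the month of its earliest case. All names and example values are hypothetical.

```python
def find(parent, x):
    """Return the representative of x's component, compressing the path."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def superclusters(strains, linked_pairs):
    """Group strains into connected components of the link graph."""
    parent = {s: s for s in strains}
    for a, b in linked_pairs:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    groups = {}
    for s in strains:
        groups.setdefault(find(parent, s), set()).add(s)
    return list(groups.values())


def transmission_rates(clusters, cases_per_strain, first_month, early_cutoff=18):
    """Return ((N - S) / T, (NE + NL - SL) / T) as defined in the text."""
    total_cases = sum(cases_per_strain.values())  # T
    n = s = n_early = n_late = s_late = 0
    for cluster in clusters:
        size = sum(cases_per_strain[x] for x in cluster)
        if size < 2:
            continue  # assumption: a lone case does not form a supercluster
        n += size
        s += 1
        start = min(first_month[x] for x in cluster)
        if start < early_cutoff:
            n_early += size  # index case presumed to predate the study
        else:
            n_late += size
            s_late += 1
    return (n - s) / total_cases, (n_early + n_late - s_late) / total_cases


# Hypothetical example: five strains, eight cases, two links within the NGD limit.
strains = ["A", "B", "C", "D", "E"]
cases_per_strain = {"A": 3, "B": 1, "C": 2, "D": 1, "E": 1}
first_month = {"A": 2, "B": 10, "C": 30, "D": 40, "E": 50}
clusters = superclusters(strains, [("A", "B"), ("C", "D")])
standard, alternative = transmission_rates(clusters, cases_per_strain, first_month)
print(standard, alternative)  # 0.625 0.75
```

With no links between distinct strains (a maximum NGD of 0), the first value reduces to the conventional calculation based on identical patterns only.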
Pairwise analysis of the 332 high-copy-number IS6110 banding patterns was used to quantify the number of differences between all possible strain pairs. From this data set a total of 3,019 strain pairs with fewer than four band differences were identified, of which 1,168 fulfilled the case inclusion criteria and reflected NGD pairs. In this study, we have assumed that NGDs of 1 to 4 reflect recent evolutionary events, as previous studies have shown that up to four IS6110 banding pattern changes may occur during recent transmission (12, 21) or persistent disease (11, 22, 23).
Analysis of these strain pairs showed that the propensity to evolve by either one or two mutational events (NGD = 1 or 2) appears to be independent of the number of IS6110-hybridizing bands present in the source strain (Fig. 1). This result differs from previous assumptions, which have suggested that the rate of change was proportional to the number of IS6110 elements in the source strain (13, 15). IS6110-mediated mutational events generating three or four banding pattern changes (NGD = 3 or 4) were also found to occur at a constant frequency in strains with 8 to 16 IS6110 insertions (Fig. 1). However, such events were largely absent from strains with 17 to 25 IS6110 insertions (Fig. 1), suggesting that multiple transposition events do not occur or are selected against in strains with very high IS6110 copy numbers.
FIG. 1. Numbers of variant strains produced as a proportion of observed transmission events from source cases with a defined number of IS6110 bands. Variant strains were produced by the loss or the gain of IS6110-hybridizing bands. Values are for strain pairs with NGDs equal to 1, 2, 3, and 4. The data shown are for a maximum interstrain interval of 2 years.
FIG. 2. Frequency of appearance of new variant strains in the community as a proportion of observed transmissions over time elapsed since the study epoch. The data shown are for a maximum interstrain interval of 2 years and are plotted as a 5-month moving average.
TABLE 1. Estimates of the rate of variant strain production as a proportion of new cases due to transmission appearing per month
TABLE 2. Degree of superclustering and standard and alternative calculations of recent transmission for various allowable ranges of NGD for 2- and 5-year maximum interstrain intervals
In this study we have used an algorithmic method to explore the implications of IS6110-based RFLP pattern evolution on the understanding of an epidemic in a community with a high incidence of tuberculosis. Closely related M. tuberculosis strains were linked by NGD. In contrast to previous studies (13), this method of analysis shows that IS6110 evolution is independent of the number of IS6110-hybridizing bands present in the source strain. Consequently, GD is purely a measure of the number of band mismatches. However, a number of assumptions have been made in the interest of simplification in the calculation of GD. First, the loss and the gain of bands have been assumed to occur at the same rates, and therefore, a loss or a gain was assigned an equal GD. Since >60% of IS6110 fingerprint changes occur by replicative transposition, a more refined method might assign a higher GD to band loss. Second, a band shift has been counted as two events, i.e., a combination of a loss and a gain. The true frequency of this type of event is obscured by multiple events but is probably sufficiently rare to validate this assumption.
Analysis of the NGD data shows that closely related variant strains are appearing in the community at a constant annual rate. Thus, for a maximum NGD of 1 or 2 (by use of a 2-year interstrain interval), we found that 14 to 24% of transmission events produced variant strains. This value is similar to an estimate obtained in a previous study (21), in which we found that approximately 18.6% of transmission events within households generated a variant strain. The high proportion of newly evolved variant strains confirms that the M. tuberculosis strain population is diversifying at a constant rate in the study setting and that the process of its evolution is linked to transmission, significantly influencing molecular epidemiological calculations.
Consequently, studies depending on the stratification of cases according to genetic identity will underestimate the extent of transmission or incorrectly group cases for risk factor analysis. The factoring of NGD into clustering calculations suggests that transmission estimates may be 20% higher than those predicted by previously accepted formulae (14). However, the accuracy of this calculation is influenced by a number of factors. (i) The number of source cases initiating transmission chains affects the calculation. To minimize the overestimation of the number of primary index cases, we proposed an alternative formula in which it is assumed that clusters identified in the first 18 months of the study were initiated by source cases occurring prior to the onset of the study. (ii) The estimate assumes a 100% sample recovery. In this study only 70% of cases were included, and therefore, possible source or secondary cases may have been missed, leading to an underestimate of transmission (10). (iii) At present, there are few data on the extent of M. tuberculosis transmission in areas surrounding the study community. As this region experiences an extremely high incidence of disease, it is probable that patients may have been infected by sources outside of the community. (iv) While we have demonstrated the propensity of strains to change, this study also shows that most of the transmitted strains remain identical to their source strains and persist in the community for extended periods. By use of the present methodology, it is not possible to differentiate transmission from reactivation of such strains, possibly leading to an overestimate of transmission. Considering these limitations, we propose that the calculations presented here probably represent a conservative estimate of the true extent of disease due to transmission. Mathematical modeling predicts that approximately 95% of cases should correspond to transmission, given the high infection pressure in this community (18; P. B. Fourie, J. Lancaster, K. Weyer, and N. Beyers [Medical Research Council of South Africa], unpublished data; E. Vynnycky [London School of Hygiene & Tropical Medicine], personal communication, 2002).
Depending on the parameters chosen, this study estimates that the proportion of the local epidemic due to ongoing transmission is between 66 and 94%. This is considerably higher than the 55% estimated by using genetic identity as a measure of the rate of transmission. From these results, we conclude that the epidemic is predominantly driven by transmission and not, as indicated by conventional calculations, by reactivation of dormant infections. A positive implication of this conclusion is that interventions which target transmission have the potential to dramatically influence the epidemic. In a setting of passive case finding, largely based on positive smear microscopy results, it is hypothesized that the majority of transmission events occur prior to diagnosis and treatment. This is a component of the epidemic which is not targeted by the present World Health Organization directly observed therapy strategy. On the basis of these results, it is envisaged that interventions which interrupt transmission should coincide with a decrease in GD-based superclustering (8). We suggest that GD should prove to be a useful tool in the analysis of longitudinal molecular epidemiological data which will aid in determining the efficacies of M. tuberculosis control strategies with a focus on reducing transmission.
While the present study focused on a community with a high prevalence of tuberculosis, we believe that the approach presented here may also be relevant to communities in which the incidence of tuberculosis is lower. Given the sizeable evolutionary rates reported in studies conducted in such areas (6, 12, 23), a GD-based analysis of the data may well be expected to produce an estimate of recent transmission significantly different from that obtained by conventional calculations. The rapid diversification of M. tuberculosis isolates in the New York City outbreak of strain W provides further weight to this argument (4). We believe that the evidence presented above indicates the need for a similar study to be done in a community with a low incidence of tuberculosis. In addition, we suggest that this technique is not limited to the study of M. tuberculosis but may also prove to be useful in the analysis of epidemiological data from other disease epidemics.
We thank E. Engelke, S. Carlini, and M. De Kock for technical assistance.