Journal of Clinical Microbiology
Mycobacteriology and Aerobic Actinomycetes

Phylogenetic Analysis of Mycobacterium tuberculosis Strains in Wales by Use of Core Genome Multilocus Sequence Typing To Analyze Whole-Genome Sequencing Data

R. C. Jones, L. G. Harris, S. Morgan, M. C. Ruddy, M. Perry, R. Williams, T. Humphrey, M. Temple, A. P. Davies
Geoffrey A. Land, Editor
Author affiliations:
R. C. Jones (a), L. G. Harris (a), S. Morgan (b), M. C. Ruddy (c), M. Perry (c), R. Williams (c), T. Humphrey (a), M. Temple (b), A. P. Davies (a, d)
(a) Swansea University Medical School, Institute of Life Science, Swansea University, Swansea, Wales, United Kingdom
(b) Health Protection Division (Mid and West Wales), Public Health Wales, Swansea, Wales, United Kingdom
(c) Wales Centre for Mycobacteriology, Llandough Hospital, Cardiff, Wales, United Kingdom
(d) Public Health Wales Microbiology, Swansea, Wales, United Kingdom
Editor: Geoffrey A. Land, Carter BloodCare & Baylor University Medical Center
DOI: 10.1128/JCM.02025-18

ABSTRACT

An inability to standardize the bioinformatic data produced by whole-genome sequencing (WGS) has been a barrier to its widespread use in tuberculosis phylogenetics. The aim of this study was to carry out a phylogenetic analysis of tuberculosis in Wales, United Kingdom, using Ridom SeqSphere software for core genome multilocus sequence typing (cgMLST) analysis of whole-genome sequencing data. The phylogenetics of tuberculosis in Wales have not previously been studied. Sixty-six Mycobacterium tuberculosis isolates (including 42 outbreak-associated isolates) from south Wales were sequenced using an Illumina platform. Isolates were assigned to principal genetic groups, single nucleotide polymorphism (SNP) cluster groups, lineages, and sublineages using SNP-calling protocols. WGS data were submitted to the Ridom SeqSphere software for cgMLST analysis and analyzed alongside 179 previously lineage-defined isolates. The data set was dominated by the Euro-American lineage, with the sublineage composition being dominated by T, X, and Haarlem family strains. The cgMLST analysis successfully assigned 58 isolates to major lineages, and the results were consistent with those obtained by traditional SNP mapping methods. In addition, the cgMLST scheme was used to resolve an outbreak of tuberculosis occurring in the region. This study supports the use of a cgMLST method for standardized phylogenetic assignment of tuberculosis isolates and for outbreak resolution and provides the first insight into Welsh tuberculosis phylogenetics, identifying the presence of the Haarlem sublineage commonly associated with virulent traits.

INTRODUCTION

Within the species Mycobacterium tuberculosis, seven major lineages have been recognized globally (1, 2), with these lineages showing different characteristics in terms of evolutionary status, transmissibility, drug resistance, host interaction, latency, and vaccine efficacy (3). The sublineages also show variations in virulence and pathogenicity (4): in particular, lineage 2 (East Asian) and lineage 4 (Euro-American) contain strains, such as the Beijing and Haarlem genotypes, respectively, which are notorious for their association with tuberculosis outbreaks and which are overrepresented among drug-resistant cases (5, 6).

Traditional PCR-based typing methods, such as mycobacterial interspersed repetitive unit–variable-number tandem-repeat (MIRU-VNTR) profiling and spoligotyping, have allowed the classification of isolates into phylogeographically related clades and families and led to the development of readily available databases, such as SpolDB4 (7, 8) and MIRU-VNTRplus (9). Two other typing methods that have been developed with results correlating with internationally recognized spoligotype families are the principal genetic grouping (PGG) and single nucleotide polymorphism (SNP) cluster grouping (SCG) methods. The PGG method classifies isolates into one of three groups based on nonsynonymous variants at the katG and gyrA genes (10). The SCG method classifies isolates into six phylogenetically distinct groups and a further five subgroups based on the nucleotides present at nine specific loci in the H37Rv reference genome (11, 12).

With the advent of whole-genome sequencing (WGS), comparative analysis has led to the use of single nucleotide polymorphisms (SNPs) as robust genetic markers for phylogenetic assignment (2, 7). SNPs are reliable and phylogenetically informative markers, since the low sequence variation and lack of horizontal gene transfer in M. tuberculosis make independent recurrent mutations unlikely (7). However, the lack of WGS data standardization has been one of the barriers to the widespread usage of WGS (13, 14). Coll et al. (15) developed a robust SNP barcode method that analyzes 60 loci and that is capable of assigning M. tuberculosis isolates into major lineages and sublineages. The method has a higher level of resolution than the PGG and SCG methods and provides phylogenetic associations that can be correlated with spoligotype families, and the lineages assigned by the method can be compared with those in a globally established database (15). The development of WGS gene-by-gene multilocus sequence typing (MLST) methods and software, such as Ridom SeqSphere software (16), has resulted in a more standardized and user-friendly approach than traditional WGS SNP mapping for resolving and understanding outbreaks (14, 17, 18). Ridom SeqSphere allows isolate sequences to be aligned and compared in a standardized manner using a globally defined core genome MLST (cgMLST) scheme (13, 16, 18). To date, although this method has been used for providing a clinical resolution of tuberculosis outbreaks (13), it has not been used to analyze the phylogenetic composition of an M. tuberculosis isolate data set.

The phylogenetic diversity of strains of M. tuberculosis in Wales has not previously been studied. One aim of this work was to use for the first time the gene-by-gene-based core genome MLST (cgMLST) method, PGG, SCG, and SNP barcoding to phylogenetically analyze 66 Welsh M. tuberculosis isolates, assign them to phylogenetic groups, lineages, and sublineages, and carry out a comparison of the different methods. Identifying the presence of strains such as Haarlem and Beijing family strains, which are associated with outbreaks and resistance, would be of interest to public health and outbreak control organizations in Wales, the United Kingdom, and further afield and give insight into the diversity of tuberculosis within Wales.

cgMLST was also used to study a set of isolates from one particular outbreak of tuberculosis in south Wales in detail. This outbreak came to the attention of Public Health Wales (PHW) in 2006. At that time the outbreak involved 8 cases with cultured isolates and appeared to be circulating among individuals who frequented five local public houses within an area, with one public house having connections to several cases in the outbreak. The index case was the landlord of that public house, and at the time of that diagnosis in 2004, contact tracing of close contacts and the pub’s regular customers was carried out promptly and detected no other cases. The outbreak sparked a review by Public Health Wales of tuberculosis case records in the area. From 2006 to 2011, a further 5 cases with clinical isolates were reported, making a total of 13 reported isolate-confirmed cases in the area since 2004. Two were an estranged husband and wife pair. All the isolates were fully susceptible to all first-line antituberculous chemotherapy.

MATERIALS AND METHODS

Isolates. DNA from 66 M. tuberculosis isolates collected between 2004 and 2011 was obtained from the Wales Centre for Mycobacteriology, Cardiff, United Kingdom. Forty-two of the isolates were from 3 separate tuberculosis outbreaks in the southwest area of Wales according to both MIRU-VNTR typing and epidemiological investigations (isolate prefixes are LL, NPT, TH, and GO), and the remaining 24 were randomly selected endemic (background) isolates (and given the prefix BK). Outbreak isolates prefixed NPT were those from one particular public house-related outbreak of tuberculosis which was studied in detail, as outlined in the introduction.

Epidemiological investigation. Epidemiological information was obtained from face-to-face interviews with a nurse from the original PHW contact tracing investigation team and from documents produced during the outbreak investigation.

Sequencing and assembly. The genomic DNA was sequenced using Nextera XT library preparation kits (version 3; Illumina, San Diego, CA, USA) and a MiSeq benchtop sequencer (Illumina, San Diego, CA, USA). Paired-end reads were quality filtered with the Trimmomatic software tool (version 0.32; Usadellab, Germany) using a sliding window of 5 bases and a quality score threshold of Q20. The filtered reads were assembled into contigs using the SPAdes genome assembler (version 3.9.0) (19). The k-mer sizes used for SPAdes were 33, 55, 77, 99, and 127. The sequence read archive (SRA) sequences for 179 lineage-defined isolates (NCBI) previously published (1) were also assembled using the SPAdes genome assembler.
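The sliding-window filter described above can be illustrated with a minimal Python sketch. It is a simplified stand-in for the idea behind the trimming step, not a reimplementation of Trimmomatic, and the function name and the example quality scores are hypothetical.

```python
def sliding_window_keep_length(quals, window=5, min_mean_q=20):
    """Return how many leading bases of a read to keep.

    Scans from the 5' end and cuts the read at the start of the first
    window whose mean Phred quality falls below min_mean_q.
    Simplified illustration only; real trimmers apply further rules.
    """
    if not quals:
        return 0
    last_start = max(len(quals) - window + 1, 1)
    for start in range(last_start):
        win = quals[start:start + window]
        if sum(win) / len(win) < min_mean_q:
            return start
    return len(quals)


# Hypothetical read: six good bases followed by four poor ones.
example_quals = [30, 30, 30, 30, 30, 30, 10, 10, 10, 10]
keep = sliding_window_keep_length(example_quals)
```

With the defaults (window of 5, threshold of Q20), the read is cut where the window mean first drops below 20, so some bases preceding the poor-quality run are also discarded.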

cgMLST analysis and phylogenetic assignment. Assembled genomes were uploaded onto the Ridom SeqSphere software (version 4.1.9; Ridom; Münster, Germany). Each isolate sequence was aligned to the Ridom SeqSphere M. tuberculosis core genome MLST (cgMLST) scheme of 2,891 core genes (GenBank accession number NC_000962.3), previously defined for alignment and subsequent genomic analysis (14, 18). Successful alignments to the cgMLST were defined as good targets by the Ridom SeqSphere software, and full cgMLST analysis was carried out on isolate sequences that yielded >90% good targets. The cgMLST scheme was also used to compare the sequenced Welsh isolates and 179 isolates whose lineage was previously defined by Comas et al. (1). The 179 isolates selected from Comas et al. (1) were those whose genomes also exceeded the 90% quality threshold under the Ridom SeqSphere parameters. The resulting phylogeny comparison was made using an unweighted pair group method with arithmetic mean (UPGMA) tree produced by Ridom SeqSphere and further annotated and modified using the iTOL tool (version 4; https://itol.embl.de) (20). The genome of Mycobacterium canettii, as the ancestral member of the M. tuberculosis complex, was used to root the tree.
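The good-target threshold and the gene-by-gene comparison can be sketched as follows. This is an illustrative sketch only and not the Ridom SeqSphere implementation: the profile representation (a list of allele numbers with None for a target that was not called) and the choice to ignore missing targets in pairwise comparisons are assumptions.

```python
def good_target_fraction(profile):
    """Fraction of cgMLST targets with a called allele (None = not called)."""
    called = sum(1 for allele in profile if allele is not None)
    return called / len(profile)


def passes_quality(profile, threshold=0.90):
    """True if the isolate yields more than `threshold` good targets."""
    return good_target_fraction(profile) > threshold


def allelic_distance(profile_a, profile_b):
    """Number of targets at which two isolates carry different alleles.

    Targets missing in either isolate are skipped (an assumption made
    here for illustration).
    """
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


# Hypothetical four-target profiles; the real scheme has 2,891 targets.
isolate_1 = [1, 2, 3, None]
isolate_2 = [1, 5, 3, 7]
difference = allelic_distance(isolate_1, isolate_2)
```

A matrix of such pairwise distances is the kind of input from which a UPGMA tree or a minimum-spanning tree can then be built.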

WGS SNP barcoding and sublineage genotyping. Isolates were aligned to the H37Rv reference genome using the Burrows-Wheeler Aligner (BWA; version 0.7.17) (21). The SAMtools program suite (version 1.3.1) (22) was then used to call SNPs from each of 60 designated loci previously described (15) (with the omission of 2 M. bovis loci). Based on the SNP pattern (SNP barcode) at the designated loci, each isolate was assigned to one of the phylogeographically related groups: lineage 1 (Indo-Oceanic), lineage 2 (East Asian), lineage 3 (East African-Indian), lineage 4 (Euro-American), lineage 5 (West Africa 1), lineage 6 (West Africa 2), or lineage 7 (Horn of Africa) (7, 15).
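The principle of barcode-based assignment can be sketched as a simple lookup. The positions, alleles, and labels below are placeholders invented for illustration and are not the published barcode loci (15); only the lookup logic is the point.

```python
# Hypothetical barcode: genome position -> (lineage-defining allele, label).
# These values are placeholders, NOT the loci of the published scheme.
HYPOTHETICAL_BARCODE = {
    1000: ("A", "lineage 1"),
    2000: ("T", "lineage 2"),
    3000: ("G", "lineage 4"),
}


def assign_lineages(calls, barcode=HYPOTHETICAL_BARCODE):
    """Return the sorted labels whose defining allele is seen in `calls`.

    `calls` maps genome position -> called base for one isolate.
    Exactly one label is expected for a clean assignment; an empty or
    multi-label result would flag the isolate for manual review.
    """
    return sorted(
        {
            label
            for position, (allele, label) in barcode.items()
            if calls.get(position) == allele
        }
    )


example_calls = {1000: "C", 2000: "T", 3000: "A"}
assigned = assign_lineages(example_calls)
```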

Isolates within each M. tuberculosis lineage determined by SNP mapping were further assigned to one of the following sublineages: the Beijing (23), Latin American Mediterranean (LAM) (24), Haarlem (25), or X family (24). SNPs were initially identified through extraction of relevant gene sequences from each isolate using the sequence extraction application within Ridom SeqSphere and detected manually using BioEdit software. Concatenated SNPs were then used to produce a phylogenetic UPGMA tree using iTOL software, and isolates were assigned to one of the sublineage genotypes listed above.

PGG and SCG. Gene sequences for gyrA and katG were extracted from the WGS of each isolate using Ridom SeqSphere and analyzed manually using BioEdit to identify the presence of principal genetic grouping (PGG)-defining amino acids at codons 95 and 463, the PGG informative sites within the genes gyrA and katG, respectively (10). Based on the composition of amino acids at these loci, each isolate was assigned to a PGG (26). For SNP cluster grouping (SCG) analysis, sequences were aligned to the H37Rv reference genome using BWA. SAMtools was then used to call SNPs from the previously defined nine specific loci (12), and each isolate was then assigned to an SNP cluster group. Phylogenetic analysis was carried out only on isolates with each of the nine loci present (31 isolates).
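The PGG rule reduces to a two-residue lookup, sketched below. The mapping follows the widely used definition of the three groups from the amino acids at the informative katG and gyrA codons (10); the function and variable names are hypothetical, and the manual BioEdit inspection used in this study is not reproduced.

```python
# (katG residue, gyrA residue) at the two PGG-informative codons -> group.
# Single-letter codes: L, leucine; R, arginine; T, threonine; S, serine.
PGG_TABLE = {
    ("L", "T"): "PGG1",
    ("R", "T"): "PGG2",
    ("R", "S"): "PGG3",
}


def assign_pgg(katg_residue, gyra_residue):
    """Return the principal genetic group, or None if the pair is unexpected."""
    return PGG_TABLE.get((katg_residue, gyra_residue))


# An arginine/serine combination falls in PGG3, the group in which the
# H37Rv reference genome clustered in this study.
reference_group = assign_pgg("R", "S")
```

Returning None for an unexpected combination keeps unassignable isolates visible rather than forcing them into a group.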

RESULTS

N50 and the number of contigs for each assembled genome are shown in Table S1 in the supplemental material.
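For readers unfamiliar with the metric, N50 is the contig length at which contigs of that length or longer contain at least half of the assembled bases. A minimal sketch, with hypothetical contig lengths, is:

```python
def n50(contig_lengths):
    """Return the N50 of an assembly, or 0 for an empty assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the running sum to
        # at least half of the total assembly length.
        if running * 2 >= total:
            return length
    return 0


# Hypothetical assembly of four contigs totalling 100 bases.
example_n50 = n50([10, 20, 30, 40])
```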

cgMLST association. Fifty-eight of the 66 isolates had a sequence quality sufficient for cgMLST analysis and were incorporated into a phylogeny that also included the 179 lineage-defined isolates (1). The resulting tree shows that the Welsh and lineage-defined isolates clustered into lineages 1 (n = 1), 2 (n = 3), and 4 (n = 53) (Fig. 1). Lineages 3, 5, 6, and 7 are not shown, as none of the Welsh isolates were assigned to them. All but one outbreak-associated isolate (isolate LL9) clustered with the lineage 4 isolates, while the endemic isolates showed more lineage diversity.

FIG 1

An unweighted pair group method with arithmetic mean (UPGMA) tree based on the cgMLST association between 58 Welsh isolates. The UPGMA tree shows the phylogeny of the 58 Welsh isolates and the lineage-defined isolates (1) which had a sequence quality sufficient for cgMLST analysis. The M. canettii genome was used to root the tree. Lineages 3, 5, 6, and 7 are not shown, as none of the Welsh isolates were assigned to them. Green, lineage 1; yellow, lineage 2; pink, lineage 4; red, Welsh isolates; gray, M. canettii.

Phylogenetic composition using SNP barcoding and sublineage genotyping. SNP barcoding was carried out on the 59 Welsh isolates that had >90% sequence data, as required for the 60-locus SNP barcode analysis. The results were consistent with those from the cgMLST association. Lineage 4 (Euro-American) dominated the data set with 55 isolates (Fig. 2), and all but 1 outbreak-associated isolate clustered with this lineage. Fourteen of the 55 lineage 4 isolates were of the Haarlem sublineage, and of the 18 T family isolates, 13 showed a clonal pattern across the 60 SNPs, with 10 of these being from the same recognized outbreak. Twelve of the 16 X family isolates could be split into three clonally related clusters correlating to those seen in Fig. 2, and 3 lineage 2 Beijing strains were identified. The T family sublineage dominated the outbreak isolates (39%), followed by the Haarlem sublineage (33%) and the X family (27%). Table 1 shows a direct comparison between the cgMLST and SNP results, indicating a correlation at the lineage level for each Welsh isolate.

FIG 2

Phylogenetic analysis of 59 Welsh M. tuberculosis isolates. The figure shows the 59 isolates that had >90% sequence data (as required for the 60-locus SNP barcode analysis) assigning the isolates to lineages and sublineages. Unweighted pair group method with arithmetic mean (UPGMA) tree showing SNP barcoding results. The scale bar indicates the genetic divergence relevant to branch length and is based on units of the number of nucleotide differences per site across 60 loci.

TABLE 1

Lineages, by cgMLST and SNP analysis, of 58 sequenced isolates that had sequence quality sufficient for cgMLST analysis, showing the correlation of both methods at the lineage level for each Welsh isolate

PGG and SCG analysis. Of the 66 isolates sequenced, 57 could be assigned to a PGG based on sequence data, as shown in Fig. 3A. Four isolates clustered within PGG1, 31 clustered within PGG2, and 22 clustered within PGG3, along with the H37Rv genome. Compared to the sublineage data, the Haarlem and X family and LAM sublineage isolates grouped with PGG2 and the T family and H37Rv-like isolates grouped with PGG3. All lineage 1 and 2 isolates were associated with PGG1. Fifty-six of the original 66 isolates could confidently be assigned to an SCG based on the sequence data provided. The SCG results identified two predominant SCGs, SCG-6a and SCG-3b, with 16 and 15 isolates clustering to these subgroups, respectively (Fig. 3B). Other subgroups present were SCG-4 (8 isolates), SCG-3c (7 isolates), SCG-6b (4 isolates), SCG-5 (3 isolates), SCG-2 (2 isolates), and SCG-1 (1 isolate). Nine isolates were excluded, as they did not yield sequence data for all nine loci, and SCG-3a was not represented in the data set. The SCG phylogeny split into two clear clades, with clade 2 being more diverse than clade 1. When PGG results were compared with SCG results, it was found that clade 1 contained all the PGG3 isolates and clade 2 contained all PGG1 and PGG2 isolates (Fig. 3B). The PGG2 isolates also divided into four different SCG groups. Within clade 2, isolates of SCG-3c and SCG-4 shared a closer relationship with each other than they did with isolates of SCG-3b and SCG-5, and vice versa.

FIG 3

Neighbor-joining phylogeny showing the principal genetic grouping and single nucleotide polymorphism cluster grouping profiles of 57 and 56 Welsh isolates, respectively, with the reference genome H37Rv also being assigned. (A) PGG results. Red, PGG1; green, PGG2; blue, PGG3. Letters refer to the amino acids present at each locus: T, threonine; R, arginine; L, leucine; and S, serine. The scale bar highlights the genetic divergence relevant to the branch length and is based on units of the number of amino acid differences per site across the gyrA and katG loci. (B) SCG results, where the phylogeny harbors two clades, clade 1 and clade 2. The PGG assigned to each isolate is shown in the right column, and X denotes isolates that could not be assigned a PGG group.

NPT outbreak isolate analysis. All the NPT-designated outbreak isolates clustered as Euro-American T family isolates, except for NPTB6 (Fig. 2). In addition, a further three background isolates (BK1, BK2, and BK3) also clustered clonally as T family isolates and were included in further downstream analysis (Fig. 2). NPTB6 did not cluster within the same T family sublineage but clustered with 6 X family sublineage isolates. This was evidence that NPTB6 had been wrongly included within this outbreak cluster and was unrelated. For further outbreak analysis, the 3 additional T family background cases were included with the NPT isolates when analyzed by cgMLST.

cgMLST analysis revealed that there were in fact 8 distinct isolates within the T family group, including the existence of 2 clusters (Fig. 4). The clusters defined by cgMLST consisted of one containing 9 isolates (outbreak 1) and one containing 2 isolates (outbreak 2, consisting of the isolates from the estranged husband and wife). In outbreak 1 there were 8 NPT isolates and 1 background isolate, previously thought of as an unrelated case. NPTA3 showed 16 allelic differences from its closest relative (NPTA7) and, thus, according to the definition of no more than 12 allelic differences (13, 14), could not be directly linked to either outbreak. Five other isolates showed no evidence of being directly linked with any other isolate within the data set: these included three NPT isolates (NPTB2, NPTB5, and NPTB6) and two background ones (BK1 and BK3). The data indicated that NPTA7 was the source case. This case, diagnosed with pulmonary tuberculosis in 2007, was known to a number of the other cases as a regular at the public house, although he denied this. The cgMLST results supported the epidemiological evidence that he was associated with the public house.
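The cluster definition used above (isolates linked when they differ at no more than 12 alleles) can be sketched as threshold-based single-linkage grouping. This is an illustration rather than the SeqSphere procedure: apart from the 16 allelic differences reported between NPTA3 and NPTA7, the isolate names ISO_X and ISO_Y and all other distances are hypothetical.

```python
def clusters_by_threshold(isolates, distances, max_diff=12):
    """Group isolates linked by pairwise distances of at most max_diff.

    `distances` maps (isolate_a, isolate_b) -> allelic differences.
    Returns sorted clusters with at least two members; unlinked
    isolates are left out.
    """
    parent = {name: name for name in isolates}

    def find(name):
        # Follow parent links to the cluster representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), diff in distances.items():
        if diff <= max_diff:
            parent[find(a)] = find(b)

    groups = {}
    for name in isolates:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


isolates = ["NPTA7", "NPTA3", "ISO_X", "ISO_Y"]
distances = {
    ("NPTA3", "NPTA7"): 16,  # reported in the text: above the threshold
    ("NPTA7", "ISO_X"): 3,   # hypothetical
    ("ISO_X", "ISO_Y"): 12,  # hypothetical: exactly at the threshold
}
outbreak_clusters = clusters_by_threshold(isolates, distances)
```

In this toy input NPTA3 stays unlinked because its closest relative is 16 alleles away, which mirrors the reasoning applied to that isolate in the text.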

FIG 4

A minimum-spanning tree of 17 cases constructed using Ridom SeqSphere software. Isolates sharing no more than 12 allelic differences are classified as direct transmission events and are thus part of a clonal outbreak and are grouped accordingly into outbreak 1 and outbreak 2.

DISCUSSION

This study has provided the first insight into the phylogenetic diversity of M. tuberculosis isolates from Wales using cgMLST. In addition, it is one of the first independent confirmatory studies of the cgMLST scheme of Kohl et al. (13). Gene-by-gene MLST methods have previously been shown to be useful in clinical outbreak resolution and epidemiological investigations of human pathogens, such as methicillin-resistant Staphylococcus aureus and Campylobacter, as well as M. tuberculosis itself (17, 18). Specifically, the Ridom SeqSphere gene-by-gene cgMLST scheme has previously been used to look at tuberculosis outbreaks (13, 18) and consists of a portable, standardized database platform for use with WGS data in tuberculosis research. However, the method has not previously been used for the classification of M. tuberculosis isolates into well-defined phylogenetic lineages. This study provided for the first time a snapshot of tuberculosis phylogenetics across a geographical area based on cgMLST in comparison with the phylogenetics based on SNP calling methods. In this study, the resulting cgMLST phylogenetic tree contained all seven major M. tuberculosis lineages and broadly matched that seen using SNP mapping-based methods (1, 27). Of the 66 isolates for which WGS was performed, 58 were successfully analyzed by cgMLST in conjunction with 179 lineage-defined isolates (1), with lineage 4, the Euro-American lineage, dominating the collection. Lineage 1 and 2 isolates were also identified, but in much lower numbers. Consistent with the findings of Comas et al. (1), lineage 2 and 3 isolates shared a closer relationship with each other than with lineage 4 isolates. Hence, despite the use of a different set of genomic data, the evolutionary positions of each lineage according to cgMLST were consistent with those found in other studies that used in-house SNP mapping pipelines for the construction of their phylogenies (1, 27, 28).

According to the SNP barcoding and subgenotyping methods, the results of which correlated with the cgMLST results, the data set contained a diverse collection of Euro-American sublineages, which were not dominated by a single sublineage, as isolates of the T family, X family, and Haarlem family made up a large proportion of the lineage 4 data set, with the Haarlem isolates being particularly prevalent in the outbreak-assigned cases. The proportion of Euro-American lineage isolates here is similar to Public Health England data for TB cases in indigenous people across the whole of the United Kingdom and Ireland (29). This study also identified 2% of the isolates to be lineage 1 and 6% to be lineage 2, again correlating with the data for the indigenous population of the United Kingdom (29) and Ireland (30, 31). The discovery of numerous Haarlem sublineage strains and some Beijing strains was an interesting finding.

The PGG results correlated well with the lineage groupings, as 31 of the Welsh isolates were PGG2 or PGG3, which have previously been associated with the Euro-American lineage, while PGG1 is associated with lineages 1, 2, and 3 (7). The SCG results revealed a predominance of SCG-3 and SCG-6 isolates, with SCG-3b and SCG-6a isolates being the most prominent. Unlike the PGG analysis, the SCG analysis highlighted a large degree of divergence within the Euro-American lineage, consistent with the diversity seen in the SNP barcode result. Such an association was expected, as SCGs have previously been shown to correspond to the SNP barcoding and sublineage groupings (7, 11).

Phylogenetic analysis confirmed that all the apparent NPT outbreak isolates except NPTB6 were clustered within the same sublineage, the Euro-American T family. In addition, the SNP barcode method identified three further apparently unrelated local isolates that clustered within this phylogeny, indicating that phylogenetic characterization may be useful in tuberculosis outbreak investigation.

Through the use of cgMLST, the relationship between the NPT outbreak isolates was resolved, and two clusters/outbreaks were confirmed. The cgMLST analysis also confirmed that the cases in outbreak 1 were directly linked to the public house, as assumed by the initial contact tracing team. However, a number of cases, including the estranged husband and wife pair, were unrelated, serving as a reminder that TB remains endemic in Wales and that cases occurring within a small area are not necessarily related. Such results could be used as a basis to support targeted outbreak control interventions around the public house and the identification of NPTA7 (who denied frequenting the public house, contradicting the evidence provided by other cases) as the source case.

SNP barcoding provides a very high level of resolution, is more established in terms of providing sublineage assignments, and provides a correlation with spoligotyping. However, it requires bioinformatic expertise and is difficult to standardize, as it is not linked to a global database. In addition, the SNP barcode used here is based solely on a set of markers (15) and so cannot provide an understanding of individual relationships within an outbreak, restricting its use to phylogenetics.

In comparison, cgMLST is a relatively new method. However, it has the advantage of being a simpler, standardized method for analyzing large amounts of genomic data which are easily uploaded to a global database for analysis using the user-friendly Ridom SeqSphere software, which could facilitate the use of genomics for tuberculosis surveillance. The results of cgMLST analysis were consistent with those obtained by traditional SNP mapping methods. Although cgMLST is yet to be developed to a level whereby isolates can be confidently assigned to a phylogenetic sublineage, this study provides evidence that, at least at the lineage level, the phylogenetic associations made using cgMLST correlate with those made using SNP barcoding. This work supports the use of cgMLST for standardized phylogenetic assignment of M. tuberculosis isolates, in addition to its use for delineating clinical outbreaks (13, 18).

ACKNOWLEDGMENTS

This work was funded by the St. David’s Medical Foundation and Coleg Cenedlaethol Cymraeg.

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

FOOTNOTES

    • Received 18 December 2018.
    • Returned for modification 30 January 2019.
    • Accepted 27 March 2019.
    • Accepted manuscript posted online 3 April 2019.
  • Supplemental material for this article may be found at https://doi.org/10.1128/JCM.02025-18.

  • Copyright © 2019 American Society for Microbiology.

All Rights Reserved.

REFERENCES

  1. Comas I, Coscolla M, Luo T, Borrell S, Holt KE, Kato-Maeda M, Parkhill J, Malla B, Berg S, Thwaites G, Yeboah-Manu D, Bothamley G, Mei J, Wei L, Bentley S, Harris SR, Niemann S, Diel R, Aseffa A, Gao Q, Young D, Gagneux S. 2013. Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans. Nat Genet 45:1176–1182. doi:10.1038/ng.2744.
  2. Gagneux S, Deriemer K, Van T, Kato-Maeda M, De Jong BC, Narayanan S, Nicol M, Niemann S, Kremer K, Gutierrez MC, Hilty M, Hopewell PC, Small PM. 2006. Variable host-pathogen compatibility in Mycobacterium tuberculosis. Proc Natl Acad Sci U S A 103:2869–2873. doi:10.1073/pnas.0511240103.
  3. Thwaites G, Caws M, Chau TTH, D'Sa A, Lan NTN, Huyen MNT, Gagneux S, Anh PTH, Tho DQ, Torok E, Nhu NTQ, Duyen NTH, Duy PM, Richenberg J, Simmons C, Hien TT, Farrar J. 2008. Relationship between Mycobacterium tuberculosis genotype and the clinical phenotype of pulmonary and meningeal tuberculosis. J Clin Microbiol 46:1363–1368. doi:10.1128/JCM.02180-07.
  4. Anderson J, Jarlsberg LG, Grindsdale J, Osmond D, Kawamura M, Hopewell PC, Kato-Maeda M. 2013. Sublineages of lineage 4 (Euro-American) Mycobacterium tuberculosis differ in genotypic clustering. Int J Tuberc Lung Dis 17:885–891. doi:10.5588/ijtld.12.0960.
  5. Marais BJ, Victor TC, Hesseling AC, Barnard M, Jordaan A, Brittle W, Reuter H, Beyers N, van Helden PD, Warren RM, Schaaf HS. 2006. Beijing and Haarlem genotypes are overrepresented among children with drug-resistant tuberculosis in the Western Cape Province of South Africa. J Clin Microbiol 44:3539–3543. doi:10.1128/JCM.01291-06.
  6. Bifani PJ, Plikaytis BB, Kapur V, Stockbauer K, Pan X, Lutfey ML, Moghazeh SL, Eisner W, Daniel TM, Kaplan MH, Crawford JT, Musser JM, Kreiswirth BN. 1996. Origin and interstate spread of a New York City multidrug-resistant Mycobacterium tuberculosis clone family. JAMA 275:452–457. doi:10.1001/jama.1996.03530300036037.
  7. Gagneux S, Small PM. 2007. Global phylogeography of Mycobacterium tuberculosis and implications for tuberculosis product development. Lancet Infect Dis 7:328–337. doi:10.1016/S1473-3099(07)70108-1.
  8. Brudey K, Driscoll JR, Rigouts L, Prodinger WM, Gori A, Al-Hajoj SA, Allix C, Aristimuño L, Arora J, Baumanis V, Binder L, Cafrune P, Cataldi A, Cheong S, Diel R, Ellermeier C, Evans JT, Fauville-Dufaux M, Ferdinand S, de Viedma D, Garzelli C, Gazzola L, Gomes HM, Guttierez MC, Hawkey PM, van Helden PD, Kadival GV, Kreiswirth BN, Kremer K, Kubin M, Kulkarni SP, Liens B, Lillebaek T, Ly H, Martin C, Martin C, Mokrousov I, Narvskaïa O, Ngeow Y, Naumann L, Niemann S, Parwati I, Rahim Z, Rasolofo-Razanamparany V, Rasolonavalona T, Rossetti ML, Rüsch-Gerdes S, Sajduda A, Samper S, Shemyakin IG, et al. 2006. Mycobacterium tuberculosis complex genetic diversity: mining the Fourth International Spoligotyping Database (SpolDB4) for classification, population genetics and epidemiology. BMC Microbiol 6:23. doi:10.1186/1471-2180-6-23.
  9. Weniger T, Krawczyk J, Supply P, Niemann S, Harmsen D. 2010. MIRU-VNTRplus: a web tool for polyphasic genotyping of Mycobacterium tuberculosis complex bacteria. Nucleic Acids Res 38:W326–W331. doi:10.1093/nar/gkq351.
  10. Sreevatsan S, Pan X, Stockbauer KE, Connell ND, Kreiswirth BN, Whittam TS, Musser JM. 1997. Restricted structural gene polymorphism in the Mycobacterium tuberculosis complex indicates evolutionarily recent global dissemination. Proc Natl Acad Sci U S A 94:9869–9874. doi:10.1073/pnas.94.18.9869.
  11. Filliol I, Motiwala AS, Cavatore M, Qi W, Hazbon MH, Bobadilla Del Valle M, Fyfe J, García-García L, Rastogi N, Sola C, Zozio T, Guerrero MI, León CI, Crabtree J, Angiuoli S, Eisenach KD, Durmaz R, Joloba ML, Rendón A, Sifuentes-Osornio J, Ponce de León A, Cave MD, Fleischmann R, Whittam TS, Alland D. 2006. Global phylogeny of Mycobacterium tuberculosis based on single nucleotide polymorphism (SNP) analysis: insights into tuberculosis evolution, phylogenetic accuracy of other DNA fingerprinting systems, and recommendations for a minimal standard SNP set. J Bacteriol 188:759–772. doi:10.1128/JB.188.2.759-772.2006.
  12. Alland D, Lacher DW, Hazbon MH, Motiwala AS, Qi W, Fleischmann RD, Whittam TS. 2007. Role of large sequence polymorphisms (LSPs) in generating genomic diversity among clinical isolates of Mycobacterium tuberculosis and the utility of LSPs in phylogenetic analysis. J Clin Microbiol 45:39–46. doi:10.1128/JCM.02483-05.
  13. Kohl TA, Diel R, Harmsen D, Rothganger J, Walter KM, Merker M, Weniger T, Niemann S. 2014. Whole-genome-based Mycobacterium tuberculosis surveillance: a standardized, portable, and expandable approach. J Clin Microbiol 52:2479–2486. doi:10.1128/JCM.00567-14.
  14. Walker TM, Ip CLC, Harrell RH, Evans JT, Kapatai G, Dedicoat MJ, Eyre DW, Wilson DJ, Hawkey PM, Crook DW, Parkhill J, Harris D, Walker AS, Bowden R, Monk P, Smith EG, Peto TE. 2013. Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study. Lancet Infect Dis 13:137–146. doi:10.1016/S1473-3099(12)70277-3.
  15. Coll F, McNerney R, Guerra-Assuncao JA, Glynn JR, Perdigao J, Viveiros M, Portugal I, Pain A, Martin N, Clark TG. 2014. A robust SNP barcode for typing Mycobacterium tuberculosis complex strains. Nat Commun 5:4812. doi:10.1038/ncomms5812.
  16. Junemann S, Sedlazeck FJ, Prior K, Albersmeier A, John U, Kalinowski J, Mellmann A, Goesmann A, von Haeseler A, Stoye J, Harmsen D. 2013. Updating benchtop sequencing performance comparison. Nat Biotechnol 31:294–296. doi:10.1038/nbt.2522.
  17. Maiden MC, Van Rensburg MJJ, Bray JE, Earle SG, Ford SA, Jolley KA, McCarthy ND. 2013. MLST revisited: the gene-by-gene approach to bacterial genomics. Nat Rev Microbiol 11:728–736. doi:10.1038/nrmicro3093.
  18. Kohl TA, Harmsen D, Rothganger J, Walker T, Diel R, Niemann S. 2018. Harmonised genome wide typing of tubercle bacilli using a web-based gene-by-gene nomenclature system. EBioMedicine 34:131–138. doi:10.1016/j.ebiom.2018.07.030.
  19. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477. doi:10.1089/cmb.2012.0021.
  20. Letunic I, Bork P. 2016. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res 44:W242–W245. doi:10.1093/nar/gkw290.
  21. Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760. doi:10.1093/bioinformatics/btp324.
  22. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The sequence alignment/map (SAM) format and SAMtools. Bioinformatics 25:2078–2079. doi:10.1093/bioinformatics/btp352.
  23. Mestre O, Luo T, Dos Vultos T, Kremer K, Murray A, Namouchi A, Jackson C, Rauzier J, Bifani P, Warren R, Rasolofo V, Mei J, Gao Q, Gicquel B. 2011. Phylogeny of Mycobacterium tuberculosis Beijing strains constructed from polymorphisms in genes involved in DNA replication, recombination and repair. PLoS One 6:e16020. doi:10.1371/journal.pone.0016020.
  24. Comas I, Homolka S, Niemann S, Gagneux S. 2009. Genotyping of genetically monomorphic bacteria: DNA sequencing in Mycobacterium tuberculosis highlights the limitations of current methodologies. PLoS One 4:e7815. doi:10.1371/journal.pone.0007815.
  25. Cubillos-Ruiz A, Sandoval A, Ritacco V, Lopez B, Robledo J, Correa N, Hernandez-Neuta I, Zambrano MM, Del Portillo P. 2010. Genomic signatures of the Haarlem lineage of Mycobacterium tuberculosis: implications of strain genetic variation in drug and vaccine development. J Clin Microbiol 48:3614–3623. doi:10.1128/JCM.00157-10.
  26. Grimes CZ, Teeter LD, Hwang L-Y, Graviss EA. 2009. Epidemiologic characterization of culture positive Mycobacterium tuberculosis patients by katG-gyrA principal genetic grouping. J Mol Diagn 11:472–481. doi:10.2353/jmoldx.2009.080171.
  27. Gagneux S. 2012. Host–pathogen coevolution in human tuberculosis. Philos Trans R Soc Lond B Biol Sci 367:850–859. doi:10.1098/rstb.2011.0316.
  28. Firdessa R, Berg S, Hailu E, Schelling E, Gumi B, Erenso G, Gadisa E, Kiros T, Habtamu M, Hussein J, Zinsstag J, Robertson BD, Ameni G, Lohan AJ, Loftus B, Comas I, Gagneux S, Tschopp R, Yamuah L, Hewinson G, Gordon SV, Young DB, Aseffa A. 2013. Mycobacterial lineages causing pulmonary and extrapulmonary tuberculosis, Ethiopia. Emerg Infect Dis 19:460–463. doi:10.3201/eid1903.120256.
  29. Public Health England. 2014. Tuberculosis in the UK 2014 report. Public Health England, London, United Kingdom.
  30. Fitzgibbon M, Gibbons N, Roycroft E, Jackson S, O’Donnell J, O’Flanagan D, Rogers TR. 2013. A snapshot of genetic lineages of Mycobacterium tuberculosis in Ireland over a two-year period, 2010 and 2011. Euro Surveill 18(3):pii=20367. https://www.eurosurveillance.org/content/10.2807/ese.18.03.20367-en.
  31. Ojo OO, Sheehan S, Corcoran DG, Nikolayevsky V, Brown T, O'Sullivan M, O'Sullivan K, Gordon SV, Drobniewski F, Prentice MB. 2010. Molecular epidemiology of Mycobacterium tuberculosis clinical isolates in Southwest Ireland. Infect Genet Evol 10:1110–1116. doi:10.1016/j.meegid.2010.07.008.
Phylogenetic Analysis of Mycobacterium tuberculosis Strains in Wales by Use of Core Genome Multilocus Sequence Typing To Analyze Whole-Genome Sequencing Data
R. C. Jones, L. G. Harris, S. Morgan, M. C. Ruddy, M. Perry, R. Williams, T. Humphrey, M. Temple, A. P. Davies
Journal of Clinical Microbiology May 2019, 57 (6) e02025-18; DOI: 10.1128/JCM.02025-18

KEYWORDS

Mycobacterium tuberculosis
outbreak
phylogenetics
tuberculosis
whole-genome sequencing

Print ISSN: 0095-1137; Online ISSN: 1098-660X