Genomics Reveals the Worldwide Distribution of Multidrug-Resistant Serotype 6E Pneumococci

The pneumococcus is a leading pathogen infecting children and adults. Safe, effective vaccines exist, and they work by inducing antibodies to the polysaccharide capsule (unique for each serotype) that surrounds the cell; however, current vaccines are limited by the fact that only a few of the nearly 100 antigenically distinct serotypes are included in the formulations. Within the serotypes, serogroup 6 pneumococci are a frequent cause of serious disease and common colonizers of the nasopharynx in children. Serotype 6E was first reported in 2004 but was thought to be rare; however, we and others have detected serotype 6E among recent pneumococcal collections. Therefore, we analyzed a diverse data set of ∼1,000 serogroup 6 genomes, assessed the prevalence and distribution of serotype 6E, analyzed the genetic diversity among serogroup 6 pneumococci, and investigated whether pneumococcal conjugate vaccine-induced serotype 6A and 6B antibodies mediate the killing of serotype 6E pneumococci. We found that 43% of all genomes were of serotype 6E, and they were recovered worldwide from healthy children and patients of all ages with pneumococcal disease. Four genetic lineages, three of which were multidrug resistant, described ∼90% of the serotype 6E pneumococci. Serological assays demonstrated that vaccine-induced serotype 6B antibodies were able to elicit killing of serotype 6E pneumococci. We also revealed three major genetic clusters of serotype 6A capsular sequences, discovered a new hybrid 6C/6E serotype, and identified 44 examples of serotype switching. Therefore, while vaccines appear to offer protection against serotype 6E, genetic variants may reduce vaccine efficacy in the longer term because of the emergence of serotypes that can evade vaccine-induced immunity.

The pneumococcus (Streptococcus pneumoniae) is one of the most important pathogens worldwide. An estimated 1.3 million children die every year from pneumonia, and the pneumococcus is the leading cause (1,2). It is also a leading cause of death due to bacteremia and meningitis among young children and is a major cause of disease among adults, particularly the elderly, among whom there is also a high risk of death (3,4). Pneumococcal conjugate vaccines (PCVs) are administered to children in many developed and resource-poor countries and have been an enormous public health success, significantly reducing morbidity and mortality in the countries that have implemented widespread vaccination (5,6).
However, nearly 100 different serotypes have been characterized and new ones continue to be discovered (10–13). Current PCV formulations have limited serotype coverage, and their use has been associated with a significantly altered serotype distribution. Disease due to vaccine serotype pneumococci decreases, but an increase in the proportion of disease caused by nonvaccine serotype pneumococci has been observed, although there is heterogeneity in this serotype replacement disease phenomenon that is not well understood (14,15). Furthermore, the prevalence of commensal (carriage) pneumococci in the nasopharynx, its ecological niche, generally remains the same after PCV but reorders in favor of nonvaccine types (16). Vaccine escape is also possible, and new genetic variants can spread rapidly (17–19). Consequently, protection from pneumococcal disease remains a challenge.
Apart from two known exceptions (serotypes 3 and 37), the polysaccharide capsule is synthesized by the Wzx/Wzy-dependent pathway and the associated genes are located in the capsular polysaccharide synthesis (cps) locus. The majority of the genes in the cps locus are present in all cps loci, and there are three genes in particular, wciP, wzy, and wzx, that have serotype-specific alleles (20,21). Horizontal genetic exchange of all or part of the cps locus sequence, between related and unrelated pneumococcal lineages, has been well documented (17,18,22–25).
Serogroup 6 is particularly important, as it is one of the most common serogroups found in the nasopharynxes of unvaccinated young children and is a major cause of serious pneumococcal disease among all age groups (5,26). Serotypes 6A and 6B have been recognized for many decades, but more recently, serotypes 6C and 6D, which are genetically similar to serotypes 6A and 6B, were discovered (27–29). From a vaccine perspective, it was shown that the serotype 6B antibodies elicited by PCV7 were partially protective against serotype 6A but not serotype 6C (15,30,31). However, PCV13-induced antibodies were shown to elicit killing of serotypes 6A, 6B, and 6C (31,32) and PCV10-induced antibodies mediated the killing of serotype 6B and possibly serotype 6A (15,33). Serotype 6D pneumococci have been reported infrequently, although their prevalence in South Korea was estimated to be 10%, and a PCV7-induced cross-protective immune response to serotype 6D was demonstrated (34,35). Serotypes 6F, 6G, and 6H have been described very recently, and whether PCVs provide any protection against these serotypes is unknown (10,13).
The first report of serotype 6E pneumococci was by Mavroidi et al., whose study explored sequence diversity and evolution among serogroup 6 pneumococci. Internal fragments of the three serotype-specific genes were sequenced in a diverse collection of 102 isolates of serotype 6A and 6B pneumococci. While they found little sequence divergence between serotype 6A and most of the serotype 6B isolates, they did identify a group of what they called "class 2" serotype 6B sequences, which were >5% divergent from the majority of serotype 6B isolates (36). Two subsequent studies of serogroup 6 diversity and evolution in other pneumococcal collections confirmed the existence of "class 2" serotype 6B or what one report called "6B-III" or possible "serotype 6E" strains (29,37). Very recently, investigators have reported serotype 6E pneumococci in several Asian countries (38–40). As part of an ongoing vaccine impact study characterizing Icelandic pneumococci pre- and postimplementation of PCV10, we also discovered serotype 6E strains. Furthermore, we interrogated the genome sequences of several serotype 6B Pneumococcal Molecular Epidemiology Network (PMEN) reference strains and found that they all possessed a serotype 6E cps locus sequence. As far as we are aware, the biochemical structure of serotype 6E polysaccharide is not known. Therefore, we compiled and investigated a large and diverse data set of ∼1,000 serogroup 6 pneumococcal genomes with three aims: (i) to determine the prevalence, distribution, and epidemiology of serotype 6E (as defined by the cps locus sequence); (ii) to examine the genetics of the serogroup 6 cps locus and the molecular epidemiology of serogroup 6 lineages; and (iii) to assess whether the serotype 6B polysaccharides in PCV7 and PCV13 induce the production of protective antibodies to serotype 6E.
published genome data sets (41–47), GenBank sequence data (http://www.ncbi.nlm.nih.gov/GenBank/), and unpublished genome data from an ongoing vaccine impact study in Iceland (Fig. 1; see Table S1 in the supplemental material). The vaccine impact study is collecting pneumococcal isolates from healthy children and from patients of all ages with pneumococcal disease and sequencing 3,100 isolates with the Illumina platform. Pre- and postvaccine pneumococci from 2009 to 2015 will be analyzed and compared, and a report on the complete data set will be published in due course. The genome data set included four serotype 6B pneumococcal reference strains from the PMEN collection (48) whose genome sequences were available (Table 1). The genomes from GenBank were downloaded directly, and all other genomes in the data set were downloaded as raw sequence reads from the European Nucleotide Archive (ENA), assembled with Velvet (49), and deposited in the rMLST database, which is powered by BIGSdb (50,51). Corresponding metadata were manually acquired from the original publications and matched to the genome data. Genome sequence quality was assessed, and poor-quality sequences (e.g., those with gaps or non-full-length gene sequences in the cps locus) were removed, leaving 974 genomes for analysis. All of the genome assemblies analyzed in the present study, with the corresponding metadata, are available from the pneumococcal PubMLST website (http://pubmlst.org/spneumoniae/).
Serotyping based on the cps locus sequence. The cps locus sequences for each of the serogroup 6 references (serotypes 6A to 6G) were obtained from public databases (Table 1). The serotypes associated with each of the 974 serogroup 6 genomes were differentiated on the basis of the sequence of the cps locus genes with an in-house serotyping pipeline. Briefly, following an initial screening against the serotype reference sequences, serogroup 6 genomes were serotyped by identifying the following amino acid residues and/or alleles specific for each serotype: serotypes 6A and 6B, wciP 195; serotypes 6C and 6D, presence of wciNβ and wzy 117; serotype 6E, wzy 220 (for additional details, see Fig. S1 in the supplemental material or contact us).
Analyses of cps locus sequences. Thirteen cps locus genes were identified among all seven serogroup 6 reference sequences with cd-hit (52) by using a sequence identity threshold of >80%: wzg, wzh, wzd, wze, wchA, wciO, wciP, wzy, wzx, rmlA, rmlC, rmlB, and rmlD (see Fig. S2 in the supplemental material for phylogenetic trees for all 13 genes). Note that wciN was also present among all of the cps loci, but the α and β versions of wciN were highly divergent, as noted previously (29,53). The serotype 6A reference sequence for each of the 13 genes was then BLASTed (54) against the study genome data set to extract the sequences of the 13 genes from each genome. Pairwise estimates of evolutionary distance (p-distance; number of nucleotide sites that differ between two sequences divided by the total number of nucleotides compared) were calculated for each of the 13 genes in the 974 genomes, stratified by serotype.
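The p-distance defined above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (the study reports using MEGA5 for the reference comparisons); the function names, the toy sequences, and the choice to skip alignment positions that contain a gap are assumptions made here for illustration only.

```python
from itertools import combinations
from statistics import median
from typing import List


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            # Assumption: positions with a gap in either sequence are not compared.
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def median_p_distance(sequences: List[str]) -> float:
    """Median of all pairwise p-distances within one group (e.g., one serotype)."""
    return median(p_distance(a, b) for a, b in combinations(sequences, 2))


# Hypothetical toy alignment of one gene from three genomes of the same serotype.
toy_alignment = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]
print(p_distance(toy_alignment[0], toy_alignment[2]))
print(median_p_distance(toy_alignment))
```

Grouping the input sequences by serotype before calling `median_p_distance` gives one summary value per gene per serotype, which is the kind of statistic the text describes.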
The extracted cps locus sequences were aligned gene by gene with Muscle (55) before being concatenated together to obtain a 12,271-bp cps locus alignment for each of the 974 genomes. The concatenated sequences were then input into FastTree2 to construct a cps locus phylogeny by using a nucleotide general time-reversible model (56). The resulting phylogeny was annotated with iTOL (57). The p-distances between the serogroup 6 cps locus reference sequences were calculated with MEGA5 (58). Input sequences were 13,416 bp in length and spanned the cps locus from the start of wzg through the end of rmlD, including intergenic regions but excluding wciN, HG262 (present in the cps locus of serotypes 6A, 6F, and 6G only), and HG263 (present in serotypes 6B and 6E only).
Genome-wide assessment of sequence diversity. The Genome Comparator module in BIGSdb was used to compare all 974 serogroup 6 genomes to the annotated reference genome 670-6B (NC_014498.1), also known as PMEN2 or Spain6B-2. The BLASTn parameters were set to ≥70% sequence identity and 100% sequence alignment length (50). The data were exported to an Excel spreadsheet that depicted the results of sequence comparisons of each annotated coding sequence (here referred to as a "gene" for simplicity) in the reference 670-6B genome to each of the 974 query genomes. Data were output on a gene-by-gene basis across the entire genome for every query genome. The 670-6B reference gene sequences were designated allele 1, and corresponding sequences in each query genome were designated X (not present), 1 (identical to the reference), N (sequence present but nonidentical to the reference, assigned allele numbers to indicate unique sequences), or T (sequence present but truncated). The exported Genome Comparator gene-by-gene data analysis for the full 2.24-Mb genome was read into R (http://www.r-project.org/) to create a pseudoheat map to compare gene presence or absence and sequence diversity across all 974 genomes.
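The designation scheme described above (X, 1, N, or T) can be sketched in a few lines of Python. This is only an illustration of the labeling logic, not the BIGSdb Genome Comparator implementation: the function name, the toy sequences, and in particular the rule that a hit shorter than the reference counts as truncated are assumptions introduced here.

```python
from typing import Dict, Optional


def designate(reference: str, hits: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Label each query genome's copy of one gene relative to the reference."""
    allele_numbers = {reference: 1}  # the reference sequence is allele 1
    calls = {}
    for genome, seq in hits.items():
        if seq is None:
            calls[genome] = "X"  # gene not found in this genome
        elif len(seq) < len(reference):
            # Assumption: any hit shorter than the reference is called truncated.
            calls[genome] = "T"
        else:
            # Each new unique sequence receives the next allele number (2, 3, ...).
            if seq not in allele_numbers:
                allele_numbers[seq] = len(allele_numbers) + 1
            calls[genome] = str(allele_numbers[seq])
    return calls


# Hypothetical toy example for a single gene.
reference_gene = "ATGAAATAA"
query_hits = {
    "genome1": "ATGAAATAA",  # identical to the reference
    "genome2": "ATGACATAA",  # present, different sequence
    "genome3": None,         # absent
    "genome4": "ATGAAA",     # shorter than the reference
    "genome5": "ATGACATAA",  # same variant as genome2
    "genome6": "ATGAGATAA",  # a second variant
}
calls = designate(reference_gene, query_hits)
print(calls)
```

Running the same function over every reference gene yields the genome-by-gene matrix that underlies a presence/absence and diversity heat map.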
The Genome Comparator analysis also revealed that 432 gene sequences were found in all 974 genomes in the full coding length; thus, for each of the genomes, these 432 genes were concatenated (292 kb in total) and FastTree2 and ClonalFrameML (59) were used to assemble a phylogenetic tree that represented all 974 genomes. The resulting phylogeny was annotated with iTOL. Multilocus sequence type (MLST) data were available for each genome, and the sequence types (STs) were clustered into clonal complexes (CCs) with Phyloviz (60) (see Table S1 in the supplemental material). The 432-gene phylogenetic tree was annotated with CC and serotype data for each genome.
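Selecting the genes found at full coding length in every genome and concatenating them can be sketched as follows. The sketch assumes a table of per-gene calls in which "X" means absent and "T" means truncated, and it assumes the per-gene sequences have already been aligned; the gene names, genome names, and sequences are hypothetical.

```python
from typing import Dict, List


def core_genes(gene_order: List[str], calls: Dict[str, Dict[str, str]]) -> List[str]:
    """Genes called present and full length (neither X nor T) in every genome."""
    return [
        gene
        for gene in gene_order
        if all(calls[genome].get(gene, "X") not in ("X", "T") for genome in calls)
    ]


def concatenate(core: List[str], sequences: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Join each genome's core-gene sequences in reference gene order."""
    return {
        genome: "".join(genes[gene] for gene in core)
        for genome, genes in sequences.items()
    }


# Hypothetical toy data: three genomes, four genes.
gene_order = ["geneA", "geneB", "geneC", "geneD"]
calls = {
    "genome1": {"geneA": "1", "geneB": "1", "geneC": "2", "geneD": "1"},
    "genome2": {"geneA": "2", "geneB": "X", "geneC": "2", "geneD": "3"},
    "genome3": {"geneA": "1", "geneB": "1", "geneC": "T", "geneD": "2"},
}
sequences = {
    "genome1": {"geneA": "ATG", "geneD": "CCA"},
    "genome2": {"geneA": "ATA", "geneD": "CCG"},
    "genome3": {"geneA": "ATG", "geneD": "CCT"},
}
core = core_genes(gene_order, calls)
supermatrix = concatenate(core, sequences)
print(core)
print(supermatrix)
```

The concatenated sequences would then be passed to a tree-building program; that step is outside this sketch.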
Serological analyses of serotype 6E pneumococci. To evaluate functional antibody responses to serotype 6E, sera collected after primary vaccination from PCV7 and PCV13 recipients (n = 8) who participated in previous vaccine studies (61, 62) were selected. Sera were analyzed by an opsonophagocytosis assay (OPA; a titer of ≥1:8 is considered positive) (63) utilizing five different isolates of serotype 6E pneumococci (PMEN2 and PMEN8 plus three recent Icelandic isolates). To assess the contribution of serotype-specific antibody in mediating the killing of cross-reactive antigens, sera were retested by OPA after adsorption with purified serotype 6A, 6B (American Type Culture Collection, Manassas, VA), or 6C capsular polysaccharide (Statens Serum Institut, Copenhagen, Denmark), one at a time, and results were reported as the titer at which 50% of bacteria were killed.

Epidemiology of serogroup 6 pneumococci.
The study genome data set represented a diverse set of 974 serogroup 6 pneumococci recovered in 16 different countries across five continents between 1972 and 2014 (Table 2). Of the pneumococci recovered from individuals spanning a wide range of ages, 69% (n = 675) were from colonized individuals and 27% (n = 267) were from individuals with disease. A total of 78% (n = 760) of the pneumococci were collected prior to any PCV introduction in the country of origin. Antibiogram data demonstrated a range of antimicrobial-susceptible and -resistant isolates among the serotypes, although susceptibility data were missing for many of the pneumococcal genomes. For the full list of genomes and associated metadata, see Table S1 in the supplemental material. A majority of the pneumococci in this study were originally serotyped by the Quellung and/or latex agglutination methods. See Fig. S3 in the supplemental material for a comparison of the serotype distributions among all 974 genomes using the original serotyping data versus sequence-based serotyping (serotype deduced on the basis of the cps locus sequence).

[Table 2 footnotes: a, Serotypes were determined from the nucleotide sequence of the cps locus. No pneumococci of serotype 6F or 6G were identified in the study genome data set. The hybrid serotype is genetically a combination of the serotype 6C and 6E cps locus sequences (see Results). b, Age data were missing for 625 genomes, but available data indicated that isolates of serotypes 6A, 6B, 6C, and 6E were recovered from both children and adults. c, Vaccine status refers to whether any pneumococcal conjugate vaccine (PCV) was being used in the country of origin at the time of pneumococcus isolation. d, Susceptibility data were missing for many genomes, but the ranges of available data are given here. S, susceptible. See Table S1 in the supplemental material for more details.]
Sequence-based analysis of the data set revealed that 421 of the 974 pneumococci were serotype 6E. They were recovered from individuals between the ages of 6 months and 87 years residing in 15 different countries in Europe, North America, South America, Africa, and Asia (Table 2). The serotype 6E pneumococci were isolated from 1981 onward, and 90% were isolated prior to the use of any PCVs. Serotype 6E isolates were recovered from both healthy young children and individuals of all ages with pneumococcal disease. The diseases caused by serotype 6E pneumococci spanned the range of typical pneumococcal diseases, i.e., otitis media, sinusitis, empyema, pneumonia, bacteremia, and meningitis (see Table S1 in the supplemental material).
One-third (n = 318) of the genomes were serotype 6A pneumococci recovered between 1972 and 2013 from patients of all ages residing in six different countries. The majority of the serotype 6A pneumococci were recovered from healthy children, although 67 isolates from patients with invasive and noninvasive diseases were also included (Table 2; see Table S1 in the supplemental material). Pneumococci with a serotype 6C cps locus sequence made up 12% (n = 115) of the data set and were recovered predominantly from healthy children in Thailand, Iceland, and the United States since 2001.
Eleven percent (n = 108) of the pneumococci possessed a serotype 6B cps sequence. Serotype 6B pneumococci were isolated between 2001 and 2013 from both carriage and disease (otitis media, pneumonia, and bacteremia), from patients of a wide range of ages. Four serotype 6D pneumococci from Thailand were identified among the genomes, but no serotype 6F or 6G pneumococci were identified. Eight carriage pneumococci from Thailand with a hybrid serotype 6C/6E cps locus were also identified, and these are discussed in more detail below.
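As a quick arithmetic check, the serotype counts reported in this section can be converted to proportions of the 974-genome data set. The counts below are taken from the text; the variable names are ours, and the sketch simply confirms that the counts sum to 974 and reproduce the stated percentages.

```python
# Sequence-based serotype counts as reported in the text.
serotype_counts = {
    "6E": 421,
    "6A": 318,
    "6C": 115,
    "6B": 108,
    "6C/6E hybrid": 8,
    "6D": 4,
}

total = sum(serotype_counts.values())
percentages = {
    serotype: 100 * count / total for serotype, count in serotype_counts.items()
}

print(f"total genomes: {total}")
for serotype, pct in percentages.items():
    print(f"{serotype}: {serotype_counts[serotype]} genomes, {pct:.1f}%")
```

As the text cautions, these proportions describe the compiled data set and are not estimates of global serotype prevalence.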
Serotype-specific prevalence estimates. The study genome data set was diverse and compiled from several different genome collections; thus, estimates of serotype-specific prevalence based on the entire data set may not be representative of the global pneumococcal population. However, the carriage data sets from the Maela refugee camp in Thailand and from Massachusetts could reliably be used to assess the prevalence of serotypes within specific geographic locations during a specified time period.
The Maela genome data set was made up of 3,085 pneumococci collected from infants and mothers living in a rural refugee camp

[FIG 2 caption, beginning missing] (Table 1 contains the accession numbers). Red arrows are the 13 genes common to all serogroup 6 cps loci at >80% sequence similarity; wciN is blue, and the α or β allele is indicated; transposons and other pseudogenes are gray; and a hypothetical gene (hyp) is orange. The α and β versions of wciP are also indicated.
Serotypes among PMEN clones. Genome sequences of four PMEN reference strains were included in this study, PMEN2 (Spain6B-2), PMEN8 (S. Africa6B-8), PMEN12 (Finland6B-12), and PMEN17 (Maryland6B-17), as shown in Table 1 (48; http://pubmlst.org/spneumoniae/). All four were previously identified as serotype 6B on the basis of the Quellung reaction, but all possess a serotype 6E cps locus sequence. Note that the cps locus sequence of Poland6B-20 was not yet available and the genome sequence of Greece6B-22 was incomplete for some of the cps locus genes and thus could not be analyzed in this study. Moreover, one data set included in this study was compiled specifically to study the PMEN2 lineage (43), which is a multidrug-resistant lineage of pneumococci detected in many countries around the world but was originally identified in Iceland and Spain in the 1980s (http://pubmlst.org/spneumoniae/). One hundred eighty-nine pneumococcal genomes were sequenced in the original PMEN2 study. One hundred seventy-two of these had complete cps locus sequences and were thus included in the present study, and all possessed a serotype 6E cps locus sequence.
Genetics and phylogeny of the cps locus. Thirteen cps locus genes were common to all 974 serogroup 6 pneumococci at a similarity threshold of >80% (Fig. 2), and synteny was preserved among all of the common genes. Pairwise distances (p-distances) of nucleotide variation between the cps loci of the seven reference strains were calculated, and notably, serotype 6E was 6.7% divergent (p-distance range, 0.065 to 0.068) from all of the other serotypes, whereas the divergences between all of the other serotypes were 0.4 to 1.7% (Table 3). The p-distances were also calculated individually for each of the 13 common genes by using the entire 974-genome data set stratified by serotype, and the median p-distance value was used to provide a simple summary statistic of gene-specific sequence diversity within each serotype (Table 3). Most notably, the serotype 6B, 6D, and 6E cps gene sequences were highly conserved within each serotype (median p-distance per gene = 0), whereas nearly all of the cps locus genes of the serotype 6A and 6C isolates varied to some extent, with wzg, rmlA, and rmlB being the most diverse (in addition to wciN), as noted in a previous study (37).
A phylogenetic tree was constructed with concatenated sequences of the 13 common cps locus genes for each of the 974 genomes, and major serotype clusters were clearly delineated (Fig. 3). There were three major serotype 6A clusters and one cluster each for serotypes 6B, 6C, and 6D. No serotype 6F or 6G pneumococci were identified, but the serotype 6F and 6G reference sequences were within serotype 6A clusters. This was consistent with the initial report describing these new serotypes as being nearly identical to the serotype 6A cps sequence (10). The serotype 6E pneumococci clustered in one group with minor within-cluster genetic variation, apart from a few pneumococci from South Africa in 1984 and 1985 and Massachusetts in 2004 (long purple branches) that had switched serotypes and are discussed below, and three Thai pneumococci at the tips of very short branches that possess predominantly the serotype 6E cps locus sequence but also have sequence regions that match the serotype 6C or 6D sequence. There was also a small cluster (colored gray) of eight pneumococci collected in Thailand from 2008 to 2010. This small cluster represented a hybrid serotype 6C/6E cps locus, with sequence differences as shown in Fig. 4. The hybrid failed to be classified properly as either serotype 6C or serotype 6E in the serotyping pipeline because it possessed wciNα like serotype 6E but the wzy-encoded amino acid sequence that differentiates serotype 6C (Fig. 4A). The reason for this serotyping failure was clearly apparent when the sequences of each of the cps locus genes were examined (Fig. 4B). The hybrid predominantly matches the serotype 6E sequence but contains a region of mainly serotype 6C-like sequence from wciP through roughly the first half of wzx.

[Table 3 footnotes: a, Pairwise comparisons were made between the serogroup 6 references by using 13,416-bp cps locus sequences, which spanned the cps locus from the start of wzg through the end of rmlD but excluded wciN, HG262, and HG263 from the analysis (see Materials and Methods and Fig. 2). b, Median pairwise distances for each cps locus gene were estimated by using the entire 974-pneumococcal-genome data set but stratified by serotype. No serotype 6F or 6G pneumococci were identified among the study genomes. c, -, data not available.]

van Tonder et al.
Molecular epidemiology of serogroup 6 pneumococci. A phylogenetic tree was constructed on the basis of 432 concatenated gene sequences (292 kb) present in full coding length in all 974 pneumococci (Fig. 5). The tree was annotated by using CC and serotype data, which clearly defined the major CCs (those with ≥12 members) within the data set. The serotype data, represented by the outer colored ring, demonstrated which serotypes were associated with each CC and where serotype switching had occurred within CCs.
Serotype 6A pneumococci were members of 24 different CCs (45 STs) in the study data set, of which 11 CCs captured 90% of the serotype 6A population (see Table S1 in the supplemental material). Serotype 6C pneumococci were associated with 12 CCs, 5 of which defined 83% of the serotype 6C genomes. All but three strains of serotype 6B pneumococci were members of either CC176 (n = 66) or CC138 (n = 39), and all four serotype 6D genomes were ST4407.
Serotype switching among serogroup 6 pneumococci. There was clear evidence of serotype switching (horizontal genetic exchange of all or part of the cps locus sequence, conferring a change in serotype) within 11 CCs, since pneumococci within the CC were not exclusively defined by a single serotype (Fig. 5 and 6). Notably, CC315 was represented by three serotypes: serotypes 6E and 6C and the serotype 6C/6E hybrid. CC90 was defined predominantly by serotype 6E pneumococci, except for nine genomes from the Maela refugee camp that were ST5127 (a double-locus variant of ST90) and had a serotype 6A cps locus. CC1094 was a major South African lineage of serotype 6A, although three genomes isolated in the mid-1980s were serotype 6E.
The cps locus sequences of all 44 genomes associated with putative serotype switches were manually inspected and confirmed. All serotype switches were clearly evident from the sequence, either by an exchange of the entire cps locus or by mosaic patterns of DNA sequence fragments indicative of recombination events within the cps locus (see Fig. S4 in the supplemental material) (17,18,22–25). No other hybrid cps loci were identified apart from the serotype 6C/6E hybrid described above.
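The first screening step, flagging CCs whose members are not all of one serotype, can be sketched in Python as below. This only identifies candidates; as stated above, the putative switches in the study were confirmed by manual inspection of the cps locus sequences. The function name and the toy records are hypothetical, although the CC and serotype combinations mirror examples given in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def candidate_switch_ccs(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return clonal complexes that contain more than one serotype."""
    serotypes_by_cc = defaultdict(set)
    for clonal_complex, serotype in records:
        serotypes_by_cc[clonal_complex].add(serotype)
    return {
        cc: sorted(serotypes)
        for cc, serotypes in serotypes_by_cc.items()
        if len(serotypes) > 1
    }


# Hypothetical toy records of (clonal complex, sequence-based serotype).
toy_records = [
    ("CC90", "6E"), ("CC90", "6E"), ("CC90", "6A"),
    ("CC315", "6E"), ("CC315", "6C"), ("CC315", "6C/6E hybrid"),
    ("CC1094", "6A"), ("CC1094", "6E"),
    ("CC176", "6B"), ("CC176", "6B"),
]
flagged = candidate_switch_ccs(toy_records)
print(flagged)
```

A CC with a single serotype, such as CC176 in the toy data, is not flagged.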
Genome-wide diversity among serotypes. Finally, all 974 genomes were compared to the PMEN2 genome reference with the Genome Comparator module of BIGSdb to investigate the presence or absence and diversity of 2,352 genes across each genome. These data were depicted as a pseudoheat map, ordered by serotype and CC (Fig. 7). A number of observations were immediately apparent. As expected, the genomes of CC90-6E (CC-serotype) pneumococci were very similar to the PMEN2 reference (ST90-6E) genome used for comparison (largest white horizontal band). However, they were not identical: the bacteriophage sequences in the PMEN2 reference were either not present or of a different sequence in the CC90-6E study pneumococci, and there was a region of variable genes (blue) in the latter half of the genome in addition to several smaller variable regions in some genomes within CC90-6E. The serotype switch CC90-6A pneumococci were highly similar across the genome to the CC90-6E pneumococci, although CC90-6A genomes also did not have the PMEN2-like bacteriophages and smaller variable regions were identified. The STs within CC273-6E, CC4405-6E, and CC490-6A had some MLST alleles (5, 1-4, and 1, respectively) in common with the STs in CC90, but all possessed genes across the genome that matched CC90-6E identically (genes colored white). In contrast, across the genome, the CC315-6E genes were mainly of different alleles (i.e., mainly genes colored blue). Future studies will use these genome-wide data to investigate whether specific genotypic differences among serogroup 6 lineages relate to phenotypic differences between lineages and/or serotypes.

Vaccine-induced inhibition of serotype 6E.
Finally, a key question was whether or not PCVs would provide immunological protection against serotype 6E pneumococci. Stored sera from infants in the United Kingdom who had previously been vaccinated with either PCV7 or PCV13 were available and used to test for killing of serotype 6E pneumococci. Five strains were tested, and the results are shown in Table 4. Antibodies induced by both PCV7 and PCV13 mediated the killing of the five strains tested. Removal of antibodies specific for serotype 6B polysaccharide abolished the killing completely in most of the sera tested. Removal of serotype 6A antibodies varied by serum analyzed but in general was similar, irrespective of the type of vaccine used (6B alone in PCV7 or both 6B and 6A in PCV13), with only partial inhibition of killing demonstrated, findings consistent with those of a previous study of serotype 6A inhibition of serotype 6B killing (31). Anti-serotype 6C antibody removal also varied depending on the serum used but in general had little effect on killing.

DISCUSSION
This is the first in-depth large-scale interrogation of the genomic epidemiology of serogroup 6 strains, and it illustrates that serotype 6E pneumococci have been circulating for at least 33 years, preceding PCV introduction by nearly 2 decades. Our study revealed that 43% of the genome collection (previously thought to contain predominantly serotypes 6A, 6B, and 6C) in fact represented serotype 6E pneumococci of several major genetic lineages, three of which were multidrug-resistant PMEN lineages. They were distributed across 15 countries and five continents, and the pneumococcal PubMLST database provides evidence of an even wider geographical distribution. Serotype 6E pneumococci caused a range of diseases among all age groups but were also frequently recovered from healthy young children. We identified several major genetic clusters of serotype 6A cps locus sequences, discovered a new hybrid 6C/6E serotype, and revealed many examples of serotype switching involving serotypes 6A, 6B, 6C, 6D, and 6E. Importantly, serological assays demonstrated that vaccine-induced serotype 6B antibodies were able to mediate the killing of serotype 6E pneumococci.
For several decades, the existence of serotype 6E pneumococci was obscured because they cross-react with the serotype 6B antisera used in the Quellung reaction, and it was only by inspection of the cps locus sequences that the existence of serotype 6E was realized. Initially, serotype 6E was recognized by several research groups analyzing key regions of gene sequences within small collections of isolates, and now the high prevalence and worldwide distribution of serotype 6E have been revealed unequivocally here by the interrogation of a global and historical collection of genome sequences. In effect, the majority of what for many years were thought to be serotype 6B isolates were in fact serotype 6E pneumococci. "True" serotype 6B pneumococci were also identified in our study, but they were mainly of two genetic lineages, CC138 and CC176, both of which have also been detected in many countries around the world (http://pubmlst.org/spneumoniae/).
Our study revealed the sequence diversity among serogroup 6 cps loci, and of particular note was the finding of three major serotype 6A cps locus sequence clusters. Do the sequence-based changes in the serotype 6A cps locus result in changes to the polysaccharide, and if so, do PCV-induced serotype 6A antibodies differentially protect against alternative versions of serotype 6A polysaccharide? This warrants further investigation. Moreover, in this study, we discovered the serotype 6C/6E hybrid, which is sufficiently divergent to presumptively consider it yet another serotype, as well as evidence of many distinct serotype switches among the genetic lineages. Yet it is important to recognize that serotype switching and the creation of cps locus genetic variants appear to be normal biological processes among pneumococci and are not a direct consequence of vaccine use (25). However, vaccine-induced immune pressure does alter the pneumococcal population structure, which can select for the emergence of new genetic variants. The earliest reported evidence of vaccine escape pneumococci was the result of such a scenario (17–19). PCVs perturb the pneumococcal population with unpredictable consequences for those serotypes not targeted by the vaccines; therefore, the importance of genomics and molecular epidemiology in any pneumococcal surveillance program cannot be overestimated.
A detailed comparison of serogroup 6 polysaccharide biochemistry should be investigated as a matter of priority, to understand the biochemical structure of the polysaccharides in the context of the observed serotype-specific epidemiology and killing mediated by serotype 6B antibodies. It may be that the structures of the serotype 6B and 6E polysaccharides are similar enough to explain the inhibition of serotype 6E pneumococci by PCV-in-  (encoding a glucosyltransferase). The authors concluded that small changes in the sequence not only resulted in new capsular types, but they also posited that these changes could confer immunological changes in the human host response (10).
The majority of these serotype 6E study isolates were collected before PCV implementation, and our study showed that PCV7- and PCV13-induced antibodies to serotype 6B were protective. Therefore, the overall prevalence of serotype 6E should be significantly reduced after PCV implementation, and PCV vaccine impact studies in many countries have demonstrated a significant reduction in the prevalence of serotype 6B and near elimination of serotype 6B carriage (5,6). This suggests the possibility that (i) PCV7 and PCV13, which contain serotype 6B polysaccharides, inhibit serotype 6B and sufficiently cross-protect against serotype 6E at the population level or (ii) the "serotype 6B" polysaccharides in the vaccines are actually serotype 6E polysaccharides. We have been unable to identify which strain(s) was specifically used to produce the serotype 6B polysaccharides used in the PCVs, in order to confirm the serotype on the basis of the cps locus sequence.
Whether or not current PCVs are, in fact, serotype 6E vaccines remains an open and important question. The overwhelming success of PCVs in reducing serotype 6B (6E) disease suggests that perhaps the question is purely an academic one; however, it would be relevant in cases of vaccine failure where there may be discordance between the vaccine serotype and the serotypes of pneumococci associated with vaccine failure. One clue to a mismatch might be if pneumococci responsible for serotype 6B vaccine failures were of the ST138 or ST176 lineage, as these appear to be the predominant (true) serotype 6B lineages that circulate worldwide. Country-specific estimates of the prevalence of serotype 6E before and after PCV implementation will be essential to assessing whether PCVs are protective against serotype 6E at the population level. Genomics has revolutionized microbiological research, and our study reinforces just how influential the change has been and will continue to be.

FIG 7 Visual representation of the Genome Comparator output for all 974 genomes as a pseudoheat map. Each genome is depicted horizontally and in the gene order defined by the reference PMEN2 genome sequence, which has 2,352 coding sequences (genes). Colors indicate the gene-by-gene presence or absence and sequence similarity of each query genome compared to the PMEN2 reference as follows: gray, the gene is not present in the query genome; white, the gene is present in the query genome and has a sequence identical to that of the reference genome; blue, the gene is present in the query sequence, but the sequence is not identical to that of the reference. Several regions of the PMEN2 reference genome are highlighted as follows: P1, phage remnant; P2, 11865-like phage; P3, 2167-like phage; cps, capsular locus; A1, ATP-synthase operon; ICESp6BST90, ICE element; Variable, variable region of the gene sequences in CC90-6E.
It is now relatively simple and cost-effective to use next-generation sequencing to obtain a (nearly) complete bacterial genome sequence, and the early challenges of contig assembly, the availability of databases in which to store and query genome sequences, and the development of tools for the analysis of large-scale databases are being overcome. There are currently >10,000 pneumococcal genome sequences available in public databases, and other genome sequencing projects are under way. Challenges remain, including making published genome assemblies widely accessible to all users, but the genomics field is moving apace.