ABSTRACT
Haemophilus haemolyticus has been recently discovered to have the potential to cause invasive disease. It is closely related to nontypeable Haemophilus influenzae (NT H. influenzae). NT H. influenzae and H. haemolyticus are often misidentified because none of the existing tests targeting the known phenotypes of H. haemolyticus are able to specifically identify H. haemolyticus. Through comparative genomic analysis of H. haemolyticus and NT H. influenzae, we identified genes unique to H. haemolyticus that can be used as targets for the identification of H. haemolyticus. A real-time PCR targeting purT (encoding phosphoribosylglycinamide formyltransferase 2 in the purine synthesis pathway) was developed and evaluated. The lower limit of detection was 40 genomes/PCR; the sensitivity and specificity in detecting H. haemolyticus were 98.9% and 97%, respectively. To improve the discrimination of H. haemolyticus and NT H. influenzae, a testing scheme combining two targets (H. haemolyticus purT and H. influenzae hpd, encoding protein D lipoprotein) was also evaluated and showed 96.7% sensitivity and 98.2% specificity for the identification of H. haemolyticus and 92.8% sensitivity and 100% specificity for the identification of H. influenzae, respectively. The dual-target testing scheme can be used for the diagnosis and surveillance of infection and disease caused by H. haemolyticus and NT H. influenzae.
INTRODUCTION
Haemophilus haemolyticus is a human commensal that colonizes the respiratory tract. It shares high similarities in morphology, biochemistry, and genetics with nontypeable Haemophilus influenzae (NT H. influenzae) (1–3). NT H. influenzae has emerged as the major cause of invasive H. influenzae disease since the implementation of H. influenzae serotype b (H. influenzae b) vaccine (4). NT H. influenzae infections such as childhood otitis media and respiratory tract infections in adults with chronic obstructive pulmonary disease (COPD) result in an enormous social and economic burden to societies (5). H. haemolyticus was recently reported to cause invasive disease (1). These cases were previously misidentified as NT H. influenzae due to the lack of proper identification methods to discriminate between the two species (6). Reevaluation of NT H. influenzae cases from previous years revealed that about 2% (7/374) of the NT H. influenzae invasive cases reported from Active Bacterial Core surveillance (ABCs) were in fact caused by H. haemolyticus (1).
Misidentification of H. haemolyticus as NT H. influenzae has been repeatedly reported by multiple research groups, with up to 40% of isolates being misidentified as NT H. influenzae using classical phenotypic methods in clinical microbiology laboratories (1, 7–11). The misidentification of H. haemolyticus has a potential impact on accurate assessment of the prevalence of antibiotic-resistant NT H. influenzae (12). Therefore, a rapid and accurate method is needed to distinguish H. haemolyticus from NT H. influenzae and to better understand the epidemiology of H. haemolyticus infections. Testing schemes, including both standard microbiological and molecular methods, have been proposed to improve the identification of the two bacteria. These molecular methods target a number of genes such as the lipo-oligosaccharide gene lgtC, the IgA protease gene iga, heme acquisition genes (hxuABC, hemR, and hup), the fuculose kinase gene fucK, and the Haemophilus protein D gene (hpd), with hpd-based PCR being the best method for discriminating NT H. influenzae and H. haemolyticus (13–16). However, identification of H. haemolyticus relies on negative results, as all these genes are present in H. influenzae strains and absent in the majority of H. haemolyticus strains. A test for specific identification of H. haemolyticus is needed to better understand the epidemiology of H. haemolyticus disease. In this study, we conducted comparative genomic analysis to identify unique target genes that are exclusively present in H. haemolyticus strains and can be used to develop PCR assays for specific detection and identification of H. haemolyticus.
MATERIALS AND METHODS
Bacterial strains and growth conditions. Bacterial strains used in this study included ATCC strains, clinical invasive isolates, and carriage isolates. Eighty-nine clinical invasive H. influenzae isolates were collected as part of the Active Bacterial Core surveillance (ABCs) of the Centers for Disease Control and Prevention's Emerging Infections Program (http://www.cdc.gov/abcs/reports-findings/surv-reports.html). Seventy-eight H. influenzae isolates and 45 H. haemolyticus isolates were collected during a carriage survey in Minnesota in 2009 (17). Additional H. haemolyticus isolates were kindly provided by University at Buffalo, State University of New York (n = 31), and University of Michigan Medical Center (n = 104). The clinical and carriage isolates in this study have been characterized and confirmed using standard microbiology and 16S rRNA gene sequencing.
Whole-genome sequencing and comparison. Genome sequences of the five H. haemolyticus strains were used for comparison with those of H. influenzae. Details of whole-genome sequencing of the strains and the assembled sequences have been reported previously (18). Twenty complete and draft H. influenzae (19 NT H. influenzae and 1 H. influenzae b genome) genome sequences were downloaded from the NCBI RefSeq and Shotgun Assembly Sequence database (19) (Table 1). The evolutionary relationships among these H. haemolyticus and H. influenzae strains were characterized using analysis of 16S rRNA gene sequences and multilocus sequence typing (MLST) locus sequences. 16S rRNA and MLST locus sequences were obtained from the annotated GenBank entries at NCBI. Multiple copies of 16S rRNA loci were retained with postfix .a, .b, etc. 16S rRNA and concatenated sequences of the seven MLST loci were aligned using the program MUSCLE (20), and the resulting alignments were used to reconstruct phylogenetic trees using the neighbor-joining method (21) as implemented in the program MEGA (22). Evolutionary relationships among the H. haemolyticus and H. influenzae strains were also characterized using whole-genome sequences by computing the average nucleotide identity (ANI) (23, 24) between each pair of genomes using the program MUMmer (25).
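The distance-based tree building described above starts from pairwise distances between aligned sequences. The following is a minimal sketch of the p distance (the proportion of compared sites that differ), not the MEGA implementation. The function names, the gap-skipping rule, and the toy alignment are assumptions made for illustration only.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap in either sequence are skipped (an assumed
    pairwise-deletion rule; other gap treatments are possible).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p distances for every pair of sequences in an alignment."""
    names = sorted(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(names, 2)
    }


# Hypothetical toy alignment; the labels are not strains from this study.
toy_alignment = {
    "Hh_1": "ACGTACGTAC",
    "Hh_2": "ACGTACGTAT",
    "Hi_1": "ACGAAC-TTT",
}
toy_distances = distance_matrix(toy_alignment)
```

A matrix of this kind is the input that a neighbor-joining routine would consume; the tree reconstruction itself is left to dedicated software.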
Table 1. H. haemolyticus and H. influenzae strains, phenotypes, and whole-genome sequences analyzed in this study
Identification of H. haemolyticus-specific genes by comparative genomics. Complete gene (protein-coding) sets from the H. haemolyticus and H. influenzae genome sequences were obtained from the annotated GenBank entries at NCBI (Table 1). All-against-all nucleotide sequence comparison of the complete gene sets was performed using the BLASTCLUST algorithm (26) with default parameters. BLASTCLUST uses single-linkage clustering to create clusters of genes that share sequence similarity and coverage above a given threshold. The resulting clusters are exclusive, with each gene mapping to exactly one cluster. The results of the clustering procedure were visualized as a presence/absence matrix.
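To illustrate what single-linkage clustering of pairwise hits means in practice, the sketch below groups genes with a small union-find structure. It is not BLASTCLUST itself. The hit format, the gene labels, and the threshold values are hypothetical, and the thresholds are placeholders rather than the program's defaults.

```python
def single_linkage_clusters(genes, hits, min_identity=0.9, min_coverage=0.9):
    """Group genes so that any pair linked by a passing hit shares a cluster.

    hits: iterable of (gene_a, gene_b, identity, coverage) tuples, with
    identity and coverage expressed as fractions (assumed input format).
    """
    parent = {gene: gene for gene in genes}

    def find(gene):
        # Follow parent links to the cluster root, compressing the path.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for gene_a, gene_b, identity, coverage in hits:
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(gene_a), find(gene_b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for gene in genes:
        clusters.setdefault(find(gene), set()).add(gene)
    # Every gene ends up in exactly one cluster.
    return sorted((sorted(members) for members in clusters.values()),
                  key=lambda members: members[0])


# Hypothetical genes and hits for illustration only.
toy_genes = ["Hh1|purT", "Hh2|purT", "Hh3|purT", "Hi1|hpd", "Hi2|hpd"]
toy_hits = [
    ("Hh1|purT", "Hh2|purT", 0.97, 1.00),
    ("Hh2|purT", "Hh3|purT", 0.96, 0.98),
    ("Hi1|hpd", "Hi2|hpd", 0.99, 1.00),
    ("Hh1|purT", "Hi1|hpd", 0.40, 0.30),  # fails both thresholds
]
toy_clusters = single_linkage_clusters(toy_genes, toy_hits)
```

In this toy input, Hh1|purT and Hh3|purT have no direct hit but still share a cluster because each is linked to Hh2|purT, which is the defining property of single linkage.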
Sanger sequencing of target genes. Three genes (purT, coding for putative phosphoribosylglycinamide formyltransferase 2; hdg, coding for putative hydrogenase-2 small chain; and sod, coding for putative superoxide dismutase [Cu-Zn]-like) from five additional Haemophilus haemolyticus isolates were sequenced using the Sanger sequencing method to validate their sequence conservation levels. Primers for amplification and sequencing of target genes are shown in Table 2. The PCR amplification and DNA Sanger sequencing were performed as described previously using chromosomal DNA template extracted from five additional isolates (27). The gene sequences were assembled using DNAStar Lasergene 9 (DNAStar, Inc., Madison, WI).
Table 2. Primers and probes used for DNA sequencing and real-time PCR assays
RT-PCR. Chromosomal DNA and crude cell lysates were prepared as described previously (28). Primer Express 3.0 (Applied Biosystems) was used to design appropriate primers and probes for detecting purT. All primers and probes used in this study were optimized by testing in the range of 100 to 900 nM and 100 to 300 nM, respectively. Real-time PCR (RT-PCR) was performed as described previously (28). A chromosomal DNA preparation or crude cell lysate was considered positive if the cycle threshold (CT) value was equal to or less than 35 and negative if the CT value was greater than 40. If a CT value was between 35 and 40, the sample was diluted 10-fold and retested to determine if PCR inhibitors were present. The specimen was considered positive if the CT value of the diluted specimen was equal to or less than 35 and negative if the CT value was greater than 35. To determine the lower limit of detection (LLD), genomic DNA was extracted from H. haemolyticus isolates (28). The DNA concentration was determined using a NanoDrop spectrophotometer (NanoDrop Technologies, Wilmington, DE), adjusted to 20 ng/μl, and then 10-fold serially diluted in PCR-grade water. Each dilution was tested by PCR in triplicate. DNA concentration was converted to genome equivalents per microliter on the basis of 1.8 Mb (carriage isolate) and 2.0 Mb (invasive isolate) per H. haemolyticus genome (AFQN00000000, AFQO00000000, AFQP00000000, AFQQ00000000, AFQR00000000). The LLD for an RT-PCR assay was defined as the DNA concentration that yielded a CT value of 35. Confidence intervals for the H. haemolyticus and H. influenzae sensitivity and specificity values, calculated based on the results of the RT-PCR assays, were determined using standard validation analyses. The validation analyses were performed using SAS 9.2 (SAS Institute Inc., Cary, NC), and exact binomial 95% confidence intervals were estimated using Stata v. 9.2 (StataCorp).
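The CT interpretation rule and the conversion from DNA concentration to genome equivalents can be sketched as follows. This is an illustration under stated assumptions, not the laboratory's software. The function names are hypothetical, a reaction with no amplification is treated as negative, a CT of exactly 40 is sent for retesting, and the average mass of 650 g/mol per base pair is a commonly used approximation that the text does not specify.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
BP_MOLAR_MASS = 650.0         # g/mol per base pair (assumed approximation)


def genome_copies_per_ul(ng_per_ul: float, genome_size_bp: float) -> float:
    """Convert a DNA concentration to genome equivalents per microliter."""
    grams_per_genome = genome_size_bp * BP_MOLAR_MASS / AVOGADRO
    return ng_per_ul * 1e-9 / grams_per_genome


def call_initial(ct):
    """First-pass call: CT <= 35 positive, CT > 40 negative, else retest.

    ct is None when no amplification was detected (assumed negative).
    """
    if ct is None or ct > 40:
        return "negative"
    if ct <= 35:
        return "positive"
    return "retest"  # dilute 10-fold and run again


def call_diluted(ct):
    """Call for the 10-fold diluted retest: CT <= 35 positive, else negative."""
    if ct is not None and ct <= 35:
        return "positive"
    return "negative"


# Starting concentration from the text, for the two stated genome sizes.
carriage_copies = genome_copies_per_ul(20.0, 1.8e6)
invasive_copies = genome_copies_per_ul(20.0, 2.0e6)
```

With these assumptions, the 20 ng/μl stock corresponds to roughly 10^7 genome equivalents per microliter, so the 10-fold dilution series spans the range needed to locate the concentration at which CT reaches 35.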
RESULTS
Comparative genomic analysis of H. haemolyticus and H. influenzae. Comparative analyses of 25 whole-genome sequences of H. haemolyticus (5) and H. influenzae (20) were performed in an effort to assess whether strains from these two species, which are often confused using classical phenotypic or biochemical methods, can be clearly distinguished at the genomic level. Phylogenetic analysis using individual (16S rRNA) or multiple (MLST) gene loci clearly distinguishes H. haemolyticus and H. influenzae evolutionary lineages (Fig. 1A and B). Comparison of whole-genome sequences using the ANI technique also yields unambiguous discrimination between the H. haemolyticus and H. influenzae strains (Fig. 1C). These results suggested the possibility that there may be individual gene loci with presence/absence patterns that distinguish H. haemolyticus from H. influenzae. Such loci would represent ideal targets for a real-time PCR (RT-PCR) typing scheme.
Fig. 1. Comparison of H. haemolyticus versus H. influenzae evolutionary lineages. H. haemolyticus and H. influenzae strains analyzed here are labeled with their species abbreviations and the strain names shown in Table 1. (A and B) Phylogenetic trees based on 16S rRNA gene (A) and concatenated MLST loci (B) showing the evolutionary relationships of the H. haemolyticus and H. influenzae genome sequences analyzed here. Percent bootstrap values indicate support for internal nodes on the trees, and the branch length scale bars show P distances. (C) Results of whole-genome sequence comparisons among H. haemolyticus and H. influenzae based on ANI analysis. ANI values (percentages) between pairs of genomes are color coded as shown in the key, and the relationships among the genomes based on these values are shown as dendrograms on both axes.
Identification of H. haemolyticus-specific RT-PCR gene targets. All-against-all comparison of complete gene sets from the 25 genomes analyzed here was used to define clusters of homologous genes that are exclusive to the H. haemolyticus lineage. A total of 93 clusters were found with homologous genes that were present in all 5 of the H. haemolyticus genomes and absent in all 20 of the H. influenzae genomes (Fig. 2). All of these clusters represent potential gene targets for H. haemolyticus-specific RT-PCR, and clusters with conserved flanking regions (for primer and probe binding sites) were considered for the development of RT-PCR assays. Three genes (purT, coding for putative phosphoribosylglycinamide formyltransferase 2; hdg, coding for putative hydrogenase-2 small chain; and sod, coding for putative superoxide dismutase [Cu-Zn]-like) showed high sequence similarity among the 5 H. haemolyticus genomes (Fig. 3) and were further characterized using Sanger sequencing from 5 additional H. haemolyticus isolates in order to confirm the sequence similarity. Alignment of the 10 sequences for each of the 3 genes revealed a potential RT-PCR target purT (see Fig. S1 in the supplemental material); purT sequences showed more than 96% identity among the 10 H. haemolyticus strains. Two conserved regions within purT were selected for designing primers and probes for RT-PCR assays purT 1 and 2 (see Fig. S1 in the supplemental material). Sequences of the purT primers and probes were determined to be specific to H. haemolyticus by BLAST search against GenBank (Table 2).
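The selection criterion used here (present in every H. haemolyticus genome, absent from every H. influenzae genome) is a simple set filter over a presence/absence table. The sketch below shows one way to express it. The function name, the cluster identifiers, and the genome labels are hypothetical and do not correspond to the data in this study.

```python
def exclusive_clusters(presence, target_genomes, other_genomes):
    """Return clusters found in all target genomes and in no other genome.

    presence: dict mapping a cluster identifier to the set of genome
    names that carry at least one member of that cluster.
    """
    target = set(target_genomes)
    other = set(other_genomes)
    return sorted(
        cluster_id
        for cluster_id, genomes in presence.items()
        if target <= genomes and not (other & genomes)
    )


# Hypothetical toy matrix: 2 target genomes and 3 comparison genomes.
hh_genomes = ["Hh_A", "Hh_B"]
hi_genomes = ["Hi_A", "Hi_B", "Hi_C"]
toy_presence = {
    "cluster_core": {"Hh_A", "Hh_B", "Hi_A", "Hi_B", "Hi_C"},
    "cluster_purT_like": {"Hh_A", "Hh_B"},
    "cluster_hpd_like": {"Hi_A", "Hi_B", "Hi_C"},
    "cluster_partial": {"Hh_A"},          # missing from one target genome
    "cluster_shared": {"Hh_A", "Hh_B", "Hi_C"},
}
hh_specific = exclusive_clusters(toy_presence, hh_genomes, hi_genomes)
hi_specific = exclusive_clusters(toy_presence, hi_genomes, hh_genomes)
```

Swapping the two genome lists yields the converse set, which is how a positive marker for the other species could be screened with the same filter.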
Fig. 2. Homologous gene cluster presence/absence matrix. Genomes are shown as rows, and homologous gene clusters are shown as columns. The presence of a gene cluster in a genome is indicated by green, and cluster absence is indicated by blue. H. haemolyticus genomes are shown on top of the matrix, and H. influenzae genomes are shown below. Core clusters found in all genomes are shown on the right of the matrix, and clusters exclusive to H. haemolyticus (i.e., potential RT-PCR targets) are shown in the upper left corner of the matrix.
Fig. 3. H. haemolyticus-specific potential RT-PCR target genes. The presence (green) and absence (blue) of 3 H. haemolyticus-specific genes (hdg, purT, and sod) are shown for the H. haemolyticus (n = 10) and H. influenzae (n = 20) strains analyzed here. Strain names are as shown in Table 1. The percent identities of H. haemolyticus-specific genes to reference genes from strain M19107 are color coded as shown in the key. The sequences of hdg, purT, and sod of six H. haemolyticus strains (marked with asterisks) were determined by Sanger sequencing. Five H. haemolyticus strains were sequenced by WGS. The M19107 strain was characterized by both methods.
Evaluation of the RT-PCR assays. Optimal concentrations of the RT-PCR primers and probes for the purT 1 and 2 assays are listed in Table 2. Of the 65 strains of non-H. haemolyticus bacterial species that were tested by the two purT assays (Table 3), none was positive for purT, suggesting that the assays were specific for H. haemolyticus. Both assays had consistently low average LLDs under the tested conditions (CT values of 35 at 29 to 40 genomes/PCR), and the amplification curve reached a higher plateau for assay 1. As a result, assay 1 was chosen for further validation using H. haemolyticus carriage and invasive strains.
Table 3. Non-H. haemolyticus bacterial species tested in this study
A total of 347 strains (180 H. haemolyticus and 167 H. influenzae strains identified by 16S rRNA gene sequencing) were tested to determine the sensitivity and specificity of the purT 1 assay for the detection of H. haemolyticus. Of these strains, 178/180 H. haemolyticus strains were positive for purT (98.9% sensitivity), 2/180 H. haemolyticus strains were negative for purT, and 5/167 H. influenzae strains were positive for purT (97.0% specificity). The hpd assay detects the Haemophilus protein D-encoding gene and is currently used for specific detection of H. influenzae regardless of the capsulation status (14). Using the same strain collection, the sensitivity and specificity of the hpd assay were 94% (157/167) and 97.8% (176/180) for the identification of H. influenzae, respectively.
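The single-assay figures follow directly from the counts given above, and an exact binomial (Clopper-Pearson) interval can be attached to each proportion. The sketch below uses only the standard library and finds the interval bounds by bisection. It illustrates the calculation and is not the SAS or Stata procedure used in this study, so the interval values it produces should not be read as reported results. The function names are hypothetical.

```python
from math import comb


def binom_pmf(i: int, n: int, p: float) -> float:
    """Probability of exactly i successes in n trials."""
    return comb(n, i) * p**i * (1.0 - p) ** (n - i)


def exact_binomial_ci(successes: int, n: int, alpha: float = 0.05):
    """Clopper-Pearson interval via bisection on the binomial tail sums.

    Adequate for sample sizes of a few hundred, as in this sketch.
    """
    def bisect(tail, increasing):
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2.0
            # Step toward the p at which tail(p) equals alpha / 2.
            if (tail(mid) < alpha / 2.0) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    def upper_tail(p):   # P(X >= successes), increasing in p
        return sum(binom_pmf(i, n, p) for i in range(successes, n + 1))

    def lower_tail(p):   # P(X <= successes), decreasing in p
        return sum(binom_pmf(i, n, p) for i in range(0, successes + 1))

    lower = 0.0 if successes == 0 else bisect(upper_tail, True)
    upper = 1.0 if successes == n else bisect(lower_tail, False)
    return lower, upper


# purT 1 assay counts from the text: 178/180 true positives,
# 5/167 false positives (so 162/167 true negatives).
sensitivity = 178 / 180
specificity = 162 / 167
sens_ci = exact_binomial_ci(178, 180)
spec_ci = exact_binomial_ci(162, 167)
```

The same two calls with 157/167 and 176/180 would give the corresponding proportions for the hpd assay.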
A testing scheme combining the purT 1 and hpd assays was validated for distinguishing H. haemolyticus from H. influenzae, with purT+/hpd− being the expected genotype for H. haemolyticus and purT−/hpd+ for H. influenzae (Fig. 4A). Of the 180 H. haemolyticus strains, 174 were positive for purT and negative for hpd (purT+/hpd−). Four were positive for purT and hpd (purT+/hpd+), and 2 were negative for both genes (purT−/hpd−), which are not the expected H. haemolyticus genotypes. Of the 167 H. influenzae strains, 155 were negative for purT and positive for hpd (purT−/hpd+). Two were purT+/hpd+, 7 were purT−/hpd−, and 3 were purT+/hpd−, which are not the expected genotypes for H. influenzae. Using 16S rRNA gene sequencing as the reference standard, the sensitivity and specificity of the testing scheme were 96.7% and 98.2% for the detection of H. haemolyticus, respectively, and 92.8% and 100% for the detection of H. influenzae, respectively (Fig. 4B).
Fig. 4. Results of RT-PCR assays for the discrimination of H. haemolyticus versus H. influenzae strains. (A) Four possible results for combined purT and hpd RT-PCR assays are evaluated: purT+/hpd−, purT−/hpd+, purT+/hpd+, and purT−/hpd−. The numbers of H. haemolyticus (red) and H. influenzae (blue) strains with each combination are shown. (B) Sensitivity and specificity values for discrimination of H. haemolyticus and H. influenzae based on purT or hpd RT-PCR assays alone compared to combined RT-PCR assays for H. haemolyticus (purT+/hpd−) and H. influenzae (purT−/hpd+).
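The dual-target percentages can be rederived from the genotype tallies in the preceding paragraph. The sketch below is illustrative only. The data layout and function name are assumptions, and the zero count of H. haemolyticus strains with the H. influenzae genotype is inferred from the fact that the three reported categories already sum to 180.

```python
# Genotype keys are (purT result, hpd result); counts are from the text.
GENOTYPE_COUNTS = {
    "H. haemolyticus": {
        ("+", "-"): 174,
        ("+", "+"): 4,
        ("-", "-"): 2,
        ("-", "+"): 0,   # implied by the total of 180
    },
    "H. influenzae": {
        ("-", "+"): 155,
        ("+", "+"): 2,
        ("-", "-"): 7,
        ("+", "-"): 3,
    },
}

EXPECTED_GENOTYPE = {
    "H. haemolyticus": ("+", "-"),
    "H. influenzae": ("-", "+"),
}


def scheme_performance(counts, species):
    """Sensitivity and specificity of the expected genotype for one species."""
    genotype = EXPECTED_GENOTYPE[species]
    own = counts[species]
    sensitivity = own.get(genotype, 0) / sum(own.values())

    others_total = 0
    others_without_genotype = 0
    for other_species, tally in counts.items():
        if other_species == species:
            continue
        n = sum(tally.values())
        others_total += n
        others_without_genotype += n - tally.get(genotype, 0)
    specificity = others_without_genotype / others_total
    return sensitivity, specificity


hh_sens, hh_spec = scheme_performance(GENOTYPE_COUNTS, "H. haemolyticus")
hi_sens, hi_spec = scheme_performance(GENOTYPE_COUNTS, "H. influenzae")
```

Under this scheme the 15 strains with purT+/hpd+ or purT−/hpd− results match neither expected genotype, which is why requiring both targets lowers sensitivity slightly while raising specificity for each species.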
DISCUSSION
Historically, H. influenzae has been the most important species of Haemophilus causing invasive human disease and fatalities (4). Bacteremia caused by other Haemophilus species has not been frequently reported. H. haemolyticus was recently reported to cause invasive disease in the United States (1). Little is known about the mechanisms of H. haemolyticus pathogenicity, and H. haemolyticus may act primarily as an opportunistic pathogen. By comparative genomic analyses, we did not find in H. haemolyticus strains any genes coding for the biosynthesis of capsular polysaccharide, a major virulence factor that confers bacterial resistance to phagocytosis and complement-mediated host defense. However, genes encoding factors for colonization and invasion such as pili and IgA1 protease are present in these strains.
Unambiguous discrimination of the closely related species H. influenzae and H. haemolyticus is important for laboratory diagnosis and surveillance of H. influenzae and H. haemolyticus disease and carriage evaluations. Over the last decade, considerable research effort has focused on identifying molecular targets and suitable methodologies to differentiate NT H. influenzae from H. haemolyticus. One of the principal phenotypic differences between H. haemolyticus and NT H. influenzae is hemolysis by H. haemolyticus. However, this difference is often unreliable as H. haemolyticus can lose the defining hemolytic phenotype upon in vitro passage (1–3, 11, 29–31). A number of genetic targets (hpd, ompP2, ompP6, lgtC, 16S rRNA, fucK, and iga) have been evaluated for differentiation of NT H. influenzae from H. haemolyticus (15). However, no single gene target tested was able to unequivocally differentiate NT H. influenzae and H. haemolyticus. For example, 16S rRNA gene PCR permitted identification of H. haemolyticus and NT H. influenzae for only 90% of strains (6), while the ompP6-based assay detects about 97% of NT H. influenzae strains but also detects 12% of H. haemolyticus strains (2). Identification of H. haemolyticus-specific targets was challenging due to the lack of available H. haemolyticus genome sequences.
Whole-genome sequencing (WGS) provides a valuable tool for better understanding H. haemolyticus genetics and for identifying unique genetic targets for assay development. Recently, five H. haemolyticus strains were sequenced and annotated by our group (18). Comparative analysis revealed several H. haemolyticus-specific genes, including the purT gene; these genes were found to be present exclusively among 5 H. haemolyticus genomes and absent from 20 H. influenzae genomes. purT sequences are highly conserved among all of the H. haemolyticus strains examined in this study, which makes it an ideal target for RT-PCR development. While the purT assay was found to be highly sensitive and specific for H. haemolyticus detection, the addition of testing for the presence of hpd might provide enhanced discrimination for the detection of both organisms and could be used to enhance diagnosis and surveillance of H. haemolyticus and H. influenzae infections.
A few exceptions were observed (Fig. 4A): four H. haemolyticus and two H. influenzae strains were positive for both purT and hpd genes. Because of the high frequency of horizontal gene transfer among and between Haemophilus species, it is conceivable that, during pharyngeal colonization with both species, H. haemolyticus strains can acquire H. influenzae genes and vice versa. Two H. haemolyticus and seven H. influenzae strains were negative for both purT and hpd genes. The results seen with purT and hpd PCR-negative isolates may have been due to deletions at that locus or to sequence variation at any of the primer or probe binding sites (30, 32).
Multiple studies have shown that single-target-based tests are not ideal for discriminating bacterial species, particularly for closely related species, or for classifying the variant strains that have diverged from their species, such as fuzzy species of Neisseria (2, 15, 33). While multitarget-based approaches can improve the sensitivity and specificity, developing multiple tests for each bacterial species is a huge undertaking. WGS can potentially serve as a universal high-throughput method to improve the accuracy of species delineation and has been proven accurate for bacterial species classification (34). In addition, WGS provides much-enriched genetic information for strain subtyping and will be very useful for disease surveillance and outbreak investigations. Matrix-assisted laser desorption ionization–time of flight mass spectrometry (MALDI-TOF MS) is increasingly used for species classification in diagnostic microbiology laboratories. However, it may not be able to provide sufficient resolution for subtyping of bacterial pathogens, which is often performed in public health microbiology laboratories for disease surveillance and other large epidemiological surveys. The power of MALDI-TOF for identification of H. haemolyticus and H. influenzae is highly dependent on a well-defined reference spectrum for H. haemolyticus in species databases, which varies between laboratories (35, 36). Its utility in discriminating between H. haemolyticus and H. influenzae remains to be further validated.
As the laboratory bioinformatics capacity increases and costs of system acquisition decrease, the advanced molecular detection tools such as WGS and MALDI-TOF may be widely utilized in diagnostic and public health microbiology laboratories. However, rapid and less expensive methods with high throughput, such as PCR, remain valuable today in these laboratories for diagnosis and surveillance of infectious diseases.
ACKNOWLEDGMENTS
We are grateful to ABCs for providing strains and the Bacterial Meningitis Laboratory for technical support.
FOOTNOTES
- Received 13 July 2016.
- Returned for modification 3 August 2016.
- Accepted 26 September 2016.
- Accepted manuscript posted online 5 October 2016.
Supplemental material for this article may be found at http://dx.doi.org/10.1128/JCM.01511-16.
- Copyright © 2016, American Society for Microbiology. All Rights Reserved.