Journal of Clinical Microbiology, October 2007, p. 3251-3256, Vol. 45, No. 10
0095-1137/07/$08.00+0 doi:10.1128/JCM.00898-07
Copyright © 2007, American Society for Microbiology. All Rights Reserved.

Respiratory and Systemic Infection Laboratory,1 Statistics, Modelling, and Bioinformatics Department, Health Protection Agency, Centre for Infections, London, United Kingdom2
Received 30 April 2007/ Returned for modification 17 July 2007/ Accepted 1 August 2007
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
In the research setting the SBT method has been shown to be robust, to have excellent interlaboratory reproducibility, and to yield epidemiologically concordant results. Using this methodology, isolates from cases of travel-associated legionellosis can be examined in one country, and the results can be compared directly with those obtained by coworkers investigating the likely environmental source and other linked cases in another country (3, 8, 20, 22, 23). The public health, political, and economic consequences can be significant if the source of infection is wrongly attributed (e.g., a hotel is erroneously implicated). Therefore, it is essential for the investigating authorities to have confidence in the quality of the typing data reported.
The aims of this international, multicenter study were to (i) assess the ability of national and regional reference laboratories to correctly type coded distributions of L. pneumophila isolates using the standard EWGLI SBT protocol and associated web-based tools and (ii) seek to remedy any deficiencies identified, by using a comprehensive external quality assessment (EQA) program.
| MATERIALS AND METHODS |
|---|
Bacterial isolates. Clinical and environmental isolates were selected from the EWGLI culture collection for inclusion in three distributions. The first distribution comprised 11 L. pneumophila serogroup 1 (sg1) isolates, of which three pairs were epidemiologically related. The second distribution comprised six L. pneumophila sg1 and four non-sg1 isolates, of which two pairs were epidemiologically related. The same epidemiologically related pair (EUL 048 and EUL 056) and an unrelated isolate (EUL 137) were included in the first and second distributions. The second distribution also contained single-locus variants (SLVs): the allelic profile of the epidemiologically related pair (3,10,1,28,14,9) differed from that of an epidemiologically unrelated isolate (3,13,1,28,14,9) at one locus (pilE). The third distribution comprised five L. pneumophila sg1 isolates, of which one pair was epidemiologically related and another pair were SLVs. Isolates were coded prior to dispatch to blind the participants to their identity (Table 1). Replicates of each isolate were prepared on buffered charcoal yeast extract agar (Oxoid) slopes and then dispatched to each participant by courier. The first distribution was dispatched in August 2003, the second in October 2004, and the third in January 2006. With each distribution, the coordinating center provided detailed instructions for completion and submission of results.
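The SLV relationship described above can be illustrated with a short Python sketch that compares two allelic profiles position by position. The function names are hypothetical, and the locus order (flaA, pilE, asd, mip, mompS, proA) is an assumption used only to label positions; the text itself confirms only that the differing locus in this example is pilE.

```python
# Assumed locus order for the six-figure profile (labels only; an assumption).
LOCI = ("flaA", "pilE", "asd", "mip", "mompS", "proA")


def differing_loci(profile_a, profile_b):
    """Return the names of the loci at which two allelic profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must contain the same number of loci")
    return [
        LOCI[i]
        for i, (a, b) in enumerate(zip(profile_a, profile_b))
        if a != b
    ]


def is_slv(profile_a, profile_b):
    """Two profiles are single-locus variants if exactly one locus differs."""
    return len(differing_loci(profile_a, profile_b)) == 1


# Profiles quoted in the text for the second distribution.
related_pair = (3, 10, 1, 28, 14, 9)
unrelated_isolate = (3, 13, 1, 28, 14, 9)

print(differing_loci(related_pair, unrelated_isolate))  # ['pilE']
print(is_slv(related_pair, unrelated_isolate))          # True
```

A participant who mis-called a single allele in either isolate could therefore collapse the two profiles into one, or turn a matching pair into apparent SLVs.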
DNA sequence analysis. Participants were instructed to generate a consensus sequence of the correct length by aligning forward and reverse sequences against a reference sequence. The trimmed reference sequence for each allele was available for download from the EWGLI SBT database pages (http://www.ewgli.org). No specific software package was stipulated for contig assembly.
Allele identification. Alleles (submitted as flat text files) were identified by comparison of consensus sequences with the sequences of preexisting alleles held in the dedicated online EWGLI SBT database. The database returned an allele type, e.g., 1, when a submitted sequence showed a 100% match to a predesignated allele type. Sequences with a <100% match were identified as the closest match to a preexisting allele type, with the number of mismatches specified. Additionally, bioinformatic algorithms supporting the database returned an alignment with mismatches highlighted.
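The matching behavior described above (an exact match, or otherwise the closest allele with a mismatch count) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the EWGLI database code: the function names and the toy sequences are hypothetical, and sequences are assumed to be already trimmed to the reference length so that a position-by-position comparison is meaningful.

```python
def count_mismatches(query, reference):
    """Count positions at which two equal-length sequences differ."""
    if len(query) != len(reference):
        raise ValueError("sequences must be trimmed to the same length")
    return sum(1 for q, r in zip(query.upper(), reference.upper()) if q != r)


def identify_allele(query, alleles):
    """Return (allele_type, mismatches) for the closest stored allele.

    `alleles` maps allele type -> sequence. A result with 0 mismatches is an
    exact match; otherwise it is the closest match. Returns None when no
    stored allele has the same length as the query.
    """
    best = None
    for allele_type, reference in alleles.items():
        if len(reference) != len(query):
            continue  # cannot be compared position by position
        mismatches = count_mismatches(query, reference)
        if best is None or mismatches < best[1]:
            best = (allele_type, mismatches)
    return best


# Toy sequences for illustration only; these are not real SBT alleles.
toy_alleles = {1: "ACGTACGTAC", 2: "ACGTTCGTAC", 3: "ACGAACGTAA"}

print(identify_allele("ACGTTCGTAC", toy_alleles))  # (2, 0): exact match
print(identify_allele("ACGTTCGTAA", toy_alleles))  # (2, 1): closest match
```

A closest match with a small number of mismatches may indicate either a genuinely novel allele or an editing error, which is why the mismatch count and the highlighted alignment are useful to the submitter.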
Reporting arrangements. Results were submitted to the coordinating center by completing and returning a questionnaire indicating the SBT allelic profile for each coded isolate. This form also recorded other information, such as the sequencing chemistry, the sequencing platform, the PCR thermocycler manufacturer and model, and the DNA analysis software and version used for contig assembly.
Scoring of participant results. Analysis of submitted results was undertaken by the coordinating center, and each participating center was provided with a report summarizing the results of the first survey. Participants were each given a confidential center number. Each intended allele for each isolate was scored as "1"; incorrect alleles were scored as "0" (maximum score of 6 per isolate). The number of correct alleles for all isolates was calculated and expressed as a percentage. The inclusion of data for the mompS allele was considered optional; therefore, participants who reported results for only five alleles were scored out of 55, while those who examined all six were scored out of 66. Separately, participants were scored for the number of isolates for which they reported the intended allelic profile (maximum score of 11) and the number of epidemiologically related pairs that were given the same profile (maximum score of 3).
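The allele-level scoring rule can be expressed as a short Python sketch. The function name, the data layout, and the isolate codes and profiles below are hypothetical and serve only to show the arithmetic: one point per intended allele, with the maximum set by the number of isolates multiplied by the number of loci the center examined (e.g., 11 x 5 = 55 or 11 x 6 = 66 in the first distribution).

```python
def score_center(reported, intended, loci_examined):
    """Score one center's alleles against the intended results.

    `reported` and `intended` map isolate code -> {locus: allele type}.
    Each intended allele scores 1; a wrong or missing allele scores 0.
    Returns (correct, maximum, percentage rounded to a whole number).
    """
    maximum = len(intended) * len(loci_examined)
    correct = sum(
        1
        for isolate, profile in intended.items()
        for locus in loci_examined
        if reported.get(isolate, {}).get(locus) == profile[locus]
    )
    return correct, maximum, round(100 * correct / maximum)


# Hypothetical two-isolate panel scored on five loci (mompS not examined).
five_loci = ["flaA", "pilE", "asd", "mip", "proA"]
intended = {
    "iso-A": {"flaA": 3, "pilE": 10, "asd": 1, "mip": 28, "proA": 9},
    "iso-B": {"flaA": 1, "pilE": 4, "asd": 3, "mip": 1, "proA": 1},
}
reported = {
    "iso-A": {"flaA": 3, "pilE": 10, "asd": 1, "mip": 28, "proA": 9},
    "iso-B": {"flaA": 1, "pilE": 6, "asd": 3, "mip": 1, "proA": 1},  # pilE wrong
}

print(score_center(reported, intended, five_loci))  # (9, 10, 90)
```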
(ii) Second EQA distribution. Each participant was asked to examine 10 coded L. pneumophila (6 sg1 and 4 non-sg1) isolates (no. 12 to 21) using the revised SBT protocol (11). Participants were instructed to sequence at least four of the six targets for all isolates.
DNA sequence analysis. DNA sequence analysis was performed as described above for the first EQA distribution.
Allele identification and reporting arrangements. Allele identification and reporting arrangements were essentially as described for the first EQA distribution, but in addition the submitted alleles were captured online (as flat text files) for later review by the coordinating center. Participants also completed and returned their results via a questionnaire form, as before.
Scoring of participant results. Analysis of the submitted results was undertaken by the coordinating center as described for the first EQA distribution, except that for the second EQA study a global report on all results was also made available on the EWGLI website. Participants were scored as described previously, for sets of results returned via questionnaire, and for sets of results captured via web-based tools: the best set of results for each center was reported. Centers that only returned sequences from four genes were scored out of 40, those sequencing five genes were scored out of 50, and those sequencing all six genes were scored out of 60. Separately, participants were scored for the number of isolates for which they reported the intended allelic profile (maximum score of 10) and the number of epidemiologically related pairs which were given the same profile (score 2) and for correctly distinguishing the SLVs (score 1) (combined maximum score 3).
Error analysis. By review of the submitted text files, the coordinating center attempted to classify any incorrect allele designations as either process errors (PE) (e.g., the allele reported was present in an isolate in the panel but was attributed to the wrong isolate due to isolate switching, clerical error, or mislabeling of DNA sequence trace files) or DNA sequencing errors (SE) (e.g., due to low-quality DNA sequence data or erroneous editing, such that an allele was reported that did not exist in the database or that did exist but was not represented by any isolate in the panel).
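The distinction between PE and SE given in the examples above can be summarized as a simple decision rule, sketched here in Python. This is only an illustration of the stated examples, not the procedure used by the coordinating center, whose review of the submitted files was a matter of judgment; the function name and the allele numbers are hypothetical.

```python
def classify_designation(reported, intended, panel_alleles):
    """Classify one allele designation at a single locus.

    `reported` is the allele type submitted (None if the sequence matched no
    allele in the database), `intended` is the expected allele type, and
    `panel_alleles` is the set of allele types at this locus that occur in
    any isolate of the panel.
    """
    if reported == intended:
        return "correct"
    if reported is not None and reported in panel_alleles:
        # The allele exists in the panel but was attributed to the wrong
        # isolate, which suggests switching, clerical error, or mislabeling.
        return "PE"
    # The allele is absent from the database or from the panel, which
    # suggests low-quality sequence data or erroneous editing.
    return "SE"


# Hypothetical set of pilE allele types present in a panel.
panel_pilE = {4, 10, 13}

print(classify_designation(10, 10, panel_pilE))    # correct
print(classify_designation(13, 10, panel_pilE))    # PE
print(classify_designation(6, 10, panel_pilE))     # SE
print(classify_designation(None, 10, panel_pilE))  # SE
```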
(iii) Third EQA distribution. Each participant was asked to test a panel of five L. pneumophila sg1 isolates (no. 22 to 26) using the SBT protocol described by Gaia et al. (11). Participants were instructed to amplify and sequence all six loci.
DNA sequence analysis and allele identification. Participants were asked to analyze their data in two ways: first, as described above for the first EQA distribution, and second, using a newly developed automated sequence quality tool (SQT) accessible via the EWGLI website (http://www.ewgli.org).
SQT. This online SQT uses a web-based interface accessible via any standard internet browser, e.g., Firefox (Mozilla) or Internet Explorer (Microsoft Corp.), thus providing ready access for both the user and curator(s) of a database. The web pages are constructed dynamically using the Perl programming language and CGI functionality hosted on a UNIX-based web server (21). The design of the application allows users to upload forward and reverse trace files, in standard file format (*.scf) (5) or ABI trace file format (*.abi) (Applied Biosystems), for individual SBT alleles. The tool then attempts to carry out base-calling on the uploaded traces using Phred software (version 0.020425.c) (6, 7), assembles contig(s) from the traces using the Phrap software (version 0.990329) (14), finds start and end (reference) positions in the contig, trims the contig using these positions, and then matches the trimmed contig against the preexisting allele sequences in the SBT database to give either an exact match or a closest match with the number of mismatches between the closest match and the uploaded sequence. The tool also produces a sequence quality report for each uploaded contig and generates a six-figure allelic profile (e.g., 3,4,1,1,14,9), as well as indicating whether the obtained allelic profile is of a novel combination (21). The quality scores are logarithmically linked to error probabilities, as called by the Phred software.
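The logarithmic link between quality scores and error probabilities follows the standard Phred definition, Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The Python sketch below shows the conversion in both directions; the function names are hypothetical, and the summary of expected errors is an illustrative addition, not a description of the report produced by the SQT.

```python
import math


def phred_to_error_probability(quality):
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def error_probability_to_phred(probability):
    """Convert an error probability P to a Phred quality score Q = -10*log10(P)."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in the interval (0, 1]")
    return -10 * math.log10(probability)


def expected_errors(qualities):
    """Expected number of miscalled bases in a read, given per-base scores."""
    return sum(phred_to_error_probability(q) for q in qualities)


# Q20 corresponds to a 1-in-100 chance of a wrong call, Q30 to 1 in 1,000.
for q in (10, 20, 30, 40):
    print(q, phred_to_error_probability(q))
```

On this scale, every increase of 10 in the quality score corresponds to a 10-fold reduction in the estimated error probability.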
Scoring of participant results. Analysis of submitted results was undertaken by the coordinating center as described for the second EQA distribution. Participants were scored as described previously (i.e., for results returned via questionnaire) and for results captured via the SQT: their best set of results was recorded. Centers were scored out of 30. Separately, participants were scored for the number of isolates for which they reported the intended allelic profile (maximum score of 5) and for giving the epidemiologically related pair the same (intended) profile and correctly distinguishing the SLVs (combined maximum score of 2).
Error analysis. Error analysis was as described for the second EQA distribution.
DNA sequencing platform and data analysis software. In these studies, DNA sequencing was performed "in-house" or using a commercial provider on one of the following sequencing platforms: an ABI 310, 3100, 377, or 3730 DNA genetic analysis system (Applied Biosystems); a CEQ 8000 DNA analysis system (Beckman Coulter); or a Molecular Dynamics MegaBACE 1000 (Global Medical Instrumentation). Participants also used DNA analysis software, including BioNumerics and Kodon (Applied Maths), Bio-Edit (Ibis Therapeutics), ABI Prism AutoAssembler and SeqScape (Applied Biosystems), Lasergene editseq and megalign (DNASTAR), Chromas (Technelysium, Pty, Ltd.), Readseq (EMBL-EBI), Vector NTI (Invitrogen), or Align IR (LI-COR).
| RESULTS |
|---|
1 error for all isolates. Seven of ten centers correctly identified the allelic profiles of the three pairs of epidemiologically related isolates, scoring 3/3 (Table 2).
For centers that failed to achieve 100%, review of their submitted text files allowed the coordinating center to categorize each error as either a PE or SE. Centers 16 and 18 each made two PE (two alleles switched) and center 11 made four PE (two pairs of alleles switched). Centers 14, 15, 17, and 19 made one, four, four, and nine errors, respectively, most of which were SE. Four of the five centers that failed to achieve 100% in the first EQA distribution took part in this assessment and showed improved performance (three centers obtained a maximum score of 100%). It is noteworthy that the score from center 15, which had received training at the coordinating center between the two assessments, increased from 71 to 93%.
Third EQA distribution. Results were received from 27 of 29 centers, of which 25 had tested the six targets for the entire panel (Table 2). Of these 25 centers, 19 (76%) achieved a maximum score (100%), correctly identifying the allelic profile of all five isolates, including the related pair and the two SLVs. Of the remaining six centers, two centers correctly identified the epidemiologically related pair, and two centers were able to differentiate the two SLVs.
In this distribution participants used the SQT for data submission; thus, the coordinating center was able to review the submitted DNA sequence trace files to investigate the possible causes of the errors. Centers 5, 8, and 20 each made a single PE; center 11 made two PE. Only two centers made SE (centers 4 and 31 making 11 and 3 SE, respectively).
Overall performance. The overall performance scores improved on each successive EQA distribution: 50% (5 of 10) of participants achieved the maximum score for the first EQA, 56% (9 of 16, including center 9) for the second, and 76% (19 of 25) for the third. Nine centers reported valid sets of results for all three EQAs, of which three centers scored 100% (156 of 156 alleles correctly identified), two scored 99%, and one each scored 98, 96, 94, and 85%. Improvement in the performance of individual laboratories was most marked for center 15, whose score increased from 71 to 93 to 100%.
DNA sequencing platforms. The majority of centers participating in the present study used an ABI sequencing platform. However, there was no apparent difference in the sequence quality obtained with other platforms.
| DISCUSSION |
|---|
The first EQA distribution was sent to 16 laboratories in August 2003, shortly after the SBT method was first reported (12). The fact that half the participants who returned data (5 of 10) scored 100% in the assessment clearly demonstrates that SBT can provide highly reproducible, unambiguous data. However, contrary to initial expectations, the other five laboratories did not achieve 100%, reporting between 1 and 19 incorrect, or invalid, alleles of the 55 or 66 they examined. Since the organizers had not anticipated that so many errors would be made, the EQA reporting system was not designed to collect data to determine the nature of these errors. Nevertheless, it appeared that while some problems were due to the actual process of the EQA itself (e.g., switching of isolates or data during analysis or reporting), some were undoubtedly due to technical problems resulting in low-quality DNA sequence data. After this first EQA distribution, the coordinating center offered training and advice to some laboratories that had little previous experience of DNA sequencing, and EWGLI made some changes to the standard SBT protocol (11).
A larger number of laboratories (n = 19) registered for the second EQA distribution (October 2004), including the 10 centers that returned results from the first distribution. In an attempt to differentiate the errors caused by the actual EQA process (herein called PE) from errors caused by poor technique (SE), for this assessment, in addition to the allelic profiles of the coded isolates, the actual DNA sequence flat text files used to assign alleles were also captured by the online database. Again, approximately half (9 of 16, including center 9) of the participants achieved a maximum score, but a substantial number did not. Review of the captured flat text files from these laboratories allowed the coordinating center to classify the type of errors made. For three participants (including two that had scored 100% in the first EQA), all of the errors made were PE, but for four laboratories DNA SE were the main cause.
It was clear from the results of these first two EQA distributions that, while in experienced hands SBT is an excellent method, in less-experienced hands problems with DNA sequence quality are frequent. Of the few DNA sequencing methodologic studies reported, one of the most revealing statistics was that the association between laboratory performance and the number of sequencing assays per year was statistically significant (1).
After the second distribution described above, the original DNA sequence trace files were requested from several centers for review by the coordinating center (data not presented). Comparison of these trace files against the actual flat text files submitted to the database indicated that where a laboratory obtained a low-quality DNA sequence trace, either by using software or by manual editing, they generated an unambiguous, but erroneous, DNA sequence text file which, when submitted to the database, identified an incorrect or "novel" allele. In an attempt to remove this important source of user error, bespoke software was written, incorporating the well-established phred and phrap algorithms, to provide an online DNA SQT (21). Diverse data, in terms of sequence string length and quality of management data, were reported from the European Union-funded sequencing program EQUALseq by Ahmad-Nejad et al. (1). The authors of that study emphasized the need to edit sequence data in order to generate valuable information. Although this approach has merit if the user is experienced, our approach differs fundamentally by using dedicated software to remove user bias.
The third EQA distribution was dispatched in January 2006. By this time, the use of the EWGLI SBT method had increased to the extent that, in addition to EWGLI laboratories, coworkers in Australia, Canada, Japan, and the United States were also using this methodology (3, 4, 13, 22, 23). Consequently, participation was invited from this wider group, and a smaller EQA panel of five isolates was distributed to 29 centers in 20 countries. Participants were asked to submit data via the new SQT as well as via the online databases. Overall, the results of this third EQA were very encouraging, with 19 of 25 (76%) participants achieving the maximum score. Of the six participants that failed to achieve 100%, four each made a single PE resulting in duplication (one instance) or switching (three instances) of data. Only two participants had technical DNA sequence problems, and both of these participants have only limited experience with the SBT method. The use of the automated SQT by participants clearly helped improve overall performance by providing a uniform, objective, and standardized measure of DNA sequence quality.
Results from these EWGLI multicenter EQA studies illustrate that SBT (together with the dedicated web-based tools) is a rapid, robust, reproducible, and widely applicable method for the typing of L. pneumophila. It allows laboratories to distinguish with a high degree of discrimination between epidemiologically related and unrelated isolates and to transmit these data with high fidelity between coworkers around the world. The degree of reproducibility achieved by some participants across the whole study is much higher than has been reported for any other method of typing L. pneumophila assessed by coded panels (9, 10). However, given the potential of the methodology and the consequences of reporting erroneous results, not all laboratories performed SBT to a high enough standard. Two problem areas stand out: inexperience and poor laboratory practice. Training of less-experienced laboratories was clearly beneficial in these studies (e.g., center 15, whose score increased from 71 to 100%), and we would strongly recommend that laboratories should not use the SBT method in real investigations until they have trained staff and have demonstrated competence (e.g., through participation in EQA schemes such as this). In some instances even highly experienced laboratories made errors, but these appeared to be due to poor laboratory practice (e.g., poorly or illogically labeled DNA sequence files that become switched) rather than technical problems. Although it might be argued that some of these PE are artifacts of an EQA scheme, it should be remembered that investigations of legionellosis are typically undertaken at short notice with very tight deadlines, and it is in just such situations that simple errors are most likely to occur. One solution to this problem is to prepare "best practice" guidelines that provide, among other things, clear guidance on the logical designation of DNA sequence data filenames, but currently these are lacking.
It should also be noted that the phred and phrap software are not able to analyze all sequence trace file formats. Thus, additional methods of ensuring the quality of sequence from such platforms are required.
Compared to DNA fragment-based techniques, such as pulsed-field gel electrophoresis and amplified fragment length polymorphism analysis, DNA sequencing offers much greater reproducibility. However, as the use of the SBT method increases, it is imperative to constantly improve and assess laboratory performance. We believe that EQA schemes, such as that described here, are essential, and indeed others have emphasized the call for mandatory participation in EQAs (1). Such participation should provide a greater degree of confidence in laboratories responsible for reporting microbiological genotyping data, particularly with its increasing use in medicolegal cases.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Published ahead of print on 8 August 2007.
The members of the European Working Group for Legionella Infections who participated in this study were as follows: A. Indra (Vienna, Austria), A. Deplano and O. Soetens (Brussels, Belgium), P. Tilley (Calgary, Alberta, Canada), K. Bernard (Winnipeg, Manitoba, Canada), J. Bangsborg (Herlev, Denmark), S. Nielsen (Copenhagen, Denmark), J. Etienne (Lyon, France), S. Mentula (Helsinki, Finland), P. C. Lück (Dresden, Germany), L. Franzin (Turin, Italy), M. Scaturro and P. Visca (Rome, Italy), J. Amemura-Maekawa (Tokyo, Japan), J. Mossong (Luxembourg, Luxembourg), J. P. Bruin (Haarlem, The Netherlands), K. van der Zwaluw (Bilthoven, The Netherlands), D. A. Caugant (Oslo, Norway), T. Marques (Lisbon, Portugal), D. S. Lindsay (Glasgow, Scotland), S. Blanco (Barcelona, Spain), C. Pelaz (Madrid, Spain), F. Fendukly (Stockholm, Sweden), B. Herrmann (Uppsala, Sweden), V. Gaia (Bellinzona, Switzerland), S. Lai (London, United Kingdom), and R. F. Benson (Atlanta, Georgia).
| REFERENCES |
|---|