Journal of Clinical Microbiology, April 2003, p. 1785-1787, Vol. 41, No. 4
0095-1137/03/$08.00+0 DOI: 10.1128/JCM.41.4.1785-1787.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
BIBI, a Bioinformatics Bacterial Identification Tool
G. Devulder,1* G. Perrière,2 F. Baty,1 and J. P. Flandrois1
UMR CNRS 5558, Laboratoire de Bactériologie, Faculté de Médecine Lyon-Sud, 69921 Oullins Cedex,1
UMR CNRS 5558, Université Claude Bernard-Lyon 1, 69622 Villeurbanne Cedex, France2
Received 23 September 2002/Returned for modification 27 November 2002/Accepted 20 January 2003

ABSTRACT
BIBI was designed to automate DNA sequence analysis for bacterial
identification in the clinical field. BIBI relies on the use
of BLAST and CLUSTAL W programs applied to different subsets
of sequences extracted from GenBank. These sequences are filtered
and stored in a new database, which is adapted to bacterial
identification.

TEXT
In the medical field, bacterial identification is the main activity
of clinical microbiology laboratories. Conventional biochemical
methods and phenotypic tests for species differentiation are
tedious and time-consuming and may require specialized testing
that is beyond the capacity of clinical laboratories. Recent
progress in molecular biology and bioinformatics makes it possible
to consider other methods that are more universal and less
time-consuming. Molecular methods based on one or several appropriate
genes are gaining importance because they yield quick
and, in most cases, unequivocal results (2). The growing
number of sequences submitted to GenBank (7) and the data-processing
programs already available led us to expect that these techniques
will be used increasingly. Sequence-based identification
guarantees a constant response time and can be applied to all
microorganisms. Sequencing techniques are now well controlled,
but identification requires chaining several
programs that are sometimes complex to handle, especially for
newcomers. Moreover, BLAST alone, without phylogenetic data, is
not adequate for bacterial identification.
Thus, we have developed a bioinformatics tool dedicated to bacterial identification (BIBI, for Bioinformatics Bacterial Identification) to simplify sequence analysis within a bacterial identification framework. BIBI fully automates and speeds up the successive steps of sequence processing. BIBI, which can be accessed at http://pbil.univ-lyon1.fr/bibi/, identifies a microorganism from the sequence of a gene fragment, among previously described cultured bacteria. The program combines database similarity searching with phylogeny display programs. Results are thus obtained quickly and easily, while the phylogenetic tools leave users free to interpret them. In addition, to automate sequence analysis, BIBI integrates several sequence databases specifically adapted to bacterial identification, which eliminate inaccuracies related to the direct use of GenBank sequences.
The program chains two well-known tools: BLAST (1) and CLUSTAL W (5). CLUSTAL W runs are accelerated by the use of prealigned BLAST results. BIBI is written in standard ANSI C, and the interface is implemented in HTML and PHP. Analysis of an unknown sequence proceeds in four phases: search for matching sequences, sequence extraction and parsing, sequence alignment, and display of results (Fig. 1). First, BLAST searches for sequences similar to the submitted one. The next stage, filtering of the BLAST results, is the key point of the method: the pairwise local alignments are extracted from the BLAST output file and saved in FASTA format. The n similar sequences and the submitted sequence are then multiply aligned with CLUSTAL W, which creates three files containing (i) a sequence alignment, (ii) a tree in NEWICK format, and (iii) the phylogenetic distances. Using the prealigned sequences produced by BLAST instead of sequences extracted from a database substantially speeds up the alignment step. Users can also choose Dialign (3), another multiple-sequence alignment program, which builds alignments by comparing whole segments of the sequences rather than single residues. The final result is a sorted table of all distinct phylogenetic distances between the query and the similar sequences. The results are available within an HTML page (Fig. 2). Alignments and phylogenetic trees are displayed by two Java applets: Jalview (version 1.7 [http://www2.ebi.ac.uk/michele/jalview/]) and ATV (8). Bacterial identification is performed by visual inspection of the tree and/or the multiple alignment. Users can also browse the BLAST output to detect possible anomalies in the identification process, then remove some sequences and rerun the analysis on the remaining subset. All generated files can be downloaded directly through FTP.
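The extraction step and the final distance table can be illustrated with a minimal Python sketch. It is not BIBI's ANSI C code and rests on stated assumptions: the hits are read from a BLAST+ tabular report requested with `-outfmt "6 sseqid qstart qend qseq sseq"`, each pairwise local alignment is projected onto query coordinates to give a query-anchored prealignment written as FASTA, and an uncorrected p-distance stands in for the CLUSTAL W distances BIBI actually uses. All function names are hypothetical.

```python
# Minimal sketch (hypothetical names): BLAST hits -> query-anchored
# prealignment (FASTA) -> sorted table of distinct distances to the query.
# Assumes BLAST+ tabular lines requested with
#   -outfmt "6 sseqid qstart qend qseq sseq"
# and an uncorrected p-distance standing in for CLUSTAL W distances.


def parse_hits(lines):
    """Return (sseqid, qstart, qend, qseq, sseq) tuples from tabular lines."""
    hits = []
    for line in lines:
        if not line.strip():
            continue
        sseqid, qstart, qend, qseq, sseq = line.rstrip("\n").split("\t")
        hits.append((sseqid, int(qstart), int(qend), qseq, sseq))
    return hits


def project_on_query(query_len, qstart, qseq, sseq):
    """Place aligned subject residues at query positions (1-based qstart).

    Columns with a gap in the query (subject insertions) are dropped, so
    every row is exactly query_len long and rows stack without realignment.
    """
    row = ["-"] * query_len
    pos = qstart - 1
    for q, s in zip(qseq, sseq):
        if q == "-":
            continue
        row[pos] = s
        pos += 1
    return "".join(row)


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def p_distance(a, b):
    """Share of differing sites, ignoring columns with a gap in either row."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return None
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_table(query, hits):
    """Return the prealigned rows and a table sorted by distance.

    Each table entry is (distance, [subject ids sharing that distance]).
    """
    rows = [(sid, project_on_query(len(query), qstart, qseq, sseq))
            for sid, qstart, _qend, qseq, sseq in hits]
    groups = {}
    for sid, row in rows:
        d = p_distance(query, row)
        if d is not None:
            groups.setdefault(round(d, 4), []).append(sid)
    return rows, sorted(groups.items())


if __name__ == "__main__":
    query = "ACGTACGTAC"
    report = [
        "sp1\t1\t10\tACGTACGTAC\tACGTACGTAC",
        "sp2\t1\t10\tACGTACGTAC\tACGTTCGTAC",
        "sp3\t3\t8\tGTAC-GT\tGTACCGA",
        "sp4\t1\t10\tACGTACGTAC\tACGTACGTAC",
    ]
    rows, table = distance_table(query, parse_hits(report))
    print(to_fasta([("query", query)] + rows), end="")
    for d, ids in table:
        print(f"{d:.4f}\t{', '.join(ids)}")
```

Dropping the columns where the query has a gap keeps every row at the query length, so the rows can be stacked without a full realignment; in a real pipeline they would still be passed to CLUSTAL W (or Dialign) to refine the alignment and build the tree. Grouping subjects by identical distance mirrors the idea of a table of distinct distances.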
Several sequence databases have been designed specifically for bacterial
identification. The first contains all of the bacterial sequences
of GenBank without sequence checking, while the others are more
specific and gather genes belonging to well-known families (rRNA,
hsp65, sod, and rpoB genes). Because sequences are freely submitted to
general data banks, omissions and errors are frequent, so
sequences extracted directly from GenBank may be inaccurate (6).
In addition, many sequences have uninformative
definitions. To exclude these inaccuracies, sequence analysis and
checking are mandatory. This led to a second type of database.
Our improved database was built by expert cross-referencing of the
DSMZ nomenclature database (http://www.dsmz.de/) with a
version of GenBank structured with the ACNUC database management
system (4). For each valid species name, the sequences of each gene
were extracted with ACNUC to build a nomenclature-driven
sequence database. All sequences listed under uninformative names
were eliminated. Sequences described under basonyms
or under bacterial names that are commonly used but have no standing in
nomenclature are nevertheless extracted by means of the National
Center for Biotechnology Information taxonomy database. All
annotations are scanned to extract information
related to each sequence. To adapt these databases to bacterial
identification, all annotations are searched for the species type strain
numbers in order to identify type strain
sequences. The sequences and their associated information are stored
in an object-relational database, which gives direct access
to the inventory of available sequences
by genus, species, or gene. For example, users can list
the species that are missing from a database and therefore cannot be identified.
This database is regularly updated. The use of smaller
and cleaner gene databases also reduces the time required for BIBI
searches to several seconds. Two kinds of databases are thus available
on BIBI: complete databases and databases adapted to bacterial
identification.
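The construction and inventory steps described above can be illustrated with a small Python sketch. Everything in it is an assumption for illustration: the paper does not describe BIBI's object-relational schema, so a two-table SQLite database stands in for it; the species names, synonym map, type strain numbers, and accessions are hypothetical; and type strain sequences are flagged by a simple whole-token text search of the annotations. The sketch keeps only sequences whose (possibly synonymized) name is a valid species, flags type strain sequences, and lists the valid species that lack a sequence for a given gene.

```python
# Minimal sketch of a nomenclature-driven sequence inventory.
# Hypothetical throughout: schema, regex, species names, strain numbers and
# accessions are illustrative; the paper does not describe BIBI's schema.
import re
import sqlite3


def build_inventory(valid_species, synonyms, records):
    """Build an in-memory inventory.

    valid_species: {valid name: [type strain numbers]}
    synonyms: {basonym or non-standard name: valid name}
    records: iterable of (accession, organism name, gene, annotation text)
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE species (name TEXT PRIMARY KEY)")
    db.execute(
        "CREATE TABLE sequences (accession TEXT PRIMARY KEY, species TEXT,"
        " gene TEXT, is_type_strain INTEGER)"
    )
    db.executemany("INSERT INTO species VALUES (?)",
                   [(name,) for name in valid_species])
    for accession, organism, gene, annotation in records:
        name = synonyms.get(organism, organism)  # map synonyms to valid names
        if name not in valid_species:
            continue  # uninformative or invalid name: discard the sequence
        # Flag a type strain sequence when a type strain number of the
        # species appears as a whole token in the annotation text.
        is_type = any(
            re.search(r"\b" + re.escape(strain) + r"\b", annotation)
            for strain in valid_species[name]
        )
        db.execute("INSERT INTO sequences VALUES (?, ?, ?, ?)",
                   (accession, name, gene, int(is_type)))
    return db


def missing_species(db, gene):
    """List valid species that have no sequence for the given gene."""
    query = (
        "SELECT s.name FROM species AS s WHERE NOT EXISTS ("
        " SELECT 1 FROM sequences AS q"
        " WHERE q.species = s.name AND q.gene = ?)"
        " ORDER BY s.name"
    )
    return [row[0] for row in db.execute(query, (gene,))]


if __name__ == "__main__":
    valid = {"Examplea alpha": ["HYP 101"], "Examplea beta": ["HYP 102"],
             "Examplea gamma": ["HYP 103"]}
    synonyms = {"Oldname alpha": "Examplea alpha"}
    records = [
        ("ACC1", "Examplea alpha", "16S rRNA", "strain HYP 101"),
        ("ACC2", "Oldname alpha", "rpoB", "strain ABC 1"),
        ("ACC3", "Examplea beta", "16S rRNA", "isolate 12"),
        ("ACC4", "uncultured bacterium", "16S rRNA", "clone 7"),
    ]
    db = build_inventory(valid, synonyms, records)
    print(missing_species(db, "rpoB"))
```

Keeping the valid names in their own table is what makes the "missing species" query possible: a species with no sequences still has a row that the query can report.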
The value of BIBI lies in its integration of well-known tools to automate the bacterial identification process. Homologous segment pairs identified by BLAST are prealigned, allowing faster multiple alignment with CLUSTAL W. The table of sorted phylogenetic distances computed by CLUSTAL W is easier to read than a raw BLAST output file. The curated databases used by BIBI are adapted to bacterial identification, which guarantees unequivocal results. BIBI is a simple, user-friendly data-processing tool, well suited to the identification of cultured bacteria in a clinical bacteriology laboratory. In the near future, we plan to complete the databases for bacteria of medical interest and to consider adding a decision-making tool to aid identification.

FOOTNOTES
* Corresponding author. Mailing address: UMR CNRS 5558, Laboratoire de Bactériologie, Faculté de Médecine Lyon-Sud, BP 12, 69921 Oullins Cedex, France. Phone: 33-4-7886-3167. Fax: 33-4-7886-3149. E-mail: devulder{at}biomserv.univ-lyon1.fr.


REFERENCES
1 - Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
2 - Kolbert, C. P., and D. H. Persing. 1999. Ribosomal DNA sequencing as a tool for identification of bacterial pathogens. Curr. Opin. Microbiol. 2:299-305.
3 - Morgenstern, B., K. Frech, A. Dress, and T. Werner. 1998. DIALIGN: finding local similarities by multiple sequence alignment. Bioinformatics 14:290-294.
4 - Perrière, G., and M. Gouy. 1996. WWW-Query: an on-line retrieval system for biological sequence banks. Biochimie 78:364-369.
5 - Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.
6 - Turenne, C. Y., L. Tschetter, J. Wolfe, and A. Kabani. 2001. Necessity of quality-controlled 16S rRNA gene sequence databases: identifying nontuberculous Mycobacterium species. J. Clin. Microbiol. 39:3637-3648.
7 - Wheeler, D. L., D. M. Church, A. E. Lash, D. D. Leipe, T. L. Madden, J. U. Pontius, G. D. Schuler, L. M. Schriml, T. A. Tatusova, L. Wagner, and B. A. Rapp. 2001. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 29:11-16.
8 - Zmasek, C. M., and S. R. Eddy. 2001. ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics 17:383-384.