KID - an algorithm for fast and efficient text mining used to automatically generate a database containing kinetic information of enzymes

Stephanie Heinen, Dietmar Schomburg, Bernhard Thielen

doi:10.1186/1471-2105-11-375

Abstract

Background
The amount of available biological information is rapidly increasing, and the focus of biological research has moved from single components to networks and even larger projects aiming at the analysis, modelling and simulation of biological networks, as well as large-scale comparison of cellular properties. It is therefore essential that biological knowledge is easily accessible. However, most information is contained in the written literature in an unstructured form, so methods for the systematic extraction of knowledge directly from the primary literature have to be deployed.

Description
Here we present a text mining algorithm for the extraction of kinetic information such as KM, Ki and kcat, as well as associated information such as enzyme names, EC numbers, ligands, organisms, localisations, pH and temperatures. Using this rule- and dictionary-based approach, it was possible to extract 514,394 kinetic parameters of 13 categories (KM, Ki, kcat, kcat/KM, Vmax, IC50, S0.5, Kd, Ka, t1/2, pI, nH, specific activity, Vmax/KM) from about 17 million PubMed abstracts and to combine them with other data in the abstract.

A manual verification of approx. 1,000 randomly chosen results yielded a recall between 51% and 84% and a precision ranging from 55% to 96%, depending on the category searched.

The results were stored in a database and are available as "KID the KInetic Database" via the internet.

Conclusions
The presented algorithm delivers a considerable amount of information and may therefore help accelerate the research and the automated analysis required for today's systems biology approaches. The database obtained by analysing PubMed abstracts may be a valuable help in the field of chemical and biological kinetics. It is based entirely on text mining and therefore complements manually curated databases.

The database is available at http://kid.tu-bs.de. The source code of the algorithm is provided under the GNU General Public Licence and is available on request from the author.
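The abstract characterises KID only as a rule- and dictionary-based approach and does not spell out the rules themselves. The Python sketch below illustrates that general idea under stated assumptions: regular-expression rules for a handful of the parameter categories (KM, Ki, kcat, Vmax, IC50), a tiny hypothetical organism dictionary, and simple patterns for EC numbers, pH and temperature. Every name in it (`extract`, `KineticRecord`, the unit list, the organism list) is illustrative and is not taken from the KID source code. It also simplifies "combining with other data in the abstract" by attaching all context found anywhere in the abstract to each value; the abstract does not say how KID links values to their context.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical, heavily simplified rules for a few parameter categories.
# These are NOT the KID rules; they only illustrate the rule-based idea.
PARAMETER_NAMES = {
    "KM": r"(?<!/)\bK[mM]\b",         # lookbehind skips the ratio "kcat/KM"
    "Ki": r"\bK[iI]\b",
    "kcat": r"\bk[cC]at\b(?!\s*/)",  # lookahead skips "kcat/KM"
    "Vmax": r"\bV[mM]ax\b",
    "IC50": r"\bIC50\b",
}
NUMBER = r"\d+(?:\.\d+)?"
UNITS = r"(?:mM|[µμu]M|nM|pM|M|s-1|min-1|U/mg)"  # small, hypothetical unit list
CONNECTOR = r"\s*(?:values?\s+)?(?:=|of|was|were|is|:)?\s*"  # "Km value of", "Ki ="

# Rule: parameter name, optional connector, number, unit (not followed by a letter).
PARAM_RES = {
    name: re.compile(
        rf"(?P<param>{pat}){CONNECTOR}(?P<value>{NUMBER})\s*(?P<unit>{UNITS})(?!\w)"
    )
    for name, pat in PARAMETER_NAMES.items()
}

# Context rules and a tiny hypothetical dictionary of organism names.
EC_RE = re.compile(r"\bEC\s*(\d+\.\d+\.\d+\.\d+)")
PH_RE = re.compile(rf"\bpH\s*({NUMBER})")
TEMP_RE = re.compile(rf"({NUMBER})\s*°C")
ORGANISMS = ("Escherichia coli", "Saccharomyces cerevisiae", "Homo sapiens")
ORGANISM_RES = {o: re.compile(rf"\b{re.escape(o)}\b", re.IGNORECASE) for o in ORGANISMS}


@dataclass
class KineticRecord:
    category: str
    value: float
    unit: str
    start: int  # character offset of the parameter name in the abstract
    ec_numbers: Tuple[str, ...]
    organisms: Tuple[str, ...]
    ph_values: Tuple[float, ...]
    temperatures_c: Tuple[float, ...]


def extract(abstract: str) -> List[KineticRecord]:
    """Find parameter values and attach context found anywhere in the abstract."""
    ec = tuple(EC_RE.findall(abstract))
    orgs = tuple(o for o, rx in ORGANISM_RES.items() if rx.search(abstract))
    ph = tuple(float(x) for x in PH_RE.findall(abstract))
    temps = tuple(float(x) for x in TEMP_RE.findall(abstract))
    records: List[KineticRecord] = []
    for category, rx in PARAM_RES.items():
        for m in rx.finditer(abstract):
            records.append(KineticRecord(
                category, float(m.group("value")), m.group("unit"), m.start(),
                ec, orgs, ph, temps,
            ))
    return sorted(records, key=lambda r: r.start)  # report in text order


if __name__ == "__main__":
    # Invented example sentence, not a real measurement.
    text = ("The enzyme (EC 1.1.1.1) from Escherichia coli had a Km of 0.5 mM "
            "and a kcat of 120 s-1 (kcat/KM of 240 mM-1 s-1) at pH 7.5 and 37 °C.")
    for rec in extract(text):
        print(rec.category, rec.value, rec.unit, rec.organisms, rec.ph_values)
```

Because every value inherits all context in the abstract, an abstract mentioning two organisms would attach both to each value; a production system would need proximity or sentence-level rules to resolve such cases.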

Highlights

The amount of available biological information is rapidly increasing, and the focus of biological research has moved from single components to networks and even larger projects aiming at the analysis, modelling and simulation of biological networks, as well as large-scale comparison of cellular properties
The presented algorithm delivers a considerable amount of information and may help accelerate the research and the automated analysis required for today's systems biology approaches
The short overall calculation time of the KID text mining algorithm and the resulting database provide evidence that the presented algorithm can be a helpful tool for the annotation and collection of data for other databases like BRENDA

Summary

Introduction

The amount of available biological information is rapidly increasing, and the focus of biological research has moved from single components to networks and even larger projects aiming at the analysis, modelling and simulation of biological networks, as well as large-scale comparison of cellular properties. Several databases provide information about enzymes and their characteristics, e.g. BRENDA [2,3,4] (currently 92,291 entries for KM, 32,484 for kcat, 21,833 for Ki and 33,372 for specific activity [2]), Kinetikon [5], KMedDB [6], KDBI [7], DOQCS [8], SABIO-RK [9] and IUPAC-kinetic [10]. These databases are far from complete, so scientists who follow a systematic research approach are forced into time-consuming manual extraction of values from the literature. Current text mining algorithms include machine learning (e.g. Kinetikon [5]), statistical (e.g. FRENDA and AMENDA [3]), rule-based (KiPar [18] and BioRAT [19]) and mixed approaches (SUISEKI [20]).



Journal: BMC Bioinformatics	Publication Date: Jul 13, 2010
Citations: 30	License type: cc-by


Similar Papers

How artificial intelligence enables modeling and simulation of biological networks to accelerate drug discovery
Mauro Dinuzzo
Frontiers in Drug Discovery, Vol. 2, 04 Oct 2022

Petri net models for the semi-automatic construction of large scale biological networks
Ming Chen ... Sridhar Hariharaputran
Natural Computing, Vol. 10, 12 Sep 2009

A Study and Analysis of Gene Drug Association for Diabetic Gene - A Text Mining Approach
Kirthika Bakthavatsalam ... V Bhuvaneswari
01 Mar 2014

Petri net modeling and simulation of biological networks
06 Jul 2022
