BioDARA: Data Summarization Approach to Extracting Bio-Medical Structuring Information

Kheau Kheau

doi:10.3844/jcssp.2011.1914.1920

Abstract

Problem statement: Due to the ever-growing amount of biomedical data stored in multiple tables, Information Extraction (IE) from these datasets is increasingly recognized as one of the crucial technologies in bioinformatics. However, for IE to be practically applicable, a system must be adaptable, given the extremely diverse demands of biomedical IE applications. One should be able to extract a set of hidden patterns from these biomedical datasets at low cost. Approach: This study proposes a new method, called Bio-medical Data Aggregation for Relational Attributes (BioDARA), for automatically extracting structuring information from biomedical datasets. BioDARA summarizes biomedical data stored in multiple tables in order to facilitate data modeling in a multi-relational setting. It can transform biomedical data stored in multiple tables or databases into a vector space model, summarize the data using Information Retrieval theory and finally extract frequent patterns that describe the characteristics of these datasets. Results: The results show that data summarization performed by DARA can be beneficial in summarizing biomedical datasets in a complex multi-relational environment, in which the datasets are stored in multiple levels of one-to-many relationships, and also when the datasets are stored in more than one one-to-many relationship with non-target tables. Conclusion: This study concludes that data summarization performed by BioDARA can be beneficial in summarizing biomedical datasets in a complex multi-relational environment in which the datasets are stored in multiple levels of one-to-many relationships.

Highlights

Biomedical information extraction from structured biomedical data stored in relational databases refers to data summarization applied to relational biomedical data
Despite the increase in the volume of biomedical datasets stored in relational databases, only a few studies handle clustering across multiple relations (Kirsten and Wrobel, 1998; 2000)
The multiple instance problem is solved with a vector space model that is suitable for clustering operations, as a means of aggregating or summarizing multiple instances

Summary

INTRODUCTION

Biomedical information extraction from structured biomedical data stored in relational databases refers to data summarization applied to relational biomedical data. We transform the data representation in a multi-relational environment into a vector space model suitable for clustering operations. By clustering these objects, one can group bags with multiple instances that have similar characteristics, and these characteristics can be extracted as an interpretable rule that describes the cluster's behavior. Each cluster can yield more information when one looks at the most frequent patterns that describe it. In this experiment, we employ an algorithm called DARA that converts the dataset representation from a relational model into a vector space model and uses a distance-based method to group objects with multiple occurrences.

These terms are encoded based on the number of attributes combined, p, and represent instances stored in the non-target table referred to by a record stored in the target table.

[...] BOND_TYPE), where each molecule is described over several rows, listing all pairs of atoms with a bond, the type of each atom and the type of [...]
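The transformation described in this section can be illustrated with a minimal sketch. It assumes TF-IDF weighting and cosine similarity as the Information Retrieval machinery; the text here only states that a vector space model and a distance-based method are used, so both choices are assumptions. The table rows, the column layout and the function names (`encode_terms`, `tfidf`, `cosine`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical non-target table: (molecule_id, atom_type, bond_type).
# Each molecule (a target-table record) is described over several rows.
BOND_ROWS = [
    ("m1", "c", "single"),
    ("m1", "c", "double"),
    ("m1", "o", "single"),
    ("m2", "c", "single"),
    ("m2", "c", "double"),
    ("m3", "n", "triple"),
    ("m3", "n", "triple"),
]


def encode_terms(rows):
    """Group rows by target-record id and encode each row as one term.

    A term is built by joining the row's attribute values, so the number
    of attributes combined is simply the row width minus the key column.
    """
    bags = defaultdict(Counter)
    for key, *attrs in rows:
        bags[key]["_".join(attrs)] += 1
    return dict(bags)


def tfidf(bags):
    """Weight each term by term frequency times inverse document frequency."""
    n = len(bags)
    doc_freq = Counter(term for bag in bags.values() for term in bag)
    return {
        key: {t: tf * math.log(n / doc_freq[t]) for t, tf in bag.items()}
        for key, bag in bags.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


bags = encode_terms(BOND_ROWS)
vectors = tfidf(bags)
# Molecules that share bond patterns score higher than those that share none.
print(cosine(vectors["m1"], vectors["m2"]), cosine(vectors["m1"], vectors["m3"]))
```

Any distance-based clustering method could then group the resulting vectors, and the highest-weighted terms in each group would serve as candidate frequent patterns for that cluster.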

B2: Continuous values about the charge of atoms are

DISCUSSION

CONCLUSION

Journal: Journal of Computer Science	Publication Date: Dec 1, 2011
Citations: 3	License type: cc-by
