Abstract
Problem statement: With the ever-growing number of biomedical datasets stored in multiple tables, Information Extraction (IE) from these datasets is increasingly recognized as one of the crucial technologies in bioinformatics. For IE to be practically applicable, however, a system must be adaptable, given the extremely diverse demands of biomedical IE applications. One should be able to extract a set of hidden patterns from these biomedical datasets at low cost. Approach: This study proposes a new method, called Bio-medical Data Aggregation for Relational Attributes (BioDARA), for the automatic structuring of information extraction from biomedical datasets. BioDARA summarizes biomedical data stored in multiple tables in order to facilitate data modeling efforts in a multi-relational setting. BioDARA can transform biomedical data stored in multiple tables or databases into a vector space model, summarize the data using Information Retrieval theory, and finally extract frequent patterns that describe the characteristics of these biomedical datasets. Results: The results show that data summarization performed by DARA can be beneficial in summarizing biomedical datasets in a complex multi-relational environment, in which the datasets are stored in multiple levels of one-to-many relationships, and also when the datasets are stored in more than one one-to-many relationship with non-target tables. Conclusion: This study concludes that data summarization performed by BioDARA can be beneficial in summarizing biomedical datasets in a complex multi-relational environment, in which the datasets are stored in multiple levels of one-to-many relationships.
Highlights
Biomedical information extraction from structured biomedical data stored in relational databases refers to data summarization applied to relational biomedical data
Despite the increase in the volume of biomedical datasets stored in relational databases, only a few studies handle clustering across multiple relations (Kirsten and Wrobel, 1998; 2000)
Solving the multiple-instance problem with a vector space model that is suitable for clustering operations, as a means of aggregating or summarizing multiple instances
Summary
Biomedical information extraction from structured biomedical data stored in relational databases refers to data summarization applied to relational biomedical data. We transform the data representation in a multi-relational environment into a vector space model suitable for clustering operations. By clustering these objects, one can group bags of multiple instances that have similar characteristics, which can then be extracted as an interpretable rule that describes the cluster's behavior. Each cluster can yield more information by looking at the most frequent patterns that describe it. In this experiment, we employ an algorithm called DARA that converts the dataset representation from the relational model into a vector space model and uses a distance-based method to group objects whose representations occur multiple times. These terms are encoded based on the number of attributes combined, p, and represent instances stored in the non-target table referred to by a record stored in the target table.
... BOND_TYPE), where each molecule is described over several rows, listing all pairs of atoms with a bond, the type of each atom and the type of ...
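The transformation described in the summary can be illustrated with a small sketch. It is not the DARA implementation: the table rows, the term encoding, the TF-IDF weighting and the cosine similarity are all assumptions chosen for illustration (the summary only says that the method draws on Information Retrieval theory and a distance-based grouping). The names `BOND_ROWS`, `encode_terms`, `build_bags`, `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical non-target table: (molecule_id, atom1_type, atom2_type, bond_type).
# Each molecule in the target table is referred to by several rows here.
BOND_ROWS = [
    ("m1", "c", "c", "aromatic"),
    ("m1", "c", "h", "single"),
    ("m1", "c", "c", "aromatic"),
    ("m2", "c", "c", "aromatic"),
    ("m2", "c", "h", "single"),
    ("m3", "n", "o", "double"),
    ("m3", "n", "o", "double"),
]

def encode_terms(values, p):
    """Combine p consecutive attribute values into one term (assumed encoding)."""
    return [
        "_".join(f"{k}={v}" for k, v in enumerate(values[i:i + p], start=i))
        for i in range(0, len(values), p)
    ]

def build_bags(rows, p):
    """Aggregate the rows of each target record into a bag of term counts."""
    bags = defaultdict(Counter)
    for key, *values in rows:
        bags[key].update(encode_terms(values, p))
    return bags

def tfidf_vectors(bags):
    """Weight each term by tf * log(n / df); one sparse vector per target record."""
    n = len(bags)
    # Iterating a Counter yields its keys, so each bag counts a term once.
    df = Counter(term for bag in bags.values() for term in bag)
    return {
        key: {t: tf * math.log(n / df[t]) for t, tf in bag.items()}
        for key, bag in bags.items()
    }

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0

# p=3 combines all three attributes of a row into a single term.
vectors = tfidf_vectors(build_bags(BOND_ROWS, p=3))
```

With this toy data, `cosine(vectors["m1"], vectors["m2"])` is about 0.95 because both molecules share the same bond patterns, while `cosine(vectors["m1"], vectors["m3"])` is 0 because they share none. A distance-based clustering step could then operate on `1 - cosine(...)`; which clustering method the authors use is not stated in the summary.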