Abstract

The 2nd International Workshop on Data and Text Mining in Bioinformatics (DTMBio) brings together researchers working in data mining, text mining, and computational biology to integrate heterogeneous structured and unstructured biological data. This is a challenging and rewarding task. In particular, through this workshop we seek:
(1) Data and text mining solutions for bioinformatics that identify relevant background knowledge in textual documents, such as scientific publications, or in database annotations. Approaches currently being studied range from term recognition to the extraction of complex interaction relationships between proteins.
(2) Ambitious knowledge discovery solutions that process heterogeneous biological and biomedical data collected from electronic bulletin boards, scientific publications, and any type of experiment. An important open issue is how to consolidate information extracted from textual documents with other types of data in structured form.
We are honored to have two internationally recognized investigators deliver keynote speeches at DTMBio: Dr. Alfonso Valencia of the National Centre for Biotechnology, Madrid, and Professor Russ B. Altman of Stanford.
Of the 17 submissions we received, we accepted 8 as full papers and 4 as short papers.
Among the 8 full papers, three are related to microarray data analysis. In the paper "Identification of Temporal Association Rules from Time-series Microarray Data Set", Hojung Nam, KiYoung Lee, and Doheon Lee propose an association rule mining (ARM) method, referred to as temporal association rule mining (TARM), for analyzing microarray gene expression data. TARM can extract temporal dependencies among related genes.
Working with microarray data stored in a database, Waree Rinsurongkawong and Carlos Ordonez, in the paper "Microarray Data Analysis with PCA in a DBMS", focus on analyzing microarray data sets inside the DBMS, applying the Householder tridiagonalization and QR factorization numerical methods to compute the SVD there. Brian Quanz, Meeyoung Park, and Luke Huan, in their paper "Biological Pathways as Features for Microarray Data Classification", also discuss microarray data, in a supervised learning setting. The authors propose several algorithms that use biological pathways as features for microarray data classification.
Two papers fall into the category of text mining for biological and biomedical data. In the paper "Passage Relevance Models for Genomics Search", Jay Urbain, Ophir Frieder, and Nazli Goharian work towards integrating semantic and statistical evidence of biomedical concepts and present a passage relevance model built on the framework of a probabilistic graphical model. Identifying evidence of biomolecule interactions in the literature is important. Towards that end, in the paper "The Role of Syntactic Features in Protein Interaction Extraction", Timur Fayruzov, Martine De Cock, Chris Cornelis, and Veronique Hoste demonstrate the application of a support vector machine approach, with both lexical and syntactic features, to mining protein interactions from text.
Genomics and proteomics are active research areas that produce large amounts of data, and three papers address mining genomics and proteomics data. In their paper "Peptide Programs: Applying Fragment Programs to Protein Classification", Andre O. Falcao, Daniel Faria, and António Ferreira discuss an automated approach to protein functional prediction and classification using fragment programs. Evaluating the statistical significance of an alignment is of central importance in bioinformatics. In their paper "Pairwise Statistical Significance of Local Sequence Alignment Using Multiple Parameter Sets", Ankit Agrawal and Xiaoqiu Huang demonstrate that a new approach, based on multiple parameter sets, is significantly better than the widely used BLAST method. Metastasis is the most dangerous step in cancer progression and causes more than 90% of cancer deaths. In their paper "Mining Metastasis Related Genes by Primary-Secondary Tumor Comparisons from Large-Scale Database", Sangwoo Kim and Doheon Lee compare primary tumors with secondary metastatic tumors and develop a new method for identifying genes and pathways that are metastasis-dependent and free of metastasis-independent features.
In addition, we accepted 4 short papers, on gene symbol disambiguation, protein-protein interaction prediction, identification of gene regulatory mechanisms through data integration, and population genetics analysis of genotypes.
We are very pleased to hold such a diverse workshop, where researchers from many different fields contribute and interact with one another. We hope the workshop will continue to grow and bring in more researchers who are interested in applying data and text mining techniques to important biological research problems.
