Supervised machine learning is often used for biomedical relationship extraction, but building a manually annotated dataset requires considerable time and money. Distant supervision instead combines a knowledge base with a corpus so that the training corpus can be annotated automatically. Because many biomedical databases provide knowledge bases while only a limited number of annotated corpora exist, this approach is practical in biomedicine. The clinical significance of each patient’s genetic makeup can be understood based on the healthcare provider’s genetic database. Unfortunately, few previous biomedical relationship extraction studies have focused on gene–gene interactions. The main purpose of this study is to develop extraction methods for gene–gene interactions that can help explain the heritability of complex human diseases. This study drew on the gene–gene interaction information in the KEGG PATHWAY database, used PubMed abstracts to generate the training sample set, and applied a graph kernel method to extract gene–gene interactions. The best assessment result was an F1-score of 0.79. The distant supervision method we developed automatically finds sentences in the corpus for extracting gene–gene interactions without manual labeling, which can effectively reduce the time cost of manual data annotation; moreover, the graph kernel-based relationship extraction method can be successfully applied to extract gene–gene interactions. The results of this study are therefore expected to contribute to precision medicine.
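The automatic annotation step behind distant supervision can be illustrated with a minimal sketch. It is not the study's pipeline: the function name `label_sentences`, the whitespace tokenization, and the toy gene list, knowledge-base pair, and sentences are all assumptions made for illustration, standing in for KEGG PATHWAY interaction records and PubMed abstracts. A sentence that mentions two genes is labeled positive when that pair appears in the knowledge base and negative otherwise.

```python
from itertools import combinations


def label_sentences(sentences, gene_names, known_pairs):
    """Return (sentence, gene_a, gene_b, label) for each co-occurring gene pair.

    label is 1 if the unordered pair is in the knowledge base, else 0.
    """
    examples = []
    for sentence in sentences:
        # Naive tokenization (an assumption); real text needs gene name recognition.
        tokens = set(sentence.replace(",", " ").replace(".", " ").split())
        mentioned = sorted(g for g in gene_names if g in tokens)
        for a, b in combinations(mentioned, 2):
            label = 1 if frozenset((a, b)) in known_pairs else 0
            examples.append((sentence, a, b, label))
    return examples


# Toy inputs (hypothetical stand-ins for a knowledge base and a corpus).
gene_names = {"TP53", "MDM2", "BRCA1", "EGFR"}
known_pairs = {frozenset(("TP53", "MDM2"))}
sentences = [
    "MDM2 binds TP53 and promotes its degradation.",
    "BRCA1 and EGFR were both measured in the cohort.",
    "TP53 is frequently mutated.",
]

examples = label_sentences(sentences, gene_names, known_pairs)
for sentence, a, b, label in examples:
    print(label, a, b, "|", sentence)
```

Labels produced this way are noisy, since a sentence can mention a known interacting pair without asserting the interaction, so a classifier such as the graph kernel method is still needed on top of the automatically labeled sentences.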