Abstract

Accurate Medical Subject Headings (MeSH) annotation is important for effective information retrieval and knowledge discovery in the biomedical literature. We have developed a dual-triggered correspondence topic (DTCT) model for MeSH-annotated articles. In our model, two types of data, the words in titles and abstracts and the MeSH terms, are assumed to be generated by the same latent topic factors, with the words serving as textual descriptions of the MeSH terms. The model lets the generation of each MeSH term be triggered probabilistically either by the general document topics or by a document-specific "special" word distribution, which allows a trade-off between the benefits of topic-based abstraction and those of specific word matching. To reduce the topical influence of non-topical or domain-frequent words in the text description, we integrated the discriminative term weighting of Okapi BM25 into the word sampling probability. This allows the model to choose keywords that stand out from the rest of the text when generating MeSH terms. We further incorporate prior knowledge about word-MeSH relations, measured with the phi coefficient, into DTCT to improve topic coherence. We demonstrate the model's usefulness for automatic MeSH annotation. Our model obtained an F-score of 0.62 on the 150,00 MEDLINE test set and showed particular strength in recall. Notably, it achieved competitive performance within an integrated probabilistic framework, without additional post-processing to filter MeSH terms.
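The abstract names two standard quantities without giving their formulas. As an illustration only, the sketch below computes the textbook Okapi BM25 term weight for each word of a document and normalizes the weights into a per-document word distribution, and it computes the phi coefficient between a word and a MeSH term from a 2x2 table of document co-occurrences. The parameter defaults (k1 = 1.2, b = 0.75), the non-negative IDF variant, the simple "probability proportional to BM25 weight" normalization, and all function names and toy data are assumptions; how DTCT actually folds these quantities into its sampling equations is not specified here.

```python
import math
from collections import Counter


def bm25_weights(doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25 weight of each distinct word in `doc` relative to `corpus`.

    doc: list of tokens; corpus: list of non-empty token lists.
    Uses the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)).
    """
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs  # average document length
    df = Counter()
    for d in corpus:
        df.update(set(d))  # document frequency: count each word once per document
    tf = Counter(doc)
    weights = {}
    for word, freq in tf.items():
        idf = math.log(1 + (n_docs - df[word] + 0.5) / (df[word] + 0.5))
        # Saturating term frequency with document-length normalization
        norm = freq + k1 * (1 - b + b * len(doc) / avgdl)
        weights[word] = idf * freq * (k1 + 1) / norm
    return weights


def word_sampling_probs(doc, corpus):
    """Hypothetical sampling distribution: p(word) proportional to its BM25 weight."""
    weights = bm25_weights(doc, corpus)
    total = sum(weights.values())
    return {word: w / total for word, w in weights.items()}


def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 contingency table.

    n11: word present, MeSH present; n10: word present, MeSH absent;
    n01: word absent, MeSH present; n00: neither.
    """
    num = n11 * n00 - n10 * n01
    den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return num / den if den else 0.0  # undefined table (a zero margin) -> 0.0


def word_mesh_phi(docs, word, mesh):
    """Phi between `word` and `mesh` over (tokens, mesh_terms) document pairs."""
    n11 = n10 = n01 = n00 = 0
    for tokens, mesh_terms in docs:
        has_word = word in tokens
        has_mesh = mesh in mesh_terms
        if has_word and has_mesh:
            n11 += 1
        elif has_word:
            n10 += 1
        elif has_mesh:
            n01 += 1
        else:
            n00 += 1
    return phi_coefficient(n11, n10, n01, n00)


if __name__ == "__main__":
    corpus = [
        ["insulin", "resistance", "in", "patients"],
        ["insulin", "secretion", "in", "mice"],
        ["tumor", "growth", "in", "mice"],
    ]
    print(word_sampling_probs(corpus[0], corpus))

    annotated = [
        (corpus[0], {"Insulin Resistance"}),
        (corpus[1], {"Insulin"}),
        (corpus[2], {"Neoplasms"}),
    ]
    print(word_mesh_phi(annotated, "resistance", "Insulin Resistance"))
```

In this toy corpus "in" occurs in every document, so it receives the smallest sampling probability, while "resistance" and "patients", each found in only one document, receive the largest. This is the kind of down-weighting of domain-frequent words the abstract describes, shown here with a generic normalization rather than the model's own.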
