Abstract
Getting started in text mining.
Highlights
Text mining is the use of automated methods for exploiting the enormous amount of knowledge available in the biomedical literature.
Text mining specialists are more likely to build the kind of system that will get them published at computational linguistics conferences.
Biologists seem to be better at one of the crucial first steps identified above: defining the goals of the system, and not hesitating to base those goals on utility rather than on presumed publishability in the computational linguistics literature.
Summary
Text mining is the use of automated methods for exploiting the enormous amount of knowledge available in the biomedical literature.
Breast cancer could be referred to as breast cancer, carcinoma of the breast, or mammary neoplasm. These variability issues challenge more sophisticated systems as well; we discuss ways of coping with them in Text S1.
At one end of the spectrum, a simple rule-based system might use hardcoded patterns, for example, <gene>. (See [3] for an early rule-based system, and [4] for a discussion of rule-based approaches to various biomedical text mining tasks.)
The former is a cadherin, and is associated with tumor suppression and with bipolar disorder, while the latter is a thrombospondin receptor associated with atherosclerosis, platelet glycoprotein deficiency, hyperlipidemia, and insulin resistance, to name just a few phenotypes. These ambiguities are not trivial: if your analysis is wrong, you miss or erroneously extract information on relations between molecular biology and human disease.
A third approach, post-hoc judging of system outputs, will often suffice for publication, but is often not practical for system development, since it cannot be repeated quickly and frequently.
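To make the two ideas above concrete, here is a minimal Python sketch, not the authors' system. It assumes a tiny hand-built synonym table for the breast cancer variants mentioned in the summary, and one hardcoded pattern of the form "gene, one word, gene". The pattern shape, the gene names (GENE1, GENE2, GENE3), and the function names are all hypothetical and chosen only for illustration.

```python
import re

# Assumption: a hand-built table mapping surface variants to one canonical
# concept name. A real system would draw on a curated terminology.
SYNONYMS = {
    "breast cancer": "breast cancer",
    "carcinoma of the breast": "breast cancer",
    "mammary neoplasm": "breast cancer",
}

# Assumption: placeholder gene names, not real gene symbols.
GENE_LEXICON = {"GENE1", "GENE2", "GENE3"}


def normalize_mentions(text):
    """Return the canonical concept for each variant found, in text order."""
    lowered = text.lower()
    hits = []
    for variant, concept in SYNONYMS.items():
        pattern = r"\b" + re.escape(variant) + r"\b"
        for match in re.finditer(pattern, lowered):
            hits.append((match.start(), concept))
    return [concept for _, concept in sorted(hits)]


def extract_relations(sentence):
    """Apply one hardcoded rule: a gene, exactly one other word, a gene."""
    tokens = re.findall(r"\w+", sentence)
    relations = []
    for left, middle, right in zip(tokens, tokens[1:], tokens[2:]):
        if (left in GENE_LEXICON and right in GENE_LEXICON
                and middle not in GENE_LEXICON):
            relations.append((left, middle, right))
    return relations


if __name__ == "__main__":
    print(normalize_mentions("Carcinoma of the breast and mammary neoplasm."))
    print(extract_relations("GENE1 inhibits GENE2 in vitro"))
```

A rule this rigid misses any sentence in which more than one word separates the two gene mentions, and the exact-match lexicon cannot tell apart two genes that share a symbol. Those are the variability and ambiguity problems the summary describes.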