Abstract

Many medical applications and much ongoing medical research depend on text mining techniques. An estimated 90 % of all data is unstructured, for example emails, voice or video recordings, data streams, and Word documents. Over the last decade, unstructured data is estimated to have grown by about 62 %, whereas structured data has grown by only 22 %. In this chapter we therefore give an overview of methods and tools that enable researchers to automatically retrieve, extract, and integrate unstructured medical data. As the number of unstructured documents increases, automatic text mining methods ease access to relevant data and to previously conducted research and its results, and they save money by helping to avoid repeated research experiments. Natural language processing has recently received a lot of attention as researchers try to adapt techniques from other domains to biomedical data. We focus especially on methods from the fields of (1) information retrieval (indexing, searching, and retrieval of relevant documents given an input query), (2) information extraction (automatic extraction of structured data from unstructured sources, with the main tasks of named entity recognition, relationship extraction, and coreference resolution), and (3) data integration (data merging and redundancy elimination).
