Abstract

Medical terms occur across a wide variety of legal, medical, and news corpora. Documents containing these terms are of particular interest to legal professionals working in fields such as medical malpractice, personal injury, and product liability. This paper describes a novel method of tagging medical terms in legal, medical, and news text that is fast and achieves high recall and precision. To date, most research in medical term spotting has been confined to medical text and has approached the problem by extracting noun phrases from sentences and mapping them to a list of medical concepts via a fuzzy lookup. The medical term tagging described in this paper instead relies on a fast finite state machine that finds, within each sentence, the longest contiguous sets of words associated with medical terms in a medical term authority file, converts those word sets into medical term hash keys, and looks up the medical concept ids associated with the hash keys. Additionally, our system relies on a probabilistic term classifier that uses local context to distinguish terms used in a medical sense from terms used in a non-medical sense. Our method is two orders of magnitude faster than an approach based on noun phrase extraction and has better precision and recall for terms pertaining to injuries, diseases, drugs, medical procedures, and medical devices. The methods presented here have been implemented and are the core engines of a Thomson West product called the Medical Litigator. Thus far, the Medical Litigator has processed over 100 million documents and generated over 165 million tags representing approximately 164,000 unique medical concepts. Depending on the document type, the system achieves recall between 0.79 and 0.93 and precision between 0.94 and 0.97.
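The lookup step described above (longest contiguous word sequence, then hash key, then concept id) can be illustrated with a small sketch. This is not the system's implementation: it swaps the finite state machine for a greedy longest-match scan over a Python dictionary, and the authority entries, concept ids, and the names `AUTHORITY`, `term_key`, and `tag_terms` are all hypothetical. It also assumes the key preserves word order and lowercases tokens, which the abstract does not specify.

```python
import re

# Hypothetical toy authority file: medical term -> concept id (ids are made up).
AUTHORITY = {
    "heart attack": "C001",
    "heart": "C002",
    "carpal tunnel syndrome": "C003",
    "aspirin": "C004",
}


def term_key(words):
    """Build a lookup key from a word sequence.

    Assumption: keys are lowercased and order-preserving; the real system's
    hash key construction is not described in the abstract.
    """
    return " ".join(w.lower() for w in words)


# Index from key to concept id, plus the longest term length in words.
KEY_TO_CONCEPT = {term_key(t.split()): cid for t, cid in AUTHORITY.items()}
MAX_LEN = max(len(t.split()) for t in AUTHORITY)


def tag_terms(sentence):
    """Return (surface text, concept id) pairs using greedy longest match."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    tags = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate window first, then shrink it.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = term_key(tokens[i:i + n])
            if key in KEY_TO_CONCEPT:
                tags.append((" ".join(tokens[i:i + n]), KEY_TO_CONCEPT[key]))
                i += n  # skip past the matched term
                matched = True
                break
        if not matched:
            i += 1
    return tags


print(tag_terms("He took aspirin after a heart attack."))
```

Because the scan tries longer windows first, "heart attack" is tagged as a single concept rather than as "heart" alone. The sketch covers only the lookup; it does not attempt the context-based disambiguation of medical versus non-medical senses.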
