Abstract

This paper introduces a statistical framework for extracting medical domain knowledge from heterogeneous corpora. The acquired information is incorporated into a natural language understanding agent and applied to DIKTIS, an existing web-based educational dialogue system for the chemotherapy of nosocomial and community-acquired pneumonia, with the aim of providing more intelligent natural language interaction. Unlike the majority of existing dialogue understanding engines, the presented system automatically encodes a semantic representation of a user's query using Bayesian networks. The structure of the networks is determined from annotated dialogue corpora using the Bayesian scoring method, thus eliminating the tedious and costly process of manually coding domain knowledge. The conditional probability distributions are estimated during a training phase using data obtained from the same set of dialogue acts. To cope with words absent from our restricted dialogue corpus, a separate offline module was incorporated, which estimates the semantic role of such words from both medical and general raw text corpora by correlating them with known, lexically and semantically similar words or with predefined topics. Lexical similarity is identified on the basis of both contextual similarity and co-occurrence in conjunctive expressions. The platform was evaluated against the existing natural language understanding module of DIKTIS, the architecture of which is based on manually embedded domain knowledge.
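
To make the two lexical similarity cues concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes tokenized sentences, a fixed context window, cosine similarity over raw co-occurrence counts, and a simple "X and Y" / "X or Y" pattern for conjunctive expressions. All function names, the window size, and the equal weighting of the two cues are hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx = vectors.setdefault(word, Counter())
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def conjunction_pairs(sentences, conjunctions=("and", "or")):
    """Count unordered word pairs that appear directly around a conjunction."""
    pairs = Counter()
    for tokens in sentences:
        for i in range(1, len(tokens) - 1):
            if tokens[i] in conjunctions:
                pairs[frozenset((tokens[i - 1], tokens[i + 1]))] += 1
    return pairs


def most_similar(unknown, known, sentences, weight=0.5):
    """Pick the known word closest to `unknown` (assumed equal weighting of both cues)."""
    vectors = context_vectors(sentences)
    pairs = conjunction_pairs(sentences)

    def score(candidate):
        ctx = cosine(vectors.get(unknown, Counter()), vectors.get(candidate, Counter()))
        conj = 1.0 if pairs[frozenset((unknown, candidate))] > 0 else 0.0
        return weight * ctx + (1 - weight) * conj

    return max(known, key=score)


# Toy corpus (invented for illustration only).
corpus = [
    ["give", "amoxicillin", "and", "cefuroxime", "daily"],
    ["patient", "has", "fever"],
]
nearest = most_similar("cefuroxime", ["amoxicillin", "fever"], corpus)
```

In this sketch an out-of-vocabulary word would inherit the semantic role of `nearest`; how the actual offline module combines the two cues, or maps words to predefined topics, is not specified in the abstract.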
