The healthcare industry encompasses many associated services, including research on trends and patterns in diseases and in patients' lifestyles. With the emergence of Artificial Intelligence (AI), problems in the healthcare domain can now be addressed with Machine Learning (ML) techniques. One such problem, considered in this paper, is clinical document classification. Existing methods in this area lack a systematic approach to filtering out false positives. In this paper we propose an ML framework that pipelines ML models at multiple levels. At the first level, clinical documents that contain no smoking-related content are discarded. At the second level, documents that describe known smoking cases are retained. At the third level, the remaining clinical documents are classified into two categories: current smokers and past smokers. We propose an algorithm called Learning based Clinical Document Classification (LbCDC), which uses three models in a pipeline to classify clinical documents at multiple levels of granularity. Our experimental results show that the proposed system is efficient in clinical document classification.
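The three-level filtering described above can be illustrated with a minimal sketch. This is not the LbCDC algorithm itself: the abstract does not specify the models, so each level is represented here by a hypothetical keyword rule (`mentions_smoking`, `is_known_smoker`, `current_or_past`) standing in for a trained classifier. All function names, patterns, and labels are assumptions made for illustration only.

```python
import re
from typing import Callable, Optional

# Hypothetical keyword rules standing in for the trained model at each level.
# A real system would replace each of these with a learned classifier.
SMOKING_TERMS = re.compile(r"\b(smok\w*|tobacco|cigarette\w*|nicotine)\b", re.I)
NEGATED = re.compile(
    r"\b(never smoked|non-?smoker|denies (smoking|tobacco)|no history of smoking)\b",
    re.I,
)
PAST = re.compile(r"\b(quit|former|ex-?smoker|stopped smoking|used to smoke)\b", re.I)


def mentions_smoking(doc: str) -> bool:
    """Level 1 stand-in: does the document contain any smoking-related content?"""
    return SMOKING_TERMS.search(doc) is not None


def is_known_smoker(doc: str) -> bool:
    """Level 2 stand-in: keep documents that describe an actual smoking case."""
    return NEGATED.search(doc) is None


def current_or_past(doc: str) -> str:
    """Level 3 stand-in: label a retained document as 'current' or 'past'."""
    return "past" if PAST.search(doc) else "current"


def classify(
    doc: str,
    level1: Callable[[str], bool] = mentions_smoking,
    level2: Callable[[str], bool] = is_known_smoker,
    level3: Callable[[str], str] = current_or_past,
) -> Optional[str]:
    """Run the cascade; return None as soon as a level filters the document out."""
    if not level1(doc):
        return None  # discarded at level 1: no smoking-related content
    if not level2(doc):
        return None  # discarded at level 2: not a known smoking case
    return level3(doc)


if __name__ == "__main__":
    notes = [
        "Patient presents with ankle sprain.",
        "Patient denies smoking.",
        "Patient smokes one pack of cigarettes daily.",
        "Former smoker, quit in 2010.",
    ]
    for note in notes:
        print(f"{classify(note)!s:8} | {note}")
```

Because each level is passed in as a callable, the keyword rules can be swapped for trained models without changing the cascade. A document rejected early never reaches the later, finer-grained levels, which is how a pipeline of this shape can reduce false positives.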