One of the major challenges in automating health insurance claims processing is validating an incoming claim's medical diagnoses against its policy's underwriting (UW) exclusions. This process, termed UW Exclusion Detection, ensures that a claim is paid out only if its diagnoses are not medically associated with conditions excluded under the policy. Medical diagnoses in health insurance claims are typically represented by International Classification of Diseases (ICD) codes, established by the World Health Organization. For example, under a policy that excludes "all respiratory illness", a claim with the ICD code J45 (Asthma) will be subject to rejection, because J45 is a respiratory diagnosis that falls within the scope of the policy's UW exclusion. The key difficulty in automating this process is the sheer number of ICD codes. The ICD-10-CM coding scheme consists of over 40,000 codes, so codes encountered during inference are often absent from the training data. These unseen ICD codes limit the effectiveness of data-driven approaches, which depend on the training data to discern medically relevant associations between UW exclusions and ICD codes, and underscore the need to supplement such approaches with additional domain knowledge. We hypothesize that integrating the implicit medical domain knowledge inherent in Large Language Models (LLMs) with explicit domain knowledge from medical ontologies will enhance data-driven approaches for UW Exclusion Detection. Thoroughly validated on real-world health insurance claims data, our proposed approach proved effective in accurately establishing medically relevant associations between UW exclusions and ICD codes.
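The unseen-code problem, and why explicit ontology knowledge helps, can be illustrated with a minimal sketch. It is not the method proposed here (which combines LLMs with medical ontologies). It covers only a simple rule-based ontology fallback and assumes a hypothetical table `EXCLUSION_RANGES` that maps an exclusion phrase to ICD-10 code ranges, plus a hypothetical `LEARNED` table standing in for associations a data-driven model picked up from training claims. The one ontology fact used is that ICD-10 Chapter X, "Diseases of the respiratory system", spans categories J00 to J99.

```python
# Hypothetical mapping from UW exclusion phrases to ICD-10 category ranges.
# Chapter X of ICD-10 ("Diseases of the respiratory system") spans J00-J99.
EXCLUSION_RANGES: dict[str, list[tuple[str, str]]] = {
    "all respiratory illness": [("J00", "J99")],
}

# Hypothetical associations a data-driven model might have learned from
# training claims; keys are (exclusion, three-character ICD category).
LEARNED: dict[tuple[str, str], bool] = {
    ("all respiratory illness", "J45"): True,
}


def icd_category(code: str) -> str:
    """Return the three-character ICD-10 category, e.g. 'J45.909' -> 'J45'."""
    return code.replace(".", "").strip().upper()[:3]


def in_ontology_range(code: str, exclusion: str) -> bool:
    """Check whether the code's category lies inside any range for the exclusion."""
    category = icd_category(code)
    # Three-character categories compare correctly as strings within a chapter.
    return any(start <= category <= end
               for start, end in EXCLUSION_RANGES.get(exclusion, []))


def detect(code: str, exclusion: str) -> bool:
    """Use the learned association if one exists, else fall back to the ontology."""
    key = (exclusion, icd_category(code))
    if key in LEARNED:
        return LEARNED[key]
    # Unseen code: the learned table is silent, so consult the ICD hierarchy.
    return in_ontology_range(code, exclusion)


if __name__ == "__main__":
    print(detect("J45", "all respiratory illness"))    # seen during "training"
    print(detect("J18.9", "all respiratory illness"))  # unseen, caught by ontology
    print(detect("I10", "all respiratory illness"))    # hypertension, not excluded
```

A hand-written range table like this cannot scale to free-text exclusions or cross-chapter medical associations. That limitation is the gap the abstract proposes to fill with the implicit knowledge in LLMs.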