Pulmonary embolism (PE) is a life-threatening condition with nonspecific clinical symptoms, and computed tomography angiography (CTA) is used for its diagnosis. Clinical decision support scoring systems based on PE risk factors, such as the Wells and revised Geneva (rGeneva) scores, have been developed to estimate pre-test probability. However, they are underused, which contributes to the continued overuse of CTA imaging. This diagnostic study aimed to propose a novel approach to managing PE diagnosis efficiently, using a two-step interconnected machine learning framework that directly analyzes patients' electronic health record (EHR) data. First, we performed a feature importance analysis using LightGBM, which showed superior performance for PE prediction. Then, guided by the feature importance results, we applied four state-of-the-art machine learning methods to predict PE, enabling swift and accurate pre-test diagnosis. Patient data from different departments were collected from Sina Educational Hospital, affiliated with Tehran University of Medical Sciences, Iran. Overall, the Ridge classifier achieved the best performance, with an F1 score of 0.96. Extensive experimental findings demonstrated the effectiveness and simplicity of this diagnostic process for PE compared with existing scoring systems. The main strength of this approach lies in PE management. It could reduce avoidable invasive CTA imaging and serve as a primary prognostic tool for PE, thereby assisting the healthcare system, clinicians, and patients by reducing costs and improving treatment quality and patient satisfaction.
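The two-step structure described above (rank features with a boosted-tree model, then train a classifier on the top-ranked features and score it by F1) can be illustrated with a minimal sketch. The abstract does not specify hyperparameters, how many features were kept, or which other three classifiers were used, so everything here is an assumption and not the authors' implementation. Synthetic data from scikit-learn's `make_classification` stands in for EHR records. scikit-learn's `GradientBoostingClassifier` is swapped in for LightGBM so the sketch needs only scikit-learn (`lightgbm.LGBMClassifier` exposes the same `feature_importances_` attribute). The function `two_step_pipeline` and its `top_k` cutoff are hypothetical.

```python
# Minimal sketch of a rank-then-classify pipeline (NOT the paper's code).
# Assumptions: synthetic data instead of EHR records, GradientBoostingClassifier
# as a stand-in for LightGBM, and a hypothetical top_k feature cutoff.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def two_step_pipeline(X, y, top_k=10, random_state=0):
    """Return (selected feature indices, test-set F1) for a hypothetical two-step pipeline."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )

    # Step 1: rank features by boosted-tree importance, fit on training data only
    # so the test split does not leak into feature selection.
    ranker = GradientBoostingClassifier(random_state=random_state)
    ranker.fit(X_train, y_train)
    order = np.argsort(ranker.feature_importances_)[::-1]  # most important first
    selected = np.sort(order[:top_k])

    # Step 2: standardize the selected features and fit a Ridge classifier.
    clf = make_pipeline(StandardScaler(), RidgeClassifier(alpha=1.0))
    clf.fit(X_train[:, selected], y_train)
    y_pred = clf.predict(X_test[:, selected])
    return selected, f1_score(y_test, y_pred)


if __name__ == "__main__":
    # Hypothetical toy data: 30 candidate features, 8 of them informative.
    X, y = make_classification(
        n_samples=600, n_features=30, n_informative=8, random_state=0
    )
    selected, f1 = two_step_pipeline(X, y, top_k=10)
    print("selected features:", selected.tolist())
    print(f"test F1: {f1:.3f}")
```

Fitting the ranker on the training split alone keeps feature selection from seeing the held-out data, which would otherwise inflate the reported F1. In a comparison like the one the abstract describes, step 2 would be repeated for each candidate classifier on the same selected features.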