Cardiovascular diseases (CVD) are the leading cause of death worldwide. People affected by CVDs may go undiagnosed until the occurrence of a serious heart failure event such as stroke, heart attack, and myocardial infraction. In Qatar, there is a lack of studies focusing on CVD diagnosis based on non-invasive methods such as retinal image or dual-energy X-ray absorptiometry (DXA). In this study, we aimed at diagnosing CVD using a novel approach integrating information from retinal images and DXA data. We considered an adult Qatari cohort of 500 participants from Qatar Biobank (QBB) with an equal number of participants from the CVD and the control groups. We designed a case-control study with a novel multi-modal (combining data from multiple modalities—DXA and retinal images)—to propose a deep learning (DL)-based technique to distinguish the CVD group from the control group. Uni-modal models based on retinal images and DXA data achieved 75.6% and 77.4% accuracy, respectively. The multi-modal model showed an improved accuracy of 78.3% in classifying CVD group and the control group. We used gradient class activation map (GradCAM) to highlight the areas of interest in the retinal images that influenced the decisions of the proposed DL model most. It was observed that the model focused mostly on the centre of the retinal images where signs of CVD such as hemorrhages were present. This indicates that our model can identify and make use of certain prognosis markers for hypertension and ischemic heart disease. From DXA data, we found higher values for bone mineral density, fat content, muscle mass and bone area across majority of the body parts in CVD group compared to the control group indicating better bone health in the Qatari CVD cohort. This seminal method based on DXA scans and retinal images demonstrate major potentials for the early detection of CVD in a fast and relatively non-invasive manner.