Accurate prediction of cardiovascular disease (CVD) requires multifaceted information: not only a patient’s medical history, but also genomic data, symptoms, lifestyle, and risk factors. These are often left out of the decision-making process because the data are vast, difficult to obtain, and require complex algorithms to analyze. However, given the large amount of publicly available information and recent advances in machine learning and deep learning, acquiring multifaceted data for accurate diagnosis and medical decision making via artificial intelligence has become more attainable. In this work we present a literature embedding model that identifies risk factors, symptoms, mechanisms, and genes associated with CVD from a query word (e.g., “stroke”) and then, based on the collected information, predicts whether a person is susceptible to CVD. For this purpose, we collected published literature from PubMed using search keywords consisting of a word such as “heart” and 19,264 human gene names, then trained our literature embedding model on the collected abstracts. For the intrinsic evaluation, we analyzed whether the captured words and genes were correctly identified as risk factors and associated symptoms for the input query words. For the extrinsic evaluation, we used our embedding model for feature selection and dimensionality reduction on cohort data for CVD prediction. Our model accurately (average accuracy of >96%) captured associated risk factors, symptoms, and genes for a given input query word (intrinsic evaluation). Using the selected features and reduced dimensions, our method provided better performance for CVD prediction with less computational time than other popular methods (extrinsic evaluation). Our model performed strongly on both evaluation tasks, enabling accurate risk factor identification for CVD prediction.
Our model has the potential to simplify the collation of multifaceted information and improve data mining of the vast body of publicly available data, so that risk factors and symptoms can be identified efficiently and accurately, enabling better-informed decisions for CVD prediction and treatment.
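
The query-then-select idea described above can be illustrated with a minimal Python sketch. It is not the model from this work: the three-dimensional vectors are invented toy values, cosine similarity is assumed as the relatedness measure, and the names `nearest_terms` and `select_features` are hypothetical helpers introduced only for illustration.

```python
import math

# Toy embedding table (invented values, NOT trained on PubMed abstracts).
TOY_EMBEDDINGS = {
    "stroke":       [0.9, 0.1, 0.0],
    "hypertension": [0.8, 0.2, 0.1],
    "smoking":      [0.7, 0.3, 0.0],
    "eye_color":    [0.0, 0.1, 0.9],
    "shoe_size":    [0.1, 0.0, 0.8],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_terms(query, embeddings, k):
    """Return the k terms most similar to the query word, best first."""
    q = embeddings[query]
    scored = [
        (term, cosine(q, vec))
        for term, vec in embeddings.items()
        if term != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def select_features(columns, query, embeddings, k):
    """Keep the k cohort columns ranked closest to the query word
    (assumed stand-in for embedding-based feature selection)."""
    available = set(columns)
    ranked = nearest_terms(query, embeddings, k=len(embeddings))
    return [term for term, _ in ranked if term in available][:k]

# Hypothetical cohort column names, used only for this example.
cohort_columns = ["shoe_size", "smoking", "hypertension", "eye_color"]
selected = select_features(cohort_columns, "stroke", TOY_EMBEDDINGS, k=2)
```

In this toy setting the columns closest to “stroke” are retained and the unrelated ones are dropped, which reduces the dimensionality of the cohort data before a downstream classifier is trained.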