Background and objective: Valvular heart disease (VHD) is associated with elevated mortality rates. Although transthoracic echocardiography (TTE) is the gold-standard detection tool, phonocardiography (PCG) could be an alternative because it is a cost-effective and noninvasive method for cardiac auscultation. Many researchers have worked to improve the decision-making process and to develop robust and precise approaches that assist physicians in providing reliable diagnoses of VHD.
Methods: This research proposes a novel approach for detecting anomalous valvular heart sounds from PCG signals. The approach combines orthogonal non-negative matrix factorization (ONMF) and convolutional neural network (CNN) architectures in a three-stage cascade. The aim is to improve the learning process by identifying the optimal ONMF temporal or spectral patterns for accurate detection. In the first stage, the time–frequency representation of the input PCG signal is computed. Next, band-pass filtering is performed to locate the spectral range most relevant to the presence of such cardiac abnormalities. In the second stage, the temporal and spectral cardiac structures are extracted using ONMF. In the third stage, these structures are fed into the CNN architecture to detect abnormal heart sounds.
Results: Several state-of-the-art CNN architectures (LeNet5, AlexNet, ResNet50, VGG16 and GoogLeNet) were evaluated to determine the effectiveness of ONMF temporal features for VHD detection. The results reveal that integrating ONMF temporal features with a CNN classifier significantly improves VHD detection. Specifically, the proposed approach achieves an accuracy improvement of approximately 45% over ONMF spectral features and approximately 35% over time–frequency features from the short-time Fourier transform (STFT) spectrogram. Additionally, feeding ONMF temporal features into low-complexity CNN architectures yields results competitive with those obtained with complex architectures.
Conclusions: The temporal structure factorized by ONMF plays a critical role in distinguishing normal from abnormal heart sounds, since the repeatability of normal heart cycles is disrupted by the presence of cardiac abnormalities. Consequently, the results highlight the importance of an appropriate input data representation in the learning process of CNN models in the biomedical field of valvular heart sound detection.
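As an illustration of the first two stages described in the Methods (time–frequency representation, band limiting, and ONMF), the following is a minimal NumPy sketch, not the authors' implementation. Several details are assumptions that this abstract does not specify: the 25–400 Hz band, the STFT window and hop sizes, the number of components `k`, the choice to impose orthogonality on the temporal activations `H`, and the use of one common multiplicative-update ONMF variant. The function names are hypothetical, and the CNN stage is omitted.

```python
import numpy as np


def stft_magnitude(x, fs, n_fft=256, hop=64):
    """Magnitude spectrogram (frequency bins x frames) with a Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    spec = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return spec, freqs


def band_limit(spec, freqs, f_lo, f_hi):
    """Keep only the spectrogram rows whose frequency lies in [f_lo, f_hi]."""
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    return spec[keep], freqs[keep]


def onmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate X (F x T) by W (F x k) @ H (k x T), with W, H >= 0.

    Assumption: a multiplicative-update variant that pushes the rows of H
    towards mutual orthogonality; the paper's exact rules may differ.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = X.shape
    W = rng.random((n_freq, k)) + eps
    H = rng.random((k, n_time)) + eps
    for _ in range(n_iter):
        # Standard NMF-style update for the spectral bases.
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        # Orthogonality-promoting update for the temporal activations.
        H *= (W.T @ X) / (H @ X.T @ W @ H + eps)
    return W, H


# Synthetic stand-in for a PCG recording: periodic low-frequency bursts.
fs = 2000                                   # sampling rate in Hz (assumed)
t = np.arange(4 * fs) / fs                  # 4 seconds of samples
envelope = np.exp(-(((t % 0.8) - 0.1) ** 2) / (2 * 0.02 ** 2))
noise = 0.01 * np.random.default_rng(1).standard_normal(t.size)
pcg = envelope * np.sin(2 * np.pi * 60 * t) + noise

spec, freqs = stft_magnitude(pcg, fs)
X, band_freqs = band_limit(spec, freqs, 25.0, 400.0)   # assumed band
W, H = onmf(X, k=4)
print(W.shape, H.shape)
```

In this sketch the rows of `H` play the role of the ONMF temporal features and the columns of `W` the spectral features; either could then be arranged as the input to a CNN classifier in the third stage.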