Cardiovascular diseases (CVDs) remain a leading cause of mortality globally, accounting for approximately 17.9 million deaths annually. Traditional diagnostic methods, though useful, are often invasive and costly, and they can be slow to detect early-stage heart conditions. This study presents a machine-learning-based system for early heart disease detection that applies audio signal processing to phonocardiogram (PCG) recordings. Features were extracted as Mel-Frequency Cepstral Coefficients (MFCCs), Delta MFCCs, and Delta-Delta MFCCs, and their dimensionality was reduced with Principal Component Analysis (PCA). Support Vector Machine (SVM) and XGBoost classifiers were trained on the reduced features and combined in an ensemble model optimized with the Moth Flame Optimization (MFO) algorithm. The model was evaluated using accuracy, precision, recall, and F1-score. The ensemble achieved an accuracy of 99.13%, a precision of 98.94%, a recall of 95.05%, and an F1-score of 97.46%. Oversampling the minority class with the Synthetic Minority Over-sampling Technique (SMOTE) improved classification performance, highlighting its effectiveness in addressing class imbalance. The proposed system provides a non-invasive, cost-effective approach to heart disease detection and holds potential for improving diagnostic access, particularly in resource-limited settings.
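As an illustration of the feature-extraction step, the following minimal sketch shows how Delta and Delta-Delta coefficients can be derived from an MFCC matrix and stacked into a single feature matrix. It assumes the MFCCs have already been computed (one row per frame, one column per coefficient) and uses the common first-order regression formula for deltas. The function names `delta` and `stack_mfcc_features`, the window half-width of 2, and the edge padding are illustrative assumptions, not details taken from this study.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based deltas along the time axis (rows are frames).

    Uses d_t = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2), n = 1..width.
    The half-width `width` is an assumed value, not one from the study.
    """
    n_frames = features.shape[0]
    # Repeat the edge frames so every frame has `width` neighbours per side.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + n_frames]
        behind = padded[width - n : width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_mfcc_features(mfcc: np.ndarray) -> np.ndarray:
    """Concatenate MFCC, Delta, and Delta-Delta columns frame by frame."""
    d1 = delta(mfcc)   # first-order dynamics
    d2 = delta(d1)     # second-order dynamics (delta of the deltas)
    return np.hstack([mfcc, d1, d2])
```

With 13 MFCCs per frame, the stacked matrix has 39 columns per frame. In a pipeline like the one described here, these features would then be passed to a dimensionality-reduction step such as PCA before classification.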