The impact of software development methodologies on quality metrics is a crucial area of study in empirical software engineering. This research evaluates three deep learning architectures, the Multi-Layer Perceptron (MLP), the Convolutional Neural Network (CNN), and the Long Short-Term Memory network (LSTM), on predicting key software quality indicators (maintainability index, test coverage, and code complexity) for projects developed using Behavior-Driven Development (BDD) and Test-Driven Development (TDD). Using a static tabular dataset of software quality metrics, the models are evaluated on Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and the coefficient of determination (R^2). The MLP achieves the best performance of the three, with the lowest RMSE (6.41) and MAE (6.34) and the highest R^2 (−4.21), indicating that it is the architecture best suited to this tabular data. The CNN performs moderately, while the LSTM underperforms because it relies on temporal dependencies that the dataset does not contain. These results underscore the need to align model architecture with dataset characteristics. The findings contribute to understanding the predictive power of deep learning models in software quality analysis and highlight the potential of the MLP as a robust tool for such predictions. Future work could explore hybrid models and domain-specific feature engineering to improve prediction accuracy.
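The abstract does not include the authors' implementation, so the following NumPy snippet is only a minimal sketch of how the four reported metrics are conventionally computed for a regression model; the target values and predictions are hypothetical, chosen to resemble a maintainability-index scale.

```python
# Illustrative sketch (not the authors' code): computing the four evaluation
# metrics named in the abstract for a regression model's predictions.
import numpy as np

def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Return RMSE, MAE, MAPE (%), and R^2 for predicted vs. actual values."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))                 # Root Mean Squared Error
    mae = np.mean(np.abs(err))                        # Mean Absolute Error
    mape = 100.0 * np.mean(np.abs(err / y_true))      # Mean Absolute Percentage Error
    ss_res = np.sum(err ** 2)                         # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                        # negative when worse than the mean predictor
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}

# Hypothetical maintainability-index targets and model predictions.
y_true = np.array([72.0, 65.5, 80.2, 58.9])
y_pred = np.array([70.1, 66.0, 75.8, 61.3])
print(regression_metrics(y_true, y_pred))
```

As the R^2 line notes, R^2 is negative whenever a model's residual error exceeds that of the trivial mean predictor, which is how a value of −4.21 can still be the highest among the three architectures compared.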