Efficient management and prioritization of software requirements are critical challenges in agile projects, where requirements evolve constantly in response to changing user needs, business goals, and regulatory updates. This paper explores the role of semantic feature extraction in enabling adaptive requirements management. Using the PROMISE Expanded Dataset and the Coquina Dataset, we applied TF-IDF weighted Word2Vec for tokenization and feature extraction. Latent Dirichlet Allocation (LDA) was used to analyze how preprocessing steps such as stop word removal affect topic representation, and removing stop words improved topic specificity and coherence. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied, improving the model's handling of underrepresented classes. Principal Component Analysis (PCA) reduced the dimensionality of the TF-IDF weighted Word2Vec embeddings from 100 features to 30, and Analysis of Variance (ANOVA) identified the features most significant for classification. Three features were statistically significant at the 0.05 level, with p-values of 0.0000605, 0.00000000469, and 0.0024. These features could serve as inputs for training machine learning models that predict and manage software requirements adaptively during agile development. By reducing ambiguity in user requirements and sentiment at the requirements phase, the development phase could proceed more smoothly.
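The abstract does not specify how the TF-IDF weighted Word2Vec features are built. One common formulation, assumed here, represents each requirement as the average of its tokens' Word2Vec vectors, with each token weighted by its TF-IDF score. The sketch below is illustrative only and is not the authors' code. It uses a hypothetical stop-word list, toy 3-dimensional vectors in place of trained 100-dimensional embeddings, and hypothetical function names (`tokenize`, `idf_weights`, `tfidf_weighted_embedding`). The IDF formula follows the smoothed variant that scikit-learn's `TfidfVectorizer` uses by default.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the paper does not publish the list it used.
STOP_WORDS = {"the", "a", "an", "shall", "be", "to", "of", "and", "within"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def idf_weights(docs):
    """Smoothed IDF per token: log((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def tfidf_weighted_embedding(doc, idf, vectors, dim):
    """Weighted mean of the word vectors in doc, with weight tf * idf.

    Tokens without a vector are skipped; a zero vector is returned
    when no token in doc has a vector.
    """
    acc = [0.0] * dim
    weight_sum = 0.0
    for tok, count in Counter(doc).items():
        if tok not in vectors:
            continue
        weight = count * idf.get(tok, 0.0)
        for i, value in enumerate(vectors[tok]):
            acc[i] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return acc
    return [value / weight_sum for value in acc]


if __name__ == "__main__":
    corpus = [
        "The system shall encrypt user passwords",
        "The system shall respond within two seconds",
        "Users shall export reports to PDF",
    ]
    docs = [tokenize(text) for text in corpus]
    idf = idf_weights(docs)
    # Hypothetical 3-d vectors standing in for trained 100-d Word2Vec embeddings.
    vectors = {
        "system": [0.1, 0.0, 0.2],
        "encrypt": [0.9, 0.1, 0.0],
        "passwords": [0.8, 0.2, 0.1],
        "respond": [0.0, 0.9, 0.1],
        "seconds": [0.1, 0.8, 0.0],
        "export": [0.0, 0.1, 0.9],
        "reports": [0.1, 0.0, 0.8],
    }
    for text, doc in zip(corpus, docs):
        print(text, "->", tfidf_weighted_embedding(doc, idf, vectors, dim=3))
```

Repeated words get proportionally more weight, and words that occur in many requirements (such as "system" here) get less. The resulting vector therefore leans toward the distinctive terms of each requirement. In a full pipeline, these per-requirement vectors would be the inputs to the SMOTE, PCA, and ANOVA steps described above.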