GIS and Machine Learning Integration for Optimized School Site Selection: A Hybrid Framework with TOPSIS and Feature Ranking
- Research Article
3
- 10.1016/j.compbiomed.2016.10.006
- Oct 11, 2016
- Computers in Biology and Medicine
CAFÉ-Map: Context Aware Feature Mapping for mining high dimensional biomedical data
- Research Article
6
- 10.1016/j.knosys.2022.109254
- Jun 18, 2022
- Knowledge-Based Systems
Relational tree ensembles and feature rankings
- Book Chapter
21
- 10.1007/978-3-319-18032-8_33
- Jan 1, 2015
In outlying aspects mining, given a query object, we aim to answer the question of which features make the query most outlying. The most recent works tackle this problem using two different strategies. (i) Feature selection approaches select the features that best distinguish the two classes: the query point vs. the rest of the data. (ii) Score-and-search approaches define an outlyingness score, then search for subspaces in which the query point exhibits the best score. In this paper, we first present a theoretical result connecting the two types of approaches. Second, we present OARank, a hybrid framework that leverages the efficiency of feature selection based approaches and the effectiveness and versatility of score-and-search based methods. Our proposed approach is orders of magnitude faster than previously proposed score-and-search based approaches while being slightly more effective, making it suitable for mining large data sets.
- Research Article
4
- 10.1186/s12911-023-02142-2
- Apr 6, 2023
- BMC Medical Informatics and Decision Making
Background: Breast cancer (BC) is one of the most common cancers among women. Since diverse features can be collected, how to stably select the powerful ones for accurate BC diagnosis remains challenging. Methods: A hybrid framework is designed for successively investigating both feature ranking (FR) stability and cancer diagnosis effectiveness. Specifically, on 4 BC datasets (BCDR-F03, WDBC, GSE10810 and GSE15852), the stability of 23 FR algorithms is evaluated via an advanced estimator (S), and the predictive power of the stable feature ranks is further tested by using different machine learning classifiers. Results: Experimental results identify 3 algorithms achieving good stability (S ≥ 0.55) on the four datasets and generalized Fisher score (GFS) leading to state-of-the-art performance. Moreover, GFS ranks suggest that shape features are crucial in BC image analysis (BCDR-F03 and WDBC) and that using a few genes can well differentiate benign and malignant tumor cases (GSE10810 and GSE15852). Conclusions: The proposed framework recognizes a stable FR algorithm for accurate BC diagnosis. Stable and effective features could deepen the understanding of BC diagnosis and related decision-making applications.
- Research Article
1
- 10.1371/journal.pone.0285512
- May 8, 2023
- PLOS ONE
Speckle tracking echocardiography (STE) has been utilized to evaluate independent spatial alterations in the diabetic heart, but the progressive manifestation of regional and segmental cardiac dysfunction in the type 2 diabetic (T2DM) heart remains understudied. Therefore, the objective of this study was to elucidate if machine learning could be utilized to reliably describe patterns of the progressive regional and segmental dysfunction that are associated with the development of cardiac contractile dysfunction in the T2DM heart. Non-invasive conventional echocardiography and STE datasets were utilized to segregate mice into two pre-determined groups, wild-type and Db/Db, at 5, 12, 20, and 25 weeks. A support vector machine model, which classifies data using a single line, or hyperplane, that best separates each class, and a ReliefF algorithm, which ranks features by how well each feature lends to the classification of data, were used to identify and rank cardiac regions, segments, and features by their ability to identify cardiac dysfunction. STE features more accurately segregated animals as diabetic or non-diabetic when compared with conventional echocardiography, and the ReliefF algorithm efficiently ranked STE features by their ability to identify cardiac dysfunction. The Septal region, and the AntSeptum segment, best identified cardiac dysfunction at 5, 20, and 25 weeks, with the AntSeptum also containing the greatest number of features which differed between diabetic and non-diabetic mice. Cardiac dysfunction manifests in a spatial and temporal fashion, and is defined by patterns of regional and segmental dysfunction in the T2DM heart which are identifiable using machine learning methodologies. 
Further, machine learning identified the Septal region and AntSeptum segment as locales of interest for therapeutic interventions aimed at ameliorating cardiac dysfunction in T2DM, suggesting that machine learning may provide a more thorough approach to managing contractile data with the intention of identifying experimental and therapeutic targets.
- Research Article
8
- 10.1007/s10115-013-0631-0
- Mar 22, 2013
- Knowledge and Information Systems
In machine learning, feature ranking (FR) algorithms are used to rank features by relevance to the class variable. FR algorithms are mostly investigated for the feature selection problem and less studied for the problem of ranking. This paper focuses on the latter. A question about the ranking problem, stated in FR terminology, is: since different FR criteria estimate the relationship between a feature and the class variable differently on given data, can we determine which criterion better captures the "true" feature-to-class relationship and thus generates the most "correct" order of individual features? This is termed the "correctness" problem. It requires a reference ordering against which the ranks assigned to features by an FR algorithm are directly compared. The reference ranking is generally unknown for real-life data. In this paper, we show through theoretical and empirical analysis that for two-class classification tasks represented with binary data, the ordering of binary features based on their individual predictive powers can be used as a benchmark, allowing us to test how correct the ordering produced by an FR algorithm is. Based on these ideas, an evaluation method termed the FR evaluation strategy (FRES) is proposed. Rankings of three different FR criteria (relief, mutual information, and the diff-criterion) are investigated on five artificially generated and four real-life binary data sets. The results indicate that FRES works equally well for synthetic and real-life data and that the diff-criterion generates the most correct orderings for binary data.
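The benchmark idea in this abstract, ordering binary features by their individual predictive power, can be illustrated with a minimal sketch. It assumes that "individual predictive power" means the accuracy of the best classifier that looks at one binary feature alone; the abstract does not give the paper's exact definition, and the function names and toy data below are hypothetical.

```python
def single_feature_accuracy(column, labels):
    """Accuracy of the best classifier that uses only this binary feature."""
    agree = sum(1 for x, y in zip(column, labels) if x == y)
    acc = agree / len(labels)
    # Predicting the inverted feature is also allowed, so keep the better of the two.
    return max(acc, 1.0 - acc)


def rank_binary_features(rows, labels):
    """Return feature indices ordered from most to least predictive, plus the scores."""
    n_features = len(rows[0])
    scores = [
        single_feature_accuracy([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy binary data set (hypothetical): 4 samples, 3 features, 2 classes.
rows = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
labels = [1, 1, 0, 0]
order, scores = rank_binary_features(rows, labels)
print(order, scores)
```

A reference ordering of this kind could then be compared with the ordering produced by any FR criterion, for example through a rank correlation.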
- Research Article
1
- 10.1093/jcde/qwae051
- May 1, 2024
- Journal of Computational Design and Engineering
Feature selection (FS) is vital in improving the performance of machine learning (ML) algorithms. Despite its importance, identifying the most important features remains challenging, highlighting the need for advanced optimization techniques. In this study, we propose a novel hybrid feature ranking technique called the Hybrid Feature Ranking Weighted Majority Model (HFRWM2). HFRWM2 combines ML models with the Harris Hawks Optimizer (HHO) metaheuristic. HHO is known for its versatility in addressing various optimization challenges, thanks to its ability to handle continuous, discrete, and combinatorial optimization problems. It achieves a balance between exploration and exploitation by mimicking the cooperative hunting behavior of Harris’s hawks, thus thoroughly exploring the search space and converging toward optimal solutions. Our approach operates in two phases. First, an odd number of ML models, in conjunction with HHO, generate feature encodings along with performance metrics. These encodings are then weighted based on their metrics and vertically aggregated. This process produces feature rankings, facilitating the extraction of the top-K features. The motivation behind our research is 2-fold: to enhance the precision of ML algorithms through optimized FS and to improve the overall efficiency of predictive models. To evaluate the effectiveness of HFRWM2, we conducted rigorous tests on two datasets: “Australian” and “Fertility.” Our findings demonstrate the effectiveness of HFRWM2 in navigating the search space and identifying optimal solutions. We compared HFRWM2 with 12 other feature ranking techniques and found it to outperform them. This superiority was particularly evident in the graphical comparison of the “Australian” dataset, where HFRWM2 showed significant advancements in feature ranking.
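The aggregation step described in this abstract (weighting each model's feature encoding by its performance metric, summing column-wise, and taking the top-K features) might be sketched as follows. This is only an illustration under stated assumptions: encodings are taken to be binary selection masks and the weight is the raw metric value, neither of which the abstract specifies. The Harris Hawks search that produces the encodings is not reproduced, and all names and numbers are hypothetical.

```python
def aggregate_encodings(encodings, metrics, k):
    """Weight each binary mask by its model's metric, sum per feature, return the top-k."""
    n_features = len(encodings[0])
    totals = [0.0] * n_features
    for mask, metric in zip(encodings, metrics):
        for j, bit in enumerate(mask):
            totals[j] += metric * bit  # a selected feature earns the model's score
    ranking = sorted(range(n_features), key=lambda j: totals[j], reverse=True)
    return ranking[:k], totals


# Three hypothetical models (an odd number, as in the abstract), four features.
encodings = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
metrics = [0.9, 0.8, 0.7]  # e.g. validation accuracy of each model (assumed)
top_features, totals = aggregate_encodings(encodings, metrics, k=2)
print(top_features)
```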
- Research Article
13
- 10.1038/s41598-021-97100-1
- Sep 2, 2021
- Scientific Reports
Much research has been done on time series of financial markets in the last two decades using linear and non-linear correlations of stock returns. In this paper, we design a method of network reconstruction for the financial market using insights from machine learning tools. To do so, we analyze the time series of financial indices of the S&P 500 around several financial crises from 1998 to 2012 using a feature ranking approach, where we use the returns of stocks on a certain day to predict the feature ranks of the next day. We use two different feature ranking approaches, Random Forest and Gradient Boosting, to rank the importance of each node for predicting the returns of each other node, which produces the feature ranking matrix. To construct the threshold network, we assign a threshold equal to the mean of the feature ranking matrix. The dynamics of network topology in threshold networks constructed by the new approach can identify the financial crises covered by the monitored time series. We observe that the most influential companies during the global financial crisis were in the energy and financial services sectors, while during the European debt crisis they were in communication services. The Shannon entropy calculated from the feature ranking is seen to increase over time before a market crash. The rise of entropy implies that the influences of stocks on each other are becoming equal, and it can be used as a precursor of a market crash. The technique of feature ranking can be an alternative way to infer a more accurate network structure for the financial market than existing methods, and can be used for the development of the market.
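The thresholding step in this abstract can be sketched in a few lines. The sketch assumes that entry `matrix[i][j]` holds the importance of node `j` for predicting node `i`, that an edge is kept when the entry is strictly greater than the mean of all entries, and that self-loops are dropped. The abstract states only that the threshold equals the mean, so the comparison and the diagonal handling are assumptions, and the matrix values are hypothetical.

```python
def threshold_network(matrix):
    """Build a directed edge list j -> i for entries above the matrix mean."""
    n = len(matrix)
    values = [v for row in matrix for v in row]
    threshold = sum(values) / len(values)
    edges = [
        (j, i)  # node j influences node i
        for i in range(n)
        for j in range(n)
        if i != j and matrix[i][j] > threshold
    ]
    return threshold, edges


# Hypothetical 3-node feature ranking matrix; each row sums to 1.
matrix = [
    [0.0, 0.5, 0.5],
    [0.75, 0.0, 0.25],
    [0.25, 0.75, 0.0],
]
threshold, edges = threshold_network(matrix)
print(threshold, edges)
```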
- Research Article
14
- 10.1007/s12524-023-01707-y
- Jun 6, 2023
- Journal of the Indian Society of Remote Sensing
A landslide susceptibility map (LSM) assists in reducing the danger of landslides by locating the landslide-prone locations within the designated area. One of the locations prone to landslides is India's Western Ghats, of which Goa is a part. This article presents the LSMs prepared for the state of Goa using four standard machine learning algorithms, namely Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbour (KNN), and Random Forest (RF). To create the LSMs, a 78-point landslide inventory and 14 landslide conditioning factors have been used, including slope, elevation, aspect, total curvature, plan curvature, profile curvature, yearly rainfall, Stream Power Index, Topographic Wetness Index, distance to road, depth to bedrock/soil depth, soil type, lithology, and land use land cover. The most pertinent features for the models' construction have been chosen using the Pearson correlation coefficient test and the Random Forest method. The presence of landslides is shown to be strongly influenced by the distance to road, the slope of the terrain, and the annual rainfall. The LSMs generated were classified into five levels ranging from very low to very high susceptibility. The prediction accuracy, precision, recall, F1-score, area under the ROC curve (AUC-ROC), and True Skill Statistics (TSS) have been used to analyse and compare the LSMs created using the various methodologies. All of these algorithms perform well, as evidenced by the overall accuracy scores of 81.90% for LR, 83.33% for SVM, 81.94% for KNN, and 86.11% for RF. SVM and RF are the better approaches for forecasting landslide vulnerability in the research area, according to TSS data. The maximum AUC-ROC of 86% was achieved by the RF algorithm. The results of the performance metrics lead to the conclusion that the tree-based RF approach is most appropriate for producing an LSM for the state of Goa. The results of this study indicate that more landslide-prone areas can be found in the Sattari, Dharbandora, Sanguem, and Canacona regions of Goa.
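Of the metrics listed in this abstract, the True Skill Statistic is the least commonly spelled out. It is conventionally defined as sensitivity plus specificity minus one. A minimal sketch from confusion-matrix counts follows; the counts are hypothetical and are not taken from the study.

```python
def true_skill_statistic(tp, fn, tn, fp):
    """TSS = sensitivity + specificity - 1, ranging from -1 to 1."""
    sensitivity = tp / (tp + fn)  # fraction of landslide points correctly flagged
    specificity = tn / (tn + fp)  # fraction of non-landslide points correctly cleared
    return sensitivity + specificity - 1.0


# Hypothetical confusion-matrix counts for one susceptibility model.
tss = true_skill_statistic(tp=8, fn=2, tn=9, fp=1)
print(round(tss, 3))
```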
- Research Article
27
- 10.1007/s11356-022-25119-6
- Jan 9, 2023
- Environmental science and pollution research international
Poor irrigation water quality can mar agricultural productivity. Traditional assessment of irrigation water quality usually requires the computation of various conventional quality parameters, which is often time-consuming and associated with errors during sub-index computation. To overcome this limitation, it is critical to have a visual assessment of the irrigation water quality and to identify the most influential water quality parameters for accurate prediction, management, and sustainability of irrigation water quality. Therefore, in this study, the overlay weighted sum technique was used to generate the irrigation water quality (IWQ) map of the area. The map revealed that 29.2% of the area is suitable for irrigation (low restriction), 41.7% is moderately suitable (moderate restriction), and 29.1% is unsuitable (high restriction), with the irrigation water quality declining towards the central-southeastern direction. Multilayer perceptron artificial neural networks (MLP-ANNs) and multiple linear regression (MLR) models were integrated and validated to predict the IWQ parameters using Cl⁻, HCO₃⁻, SO₄²⁻, NO₃⁻, Ca²⁺, Mg²⁺, Na⁺, K⁺, pH, EC, TH, and TDS as input variables, and MAR, SAR, PI, KR, SSP, and PS as output variables. The two models showed high-performance accuracy based on the results of the coefficient of determination (R² = 0.513-0.983). Low modeling errors were observed from the results of the sum of square errors (SOSE), relative errors (RE), adjusted R-square (R²adj), and residual plots, further confirming the efficacy of the two models, although the MLP-ANNs showed higher prediction accuracy for R². Based on the sensitivity analysis of the MLP-ANN model, HCO₃, pH, SO₄, EC, and Cl were identified to have the greatest influence on the irrigation water quality of the area. This study has shown that the integration of GIS and machine learning can serve as a rapid decision-making tool for proper planning and enhanced agricultural productivity.
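The overlay weighted sum technique mentioned in this abstract combines co-registered raster layers cell by cell, multiplying each layer by a weight and adding the results. The sketch below uses small nested lists in place of GIS rasters. The layer names, restriction scores, and equal weights are hypothetical, since the abstract does not report the weights used in the study.

```python
def weighted_overlay(layers, weights):
    """Cell-wise weighted sum of equally sized raster grids."""
    n_rows, n_cols = len(layers[0]), len(layers[0][0])
    return [
        [
            sum(w * layer[r][c] for layer, w in zip(layers, weights))
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


# Two hypothetical 2x2 layers scored 1 (low restriction) to 3 (high restriction).
sar_layer = [[1, 2], [3, 3]]
ec_layer = [[1, 2], [1, 3]]
iwq_map = weighted_overlay([sar_layer, ec_layer], weights=[0.5, 0.5])
print(iwq_map)
```

The resulting scores would then be reclassified into suitability classes such as low, moderate, and high restriction.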
- Research Article
- 10.1038/s41598-025-23820-3
- Oct 28, 2025
- Scientific Reports
Accurate automated classification of brain tumors from magnetic resonance imaging (MRI) is essential for early diagnosis and treatment. This study presents a hybrid framework combining Convolutional Neural Network (CNN) deep features, Large Margin Nearest Neighbor (LMNN) metric learning, and swarm-intelligence optimization for robust four-class classification. Five pretrained CNNs (DenseNet201, MobileNetV2, ResNet50, ResNet101, and InceptionV3) were evaluated on a dataset of 7,023 images categorized as glioma, meningioma, pituitary, and healthy. Among these, DenseNet201 provided the highest baseline performance with 92.66% accuracy. LMNN improved feature separability, while Particle Swarm Optimization (PSO) and Grey Wolf Optimizer (GWO) selected compact feature subsets. The selected features were classified using k-Nearest Neighbor (KNN), Support Vector Machine (SVM), Artificial Neural Network (ANN), and Random Forest (RF). The DenseNet201–LMNN–GWO–KNN configuration, termed DenseWolf-K, achieved the best performance with 99.64% accuracy, establishing it as the optimal implementation of the framework. Robustness and generalizability were further confirmed using an independent external dataset. Model explainability was ensured through feature-level ranking of GWO-selected features and occlusion sensitivity maps, an Explainable Artificial Intelligence (XAI) method. Overall, the proposed DenseWolf-K framework delivers high accuracy, low false-negative rates, compact representation, and enhanced interpretability, representing a reliable and efficient solution for MRI-based brain tumor classification.
- Conference Article
2
- 10.1109/uemcon54665.2022.9965650
- Oct 26, 2022
Premature Ventricular Contraction (PVC) episodes are redundant heartbeats that disrupt the normal rhythm of the heart. The use of wearable sensors for remote heart monitoring and the implementation of trusted artificial intelligence (AI) algorithms are improvements in the field of smart health (sHealth) using cyber-physical systems (CPS) for telemedicine systems. We detect PVC beats by analyzing electrocardiogram (ECG/EKG) data and perform automatic classification to achieve high accuracy in real time. In this study, we used a number of PVC heartbeat recordings from the MIT-BIH supraventricular arrhythmia database. We divided the recordings into a training dataset, which contains 39 ECG recordings, and a test dataset, which contains the remaining 39 ECG recordings. Both datasets contain approximately 80,000 samples of normal heartbeats and 7,000 samples of ventricular ectopic beats. We extract a combination of signal-specific features and signal-independent features for feature selection and ranking. We apply four algorithms to select and rank the features: the receiver operating characteristic (ROC) and the area under the ROC curve (AUC) (ROCAUC); constant, quasi-constant and duplicate feature removal (univariate) (CQCDFR); analysis of variance (ANOVA); and root mean square deviation (RMSE). Each algorithm produces its own selection of signal-independent features, which we combine separately with the signal-specific features and test for accuracy. Then, we train on the top 10 ranked combined features of each algorithm separately and check the highest performance. We explored the random forest (RF) classifier and the support vector machine (SVM) classifier. Compared with the other algorithms, the performance of feature selection using the ANOVA algorithm before feature ranking is the lowest. The ANOVA algorithm achieved the highest accuracy after picking out the top 10 features. We further separately evaluate the sensitivity, specificity, accuracy, precision and F1 score of the top-ranked features according to the best accuracy obtained by the different feature selection algorithms. With the RF classifier, the ANOVA algorithm selects the top 7 features with 97% accuracy, 97.5% sensitivity, 98.1% specificity, 98.1% precision, and 95.0% F1 score. This method can accurately monitor cardiac disease in real time and analyze ECG beats so that patients can get accurate feedback.
- Research Article
8
- 10.1371/journal.pone.0269483
- Jun 3, 2022
- PLOS ONE
The feature ranking method of machine learning is applied to investigate the feature ranking and network properties of 21 world stock indices. The feature ranking is the probability of influence of each index on the target. The feature ranking matrix is determined by using the returns of indices on a certain day to predict the price returns of the next day using Random Forest and Gradient Boosting. We find that the North American indices influence others significantly during the global financial crisis, while during the European sovereign debt crisis, the significant indices are American and European. The US stock indices dominate the world stock market in most periods. The indices of two Asian countries (India and China) exert remarkable influence in some periods, owing to unrest in these markets. The networks based on feature ranking are constructed by assigning a threshold at the mean of the feature ranking matrix. The global reaching centrality of the threshold network is found to increase significantly during the global financial crisis. Finally, we determine Shannon entropy from the probabilities of influence of indices on the target. Sharp drops in entropy are observed during major crises, owing to the dominance of a few indices in these periods; entropy can thus be used as a measure of the overall distribution of influences. Through this technique, we identify the indices that are influential in comparison to others, especially during crises, which can be useful for studying contagion in the global stock market.
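The entropy measure described in this abstract can be illustrated with a short sketch. It assumes the importances for one target are normalized to sum to one and that the natural logarithm is used; the abstract does not state the base. The importance vectors are hypothetical.

```python
import math


def shannon_entropy(importances):
    """Shannon entropy (in nats) of importances normalized into probabilities."""
    total = sum(importances)
    probs = [v / total for v in importances if v > 0]  # zero terms contribute nothing
    return -sum(p * math.log(p) for p in probs)


# Hypothetical influence profiles of four indices on one target index.
calm_period = shannon_entropy([1, 1, 1, 1])      # influence spread evenly
crisis_period = shannon_entropy([97, 1, 1, 1])   # one index dominates
print(calm_period, crisis_period)
```

Evenly spread influence gives the maximum entropy for four indices, while dominance by a single index gives a much lower value, in line with the drop during crises that the abstract reports.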