The SAFARI project: An Artificial Intelligence-based strategy for volcano hazard monitoring from space
Identifying the observable signals that warn of volcanic unrest and impending eruptions is one of the greatest challenges in the management of natural disasters. In this regard, satellite data have become a strong focus of global interest, offering abundant datasets from multiple missions and valuable tools to study the Earth and improve physical models. The SAFARI project aims to develop a comprehensive space-based strategy for next-generation quantitative volcano hazard monitoring, integrating the most recent satellite imagery capabilities and related products with the newest technologies, mainly in the fields of Machine Learning (ML) and Soft Computing. The main objectives of SAFARI are (i) to follow the manifestations of unrest and impending eruptions and (ii) to forecast the areas potentially threatened by volcanic products through eruptive scenarios. For this purpose, SAFARI intends to characterize the state of volcanic activity (quiet, unrest and eruptive phases) by taking advantage of a variety of satellite data, including active and passive sensors ranging from optical to microwave frequencies, and to extract quantitative satellite-derived input parameters for physical models, enabling rapid and accurate scenario forecasting during eruptions.
Well-established products from space-based volcano monitoring, such as time series of (i) volcanic radiative power, (ii) surface displacement and (iii) volcanic gas emissions (e.g., SO2, BrO), are processed jointly and supported by less frequently used but still informative time series such as (iv) ground skin temperature of the volcanic edifices, (v) change detection, (vi) time-varying volcanic ash indices, (vii) ash top height, (viii) gravity field variation, (ix) time-varying indices describing the deformation phases of the volcanic edifice (i.e., inflation/deflation), and (x) crucial parameters of the volcanic source (e.g., depth, volume variation) obtained by assimilating data into deformation models. SAFARI merges the latest developments from different INGV teams to analyze Earth observation (EO) data with a retrospective and multi-disciplinary approach, employing traditional statistical or numerical analysis, latest-generation Graphics Processing Unit (GPU) architectures, and newer, more sophisticated ML algorithms to classify time series, detect anomalies, and predict or estimate significant parameter values. The methodologies in SAFARI are developed and verified at four active volcanoes worldwide: Etna and Vulcano (Italy), which are continuously monitored by dense ground-based networks managed by INGV and provide a first controlled experiment, and Nyiragongo (D.R. Congo) and Sangay (Ecuador), which are characterized by high volcanic hazard but modest permanent monitoring networks, making satellite remote sensing a key monitoring tool. The results of the SAFARI project, its underlying data sources and methodologies, and the potential of the whole integrated processing chain aim to become an effective tool for volcanic hazard analysis and impact quantification that has not been used to date in volcanology, improving safety and reducing the risk associated with eruptive events worldwide.
- Research Article
15
- 10.1080/23279095.2024.2382823
- Jul 31, 2024
- Applied Neuropsychology: Adult
Dementia is a cognitive impairment that affects millions of individuals throughout the globe. Machine learning (ML) and deep learning (DL) algorithms have shown great promise as a means of early identification and treatment of dementia. This article discusses dementias such as Alzheimer's dementia, frontotemporal dementia, Lewy body dementia, and vascular dementia, and reviews the literature on using ML algorithms in their diagnosis. Different ML algorithms, such as support vector machines, artificial neural networks, decision trees, and random forests, are compared and contrasted, along with their benefits and drawbacks. Accurate ML models may be achieved by carefully considering feature selection and data preparation. We also discuss how ML algorithms can predict disease progression and patient responses to therapy. However, overreliance on ML and DL technologies should be avoided without further proof: these technologies are meant to assist in diagnosis and should not be used as the sole criterion for a final diagnosis. The research implies that ML algorithms may help increase the precision with which dementia is diagnosed, especially in its early stages. More study is required to verify the efficacy of ML and DL algorithms in clinical contexts and to address ethical issues around the use of personal data.
- Conference Article
9
- 10.56952/arma-2023-0287
- Jun 25, 2023
The traditional Arps decline model fails to predict production from many oil and gas reservoirs because of inherent assumptions such as boundary-dominated flow, whereas many reservoirs exhibit long transient flow. Fundamentally, this is a time series curve fitting and forecasting problem. Advanced machine learning (ML) algorithms can be used to capture the unusual trend in hydrocarbon production decline. The objective of this study is to develop ML algorithms, namely Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), for forecasting future production performance and estimating ultimate recovery (EUR). Decline curve analysis (DCA) is a straightforward and rapid way to estimate future production by suitable curve fitting. However, the traditional Arps method overestimates production from many reservoirs, which has led to new empirical methods such as Power Law Exponential Analysis (PLE by Ilk, 2008), Logistic Growth Analysis (LGA by Clark 2011), and the Duong Method (DM by Duong 2011). The outcomes of these recent models also depend on the quality of the data and the reservoir flow regimes. Machine learning is applied to overcome the drawbacks and limitations of the empirical decline curve models, and the RNN, LSTM, and GRU algorithms are compared. The first 80% of the time-series data is used for training the models and the last 20% is used for testing. The trained models are employed to forecast future rates and to calculate EUR. A normalized Nash-Sutcliffe model efficiency coefficient (NNSE) and the Normalized Root Mean Squared Error (NRMSE) are selected for assessing the efficacy of the different models; a Nash-Sutcliffe efficiency (NSE) value close to unity suggests good model performance. The LSTM models have several unique advantages over typical supervised machine learning algorithms. The models are flexible in handling multiple inputs in time series.
The ML models developed in this work can be coupled with an economic model that considers the future oil price and operational costs. Machine learning is a research area that is growing quickly across several industries and providing valuable insights. Machine learning for time series forecasting in the oil and gas industry has not been comprehensively explored. Results from this work will provide the literature with another application perspective with strong opportunities in production data analysis.
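The evaluation protocol described in this abstract (a chronological 80/20 split scored with NSE, NNSE and NRMSE) can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names are hypothetical, NNSE is taken as the common normalization 1 / (2 - NSE), and NRMSE is assumed to be normalized by the observed range, since the abstract does not state which normalization was used.

```python
import math


def chronological_split(series, train_fraction=0.8):
    """Split a time series in order: first part for training, remainder for testing."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def nnse(observed, simulated):
    """Normalized NSE, rescaled to the interval (0, 1]. Assumed form: 1 / (2 - NSE)."""
    return 1.0 / (2.0 - nse(observed, simulated))


def nrmse(observed, simulated):
    """RMSE divided by the observed range (assumed normalization)."""
    mse = sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed)
    return math.sqrt(mse) / (max(observed) - min(observed))


# Hypothetical monthly production rates, used only to show the split.
rates = [100.0, 90.0, 82.0, 75.0, 69.0, 64.0, 60.0, 56.0, 53.0, 50.0]
train, test = chronological_split(rates)
```

With these definitions a forecast equal to the mean of the test observations scores NSE = 0 and NNSE = 0.5, which is why values near unity indicate a useful model.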
- Research Article
67
- 10.1007/s11269-018-2155-6
- Nov 29, 2018
- Water Resources Management
We provide contingent empirical evidence on the solutions to three problems associated with univariate time series forecasting using machine learning (ML) algorithms by conducting an extensive multiple-case study. These problems are: (a) lagged variable selection, (b) hyperparameter handling, and (c) comparison between ML and classical algorithms. The multiple-case study is composed of 50 single-case studies, which use time series of mean monthly temperature and total monthly precipitation observed in Greece. We focus on two ML algorithms, i.e., neural networks and support vector machines, while we also include four classical algorithms and a naive benchmark in the comparisons. We apply a fixed methodology to each individual case and, subsequently, we perform a cross-case synthesis to facilitate the detection of systematic patterns. We fit the models to the deseasonalized time series. We compare the one- and multi-step ahead forecasting performance of the algorithms. Regarding the one-step ahead forecasting performance, the assessment is based on the absolute error of the forecast of the last monthly observation. To quantify the multi-step ahead forecasting performance, we compute five metrics on the test set (the last year's monthly observations), i.e., the root mean square error, the Nash-Sutcliffe efficiency, the ratio of standard deviations, the coefficient of correlation and the index of agreement. The evidence derived from the experiments can be summarized as follows: (a) the results mostly favour using less recent lagged variables, (b) hyperparameter optimization does not necessarily lead to better forecasts, (c) the ML and classical algorithms seem to be equally competitive.
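To make the notion of lagged variable selection concrete, the sketch below builds the input matrix for a one-step ahead forecast from a chosen set of lags and scores a forecaster by the absolute error on the last observation, as the abstract describes. All names are hypothetical, and the "last value" rule is only an assumed form of naive benchmark, since the abstract does not specify which one was used.

```python
def make_lagged(series, lags):
    """Build (inputs, targets) where each input row holds the values at the given lags."""
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(series)):
        inputs.append([series[t - k] for k in lags])
        targets.append(series[t])
    return inputs, targets


def naive_forecast(history):
    """Assumed naive benchmark: the next value equals the last observed value."""
    return history[-1]


def last_observation_abs_error(series, forecaster):
    """Hold out the final observation and return the absolute forecast error on it."""
    history, actual = series[:-1], series[-1]
    return abs(forecaster(history) - actual)


# Lags 1 and 2 on a toy series: each row is [value at t-1, value at t-2].
inputs, targets = make_lagged([1, 2, 3, 4, 5], lags=[1, 2])
```

Changing the `lags` argument (for example to `[12, 24]` for monthly data) is the experiment knob behind "more recent" versus "less recent" lagged variables.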
- Book Chapter
1
- 10.4018/979-8-3693-4284-8.ch009
- Mar 22, 2024
Natural disasters require quick and precise reactions for preparedness, mitigation, and response activities because they pose serious risks to infrastructure, human lives, and the environment. The incorporation of machine learning (ML) algorithms has become a viable strategy to improve natural disaster management in a number of ways in recent years. Early warning systems and risk assessment frameworks are made possible by predictive models that are able to identify patterns, anomalies, and risk factors from a variety of data sources thanks to techniques like supervised learning, unsupervised learning, and deep learning. The application of machine learning algorithms to natural disaster management poses a number of issues and concerns, notwithstanding its potential advantages. By combining various data sources, sophisticated analytics, and real-time decision support systems, machine learning (ML) algorithms enable stakeholders to more effectively and resiliently prepare for, mitigate, and respond to natural catastrophes.
- Preprint Article
- 10.5194/egusphere-egu24-13400
- Mar 9, 2024
Recently published literature has repeatedly confirmed that machine learning (ML) algorithms (including LSTMs, GRUs, and Transformers) and conceptual lumped hydrological models (such as SAC-SMA and HBV) perform more reliably in hindcast and forecast flood prediction intercomparison experiments than more sophisticated high-resolution hydrological models. These provocative results have challenged decades of development of physics-based hydrological models for streamflow prediction, which seem more sensitive to errors in the forcing precipitation data and in the spatial description of landscape attributes. Thus, the long-standing promise that a better and more detailed understanding and description of hydrological processes would yield better predictions of streamflow fluctuations (including floods, droughts, etc.) is yet to be fulfilled. In a recently published study by our research group, we proposed and tested a methodology to benchmark ML algorithms with artificially generated data produced by physically-based hydrological models under very controlled conditions. Our approach combined the implementation of the hillslope-link distributed hydrological model (HLM) on a 4,500 km² basin with precipitation fields created using the stochastic storm transposition (SST) framework. We demonstrated that ML algorithms could effectively identify the input-output relations between the average rainfall over a basin and streamflows (as time series) at multiple sub-basin outlets under very general conditions of space-time variability of flood-generating storm systems. This result matches the reported performance of ML algorithms under a great variety of conditions. We are extending our work to ask a new question: How reliable are trained ML algorithms and calibrated lumped hydrological models at predicting floods that have never been observed in the "historical" record?
This question goes to the heart of what these black/grey-box and conceptual types of tools represent mathematically: a deterministic estimate of the input-output relationship between rainfall and streamflow. Therefore, when any of these black-box models predicts a flood there are two possible scenarios: 1) interpolation, which means that the hydrograph and peak flow being predicted are within the range of floods observed in the past, and 2) extrapolation, the case when the event being predicted is significantly larger than anything observed in the past. In this study, we will present the results of controlled experiments to investigate this question and show which class of algorithms is less susceptible to over- or under-estimation when extrapolating beyond the range of the "historical record". We will present results for hourly and daily prediction timescales. This investigation is very relevant in the current environment of climate change, where the water-holding capacity of the atmosphere increases with every degree of warming, leading to storms that seem to constantly break every record in terms of intensity, duration, and spatial coverage.
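The two scenarios defined above can be expressed as a simple check on a predicted peak flow. The sketch below is only an illustration of the definition: the function name is hypothetical, and the `margin` parameter is an assumption standing in for "significantly larger", which the abstract does not quantify.

```python
def classify_prediction(predicted_peak, historical_peaks, margin=0.0):
    """Label a predicted peak flow relative to the largest flood in the record.

    margin is an assumed tolerance: with margin=0.25 a prediction must exceed
    the historical maximum by more than 25% to count as extrapolation.
    """
    largest_observed = max(historical_peaks)
    if predicted_peak > largest_observed * (1.0 + margin):
        return "extrapolation"
    return "interpolation"


# Hypothetical peak flows in m^3/s from a training record.
record = [50.0, 80.0, 100.0]
```

Scoring a model separately on the two groups of events is one way to see whether its errors grow once predictions leave the range of the training record.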
- Research Article
- 10.5194/isprs-archives-xlviii-4-w18-2025-343-2026
- Jan 27, 2026
- The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. Explainable artificial intelligence (XAI) enables users to interpret the black box of machine learning (ML) algorithms, and its applicability across various ML algorithms allows for the investigation of feature impacts on the model. Among the ML algorithms, the Long Short-Term Memory (LSTM) deep learning method has become popular in various applications, especially forecasting analysis, due to its ability to effectively capture long-range temporal dependencies in sequential data, as is often required when analyzing time-series deformation patterns derived from multi-temporal interferometric synthetic aperture radar (MT-InSAR). The SHapley Additive exPlanations (SHAP) method, one of the most popular XAI techniques, has been widely used to identify the impacts of features on processes. To this end, forecasting analysis of MT-InSAR-based time series surface movements was performed using the LSTM method in two selected case regions at Istanbul Airport. These case regions exhibit different time series characteristics (subsidence and stable) and belong to different surface types (runway and building). According to the results obtained, the LSTM method performed successfully, with RMSE, MAE, and R values of 1.12 mm, 0.92 mm, and 0.672 for case 1 and 1.37 mm, 1.13 mm, and 0.385 for case 2. The impacts of the exogenous variables, including the trend, seasonal, and residual components of the time series data and meteorological parameters gathered from the ERA5-Land dataset, were investigated using the SHAP method, and the results were evaluated specifically for each case region.
- Book Chapter
2
- 10.4018/979-8-3693-3362-4.ch015
- Jun 14, 2024
Amidst the continually changing climate and the rise in natural disasters, it is crucial to strengthen resilience against these calamities. This chapter explores the dynamic intersection of machine learning and natural disasters, revealing how advanced technologies reshape disaster management. In the face of escalating challenges posed by earthquakes, floods, and wildfires, machine learning emerges as an innovative solution, offering proactive approaches beyond conventional reactive methods. The narrative unfolds by tracing the evolution of disaster management, highlighting the transformative impact of machine learning on early warning systems. It explores predictive analytics and risk assessment, elucidating how machine learning algorithms leverage historical data and real-time information to deepen our understanding of disaster vulnerabilities. Beyond prediction, the discourse extends to the pivotal role of machine learning in optimizing response and recovery efforts—efficiently allocating resources and fostering recovery planning. A critical dimension of this integration emerges in the analysis of remote sensing and satellite imagery, where machine learning algorithms enable more accurate and timely disaster monitoring. The exploration extends further, unraveling the interconnectedness of various hazards and emphasizing how machine learning facilitates a holistic understanding. The synergy between machine learning and traditional knowledge systems comes to the forefront, recognizing the significance of integrating local wisdom into predictive models. The discourse broadens to encompass policy implications, international collaboration, and ethical considerations embedded in machine learning for disaster management. 
The integration of machine learning in humanitarian aid efforts and its contribution to environmental sustainability are scrutinized, offering a comprehensive understanding of the multifaceted relationship between machine learning and natural disasters. In the ever-evolving landscape of natural disaster management, the fusion of machine learning and human expertise opens new avenues for innovation. One emerging trend is the integration of real-time social media data into machine learning algorithms. By analyzing user-generated content, sentiment analysis, and geospatial information from platforms like Twitter and Facebook, these algorithms can provide rapid insights into the unfolding dynamics of a disaster. This not only enhances the timeliness of response efforts but also fosters a more community-centric approach, incorporating the voices and experiences of those directly affected. The potential of generative adversarial networks to simulate and predict complex disaster scenarios offers a proactive paradigm shift in disaster management by enabling stakeholders to refine strategies and adapt to evolving challenges through realistic simulations. As the chapter charts the course forward, it concludes by exploring emerging trends and innovations in the symbiotic relationship between machine learning and natural disaster management.
- Preprint Article
- 10.5194/egusphere-egu22-12091
- Mar 28, 2022
Machine Learning (ML) algorithms are used to learn from data and make data-driven predictions. These algorithms rely on pattern recognition and computational learning from the data. Earth Orientation Parameters (EOP) are the monitoring parameters for the Earth's rotation. UT1-UTC is an EOP that monitors the time required by the Earth to complete a rotation versus atomic time. This parameter is indispensable for many applications such as precise satellite orbit determination and interplanetary space navigation. In this study, we will use novel ML algorithms to predict the UT1-UTC (IERS C04) time series and investigate their performance against each other and with respect to conventional prediction methods such as Least Squares (LS), Auto-Regressive (AR), and Multivariate Autoregressive (MAR) methods. In this work, a variety of advanced ML algorithms will be tested: Random Forest (RF), Generalized Linear Model (GLM), Gradient Boosted Model (GBM), K-means and Prophet algorithms. We would like to optimize the UT1-UTC prediction technique to work well for short-term prediction up to 10 days. Finally, these ML predictions will be compared against those from the last Earth Orientation Parameters Prediction Comparison Campaign (EOP PCC), which ran from October 1, 2005 to February 28, 2008. This detailed study would be useful to understand the performance of ML techniques on the UT1-UTC time series and would lead to further development of better prediction models using ML algorithms.
Key words: Machine Learning, Earth Orientation Parameters (EOP), UT1-UTC, predictions
- Preprint Article
- 10.5194/egusphere-egu24-22262
- Mar 11, 2024
Over the past decades, especially since 2014, large quantities of Earth Observation (EO) data have become available at high spatial and temporal resolution, thanks to ever-developing constellations (e.g., Sentinel, Landsat) and open data policies. However, optical images are affected by cloud coverage and by the spatially changing overlap of relative satellite orbits, so creating temporally generalized and dense time series from measured data alone is challenging, especially when studying larger areas. Several papers investigate the question of spatio-temporal gap filling and present different interpolation methods to calculate missing values corresponding to the measurements. In the past years more products and technologies have been constructed and published in this field, for example the Copernicus HR-VPP Seasonal Trajectories (ST) product. These generalized data structures are essential to the comparative analysis of different time periods or areas and improve the reliability of data analysis methods such as the Fourier transform or correlation. Temporally harmonized input data is also necessary in order to improve the results of Machine Learning classification algorithms such as Random Forest or Convolutional Neural Networks (CNN). These are among the most efficient methods to separate land cover categories like arable lands, forests, grasslands and built-up areas, or crop types within the arable category. This study analyzes the efficiency of different interpolation methods on Sentinel-2 multispectral time series in the context of land cover classification with Machine Learning. We compare several types of interpolation, e.g., linear, cubic and cubic-spline, and also examine and optimize more advanced methods like Inverse Distance Weighted (IDW) and Radial Basis Function (RBF). We quantify the accuracy of each method by calculating the mean square error between measured and interpolated data points.
The role of interpolation of the input dataset in Deep Learning (CNN) is investigated by comparing the Overall, Kappa and categorical accuracies of land cover maps created from only measured and from interpolated time series. First results show that interpolation has a relevant positive effect on accuracy statistics. This method is also essential in taking a step towards constructing robust pretrained Deep Learning models that are transferable between different time intervals and agro-ecological regions. The research has been implemented with the support provided by the Ministry of Culture and Innovation of Hungary from the National Research, Development and Innovation Fund, financed under the KDP-2021 funding scheme.
Keywords: time series analysis, Machine Learning, interpolation, Sentinel
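The accuracy check described in this abstract (withhold a measured point, interpolate it, and compute the mean square error) can be sketched for two of the named methods. This is a simplified illustration under stated assumptions: it interpolates along the time axis only, the function names are hypothetical, and the IDW exponent `power=2.0` is an assumed default rather than a value from the study.

```python
def linear_interpolate(times, values, t):
    """Piecewise-linear interpolation at time t; times must be sorted ascending."""
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t <= t1:
            weight = (t - t0) / (t1 - t0)
            return values[i] + weight * (values[i + 1] - values[i])
    raise ValueError("t lies outside the observed time range")


def idw_interpolate(times, values, t, power=2.0):
    """Inverse Distance Weighted estimate at time t using all observations."""
    weighted_sum, weight_total = 0.0, 0.0
    for ti, vi in zip(times, values):
        distance = abs(t - ti)
        if distance == 0:
            return vi  # exact hit on an observation
        weight = 1.0 / distance ** power
        weighted_sum += weight * vi
        weight_total += weight
    return weighted_sum / weight_total


def mean_square_error(measured, estimated):
    """Mean of squared differences between measured and interpolated values."""
    return sum((m - e) ** 2 for m, e in zip(measured, estimated)) / len(measured)


# Hypothetical acquisition days and index values; day 10 (value 0.4) is withheld.
days, index_values = [0, 20, 30], [0.2, 0.6, 0.8]
withheld_day, withheld_value = 10, 0.4
linear_estimate = linear_interpolate(days, index_values, withheld_day)
idw_estimate = idw_interpolate(days, index_values, withheld_day)
```

Repeating the withhold-and-estimate step over many cloud-free acquisitions and averaging the squared errors gives one comparable score per interpolation method.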
- Research Article
47
- 10.1016/j.artmed.2022.102381
- Aug 27, 2022
- Artificial intelligence in medicine
Machine learning and the electrocardiogram over two decades: Time series and meta-analysis of the algorithms, evaluation metrics and applications
- Preprint Article
- 10.5194/egusphere-egu23-15915
- May 15, 2023
Ground deformation caused by groundwater exploitation leads to significant socio-economic losses worldwide. Driving factors such as population growth and climate change will increase these losses, especially in arid regions where droughts are becoming more intense, longer lasting, and more frequent. Therefore, there is a need to generate models capable of forecasting ground deformation. However, few studies have analyzed deformation time series (DTS) to identify and characterize subsidence phenomena.
Our research aims to predict the ground deformation associated with groundwater abstractions in 18 wells of the Madrid Detrital Aquifer (ATDM) using statistical models and shallow and deep Machine Learning (ML) algorithms. We generated a database with 18 monthly time series (one for each well) between 1992 and 2010, with data for two variables: a binary variable indicating extraction-recovery cycles of the aquifer and a continuous variable representing the average deformation for the area of influence of each well. DTS generated from Persistent Scatterer Interferometry (PSI) of ERS-1/2 and ENVISAT radar images were used to calculate the average deformation. Finally, we applied six different methods for forecasting DTS: two statistical models, Autoregressive Integrated Moving Average (ARIMA) and Prophet (P); one ensemble shallow ML algorithm, Random Forest (RF); one hybrid method, Neural Prophet (NP); and two Deep Learning (DL) techniques, 1D Convolutional Neural Networks (CNN1D) and Long Short-Term Memory (LSTM).
The analysis of DTS allowed us to differentiate two zones with different hydrological behavior: a zone of higher permeability (north zone) and another of lower permeability (south zone). We found that establishing the architectures of ML and DL algorithms based on hydrological zones improves the prediction of ground deformation. ML and DL algorithms provide better forecasts than statistical and hybrid models. Specifically, LSTM and RF offer the best results.
Our results show the potential of LSTM algorithms and the previous grouping of DTS in predicting ground deformation associated with groundwater exploitation.
This work has been developed thanks to the pre-doctoral grant for the Training of Research Personnel (PRE2021-100044) funded by MCIN/AEI/10.13039/501100011033 and by "FSE invests in your future" within the framework of the SARAI project "Towards a smart exploitation of land displacement data for the prevention and mitigation of geological-geotechnical risks" PID2020-116540RB-C22 funded by MCIN/AEI/10.13039/501100011033.
- Research Article
78
- 10.1371/journal.pone.0301541
- Apr 18, 2024
- PLOS ONE
Many individual studies in the literature observed the superiority of tree-based machine learning (ML) algorithms. However, the current body of literature lacks statistical validation of this superiority. This study addresses this gap by employing five ML algorithms on 200 open-access datasets from a wide range of research contexts to statistically confirm the superiority of tree-based ML algorithms over their counterparts. Specifically, it examines two tree-based ML (Decision tree and Random forest) and three non-tree-based ML (Support vector machine, Logistic regression and k-nearest neighbour) algorithms. Results from paired-sample t-tests show that both tree-based ML algorithms reveal better performance than each non-tree-based ML algorithm for the four ML performance measures (accuracy, precision, recall and F1 score) considered in this study, each at p<0.001 significance level. This performance superiority is consistent across both the model development and test phases. This study also used paired-sample t-tests for the subsets of the research datasets from disease prediction (66) and university-ranking (50) research contexts for further validation. The observed superiority of the tree-based ML algorithms remains valid for these subsets. Tree-based ML algorithms significantly outperformed non-tree-based algorithms for these two research contexts for all four performance measures. We discuss the research implications of these findings in detail in this article.
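The paired-sample t-test used in this study compares two algorithms on the same datasets by testing whether the mean of the per-dataset score differences is zero. The sketch below computes the test statistic and degrees of freedom with the standard library; the function name and the accuracy values are hypothetical, and obtaining the p-value would require the t distribution from a statistics library, which is left out here.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired scores on the same datasets."""
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(differences)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    standard_error = stdev(differences) / math.sqrt(n)
    return mean(differences) / standard_error, n - 1


# Hypothetical accuracies of two algorithms on the same four datasets.
tree_based = [0.90, 0.85, 0.88, 0.92]
non_tree_based = [0.80, 0.80, 0.85, 0.86]
t_value, dof = paired_t_statistic(tree_based, non_tree_based)
```

A large positive `t_value` relative to the t distribution with `dof` degrees of freedom indicates that the first algorithm scores consistently higher across datasets, not merely higher on average.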
- Preprint Article
- 10.5194/epsc2020-963
- May 2, 2024
Abstract: As part of a larger study to elucidate the presence of hydrated minerals on asteroid surfaces, we are developing a robust taxonomic classification system using spectroscopic observations in the vicinity of 3 µm. We have constructed a Python algorithm to identify band centers and band depths near 3 µm for a set of normalized, thermally-corrected asteroid spectra, to serve as inputs to Python's Scikit-Learn library of Machine Learning (ML) algorithms. We anticipate a thorough investigation of both Principal Component Analysis and ML (supervised, unsupervised, and Artificial Neural Network) techniques to assess which technique is likely to be better suited for classifying the 3-µm data. At this writing, we have run tests using Python's Agglomerative clustering ML algorithm to examine possible clustering scenarios. These initial steps have given us some familiarity with the mechanics of using ML on the 3-µm dataset and have also served to identify some possible pitfalls or cul-de-sacs. Presented here are the preliminary results we have obtained.
Introduction: Although various techniques have been used, asteroid classification has typically been done via Principal Component Analysis (PCA: [1,2]). PCA is a statistical technique that reduces the dimensionality of a dataset by identifying the most important parameters within a dataset based on their variance. Parameters that exhibit the greatest amount of variance are considered to be of greater importance, while parameters with the least amount of variance are considered to be of lower importance. While the PCA technique produces better visualizations of the data by reducing the dimensionality of a dataset, it comes with some drawbacks. Disadvantages such as its dependence on scale and information loss due to the orthogonal property of PCA can make the interpretation of PCA results a more critical and time-consuming process. Therefore, exploring other means of classification may prove to be worthwhile.
Machine Learning (ML) algorithms have had a significant impact on the way in which data is analyzed and interpreted, and have already proven to be a powerful and reliable resource in the field of planetary science. Accordingly, the application of ML to an asteroid taxonomy has the potential to be more efficient, objective, and easy to implement than PCA. ML algorithms can be supervised, in which the program "learns" from training data and is able to classify new inputs, or unsupervised, in which the program analyzes the dataset to determine patterns such as clusters. [3] used an Artificial Neural Network (ANN, a subset of ML) to classify asteroids, work followed up by [4]. Recent explorations of supervised ML for asteroid taxonomy are promising, and have applied training sets from existing databases to new visible and/or NIR photometric data (e.g. [5,6,7]).
We seek to explore the benefits of ML algorithms, and to compare and contrast them with the PCA technique, in the production of an asteroid taxonomy. Our initial exploration has utilized a set of normalized, thermally-corrected asteroid spectra in the vicinity of 3 µm. We have identified band centers and band depths and used this parameter space as input to Python's Agglomerative clustering ML algorithm.
Methodology: Thermal corrections of the asteroid spectra were performed via a forward model that uses a modified version of the Standard Thermal Model (STM: [8]). The forward model treats the beaming parameter as a free parameter, adjusting its value for each iteration of the STM until it converges onto a value that yields the expected long-wavelength continuum behavior. Spectra were then normalized to unity at a wavelength of 2.3 µm, followed by identification of band centers and band depths near 3 µm using both polynomial and Gaussian fits. In addition, band depths were measured at wavelengths of 2.9 µm and 3.2 µm to gather more information on asteroid band shapes. Lastly, the aforementioned calculated spectral features were input into Python's Agglomerative clustering algorithm to determine which asteroid spectra shared similar features.
Summary: As part of a larger investigation to better understand hydrated mineralogies as they apply to asteroids, we have begun work towards developing a quantitative taxonomic framework derived from asteroid spectra in the wavelength range from 2.0-4.0 µm. Our exploration thus far of Python's Agglomerative clustering algorithm has proven to be fruitful. Minor changes to the parameterization of this algorithm can yield very different results, which naturally can lead to different interpretations. The Agglomerative clustering algorithm is one of the many powerful ML algorithms we will explore against the PCA technique, all of which we will be discussing in our presentation.
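The feature extraction step described in the Methodology (band depths at fixed wavelengths and a band center near 3 µm) can be sketched as follows. This is a simplified stand-in, not the authors' code: the function names are hypothetical, the continuum is assumed flat at 1.0 because the spectra are normalized to unity, and the band center comes from an exact parabola through the three samples around the reflectance minimum, used here in place of the polynomial and Gaussian fits mentioned in the abstract.

```python
def band_depth(wavelengths, reflectance, target, continuum=1.0):
    """Depth at the sample nearest `target`, relative to an assumed flat continuum."""
    nearest = min(range(len(wavelengths)), key=lambda k: abs(wavelengths[k] - target))
    return 1.0 - reflectance[nearest] / continuum


def band_center(wavelengths, reflectance):
    """Wavelength of the band minimum from a parabola through three samples."""
    # Search interior samples only so both neighbours exist.
    i = min(range(1, len(wavelengths) - 1), key=lambda k: reflectance[k])
    x0, x1, x2 = wavelengths[i - 1], wavelengths[i], wavelengths[i + 1]
    y0, y1, y2 = reflectance[i - 1], reflectance[i], reflectance[i + 1]
    denominator = (x0 - x1) * (x0 - x2) * (x1 - x2)
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / denominator
    b = (x2 ** 2 * (y0 - y1) + x1 ** 2 * (y2 - y0) + x0 ** 2 * (y1 - y2)) / denominator
    if a == 0:
        return x1  # flat neighbourhood: fall back to the sampled minimum
    return -b / (2.0 * a)


# Synthetic normalized spectrum with an absorption band centred at 3.05 µm.
wavelengths = [2.8, 2.9, 3.0, 3.1, 3.2]
reflectance = [0.7 + 2.0 * (w - 3.05) ** 2 for w in wavelengths]
center = band_center(wavelengths, reflectance)
depth_29 = band_depth(wavelengths, reflectance, 2.9)
depth_32 = band_depth(wavelengths, reflectance, 3.2)
```

A feature vector such as `[center, depth_29, depth_32]` per asteroid is the kind of input a clustering algorithm would then group.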
- Research Article
7
- 10.3390/su152416593
- Dec 6, 2023
- Sustainability
Land use and land cover (LULC) classification plays a significant role in the analysis of climate change, evidence-based policies, and urban and regional planning. For example, updated and detailed information on land use in urban areas is highly needed to monitor and evaluate urban development plans. Machine learning (ML) algorithms, and particularly ensemble ML models, support transferability and efficiency in mapping land uses. Generalization, model consistency, and efficiency are essential requirements for implementing such algorithms. The transfer-ensemble learning approach is increasingly used due to its efficiency. However, it is rarely investigated for mapping complex urban LULC in cities of the Global South, such as those in India. The main objective of this study is to assess the performance of machine and ensemble-transfer learning algorithms in mapping the LULC of two metropolitan cities of India using Landsat 5 TM (2011) and DMSP-OLS nightlight (2013) data. This study used classical ML algorithms, such as Support Vector Machine-Radial Basis Function (SVM-RBF), SVM-Linear, and Random Forest (RF). A total of 480 samples were collected to classify six LULC types. The samples were split into training and validation sets with a 65:35 ratio for the training, parameter tuning, and validation of the ML algorithms. The results show that RF has the highest accuracy (94.43%) among the individual models, compared to SVM-RBF (85.07%) and SVM-Linear (91.99%). Overall, the ensemble model-4 produces the highest accuracy (94.84%) compared to the other ensemble models for the Kolkata metropolitan area. In transfer learning, the pre-trained ensemble model-4 achieved the highest accuracy (80.75%) compared to the other pre-trained ensemble models for Delhi. This study provides innovative guidelines for selecting a robust ML algorithm to map urban LULC at the metropolitan scale to support urban sustainability.