Identifying Active Travel Behaviors in Challenging Environments Using GPS, Accelerometers, and Machine Learning Algorithms
Background: Active travel is an important area in physical activity research, but objective measurement of active travel is still difficult. Automated methods to measure travel behaviors will improve research in this area. In this paper, we present a supervised machine learning method for transportation mode prediction from global positioning system (GPS) and accelerometer data. Methods: We collected a dataset of about 150 h of GPS and accelerometer data from two research assistants following a protocol of prescribed trips consisting of five activities: bicycling, riding in a vehicle, walking, sitting, and standing. We extracted 49 features from 1-min windows of this data. We compared the performance of several machine learning algorithms and chose a random forest algorithm to classify the transportation mode. We used a moving average output filter to smooth the output predictions over time. Results: The random forest algorithm achieved 89.8% cross-validated accuracy on this dataset. Adding the moving average filter to smooth output predictions increased the cross-validated accuracy to 91.9%. Conclusion: Machine learning methods are a viable approach for automating measurement of active travel, particularly for measuring travel activities that traditional accelerometer data processing methods misclassify, such as bicycling and vehicle travel.
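The output smoothing step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the filter width or say what quantity is averaged, so the sketch assumes a centered window over per-window class probabilities followed by picking the most probable class. The function name `smooth_predictions`, the width of 3, and the probability values are all hypothetical.

```python
from typing import List, Sequence


def smooth_predictions(
    probs: Sequence[Sequence[float]],
    labels: Sequence[str],
    width: int = 3,  # assumed window width; the abstract does not state one
) -> List[str]:
    """Average class probabilities over a centered window, then pick the top class."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    smoothed = []
    for i in range(len(probs)):
        # Clip the window at the start and end of the sequence.
        lo, hi = max(0, i - half), min(len(probs), i + half + 1)
        window = probs[lo:hi]
        means = [sum(row[k] for row in window) / len(window) for k in range(len(labels))]
        smoothed.append(labels[means.index(max(means))])
    return smoothed


# Made-up probabilities for five consecutive 1-min windows (illustration only).
labels = ["walking", "bicycling"]
probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1], [0.8, 0.2]]

raw = [labels[row.index(max(row))] for row in probs]
smoothed = smooth_predictions(probs, labels, width=3)
```

In this toy input the third window is classified as bicycling on its own, but averaging it with its neighbors relabels it as walking. That is the kind of isolated flip a smoothing filter is meant to suppress.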
- Research Article
87
- 10.1016/j.ijmedinf.2021.104679
- Dec 31, 2021
- International Journal of Medical Informatics
Potential applications and performance of machine learning techniques and algorithms in clinical practice: A systematic review
- Dissertation
- 10.33915/etd.7979
- Dec 10, 2020
There is robust evidence that heart failure (HF) is associated with substantial mortality, morbidity, poor health-related quality of life, healthcare utilization, and economic burden. Previous research has revealed that there are sex differences in the epidemiology, etiology, and disease burden of HF. However, research on HF among women, especially postmenopausal women, is limited. To fill the knowledge gap, the three related aims of this dissertation were to: (1) identify knowledge gaps in HF research among women, especially postmenopausal women, using unsupervised machine learning methods and big data (i.e., articles published in PubMed); (2) identify emerging predictors (i.e., polypharmacy and some prescription medications) of incident HF among postmenopausal women using supervised machine learning methods; (3) identify leading predictors of HF-related emergency room use among postmenopausal women using supervised machine learning methods with data from a large commercial insurance claims database in the United States. This study utilized machine learning methods. In the first aim, non-negative matrix factorization algorithms were used to cluster HF articles based on the primary topic. Clusters were independently validated and labeled by three investigators familiar with HF research. The most understudied area among women was atrial fibrillation. Among postmenopausal women, the most understudied topic was stress-induced cardiomyopathy. For the second and third aims, a retrospective cohort design and Optum’s de-identified Clinformatics® Data Mart Database (Optum, Eden Prairie, MN), de-identified health insurance claims data, were used. In the second aim, multivariable logistic regression and three classification machine learning algorithms (cross-validated logistic regression (CVLR), random forest (RF), and eXtreme Gradient Boosting (XGBoost) algorithms) were used to identify predictors of incident HF among postmenopausal women. 
The associations of the leading predictors with incident HF were explored with an interpretable machine learning technique, SHapley Additive exPlanations (SHAP). The eight leading predictors of incident HF consistent across all models were: older age, arrhythmia, polypharmacy, Medicare, chronic obstructive pulmonary disease (COPD), coronary artery disease, hypertension, and chronic kidney disease. Some prescription medications such as sulfonylureas and antibiotics other than fluoroquinolones predicted incident HF in some machine learning algorithms. In the third aim, a random forest algorithm was used to identify predictors of HF-related emergency room use among postmenopausal women. Interpretable machine learning techniques were used to explain the association of leading predictors with HF-related emergency room use. The random forest algorithm had high predictive accuracy in the test dataset (Area Under the Curve: 94%, sensitivity: 93%, specificity: 77%, and accuracy: 0.81). We found that
- Research Article
46
- 10.1002/aps3.11371
- Jun 1, 2020
- Applications in Plant Sciences
Plants meet machines: Prospects in machine learning for plant biology
- Research Article
43
- 10.1155/2019/7816154
- Jan 1, 2019
- Mathematical Problems in Engineering
Investors trade stocks according to forecasts of stock price trends. In recent years, many researchers have focused on adopting machine learning (ML) algorithms to predict stock price trends. However, their studies were carried out on small stock datasets with limited features and short backtesting periods, without considering transaction cost, and their experimental results lack statistical significance tests. In this paper, we comprehensively evaluate various ML algorithms on large‐scale stock datasets and observe the daily trading performance of stocks with and without transaction cost. In particular, we use two large datasets of 424 S&P 500 index component stocks (SPICS) and 185 CSI 300 index component stocks (CSICS) from 2010 to 2017 and compare six traditional ML algorithms and six advanced deep neural network (DNN) models on these two datasets, respectively. The experimental results demonstrate that traditional ML algorithms perform better on most of the directional evaluation indicators. Unexpectedly, the performance of some traditional ML algorithms is not much worse than that of the best DNN models when transaction cost is not considered. Moreover, the trading performance of all ML algorithms is sensitive to changes in transaction cost. Compared with the traditional ML algorithms, DNN models perform better when transaction cost is considered. Meanwhile, transparent and implicit transaction costs affect trading performance differently. Our conclusions are useful for choosing the best algorithm for stock trading in different markets.
- Research Article
2
- 10.2139/ssrn.3705288
- Jan 1, 2020
- SSRN Electronic Journal
The performance of machine learning (ML) algorithms depends on the nature of the problem at hand. ML-based modeling, therefore, should employ suitable algorithms where optimum results are desired. The purpose of the current study was to explore the potential applications of ML algorithms in modeling daylight in indoor spaces and ultimately identify the optimum algorithm. We thus developed and compared the performance of four common ML algorithms: generalized linear models, deep neural networks, random forest, and gradient boosting models in predicting the distribution of indoor daylight illuminances. We found that deep neural networks, which showed a coefficient of determination (R2) of 0.99, outperformed the other algorithms. Additionally, we explored the use of long short-term memory to forecast the distribution of daylight at a particular future time. Our results show that long short-term memory is accurate and reliable (R2 = 0.92). Our findings provide a basis for discussions on ML algorithms’ use in modeling daylight in indoor spaces, which may ultimately result in efficient tools for estimating daylight performance in the primary stages of building design and daylight control schemes for energy efficiency.
- Research Article
24
- 10.1177/09544062221132697
- Nov 2, 2022
- Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science
Over the last decade, aircraft systems such as flight control and landing gear have required increasing power, and consequently the complexity of hydraulic aircraft systems has escalated. Inevitably, this complexity has created a need for troubleshooting hydraulic aircraft systems that are dispersed around an aircraft and supply power to critical flight systems. This study proposes a novel digital twin-based health monitoring system for aircraft hydraulic systems to enable diagnostics of system failures early in the design cycle using machine learning (ML) methods. The scope is limited to hydraulic systems at the aircraft level, using 20 failure scenarios. A support vector machine and several ensemble learning algorithms were used to identify these failures. A comparison of the ML methods revealed that the random forest algorithm outperformed the other ML algorithms. The developed digital twin framework for the hydraulic systems of aerial vehicle platforms can help researchers and engineers evaluate diagnostics systems early in the design phase.
- Research Article
5
- 10.1016/j.ejmp.2021.10.003
- Nov 1, 2021
- Physica Medica
On the use of machine learning methods for mPSD calibration in HDR brachytherapy.
- Research Article
185
- 10.1029/2018jd028447
- Aug 27, 2018
- Journal of Geophysical Research: Atmospheres
Evapotranspiration (ET) is a vital variable for land‐atmosphere interactions that links surface energy balance, water, and carbon cycles. In situ techniques can measure ET accurately, but the observations have limited spatial and temporal coverage. Modeling approaches have been used to estimate ET at broad spatial and temporal scales, while accurately simulating ET at regional scales remains a major challenge. In this study, we upscale ET from eddy covariance flux tower sites to the regional scale with machine learning algorithms. Five machine learning algorithms are employed for ET upscaling: artificial neural network, Cubist, deep belief network, random forest, and support vector machine. The machine learning methods are trained and tested at 36 flux tower sites (65 site years) across the Heihe River Basin and are then applied to estimate ET for each grid cell (1 km × 1 km) within the watershed and for each day over the period 2012–2016. The artificial neural network, Cubist, random forest, and support vector machine algorithms have almost identical performance in estimating ET and have slightly lower root‐mean‐square error than the deep belief network at the site scale. The random forest algorithm has slightly lower relative uncertainty at the regional scale than the other methods, based on the three‐cornered hat method. Additionally, the machine learning methods perform better over densely vegetated conditions than over barren land or sparsely vegetated conditions. The regional ET generated from the machine learning approaches captured the spatial and temporal patterns of ET at the regional scale.
- Research Article
3
- 10.1017/s1049023x24000414
- May 17, 2024
- Prehospital and disaster medicine
The aim of this study was to summarize the literature on the applications of machine learning (ML) and their performance in Emergency Medical Services (EMS). Four relevant electronic databases were searched (from inception through January 2024) for all original studies that employed EMS-guided ML algorithms to enhance the clinical and operational performance of EMS. Two reviewers screened the retrieved studies and extracted relevant data from the included studies. The characteristics of included studies, employed ML algorithms, and their performance were quantitatively described across primary domains and subdomains. This review included a total of 164 studies published from 2005 through 2024. Of those, 125 were clinical domain focused and 39 were operational. The characteristics of ML algorithms such as sample size, number and type of input features, and performance varied between and within domains and subdomains of applications. Clinical applications of ML algorithms involved triage or diagnosis classification (n = 62), treatment prediction (n = 12), or clinical outcome prediction (n = 50), mainly for out-of-hospital cardiac arrest/OHCA (n = 62), cardiovascular diseases/CVDs (n = 19), and trauma (n = 24). The performance of these ML algorithms varied, with a median area under the receiver operating characteristic curve (AUC) of 85.6%, accuracy of 88.1%, sensitivity of 86.05%, and specificity of 86.5%. Within the operational studies, the operational task of most ML algorithms was ambulance allocation (n = 21), followed by ambulance detection (n = 5), ambulance deployment (n = 5), route optimization (n = 5), and quality assurance (n = 3). The performance of all operational ML algorithms varied and had a median AUC of 96.1%, accuracy of 90.0%, sensitivity of 94.4%, and specificity of 87.7%. Generally, neural network and ensemble algorithms, to some degree, outperformed other ML algorithms.
Triaging and managing different prehospital medical conditions and augmenting ambulance performance can be improved by ML algorithms. Future reports should focus on a specific clinical condition or operational task to improve the precision of the performance metrics of ML models.
- Research Article
38
- 10.3390/su12114471
- Jun 1, 2020
- Sustainability
The performance of machine learning (ML) algorithms depends on the nature of the problem at hand. ML-based modeling, therefore, should employ suitable algorithms where optimum results are desired. The purpose of the current study was to explore the potential applications of ML algorithms in modeling daylight in indoor spaces and ultimately identify the optimum algorithm. We thus developed and compared the performance of four common ML algorithms: generalized linear models, deep neural networks, random forest, and gradient boosting models in predicting the distribution of indoor daylight illuminances. We found that deep neural networks, which showed a coefficient of determination (R2) of 0.99, outperformed the other algorithms. Additionally, we explored the use of long short-term memory to forecast the distribution of daylight at a particular future time. Our results show that long short-term memory is accurate and reliable (R2 = 0.92). Our findings provide a basis for discussions on ML algorithms’ use in modeling daylight in indoor spaces, which may ultimately result in efficient tools for estimating daylight performance in the primary stages of building design and daylight control schemes for energy efficiency.
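For readers unfamiliar with the R2 values quoted in this abstract, the following sketch shows the standard definition of the coefficient of determination, 1 minus the ratio of the residual sum of squares to the total sum of squares. The abstract does not say how the authors computed it, so this is only the textbook formula. The function name `r_squared` and the illuminance-like numbers are invented for illustration.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Made-up observed and predicted values (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(observed, predicted)  # roughly 0.98 for these numbers
```

A value near 1 means the predictions explain almost all of the variance in the observations. A value of 0 means they do no better than always predicting the mean.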
- Research Article
8
- 10.7717/peerj.16216
- Oct 9, 2023
- PeerJ
Identifying species, particularly small metazoans, remains a daunting challenge, and the phylum Nematoda is no exception. Typically, nematode species are differentiated based on morphometry and the presence or absence of certain characters. However, recent advances in artificial intelligence, particularly machine learning (ML) algorithms, offer promising solutions for automating species identification, mostly in taxonomically complex groups. By training ML models with extensive datasets of accurately identified specimens, the models can learn to recognize patterns in nematodes' morphological and morphometric features. This enables them to make precise identifications of newly encountered individuals. Implementing ML algorithms can improve the speed and accuracy of species identification and allow researchers to efficiently process vast amounts of data. Furthermore, it empowers non-taxonomists to make reliable identifications. The objective of this study is to evaluate the performance of ML algorithms in identifying species of free-living marine nematodes, focusing on two well-known genera: Acantholaimus Allgén, 1933 and Sabatieria Rouville, 1903. A total of 40 species of Acantholaimus and 60 species of Sabatieria were considered. The measurements and identifications were obtained from the original publications of species for both genera. This compilation included information regarding the presence or absence of specific characters, as well as morphometric data. To assess the performance of the species identification, four ML algorithms were employed: Random Forest (RF), Stochastic Gradient Boosting (SGBoost), Support Vector Machine (SVM) with both linear and radial kernels, and K-nearest neighbor (KNN).
For both genera, the random forest (RF) algorithm demonstrated the highest accuracy in correctly classifying specimens into their respective species, achieving an accuracy rate of 93% for Acantholaimus and 100% for Sabatieria; only a single Acantholaimus individual in the test data was misclassified. These results highlight the overall effectiveness of ML algorithms in species identification. Moreover, they demonstrate that the identification of marine nematodes can be automated, optimizing biodiversity and ecological studies and making species identification more accessible, efficient, and scalable. Ultimately, this will contribute to our understanding and conservation of biodiversity.
- Conference Article
14
- 10.1109/edcc.2019.00035
- Sep 1, 2019
The ability of Machine Learning (ML) algorithms to learn and work with incomplete knowledge has motivated many system manufacturers to include such algorithms in their products. However, some of these systems can be described as Safety-Critical Systems (SCS) since their failure may cause injury or even death to humans. Therefore, the performance of ML algorithms with respect to the safety requirements of such systems must be evaluated before they are used in their operational environment. Although there exist several measures that can be used for evaluating the performance of ML algorithms, most of these measures focus mainly on some properties of interest in the domains where they were developed. For example, Recall, Precision and F-Factor are, usually, used in Information Retrieval (IR) domain, and they mainly focus on correct predictions with less emphasis on incorrect predictions, which are very important in SCS. Accordingly, such measures need to be tuned to fit the needs for evaluating the safe performance of ML algorithms. This position paper presents the authors’ view on the inadequacy of existing measures, and it proposes a new set of measures to be used for the evaluation of the safe performance of ML algorithms.
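The conventional measures named in this abstract can be sketched from confusion-matrix counts, assuming the standard definitions of precision, recall, and the balanced F-measure (F1). The sketch does not reproduce the new measures the authors propose. The function name `precision_recall_f1` and the counts are illustrative.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard precision, recall, and F1 from confusion-matrix counts.

    Note that the true-negative count is not an input: none of the three
    measures depends on it.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up counts: 8 true positives, 2 false positives, 2 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

Because false positives and false negatives enter only as ratios against true positives, these measures do not distinguish which kind of error is more dangerous. That observation is consistent with the abstract's call for measures tailored to safety-critical systems.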
- Research Article
- 10.1016/j.apergo.2024.104427
- May 1, 2025
- Applied ergonomics
Classification algorithms trained on simple (symmetric) lifting data perform poorly in predicting hand loads during complex (free-dynamic) lifting tasks.
- Research Article
1
- 10.34248/bsengineering.1351863
- May 15, 2024
- Black Sea Journal of Engineering and Science
The liver, a life-sustaining organ, plays a substantial role in many body functions. Liver diseases have become an important world health problem in terms of prevalence, incidence, and mortality. Liver fibrosis/cirrhosis is of great importance because, if it is not treated in time, liver cancer can occur and spread to other parts of the body. For this reason, early diagnosis of liver fibrosis/cirrhosis is significant. Accordingly, this study investigated the performance of different machine learning algorithms for the prediction of liver fibrosis/cirrhosis based on demographic and blood values. In this context, random forest, k nearest neighbour, C4.5 decision tree, K-star, random tree, and reduced error pruning tree algorithms were used. Two distinct approaches were employed to evaluate the performance of the machine learning algorithms. In the first approach, all features of the dataset were used, while in the second approach, only the features selected through principal component analysis were used. Each approach was assessed using both 10-fold cross-validation and data splitting (70% train and 30% test). Evaluating each approach separately gave a comprehensive picture of the effectiveness of using all features versus features extracted by principal component analysis, and of the impact of feature dimensionality reduction on model performance. In this study, all analyses were implemented in the WEKA data mining tool. In the first approach, the classification accuracies of the random forest algorithm were 89.72% and 90.75% with data splitting (70%-30%) and cross-validation, respectively. In the second approach, where feature reduction was performed using principal component analysis, the accuracies of the random forest algorithm obtained from data splitting and cross-validation were 88.61% and 88.83%, respectively.
The results revealed that the random forest algorithm outperformed the other algorithms in both approaches. In addition, applying principal component analysis negatively affected the classification performance of the machine learning algorithms used. It is thought that the proposed model will guide specialist physicians in making appropriate treatment decisions for patients with liver fibrosis/cirrhosis, which can lead to death in its advanced stages.
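The two evaluation schemes mentioned in this abstract, a 70%/30% data split and 10-fold cross-validation, can be sketched as index generators. The study itself used WEKA, so this Python version is only an illustration of the general idea, not the authors' procedure. The helper names, the shuffling seed, and the sample count of 100 are assumptions.

```python
import random
from typing import Iterator, List, Tuple


def train_test_split_indices(
    n: int, train_fraction: float = 0.7, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and cut them into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_fraction))
    return idx[:cut], idx[cut:]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each sample is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Hypothetical dataset of 100 samples.
train_idx, test_idx = train_test_split_indices(100)
splits = list(k_fold_indices(100, k=10))
```

With a single split, the accuracy estimate depends on which 30% of samples happen to land in the test set. Cross-validation averages over ten such test sets, which usually gives a more stable estimate at the cost of training ten models.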
- Research Article
1
- 10.17533/udea.redin.17254
- Aug 26, 2014
- Revista Facultad de Ingeniería Universidad de Antioquia
Accurate identification of precipitating clouds is a challenging task. In the present work, Support Vector Machine, Decision Tree, and Random Forest algorithms were applied to discriminate between precipitating and non-precipitating clouds in a GOES-13 satellite weather image covering the Colombian territory. The objective of this study was to evaluate the performance of machine learning (ML) algorithms for digital classification of cloud masses in terms of thematic classification accuracy, using the conventional Mahalanobis algorithm as a benchmark. Results show that ML algorithms provide more accurate classification of cloud masses than conventional algorithms. The best accuracy was obtained using Random Forests (RF), with an overall thematic accuracy of 97%. Furthermore, the classification obtained with the RF algorithm was compared pixel-to-pixel with NASA Tropical Rainfall Measurement Mission (TRMM) rainfall estimates, obtaining an overall accuracy of 94%. ML algorithms can therefore be used to improve current methods for identifying precipitating clouds.