Use of machine learning models to predict older adult ground-level falls: uncovering factors and patterns.
A Level I trauma center used machine learning algorithms to identify risk factors and patterns in falls among older adults, which constitute our greatest burden of traumatic admissions. A retrospective analysis was conducted on 2,391 ground-level fall trauma admissions from 2017-2022, including variables related to demographics and weather conditions at admission. Supervised learning models were developed to predict older adult vs younger counterpart falls. In this machine learning modality, we generated a Decision Tree, a Support Vector Machine Classifier Algorithm, and a Logistic Regression Model. Unsupervised learning methods were used to uncover patterns or groupings in the dataset of older adult ground-level falls, which consists of 1,742 records from 2017-2022 trauma admissions including comorbidity variables. The unsupervised learning methods employed were Principal Components Analysis, Hierarchical Clustering, and Market Basket Analysis. All three supervised models found female sex to be an important variable in predicting older adult falls. Unsupervised learning identified discernible patterns and groupings, revealing that certain weather variables are associated with falls. These machine learning modalities can shed light on what may be important risk factors for older adult falls and can help to target awareness and outreach.
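The abstract above names Market Basket Analysis without giving its items or thresholds. As a minimal sketch of the usual association-rule measures (support, confidence, lift), the following uses hypothetical comorbidity records and a hypothetical `pair_rules` helper; none of these names or values come from the study.

```python
from collections import Counter
from itertools import combinations


def pair_rules(records, min_support=0.2):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs."""
    n = len(records)
    # How many records contain each single item.
    item_counts = Counter(item for r in records for item in set(r))
    # How many records contain each unordered pair of items.
    pair_counts = Counter(
        pair for r in records for pair in combinations(sorted(set(r)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            lift = confidence / (item_counts[y] / n)
            rules.append((x, y, support, confidence, lift))
    return rules


# Hypothetical records: each set lists the comorbidities of one admission.
records = [
    {"hypertension", "diabetes"},
    {"hypertension", "diabetes", "dementia"},
    {"hypertension"},
    {"dementia"},
]
rules = pair_rules(records, min_support=0.5)
for x, y, support, confidence, lift in rules:
    print(f"{x} -> {y}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the two items co-occur more often than independence would predict, which is the kind of grouping such an analysis looks for.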
- Research Article
5
- 10.1190/geo2023-0199.1
- Jan 5, 2024
- GEOPHYSICS
Seismic reservoir characterization is of great interest for sweet spot identification, reservoir quality assessment, and geologic model building. The sparsity of the labeled samples often limits the application of supervised machine learning (ML) for seismic reservoir characterization. Unsupervised learning methods, in contrast, explore the internal structure of data and extract low-dimensional features of geologic interest from seismic data without the need for labels. We compare various unsupervised learning approaches, including the linear method of principal component analysis (PCA), the manifold learning methods of t-distributed stochastic neighbor embedding and uniform manifold approximation and projection (UMAP), and the convolutional autoencoder (CAE), on the 3D synthetic and field seismic data of a deep carbonate reservoir in southwest China. On the synthetic data, the low-dimensional features extracted by UMAP and CAE provide a better indication of porosity and gas saturation than traditional seismic attributes. In particular, UMAP better preserves the global structure of geologic features and indicates the potential of decoupling the gas saturation and porosity effects from seismic responses. We demonstrate that joint use of several types of seismic attributes, instead of a single type of seismic attribute, can better delineate the reservoir structures using unsupervised ML. On the field seismic data, UMAP can effectively characterize the sedimentary facies distribution, which is consistent with the geologic understanding. Nevertheless, the porosity and saturation cannot be reliably identified from field seismic data using unsupervised ML, which is likely caused by the complex pore structures in carbonates complicating the mapping relationship between seismic responses and reservoir parameters.
- Research Article
16
- 10.1007/s11846-019-00349-0
- Aug 23, 2019
- Review of Managerial Science
We compare several unsupervised probabilistic machine learning methods for market basket analysis, namely binary factor analysis, two topic models (latent Dirichlet allocation and the correlated topic model), the restricted Boltzmann machine and the deep belief net. After an overview of previous applications of unsupervised probabilistic machine learning methods to market basket analysis, we briefly present the methods which we investigate and outline their estimation. Performance is measured by tenfold cross-validated log likelihood values. Binary factor analysis vastly outperforms topic models. The restricted Boltzmann machine attains a similar performance advantage over binary factor analysis. Overall, a deep belief net with 45 variables in the first hidden layer and 15 variables in the second turns out to be the best model. We also compare the investigated machine learning methods with respect to ease of interpretation and runtimes. In addition, we show how to interpret the relationships between hidden variables and observed category purchases. To demonstrate managerial implications, we estimate the effect of promoting each category both on purchase probability increases of other product categories and on the relative increase of basket size. Finally, we indicate several possibilities to extend restricted Boltzmann machines and deep belief nets for market basket analysis.
- Research Article
5
- 10.1007/s11707-018-0704-1
- Jun 4, 2018
- Frontiers of Earth Science
Unsupervised learning methods were applied to explore data patterns in multivariate geophysical datasets collected from ocean floor sediment core samples coming from scientific ocean drilling in the South China Sea. Compared to studies on similar datasets, but using supervised learning methods which are designed to make predictions based on sample training data, unsupervised learning methods require no a priori information and focus only on the input data. In this study, popular unsupervised learning methods including K-means, self-organizing maps, hierarchical clustering and random forest were coupled with different distance metrics to form exploratory data clusters. The resulting data clusters were externally validated with lithologic units and geologic time scales assigned to the datasets by conventional methods. Compact and connected data clusters displayed varying degrees of correspondence with existing classification by lithologic units and geologic time scales. K-means and self-organizing maps were observed to perform better with lithologic units while random forest corresponded best with geologic time scales. This study sets a pioneering example of how unsupervised machine learning methods can be used as an automatic processing tool for the increasingly high volume of scientific ocean drilling data.
- Research Article
14
- 10.1186/s13638-017-0931-2
- Sep 6, 2017
- EURASIP Journal on Wireless Communications and Networking
Ultra-wideband (UWB) radar with strong anti-jamming performance and high range resolution can be used to separate multiple human targets in a complex environment. In recent years, through-wall human being detection with UWB radar has become relatively sophisticated. In this paper, the method of kernel principal component analysis (KPCA) feature extraction and the support vector machine (SVM) classification algorithm are applied to identify and classify the multiple statuses of through-wall human being detection. This method makes full use of the powerful nonlinear feature extraction of KPCA and of SVMs, which can solve the problem of multiple-status detection and nonlinear pattern recognition. The experimental data that come from KPCA feature extraction are used as input to the SVM classification algorithm; some of the data are used to train the model and the rest to test it. Experimental results showed that KPCA feature extraction and the SVM classification algorithm effectively distinguished four statuses of through-wall human being detection and achieved the desired results.
- Research Article
16
- 10.1155/2020/8880786
- Dec 22, 2020
- Applied Bionics and Biomechanics
Identifying patients with high risk of hip fracture is a great challenge in osteoporosis clinical assessment. Bone Mineral Density (BMD) measured by Dual-Energy X-Ray Absorptiometry (DXA) is the current gold standard in osteoporosis clinical assessment. However, its classification accuracy is only around 65%. In order to improve this accuracy, this paper proposes the use of Machine Learning (ML) models trained with data from a biomechanical model that simulates a sideways-fall. Machine Learning (ML) models are models able to learn and to make predictions from data. During a training process, ML models learn a function that maps inputs and outputs without previous knowledge of the problem. The main advantage of ML models is that once the mapping function is constructed, they can make predictions for complex biomechanical behaviours in real time. However, despite the increasing popularity of Machine Learning (ML) models and their wide application to many fields of medicine, their use as hip fracture predictors is still limited. This paper proposes the use of ML models to assess and predict hip fracture risk. Clinical, geometric, and biomechanical variables from the finite element simulation of a side fall are used as independent variables to train the models. Among the different tested models, Random Forest stands out, showing its capability to outperform BMD-DXA, achieving an accuracy over 87%, with specificity over 92% and sensitivity over 83%.
- Research Article
11
- 10.1038/s41598-022-08574-6
- Apr 7, 2022
- Scientific Reports
The SARS-CoV-2 pandemic first emerged in late 2019 in China. It has since infected more than 298 million individuals and caused over 5 million deaths globally. The identification of essential proteins in a protein–protein interaction network (PPIN) is not only crucial in understanding the process of cellular life but also useful in drug discovery. There are many centrality measures to detect influential nodes in complex networks. Since the SARS-CoV-2 and (H1N1) influenza PPINs possess 553 common human proteins, analyzing influential proteins and comparing these networks can be an effective step in helping biologists with drug-target prediction. We used 21 centrality measures on SARS-CoV-2 and (H1N1) influenza PPINs to identify essential proteins. We applied principal component analysis and unsupervised machine learning methods to reveal the most informative measures. Notably, some measures had a high level of contribution in comparison to others in both PPINs, namely Decay, Residual closeness, Markov, Degree, closeness (Latora), Barycenter, Closeness (Freeman), and Lin centralities. We also investigated some graph theory-based properties like the power law, exponential distribution, and robustness. Both PPINs tended toward the properties of scale-free networks, which exposes their heterogeneous nature. Dimensionality reduction and unsupervised learning methods were effective in uncovering appropriate centrality measures.
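To make the notion of a centrality measure concrete, here is a minimal sketch of two of the measures listed above, Degree and Closeness (Freeman), on a toy undirected graph. The graph, the node names and the function names are hypothetical and are not taken from the paper's PPINs; the closeness formula assumes a connected graph.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def closeness_centrality(graph):
    """Freeman closeness: (n - 1) divided by the sum of shortest-path distances."""
    n = len(graph)
    result = {}
    for source in graph:
        # Breadth-first search gives shortest-path lengths in an unweighted graph.
        distances = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbor in graph[current]:
                if neighbor not in distances:
                    distances[neighbor] = distances[current] + 1
                    queue.append(neighbor)
        total = sum(distances.values())
        result[source] = (n - 1) / total if total else 0.0
    return result


# Hypothetical toy network: hub "B" interacts with "A", "C" and "D".
graph = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B"},
    "D": {"B"},
}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
print(degree)
print(closeness)
```

In this toy network the hub scores highest on both measures, which is the sense in which centrality flags candidate essential proteins.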
- Research Article
- 10.1007/s00170-025-15327-y
- Mar 1, 2025
- The International Journal of Advanced Manufacturing Technology
In gas metal arc welding (GMAW) processes, including wire arc additive manufacturing (WAAM), machine learning (ML) is emerging as a powerful tool for monitoring both process and product anomalies. However, a significant challenge in real industrial environments is the reliance on large, balanced datasets for training supervised learning models. To address this issue, a shift toward unsupervised learning is gaining attention in this research field, offering the potential to work effectively with small and unbalanced datasets. However, different materials, sensors, and welding technologies have been used in the literature, which makes the results difficult to compare. This work fills that gap by presenting a comprehensive comparison of both supervised and unsupervised learning methods. An experimental campaign was conducted on Invar 36 alloy (a material with limited WAAM research), in which 15 wall structures were deposited with varying process parameters using the natural dip transfer process, aiming to identify the optimal parameters for this alloy. Data on welding current and voltage were captured, and during the qualification procedure, anomalies were detected, some of which led to product defects. Supervised, unsupervised, and semi-supervised ML approaches, along with a detailed frequency domain analysis of the collected signals, were applied to process the obtained unbalanced dataset. The results provide key insights: while supervised learning models can be applied to anomaly detection in small and unbalanced datasets, they are prone to overfitting, which limits their practical use due to the prevalence of normal cases over anomalies in the dataset, resulting in a higher number of missed anomalies. In contrast, unsupervised models, with their lower generalization capability, tend to exhibit higher false alarm rates but perform better at identifying anomalous data. This work not only compares these data analytics methodologies in depth but also offers guidance on selecting the appropriate ML algorithm based on specific industrial objectives and provides insights into the printability of Invar 36 for WAAM applications under the natural dip transfer process.
- Research Article
2
- 10.1016/j.csbj.2023.10.033
- Jan 1, 2023
- Computational and Structural Biotechnology Journal
Survival analysis of patient groups defined by unsupervised machine learning clustering methods based on patient metabolomic data.
- Research Article
3
- 10.1007/s11707-019-0748-x
- Aug 8, 2019
- Frontiers of Earth Science
Unsupervised machine learning methods were applied on multivariate geophysical and geochemical datasets of ocean floor sediment cores collected from the South China Sea. The well-preserved and continuous core samples comprising high resolution Cenozoic sediment records enable scientists to carry out paleoenvironment studies in detail. Bayesian age-depth chronological models constructed from biostratigraphic control points for the drilling sites are applied on cluster boundaries generated from two popular unsupervised learning methods: K-means and random forest. The unsupervised learning methods experimented have produced compact and unambiguous clusters from the datasets, indicating that previously unknown data patterns can be revealed when all variables from the datasets are taken into account simultaneously. A study of synchroneity of past events represented by the cluster boundaries across geographically separated ocean drilling sites is achieved through converting the fixed depths of cluster boundaries into chronological ranges represented by Gaussian density plots which are then compared with known past events in the region. A Gaussian density peak at around 7.2 Ma has been identified from results of all three sites and it is suggested to coincide with the initiation of the East Asian monsoon. Contrary to traditional statistical approach, a priori assumptions are not required for unsupervised learning, and the clustering results serve as a novel data-driven proxy for studying the complex and dynamic processes of the paleoenvironment surrounding the ocean sediment. This work serves as a pioneering approach to extract valuable information of regional events and opens up a systematic and objective way to study the vast global ocean sediment datasets.
- Conference Article
- 10.3384/ecp208021
- Jun 14, 2024
We explore the use of various machine learning (ML) models for classifying lithologies utilizing data from X-ray fluorescence (XRF) and X-ray computed tomography (XCT). Typically, lithologies are identified over several meters, which restricts the use of ML models due to limited training data. To address this issue, we augment the original interval dataset, where lithologies are marked over extensive sections, into finer segments of 10cm, to produce a high resolution dataset with vastly increased sample size. Additionally, we examine the impact of adjacent lithologies on building a more generalized ML model. We also demonstrate that combining XRF and XCT data leads to an improved classification accuracy compared to using only XRF data, which is the common practice in current studies, or solely relying on XCT data.
- Research Article
152
- 10.1145/3392878
- May 28, 2020
- Proceedings of the ACM on Human-Computer Interaction
As the use of machine learning (ML) models in product development and data-driven decision-making processes became pervasive in many domains, people's focus on building a well-performing model has increasingly shifted to understanding how their model works. While scholarly interest in model interpretability has grown rapidly in research communities like HCI, ML, and beyond, little is known about how practitioners perceive and aim to provide interpretability in the context of their existing workflows. This lack of understanding of interpretability as practiced may prevent interpretability research from addressing important needs, or lead to unrealistic solutions. To bridge this gap, we conducted 22 semi-structured interviews with industry practitioners to understand how they conceive of and design for interpretability while they plan, build, and use their models. Based on a qualitative analysis of our results, we differentiate interpretability roles, processes, goals and strategies as they exist within organizations making heavy use of ML models. The characterization of interpretability work that emerges from our analysis suggests that model interpretability frequently involves cooperation and mental model comparison between people in different roles, often aimed at building trust not only between people and models but also between people within the organization. We present implications for design that discuss gaps between the interpretability challenges that practitioners face in their practice and approaches proposed in the literature, highlighting possible research directions that can better address real-world needs.
- Conference Article
- 10.36880/c16.02943
- Jun 1, 2024
In recent years, the application of machine learning models in finance has attracted great attention due to its potential to improve decision-making processes and risk management strategies. The aim of this article is to present a comprehensive review of academic research conducted in Turkey on the use of machine learning models in finance. It aims to identify machine learning techniques commonly used in Turkish finance studies, evaluate their effectiveness, and provide insights into successful applications. The findings reveal that regression analysis is widely used in predicting financial variables such as stock prices and exchange rates. Clustering techniques have been effective in customer segmentation and market basket analysis. Decision trees are frequently used in credit scoring and fraud detection tasks due to their interpretability and ease of implementation. Moreover, artificial neural networks, especially deep learning algorithms, have shown promising results in complex financial tasks such as sentiment analysis, anomaly detection, and algorithmic trading. In conclusion, this review underlines the significant potential of machine learning models in finance in Turkey. A few suggestions can be made regarding machine learning in finance in Turkey to identify future research areas. These may include developing customized machine learning models for specific financial applications that require more in-depth analysis, improving the quality and size of datasets, and investigating new techniques outside of existing models. There is also a need for more studies to provide practical guidance on how machine learning techniques are applied by financial institutions and how these applications can be improved.
- Research Article
2
- 10.38088/jise.1134816
- Oct 17, 2022
- Journal of Innovative Science and Engineering (JISE)
Today, with the development of technology, the decision-making capabilities of machines have also increased. With their high analytical skills, computers can easily catch points and relationships that may escape the human eye. Thanks to these capabilities, machines are also widely used in the field of health. For example, many machine learning techniques developed for cancer prediction have been successfully applied. Early detection of cancer is crucial to survival. With early diagnosis, the amount of drug treatment, chemotherapy or radiotherapy that the person is exposed to is significantly reduced and the patient gets through this process with the least amount of wear and tear. The Gene Expression Cancer RNA-Seq Dataset was used in this study. This data set includes gene expression values of 5 cancer types (BRCA, KIRC, LUAD, LUSC, UCEC). DNA sequences in the dataset were analyzed using k-means and hierarchical clustering algorithms, which are unsupervised machine learning methods. The aim of the study is to develop a usable machine learning model for early detection of cancer at the gene level. Adjusted Rand Index (ARI), Silhouette Score, and Accuracy metrics were used to evaluate the analysis results. The Rand Index measures the similarity between two clusterings by counting the pairs of data points on which they agree. The Adjusted Rand Index is a chance-corrected version of the Rand Index. The silhouette score indicates how well a data point fits within its own cluster relative to the other clusters. The accuracy metric is obtained as the number of correctly clustered data points divided by all predictions. Different linkage methods are used in the hierarchical clustering algorithm. These are 'complete', 'ward', 'average' and 'single'. As a result of the study, the accuracy of the k-means algorithm was 0.990, the Adjusted Rand Index was 0.79, and the Silhouette Score was 0.14. Looking at the hierarchical clustering, ward performed the best of the four linkage methods, with an ARI score of 0.76 and a silhouette score of 0.13. As a result of the study, the accuracy of the hierarchical clustering algorithm was 0.999.
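The Rand Index and its adjusted form, described in the abstract above, can be sketched in a few lines. This is an illustrative stdlib-only version using the standard pair-counting definitions; the label lists are hypothetical and the code is not the study's implementation.

```python
from collections import Counter
from math import comb


def rand_index(labels_true, labels_pred):
    """Share of point pairs on which the two labelings agree (same/same or different/different)."""
    n = len(labels_true)
    agreements = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_true = labels_true[i] == labels_true[j]
            same_pred = labels_pred[i] == labels_pred[j]
            agreements += same_true == same_pred
    return agreements / comb(n, 2)


def adjusted_rand_index(labels_true, labels_pred):
    """Rand Index corrected for the agreement expected by chance."""
    n = len(labels_true)
    # Pairs placed together in both labelings, from the contingency table cells.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(count, 2) for count in cells.values())
    sum_true = sum(comb(count, 2) for count in Counter(labels_true).values())
    sum_pred = sum(comb(count, 2) for count in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings put all points in one cluster
    return (sum_cells - expected) / (maximum - expected)


# Hypothetical labels: two true classes versus an imperfect two-cluster result.
truth = [0, 0, 0, 1, 1, 1]
clusters = [1, 1, 0, 0, 0, 0]
ri = rand_index(truth, clusters)
ari = adjusted_rand_index(truth, clusters)
print(f"RI={ri:.3f} ARI={ari:.3f}")
```

Both scores ignore the cluster names themselves, so a clustering that matches the true classes up to relabeling still scores 1.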
- Research Article
23
- 10.1109/mci.2018.2807039
- May 1, 2018
- IEEE Computational Intelligence Magazine
One of the fundamental challenges in brain-computer interfaces (BCIs) is to tune a brain signal decoder to reliably detect a user's intention. While information about the decoder can partially be transferred between subjects or sessions, optimal decoding performance can only be reached with novel data from the current session. Thus, it is preferable to learn from unlabeled data gained from the actual usage of the BCI application instead of conducting a calibration recording prior to BCI usage. We review such unsupervised machine learning methods for BCIs based on event-related potentials of the electroencephalogram. We present results of an online study with twelve healthy participants controlling a visual speller. Online performance is reported for three completely unsupervised learning methods: (1) learning from label proportions, (2) an expectation-maximization approach and (3) MIX, which combines the strengths of the two other methods. After a short ramp-up, we observed that the MIX method not only defeats its two unsupervised competitors but even performs on par with a state-of-the-art regularized linear discriminant analysis trained on the same number of data points and with full label access. With this online study, we deliver the best possible proof in BCI that an unsupervised decoding method can in practice render a supervised method unnecessary. This is possible despite skipping the calibration, without losing much performance and with the prospect of continuous improvement over a session. Thus, our findings pave the way for a transition from supervised to unsupervised learning methods in BCIs based on event-related potentials.
- Conference Article
11
- 10.2118/188228-ms
- Nov 13, 2017
The formation mechanism and utilization conditions of the remaining oil in the high water cut period play significant roles in improved tapping potential and enhanced oil recovery. The classification of the remaining oil is both difficult and pressing. However, current classification methods mainly rely on manually determined class boundaries, which is time-consuming and highly subjective. Machine learning and data mining methods have been widely used in the field of petroleum engineering in recent years, for example in prediction of the recovery factor, especially the well-known k-means classification algorithm. The first objective of this paper is to use the semi-supervised learning (SSL) method to classify the remaining oil in the high water cut period, based on the database obtained from experiments on a 2D etched glass micro-model, with the help of quantitative characterization of pore structure and micro-residual oil. The method of principal component analysis (PCA) is used to reduce the dimension of the data. According to the formation causes, remaining oil can be divided into four types: oil film, throat retained oil, heterogeneous multi-pores oil and clustered oil. Two typical blocks are identified manually for each class and given increased weight coefficients; the other oil blocks, with smaller weights, are then clustered into their types by the seeded k-means algorithm. The result shows that the semi-supervised method is more effective than both supervised learning (with manual boundaries) and unsupervised learning methods. Based on the classification, the effects on the formation of heterogeneous multi-pores oil and throat retained oil are analyzed by statistical methods. All of these quantitative studies can provide theoretical guidance for the use of residual oil in high water cut periods and for increased oil recovery.
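The seeded k-means step described above might look roughly like the following sketch. It assumes that manually labelled seed blocks keep their class, initialise the centroids, and count with a larger weight in each centroid update; the abstract does not give the actual weighting scheme, so `seed_weight`, the function name and the 2D feature points are all hypothetical.

```python
def seeded_kmeans(points, seeds, seed_weight=3.0, n_iter=20):
    """Cluster `points`; `seeds` maps a point index to its manually assigned class."""
    classes = sorted(set(seeds.values()))
    dim = len(points[0])

    def weighted_mean(members):
        # members: list of (point index, weight) pairs
        total = sum(weight for _, weight in members)
        return tuple(
            sum(points[i][d] * weight for i, weight in members) / total
            for d in range(dim)
        )

    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Initialise each centroid from the seeds of its class.
    centroids = {
        c: weighted_mean([(i, 1.0) for i, label in seeds.items() if label == c])
        for c in classes
    }
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Seeds keep their manual label; other points go to the nearest centroid.
        new_labels = [
            seeds[i] if i in seeds
            else min(classes, key=lambda c: squared_distance(p, centroids[c]))
            for i, p in enumerate(points)
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update centroids, giving seed points the larger (assumed) weight.
        for c in classes:
            members = [
                (i, seed_weight if i in seeds else 1.0)
                for i, label in enumerate(labels) if label == c
            ]
            centroids[c] = weighted_mean(members)
    return labels, centroids


# Hypothetical 2D features (e.g. after PCA) with one seed per class for brevity.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
seeds = {0: "oil film", 3: "clustered oil"}
labels, centroids = seeded_kmeans(points, seeds)
print(labels)
```

Because every class always retains its seeds, no cluster can become empty, which is one practical advantage of seeding over purely unsupervised k-means.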