Uniform Manifold Approximation Research Articles

Data from the social media platform X (formerly Twitter) can provide insights into the types of language that are used when discussing drug use. In past research using latent Dirichlet allocation (LDA), we found that tweets containing "street names" of prescription drugs were difficult to classify because the terms resemble other colloquialisms and it was unclear how they were being used. Conversely, "brand name" references were more amenable to machine-driven categorization.

This study sought to use next-generation techniques (beyond LDA) from natural language processing to reprocess X data and automatically cluster groups of tweets into topics to differentiate between street- and brand-name data sets. We also aimed to analyze the differences in emotional valence between the 2 data sets to study the relationship between engagement on social media and sentiment.

We used the Twitter application programming interface to collect tweets that contained the street and brand name of a prescription drug. Using BERTopic in combination with Uniform Manifold Approximation and Projection and k-means, we generated topics for the street-name corpus (n=170,618) and brand-name corpus (n=245,145). Valence Aware Dictionary and Sentiment Reasoner (VADER) scores were used to classify whether tweets within the topics had positive, negative, or neutral sentiments. Two different logistic regression classifiers were used to predict the sentiment label within each corpus. The first model used a tweet's engagement metrics and topic ID to predict the label, while the second model used those features in addition to the top 5000 tweets with the largest term-frequency-inverse document frequency score.

Using BERTopic, we identified 40 topics for the street-name data set and 5 topics for the brand-name data set, which we generalized into 8 and 5 topics of discussion, respectively. Four of the general themes of discussion in the brand-name corpus referenced drug use, while 2 themes of discussion in the street-name corpus referenced drug use. From the VADER scores, we found that both corpora were inclined toward positive sentiment. In both corpora, adding the vectorized tweet text increased the accuracy of our models by around 40% compared with the models that did not incorporate the tweet text.

BERTopic was able to classify tweets well. As with LDA, the discussion using brand names was more similar between tweets than the discussion using street names. VADER scores could only be logically applied to the brand-name corpus because of the high prevalence of non-drug-related topics in the street-name data. Brand-name tweets discussed drugs either positively or negatively, with few posts having neutral emotionality. In our machine learning models, engagement alone was not enough to predict the sentiment label; the added context from the tweets was needed to understand the emotionality of a tweet.
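The sentiment labelling step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes each tweet already has a VADER compound score in the range -1 to 1, and it uses the cutoffs of 0.05 and -0.05 that are commonly suggested for VADER; the abstract does not state which cutoffs were used. The names `label_sentiment` and `sentiment_share` are hypothetical.

```python
from collections import Counter


def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score to a sentiment label.

    The 0.05 threshold is an assumption (a common convention),
    not a value taken from the abstract.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def sentiment_share(compound_scores, threshold=0.05):
    """Return the fraction of tweets that fall under each label."""
    labels = [label_sentiment(score, threshold) for score in compound_scores]
    counts = Counter(labels)
    total = len(labels)
    return {
        label: counts[label] / total
        for label in ("positive", "negative", "neutral")
    }


if __name__ == "__main__":
    # Made-up compound scores, purely for illustration.
    print(sentiment_share([0.6, -0.4, 0.0, 0.2]))
```

Labels produced this way could serve as the target for a classifier such as the logistic regression models mentioned above, although the feature construction used in the study is not reproduced here.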

Aims/hypothesis: Clustering-based subclassification of type 2 diabetes, which reflects pathophysiology and genetic predisposition, is a promising approach for providing personalised and effective therapeutic strategies. Ahlqvist’s classification is currently the most vigorously validated method because of its superior ability to predict diabetes complications but it does not have strong consistency over time and requires HOMA2 indices, which are not routinely available in clinical practice and standard cohort studies. We developed a machine learning (ML) model to classify individuals with type 2 diabetes into Ahlqvist’s subtypes consistently over time.

Methods: Cohort 1 dataset comprised 619 Japanese individuals with type 2 diabetes who were divided into training and test sets for ML models in a 7:3 ratio. Cohort 2 dataset, comprising 597 individuals with type 2 diabetes, was used for external validation. Participants were pre-labelled (T2Dkmeans) by unsupervised k-means clustering based on Ahlqvist’s variables (age at diagnosis, BMI, HbA1c, HOMA2-B and HOMA2-IR) to four subtypes: severe insulin-deficient diabetes (SIDD), severe insulin-resistant diabetes (SIRD), mild obesity-related diabetes (MOD) and mild age-related diabetes (MARD). We adopted 15 variables for a multiclass classification random forest (RF) algorithm to predict type 2 diabetes subtypes (T2DRF15). The proximity matrix computed by RF was visualised using a uniform manifold approximation and projection. Finally, we used a putative subset with missing insulin-related variables to test the predictive performance of the validation cohort, consistency of subtypes over time and prediction ability of diabetes complications.

Results: T2DRF15 demonstrated a 94% accuracy for predicting T2Dkmeans type 2 diabetes subtypes (AUCs ≥0.99 and F1 score [an indicator calculated by harmonic mean from precision and recall] ≥0.9) and retained the predictive performance in the external validation cohort (86.3%). T2DRF15 showed an accuracy of 82.9% for detecting T2Dkmeans, also in a putative subset with missing insulin-related variables, when used with an imputation algorithm. In Kaplan–Meier analysis, the diabetes clusters of T2DRF15 demonstrated distinct accumulation risks of diabetic retinopathy in SIDD and that of chronic kidney disease in SIRD during a median observation period of 11.6 (4.5–18.3) years, similarly to the subtypes using T2Dkmeans. The predictive accuracy was improved after excluding individuals with low predictive probability, who were categorised as an ‘undecidable’ cluster. T2DRF15, after excluding undecidable individuals, showed higher consistency (100% for SIDD, 68.6% for SIRD, 94.4% for MOD and 97.9% for MARD) than T2Dkmeans.

Conclusions/interpretation: The new ML model for predicting Ahlqvist’s subtypes of type 2 diabetes has great potential for application in clinical practice and cohort studies because it can classify individuals with missing HOMA2 indices and predict glycaemic control, diabetic complications and treatment outcomes with long-term consistency by using readily available variables. Future studies are needed to assess whether our approach is applicable to research and/or clinical practice in multiethnic populations.
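The idea of setting aside individuals with low predictive probability as an 'undecidable' cluster can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the classifier yields one probability per subtype for each individual, and the cutoff of 0.5 is a hypothetical value, since the abstract does not report the threshold. The function name `assign_subtype` is also hypothetical.

```python
SUBTYPES = ("SIDD", "SIRD", "MOD", "MARD")


def assign_subtype(probabilities, cutoff=0.5):
    """Pick the most probable subtype, or 'undecidable' if it is too uncertain.

    probabilities: dict mapping each subtype name to its predicted probability.
    cutoff: hypothetical minimum probability required to accept a prediction.
    """
    best = max(SUBTYPES, key=lambda subtype: probabilities[subtype])
    if probabilities[best] >= cutoff:
        return best
    return "undecidable"


if __name__ == "__main__":
    # Made-up probability vectors, purely for illustration.
    confident = {"SIDD": 0.7, "SIRD": 0.1, "MOD": 0.1, "MARD": 0.1}
    uncertain = {"SIDD": 0.4, "SIRD": 0.3, "MOD": 0.2, "MARD": 0.1}
    print(assign_subtype(confident))
    print(assign_subtype(uncertain))
```

Raising the cutoff would move more individuals into the 'undecidable' group, trading coverage for accuracy among those who remain classified.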

Articles published on Uniform Manifold Approximation

Generation of chemical library of near-IR dyes for photovoltaics applications

Enhanced synthetic generation of channel state information for millimeter‐wave networks in 5G communication systems

Identification of Beef Odors under Different Storage Day and Processing Temperature Conditions Using an Odor Sensing System.

Dimensionality reduction distills complex evolutionary relationships in seasonal influenza and SARS-CoV-2.

Non-Intrusive Load Monitoring Based on Dimensionality Reduction and Adapted Spatial Clustering

Digital Epidemiology of Prescription Drug References on X (Formerly Twitter): Neural Network Topic Modeling and Sentiment Analysis.

Assessing spirlin Alburnoides bipunctatus (Bloch, 1782) as an early indicator of climate change and anthropogenic stressors using ecological modeling and machine learning

Machine learning-based reproducible prediction of type 2 diabetes subtypes

Using machine learning techniques for exploration and classification of laboratory data

Single-cell transcriptomic analysis reveals a decrease in the frequency of macrophage-RGS1high subsets in patients with osteoarticular tuberculosis

Benchmarking clustering, alignment, and integration methods for spatial transcriptomics

Crop Water Status Analysis from Complex Agricultural Data Using UMAP-Based Local Biplot

Detection of defects in composite insulators based on laser‐induced plasma combined with machine learning

Comparative Analysis of Manifold Learning-Based Dimension Reduction Methods: A Mathematical Perspective

Hunting for Polluted White Dwarfs and Other Treasures with Gaia XP Spectra and Unsupervised Machine Learning

Spectroscopic Phenological Characterization of Mangrove Communities

Enhanced classification of pyrite generations based on mineral chemistry using uniform manifold approximation and projection (UMAP)

Deep learning-based electricity theft prediction in non-smart grid environments

Enhancing the rationale of convolutional neural networks for glitch classification in gravitational wave detectors: a visual explanation

Compressed representation of brain genetic transcription.

