Federated Learning (FL) enables collaborative model training without sharing data, but traditional static averaging of local updates leads to poor performance on heterogeneous data. Existing remedies, which either schedule the data distribution or mitigate local discrepancies, predominantly fail to handle fine-grained heterogeneity (e.g., locally imbalanced labels). We first reveal that static averaging causes the global model to suffer from the mean fallacy: the averaging process favors local models with numerically large parameters rather than those carrying more knowledge. To tackle this, we introduce FedVSA, a simple-yet-effective model aggregation framework that is sensitive to the merits of heterogeneous local data. Specifically, we design a new global loss function for FL that prioritizes valuable local updates, facilitating efficient convergence. We derive a softmax-based aggregation rule and prove its convergence via rigorous theoretical analysis. Additionally, we expose a model-replacement poisoning threat that exploits the mean fallacy. To mitigate this threat, we propose a two-step mechanism that audits historical local training statistics and analyzes Shapley values. Through extensive experiments, we show that FedVSA achieves faster convergence (~1.52×) and higher accuracy (~1.6%) than the baselines. It also effectively mitigates poisoning attacks by swiftly recovering and returning to normal aggregation.
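
To make the softmax-based aggregation idea concrete, the following is a minimal sketch, not FedVSA's exact rule: it assumes each client's "value" score is its negative local loss (a placeholder; the paper derives its weighting from the proposed global loss function), and it aggregates client parameters with softmax weights instead of static averaging.

```python
import numpy as np

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over per-client value scores."""
    z = np.asarray(scores, dtype=float) / temperature
    z -= z.max()
    w = np.exp(z)
    return w / w.sum()

def softmax_aggregate(client_params, client_scores, temperature=1.0):
    """Weighted average of client parameter vectors using softmax weights
    (illustrative substitute for static/uniform averaging)."""
    weights = softmax(client_scores, temperature)
    stacked = np.stack(client_params)              # shape: (num_clients, num_params)
    return np.tensordot(weights, stacked, axes=1)  # shape: (num_params,)

# Example: the client with the lowest local loss receives the largest weight,
# regardless of how numerically large another client's parameters are.
params = [np.array([0.9, 1.1]), np.array([5.0, 5.2]), np.array([1.0, 0.8])]
local_losses = [0.32, 1.70, 0.41]                  # hypothetical scores, not from the paper
global_params = softmax_aggregate(params, [-l for l in local_losses])
print(global_params)
```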