Task Dataset Research Articles

In practical classification tasks, the sample distribution of a dataset is often unbalanced; for example, a dataset may contain a massive quantity of samples with weak labels for which no concrete identification is available. Even among samples with exact labels, many labels have only a few samples, which makes it difficult to learn the corresponding concepts. In addition, there is often a small interclass variance and a large intraclass variance among categories. Weak labels, few-shot problems, and fine-grained analysis are the key challenges affecting the performance of the classification model. In this paper, we develop a progressive training technique to address the few-shot challenge, along with a weak-label boosting method that treats all of the weak IDs as negative samples of every predefined ID in order to take full advantage of the more numerous weak-label data. We introduce an instance-aware hard ID mining strategy in the classification loss and then further develop the global and local feature-mapping loss to expand the decision margin. We entered the proposed method into a Kaggle competition whose goal is to build an algorithm to identify individual humpback whales in images. With a few other common training tricks, the proposed approach won first place in the competition. All three problems (weak labels, few-shot problems, and fine-grained analysis) exist in the dataset used in the competition. Additionally, we applied our method to CUB-2011 and Cars-196, which are the most widely used datasets for fine-grained visual categorization tasks, and achieved accuracies of 90.1% and 94.9%, respectively. This experiment shows that the proposed method outperforms other common baselines, verifying its effectiveness. Our solution has been made available as an open source project.
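The idea of treating weak-label samples as negatives of every predefined ID can be illustrated with a minimal sketch. The abstract does not specify the form of the loss, so the per-class sigmoid (binary cross-entropy) formulation below is an assumption, and the function names `softplus` and `weak_label_bce` are hypothetical rather than taken from the authors' open source project. It does not cover the hard ID mining or the feature-mapping losses.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def weak_label_bce(logits, label):
    """Mean per-class binary cross-entropy for a single sample.

    logits: one raw score per predefined ID.
    label:  index of the true ID, or None for a weak-label sample.

    Assumption (not stated in the abstract): each predefined ID gets its
    own sigmoid output, so a weak-label sample can be given an all-zero
    target, i.e. it is a negative example for every predefined ID.
    """
    total = 0.0
    for k, z in enumerate(logits):
        target = 1.0 if (label is not None and k == label) else 0.0
        # Binary cross-entropy with logits: softplus(z) - target * z
        total += softplus(z) - target * z
    return total / len(logits)


# A labeled sample is rewarded for a high score on its own ID only.
labeled_loss = weak_label_bce([4.0, -3.0, -2.0], label=0)

# A weak-label sample is rewarded for low scores on every predefined ID.
weak_loss = weak_label_bce([-4.0, -3.0, -2.0], label=None)
```

With this formulation the weak-label data contribute a gradient to every class output, which is one way the larger pool of weakly labeled images could be used during training.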


Smart crowd management (SCM) solutions can mitigate overcrowding disasters by implementing efficient crowd learning models that can anticipate critical crowd conditions and potential catastrophes. Developing an SCM solution involves monitoring crowds and modelling their dynamics. Crowd monitoring produces vast amounts of data, with features such as densities and speeds, which are essential for training and evaluating crowd learning models. By and large, crowd datasets can be classified as real (e.g., real monitoring of crowds) or synthetic (e.g., simulation of crowds). Using real crowd datasets can produce effective and reliable crowd learning models. However, acquiring real crowd data faces several challenges, including the expensive installation of a sensory infrastructure, the data pre-processing costs and the lack of real datasets that cover particular crowd scenarios. Consequently, the crowd management literature has adopted simulation tools for generating synthetic datasets to overcome the challenges associated with their real counterparts. The majority of existing datasets, whether real or synthetic, can be used for crowd counting applications or analysing the activities of individuals rather than collective crowd behaviour. Accordingly, this paper demonstrates the process of generating bespoke synthetic crowd datasets that can be used for crowd anomaly detection and prediction, using the MassMotion crowd simulator. The developed datasets present two types of crowd anomalies, namely high densities and contra-flow walking direction. These datasets are SIMulated Crowd Data (SIMCD)-Single Anomaly and SIMCD-Multiple Anomalies for anomaly detection tasks, as well as two SIMCD-Prediction datasets for crowd prediction tasks. Furthermore, the paper demonstrates the data preparation (pre-processing) process by aggregating the data and proposing new essential features, such as the level of crowdedness and the crowd severity level, that are useful for developing crowd prediction and anomaly detection models.


Articles published on Task Dataset

Selective Layer Tuning and Performance Study of Pre-Trained Models Using Genetic Algorithm

Multi-way relation-enhanced hypergraph representation learning for anti-cancer drug synergy prediction.

Robust Feature Selection-Based Speech Emotion Classification Using Deep Transfer Learning

A scoping review of publicly available language tasks in clinical natural language processing.

Reproducing FSL's fMRI data analysis via Nipype: Relevance, challenges, and solutions.

A Photovoltaic Power Predicting Model Using the Differential Evolution Algorithm and Multi-Task Learning

CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

Hierarchical Decision Granules Optimization through the Principle of Justifiable Granularity

Unsupervised Sentence Representation via Contrastive Learning with Mixing Negatives

Polarity and Subjectivity Detection with Multitask Learning and BERT Embedding

Occupational sex composition and the relative pay of managerial work

CDANet: Common-and-Differential Attention Network for Object Detection and Instance Segmentation

Adversarial training for supervised relation extraction

VDM-DA: Virtual Domain Modeling for Source Data-Free Domain Adaptation

Computational Models of Linguistic Alignment for Clustering Group Participants and Predicting Task Outcomes

Progressive Training Technique with Weak-Label Boosting for Fine-Grained Classification on Unbalanced Training Data

NEAR: Named entity and attribute recognition of clinical concepts

Challenges of Vehicle Classification Using Acoustics

SIMCD: SIMulated crowd data for anomaly detection and prediction

3DGT-DDI: 3D graph and text based neural network for drug-drug interaction prediction.
