The non-linear nature of deep neural networks makes it difficult to interpret the reasons behind their outputs, reducing the verifiability of the systems in which these models are applied. Understanding the patterns that relate activation vectors to predictions can give insight into erroneous classifications and how to identify them. This paper presents a systematic approach to identifying the clusters with the most misclassifications or false label annotations. We extracted the activation vectors from a deep learning model, DNABERT, and visualized them using t-SNE to interpret the model's predictions. We applied K-means clustering hierarchically to the activation vectors of a set of training instances and analyzed the cluster mean activation vectors for patterns in the errors across clusters. The analysis revealed that predictions were uniform, or nearly 100 percent identical, within clusters of similar activation vectors. Two clusters whose members mostly belong to the same true class tend to lie closer together than clusters of opposite classes. Furthermore, the means of objects with the same true label are closer when the two clusters share the same predicted label than when their predicted labels differ, showing that the activation vectors reflect both the predicted and the true classes. We performed a similar analysis for all 26 organisms in the dataset, showing that Euclidean distance between cluster means can be used to identify clusters containing many errors. Based on this inter-cluster vector analysis, we propose a heuristic for finding clusters with a high number of misclassifications or incorrect label annotations, which can aid in identifying misclassified DNA sequences or problems with sequence tagging.
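A minimal sketch of the clustering and distance analysis outlined above, under the assumption that the activation vectors have already been extracted from the model; synthetic arrays stand in for the real DNABERT activations and labels, and the cluster count is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)

# Stand-ins for per-sequence activation vectors (n_samples x hidden_dim)
# and their predicted / true binary labels (assumed already extracted).
activations = rng.normal(size=(1000, 768))
predicted = rng.integers(0, 2, size=1000)
true_labels = rng.integers(0, 2, size=1000)

# 2-D t-SNE embedding for visual inspection of the cluster structure.
embedding = TSNE(n_components=2, random_state=0).fit_transform(activations)

# K-means on the activation vectors (one level of a hierarchical scheme).
k = 10
clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(activations)

# Per-cluster mean activation vectors and error rates
# (disagreement between predicted and true labels).
means = np.vstack([activations[clusters == c].mean(axis=0) for c in range(k)])
error_rate = np.array([
    (predicted[clusters == c] != true_labels[clusters == c]).mean()
    for c in range(k)
])

# Pairwise Euclidean distances between cluster means; clusters whose means
# lie unusually far from clusters sharing their predicted label are flagged
# as candidates for misclassification or label-noise review (heuristic).
dist = cdist(means, means)
print("cluster error rates:", np.round(error_rate, 3))
print("inter-cluster mean distances:\n", np.round(dist, 2))
```

With real data, the activation matrix would come from the model's hidden states for each input sequence, and the flagged clusters would be reviewed against the true annotations.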