Abstract
Most traditional classification algorithms assume that samples are evenly distributed across classes, so they perform poorly on imbalanced data: the classification results are biased toward the majority class. In this paper we therefore propose classification algorithms for imbalanced multi-source heterogeneous data, based mainly on extensions of the Support Vector Machine (SVM). Because multi-source data contain complex internal connections, expressing them as a unified, concise and efficient mathematical model can fully retain the data information and improve data processing efficiency. We perform tensor representation and feature extraction on the heterogeneous data and propose two different classification algorithms. In the first method, we represent the multi-source heterogeneous data directly in a unified tensor form, obtain high-quality core data through a dimensionality reduction algorithm, and then classify the data with a Support Tensor Machine (STM). In the second method, we extract features from the different data sources and classify them with an Ensemble Deep Support Vector Machine (DSVM), which combines three DSVMs with different kernel functions. The algorithms are compared on the CUAVE data set, which contains two different modalities, sound and picture.
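The first method's step of reducing a tensor to smaller core data can be illustrated with a truncated higher-order SVD (Tucker-style compression). This is only a sketch under assumptions: the abstract does not name the dimensionality reduction algorithm, so HOSVD is swapped in here as one common choice, and the function names `unfold`, `mode_product` and `truncated_hosvd`, the tensor shape, and the ranks are all hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode` (mode-n product)."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Return a reduced core tensor and one factor matrix per mode.

    Assumption: HOSVD stands in for the unspecified reduction algorithm.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        # Left singular vectors of the mode-n unfolding span that mode's subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        # Project each mode onto its leading subspace.
        core = mode_product(core, u.T, mode)
    return core, factors


# Hypothetical sample: a 6 x 5 x 4 tensor compressed to a 3 x 3 x 2 core.
rng = np.random.default_rng(0)
sample = rng.normal(size=(6, 5, 4))
core, factors = truncated_hosvd(sample, ranks=(3, 3, 2))
print(core.shape)  # (3, 3, 2)
```

The core keeps the multi-way layout of the sample instead of flattening it into a vector, which is the property a tensor classifier such as STM relies on.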
Highlights
In current data collection systems, large volumes of data are collected from multi-source sensor networks, yielding a large amount of heterogeneous data
Ensemble learning and deep learning are applied to the Support Vector Machine, and an Ensemble Deep Support Vector Machine (DSVM) model is proposed to classify multi-source heterogeneous data; it avoids the limitations of a single classifier and makes full use of the performance of multi-level deep learning
The input to the Ensemble DSVM is vector data, obtained by combining the image features after singular value decomposition (SVD) with the voice features after principal component analysis (PCA)
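The fusion of image and voice features into one input vector could look roughly like the following. It is a minimal sketch, not the authors' pipeline: taking the leading singular values of each image as its SVD feature is an assumption, as are the function names, the array shapes, and the numbers of retained components.

```python
import numpy as np


def image_svd_features(images, k):
    """Leading k singular values of each image (assumed feature choice).

    images: array of shape (n_samples, height, width).
    """
    singular_values = np.linalg.svd(images, compute_uv=False)
    return singular_values[:, :k]


def pca_features(x, k):
    """Project the rows of x onto their first k principal components."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def fuse(images, audio, k_img, k_audio):
    """Concatenate per-sample image and audio features into one vector."""
    return np.hstack([image_svd_features(images, k_img),
                      pca_features(audio, k_audio)])


# Hypothetical data: 10 samples, 8 x 6 images and 20-dimensional audio features.
rng = np.random.default_rng(0)
images = rng.normal(size=(10, 8, 6))
audio = rng.normal(size=(10, 20))
fused = fuse(images, audio, k_img=4, k_audio=3)
print(fused.shape)  # (10, 7)
```

Each row of `fused` is one sample's vector, the form a vector-based SVM classifier expects.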
Summary
W. Wang et al.: Imbalanced Data Classification for Multi-Source Heterogenous Sensor Networks
In current data collection systems, large volumes of data are collected from multi-source sensor networks, yielding a large amount of heterogeneous data. [...] only one class with SVM, which selects the minority class as the target class to train the model; the class with few samples in the test data is then identified, which proves the effectiveness of the method. (i) The joint representation and feature extraction of multi-source heterogeneous data are carried out, and classification is realized by STM, which preserves the spatial structure of high-dimensional data well. (ii) Ensemble learning and deep learning are applied to the Support Vector Machine, and an Ensemble DSVM model is proposed to classify multi-source heterogeneous data; it avoids the limitations of a single classifier and makes full use of the performance of multi-level deep learning. The rest of the article is arranged as follows: Section II introduces the definitions of tensor geometry used in this article and the feature extraction of tensor data; Section III proposes the classification algorithms; Section IV validates the algorithms on the CUAVE dataset and compares their classification performance; Section V gives the summary and future work.
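The idea of combining several SVM-type classifiers with different kernels can be sketched with scikit-learn. Several things here are assumptions rather than the paper's method: plain `SVC` models stand in for the DSVMs, the three kernels (linear, RBF, polynomial) and the hard majority vote are guesses at a reasonable configuration, `class_weight="balanced"` is just one common way to counter imbalance, and the data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic imbalanced data (about 90% / 10%), standing in for fused features.
X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0)

# One SVC per kernel; plain SVCs replace the paper's DSVMs in this sketch.
kernels = ("linear", "rbf", "poly")
ensemble = VotingClassifier(
    estimators=[(name, SVC(kernel=name, class_weight="balanced"))
                for name in kernels],
    voting="hard",  # majority vote over the three predictions (assumed rule)
)
ensemble.fit(X_train, y_train)
pred = ensemble.predict(X_test)

# Recall on the minority class is the figure that imbalance usually hurts.
minority_recall = np.mean(pred[y_test == 1] == 1)
print(f"minority-class recall: {minority_recall:.2f}")
```

Reporting minority-class recall rather than overall accuracy matters here, because a classifier that always predicts the majority class would still score about 90% accuracy on this data.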