Efficient data reduction in multimedia data

Surong Wang, Min Xu, Manoranjan Dash, Liang-Tien Chia

doi:10.1007/s10489-006-0112-1

Abstract

As the amount of multimedia data grows daily, thanks to cheaper storage devices and a growing number of information sources, machine learning algorithms face increasingly large datasets. When the original data is huge, small samples are preferred for various applications, and this is typically the case for multimedia applications. A simple random sample, however, may not give satisfactory results, because random fluctuations in the sampling process can leave it unrepresentative of the entire data set. The difficulty is particularly apparent when small sample sizes are needed. Fortunately, a good sample used for training can improve the final results significantly. In KDD'03 we proposed EASE, which outputs a sample based on its 'closeness' to the original sample. Reported results show that EASE outperforms simple random sampling (SRS). In this paper we propose EASIER, which extends EASE in two ways. (1) EASE is a halving algorithm: to achieve the required sample ratio, it starts from a suitably large initial sample and iteratively halves it. EASIER, on the other hand, does away with the repeated halving by directly obtaining the required sample ratio in one iteration. (2) EASE was shown to work on the IBM QUEST dataset, which is a categorical count data set. EASIER, in addition, is shown to work on continuous data of image and audio features. We have successfully applied EASIER to image classification and audio event identification. Experimental results show that EASIER outperforms SRS significantly.
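The idea of choosing a sample by its closeness to the full data, in a single pass and at a chosen ratio, can be illustrated with a small sketch. This is not the published EASE or EASIER algorithm, whose details the abstract does not give. It assumes transaction-style data (each record is a set of items), measures closeness as the Euclidean distance between item-frequency vectors, and uses a hypothetical greedy rule (`greedy_one_pass_sample`) that keeps a record only when doing so brings the kept item counts nearer to `ratio` times the counts seen so far. All function names here are illustrative assumptions.

```python
import random


def item_frequencies(transactions, items):
    """Relative frequency of each item across a list of item sets."""
    n = len(transactions)
    return {i: sum(i in t for t in transactions) / n for i in items}


def distance(full, sample, items):
    """Euclidean distance between the item-frequency vectors of two datasets."""
    f = item_frequencies(full, items)
    g = item_frequencies(sample, items)
    return sum((f[i] - g[i]) ** 2 for i in items) ** 0.5


def simple_random_sample(transactions, ratio, seed=0):
    """Baseline SRS: draw round(ratio * n) records without replacement."""
    rng = random.Random(seed)
    k = max(1, round(len(transactions) * ratio))
    return rng.sample(transactions, k)


def greedy_one_pass_sample(transactions, ratio):
    """Hypothetical one-pass heuristic (an assumption, not the paper's method).

    For every item, track how often it was seen and how often it was kept.
    A record is kept only if keeping it moves the kept counts strictly closer
    to ratio * seen counts. A pseudo-item present in every record makes the
    sample size itself track ratio * n.
    """
    size_key = object()  # pseudo-item shared by all records
    seen, kept, sample = {}, {}, []
    for t in transactions:
        keys = list(t) + [size_key]
        keep_cost = drop_cost = 0.0
        for i in keys:
            target = ratio * (seen.get(i, 0) + 1)
            keep_cost += (kept.get(i, 0) + 1 - target) ** 2
            drop_cost += (kept.get(i, 0) - target) ** 2
        for i in keys:
            seen[i] = seen.get(i, 0) + 1
        if keep_cost < drop_cost:
            sample.append(t)
            for i in keys:
                kept[i] = kept.get(i, 0) + 1
    return sample
```

To compare the two on a given dataset, one could compute `distance(data, simple_random_sample(data, 0.1), items)` and `distance(data, greedy_one_pass_sample(data, 0.1), items)`. How they compare depends on the data, and the sketch makes no claim about reproducing the reported results.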


Journal: Applied Intelligence	Publication Date: Dec 1, 2006
Citations: 32

Similar Papers

Efficient sampling of training set in large and noisy multimedia data
Surong Wang ... Manoranjan Dash
ACM Transactions on Multimedia Computing, Communications, and Applications | VOL. 3
01 Aug 2007

Heart Disease Classification for Early Diagnosis based on Adaptive Hoeffding Tree Algorithm in IoMT Data
Ersin Elbasi ... Aymen I. Zreikat
The International Arab Journal of Information Technology | VOL. 20
01 Jan 2023

Utilization of synthetic minority oversampling technique for improving potato yield prediction using remote sensing data and machine learning algorithms with small sample size of yield data
Hamid Ebrahimy ... Zhou Zhang
ISPRS Journal of Photogrammetry and Remote Sensing | VOL. 201
24 May 2023

Quality Control and Optimization for Hybrid Crowd-Machine Learning Systems
02 Nov 2017
