Recent progress in speech and vision has stemmed largely from the increased use of machine learning. Not only does machine learning provide many useful tools, it also helps us understand existing algorithms and their connections in a new light. The support vector machine (SVM), a powerful machine learning tool, incurs a high computational cost in the training phase when the number of original training samples is large, while the minimal enclosing ball (MEB) has its own limitations when dealing with large datasets. Since training computation grows with data size, in this paper we propose two improved approaches that address this problem for huge datasets used in different domains. These approaches, based on L2-SVMs reduced to MEB problems, yield a reduced dataset optimally matched to the input requirements of various back-end systems, such as Universal Background Model architectures in language recognition and identification systems. We experiment on speech data represented by shifted delta coefficient acoustic feature vectors in a GMM-based dialect identification system, where all data outside the ball defined by the MEB are eliminated and the training time is reduced. Further numerical experiments on several real-world datasets demonstrate the usefulness of our approaches in the field of data mining.
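The MEB-based data reduction summarized above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the classic Bădoiu–Clarkson (1+ε)-approximation for the enclosing ball, and the `shrink` factor used to discard samples near the boundary is a hypothetical parameter introduced here for illustration.

```python
import numpy as np

def approx_meb(points, eps=0.01):
    """Badoiu-Clarkson (1+eps)-approximate minimum enclosing ball.

    Returns (center, radius) after O(1/eps^2) farthest-point updates.
    """
    c = points[0].astype(float).copy()
    iterations = int(np.ceil(1.0 / eps**2))
    for i in range(1, iterations + 1):
        # Find the point farthest from the current center...
        dists = np.linalg.norm(points - c, axis=1)
        p = points[np.argmax(dists)]
        # ...and move the center toward it with a shrinking step size.
        c += (p - c) / (i + 1)
    radius = np.linalg.norm(points - c, axis=1).max()
    return c, radius

def reduce_by_ball(points, shrink=0.9, eps=0.01):
    """Keep only the samples inside a shrunken approximate MEB.

    `shrink` (assumed parameter) controls how aggressively boundary
    samples are eliminated before training.
    """
    c, r = approx_meb(points, eps)
    dists = np.linalg.norm(points - c, axis=1)
    return points[dists <= shrink * r]

# Example: reduce a synthetic training set before fitting a model.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3))
reduced = reduce_by_ball(data, shrink=0.9)
```

The reduced set can then be passed to the downstream trainer (e.g. an L2-SVM or GMM back end) in place of the full dataset, which is where the training-time savings come from.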