Abstract

Machine learning, especially deep learning, has achieved great success in application domains such as computer vision, speech recognition, and machine translation. This success depends not only on progress in model design but also on the emergence of large-scale annotated benchmark datasets. In some application domains, however, it is difficult to apply machine learning methods to newly arising problems because standardized, annotated datasets are lacking. Practitioners have to start from the very beginning: collecting, cleaning, and labeling the data. Unfortunately, no guiding framework exists for this situation. This paper addresses the gap and proposes a general machine learning framework for solving practical problems from scratch. The framework contains two main stages, implemented by an unsupervised clustering method and a semi-supervised learning method, respectively. In the first stage, an unsupervised clustering method is used to find representative data samples, which are more important and can be used for data cleaning and manual labeling. In the second stage, a semi-supervised method is adopted to predict labels for the remaining data samples and thereby construct a larger annotated dataset. A case study in the texture perception field confirms the effectiveness and efficiency of the proposed framework.
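
The two-stage idea can be illustrated with a minimal sketch. The abstract does not name the specific clustering or semi-supervised algorithms, so everything here is an assumption made for illustration: plain k-means stands in for the clustering stage, a nearest-labeled-neighbor rule stands in for the semi-supervised stage, and the toy data and the `annotate` function (a stand-in for a human labeler) are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed stand-in); returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assign


def pick_representatives(points, centroids):
    """Stage 1: index of the sample closest to each cluster centroid."""
    return [min(range(len(points)), key=lambda i: math.dist(points[i], c))
            for c in centroids]


def propagate_labels(points, labeled):
    """Stage 2 stand-in: copy the label of the nearest manually labeled sample."""
    return [labeled[min(labeled, key=lambda j: math.dist(p, points[j]))]
            for p in points]


# Hypothetical toy data: two well-separated groups of unlabeled samples.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

centroids, _ = kmeans(points, k=2)
reps = pick_representatives(points, centroids)


def annotate(p):
    """Hypothetical human annotator; only the representatives are shown to it."""
    return "A" if p[0] < 5 else "B"


labeled = {i: annotate(points[i]) for i in reps}
predicted = propagate_labels(points, labeled)
```

In this sketch only two of the eight samples are labeled by hand, and the remaining labels are inferred. A real pipeline would likely replace both stand-ins with stronger methods, for example library implementations such as scikit-learn's `KMeans` and `LabelSpreading`, and would review the propagated labels before treating them as ground truth.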
