Abstract
Involving users in the loop of analytic workflows refers to the ability to replace heuristics with user input in machine learning and data mining tasks. For supervised tasks, user engagement generally occurs through the manipulation of training data. For unsupervised tasks, however, user involvement is limited to changes in the algorithm parametrization or in the input data representation, also known as features. Typically, different types of features can be extracted from raw data, and careful selection of the extraction strategy gives users more control over unsupervised tasks. Nevertheless, since there is no perfect feature extractor, the combination of multiple sets of features has been explored through a process called feature fusion. Feature fusion can be readily performed when the machine learning or data mining algorithm has a cost function, such as accuracy for classification tasks. When no such function exists, however, user support must be provided; otherwise, the process is impractical. In this article, we present a novel feature fusion approach that employs data samples and visualization to allow users not only to control the combination of different feature sets effortlessly but also to understand the attained results. The effectiveness of our approach is confirmed by a comprehensive set of qualitative and quantitative experiments, opening up different possibilities for user-guided analytical scenarios. The ability of our approach to provide real-time feedback for feature fusion is exploited in the context of unsupervised clustering techniques, where users can perform an exploratory process to discover the combination of features that best reflects their individual perceptions of similarity.