Abstract

Clustering is a popular machine-learning technique that is often used to explore continuous variables. Two problems are commonly encountered in clustering: (1) the selection of the optimal number of clusters, and (2) the undecidability of the affiliation of border data points to neighboring clusters. We address both problems and describe how to solve them when clustering affective multimedia databases. In the experiment, we used the unsupervised learning algorithm k-means and the Nencki Affective Picture System (NAPS) dataset, which contains 1356 semantically and emotionally annotated pictures. The optimal number of centroids was estimated using the empirical elbow and silhouette rules and validated using a Monte Carlo simulation approach. Clustering with k = 1–50 centroids is reported, along with dominant picture keywords and descriptive statistical parameters. Affective multimedia databases, such as the NAPS, have been specifically designed for emotion and attention experiments. Estimating the optimal cluster solutions made it possible to gain deeper insight into the affective features of visual stimuli. Finally, a custom software application was developed for the study in the Python programming language. The tool uses the scikit-learn library for machine-learning algorithms, data exploration and visualization. The tool is freely available for scientific and non-commercial purposes.
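The elbow and silhouette rules mentioned above can be illustrated with a minimal sketch in Python and scikit-learn. This is not the authors' tool: the NAPS data are not reproduced here, so the example assumes synthetic two-dimensional data from `make_blobs`, and the helper name `scan_k` and the scanned range of k are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def scan_k(X, k_values, seed=0):
    """Fit k-means for each k and collect inertia and mean silhouette.

    Hypothetical helper for illustration; not part of the published tool.
    """
    inertias, silhouettes = {}, {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        # Within-cluster sum of squared distances, used by the elbow rule.
        inertias[k] = model.inertia_
        # The silhouette coefficient is undefined for a single cluster.
        if k > 1:
            silhouettes[k] = silhouette_score(X, model.labels_)
    return inertias, silhouettes


# Synthetic stand-in for two continuous variables (an assumption).
X, _ = make_blobs(n_samples=300, centers=4, n_features=2,
                  cluster_std=0.6, random_state=0)

inertias, silhouettes = scan_k(X, range(1, 9))

# Silhouette rule: choose the k with the highest mean silhouette.
best_k = max(silhouettes, key=silhouettes.get)

# Elbow rule: inspect where the drop in inertia levels off.
for k in sorted(inertias):
    sil = f"{silhouettes[k]:.3f}" if k in silhouettes else "n/a"
    print(f"k={k}  inertia={inertias[k]:.1f}  silhouette={sil}")
print("k with highest silhouette:", best_k)
```

The elbow rule is usually read off a plot of inertia against k, so the printed table is only a stand-in for that visual inspection. The two rules can disagree, which is one reason an additional validation step, such as the Monte Carlo simulation described in the abstract, is useful.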

Highlights

  • Clustering can be broadly described as the task of dividing a population of data points, or observations, into groups such that, for a chosen set of attributes and a chosen metric, data points in the same group are more similar to one another than to those in other groups [1]. Clustering is an unsupervised process: we are given unlabeled data and must place similar samples in one cluster and dissimilar samples in different clusters.

  • When applying clustering in practice, one often encounters several problems: (1) the selection of the cluster similarity measure, (2) the selection of the optimal number of clusters, (3) the undecidability of the affiliation of border data points to neighboring clusters, and (4) the lack of correct group labels, which limits the applicability of the clustering model [3,4].
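To make problem (3) concrete, the sketch below flags data points that lie almost equally close to two centroids. It is an illustrative heuristic under stated assumptions, not the method used in the paper: the data are synthetic, and the helper `border_ratio` and the 0.9 threshold are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def border_ratio(X, model):
    """Ratio of the distance to the nearest centroid over the distance
    to the second-nearest centroid, one value per data point.

    Values close to 1 mean that the point is almost equidistant from
    two centroids, so its cluster membership is ambiguous.
    Hypothetical helper for illustration only.
    """
    # KMeans.transform returns the distance from each point to each centroid.
    d = np.sort(model.transform(X), axis=1)
    return d[:, 0] / d[:, 1]


# Synthetic data with overlapping groups (an assumption).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.5, random_state=0)
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

ratio = border_ratio(X, model)
border = ratio > 0.9  # hypothetical threshold for "undecidable" points
print(f"{border.sum()} of {len(X)} points lie near a cluster border")
```

Because k-means assigns every point to its nearest centroid, the ratio never exceeds 1. The threshold determines how many points are treated as ambiguous and would need to be chosen for the data at hand.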


Summary

Introduction

Clustering can be broadly described as the task of dividing a population of data points, or observations, into groups such that, for a chosen set of attributes and a chosen metric, data points in the same group are more similar to one another than to those in other groups [1]. The difference between the description of images by pixel-defined content and by semantic content is referred to as the semantic gap [11]. The coupling of semantics and affect in emotionally annotated multimedia documents can be defined as a deterministic interaction between the semantics of a document and the effect that its semantics evoke. In this regard, the practical goal of the presented research is to develop an intelligent system that can infer the emotional content of a multimedia document from an evaluation of its semantics and, conversely, estimate the dominant semantics from the affective annotations when such information is available. The conclusion is presented in the final section of the paper.

Affective Multimedia Databases
Models of Affect in Affective Multimedia Databases
The NAPS Affective Picture Database
Related Work
Unsupervised Machine Learning Methods
Disadvantages of the k-Means Algorithm and the Solutions Used
Unstable Cluster Indexes
Statistical Distribution Undecidability
Experiment and Results
The Optimal Number of Clusters
D Dis quantitatively
Conclusions