Abstract

Among the variety of algorithms that have been developed for clustering, prototype-based approaches are very popular because their low computational complexity makes them practical for real-life applications. In such algorithms, the data set is summarized by a small set of prototypes, each of which usually represents a cluster of objects. However, defining prototypes for complex objects described only by their relations (relational data) is not an easy task, and little work has been done so far on relational prototype-based clustering. Because relational data are described by a full matrix of dissimilarities, the main challenge is the computational and memory cost, especially when the number of objects to analyze is very large and when analyzing data streams (data sets whose structure varies over time). The combination of these three characteristics (size, complexity and evolution) presents a major challenge, and few satisfactory solutions exist at the moment despite increasingly evident needs. This paper focuses on the development of new clustering approaches adapted to big and dynamic relational data. The main idea is to use a set of fixed support points chosen among the objects of the data set, independently of the clusters, and to use these support points as a basis for the definition of a representation space, using the Barycentric Coordinates formalism. We demonstrate the qualities of the proposed approaches theoretically and experimentally on a set of artificial and real relational data. We also propose an extension adapted to relational data stream analysis, which allows prototypes to be created and removed dynamically in order to follow the dynamics of the data structure. This dynamic approach is applied to a real data set to detect and follow the dynamics of areas of interest over time in users' web navigation. We tested different measures of similarity between URLs and different methods of automatic labeling to characterize the clusters.
The results are convincing and encouraging: the clusters are homogeneous, with clearly associated topics. The dynamics of users' interest can be recorded and visualized for each cluster. Remarkable patterns can be associated with specific events or with regular timing and cycles in users' interest.
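
To make the support-point idea concrete, the following is a minimal sketch, not the paper's actual implementation. It assumes the dissimilarities are squared Euclidean distances, and it uses the standard relational identity for a point written as a barycentric combination p = Σ_k β_k s_k with Σ_k β_k = 1: d²(x, p) = Σ_k β_k d²(x, s_k) − ½ Σ_k Σ_l β_k β_l d²(s_k, s_l). The names `sq_dist_to_prototype`, `pairwise_sq`, `d2_obj_sup`, `d2_sup_sup` and `beta` are hypothetical and chosen only for illustration.

```python
import numpy as np

def sq_dist_to_prototype(d2_obj_sup, d2_sup_sup, beta):
    """Squared dissimilarity between each object and a prototype
    p = sum_k beta[k] * s_k, computed from dissimilarities only.

    d2_obj_sup : (n, m) squared dissimilarities, n objects vs. m support points
    d2_sup_sup : (m, m) squared dissimilarities among the support points
    beta       : (m,) barycentric coordinates of the prototype (must sum to 1)
    """
    beta = np.asarray(beta, dtype=float)
    if not np.isclose(beta.sum(), 1.0):
        raise ValueError("barycentric coordinates must sum to 1")
    # d^2(x, p) = sum_k beta_k d^2(x, s_k) - 1/2 * beta^T D_ss beta
    return d2_obj_sup @ beta - 0.5 * (beta @ d2_sup_sup @ beta)

def pairwise_sq(a, b):
    """Squared Euclidean distances; used here only to build toy relational data."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)

# Toy data: four 2-D objects, the first three act as fixed support points.
points = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
support = points[:3]
beta = np.array([0.5, 0.25, 0.25])  # prototype located at (0.5, 0.5)

d2 = sq_dist_to_prototype(pairwise_sq(points, support),
                          pairwise_sq(support, support),
                          beta)
print(d2)  # squared distances to the prototype: 0.5, 2.5, 2.5, 0.5
```

Under these assumptions only an n × m block of the dissimilarity matrix is needed for n objects and m support points, rather than the full n × n matrix, which is the kind of memory saving the support-point idea targets. For dissimilarities that are not squared Euclidean distances, the same expression can still be evaluated but may yield negative values.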
