Abstract
Often the data units are described with discrete distributions (work described with citation distribution over time, population pyramid described as age-sex distribution etc.).When the set of such units is very large, appropriate clustering methods can reveal the typical patterns hidden in the data. In this paper we present an adapted leaders method combined with a compatible adapted agglomerative hierarchical method that are based on relative error measure between a unit and the corresponding cluster representative–leader. The proposed approach is illustrated on citation distributions derived from the data set of US patents from 1980 to 1999. These new methods were developed because clustering of units, described with distributions, with classical k-means method reveals patterns with single high peaks which correspond to a single year. These patterns prevail over other distribution shapes also present in the data. Compared with centers in k-means method, clusters’ representatives obtained with the proposed new methods better detect typical distribution shapes of units. The obtained main cluster types for different sets of units show three main patterns: patents with early or late peak of importance to the community, and patents where the importance is slowly increasing throughout the time period.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.