Abstract

Numerous reports have described the classification of large amounts of data using various clustering techniques. However, growth in data size limits the applicability of these methods. This work investigates how to sort an exploding number of possibilities into irreducible classes using a clustering technique whose input capacity cannot accommodate the total number of possibilities. An example is atomic substitution in the supercell modeling of alloys, where the number of possibilities can reach trillions, far too many to be accommodated as clustering input. Thus, directly identifying how many irreducible classes exist is not practically feasible, even though several techniques are available to perform the clustering. To work around this capacity limitation, a stochastic framework is developed that provides a statistical estimate of the total number of irreducible classes (the order of classes). The main conclusion is that the statistical variation of the number of classes found at each sampling trial can serve as a promising measure for estimating the total number of irreducible classes. Characteristics of this approach are also discussed in comparison with the conventional one based on Polya's theorem.
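The contrast between exact counting and sampling can be illustrated with a toy problem. The sketch below is not the framework of the paper: it assumes a ring of `n_sites` sites, each occupied by one of `n_species` species, with cyclic rotations as the only symmetry. The exact number of irreducible classes then follows from Burnside's lemma (the counting result underlying Polya's theorem), while the sampling routine simply records how many distinct classes appear among randomly drawn configurations. All function names are hypothetical.

```python
import math
import random


def burnside_count(n_sites, n_species):
    """Exact number of classes under cyclic rotations (Burnside's lemma).

    A rotation by r sites leaves n_species ** gcd(r, n_sites)
    configurations unchanged; averaging over all rotations gives the count.
    """
    total = sum(n_species ** math.gcd(r, n_sites) for r in range(n_sites))
    return total // n_sites


def canonical(config):
    """Class representative: the lexicographically smallest rotation."""
    return min(config[r:] + config[:r] for r in range(len(config)))


def classes_seen(n_sites, n_species, n_samples, rng):
    """Number of distinct classes found among n_samples random configurations."""
    seen = set()
    for _ in range(n_samples):
        config = tuple(rng.randrange(n_species) for _ in range(n_sites))
        seen.add(canonical(config))
    return len(seen)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, assumed only for reproducibility
    exact = burnside_count(6, 2)
    for n_samples in (5, 20, 100, 1000):
        found = classes_seen(6, 2, n_samples, rng)
        print(f"samples={n_samples:5d}  classes seen={found:3d}  exact={exact}")
```

In this toy case the exact count is small enough to enumerate, so the sampled count can be checked against it directly. For the trillions of configurations mentioned in the abstract, only the sampled side would be available, which is the situation the stochastic estimate is meant to address.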
