AbstractSimulating natural phenomena at greater accuracy results in an explosive growth of data. Large‐scale simulations with particles currently involve ensembles consisting of between 106 and 109 particles, which cover 105–106 time steps. Thus, the data files produced in a single run can reach from tens of gigabytes to hundreds of terabytes. This data bank allows one to reconstruct the spatio‐temporal evolution of both the particle system as a whole and each particle separately. Realistically, for one to look at a large data set at full resolution at all times is not possible and, in fact, not necessary. We have developed an agglomerative clustering technique, based on the concept of a mutual nearest neighbor (MNN). This procedure can be easily adapted for efficient visualization of extremely large data sets from simulations with particles at various resolution levels. We present the parallel algorithm for MNN clustering and its timings on the IBM SP and SGI/Origin 3800 multiprocessor systems for up to 16 million fluid particles. The high efficiency obtained is mainly due to the similarity in the algorithmic structure of MNN clustering and particle methods. We show various examples drawn from MNN applications in visualization and analysis of the order of a few hundred gigabytes of data from discrete particle simulations, using dissipative particle dynamics and fluid particle models. Because data clustering is the first step in this concept extraction procedure, we may employ this clustering procedure to many other fields such as data mining, earthquake events and stellar populations in nebula clusters. Copyright © 2003 John Wiley & Sons, Ltd.
Read full abstract