Abstract

Rapidly growing Global Positioning System (GPS) data play an important role in trajectories and their applications (e.g., GPS-enabled smart devices). To use K-means to mine better origins and destinations (OD) from GPS data while overcoming its shortcomings, including slow convergence, sensitivity to the selection of initial seeds, and a tendency to get stuck in a local optimum, this paper proposes a novel niche genetic algorithm (NGA) with density and noise for K-means clustering (NoiseClust). In NoiseClust, an improved noise method and K-means++ are used to produce the initial population and capture higher-quality seeds; these can automatically determine the proper number of clusters and also handle genes of different sizes and shapes. A density-based method is presented to determine the number of niches, with the aim of maintaining population diversity. Adaptive probabilities of crossover and mutation are also employed to prevent convergence to a local optimum. Finally, the centers (the best chromosome) are obtained and fed into K-means as initial seeds, which generates even higher-quality clustering results by allowing the initial seeds to readjust as needed. Experimental results on four taxi GPS data sets demonstrate that NoiseClust achieves high performance and effectiveness and readily mines the city’s situations.
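The final step described in the abstract, feeding a set of centers into K-means as initial seeds so that they can readjust, can be illustrated with a minimal sketch. This is not the NoiseClust implementation: the function names `kmeans_pp_seeds` and `kmeans`, the Euclidean distance, and the toy coordinates are all assumptions made for illustration, and standard K-means++ seeding stands in here for the best chromosome that the genetic algorithm would supply.

```python
import math
import random


def kmeans_pp_seeds(points, k, rng):
    """Standard K-means++ seeding (illustrative stand-in for the GA output).

    Each new seed is drawn with probability proportional to its squared
    distance from the nearest seed chosen so far.
    """
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        d2 = [min(math.dist(p, s) ** 2 for s in seeds) for p in points]
        if sum(d2) == 0:
            # All points coincide with existing seeds; fall back to uniform.
            seeds.append(rng.choice(points))
        else:
            seeds.append(rng.choices(points, weights=d2, k=1)[0])
    return seeds


def kmeans(points, seeds, max_iter=100):
    """Lloyd's K-means started from the given seeds.

    Returns (centers, labels). The seeds are free to move, which is the
    "readjust as needed" behaviour mentioned in the abstract.
    """
    centers = [tuple(s) for s in seeds]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels


if __name__ == "__main__":
    # Hypothetical toy points standing in for taxi pick-up locations.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    seeds = kmeans_pp_seeds(pts, 2, random.Random(0))
    print(kmeans(pts, seeds))
```

In this sketch the quality of the result depends on the seeds passed to `kmeans`, which is the sensitivity the abstract says NoiseClust is designed to reduce.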

Highlights

  • Nowadays, with the prevalence of smart Global Positioning System (GPS) devices with positioning ability, a large amount of GPS-based data and trajectories are available

  • To test the performance of the NoiseClust algorithm, experiments are conducted on real-world taxi GPS data sets [35], and the results show that NoiseClust achieves higher performance and effectiveness than GenClust [16] and Genetic algorithm K-means (GAK) [42]

  • NoiseClust uses the proposed new niche genetic algorithm (NGA) with noise and density to avoid getting stuck in a local optimum, while achieving high-quality cluster results for taxi GPS data
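The adaptive crossover and mutation probabilities mentioned in the abstract are not specified in this summary. As a rough illustration of the general idea only, the sketch below uses a widely known adaptive scheme (often attributed to Srinivas and Patnaik), which is an assumption here and may differ from what NoiseClust actually uses. The function name `adaptive_probabilities`, the constants `k1` to `k4`, and the convention that larger fitness is better are all hypothetical choices.

```python
def adaptive_probabilities(f_pair, f_ind, f_max, f_avg,
                           k1=1.0, k2=0.5, k3=1.0, k4=0.5):
    """Return (crossover probability, mutation probability).

    Assumed scheme, not taken from the NoiseClust paper:
      f_pair: larger fitness of the two parents selected for crossover
      f_ind:  fitness of the individual considered for mutation
      f_max:  best fitness in the population (larger is better)
      f_avg:  average fitness in the population
    Above-average individuals get smaller probabilities, so good solutions
    are disrupted less; below-average ones get the full rates k3 and k4.
    """
    spread = f_max - f_avg
    if spread <= 0:
        # Population has converged to one fitness value: use the full
        # rates to reintroduce diversity (and avoid dividing by zero).
        return k3, k4
    pc = k1 * (f_max - f_pair) / spread if f_pair >= f_avg else k3
    pm = k2 * (f_max - f_ind) / spread if f_ind >= f_avg else k4
    return pc, pm
```

Under these assumptions the best individual is never disrupted (both probabilities are zero for it), while the rates rise as the population's fitness values bunch together, which is one way to push a search away from a local optimum.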

Summary

Introduction

With the prevalence of smart Global Positioning System (GPS) devices with positioning ability, large amounts of GPS-based data and trajectories are available. The key element of these applications is location (based on GPS), which is required to mine hidden information and understand the meaning of trajectories, rather than treating a trajectory as only a combination of recorded GPS data points. In these application domains, techniques for mining trajectory patterns and frequent trajectory routes are very important [1]. Trajectories have usually been described by several patterns, such as origins and destinations (OD) [2,3,4], stops and moves [5,6], and moving objects [7,8], and a great number of clustering algorithms have been used to mine these patterns and produce clustering results.

GGA (Group genetic algorithm) [21] presented GA-based clustering algorithms with a new grouping method in the initial population. These GAs with K-means can lose population diversity due to global optimal problems and weak exploitation capabilities, and the gene size of the chromosomes must be equal in AGCUK (Automatic genetic clustering for unknown K) and GAGR. In the GGA algorithm, the number of clusters requires user input, but the gene sizes need not be equal.
