Abstract
Tensor clustering is a knowledge management technique and a major algorithmic and technological driver behind a broad spectrum of applications, ranging from multimodal social media analysis and geolocation processing to analytics tailored for large omic data. However, known exact tensor clustering problems, when reduced to tensor factorization, are provably NP-hard. This is attributed in part to the volume of data contained in a tensor, which is proportional to the product of its dimensions, and in part to the increased interdependency between tensor entries across dimensions. One well-studied way to circumvent this inherent difficulty is to resort to heuristics. This article presents an enhanced version of a genetic algorithm tailored for discovering community structure in tensors containing spatiosocial data, namely linguistic and geolocation data. The objective function and the chromosome fitness functions take into account, by design, elements of linguistic propagation models. The genetic operators of selection, crossover, and mutation, as well as the newly added double mutation operator, work directly at the community level. Moreover, various policies for maintaining gene variability across generations are studied in an extensive simulation powered by Google TensorFlow. As with its predecessor, the proposed genetic algorithm has been applied to a dataset consisting of a large number of tweets and their associated geolocations from the Grand Duchy of Luxembourg, a historically and de facto trilingual country. The results are compared with those obtained from the original genetic algorithm, and the differences are interpreted.