Abstract

Community structure detection is an important tool in graph analysis. One way to do this is to solve for the partition that optimizes the modularity score. Here it is shown that topological constraints in correlation graphs induce over-fragmentation of community structures. A refinement step for this optimization, based on Linear Discriminant Analysis (LDA) and a statistical test for significance, is proposed. In structured simulations constrained by topology, this novel approach performs better than the optimization of modularity alone. The method was also tested with two empirical datasets: the Roll-Call voting in the 110th US Senate constrained by geographic adjacency, and a biological dataset of 135 protein structures constrained by inter-residue contacts. The former dataset showed sub-structures in the communities that revealed a regional bias in the votes that transcends party affiliations. This is an interesting pattern given that the 110th Legislature was assumed to be a highly polarized government. The α-amylase catalytic domain dataset (the biological dataset) was analyzed with and without topological constraints (inter-residue contacts). The results without topological constraints differed from the topology-constrained ones, but the LDA filtering did not change the outcome of the latter. This suggests that the LDA filtering is a robust way to resolve over-fragmentation when it is present, and that the method does not affect the results where there is no evidence of over-fragmentation.

Highlights

  • Many problems in science can be abstracted as networks

  • The topology constraint is based on contacts, since the points in the simulation lie on a unit grid

  • The Linear Discriminant Analysis (LDA) filtering proposed here has no information about the topology constraint. The results demonstrate that there is a geographic signal in the US votes that does not follow a strict party pattern

Summary

Introduction

Many problems in science can be abstracted as networks. For example, in biological sciences, protein structures can be abstracted as graphs of connected residues [1], metabolic networks can be created by connecting enzymes by their interactions in a given pathway [2], or food webs can be created by joining species with their trophic interactions [3]. Since correlation is a measure of the strength of a relationship, the actual correlation value can be used as the weight of an edge representing that relationship. This graph abstraction is useful since it allows us to analyze the relationships using graph invariants. There are many such properties, but one of special interest here is the community structure, which represents how the vertices are arranged in groups that are densely connected internally and sparsely connected externally [6].
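The two ideas in this paragraph, correlation values as edge weights and communities as densely connected groups, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the use of the absolute Pearson correlation as the weight, and the `contacts` set standing in for a topological constraint are all assumptions made for illustration. The modularity computed is the standard weighted form, Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the total edge weight, L_c the weight inside community c, and d_c the summed strength of its vertices.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_graph(series, contacts):
    """Return {(i, j): weight} for vertex pairs allowed by `contacts`.

    Assumption: the weight is |r_ij|, and `contacts` is a set of index
    pairs playing the role of the topological constraint.
    """
    edges = {}
    for i, j in combinations(range(len(series)), 2):
        if (i, j) in contacts or (j, i) in contacts:
            edges[(i, j)] = abs(pearson(series[i], series[j]))
    return edges


def modularity(edges, membership):
    """Weighted modularity Q of a partition given as {vertex: community}."""
    m = sum(edges.values())  # total edge weight
    inside = {}    # L_c: edge weight with both endpoints in community c
    strength = {}  # d_c: summed vertex strength of community c
    for (i, j), w in edges.items():
        ci, cj = membership[i], membership[j]
        strength[ci] = strength.get(ci, 0.0) + w
        strength[cj] = strength.get(cj, 0.0) + w
        if ci == cj:
            inside[ci] = inside.get(ci, 0.0) + w
    return sum(inside.get(c, 0.0) / m - (d / (2 * m)) ** 2
               for c, d in strength.items())


# Toy graph: two triangles joined by a single bridge, unit weights.
toy_edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
             (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
             (2, 3): 1.0}
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(modularity(toy_edges, two_groups), 4))  # 0.3571
```

Placing every vertex of the toy graph in a single community gives Q = 0, so the two-triangle split scores higher, which is the sense in which modularity rewards groups that are dense inside and sparse between.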