Online social networks continue to attract more users. With hundreds of millions of users, storing user data in a scalable manner has become a pressing problem for both social service providers and researchers. Distributed key-value stores are currently in wide use; they place user data across multiple storage servers by hashing. However, hash-based placement ignores the social graph, so it generates a large amount of communication traffic inside the data center and hinders the scaling of social networks. Taking user interaction characteristics into account, this paper proposes a data placement approach that combines social graph partitioning with data replication. Because data center network topologies differ, we design the data placement for specific topologies. We also discuss incremental adjustment as the social network grows, as well as a distributed implementation of the proposed algorithms. Finally, experiments on real-world traces indicate that the proposed algorithms effectively reduce internal communication traffic, thereby improving the scalability of online social networks.
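The contrast between hash-based placement and placement that combines partitioning with replication can be illustrated with a toy sketch. This is not the paper's algorithm: the graph, the helper names (`hash_placement`, `remote_reads`), and the use of CRC32 as the hash are all assumptions made for illustration. It counts how many friend-to-friend reads must leave the reader's own server under each placement.

```python
import zlib
from itertools import combinations

def hash_placement(users, num_servers):
    """Assign each user to a server by hashing the user ID (ignores the social graph)."""
    return {u: zlib.crc32(u.encode()) % num_servers for u in users}

def remote_reads(edges, primary, replicas=None):
    """Count directed reads (reader -> owner) where the owner's data is not
    available on the reader's primary server, either as primary or replica."""
    replicas = replicas or {}
    count = 0
    for u, v in edges:
        for reader, owner in ((u, v), (v, u)):
            servers_with_owner = {primary[owner]} | replicas.get(owner, set())
            if primary[reader] not in servers_with_owner:
                count += 1
    return count

# Hypothetical toy graph: two tightly knit communities joined by one bridge edge.
community_a = ["a1", "a2", "a3", "a4"]
community_b = ["b1", "b2", "b3", "b4"]
edges = (list(combinations(community_a, 2))
         + list(combinations(community_b, 2))
         + [("a1", "b1")])
users = community_a + community_b

# Placement 1: hash-based, oblivious to who interacts with whom.
hashed = hash_placement(users, num_servers=2)

# Placement 2: graph partitioning, one community per server (partition chosen by hand here).
partitioned = {u: 0 for u in community_a} | {u: 1 for u in community_b}

# Placement 3: partitioning plus replication of the two bridge users on the other server.
bridge_replicas = {"a1": {1}, "b1": {0}}

hash_cost = remote_reads(edges, hashed)
partition_cost = remote_reads(edges, partitioned)
replicated_cost = remote_reads(edges, partitioned, bridge_replicas)
print(hash_cost, partition_cost, replicated_cost)
```

In this sketch, partitioning leaves only the bridge edge crossing servers, and replicating the two bridge users removes those remaining remote reads. The hash-based cost depends on how the IDs happen to hash. Replication is not free in practice, since replicas must be kept consistent on writes, which this sketch does not model.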