Performance optimization based on node attributes is highly significant for the replica placement algorithm used in the Hadoop Distributed File System (HDFS). Currently, most studies of replica placement algorithms consider only a single attribute or a small number of attributes. However, a single attribute cannot accurately express the performance of a node. Therefore, this paper proposes a replica placement algorithm based on the entropy weight TOPSIS (technique for order preference by similarity to ideal solution) method, called TS-REPLICA. First, a multi-attribute matrix that comprehensively reflects the performance and load of nodes is defined. Then, a TOPSIS-based algorithm is proposed to calculate the performance score of each data node. In addition, the entropy weight method is introduced to derive the attribute weights, so that the influence of the multiple attributes is balanced. Next, the comprehensive load score of each data node in the Spark cluster, the average comprehensive load score of each rack, and the average comprehensive load score of the entire cluster are calculated, and replica placement is performed based on the obtained scores. Finally, the effectiveness of the proposed algorithm is verified on various datasets and test cases. The experimental results show that the TS-REPLICA algorithm outperforms the better-performing comparison algorithm in execution number on the Spark cluster.
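The scoring step can be illustrated with a minimal sketch of the textbook entropy weight and TOPSIS procedures. It is not the TS-REPLICA implementation: the attribute names (free CPU, free memory, disk utilization), the sample values, and the function names `entropy_weights` and `topsis_scores` are assumptions made for illustration, and the sketch assumes non-negative attribute values with a positive sum in every column.

```python
import math

def entropy_weights(matrix):
    """Entropy weight method: attributes that vary more across nodes get larger weights."""
    m, n = len(matrix), len(matrix[0])
    diversities = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)  # assumed positive
        shares = [x / total for x in column]
        # Normalized Shannon entropy of attribute j; 0 * ln(0) is treated as 0.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(m)
        diversities.append(1.0 - entropy)
    d_sum = sum(diversities)
    if d_sum < 1e-12:
        # All attributes are (almost) uniform across nodes: fall back to equal weights.
        return [1.0 / n] * n
    return [d / d_sum for d in diversities]

def topsis_scores(matrix, weights, benefit):
    """Closeness of each row to the ideal solution, in [0, 1]; higher is better."""
    n = len(weights)
    # Vector-normalize each column, then apply the attribute weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    # Ideal and anti-ideal points: max is best for benefit attributes, min for cost ones.
    best = [(max if benefit[j] else min)(row[j] for row in weighted) for j in range(n)]
    worst = [(min if benefit[j] else max)(row[j] for row in weighted) for j in range(n)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        denom = d_best + d_worst
        scores.append(d_worst / denom if denom > 0 else 0.5)
    return scores

# Hypothetical data nodes: [free CPU share, free memory share, disk utilization].
nodes = [
    [0.8, 0.7, 0.2],
    [0.5, 0.6, 0.5],
    [0.2, 0.3, 0.9],
]
benefit = [True, True, False]  # disk utilization is treated as a cost attribute

weights = entropy_weights(nodes)
scores = topsis_scores(nodes, weights, benefit)
for index, score in enumerate(scores):
    print(f"node {index}: score {score:.3f}")
```

In this sample the first node is best on every attribute and the third is worst on every attribute, so they receive the extreme scores and the second node falls in between. How such per-node scores are averaged per rack and per cluster, and how those averages drive the placement decision, is specific to the paper and is not reproduced here.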