The era of big data continually produces huge numbers of small files, which cause problems such as uneven replica distribution and massive random reads and writes, resulting in delayed data access and degraded performance. To resolve the problems that object storage systems face in organizing small file objects, namely layout imbalance, uneven load, and poor adaptability of data migration, this article introduces a new object organization strategy, the object multi-tiered balanced organization strategy (OMBOS), which improves performance on massive small files in a real object storage system. Unlike conventional strategies, OMBOS achieves multi-level balance by establishing weight indicators that reflect the comprehensive performance of nodes. This balance covers the distribution of objects between replica sets, the distribution of I/O requests within each replica set, and adaptation to dynamic changes in system scale. OMBOS establishes a comprehensive performance evaluation model based on the fuzzy analytic hierarchy process and calculates the performance weights of the object storage nodes and replica sets. Then, a consistent hash-based algorithm for object placement between replica sets (BROP) and an algorithm for I/O load balancing within a replica set (IRIB) use these performance weights to organize objects in the object storage system in tiers. When the system scale changes, data is migrated only to nodes being added or from nodes being removed, and no data is migrated between existing nodes. Hence, the amount of data migration reaches the theoretical optimum.
Experiments show that OMBOS achieves a balanced layout of objects based on performance weights, which meets the balancing and adaptability requirements of data organization. Compared with the original layout strategy, the balanced object layout nearly doubles random access performance.
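
As a minimal illustration of the placement idea, and assuming the performance weights are already known, a weighted consistent-hash ring can be sketched as follows. This is not the BROP algorithm itself, whose details are not given in the abstract. The class `WeightedRing`, the parameter `vnodes_per_weight`, the replica-set names, and the weights are all hypothetical, and giving each replica set a number of virtual nodes proportional to its weight is one common technique rather than necessarily the authors' method.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring (MD5 is used only for spreading)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class WeightedRing:
    """Consistent-hash ring where each replica set owns virtual nodes in proportion to its weight."""

    def __init__(self, vnodes_per_weight: int = 100):
        # Hypothetical tuning knob: virtual nodes granted per unit of performance weight.
        self.vnodes_per_weight = vnodes_per_weight
        self._points = []  # sorted list of (ring position, replica-set name)

    def add(self, name: str, weight: float) -> None:
        """Insert a replica set; a higher weight yields more virtual nodes, hence more objects."""
        count = max(1, round(weight * self.vnodes_per_weight))
        for i in range(count):
            bisect.insort(self._points, (_hash(f"{name}#{i}"), name))

    def remove(self, name: str) -> None:
        """Drop every virtual node that belongs to the given replica set."""
        self._points = [p for p in self._points if p[1] != name]

    def locate(self, object_id: str) -> str:
        """Return the replica set owning the first virtual node at or after the object's position."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._points, (_hash(object_id), ""))
        return self._points[idx % len(self._points)][1]


if __name__ == "__main__":
    ring = WeightedRing()
    ring.add("rs-a", 1.0)  # assumed weights, for illustration only
    ring.add("rs-b", 2.0)
    keys = [f"obj-{i}" for i in range(2000)]
    before = {k: ring.locate(k) for k in keys}

    ring.add("rs-c", 1.0)  # scale out by one replica set
    after = {k: ring.locate(k) for k in keys}

    moved = [k for k in keys if before[k] != after[k]]
    # Every object that moved now lives on the new replica set.
    print(len(moved), all(after[k] == "rs-c" for k in moved))
```

In this sketch, adding a replica set only inserts new points on the ring, so an object either keeps its owner or moves to the newly added set, which mirrors the property stated above that no data is migrated between existing nodes.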