To address high redundancy in sentence features and poor semantic similarity analysis in English translation, a boundary definition model for the clauses of complex sentences in English translation is proposed, based on a Huffman tree and an objective function. The model analyzes the boundary structure of complex sentences and clauses in English translation, determines the mathematical expectation of the probability distribution of boundary features, introduces a maximum entropy model to assess the importance of different features, and uses conditional random fields to mine hidden features. A hierarchical clustering algorithm then analyzes the distances between the boundary features of complex sentences and clauses, and similar features are merged based on the minimum distance between data points. Feature redundancy scores are obtained through an attention mechanism, and the weights of the boundary features are calculated. The edit distance from semantic similarity analysis is used to determine the boundary distance between clauses, and cosine similarity is used to calculate the similarity between the boundary features of complex clauses; a Huffman tree and an objective function are then introduced to construct the boundary definition model. The boundary feature values of complex sentences and clauses in English translation are input to the model to complete the final definition. Experimental results show that the proposed method performs well in defining the boundaries of complex sentences and clauses in English translation: its semantic similarity analysis results remain above 95% and reach up to 99%, significantly better than those of the comparison methods. 
Meanwhile, the recall curve obtained by the proposed method is closest to the ideal curve and fluctuates only within a narrow range, remaining between 90% and 98%, which further verifies its accuracy and robustness in boundary delineation. In addition, at a sample size of 1000, the confidence level of the proposed method reaches 99.6%, compared with 95.6% and 95.1% for the two comparison methods. When the sample size increases to 2500, the confidence level of the proposed method remains high at 99.4%, while those of the comparison methods fall to 94.2% and 94.1%, respectively. These results demonstrate the effectiveness of the proposed method in reducing redundant interference and improving confidence. In summary, the proposed method improves the performance of boundary definition for complex sentences and clauses in English translation.
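
As a rough illustration of three building blocks named in the abstract (edit distance, cosine similarity, and a Huffman tree), the following Python sketch gives textbook versions of each. It is not the authors' implementation: the function names, the choice of strings or token lists as input, and the use of raw frequencies as Huffman weights are all assumptions made here for illustration, and the abstract does not say how these pieces are combined in the objective function.

```python
import heapq
import math
from itertools import count


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x -> y
        prev = curr
    return prev[-1]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(p * q for p, q in zip(u, v))
    norm = math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v))
    return dot / norm if norm else 0.0


def huffman_codes(freqs):
    """Build a Huffman tree from {symbol: weight}; return {symbol: bitstring}.

    Symbols are assumed to be strings (tuples are reserved for inner nodes).
    """
    tie = count()  # tie-breaker so the heap never compares nodes directly
    heap = [(w, next(tie), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):   # inner node: recurse into both children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                         # leaf: record the accumulated code
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes
```

For example, `edit_distance("kitten", "sitting")` counts the single-character edits separating the two strings, and `huffman_codes` assigns shorter codes to higher-weight symbols, which is the property that makes a Huffman tree attractive for organizing frequent features near the root.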