Abstract

The XML document model is extended to capture both path and frequency information, yielding the frequency-path model. Based on this model, a structural similarity calculation algorithm that applies position and frequency weights to the longest common subsequence (PFWLCS) is proposed; it is fast and has high precision. Furthermore, the selection of the position and frequency factors is discussed in depth. Experiments show that PFWLCS achieves a higher recall ratio and accuracy than existing similarity calculation methods, especially on XML documents with different structures.
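
The abstract does not give the PFWLCS formulas, so the following Python is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It represents a document as root-to-leaf tag paths with occurrence counts, compares paths with a longest common subsequence in which matches nearer the root count more, and scales each comparison by how close the two path frequencies are. The function names, the position factor `alpha`, the frequency factor `beta`, the specific weighting formulas, and the symmetric averaging are all hypothetical choices made for illustration.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def frequency_paths(xml_text):
    """Map each root-to-leaf tag path (a tuple of tags) to its occurrence count."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(node, prefix):
        path = prefix + (node.tag,)
        children = list(node)
        if not children:
            counts[path] += 1
        for child in children:
            walk(child, path)

    walk(root, ())
    return counts


def weighted_lcs(p, q, alpha):
    """LCS score where a match at 0-based depths i, j contributes alpha ** max(i, j).

    Assumed weighting: with 0 < alpha <= 1, matches near the root count more.
    """
    m, n = len(p), len(q)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = max(dp[i - 1][j], dp[i][j - 1])
            if p[i - 1] == q[j - 1]:
                best = max(best, dp[i - 1][j - 1] + alpha ** max(i - 1, j - 1))
            dp[i][j] = best
    return dp[m][n]


def path_similarity(p, q, alpha):
    """Weighted LCS divided by the score the longer path gets against itself."""
    full = sum(alpha ** k for k in range(max(len(p), len(q))))
    return weighted_lcs(p, q, alpha) / full


def one_way(a, b, alpha, beta):
    """Frequency-weighted average of each path's best match in the other document."""
    total = sum(a.values())
    score = 0.0
    for p, fp in a.items():
        best = 0.0
        for q, fq in b.items():
            # Assumed frequency weight: ratio of the two counts, damped by beta.
            freq = (min(fp, fq) / max(fp, fq)) ** beta
            best = max(best, path_similarity(p, q, alpha) * freq)
        score += fp * best
    return score / total


def structural_similarity(xml_a, xml_b, alpha=0.5, beta=0.5):
    """Symmetric similarity in [0, 1] between two XML documents (sketch only)."""
    a, b = frequency_paths(xml_a), frequency_paths(xml_b)
    return 0.5 * (one_way(a, b, alpha, beta) + one_way(b, a, alpha, beta))
```

In this sketch, a smaller `alpha` makes deep elements matter less, and a smaller `beta` makes differences in path frequency matter less; setting `beta` to 0 ignores frequency entirely. How the paper actually selects its position and frequency factors is not stated in the abstract.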

