Abstract

In information retrieval, words are often stemmed to a common root. The Porter Stemming Algorithm is a rule-based suffix-removal process. Stemming can be viewed as a way to cluster related words together under one common stem. The Porter stemmer sometimes places unrelated words in the same cluster. This experiment attempts to correct such clusters using Formal Concept Analysis (FCA). FCA is the process of deriving formal concepts from a given formal context. A formal context consists of objects, attributes, and a binary relation that indicates which attributes each object possesses. A formal concept is formed by computing the closure of subsets of objects and attributes. Using the Cranfield document collection, this experiment derived a comparison measure between the words in each stemmed cluster from the Google Web 1T 5-gram data set. The FCA-based correction of the clusters showed varying levels of success, depending on the error threshold allowed.
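
The closure computation mentioned above can be illustrated with a minimal sketch. The toy context below is hypothetical: the words and attribute labels are invented for illustration and are not taken from the Cranfield collection or the Google Web 1T data, and the brute-force enumeration is only one simple way to list concepts, not necessarily the procedure used in the experiment.

```python
from itertools import combinations

# Hypothetical formal context: objects are words that a stemmer might place
# in one cluster; attributes are invented labels, not the paper's data.
CONTEXT = {
    "general": {"adj", "military"},
    "generals": {"military"},
    "generate": {"verb", "power"},
    "generation": {"power"},
}


def common_attributes(objects, context):
    """Return the attributes shared by every object in `objects`."""
    shared = set().union(*context.values())  # start from all attributes
    for obj in objects:
        shared &= context[obj]
    return shared


def common_objects(attributes, context):
    """Return the objects that possess every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if set(attributes) <= attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset.

    Brute force: exponential in the number of objects, so only suitable
    for small contexts such as a single stem cluster.
    """
    concepts = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)  # closure of `subset`
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(CONTEXT), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

In this toy context, the concepts with a non-empty intent split the cluster into a "military" group and a "power" group, which is the kind of separation a cluster correction step would look for.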
