Abstract

The problem of reducing the dimension of the feature space used to describe thematic documents is considered. Documents are described in the form of an “object-property” table, for which thematic dictionaries of no more than 100 keywords per subject area were developed. The correctness of the dictionaries is proved within the framework of a pattern recognition problem with disjoint classes. The research tool is an analysis of the topological properties of the feature space by means of the values of compactness measures. These values give a quantitative estimate of the structure of relations between objects, both for each class and for the sample as a whole. The structure of these relations is investigated by dividing the objects of each class into disjoint groups; within a group, any two objects can always be joined by a path based on the binary relation of connectedness. The space for describing documents is chosen by solving a constrained (conditional) optimization problem using the Lagrange method. The condition for forming an ordered sequence of features is determined, and the use of such a sequence is considered as a way to reduce the combinatorial complexity of the selection algorithms. When uninformative features are removed from the description of documents, the compactness measure of the sample reaches its maximum. A visual representation is given of the complexity of the group configurations and of the connectivity of the objects within them.
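
As an illustration of the “object-property” table, the following minimal Python sketch builds one row per document and one column per dictionary keyword. The abstract does not say how cell values are computed, so the use of raw keyword counts is an assumption, and the function name `build_object_property_table`, the keywords, and the sample documents are all hypothetical.

```python
import re
from collections import Counter


def build_object_property_table(documents, keywords):
    """Return one row per document; each row holds the count of every keyword.

    Assumption: cell values are raw occurrence counts. The source does not
    specify the encoding (counts, binary flags, or weighted frequencies).
    """
    table = []
    for text in documents:
        # Lowercase alphabetic tokens; a real pipeline would likely add stemming.
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        table.append([counts[k] for k in keywords])
    return table


# Hypothetical dictionary and documents, for illustration only.
keywords = ["matrix", "vector", "protein"]
documents = [
    "A matrix times a vector gives a vector.",
    "The protein folds; the protein binds.",
]
table = build_object_property_table(documents, keywords)
```

Each column of `table` is one feature, so removing an uninformative feature corresponds to dropping a column before the compactness of the sample is evaluated.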

