Abstract

The Fuzzy Inference System (FIS) is frequently used in a variety of Text Mining applications. In text processing domains, where the volume of processed data is enormous, entering FIS rules manually remains a real issue. Therefore, automated and optimal selection of inference rules (IR) strengthens the FIS process. In this work, we propose applying FP-Growth, an association model algorithm, as an automatic way to identify IR for fuzzy text vectorization. Once the fuzzy vectors are generated, we apply variable selection algorithms, e.g., Info Gain and Relief, to reduce the dimensionality of the resulting descriptor. To test the performance of the new descriptor, we propose multi-class text classification systems built on several machine learning algorithms. On benchmark databases, the new technique for producing fuzzy descriptors achieves a significant gain in time, rule precision, and weighting quality. Moreover, across the compared classification systems, accuracy improves by 10% compared with other approaches.

Highlights

  • One use of the Fuzzy Inference System (FIS) in the Text Mining field is the feature weighting technique FTF-IDF [4], where fuzzy reasoning is used to extract the term frequency-inverse document frequency (TF-IDF) scores [5]

  • To compare different recognition systems based on the new fuzzy FTF-IDF representation and the classifiers, several experiments were conducted for all algorithms with different configurations on a Dell-compatible machine with an Intel(R) Core i5 CPU at 2.50 GHz and 4 GB of RAM

  • The accuracy of 98% reported in Table 4 and the results presented in [6] show that automating the rules that produce the FTF-IDF weight has an excellent impact on text representation and supervised decision-making
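For orientation, the classic (crisp) TF-IDF score that FTF-IDF builds on can be sketched as follows. This is a minimal illustration of one common TF-IDF variant (relative term frequency times the natural log of N/df), not the fuzzy FTF-IDF weighting of the paper; the function name `tf_idf` and the toy documents are assumptions made for this example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Crisp TF-IDF for a list of tokenized documents.

    Returns one {term: weight} dict per document, with
    weight = (count / document length) * ln(N / document frequency).
    This is one common variant; it is NOT the paper's fuzzy FTF-IDF.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus, only for illustration.
toy_docs = [
    ["goal", "match", "goal"],
    ["market", "match"],
    ["market", "shares"],
]
toy_weights = tf_idf(toy_docs)
```

In the first toy document, "goal" is both frequent in the document and rare in the corpus, so it receives a higher weight than "match", which also occurs in a second document.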

Summary

Introduction

The automation of expert systems is a challenge in several areas [1] [2]. The main aim is to present, monitor, and provide relevant information using fast and intelligent technologies, in the context of artificial intelligence applied to textual data. Unlike Apriori, which produces candidate itemsets and tests them to keep only the frequent ones, FP-Growth constructs frequent itemsets without generating candidates [9]. In this contribution, FP-Growth produces more explicit rules, which minimizes the need for post-processing, a complicated step. The experiments show that the new technique permits an optimal, rapid, and interoperable selection of inference rules to produce relevant fuzzy descriptors. As the second part of our contribution, we propose to generate fuzzy descriptors with this approach for several textual corpora for automatic multi-class text classification. This stage tests the performance of the resulting descriptor by comparing several unstructured data categorization systems. To show the performance of the compared multi-class text classification systems, we present the experiments and results in the fourth section, before the conclusion.
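The candidate-free mining mentioned above can be illustrated with a compact FP-Growth sketch. It is a minimal, textbook-style version written for this summary, not the implementation used in the paper; the names `FPNode`, `build_tree`, `fp_growth`, and `frequent_itemsets`, as well as the toy transactions, are assumptions made for the example. Supports are absolute counts.

```python
from collections import defaultdict

class FPNode:
    """One node of the FP-tree: an item, its count, and tree links."""
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (header, frequent): header maps each frequent item to its
    tree nodes, frequent maps each frequent item to its support.
    """
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = FPNode(None, None)
    header = defaultdict(list)
    for items, count in weighted_transactions:
        # Keep frequent items only, in one fixed order (support, then name).
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent

def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Yield (itemset, support) pairs without generating candidates."""
    header, frequent = build_tree(weighted_transactions, min_support)
    for item, item_support in frequent.items():
        itemset = suffix | {item}
        yield itemset, item_support
        # Conditional pattern base: the prefix path above each node of item.
        conditional = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_support, itemset)

def frequent_itemsets(transactions, min_support):
    """Map each frequent itemset (frozenset) to its absolute support."""
    weighted = [(list(t), 1) for t in transactions]
    return dict(fp_growth(weighted, min_support))

# Hypothetical toy transactions, only for illustration.
toy_transactions = [
    ["fuzzy", "rule", "text"],
    ["fuzzy", "rule"],
    ["fuzzy", "text"],
    ["rule", "text"],
    ["fuzzy", "rule", "text"],
]
toy_itemsets = frequent_itemsets(toy_transactions, min_support=3)
```

With a minimum support of 3, the three single items and the three pairs are frequent, while the full triple occurs in only two transactions and is dropped. Each recursive call works on a smaller conditional tree rather than on enumerated candidate itemsets, which is the contrast with Apriori drawn in the text.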

Related Work
Main observations and motivations
The adopted Fuzzy TF-IDF approach
Machine Learning Models
Association Models
FP-Growth
Select attributes Methods
Classification Tools
Experiments & Results
Datasets
Pre-processing
The Classification parameters
Performance Measures
BBC Sport Dataset
BBC News Dataset
Conclusion