Abstract

Over the last 70 years, automatic text summarization has become increasingly important because the amount of data on the Internet is growing so quickly. Automatic summarization can extract the information and knowledge users need in a form that humans can handle easily and use for many purposes. News text in particular is the type of text people encounter most often in daily life. In this study, we introduce a new automatic summarization model for news text based on fuzzy logic rules, multiple features, and a genetic algorithm (GA). The first and most important group of features is word features: we score each word and extract the words whose scores exceed a preset threshold as keywords. Because news text is a special kind of text that contains specific elements such as time, place, and characters, these news elements can sometimes be extracted directly as keywords. The second group is sentence features: a linear combination of these features indicates the importance of each sentence, with each feature weighted by the genetic algorithm. Finally, we use a fuzzy logic system to calculate the final score and generate the summary. The proposed method was compared with other methods, including Msword, System19, System21, System31, SDS-NNGA, GCD, SOM, and Ranking SVM, using the ROUGE assessment method on the DUC2002 dataset; the results show that the proposed method outperforms these methods.
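As a rough illustration of the sentence-scoring step, the sketch below computes each sentence's importance as a weighted linear combination of its feature values and ranks sentences by that score. The feature names (`position`, `keyword_overlap`, `length`) and the weight values are hypothetical placeholders: in the described model the weights would be learned by the genetic algorithm, and the fuzzy logic stage that produces the final score is not shown here.

```python
from typing import Dict, List


def sentence_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted linear combination of one sentence's feature values."""
    return sum(weights[name] * value for name, value in features.items())


def rank_sentences(feature_rows: List[Dict[str, float]],
                   weights: Dict[str, float]) -> List[int]:
    """Return sentence indices ordered from highest to lowest score."""
    scores = [sentence_score(row, weights) for row in feature_rows]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical weights; a genetic algorithm would normally tune these.
weights = {"position": 0.5, "keyword_overlap": 0.3, "length": 0.2}

# Hypothetical normalized feature values for three sentences.
feature_rows = [
    {"position": 1.0, "keyword_overlap": 0.5, "length": 0.5},
    {"position": 0.5, "keyword_overlap": 0.0, "length": 1.0},
    {"position": 0.0, "keyword_overlap": 1.0, "length": 1.0},
]

ranking = rank_sentences(feature_rows, weights)
print(ranking)
```

Keeping the scoring function separate from the weights means the same code can be reused as the fitness evaluation inside a weight-search loop.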

Highlights

  • In the era of big data, a large amount of data is produced on the Internet every day, and what people notice most is the explosive growth of social media data [1], such as daily news from the Web, WeChat, Weibo, and various types of industry data

  • In this paper, we propose a new model for news text summarization based on multiple features, a genetic algorithm, and fuzzy logic

  • Because capitalized words and words appearing in news headlines can reflect the central content of an article, these two features are added to the word features used for extraction, and each word is scored according to the different features. In addition, following the conventions of news writing, the three news elements are extracted directly as news keywords
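The word-scoring idea in the highlights can be sketched as follows. This is only a minimal illustration under stated assumptions: the scoring rule (term frequency plus a bonus of 1.0 for capitalization and 1.0 for appearing in the headline), the threshold value, and the function name `extract_keywords` are all hypothetical, and the direct extraction of news elements such as time, place, and characters is not covered.

```python
import re
from collections import Counter
from typing import List


def extract_keywords(headline: str, body: str, threshold: float = 2.0) -> List[str]:
    """Score each word and return those whose score exceeds the threshold."""
    tokens = re.findall(r"[A-Za-z]+", body)
    headline_words = {w.lower() for w in re.findall(r"[A-Za-z]+", headline)}

    # Base score: how often the word occurs in the body (case-insensitive).
    counts = Counter(t.lower() for t in tokens)
    # Simplification: sentence-initial words also count as capitalized.
    capitalized = {t.lower() for t in tokens if t[0].isupper()}

    scores = {}
    for word, count in counts.items():
        score = float(count)
        if word in capitalized:
            score += 1.0  # hypothetical bonus for a capitalized occurrence
        if word in headline_words:
            score += 1.0  # hypothetical bonus for appearing in the headline
        scores[word] = score

    return sorted(w for w, s in scores.items() if s > threshold)


headline = "Flood hits Paris"
body = "Heavy rain caused a flood in Paris. The flood closed roads."
keywords = extract_keywords(headline, body)
print(keywords)
```

In this toy input, only words that combine at least two signals (repetition, capitalization, or headline presence) strongly enough pass the threshold.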


Summary

Introduction

In the era of big data, a large amount of data is produced on the Internet every day, and what people notice most is the explosive growth of social media data [1], such as daily news from the Web, WeChat, Weibo, and various types of industry data. Web news has become one of the best media for people to get the latest information and follow ever-changing current events. Faced with this massive volume of news, people do not have enough time to find the information they need by reading all the news online, especially enterprises and individuals with a great demand for information. Automatic text summarization can, on the one hand, address the problem of information overload on the Internet and, on the other hand, simplify the information users obtain [5].

