Abstract

With abundant user data and computing power, using machine learning to automate or simplify our tasks has become a recent trend. These resources make it possible to surface correlations in data and draw conclusions from them. The field of AI/ML has progressed rapidly over the past decade, with areas like computer vision and natural language processing at the forefront. Text summarization has been explored extensively, yet its practical applications remain largely limited to tasks such as news or book summarization, which rely on extractive summarization. One of the main disadvantages of abstractive summarization is that it focuses heavily on generating good results for a particular sentence and too little on a corpus of text containing thousands of such sentences. There has been some work on multi-document summarization, but it does not account for text that may be changed or appended with new content, so those methods become obsolete in such settings. The proposed method, by contrast, can be applied to a large corpus containing thousands of entities/sentences as well as to a changing text corpus. New data can be processed separately as it is added, without rerunning the model on the whole corpus; this gives us the power of batch processing, which can be leveraged according to our space and time constraints. To the best of our knowledge, there has been no prior research on this setting. A transformer model is used to generate individual sentence summaries of the respective review text, and a combination of the Universal Sentence Encoder, statistical methods, and a graph reduction algorithm then selects the most relevant sentences to best represent the whole text. The same mechanism is applied to incorporate new data added to the corpus, and accuracy is not affected much. Our results show that even as the degree of contraction of the text corpus increases (particularly for a large text corpus), the same accuracy can be achieved.
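The batch-wise pipeline outlined above can be illustrated with a minimal sketch. This is an assumption-laden illustration rather than the authors' implementation: the summarization model (t5-small), the top-k selection, and the degree-centrality heuristic used here as a stand-in for the unspecified graph reduction algorithm are all hypothetical choices.

```python
# Illustrative sketch only: per-sentence abstractive summaries, Universal
# Sentence Encoder embeddings, and a simple centrality-based selection that
# stands in for the graph reduction step. Model names and the ranking
# heuristic are assumptions, not the authors' exact configuration.
import numpy as np
import tensorflow_hub as hub
from transformers import pipeline

summarizer = pipeline("summarization", model="t5-small")  # assumed summarizer
use_embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

def summarize_batch(reviews, top_k=5):
    """Summarize each review, embed the summaries, and keep the top_k most
    central candidates as the representative sentences for this batch."""
    candidates = [summarizer(r, max_length=40, min_length=5)[0]["summary_text"]
                  for r in reviews]
    vectors = np.asarray(use_embed(candidates))  # (n, 512) USE vectors, ~unit norm
    similarity = vectors @ vectors.T             # pairwise cosine-like similarity
    centrality = similarity.mean(axis=1)         # degree-style centrality per sentence
    keep = np.argsort(centrality)[::-1][:top_k]
    return [candidates[i] for i in keep]

# Each new batch of reviews can be passed through summarize_batch on its own
# and merged with previously selected sentences, so the whole corpus never
# has to be reprocessed when new text is appended.
```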
