TOPIC SEGMENTATION METHODS COMPARISON ON COMPUTER SCIENCE TEXTS

Volodymyr Sokol, Vitalii Krykun, Mariia Bilova, Ivan Perepelytsya, Volodymyr Pustovarov

doi:10.20998/2079-0023.2021.02.10

Abstract

The demand for information systems that simplify and accelerate work has grown greatly amid the rapid informatization of society and all its branches. This demand drives the emergence of more and more companies that develop software products and information systems in general. Such companies use knowledge management systems to systematize, process, and use the knowledge they accumulate. One of the main tasks of IT companies is the continuous training of personnel, which requires exporting content from the company's knowledge management system to a learning management system. The main goal of this research is to choose an algorithm for marking up the text of articles similar to those used in the knowledge management systems of IT companies. To achieve this goal, various topic segmentation methods must be compared on a dataset of computer science texts. Inspec is one such dataset; it is normally used for keyword extraction, and in this research it has been adapted to the structure of datasets used for the topic segmentation problem. The TextTiling and TextSeg methods were compared on some well-known data science metrics and on metrics specific to the topic segmentation problem. A new generalized metric was also introduced to compare results for the topic segmentation problem. All software implementations of the algorithms were written in the Python programming language and represent a set of interrelated functions. The results show that, on the classical data science metrics, the metrics developed specifically for topic segmentation, and the newly introduced metric, the TextSeg algorithm performs better than the TextTiling algorithm on the adapted Inspec test dataset.
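The abstract does not name the segmentation-specific metrics. As an illustration only, the sketch below computes Pk, a window-based error measure commonly used to evaluate topic segmentation. That Pk is among the metrics used in this study is an assumption, and the function names `segment_ids` and `pk`, as well as the segment-lengths input format, are hypothetical.

```python
from typing import List, Optional


def segment_ids(lengths: List[int]) -> List[int]:
    """Expand segment lengths, e.g. [2, 3], into per-sentence ids [0, 0, 1, 1, 1]."""
    ids: List[int] = []
    for seg, n in enumerate(lengths):
        ids.extend([seg] * n)
    return ids


def pk(reference: List[int], hypothesis: List[int], k: Optional[int] = None) -> float:
    """Pk error between two segmentations given as lists of segment lengths.

    A window of width k slides over the sentences. A window counts as an
    error when the reference and the hypothesis disagree on whether its two
    ends lie in the same segment.
    """
    ref, hyp = segment_ids(reference), segment_ids(hypothesis)
    if len(ref) != len(hyp):
        raise ValueError("both segmentations must cover the same number of sentences")
    if k is None:
        # Conventional choice: half of the mean reference segment length.
        k = max(1, round(len(ref) / len(reference) / 2))
    windows = len(ref) - k
    if windows <= 0:
        raise ValueError("text is too short for the chosen window size")
    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k])
        for i in range(windows)
    )
    return errors / windows


if __name__ == "__main__":
    # Reference: two topics of four sentences each; hypothesis: no boundary at all.
    print(pk([4, 4], [8]))
    print(pk([4, 4], [4, 4]))
```

Lower values are better: a score of 0.0 means that every window agrees with the reference segmentation.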

Highlights

In the context of the rapid informatization of society and all its branches, in both daily and professional activities, the demand for information systems that simplify and accelerate work has greatly increased.
The purpose of this work is to compare the effectiveness of some well-known topic segmentation methods on a dataset of computer science texts, which will help in the future to implement an appropriate component of a knowledge management system that prepares content for export to a learning management system.
TextSeg is an example of a generative topic segmentation method: it assumes that the text is generated from a certain sequence of topics, each of which has its own language model, i.e. its own probabilities of word occurrence.
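The generative assumption can be illustrated with a minimal sketch. It is not the TextSeg implementation. It assumes that each segment has its own Laplace-smoothed unigram language model estimated from the segment itself, and that the best segmentation minimizes the total negative log-likelihood plus a fixed cost per segment. The names `segment_cost` and `segment` and the `penalty` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import List

Sentence = List[str]


def segment_cost(sentences: List[Sentence], vocab_size: int) -> float:
    """Negative log-likelihood of a candidate segment under its own
    Laplace-smoothed unigram model (one "topic" per segment)."""
    counts = Counter(word for sent in sentences for word in sent)
    total = sum(counts.values())
    return -sum(
        c * math.log((c + 1) / (total + vocab_size)) for c in counts.values()
    )


def segment(sentences: List[Sentence], penalty: float = 2.0) -> List[int]:
    """Return segment lengths (in sentences) that minimize the total cost.

    `penalty` is an assumed per-segment cost that discourages
    over-segmentation. The search is a simple dynamic program over
    all possible boundary positions.
    """
    n = len(sentences)
    vocab_size = len({word for sent in sentences for word in sent})
    best = [0.0] + [math.inf] * n  # best[j]: minimal cost of the first j sentences
    back = [0] * (n + 1)           # back[j]: start index of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + segment_cost(sentences[i:j], vocab_size) + penalty
            if cost < best[j]:
                best[j], back[j] = cost, i
    lengths: List[int] = []
    j = n
    while j > 0:  # walk the back-pointers from the end of the text
        lengths.append(j - back[j])
        j = back[j]
    return lengths[::-1]


if __name__ == "__main__":
    text = [["graph", "node", "edge"]] * 3 + [["query", "index", "table"]] * 3
    print(segment(text))
```

On this toy input, where the vocabulary changes completely after the third sentence, the sketch places the single boundary at that change. A real system would tokenize and normalize the text first and would choose the per-segment cost more carefully.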

Summary

Introduction

In the context of the rapid informatization of society and all its branches, in both daily and professional activities, the demand for information systems that simplify and accelerate work has greatly increased. This need drives the emergence of more and more companies that develop software products and information systems in general. One of the main tasks of IT companies is the continuous training of personnel to improve their qualifications and ensure greater work efficiency. For this purpose, content from the company's knowledge management system must be exported to the learning management system. The knowledge that accumulates in IT companies belongs to the field of computer science.

Journal: Bulletin of National Technical University "KhPI". Series: System Analysis, Control and Information Technologies
Publication Date: Dec 28, 2021
License type: cc-by
