Abstract

Distances are computed in a multi-dimensional space. The axes of this space in principle relate to the features inherent in the input data. Usually, such features are chosen by neural network developers, thereby introducing a possible bias. A method of automatically generating feature sets is discussed, with specific reference to the categorisation of streams of free-text news items. The feature sets were generated by a procedure that automatically selects a group of keywords based on a lexico-semantic analysis. Three different types of text stream (headlines only, news summaries, and full news items including the body text) have been categorised using Self-Organising Feature Maps (SOFM). A method for assessing the discrimination ability of a SOFM, based on Fisher's Linear Discriminant Rule, suggests that maps trained on vectors derived from summaries alone cluster the news items fairly accurately when compared with maps trained on full-text vectors. The use of summaries as document surrogates for document categorisation is therefore suggested.
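The abstract does not give the map size, the training schedule, the keyword-selection procedure, or exactly how Fisher's rule was applied to a trained map. The following is therefore only a minimal sketch of the general pipeline under stated assumptions. A hypothetical fixed keyword list (`KEYWORDS`) stands in for the automatically selected keywords. Documents become keyword-count vectors. A small rectangular Kohonen map is trained using Euclidean distance and a Gaussian neighbourhood. Separation between two categories is then scored with the maximised two-class Fisher criterion $J^* = \Delta m^\top S_W^{-1} \Delta m$, computed on the grid coordinates of each document's best-matching unit. All function names (`to_vector`, `train_sofm`, `best_matching_unit`, `fisher_separation`) and the toy texts are illustrative, not taken from the paper.

```python
import math
import random

# Hypothetical keyword set; the paper selects keywords automatically via a
# lexico-semantic analysis, which is NOT reproduced here.
KEYWORDS = ["market", "shares", "profit", "match", "goal", "team"]


def to_vector(text):
    """Count occurrences of each keyword in a lower-cased, whitespace-split text."""
    tokens = text.lower().split()
    return [float(tokens.count(k)) for k in KEYWORDS]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(weights, x):
    """Grid node (row, col) whose weight vector is closest to x."""
    return min(weights, key=lambda node: euclidean(weights[node], x))


def train_sofm(data, rows=4, cols=4, epochs=200, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood radius
        x = data[rng.randrange(len(data))]
        bmu = best_matching_unit(weights, x)
        for node, w in weights.items():
            d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian neighbourhood
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])    # pull node towards x
    return weights


def fisher_separation(points_a, points_b, ridge=1e-3):
    """Maximised two-class Fisher criterion for 2-D points.

    J* = dm^T S_W^{-1} dm, where dm is the difference of class means and S_W
    the within-class scatter matrix. This equals (w . dm)^2 / (w^T S_W w) at
    the Fisher direction w = S_W^{-1} dm. The ridge keeps S_W invertible when
    every document of a class lands on the same node.
    """
    def mean(pts):
        return [sum(p[i] for p in pts) / len(pts) for i in range(2)]

    ma, mb = mean(points_a), mean(points_b)
    s = [[ridge, 0.0], [0.0, ridge]]
    for pts, m in ((points_a, ma), (points_b, mb)):
        for p in pts:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    dm = [ma[0] - mb[0], ma[1] - mb[1]]
    return sum(dm[i] * inv[i][j] * dm[j] for i in range(2) for j in range(2))


if __name__ == "__main__":
    # Toy, made-up texts for two categories.
    finance = ["market shares fell", "profit at the market", "shares and profit up"]
    sport = ["team scored a goal", "match ended with a late goal", "the team won the match"]
    som = train_sofm([to_vector(t) for t in finance + sport])
    pos_f = [best_matching_unit(som, to_vector(t)) for t in finance]
    pos_s = [best_matching_unit(som, to_vector(t)) for t in sport]
    print("Fisher separation:", fisher_separation(pos_f, pos_s))
```

Under these assumptions, the paper's comparison would correspond to building the vectors from headlines, summaries, or full text in turn, training one map per stream, and comparing the resulting separation scores. A larger score indicates that the categories occupy more distinct regions of the map relative to their spread.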
