The rapid development of natural language processing has led to a growing number of products that use so-called speech and language technologies. On the one hand, this refers to the possibility of interacting with a computer in the language that people naturally use in speech and writing; on the other, to making the information contained in all sorts of texts accessible to a computer. We show how methods for gathering and extracting information can be applied to news releases, with the aim of potentially reducing the overhead generated when numerous internet information portals republish the same news. We describe how web syndication can be used to gather press releases, how to process those texts in order to determine their mutual similarity, and how to visualize the resulting similarities. We also present preliminary results of an experiment applying these methods to selected Polish internet portals.
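
To make the notion of mutual similarity between news texts concrete, the following is a minimal sketch in Python. It assumes a simple bag-of-words representation compared with cosine similarity; the abstract does not say which similarity measure was used in the experiment, so this is an illustration only. The function names `tokenize`, `cosine_similarity`, and `pairwise_similarity` are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Lowercase and extract word tokens; \w is Unicode-aware in Python 3,
    # so Polish diacritics are preserved.
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a, b):
    # a and b are Counters mapping each term to its frequency.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def pairwise_similarity(texts):
    # Assumed representation: raw term frequencies, with no stemming,
    # stop-word removal, or weighting.
    vectors = [Counter(tokenize(t)) for t in texts]
    return {
        (i, j): cosine_similarity(vectors[i], vectors[j])
        for i, j in combinations(range(len(texts)), 2)
    }
```

Scores close to 1 would flag pairs of releases that are likely republications of the same news item, while scores near 0 indicate unrelated texts. The resulting table of pairwise scores is also the kind of data a visualization step could take as input.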