Abstract
The Rich News system, which can automatically annotate radio and television news with the aid of resources retrieved from the World Wide Web, is described. Automatic speech recognition (ASR) gives a temporally precise but conceptually inaccurate annotation model. Information extraction from related web news sites gives the opposite: conceptual accuracy but no temporal data. Our approach combines the two to produce temporally accurate conceptual semantic annotation of broadcast news. First, low-quality transcripts of the broadcasts are produced using speech recognition, and these are then automatically divided into sections corresponding to individual news stories. A key-phrase extraction component finds key phrases for each story and uses them to search for web pages reporting the same event. The text and metadata of these web pages are then used to create index documents for the stories in the original broadcasts, which are semantically annotated using the KIM knowledge management platform. A web interface then allows conceptual search and browsing of news stories, and playback of the parts of the media files corresponding to each news story. Using material from the World Wide Web allows much higher-quality textual descriptions and semantic annotations to be produced than would have been possible using the ASR transcript directly. The semantic annotations can form part of the Semantic Web, and an evaluation shows that the system operates with high precision and a moderate level of recall.
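The abstract does not say how the key-phrase extraction component ranks candidate phrases. As one possible illustration only, the following minimal sketch assumes a plain TF-IDF ranking over the segmented story transcripts, treating each story as a document and considering unigrams and bigrams that contain no stop words. The function names `candidate_phrases` and `key_phrases`, the `top_n` parameter and the stop-word list are hypothetical and are not taken from Rich News.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is",
              "was", "that", "it", "at", "by", "with", "has", "have", "be"}


def candidate_phrases(text):
    """Return unigram and bigram candidates that contain no stop words.

    Bigrams are formed across sentence boundaries, a simplification that
    roughly matches unpunctuated ASR output.
    """
    words = re.findall(r"[a-z']+", text.lower())
    phrases = [w for w in words if w not in STOP_WORDS]
    for first, second in zip(words, words[1:]):
        if first not in STOP_WORDS and second not in STOP_WORDS:
            phrases.append(first + " " + second)
    return phrases


def key_phrases(stories, top_n=3):
    """Rank each story's candidate phrases by TF-IDF across all stories."""
    counts = [Counter(candidate_phrases(s)) for s in stories]

    # Document frequency: in how many stories each phrase occurs.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())

    n_docs = len(stories)
    results = []
    for c in counts:
        # Phrases present in every story get idf = log(1) = 0.
        scores = {
            phrase: tf * math.log(n_docs / doc_freq[phrase])
            for phrase, tf in c.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda p: (-scores[p], p))
        results.append(ranked[:top_n])
    return results


stories = [
    "the prime minister met union leaders over the strike. the strike continues",
    "flooding hit the north coast. flooding forced families from homes",
    "the strike by rail workers ended as the prime minister spoke",
]
print(key_phrases(stories)[1][0])  # "flooding"
```

In this toy example, "flooding" tops the second story because it occurs twice there and in no other story. Phrases shared by every story score zero, so recurring programme filler would tend to drop out. On real ASR output, recognition errors would likely add noisy candidates, which is one reason the web search step, rather than the transcript, supplies the final descriptions.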