Abstract

The YPA project is building a system to make the information in classified directories more accessible. BT's Yellow Pages® is an example of a classified database for which this work would be useful. There are two reasons for doing this: (i) directories like Yellow Pages contain much useful but hard-to-access information, especially in the free text of semi-display advertisements; (ii) more generally, the project is a demonstrator for the exploitation of semi-structured data, that is, data that is less systematic than database entries or logical clauses, but more systematic than free text because it has been marked up, for display or some other purpose. Accessing the directory source data file requires both natural language processing (to soften the interface to the system and, separately, to analyse natural-language-like constructs in the data) and information retrieval techniques, which are assisted by shallow knowledge. Deep world knowledge is impractical. The project seeks to get maximum effect from conveniently simplified approximations of standard natural language processing and knowledge representation. The paper gives an overview of the system and illustrates its style with points about how the source data file is analysed. The YPA requires further development, but already demonstrates the effectiveness of shallow processing applied to semi-structured data.
