Abstract

The amount of online information in Chinese and the number of Chinese Internet users have increased tremendously over the past decade. Because the Chinese language differs significantly from English, techniques developed for retrieving information from English Web documents cannot be directly applied to Chinese Web documents. To provide high-performance access to Chinese information on the Web, we have developed a Chinese Web query engine that (i) extracts (hierarchical) data of interest from Chinese HTML tables using an information extraction tool called semantic hierarchy, (ii) allows the user to submit queries in Chinese through a menu-driven user interface, and (iii) processes the user's queries (as Boolean expressions) to generate the correct results. Our query engine supports information categorized into subject areas such as car ads, house rentals, job ads, stocks, and university catalogs. We have tested our information extraction tool on two application domains, car ads and house rentals. The average F-measure for extracting Chinese data from these two application domains is above 90%. More importantly, our query engine can easily be configured and internationalized to become a worldwide, multilingual query engine with minor changes to system settings on PCs running Windows operating systems.
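As an illustration only, the idea of answering a menu-built query as a Boolean expression over records extracted from a table can be sketched as follows. This is a minimal sketch and not the engine described above: the record layout, the field names, the sample car-ad rows, and the helper names (`eq`, `and_`, `or_`, `not_`, `run_query`) are all assumptions made for the example. The `f_measure` helper uses the standard balanced definition (the harmonic mean of precision and recall); the paper may use a different weighting.

```python
from typing import Callable, Dict, List

# Hypothetical shape of one extracted table row: field name -> cell text.
Record = Dict[str, str]
Predicate = Callable[[Record], bool]


def eq(field: str, value: str) -> Predicate:
    """Atomic condition: the record's field equals the given value."""
    return lambda record: record.get(field) == value


def and_(*preds: Predicate) -> Predicate:
    """Boolean AND over any number of conditions."""
    return lambda record: all(p(record) for p in preds)


def or_(*preds: Predicate) -> Predicate:
    """Boolean OR over any number of conditions."""
    return lambda record: any(p(record) for p in preds)


def not_(pred: Predicate) -> Predicate:
    """Boolean NOT of a condition."""
    return lambda record: not pred(record)


def run_query(records: List[Record], pred: Predicate) -> List[Record]:
    """Return the records that satisfy the Boolean expression."""
    return [r for r in records if pred(r)]


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up sample data standing in for rows extracted from a car-ads table.
CAR_ADS: List[Record] = [
    {"品牌": "丰田", "颜色": "红色", "年份": "1999"},
    {"品牌": "本田", "颜色": "蓝色", "年份": "2001"},
    {"品牌": "丰田", "颜色": "蓝色", "年份": "2000"},
]

# Query: brand is 丰田 (Toyota) AND NOT color is 红色 (red).
query = and_(eq("品牌", "丰田"), not_(eq("颜色", "红色")))
matches = run_query(CAR_ADS, query)
```

With the sample rows above, `matches` contains only the blue Toyota record. Representing each condition as a small function keeps the Boolean operators composable, so a menu selection can be translated into nested `and_`/`or_`/`not_` calls without parsing free text.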

