Abstract

Abstract.--The majority of taxonomic descriptions is currently in print format. The majority of digital descriptions are in formats such as DOC, HTML, or PDF and for human readers. These formats do not convey rich semantics in taxonomic descriptions for computer-aided process. Newer digital formats such as XML and RDF accommodate semantic annotations that allow computers to process the rich semantics on human's behalf, thus open up opportunities for a wide range of innovative usages of taxonomic descriptions, such as searching in more precise and flexible ways, integrating with gnomic and geographic information, generating taxonomic keys automatically, and text data mining and information visualization etc. This paper discusses the challenges in automated conversion of multiple collections of descriptions to XML format and reports an automated system, MARTT. MARTT is a machine-learning system that makes use of training examples to tag new descriptions into XML format. A number of utilities are implemented as solutions to the challenges. The utilities are used to reduce the effort for training example preparation, to facilitate the creation of a comprehensive schema, and to predict system performance on a new collection of descriptions. The system has been tested with several plant and alga taxonomic publications including Flora of China and Flora of North America.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.