Abstract

We argue for a systematic analysis of one particular, well structured domain -academic Web pages - with regard to a special class of digital genres: Web genres. For this purpose, we have developed a database-driven system that will ultimately consist of more than 3000000 HTML documents, written in German, which are the empirical basis for our research. We introduce the notions of Web genre type which constitutes the basic framework for a certain Web genre, and compulsory and optional Web genre modules. These act as building blocks which go together to make up the structure characterised by the Web genre type and furthermore, operate as modifiers for the default assignment involved. The analysis of a 200 document sample illustrates our notion of Web genre hierarchy, into which Web genre types and modules are embedded. The analysis of four different documents of the Web genre Academic's Personal Homepage, not only illustrates our approach, but also our long-term goal of automatically extracting the contents of Web genre modules in order to build up structured XML documents of groups of unstructured HTML documents.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.