Abstract

Blogs now reach a large audience and have risen from the underground to become part of mainstream media. They contain information on diverse topics, personal opinions, and discussions between bloggers and readers. Tags and categories are structural elements of a blog post that increase the blog's visibility and improve navigation and searching within the blog's history. We hypothesize that these annotations are made on subjective grounds rather than systematically. Although tools exist to help bloggers tag and categorize their posts, it remains unclear to what extent these tools take into account the information contained in previous posts. This paper presents an 11-million-word corpus of French blog posts built to study these questions, together with an experiment in tag and category prediction. Preliminary results show that around 27% of all tags can be predicted from a lexical frequency analysis of blog posts. However, an initial comparison with an existing tag suggestion tool shows that a large proportion of the tags used to describe a blog do not appear in the blog post itself. This indicates that tag suggestion tools should exploit the diachronic analysis of blogs.
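
As an illustration of what predicting tags from lexical frequency can look like, the following Python sketch proposes the most frequent content words of a post as candidate tags and measures the share of author-assigned tags recovered. It is a minimal sketch under stated assumptions, not the method used in the paper: the stop-word list, the function names `candidate_tags` and `tag_recall`, the cutoff `k`, and the sample post are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny French stop-word list (illustration only).
STOPWORDS = {"le", "la", "les", "de", "des", "du", "un", "une",
             "et", "en", "sur", "pour", "est"}


def candidate_tags(text, k=3):
    """Return the k most frequent tokens that are not stop words."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(k)]


def tag_recall(posts, k=3):
    """Fraction of author-assigned tags found among the candidate tags.

    `posts` is a list of (text, tags) pairs.
    """
    found = total = 0
    for text, tags in posts:
        predicted = set(candidate_tags(text, k))
        found += sum(1 for tag in tags if tag.lower() in predicted)
        total += len(tags)
    return found / total if total else 0.0


# Invented sample post: one tag occurs in the text, the other does not.
posts = [
    ("La cuisine française : une recette de cuisine pour la tarte, "
     "une tarte simple, cuisine rapide",
     ["cuisine", "voyage"]),
]
print(candidate_tags(posts[0][0], k=2))
print(tag_recall(posts, k=2))
```

In this invented example the tag "voyage" never occurs in the post, so no frequency count over that post alone can recover it; this is the kind of tag for which the history of the blog would have to be consulted.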

