Gender Clustering of Blog Posts using Distinguishable Features

Yaakov Hacohen-Kerner, Yarden Tzach, Ori Asis

doi:10.5220/0006077403840391

Abstract

The aim of this research is to find out how to effectively cluster unlabeled personal blog posts written in English by gender. Given a gender-labeled blog corpus and a blog corpus that is not gender-labeled, we extracted distinguishable unigrams for both males and females from the labeled corpus. Then, we defined two general features, males' frequency and females' frequency, which represent the relative frequencies of the distinguishable males' unigrams and females' unigrams, respectively. The best distinguishable feature was found to be the males' frequency feature with a ratio factor of at least 1.4, i.e., using males' unigrams whose frequency is at least 1.4 times that of females. This feature leads to an accuracy rate of 83.7% for gender clustering of the unlabeled blog corpus. To the best of our knowledge, this study presents two novelties: (1) it is the first study to cluster blog posts by gender, and (2) it clusters an unlabeled corpus using distinguishable features that were extracted from a labeled corpus.
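
The abstract does not spell out the exact selection rule, tokenization, or clustering algorithm, so the Python sketch below is only a minimal illustration under stated assumptions. It assumes that a unigram is male-distinguishable when its relative frequency over all male-labeled text is at least 1.4 times its relative frequency over female-labeled text (and vice versa for females), that the per-post feature is the share of the post's tokens falling in that vocabulary, and that the unlabeled posts are split into two groups with a simple one-dimensional 2-means on the males' frequency feature. The helper names (`tokenize`, `relative_freqs`, `distinguishable`, `vocab_frequency`, `two_means_1d`) and the toy posts are hypothetical, not the authors' code or data.

```python
import re
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer (assumption): lowercase, keep runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def relative_freqs(posts):
    """Relative frequency of each unigram over all tokens in a set of posts."""
    counts = Counter()
    for post in posts:
        counts.update(tokenize(post))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def distinguishable(own, other, ratio=1.4):
    """Assumed rule: unigrams whose relative frequency in `own` is at least
    `ratio` times that in `other` (unigrams absent from `other` qualify)."""
    return {w for w, f in own.items() if f >= ratio * other.get(w, 0.0)}


def vocab_frequency(post, vocab):
    """Per-post feature: share of the post's tokens that belong to `vocab`."""
    tokens = tokenize(post)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in vocab) / len(tokens)


def two_means_1d(values, iters=100):
    """Assumed clustering step: k-means with k=2 on 1-D values.
    Returns 1 for points nearer the higher centroid, else 0."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low) if low else lo
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return [1 if abs(v - lo) > abs(v - hi) else 0 for v in values]


# Toy usage with made-up posts (hypothetical data).
male_posts = ["the game score team", "the team match"]
female_posts = ["the love dress", "love the dress friends"]
fm, ff = relative_freqs(male_posts), relative_freqs(female_posts)
male_vocab = distinguishable(fm, ff)    # game, match, score, team ("the" is filtered out)
female_vocab = distinguishable(ff, fm)  # dress, friends, love

unlabeled = ["team game tonight", "new dress and love", "the match score"]
male_freq = [vocab_frequency(p, male_vocab) for p in unlabeled]
labels = two_means_1d(male_freq)  # 1 = cluster with higher males' frequency
print(labels)  # [1, 0, 1]
```

Because the shared word "the" occurs equally often in both toy classes, it fails the 1.4 ratio test and contributes to neither vocabulary. A real setup would likely also need a minimum-count threshold so that rare words seen in only one class do not pass trivially; that refinement is likewise an assumption, not something stated in the abstract.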

Similar Papers

Differential Use of Reporting Verbs in Academic Papers and Personal Blogs
Yongkook Won
The English Teachers Association in Korea | VOL. 27
30 Sep 2021

FLAG-PDFe: Features Oriented Metadata Extraction Framework for Scientific Publications
Muhammad Waqas Ahmed ... Muhammad Tanvir Afzal
IEEE Access | VOL. 8
01 Jan 2020

Competing narratives, gender and threaded identity in cyberspace
Antonio García Gómez
Journal of Gender Studies | VOL. 19
01 Mar 2010

Retrieval and feedback models for blog feed search
Jonathan L Elsas ... Jaime Arguello
20 Jul 2008
