Topic Modeling Genre: An Exploration of French Classical and Enlightenment Drama

Abstract

This version of the article is outdated. Please consider a more recent version here: https://zenodo.org/record/166356. The concept of literary genre is a highly complex one: not only are different genres frequently defined on several, but not necessarily the same, levels of description, but the consideration of genres as cognitive, social, or scholarly constructs with a rich history further complicates the matter. This contribution focuses on thematic aspects of genre with a quantitative approach, namely Topic Modeling. Topic Modeling has proven useful for discovering thematic patterns and trends in large collections of texts, with a view to classifying or browsing them on the basis of their dominant themes. It has rarely, if ever, been applied to collections of dramatic texts, however. In this contribution, Topic Modeling is used to analyse a collection of French Drama of the Classical Age and the Enlightenment. The general aim of this contribution is to discover what types of semantic coherence topics show in this collection, whether different dramatic subgenres have distinctive dominant topics and plot-related topic patterns, and, conversely, to what extent clustering methods based on topic scores per play produce groupings of texts which agree with more traditional genre distinctions. This contribution shows that interesting topic patterns can be detected which provide new insights into the thematic, internal structure of a genre such as drama as well as into the history of French drama of the Classical Age and the Enlightenment.
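
The last step named in the abstract, clustering plays on their topic scores and checking the groupings against genre labels, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the play names, the three-topic score vectors, the genre labels, the choice of cosine distance with average-linkage agglomerative clustering, and the purity measure are hypothetical and are not taken from the article, which does not specify its clustering setup in this abstract.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical topic scores per play (one value per topic); not real data.
topic_scores = {
    "comedy_1": [0.70, 0.20, 0.10],
    "comedy_2": [0.60, 0.30, 0.10],
    "comedy_3": [0.65, 0.15, 0.20],
    "tragedy_1": [0.10, 0.20, 0.70],
    "tragedy_2": [0.15, 0.15, 0.70],
    "tragedy_3": [0.20, 0.10, 0.70],
}
# Hypothetical traditional genre label for each play.
genres = {name: name.split("_")[0] for name in topic_scores}


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    dists = [cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)


def agglomerative(vectors, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in vectors]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


def purity(clusters, labels):
    """Share of plays that carry the majority genre label of their cluster."""
    majority = sum(
        Counter(labels[name] for name in cluster).most_common(1)[0][1]
        for cluster in clusters
    )
    return majority / sum(len(cluster) for cluster in clusters)


clusters = agglomerative(topic_scores, n_clusters=2)
agreement = purity(clusters, genres)
print(clusters, agreement)
```

A purity close to 1 would mean that the topic-based groupings largely coincide with the genre labels; lower values would point to plays whose dominant topics cut across the traditional distinctions.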

Similar Papers
  • Dissertation
  • Cite Count: 1
  • 10.4225/03/58a675be72e78
Text mining and rating prediction with topical user models
  • Feb 17, 2017
  • Yanir Seroussi

Recent years have seen an abundance of user-generated texts published online. Mining these texts for useful information is a growing research area with many aspects that are yet to be fully explored. Two such aspects, which are investigated in this thesis, are the extraction of implicit information about users to create user models, and the application of these models to tasks that require user information. Our main approach to extracting user information is via topical user models, which represent each author and document with low-dimensional distributions over topics (a topic is a distribution over words). We develop methods that utilise these topical user models to address the following tasks: (1) authorship attribution: identifying which user wrote a given anonymous text; (2) polarity inference: detecting the level of sentiment expressed in a given text; and (3) rating prediction: determining a given user's expected sentiment towards a given item. The first task we consider is authorship attribution, where the goal is to identify the authors of anonymous texts. Authorship attribution is one of the most commonly attempted tasks in the authorship analysis field, which -- in addition to authorship attribution -- also deals with profiling authors by inferring demographic information and personality traits from their texts. Traditionally, research in this field has focused on formal texts, such as essays and novels, but recently more attention has been given to online user-generated texts, such as emails and blogs. Authorship attribution of online user-generated texts is a more challenging task than traditional authorship attribution, because such texts tend to be short and informal, and the number of candidate authors is often larger than in traditional settings. We address this challenge by employing topical user models. 
In addition to exploring novel ways of applying two popular topic models to this task, we develop a new model that projects users and documents to two disjoint topic spaces. Employing our model in authorship attribution yields state-of-the-art performance on several datasets, which contain either formal texts or online user-generated texts, where the number of candidate authors ranges from three to about 20,000. The second task we consider is polarity inference, where the goal is to infer the degree of positive or negative sentiment expressed in texts. Polarity inference is a key task in the sentiment analysis field, which deals with inferring people's sentiments and opinions from texts. Even though the way polarity is expressed often appears to depend on the author, most of the work in this field ignores authors. In this thesis, we introduce a framework that infers the polarity of texts by employing user-specific inference models, where the models can be weighted according to user similarity. We show that our framework outperforms two popular baselines, even when all the base models are given equal weights. In addition, we show that performance can be further improved by considering user similarity in terms of language use (e.g., as captured by topical user models) and rating patterns. The third and final task we consider is rating prediction, where the goal is to predict the rating a given user would assign to a given item. Rating prediction is a core component of many recommender systems, which require a way to predict users' future sentiments in order to find and recommend items of personal interest. Recently, rating prediction algorithms that are based on matrix factorisation have become increasingly popular, mainly due to their high accuracy and scalability. However, such algorithms often deliver inaccurate rating predictions for users who submitted only a few ratings. 
In this thesis, we introduce an extension to the basic matrix factorisation algorithm that considers information about the users when generating rating predictions. We show that employing either demographic information or text-based information (in the form of topical user models) outperforms baselines that consider only ratings, thereby enabling more accurate generation of personalised rating predictions for users who have not submitted many ratings. In the case of topical user models, these predictions are generated without requiring users to explicitly supply any information about themselves and their preferences. Awards: Winner of the Mollie Holman Doctoral Medal for Excellence, Faculty of Information Technology, 2012.

  • Research Article
  • 10.13016/m2y1v5-n4qf
Variational Autoencoders using D-Wave Quantum Annealing
  • Dec 10, 2018
  • Jennifer Sleeman

Exploring the use of deep learning algorithms on the quantum computer will provide insight into how the quantum computer, in particular quantum annealing, can be applied to climate-related research to accelerate the learning process. Current research has explored running Restricted Boltzmann Machines (RBMs) on D-Wave's quantum annealer. This work has explored problems such as MNIST image recognition tasks. In addition, another body of research has explored variational inference methods using quantum annealing. We consider using a combination of the RBM approach and the variational inference approach to implement a deep variational autoencoder to perform latent extractions to support a text-based data assimilation method. We will compare the latent extractions using the quantum variational autoencoder approach with latent extractions produced using a classical variational autoencoder. We will use the D-Wave quantum annealer system and the IBM Power 8 system to perform these experiments. We will compare the effects of the latent extractions on our Dynamic Data Assimilation for Topic Modeling (DDATM) method used to understand how IPCC-supported research has evolved over time.

  • Research Article
  • 10.5281/zenodo.18178
Topic Modeling Genre: An Exploration of French Classical and Enlightenment Drama
  • Jun 1, 2015
  • Christof Schöch


  • Research Article
  • 10.5075/epfl-thesis-5059
Modeling and understanding communities in online social media using probabilistic methods
  • Jan 1, 2011
  • Radu Andrei Negoescu

The amount of multimedia content is constantly increasing, and people interact with each other and with content on a daily basis through social media systems. The goal of this thesis was to model and understand emerging online communities that revolve around multimedia content, more specifically photos, by using large-scale data and probabilistic models in a quantitative approach. The dissertation has four contributions. First, using data from two online photo management systems, this thesis examined different aspects of the behavior of users of these systems pertaining to the uploading and sharing of photos with other users and online groups. Second, probabilistic topic models were used to model online entities, such as users and groups of users, and the newly proposed representations were shown to be useful for further understanding such entities, as well as to have practical applications in search and recommendation scenarios. Third, by jointly modeling users from two different social photo systems, it was shown that differences at the level of vocabulary exist, and different sharing behaviors can be observed. Finally, by modeling online user groups as entities in a topic-based model, hyper-communities were discovered in an automatic fashion based on various topic-based representations. These hyper-communities were shown, through both an objective and a subjective evaluation with a number of users, to be generally homogeneous, and therefore likely to constitute a viable exploration technique for online communities.

  • Research Article
  • 10.21427/d7v52f
An Exploration of Parliamentary Speeches in the Irish Parliament Using Topic Modeling
  • Oct 4, 2018
  • Fiona Leheny


  • Conference Article
  • 10.4230/lipics.cosit.2019.18
Enabling the Discovery of Thematically Related Research Objects with Systematic Spatializations.
  • Jul 8, 2020
  • Sara Lafia + 1 more

Author(s): Lafia, Sara; Last, Christina; Kuhn, Werner

It is challenging for scholars to discover thematically related research in a multidisciplinary setting, such as that of a university library. In this work, we use spatialization techniques to convey the relatedness of research themes without requiring scholars to have specific knowledge of disciplinary search terminology. We approach this task conceptually by revisiting existing spatialization techniques and reframing them in terms of core concepts of spatial information, highlighting their different capacities. To apply our design, we spatialize master's and doctoral theses (two kinds of research objects available through a university library repository) using topic modeling to assign a relatively small number of research topics to the objects. We discuss and implement two distinct spaces for exploration: a field view of research topics and a network view of research objects. We find that each space enables distinct visual perceptions and questions about the relatedness of research themes. A field view enables questions about the distribution of research objects in the topic space, while a network view enables questions about connections between research objects or about their centrality. Our work contributes to spatialization theory a systematic choice of spaces informed by core concepts of spatial information. Its application to the design of library discovery tools offers two distinct and intuitive ways to gain insights into the thematic relatedness of research objects, regardless of the disciplinary terms used to describe them.

  • Research Article
  • 10.4233/uuid:f7a44425-27ee-4b59-a514-55df183b3c0c
Value conflicts in energy systems
  • Oct 1, 2020
  • Tristan De Wildt

This thesis introduces an approach to support the long-term social acceptance of energy systems by addressing value conflicts embedded in regulatory and technical designs. When designing energy systems, the realisation of some values can conflict with the realisation of other values. The decision to deploy energy systems therefore inevitably entails a prioritisation of some values over others. Societal groups that do not agree with this prioritisation may decide to oppose or not to support the deployment and use of these systems. Lack of social acceptance may occur during the planning phase, but also at a later point in time as a result of value change. This can be caused by a growing mismatch between values prioritized in energy systems and how societal groups are affected. To support the social acceptance of energy systems, value conflicts embedded in energy systems need to be addressed. Methods to do so were however lacking. This thesis provides a methodological contribution by demonstrating how the literature on data science and the complexity sciences can be used to address value conflicts. This thesis answers the following research question: How can value conflicts embedded in energy systems be addressed in support of social acceptance? We use probabilistic topic modelling to explore how the academic literature addresses value conflicts. Identified tactics can be used to specify design requirements and policy guidelines in support of the social acceptance of energy systems. Agent-based modelling is used to identify value conflicts embedded in energy systems that result from the heterogeneous properties of the affected population. Agent-based models provide insights about the type of population affected by value conflicts and hence about the severity of the resulting lack of social acceptance. 
This thesis contributes to the literature on social acceptance by demonstrating how long-term acceptance can be supported by drawing on insights from ethics of technology. Additionally, we provide a systematic and practical approach to integrate human values in the regulatory and technical design of infrastructures, which is critical for supporting the ongoing energy transition.
