Use Of Textual Data Research Articles

Bias in textual data can lead to skewed interpretations and outcomes when the data is used. These biases can perpetuate stereotypes, discrimination, or other forms of unfair treatment. An algorithm trained on biased data may end up making decisions that disproportionately impact a certain group of people. Therefore, it is crucial to detect and remove these biases to ensure the fair and ethical use of data. To this end, we develop Nbias, a comprehensive and robust framework that consists of four layers: data, corpus construction, model development, and evaluation. The dataset is constructed by collecting diverse data from various domains, including social media, healthcare, and job hiring portals. We then apply a transformer-based token classification model that identifies biased words and phrases through a unique named entity, BIAS. In the evaluation procedure, we combine quantitative and qualitative measures to gauge the effectiveness of our models. We achieve accuracy improvements ranging from 1% to 8% compared to baselines. We are also able to generate a robust understanding of how the model functions. The proposed approach is applicable to a variety of biases and contributes to the fair and ethical use of textual data.
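As a rough illustration of how per-token output with a BIAS entity might be turned into biased phrases, the sketch below groups tagged tokens into spans. It assumes a BIO tagging scheme with the hypothetical labels `B-BIAS`, `I-BIAS`, and `O`; the abstract does not state the tagging scheme, and the function name `extract_bias_spans` and the example sentence are illustrative only, not part of Nbias.

```python
from typing import List


def extract_bias_spans(tokens: List[str], labels: List[str]) -> List[str]:
    """Group tokens tagged B-BIAS / I-BIAS (assumed BIO scheme) into phrases."""
    spans: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "B-BIAS":
            # A new biased phrase starts; close any phrase still open.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif label == "I-BIAS" and current:
            # Continue the phrase that is currently open.
            current.append(token)
        else:
            # "O" (or a stray I-BIAS with no open phrase) ends the phrase.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Hypothetical model output for one sentence.
tokens = ["Older", "applicants", "are", "too", "slow", "to", "learn"]
labels = ["O", "O", "O", "B-BIAS", "I-BIAS", "O", "O"]
bias_phrases = extract_bias_spans(tokens, labels)
```

In practice the labels would come from the trained token classifier; here they are written by hand so the decoding step can be read on its own.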


Many animals are put up for adoption, and a large number of available animals can overburden a shelter once it exceeds the shelter's maximum capacity. Pet adoption websites need to identify which characteristics or general information about the animals are most salient to potential adopters in order to increase adoption speed and avoid overcrowding. To address this problem, we applied several data mining techniques, including regression, decision tree, random forest, gradient boosting, and neural network, to find which technique best predicts adoption speed (also known as length of stay, LOS) and which variables are significant. We applied topic modeling techniques to discover latent features in the pets' text descriptions and fed these features into our predictive models. Our ensemble model produced an average squared error (ASE) of 1.15584 weeks. While textual data improved the performance of the regression and gradient boosting models, the other models received no additional benefit from it. Our results indicate that pet characteristics such as age, breed, quantity, and sterilization status are key factors that influence adoption speed. This study could further help shelters around the world optimize their adoption process.
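For readers unfamiliar with the metric, average squared error is the mean of the squared differences between actual and predicted values. The sketch below computes it for a handful of made-up length-of-stay values; the numbers and the function name `average_squared_error` are hypothetical and are not taken from the study.

```python
from typing import Sequence


def average_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up lengths of stay, in weeks, for four pets.
actual_weeks = [1.0, 3.0, 2.0, 4.0]
predicted_weeks = [2.0, 3.0, 1.0, 2.0]
ase = average_squared_error(actual_weeks, predicted_weeks)
```

A lower value means the predicted lengths of stay sit closer to the observed ones, which is how the candidate models in such a comparison would be ranked.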


Articles published on Use Of Textual Data

Nbias: A natural language processing framework for BIAS identification in text

Representing and utilizing clinical textual data for real world studies: An OHDSI approach

Pet analytics: Predicting adoption speed of pets from their online profiles

Forecasting Corporate Failure in the Chinese Energy Sector: A Novel Integrated Model of Deep Learning and Support Vector Machine

Combining different evaluation systems on social media for measuring user satisfaction

The Textual Criticism of Middle English Manuscript Traditions: A Survey of Critical Issues in the Interpretation of Textual Data

An automatic video indexing method based on shot classification

Measuring emotion in open-ended survey responses: An application of textual data analysis

An architecture and design approach for a geographical information retrieval system to support retrieval by content and browsing

