Articles published on Web scraping
192 Search results
- Research Article
- 10.55041/ijcope.v2i4.230
- Apr 11, 2026
- International Journal of Creative and Open Research in Engineering and Management
- Dr Ajay Singh Thakur
Data analytics depends significantly on having precise, pertinent, and high-quality data readily available. The core of any analytical procedure is recognizing suitable data sources and utilizing efficient data gathering techniques. This chapter examines the different categories of data sources, including primary, secondary, internal, and external sources, as well as classifications like structured, unstructured, and semi-structured data. It also explores various data collection techniques, such as surveys, interviews, observation, experiments, web scraping, transactional systems, sensors, and application programming interfaces (APIs). The section outlines the benefits and drawbacks of every approach and underscores important factors including data quality, dependability, validity, expenses, time limitations, and ethical concerns. Moreover, it addresses typical issues such as data bias, absence of data, and integration challenges, while offering best practices to guarantee efficient data gathering. Comprehending these ideas enables organizations and analysts to make knowledgeable choices and enhance the overall effectiveness and precision of data analytics procedures. Keywords: Data Analytics, Data Sources, Data Collection Methods, Primary Data, Secondary Data, Structured Data, Unstructured Data, Data Quality, Data Preprocessing, Surveys and Questionnaires, Web Scraping, Big Data, Data Ethics, Data Integration
- Research Article
1
- 10.1177/00031348261441678
- Apr 9, 2026
- The American surgeon
- Andrew C Kuo + 4 more
Background: Web scraping, the automated extraction of data from websites, has become an essential technique for researchers seeking to collect large-scale data that would be impractical to gather manually. Surgeon-scientists increasingly encounter publicly available web data relevant to outcomes research, health services analysis, workforce studies, and policy work, yet technical guidance on implementing web scrapers remains limited in the surgical literature. Methods: This tutorial provides a clinician-oriented technical guide to web scraping for surgical research. We present key concepts including static vs dynamic websites, CSS selectors, browser automation, rate limiting, and ethical considerations. A complete worked example demonstrates the full pipeline by scraping a surgical research group's publication page (https://www.onetomapanalytics.com) to build a structured bibliometric database. Results: The worked example successfully extracts structured publication data (including titles, author lists, abstracts, keywords, and PubMed links) from a JavaScript-rendered website, producing an analysis-ready data set. We demonstrate how this pipeline generalizes to other surgical research applications including hospital price transparency data, residency program characteristics, and quality metrics. Conclusions: Web scraping is a powerful tool for surgeon-scientists when implemented with technical rigor and ethical responsibility. By anchoring the tutorial to a concrete surgical use case and providing a reusable code template, we equip surgical researchers with the foundational knowledge to design, implement, and adapt web scrapers for their own data collection projects.
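As a rough illustration of the selector idea mentioned in the abstract above, the sketch below pulls the text of every element carrying a given class out of a static HTML string, using only Python's standard library. It is not the tutorial's code: the class name `pub-title` and the sample markup are hypothetical, matching on a class is only a simplified stand-in for a full CSS selector, and a JavaScript-rendered page like the one the authors describe would first need browser automation to produce the HTML.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains target_class.

    Simplification: assumes matching elements contain no unclosed void tags
    (such as a bare <br>), because nesting depth is tracked by counting tags.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self._depth = 0      # greater than 0 while inside a matching element
        self._buffer = []    # text fragments of the current match
        self.results = []    # one string per matching element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1             # nested tag inside a match
        elif self.target_class in classes:
            self._depth = 1              # entering a new match

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:         # leaving the matching element
                self.results.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


# Hypothetical markup standing in for an already-downloaded publication page.
SAMPLE_HTML = (
    '<ul>'
    '<li class="pub-title">Paper <em>A</em></li>'
    '<li class="pub-title">Paper B</li>'
    '<li class="other">skip me</li>'
    '</ul>'
)

extractor = ClassTextExtractor("pub-title")
extractor.feed(SAMPLE_HTML)
titles = extractor.results
```

In practice a dedicated HTML library with real CSS selector support would likely replace this hand-written parser; the point here is only the shape of the extraction step.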
- Research Article
- 10.1177/15562646261434135
- Mar 23, 2026
- Journal of empirical research on human research ethics : JERHRE
- Alicia Cappello + 1 more
With the increasing popularity of AI, it is critical to gain an understanding of how academic research ethics are impacted. Unfortunately, in Canada, the Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans, or the TCPS2, lags far behind current technology when it comes to ethical research related to AI. We suspected that REBs at Canadian public universities are struggling to determine how the TCPS2 can or should be interpreted when it comes to AI research. To gauge these challenges, we conducted a survey of REB Chairs/Vice-Chairs and REO managers/supervisors. The results of the survey confirmed our suspicions. They also show the lack of guidance from the Tri-Council and the Panel on Research Ethics is likely making this struggle worse. Through an analysis of the TCPS2, proposed Canadian legislation, and the survey results, we developed a series of recommendations for both the Tri-Council and REBs which may help overcome these challenges and ease the confusion.
- Research Article
- 10.34659/eis.2025.94.3.1056
- Feb 2, 2026
- Economics and Environment
- Małgorzata Tyrańska + 2 more
Purpose: The purpose of this article is to identify the green competency requirements for candidates employed in green positions. Methodology: The research sample was gathered using web scraping. Data were analysed using both qualitative (Delphi method) and quantitative methods (descriptive statistics and statistical testing). Findings: Behavioural green competences are more important to companies than functional ones. The findings also point to the importance of experience in the current labour market. Research limitations/implications: One of the biggest limitations is the lack of a definition of green competences and of a single framework that could serve as a basis for comparative research projects. Originality/value: The results constitute a picture of the contemporary labour market.
- Research Article
- 10.51903/6rqk1874
- Jan 21, 2026
- Jurnal Ilmiah Sistem Informasi
- Anita Hakim Nasution + 5 more
In many developing countries, the financial sustainability of private higher education institutions depends heavily on student enrollment numbers, owing to limited diversification of revenue sources. Unlike public universities, which receive substantial support from government subsidies, private universities generally rely on tuition fees as their main source of income. This dependence drives many private institutions to allocate significant financial resources to promotional initiatives designed to attract prospective students. Through effective branding strategies, private universities increasingly use social media platforms, especially Instagram, as a strategic tool to build social networks and raise information awareness among audiences of high-school age. This study examines the potential of promotion strategies based on the Instagram algorithm, using digital marketing influence maximization techniques, specifically viral marketing and influencer marketing, which are commonly applied in the commercial sector. The findings outline three marketing strategies with significant potential benefits: leveraging viral marketing mechanisms, identifying and engaging the right key opinion leaders (KOLs) in influencer marketing, and optimizing the strategic use of hashtags. In addition, the study contributes to knowledge development by offering a conceptual framework for formulating promotion strategies and proposing an evaluative methodology for measuring promotional effectiveness. The evaluation approach includes the application of brand influence and engagement rate metrics, demonstrated through a brief case study.
- Research Article
- 10.54694/stat.2025.45
- Dec 12, 2025
- Statistika: Statistics and Economy Journal
- Adam Slavíček + 2 more
The study analyzes the heterogeneity of the housing stock and its impact on rent levels in Prague. Using web scraping and hedonic log-log regression on data from the SReality portal (20 184 apartments offered), it quantifies the impact of layout, size, and amenities (elevator, terrace, garage, facilities, new construction) on the price per square meter and examines the spatial and temporal variability of these factors. It compares classic indices (Laspeyres, Paasche, Fisher) with their hedonic variants to assess whether changes in the structure of supply distort aggregate statistics. The results show significant spatial heterogeneity of parameters, but only slight differences between traditional and hedonic indices in short-term monitoring.
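For readers unfamiliar with the classic indices named in the abstract above, the following minimal sketch computes the standard Laspeyres, Paasche, and Fisher price indices. The category names, prices, and quantities are invented for illustration and are not data from the study; the hedonic variants the authors compare against are not shown.

```python
from math import sqrt


def laspeyres(p0, p1, q0):
    """Price index that weights both periods' prices by base-period quantities."""
    return sum(p1[k] * q0[k] for k in p0) / sum(p0[k] * q0[k] for k in p0)


def paasche(p0, p1, q1):
    """Price index that weights both periods' prices by current-period quantities."""
    return sum(p1[k] * q1[k] for k in p0) / sum(p0[k] * q1[k] for k in p0)


def fisher(p0, p1, q0, q1):
    """Geometric mean of the Laspeyres and Paasche indices."""
    return sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))


# Hypothetical rent per square meter and number of listings for two
# apartment categories in a base period (0) and a current period (1).
base_price = {"small": 300.0, "large": 250.0}
new_price = {"small": 330.0, "large": 260.0}
base_qty = {"small": 10, "large": 20}
new_qty = {"small": 20, "large": 10}

L = laspeyres(base_price, new_price, base_qty)          # 8500 / 8000 = 1.0625
P = paasche(base_price, new_price, new_qty)             # 9200 / 8500, about 1.0824
F = fisher(base_price, new_price, base_qty, new_qty)    # sqrt(1.15), about 1.0724
```

Because the mix of listings shifts toward the category whose price rose faster, the Paasche index exceeds the Laspeyres index in this toy example. That is the kind of composition effect the abstract refers to when it asks whether changes in the structure of supply distort aggregate statistics.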
- Research Article
- 10.34151/jurtek.v18i2.4463
- Dec 1, 2025
- Jurnal Teknologi
- Defrensia Apriliani Kasi + 2 more
Sentiment analysis is a technique in natural language processing that is used to classify user opinions or reviews into positive, negative, or neutral categories. In this study, sentiment analysis was carried out on reviews of the ChatGPT application by users on the Google Play Store platform using two classification methods, namely Naive Bayes and Support Vector Machine (SVM). The main objective of this study is to compare the effectiveness of the two methods in classifying user sentiment based on the reviews collected. The data used in this study were obtained through web scraping from the Google Play Store and then preprocessed, including text cleaning, tokenization, removal of stopwords, and stemming. After that, features were extracted using the Term Frequency-Inverse Document Frequency (TF-IDF) method before classification with the Naive Bayes algorithm and with an SVM using the One-Against-Rest approach and an RBF kernel. The results show that the SVM method is more accurate than Naive Bayes in user sentiment classification. In the model evaluation, Naive Bayes achieved an accuracy of 91%, while SVM reached 92%. These findings indicate that SVM is more effective in identifying user review sentiment towards the ChatGPT application. This research is expected to provide insights for application developers and researchers in improving the performance of machine learning-based sentiment analysis.
- Research Article
- 10.52937/hira.25.5.2.e6
- Nov 28, 2025
- Health Insurance Review & Assessment Service Research
- Ho-Young Choi
The rapid spread of tools such as web scraping and automated macros has made it technically easy, but legally complex, to collect large volumes of health-related data from websites and online services. This review compares the principal frameworks in South Korea, the United States, and the European Union to identify conditions for lawful and ethical research use. Baseline privacy statutes (Korea's PIPA, U.S. HIPAA, EU GDPR), sectoral instruments, and enforcement trends reveal convergent requirements: (1) robust de-identification or pseudonymization; (2) a valid legal basis (explicit consent or statutory alternatives for scientific research in the public interest); (3) strict respect for access controls and anti-circumvention rules (no bypassing logins, CAPTCHAs, paywalls, or technical protection measures); (4) transparency and independent oversight (e.g., notices, data-subject rights handling, IRB/ethics review); and (5) safeguards for cross-border transfers, including emerging national-security limits on bulk health datasets. In South Korea, PIPA treats health information as sensitive; pseudonymized data may be used without consent for statistics, scientific research, or archiving under defined safeguards, while cross-controller combinations are confined to designated institutions and API-based sharing is preferred. In the U.S., HIPAA governs research uses by covered entities (authorization or IRB waiver), while non-HIPAA actors face FTC oversight; scraping of publicly accessible pages may avoid CFAA liability but still implicates DMCA and contract/tort claims. In the EU, GDPR requires both an Article 6 basis and an Article 9 condition, with Article 14 transparency even for indirectly collected data; database rights and text-and-data-mining (TDM) rules shape permissible extraction, and the EHDS will expand controlled research access via secure environments. Together, these regimes point to a risk-managed pathway for research that centers lawful sourcing, technical safeguards, and accountable governance.
- Research Article
- 10.55041/isjem05191
- Nov 26, 2025
- International Scientific Journal of Engineering and Management
- Vijay A + 1 more
Abstract The rapid expansion of digital ecosystems and the increasing reliance on data-driven decision-making have created a critical need for efficient, scalable, and adaptable mechanisms to extract structured information from diverse online sources. Traditional web scraping solutions, while sufficient for single-site or small-scale tasks, often fail to meet the complexities associated with multi-site data extraction, evolving web architectures, and dynamic content rendering. To address these challenges, the Modular Web Scraping Pipeline presents a comprehensive and configurable framework engineered to facilitate systematic, multi-source data retrieval with high reliability and minimal manual intervention. Designed using Python and powered by modular components, the pipeline offers a structured methodology for collecting URLs, fetching raw HTML or API responses, parsing and normalizing content, managing storage, and integrating with analytical dashboards or downstream systems. A key strength of the pipeline lies in its modular design philosophy, which decomposes the scraping process into six independent yet interconnected components: URL Collector, File Fetcher, Data Extractor, Automated File Cleanup, Database Management, and Dashboard Integration. This separation not only enhances maintainability but also enables site-specific customization without altering the overall workflow. The use of YAML configuration files allows extraction logic to be defined declaratively, making it possible to adapt quickly to new website structures or modifications in existing ones. This approach significantly reduces the burden of rewrites, increases pipeline flexibility, and ensures that the system remains resilient in the face of frequent web interface updates
- Research Article
1
- 10.3390/buildings15234260
- Nov 25, 2025
- Buildings
- Riccardo Censi + 4 more
The construction sector faces growing challenges in integrating sustainability, risk management, and regulatory compliance, in line with initiatives such as the European Green Deal, the Corporate Sustainability Reporting Directive, and international building standards. However, the systematic adoption of ESG metrics in decision-making remains limited due to fragmented data, the lack of predictive tools, and reliance on static reporting. This study proposes and illustrates a digital framework, based on simulated data, that combines Artificial Intelligence, Process Mining, and Robotic Process Automation to enhance ESG risk assessment in sustainable construction management. The model, formalized through Business Process Model and Notation, integrates Machine Learning for risk weighting and classification, and leverages Web Scraping and Business Intelligence for dynamic data acquisition. A simulated case study involving 100 synthetic construction projects is used to demonstrate the internal logic and quantitative feasibility of the framework, showing how automated data integration and predictive modeling can improve the consistency of ESG risk identification and classification. While the results are illustrative rather than empirical, they confirm the analytical coherence and reproducibility of the proposed workflow. From a scientific perspective, it contributes an integrated methodology that bridges predictive analytics and process management for ESG evaluation. From a practical standpoint, it offers a structured and reproducible workflow to anticipate, classify, and mitigate ESG risks, supporting the construction sector’s transition toward data-driven and sustainability-first management practices.
- Research Article
- 10.55041/ijsrem54144
- Nov 19, 2025
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
- Kodi Ajay Kumar Reddy + 3 more
Abstract: Amazon’s dynamic pricing ecosystem presents challenges and opportunities for businesses and individuals aiming to optimize purchasing decisions, manage inventory costs, and maintain competitive pricing. This study explores advanced methodologies and technologies for monitoring Amazon prices, focusing on automation, data collection, and analysis. Key technologies include web scraping frameworks such as BeautifulSoup, Selenium, and Playwright; APIs like the Amazon Product Advertising API; and data storage solutions using SQL, NoSQL databases, or cloud services. The research delves into algorithms utilized in price monitoring, including web scraping parsers, data cleaning pipelines, and machine learning models like Linear Regression for trend analysis, Time Series Forecasting (ARIMA, Prophet) for predicting price fluctuations, and Clustering Algorithms (K-Means, DBSCAN) for grouping price patterns across categories. Additionally, NLP algorithms assist in extracting contextual information from product descriptions, while Anomaly Detection Models flag irregular pricing behavior. The study also evaluates the integration of real-time monitoring systems via streaming technologies such as Apache Kafka and alert systems using cloud-based notification services. Ethical and legal considerations are addressed to ensure compliance with Amazon’s terms of service. The findings highlight how leveraging these technologies and algorithms can empower businesses to enhance profitability and enable consumers to make informed purchasing decisions in a competitive e-commerce landscape. Index terms: E-commerce, Price Tracking, Web Scraping, Market Analytics, Price Forecasting
- Research Article
- 10.55041/ijsrem54152
- Nov 19, 2025
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
- Er Jagpreet Singh + 1 more
Abstract—The proliferation of digital financial services has created a fragmented and insecure data landscape, overwhelming individuals seeking to manage their financial health. Traditional cloud-based personal finance tools require users to surrender sensitive data, posing significant privacy risks. This paper introduces ASTRAFIN, a secure, local-first AI agent designed for autonomous financial health analysis. ASTRAFIN leverages on-device Natural Language Processing (NLP) models to parse and extract data from varied sources, including PDF bank statements and UPI SMS, ensuring sensitive information never leaves the user's device. We detail the system's core, a multi-tool AI agent built on the ReAct (Reasoning and Acting) framework, which autonomously reasons about the user's financial state. This agent orchestrates a suite of integrated tools for automated transaction categorization, predictive budget tracking, and web-based deal discovery. A key innovation is the implementation of a stateful memory system using a local vector database, enabling the agent to maintain long-term context and provide personalized, longitudinal analysis. By integrating Python-based ML models, advanced NLP concepts, and secure API integration, ASTRAFIN provides a privacy-centric, intelligent, and autonomous solution to empower individuals in managing their financial well-being. Keywords—AI Agent, Local-First Software, ReAct Framework, Privacy-Preserving NLP, Financial Data Parsing, Vector Database, Autonomous Decision-Making, Personal Finance, Transaction Categorization, Web Scraping
- Research Article
- 10.56028/aemr.15.1.217.2025
- Nov 17, 2025
- Advances in Economics and Management Research
- Zhuoyu Zhao
Against the backdrop of the lagging and information distortion predicament faced by traditional financial health assessment, this paper, based on the theory of behavioral finance, uses Python web crawler technology to dynamically capture Baidu Index as a proxy variable for investor attention, and combines the ten-year financial data of Shanshan Co., Ltd. to explore the relationship between the two through case analysis. Research findings show that during the stable period of business, financial health and attention interact positively, and the improvement of fundamentals is the core driving force for market confidence. During the periods of governance crisis and debt crisis, the impulsive surge in attention driven by emotions was significantly contrary to the continuous deterioration of financial indicators, and the market's response to positive news lagged behind the financial recovery. Research shows that investor attention can serve as a leading indicator of financial health, but it may also exacerbate corporate predicaments due to the amplification of risks by sentiment. Based on this, it is suggested that unstructured attention data be incorporated into the financial health assessment framework. Enterprises need to establish an attention monitoring mechanism to optimize financing strategies, and regulatory authorities should pay attention to the divergence signals between the two to prevent systemic risks. This study provides big data methodological support for expanding the theoretical boundaries of financial health.
- Research Article
1
- 10.3390/data10110174
- Oct 31, 2025
- Data
- Ignacio Molina + 2 more
This paper presents a dataset of Chilean news media coverage during the social unrest and constitutional processes from 2019 to 2023. Using Python-based web scraping with BeautifulSoup and Selenium, we collected articles from 15 Chilean news outlets between 15 November 2019 and 17 December 2023. The initial collection of 1254 articles was filtered to 931 usable data points after removing non-relevant content, duplicates, and articles unrelated to the Chilean social outburst. Each news outlet required specific extraction approaches due to varying HTML structures, with some outlets inaccessible due to paywalls or anti-scraping mechanisms. The dataset is structured in JSON format with standardized fields including title, content, date, author, and source metadata. This resource supports research on media coverage during political events and provides data for Spanish-language processing tasks. The dataset and extraction code are publicly available on GitHub.
- Research Article
- 10.33365/jtk.v20i1.472
- Oct 4, 2025
- Jurnal Tekno Kompak
- Achmad Ulul Azmi Wafiqi + 2 more
Weather is a crucial factor in human life, especially in agriculture, shipping, and other activities. One of the main indicators of extreme weather conditions is convective clouds, which are clouds formed by warm air convection and have the potential to cause heavy rain, lightning, and strong winds. Monitoring convective clouds can currently be done using Himawari-9 satellite imagery, which provides infrared imagery data to detect the presence of clouds based on their peak temperature. However, this data is still raw and difficult for the general public to interpret. To address this issue, this study aims to design and develop a web-based ConvexView application capable of automatically visualizing Himawari-9 satellite imagery, specifically for the Cilacap region. This study also compares two Web Scraping techniques, BeautifulSoup and lxml, to determine the most optimal data extraction technique. The programming language used is Python, with FastAPI support for the backend and React JS for the frontend. The development method used is Prototyping, as it allows development to be carried out in stages and involves users in the design process. This application is designed so that users can view real-time visualizations of convective clouds and download the images. It is hoped that the results of this research will not only contribute in the form of a web-based application, but also serve as a reference for the development of satellite image-based meteorological applications in the future.
- Research Article
- 10.31004/riggs.v4i3.2282
- Aug 14, 2025
- RIGGS: Journal of Artificial Intelligence and Digital Business
- Ferdian Bangkit Wijaya + 1 more
This study addresses the critical disconnect between Indonesia's manufacturing sector and the digitally native Gen Z workforce, focusing on Banten Province, a region with high youth unemployment despite its industrial concentration. We quantitatively assess the digital presence of manufacturing companies on key recruitment platforms, including LinkedIn, Jobstreet, and the government's Karirhub portal, to quantify this gap. Using web scraping techniques with Python, company profile data was systematically collected and analyzed. The findings reveal a limited digital footprint, with overall company presence recorded at 43.24% on LinkedIn, 42.70% on Karirhub, and a notably lower 32.97% on Jobstreet. Significant disparities exist across subsectors; consumer-facing industries like Food, Beverage, and Automotive show high digital engagement, while sectors such as Textiles, Electronics, and Non-metallic Minerals lag considerably. Notably, the Tobacco industry was found exclusively on the government's Karirhub platform. This confirmed digital divide hinders effective talent acquisition and limits job seekers' access to credible information. We conclude that a strategic imperative exists for manufacturers to enhance their digital recruitment strategies. This is crucial not only for attracting Gen Z talent but also for aligning with Indonesia’s national digitalization agenda to reduce unemployment.
- Research Article
3
- 10.1111/edt.70011
- Aug 13, 2025
- Dental Traumatology
- Corinne Berger + 4 more
Background: Domestic abuse (DA) frequently results in injuries to the head, neck, and orofacial regions. Despite the visibility of these injuries, many survivors do not access formal medical or dental care because of fear, stigma, or systemic barriers. Reddit, an anonymous online platform, offers a unique opportunity to examine unfiltered victims/survivors' narratives shared in public forums. The aim of this study was to explore how DA, particularly its physical, psychological, and social impacts, was represented, perceived, and discussed on Reddit. Special attention was given to posts describing injuries to the head, neck, and orofacial region, to understand how victims/survivors narrated their experiences, sought support, and navigated disclosure in anonymous digital spaces. Methods: This study employed web scraping to analyze Reddit posts from four domestic abuse-related subreddits (r/AbuseInterrupted, r/DomesticAbuse, r/DomesticViolence, and r/domesticviolence) using Python's Reddit API Wrapper (PRAW). Posts were filtered using anatomical keywords relevant to dental and maxillofacial trauma. After cleaning and manual review, first-person accounts referencing injuries to the head, neck, or orofacial area underwent qualitative thematic analysis and quantitative content analysis. Results: A total of 588 Reddit posts related to DA were initially collected. Of the 588 posts, 153 (26.0%) met the inclusion criteria and were retained for analysis. Analysis of the 153 posts meeting the inclusion criteria revealed the most affected regions in DA victims, with frequent descriptions of physical abuse including slapping, grabbing, strangulation, and blunt-force trauma. Thematic analysis identified four central themes: (1) visible injuries, (2) barriers to accessing medical and dental care, (3) psychological and emotional consequences of abuse, and (4) inconsistent responses from healthcare and legal systems. Conclusions: Oral and maxillofacial injuries may serve as critical red flags of domestic abuse. Even when visible, they are often overlooked by healthcare providers. The findings of this study underscore the need for trauma-informed training among dental professionals and support the integration of domestic abuse screening protocols into routine oral health care. Additionally, the ethical use of web scraping presents a valuable tool for public health research by amplifying survivor voices and helping to identify intervention gaps that may be missed in clinical or institutional data.
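The keyword-filtering step described in the methods above might look roughly like the sketch below. It is an assumption-laden illustration, not the authors' code: the keyword list is hypothetical, the posts are plain strings rather than objects returned by PRAW, and the manual review that followed in the study is not modeled.

```python
import re

# Hypothetical anatomical keywords; the study's actual list is not given here.
KEYWORDS = ["jaw", "tooth", "teeth", "neck", "lip"]

# Whole-word, case-insensitive match so that "neck" does not hit "necklace".
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def filter_posts(posts):
    """Return only the posts that mention at least one keyword."""
    return [post for post in posts if PATTERN.search(post)]


# Invented sample texts standing in for collected post bodies.
sample_posts = [
    "My jaw still hurts",
    "I lost a necklace",
    "Chipped TOOTH last year",
]
matched = filter_posts(sample_posts)
```

Word-boundary matching keeps false positives down but misses plurals and misspellings, which is one reason a manual review step like the one the abstract describes remains useful after automated filtering.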
- Research Article
- 10.3103/s0005105525700670
- Aug 1, 2025
- Automatic Documentation and Mathematical Linguistics
- F M Grozovskiy + 1 more
The paper proposes an approach to the automated extraction and structuring of information from text, combining web scraping for data collection from online sources with a large language model for subsequent data mining. As a case study, texts from news publications on technology readiness levels from the CNews website were chosen to test the developed methodology in a specific domain. The model’s accuracy in identifying technology readiness assessments was 84–85%, which is comparable to similar results in other, less specialized tasks.
- Research Article
- 10.3145/epi.2025.ene.34202
- Jul 23, 2025
- Profesional de la información
- Xin-Hui Huang + 1 more
This study evaluates the completeness and representativeness of the SABI database, a widely used commercial source for firm-level data in Spain and Portugal, by comparing it to BORME, the official Spanish business register. Using web scraping techniques, we collected and processed approximately 100,000 BORME publications in PDF format, covering the period from 2010 to 2023. These were transformed into a structured dataset comprising over 1.2 million companies, which we then matched against SABI records from the same period. Our analysis reveals that SABI covers only 38.3% of newly established companies, with significant underrepresentation of younger firms, small enterprises, specific sectors, and certain regions. Furthermore, we find clear evidence of survivorship bias: the longer a company has been dissolved, the less likely it is to appear in SABI. Sectoral and geographic disparities are also substantial, and the coverage is skewed toward firms with higher initial capital and specific legal forms. These findings suggest that SABI represents a non-random subset of the Spanish business population, and caution should be exercised when using it for empirical research. Adjustments for sample bias are recommended to improve the reliability of analyses based on this database.
- Research Article
- 10.26650/d3ai.1665752
- Jul 18, 2025
- Journal of Data Analytics and Artificial Intelligence Applications
- Cemal Öztürk
A Journey Through Reviews: Exploring Machine Learning and Web Scraping to Decode Airline Sentiments