Abstract
This study aimed to cluster and analyze tweets associated with Petrobras, exploring their content and the profiles of the users posting them to understand their impact on financial markets. The workflow comprised data collection from the Twitter API (now X), preprocessing of tweets using Python libraries, word vectorization via Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF), Principal Component Analysis (PCA) to reduce the dimensionality of the resulting matrix, and K-means clustering. A total of 840 preprocessed tweets were clustered and analyzed for patterns related to Petrobras. The initial analysis, run without dimensionality reduction, identified five clusters with differing characteristics, while the subsequent PCA-based analysis yielded three clusters with contrasting themes. In the PCA-based analysis, cluster 0 grouped tweets about the market and the economy, while cluster 1 was related to political concerns. Limitations included reliance on publicly available Twitter data, constraints arising from the quantity and nature of the tweets, and potential biases in sentiment analysis due to informal language and sarcasm. The research underscores the potential of unsupervised machine learning techniques for analyzing sentiment and user profiles related to financial markets. Insights derived from tweet clustering could help investors gauge market sentiment.
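The vectorization, dimensionality reduction, and clustering steps described above can be illustrated with a minimal sketch. It assumes scikit-learn, which the abstract does not name; the sample tweets, the function name `cluster_tweets`, and all parameter values are hypothetical and are not taken from the study.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical, already-preprocessed tweets (illustrative only, not study data).
tweets = [
    "petrobras shares rise as oil price climbs",
    "oil price rally lifts petrobras stock",
    "investors buy petrobras shares after dividend news",
    "government pressures petrobras over fuel policy",
    "president criticizes petrobras fuel policy",
    "congress debates government control of petrobras",
]


def cluster_tweets(texts, n_clusters=2, n_components=2, seed=0):
    """Return one cluster label per text using TF-IDF -> PCA -> K-means."""
    # Build the sparse document-term matrix with TF-IDF weights.
    tfidf = TfidfVectorizer().fit_transform(texts)
    # Project onto the leading principal components; PCA is given a dense
    # array here, so the sparse matrix is converted first.
    pca = PCA(n_components=n_components, random_state=seed)
    reduced = pca.fit_transform(tfidf.toarray())
    # Partition the reduced vectors into n_clusters groups.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return kmeans.fit_predict(reduced)


labels = cluster_tweets(tweets)
for label, text in zip(labels, tweets):
    print(label, text)
```

In this sketch the number of clusters and components is fixed by hand. The abstract does not say how the study selected five and three clusters, so any selection procedure (for example, an elbow plot or silhouette score) would be a further assumption.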