Twitter Archives and the Challenges of "Big Social Data" for Media and Communication Research

Abstract

Lists and Social Media

Lists have long been an ordering mechanism for computer-mediated social interaction. While far from being the first such mechanism, blogrolls offered an opportunity for bloggers to provide a list of their peers; the present generation of social media environments similarly provides lists of friends and followers. Where blogrolls and other earlier lists may have been user-generated, the social media lists of today are more likely to have been produced by the platforms themselves, and are of intrinsic value to the platform providers at least as much as to the users themselves; both Facebook and Twitter have highlighted the importance of their respective “social graphs” (their databases of user connections) as fundamental elements of their fledgling business models. This represents what Mejias describes as “nodocentrism,” which “renders all human interaction in terms of network dynamics (not just any network, but a digital network with a profit-driven infrastructure).”

The communicative content of social media spaces is also frequently rendered in the form of lists. Famously, blogs are defined in the first place by their reverse-chronological listing of posts (Walker Rettberg), but the same is true for current social media platforms: Twitter, Facebook, and other social media platforms are inherently centred around an infinite, constantly updated and extended list of posts made by individual users and their connections.

The concept of the list implies a certain degree of order, and the orderliness of content lists as provided through the latest generation of centralised social media platforms has also led to the development of more comprehensive and powerful, commercial as well as scholarly, research approaches to the study of social media.
Using the example of Twitter, this article discusses the challenges of such “big data” research as it draws on the content lists provided by proprietary social media platforms.

Twitter Archives for Research

Twitter is a particularly useful source of social media data: using the Twitter API (the Application Programming Interface, which provides structured access to communication data in standardised formats) it is possible, with a little effort and sufficient technical resources, for researchers to gather very large archives of public tweets concerned with a particular topic, theme or event. Essentially, the API delivers very long lists of hundreds, thousands, or millions of tweets, and metadata about those tweets; such data can then be sliced, diced and visualised in a wide range of ways, in order to understand the dynamics of social media communication. Such research is frequently oriented around pre-existing research questions, but is typically conducted at unprecedented scale. The projects of media and communication researchers such as Papacharissi and de Fatima Oliveira, Wood and Baughman, or Lotan, et al. (to name just a handful of recent examples) rely fundamentally on Twitter datasets which now routinely comprise millions of tweets and associated metadata, collected according to a wide range of criteria. What is common to all such cases, however, is the need to make new methodological choices in the processing and analysis of such large datasets on mediated social interaction.

Our own work is broadly concerned with understanding the role of social media in the contemporary media ecology, with a focus on the formation and dynamics of interest- and issues-based publics.
We have mined and analysed large archives of Twitter data to understand contemporary crisis communication (Bruns et al.), the role of social media in elections (Burgess and Bruns), and the nature of contemporary audience engagement with television entertainment and news media (Harrington, Highfield, and Bruns). Using a custom installation of the open source Twitter archiving tool yourTwapperkeeper, we capture and archive all the available tweets (and their associated metadata) containing a specified keyword (like “Olympics” or “dubstep”), name (Gillard, Bieber, Obama) or hashtag (#ausvotes, #royalwedding, #qldfloods). In their simplest form, such Twitter archives are commonly stored as delimited (e.g. comma- or tab-separated) text files, with each of the following values in a separate column:

  • text: contents of the tweet itself, in 140 characters or fewer
  • to_user_id: numerical ID of the tweet recipient (for @replies)
  • from_user: screen name of the tweet sender
  • id: numerical ID of the tweet itself
  • from_user_id: numerical ID of the tweet sender
  • iso_language_code: code (e.g. en, de, fr, ...) of the sender’s default language
  • source: client software used to tweet (e.g. Web, Tweetdeck, ...)
  • profile_image_url: URL of the tweet sender’s profile picture
  • geo_type: format of the sender’s geographical coordinates
  • geo_coordinates_0: first element of the geographical coordinates
  • geo_coordinates_1: second element of the geographical coordinates
  • created_at: tweet timestamp in human-readable format
  • time: tweet timestamp as a numerical Unix timestamp

In order to process the data, we typically run a number of our own scripts (written in the programming language Gawk) which manipulate or filter the records in various ways, and apply a series of temporal, qualitative and categorical metrics to the data, enabling us to discern patterns of activity over time, as well as to identify topics and themes, key actors, and the relations among them; in some circumstances we may also undertake further processes of filtering and close textual analysis of the content of the tweets. Network analysis (of the relationships among actors in a discussion, or among key themes) is undertaken using the open source application Gephi. While a detailed methodological discussion is beyond the scope of this article, further details and examples of our methods and tools for data analysis and visualisation, including copies of our Gawk scripts, are available on our comprehensive project website, Mapping Online Publics.

In this article, we reflect on the technical, epistemological and political challenges of such uses of large-scale Twitter archives within media and communication studies research, positioning this work in the context of the phenomenon that Lev Manovich has called “big social data.” In doing so, we recognise that our empirical work on Twitter is concerned with a complex research site that is itself shaped by a complex range of human and non-human actors, within a dynamic, indeed volatile media ecology (Fuller), and using data collection and analysis methods that are in themselves deeply embedded in this ecology.
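
As a rough illustration of the archive format and processing steps described in this section, the following Python sketch reads a tab-separated archive and derives three simple metrics: tweets per hour, the most active senders, and an @mention edge list of the kind that could be loaded into a network tool such as Gephi. It is not the authors' method (their scripts are written in Gawk and are not reproduced here). The function names, the presence of a header row, the tab delimiter, the Source/Target/Weight column headings, and the three sample tweets are all assumptions made for the example.

```python
import csv
import io
import re
from collections import Counter
from datetime import datetime, timezone

# Column names as listed in the text; a header row using them is an assumption.
FIELDS = ["text", "to_user_id", "from_user", "id", "from_user_id",
          "iso_language_code", "source", "profile_image_url", "geo_type",
          "geo_coordinates_0", "geo_coordinates_1", "created_at", "time"]

MENTION = re.compile(r"@(\w+)")  # deliberately simple @mention pattern


def sample_archive():
    """Build a tiny, invented tab-separated archive in memory."""
    tweets = [
        {"text": "Flood warnings updated @carol #qldfloods",
         "from_user": "alice", "time": "1294790400"},
        {"text": "RT @alice: Flood warnings updated @carol #qldfloods",
         "from_user": "bob", "time": "1294791000"},
        {"text": "Stay safe everyone #qldfloods",
         "from_user": "alice", "time": "1294794000"},
    ]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="",
                            delimiter="\t")
    writer.writeheader()
    writer.writerows(tweets)
    buffer.seek(0)
    return buffer


def read_archive(handle, delimiter="\t"):
    """Return an iterator of dicts, one per tweet, keyed by column name."""
    return csv.DictReader(handle, delimiter=delimiter)


def tweets_per_hour(rows):
    """Count tweets per UTC hour using the numerical Unix timestamp column."""
    counts = Counter()
    for row in rows:
        moment = datetime.fromtimestamp(int(row["time"]), tz=timezone.utc)
        counts[moment.strftime("%Y-%m-%d %H:00")] += 1
    return counts


def top_senders(rows, n=10):
    """Return the n most active senders as (screen name, tweet count) pairs."""
    return Counter(row["from_user"].lower() for row in rows).most_common(n)


def mention_edges(rows):
    """Count (sender, mentioned user) pairs found in the tweet text."""
    edges = Counter()
    for row in rows:
        sender = row["from_user"].lower()
        for target in MENTION.findall(row["text"]):
            edges[(sender, target.lower())] += 1
    return edges


def edges_to_csv(edges):
    """Serialise the edge counts as a Source/Target/Weight CSV string."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return out.getvalue()


if __name__ == "__main__":
    rows = list(read_archive(sample_archive()))
    print(tweets_per_hour(rows))
    print(top_senders(rows, 3))
    print(edges_to_csv(mention_edges(rows)))
```

To try the sketch on a real archive, `sample_archive()` would be replaced with an open file handle, for example `open(path, newline="", encoding="utf-8")`. The column names and delimiter may need adjusting to whatever the archiving tool actually exports, and retweets are counted here as ordinary mentions, which may or may not suit a given research question.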
“Big Social Data”

As Manovich’s term implies, the Big Data paradigm has recently arrived in media, communication and cultural studies, significantly later than it did in the hard sciences, in more traditionally computational branches of social science, and perhaps even in the first wave of digital humanities research (which largely applied computational methods to pre-existing, historical “big data” corpora). This shift has been provoked in large part by the dramatic quantitative growth and apparently increased cultural importance of social media; hence, “big social data.” As Manovich puts it:

    For the first time, we can follow [the] imaginations, opinions, ideas, and feelings of hundreds of millions of people. We can see the images and the videos they create and comment on, monitor the conversations they are engaged in, read their blog posts and tweets, navigate their maps, listen to their track lists, and follow their trajectories in physical space. (Manovich 461)

This moment has arrived in media, communication and cultural studies because of the increased scale of social media participation and the textual traces that this participation leaves behind, allowing researchers, equipped with digital tools and methods, to “study social and cultural processes and dynamics in new ways” (Manovich 461). However, and crucially for our purposes in this article, many of these scholarly possibilities would remain latent if it were not for the widespread availability of Open APIs for social software (including social media) platforms.
APIs are technical specifications of how one software application should access another, thereby allowing the embedding or cross-publishing of social content across Websites (so that your tweets can appear in your Facebook timeline, for example), or allowing third-party developers to build additional applications on social media platforms (like the Twitter user ranking service Klout), while also allowing platform owners to impose de facto regulation on such third-party uses via the same code. While platform providers do not necessarily have scholarship in mind, the data access affordances of APIs are also available for research purposes. As Manovich notes, until very recently almost all truly “big data” approaches to social media research had been undertaken by computer scientists (464). But as part of a broader “computational turn” in the digital humanities (Berry), and because of the increased availability to non-specialists of data access and analysis tools, media, communication and cultural studies scholars are beginning to catch up. Many of the new, large-scale research projects examining the societal uses and impacts of social media (including our own) that have been initiated by various media, communication, and cultural studies research leaders around the world have begun their work by taking stock of, and often substantially extending through new development, the range of available tools and methods for data analysis. The research infrastructure developed by such projects, therefore, now reflects their own disciplinary backgrounds at least as much as it does the fundamental principles of computer science. In turn, such new and often experimental tools and methods necessarily also provoke new epistemological and methodological challenges.

The Twitter API and Twitter Archives

The Open

Similar Papers
  • Research Article
  • Cite Count Icon 115
  • 10.5204/mcj.620
Mining One Percent of Twitter: Collections, Baselines, Sampling
  • Mar 2, 2013
  • M/C Journal
  • Carolin Gerlitz + 1 more

The objective of the paper is to reflect on the affordances of different techniques for making Twitter collections and to suggest the use of a random sampling technique, made possible by Twitter’s Streaming API (Application Programming Interface), for baselining, scoping, and contextualising practices and issues. It discusses this technique by analysing a one per cent sample of all tweets posted during a 24-hour period and introducing a number of analytical directions considered useful for qualifying some of the core elements of the platform, in particular hashtags. To situate the proposal, the report first discusses how platforms propose particular affordances but leave considerable margins for the emergence of a wide variety of practices. This argument is then related to the question of how medium and sampling technique are intrinsically connected. Background Social media platforms present numerous challenges to empirical research, making it different from researching cases in offline environments, but also different from studying the “open” Web. Because of the limited access possibilities and the sheer size of platforms like Facebook or Twitter, the question of delimitation, i.e. the selection of subsets to analyse, is particularly relevant. Whilst sampling techniques have been thoroughly discussed in the context of social science research, sampling procedures in the context of social media analysis are far from being fully understood. Even for Twitter, a platform having received considerable attention from empirical researchers due to its relative openness to data collection, methodology is largely emergent. In particular the question of how smaller collections relate to the entirety of activities of the platform is quite unclear. 
Recent work comparing case based studies to gain a broader picture and the development of graph theoretical methods for sampling are certainly steps in the right direction, but it seems that truly large-scale Twitter studies are limited to computer science departments, where epistemic orientation can differ considerably from work done in the humanities and social sciences.

  • Research Article
  • Cite Count Icon 24
  • 10.5210/fm.v21i5.6358
A scholarly divide: Social media, Big Data, and unattainable scholarship
  • Apr 24, 2016
  • First Monday
  • Asta Zelenkauskaite + 1 more

Recent decades have witnessed an increased growth in data generated by information, communication, and technological systems, giving birth to the ‘Big Data’ paradigm. Despite the profusion of raw data being captured by social media platforms, Big Data require specialized skills to parse and analyze — and even with the requisite skills, social media data are not readily available to download. Thus, the Big Data paradigm has not produced a coincidental explosion of research opportunities for the typical scholar. The promising world of unprecedented precision and predictive accuracy that Big Data conjure remains out of reach for most communication and technology researchers, a problem that traditional platforms, namely mass media, did not present. In this paper, we evaluate the system architecture that supports the storage and retrieval of big social data, distinguishing between overt and covert data types, and how both the cost and control of social media data limit opportunities for research. Ultimately, we illuminate a curious but growing ‘scholarly divide’ between researchers with the technical know-how, funding, or institutional connections to extract big social data and the mass of researchers who merely hear big social data invoked as the latest, exciting trend in unattainable scholarship.

  • Book Chapter
  • Cite Count Icon 4
  • 10.1007/978-94-024-1202-4_3-1
Big Social Data Approaches in Internet Studies: The Case of Twitter
  • Jan 1, 2018
  • Axel Bruns

Well beyond Internet Studies itself, but arguably led by it to a considerable extent, there has been a turn towards computational methods in the study of social and communicative phenomena at large scale. This “computational turn” has commonly been described as a turn towards “big data” or, more specifically, towards “big social data,” and it continues to drive the development of new research methodologies, approaches, and tools. Internet Studies has been an advocate of “big data” approaches, because the field connects several core disciplines that use “big data” methods – media, communication and cultural studies, the social sciences, and computer science. Equally, the major objects of research in Internet Studies – including platforms, search engines, mobile apps and devices, and Internet technologies and networks themselves – are key sources of “big data” on user interests, attitudes, and activities. Proponents of such approaches suggest that it is becoming possible to “study society with the Internet,” while others ask critical questions about which observations are privileged and which are discounted as the logic of “big data” influences research agendas. The early development and application of “big social data” research methods in Internet Studies, as well as critical interrogations of such approaches, focused especially on research into Twitter as a global social media platform. This is largely due to Twitter’s (initially) highly accessible application programming interface (API), which enabled the development of powerful research methods and the promise of large, sometimes real-time, datasets tracing patterns of user activity around specific themes and topics on the platform, as well as, by proxy, in wider society. 
Twitter’s tightening of API access serves as a reminder of the precarious nature of “big social data” research drawing on proprietary datasets, just as concerns about the use of social media data for the social profiling of individual users raise questions about research ethics and user privacy. The growing body of “big data” research drawing on Twitter as a data source has paradoxically also underlined the many limitations and blind spots of such approaches, as researchers drawing on publicly available API data struggle to place their findings in the context of a platform whose overall global shape is shrouded in considerably more mystery, due to Twitter, Inc.’s interest in keeping aspects of the platform and its user community commercial-in-confidence. The increased work in this field also highlights shortcomings in research training and publishing models, which need to be addressed to further develop “big social data” research. This chapter outlines the current state of the art in computationally driven Twitter research, using platform-specific research as a case study for the computational turn in Internet Studies. It will consider the opportunities and challenges inherent in this shift toward more data-driven research and outline the key needs for the discipline which have emerged to date. Even as Twitter’s own fortunes fluctuate, the experiences made in this branch of Internet Studies stand as a guide for broader developments in our field.

  • PDF Download Icon
  • Preprint Article
  • 10.7287/peerj.preprints.1107v1
Social media as a big public health data source: review of the international bibliography
  • May 21, 2015
  • Evika Karamagioli

Background: As the use of social media creates huge amounts of data, the need for big data analysis has to synthesize the information and determine which actions is generated. Online communication channels such as Facebook, Twitter, Instagram etc provide a wealth of passively collected data that may be mined for public health purposes such as health surveillance, health crisis management, and last but not least health promotion and education. Objective: We explore international bibliography on the potential role and perceptive of use for social media as a big data source for public health purposes. Method: Systematic literature review. Data extraction and synthesis was performed with the use of thematic analysis. Results: Examples of those currently collecting and analyzing big data from generated social content include scientists who are working with the Centers for Disease Control and Prevention to track the spread of flu by analyzing what user searches, and the World Health Organization is working on disaster management relief. But what exactly do we do with this big social media data? We can track real-time trends and understand them quicker through the platforms and processing services. By processing this big social media data, it is possible to determine specific patterns in conversation topics, users behaviors, overall trends and influencers, sociodemographic characteristics, lifestyle behaviors, and social and cultural constructs. Conclusion: The key to fostering big data and social media converge is process and analyze the right data that may be mined for purposes of public health, so as to provide strategic insights for planning, execution and measurement of effective and efficient public health interventions. In this effort, political, economic and legal obstacles need to be seriously considered.

  • Single Book
  • Cite Count Icon 41
  • 10.1201/b19513
Graph-Based Social Media Analysis
  • Apr 19, 2016
  • Ioannis Pitas

Focused on the mathematical foundations of social media analysis, Graph-Based Social Media Analysis provides a comprehensive introduction to the use of graph analysis in the study of social and digital media. It addresses an important scientific and technological challenge, namely the confluence of graph analysis and network theory with linear algebra, digital media, machine learning, big data analysis, and signal processing. Supplying an overview of graph-based social media analysis, the book provides readers with a clear understanding of social media structure. It uses graph theory, particularly the algebraic description and analysis of graphs, in social media studies. The book emphasizes the big data aspects of social and digital media. It presents various approaches to storing vast amounts of data online and retrieving that data in real-time. It demystifies complex social media phenomena, such as information diffusion, marketing and recommendation systems in social media, and evolving systems. It also covers emerging trends, such as big data analysis and social media evolution. Describing how to conduct proper analysis of the social and digital media markets, the book provides insights into processing, storing, and visualizing big social media data and social graphs. It includes coverage of graphs in social and digital media, graph and hyper-graph fundamentals, mathematical foundations coming from linear algebra, algebraic graph analysis, graph clustering, community detection, graph matching, web search based on ranking, label propagation and diffusion in social media, graph-based pattern recognition and machine learning, graph-based pattern classification and dimensionality reduction, and much more. This book is an ideal reference for scientists and engineers working in social media and digital media production and distribution. It is also suitable for use as a textbook in undergraduate or graduate courses on digital media, social media, or social networks.

  • Research Article
  • Cite Count Icon 18
  • 10.5204/mcj.1379
Alts and Automediality: Compartmentalising the Self through Multiple Social Media Profiles
  • Apr 25, 2018
  • M/C Journal
  • Emily Van Der Nagel

IntroductionAlt, or alternative, accounts are secondary profiles people use in addition to a main account on a social media platform. They are a kind of automediation, a way of representing the self, that deliberately displays a different identity facet, and addresses a different audience, to what someone considers to be their main account. The term “alt” seems to have originated from videogame culture and been incorporated into understandings of social media accounts. A wiki page about alternate accounts on virtual world Second Life calls an alt “an account used by a resident for something other than their usual activity or to do things in privacy” (n.p.).Studying alts gives an insight into practices of managing and contextualising identities on networked platforms that are visible, persistent, editable, associable (Treem and Leonardi), spreadable, searchable (boyd), shareable (Papacharissi "Without"), and personalised (Schmidt). When these features of social media are understood as limitations that lead to context collapse (Marwick and boyd 122; Wesch 23), performative incoherence (Papacharissi Affective 99), and the risk of overexposure, people respond by developing alternative ways to use platforms.Plenty of scholarship on social media identities claims the self is fragmented, multifaceted, and contextual (Marwick 355; Schmidt 369). But the scholarship on multiple account use on single platforms is still emerging. Joanne Orlando writes for The Conversation that teens increasingly have more than one account on Instagram: “finstas” are “fake” or secondary accounts used to post especially candid photos to a smaller audience, thus they are deployed strategically to avoid the social pressure of looking polished and attractive. These accounts are referred to as “fake” because they are often pseudonymous, but the practice of compartmentalising audiences makes the promise that the photos posted are more authentic, spontaneous, and intimate. 
Kylie Cardell, Kate Douglas, and Emma Maguire (162) argue that while secondary accounts promise a less constructed version of life, speaking back to the dominant genre of aesthetically pleasing Instagram photos, all social media posts are constructed within the context of platform norms and imagined audiences (Litt & Hargittai 1). Still, secondary accounts are important for revealing these norms (Cardell, Douglas & Maguire 163). The secondary account is particularly prevalent on Twitter, a platform that often brings together multiple audiences into a public profile. In 2015, author Emily Reynolds claimed that Twitter alts were “an appealingly safe space compared to main Twitter where abuse, arguments and insincerity are rife” (n.p.).This paper draws on a survey of Twitter users with alts to argue that the strategic use of pseudonyms, profile photos without faces, locked accounts, and smaller audiences are ways to overcome some of the built-in limitations of social media automediality.Identity Is Multiple Chris Poole, founder of anonymous bulletin board 4chan, believes identity is a fluid concept, and designed his platform as a space in which people could connect over interests, not profiles. Positioning 4chan against real-name platforms, he argues:Your identity is prismatic […] we’re all multifaceted people. Google and Facebook would have you believe that you’re a mirror, that there is one reflection that you have, there is one idea of self. But in fact we’re more like diamonds. You can look at people from any angle and see something totally different, but they’re still the same. 
(n.p.)Claiming that identities are contextual performances stems from longstanding sociological and philosophical work on identity from theorists like Erving Goffman, who in the 1950s proposed a dramaturgical framework of the self to consider interactions as fundamentally social and performative rather than reflecting one core, essential inner self.Social media profiles allow people to use the language of the platform to represent themselves (Marwick 362), meaning identity performances are framed by platform architecture and features, formal and informal rules, and social ties (Schmidt 369). Social media profiles shape how people can engage in how they represent themselves, argue Shelly Farnham and Elizabeth Churchill, who claim that the assumption that a single, unified online identity is sufficient is a problematic trend in platform design. They argue that when facets of their lives are incompatible, people segment those lives into separate areas in order to maintain social norms and boundaries.Sidonie Smith and Julia Watson consider identity multiplicities to be crucial to automediality, which is built on an aesthetic of bricolage and pastiche rather than understanding subjectivity to be the essence of the self. In her work on automediality and online girlhood, Maguire ("Home"; "Self-Branding" 74) argues that an automedial approach attends to how mediation shapes the way selves can be represented online, claiming that the self is brought into being through these mediation practices.This article understands alt accounts as a type of social media practice that Nick Couldry (52) identifies as presencing: sustaining a public presence with media. I investigate presencing through studying alts as a way to manage separate publics, and the tension between public and private, on Twitter by surveying users who have a main and an alt account. 
Although research into multiple account use is nascent, Alice Marwick lists maintaining multiple accounts as a tactic to mitigate context collapse, alongside other strategies such as using nicknames, only sharing posts when they are appropriate for multiple audiences, and keeping more personal interactions to private messenger and text message.Ben Light argues that while connection is privileged on social media, disconnective practices like editing out, deleting, unfriending, untagging, rejecting follower requests, and in this case, creating alt accounts, are crucial. Disconnecting from some aspects of the social media experience allows people to stay connected on a particular platform, by negotiating the dynamics that do not appeal to them. While the disconnective practice of presencing through an alt has not been studied in detail, research I discuss in the next section focuses on multi-account use to argue that people who have more than one account on a single platform are aware of their audiences, and want control over which people see which posts.Multi-Platform and Multi-Account UseA conference presentation by Frederic Stutzman and Woodrow Hartzog calls maintaining multiple profiles on a single platform a strategy for boundary regulation, through which access is selectively granted to specific people. Stutzman and Hartzog interviewed 20 people with multiple profiles to determine four main motives for this kind of boundary regulation: privacy, identity management, utility (using one profile for a distinct purpose, like managing a restaurant page), and propriety (conforming to social norms around appropriate disclosure).Writing about multiple profiles on Reddit, Alex Leavitt argues that temporary or “throwaway” accounts give people the chance to disclose sensitive or off-topic information. For example, some women use throwaways when posting to a bra sizing subreddit, so men don’t exploit their main account for sexual purposes. 
Throwaways are a boundary management technique Leavitt considers beneficial for Redditors, and urges platform designers to consider implementing alternatives to single accounts.Jessa Lingel and Adam Golub also call for platforms to allow for multiple accounts, suggesting Facebook should let users link their profiles at a metadata level and be able to switch between them. They argue that this would be especially beneficial for those who take on specific personas, such as drag queens. In their study of drag queens with more than one Facebook profile, Lingel and Golub suggest that drag queens need to maintain boundaries between fans and friends, but creating a separate business page for their identity as a performer was inadequate for the kind of nuanced personal communication they engaged in with their fans. Drag queens considered this kind of communication relationship maintenance, not self-branding. This demonstrates that drag queens on Facebook are attentive to their audience, which is a common feature of users posting to social media: they have an idea, no matter how accurate, of who they are posting to.Eden Litt and Eszter Hargittai (1) call this perception the imagined audience, which serves as a guide for how to present the self and what to post about when an audience is unknown or not physically present. 
People in their study would either claim they were posting to no-one in particular, or that they had an audience in mind, whether this was personal ties (close friends, family, specific individuals like a best friend), communal ties (people interested in cleaning tips, local art community, people in Portland), professional ties (colleagues, clients, my radio show audience), and phantasmal ties (people with whom someone has an imaginary relationship, like famous people, brands, animals, and the dead).Based on these studies of boundary regulation, throwaway accounts, separate Facebook pages for fans and friends, and imagined audiences on social media, I designed a short survey that would prompt respondents to reflect on their own practices of negotiating platform limitations through their alt account.Asking Twitter about AltsTo research alts, I asked my own Twitter followers to tell me about theirs. I’ve been tweeting from @emvdn since 2010, and I have roughly 5,500 followers, mostly Melbourne academics, writers, and professionals. This method of asking my own Twitter followers questions builds on a study by Alice Marwick and danah boyd, in which they investigated context collapse on social media by tweeting questions like “w

  • Research Article
  • Cite Count Icon 6
  • 10.1002/isd2.12179
Ethics, big social data, data sharing, and attitude among the millennial generation: A case of Thailand
  • Apr 20, 2021
  • THE ELECTRONIC JOURNAL OF INFORMATION SYSTEMS IN DEVELOPING COUNTRIES
  • Suttisak Jantavongso + 1 more

Big social data and digital technologies create tremendous opportunities but raise questions and concerns on ethical data usage and sharing. Moreover, big social data plays a vital role in Thailand's 20‐year national strategy to turn Thailand into a developed nation by 2037, especially on security and human capital development strategies. Nonetheless, the progress in big social data must go hand‐in‐hand with ethical standards. To date, there are no universal ethical criteria for big social data sharing and governance. This study investigates the ethical issues of big data in social media. It maps big social data to workable ethical theories. The model of big social data sharing factors was proposed. Using Thailand as a case study, the exploratory study examined the digital behaviors and moral perceptions of the millennials' big social data sharing through 71 in‐depth interviews. The results revealed a strong pattern toward “ethical consequentialism” among the Thai millennials. Examining these findings fosters the formation of big social data ethics from the views of the data generators. This study has attempted to contribute to scholarship in the growing body of work on appropriate ethical guidelines for big social data sharing and help Thailand achieve its national strategy.

  • Research Article
  • Citations: 66
  • 10.2196/jmir.7634
Images of Little Cigars and Cigarillos on Instagram Identified by the Hashtag #swisher: Thematic Analysis
  • Jul 14, 2017
  • Journal of Medical Internet Research
  • Jon-Patrick Allem + 4 more

Background: Little cigar and cigarillo use is becoming more prevalent in the United States and elsewhere, with implications for public health. As little cigar and cigarillo use grows in popularity, big social media data (eg, Instagram, Google Web Search, Twitter) can be used to capture and document the context in which individuals use, and are marketed, these tobacco products. Big social media data may allow people to organically demonstrate how and why they use little cigars and cigarillos, unprimed by a researcher, without instrument bias and at low costs.
Objective: This study characterized Swisher (the most popular brand of cigars in the United States, controlling over 75% of the market share) little cigar- and cigarillo-related posts on Instagram to inform the design of tobacco education campaigns and the development of future tobacco control efforts, and to demonstrate the utility in using big social media data in understanding health behaviors.
Methods: We collected images from Instagram, an image-based social media app allowing users to capture, customize, and post photos on the Internet with over 400 million active users. Inclusion criteria for this study consisted of an Instagram post with the hashtag “#swisher”. We established rules for coding themes of images.
Results: Of 1967 images collected, 486 (24.71%) were marijuana related, 348 (17.69%) were of tobacco products or promotional material, 324 (16.47%) showed individuals smoking, 225 (11.44%) were memes, and 584 (29.69%) were classified as other (eg, selfies, food, sexually explicit images). Of the marijuana-related images, 157/486 (32.3%) contained a Swisher wrapper, indicating that a Swisher product was used in blunt making, which involves hollowing out a cigar and refilling it with marijuana.
Conclusions: Images from Instagram may be used to complement and extend the study of health behaviors including tobacco use. Images may be as valuable as, or more valuable than, words from other social media platforms alone. Posts on Instagram showing Swisher products, including blunt making, could add to the normalization of little cigar and cigarillo use and is an area of future research. Tobacco control researchers should design social media campaigns to combat smoking imagery found on popular sites such as Instagram.

  • Book Chapter
  • Citations: 1
  • 10.1007/978-94-024-1202-4_15-1
Blended Data: Critiquing and Complementing Social Media Datasets, Big and Small
  • Jan 1, 2018
  • Sky Croeser + 1 more

Internet research, and especially social media research, has benefited from concurrent factors, technological and analytical, that have enabled access to vast amounts of user data and content online. These trends have accompanied a prevalence of Big Data studies of online activity, as researchers gather datasets featuring millions of tweets, for instance – here, Big Data is a reference not solely to the size of datasets but to the wider practices and research cultures around large-scale and exhaustive (and often ongoing) capture of data from large groups, often (but not always) studied quantitatively (see Kitchin and Lauriault 2014a; Crawford et al. 2014). However, the accessibility of “big social data” (Manovich 2012) for Internet studies research is not without its limitations and challenges, and while extensive datasets enable valuable research, combining them with small data can provide more rounded perspectives and encourage us to think more about what we are studying. Similarly, privileging the online-only or the quantitative analysis of social media activity may overlook or mask key practices and relevant participants not present within the datasets. We argue for a blended data model as a critique and complement for different social media datasets, drawing in part on our research into social movements and activists’ use (and non-use) of online technologies. Together, these approaches may overcome and negotiate the respective limits and challenges of social media data, both big and small.

  • Research Article
  • Citations: 2
  • 10.2218/ijdc.v17i1.823
Data Curation Strategies to Support Responsible Big Social Research and Big Social Data Reuse
  • Dec 6, 2022
  • International Journal of Digital Curation
  • Sara Mannheimer

Big social research repurposes existing data from online sources such as social media, blogs, or online forums, with a goal of advancing knowledge of human behavior and social phenomena. Big social research also presents an array of challenges that can prevent data sharing and reuse. This brief report presents an overview of a larger study that aims to understand the data curation implications of big social research to support use and reuse of big social data. The study, which is based in the United States, identifies six key issues relating to big social research and big social data curation through a review of the literature. It then further investigates perceptions and practices relating to these six key issues through semi-structured interviews with big social researchers and data curators. This report concludes with implications for data curation practice: metadata and documentation, connecting with researchers throughout the research process, data repository services, and advocating for community standards. Supporting responsible practices for using big social data can help scale up social science research, thus enhancing our understanding of human behavior and social phenomena.

  • Research Article
  • Citations: 9
  • 10.5250/resilience.5.2.0172
The Digital Anthropocene, Deep Mapping, and Environmental Humanities' Big Data
  • Jan 1, 2018
  • Resilience: A Journal of the Environmental Humanities
  • Charles Travis

Over the past two hundred years, the development of the steam engine, the mass burning of coal during the Industrial Revolution, the detonation of the atomic bomb in 1945, and global carbon dioxide emissions over the last half century are all manifestations of human-technological agencies that have culminated into a cultural crisis ushering us out of the Holocene and into the Anthropocene. As we advance into the twenty-first century, our use of social media, smartphones and smart-watches, X-Boxes, tablets, and laptops have transformed us into living, breathing remote sensors and unwitting environmental actors. We are now spawning digital wildfires; churning out oceans of big data; and in our quotidian existences, inaugurating what can be called the digital Anthropocene. This confluence of the digital revolution, the dilemma of climate change, and sociopolitical agency and violence has us reconsidering human-environmental relations by raising questions about the interplay between digital, social, psychological, built, and natural landscapes. As Finn Arne Jørgensen notes, the "idea of nature is becoming very hard to separate from the digital tools and media we use to observe, interpret, and manage it" (2014, 109). The intermeshing of analogue, digital, and natural environments captures this new human dispensation and was presciently anticipated by political theorist Hannah Arendt in Between Past and Future: "The world we have come to live in, however, is much more determined by man acting into nature, creating natural processes and directing them into the human artifice and the realm of human affairs" (1961, 59). Arendt's phenomenological thought resonates with the "wicked problems," "humanities innovations," and "interdependencies" articulated by the "Common Threads" page of the Andrew W. Mellon–funded Humanities for the Environment project. 
This essay will discuss a technophenomenological deep mapping of James Joyce's Ulysses (1922) to explore how the novel and its traces of the Odyssey and the Inferno, when scripted digitally, enabled big-data social media performances at Bloomsday in contemporary Dublin. Spanning the classical, medieval, and modern eras, the arc of works composed by Homer, Dante, and Joyce, approximate the "three humanisms" of occidental history posited by Claude Lévi-Strauss in the 1950s (the rediscovery of the Greco-Roman, the repurposing of the humanistic perspective, and the discovery of everyday experience). Currently, digital humanism, coined by Milad Doueihi (2013), acts as a fourth convergence of the world's complex cultural heritage and technology and is changing relations between territory, knowledge, and habitat. This underscores the salience of Bethany Nowviskie's observation that the "rhetorical, technological, aesthetic, and deeply personal, sometimes even sentimental, struggles brought into focus by the Anthropocene […] prompt us to position the work of the digital humanities in time" (2014). The digital humanities' first wave (1980s–2010) witnessed the digitization of historical, cultural, literary, and artistic collections, facilitating online research methods and pedagogy, which dovetailed with a second wave (2002–2012) of humanities-computing quantification exercises, digital parsing, analysis, and visualization projects. Currently, a third wave (2012–2020) is cresting with the ontological tide turning, as humanities discourses and tropes are now beginning to shape emerging coding and software applications. 
The digital and environmental humanities are coming into league with smartphone applications, gaming platforms, tablets, and the visual and performing arts to force trans-disciplinary encounters between fields as diverse as human cognition, environmental studies, genetics, bioinformatics, linguistics, gaming, architecture, philosophy, social media, literature, painting, and history (MacTavish and Rockwell 2006; Liu and Thomas 2012; Travis 2015). Influenced by narrative, storytelling, cinematic, gaming, and network analysis techniques, these digital and environmental humanities practices represent the fluidity of human-environmental symbiosis captured by the concept of the Anthropocene, in contrast to the static snapshots of human-environmental binaries portrayed within the frame of the Holocene. Nowviskie states that there is a strong possibility for connecting such "technologies and patterns of work in the humanities to deep time: both to times long past and very far in prospect" (2014). Similar lessons in how to plumb the depths of the Anthropocene can be learned from the Native American writer William Least Heat-Moon, who first employed deep...

  • Research Article
  • Citations: 1
  • 10.1360/n972014-00292
Overlapped user-based cross-network analysis: Exploring variety in big social media data
  • Dec 1, 2014
  • Chinese Science Bulletin
  • DongYuan LU + 2 more

Social media contributes much to big data. Among the 4V characteristics of big data, this article focuses on investigating the variety in big social media data. Social media variety mainly concerns the heterogeneous user behaviors in different social media networks. Understanding social media variety plays an important role in insightful social media analysis and comprehensive social media applications. Social media is typically generated by users and designed for user services. We propose to explore social media variety by investigating the overlapped users between different social media networks. Two problems are discussed: (1) cross-network user modeling, where the scattered user behaviors are integrated for complete user modeling and personalized service development; (2) heterogeneous knowledge association, where the overlapped users serve as a bridge to mine the cross-network knowledge association, which is then applied in social media collaborative applications.

  • Research Article
  • Citations: 156
  • 10.1108/jkm-07-2015-0296
Managing extracted knowledge from big social media data for business decision making
  • Apr 3, 2017
  • Journal of Knowledge Management
  • Wu He + 2 more

Purpose: This paper aims to propose a knowledge management (KM) framework for leveraging big social media data to help interested organizations integrate Big Data technology, social media and KM systems to store, share and leverage their social media data. Specifically, this research focuses on extracting valuable knowledge on social media by contextually comparing social media knowledge among competitors.
Design/methodology/approach: A case study was conducted to analyze nearly one million Twitter messages associated with five large companies in the retail industry (Costco, Walmart, Kmart, Kohl’s and The Home Depot) to extract and generate new knowledge and to derive business decisions from big social media data.
Findings: This case study confirms that this proposed framework is sensible and useful in terms of integrating Big Data technology, social media and KM in a cohesive way to design a KM system and its process. Extracted knowledge is presented visually in a variety of ways to discover business intelligence.
Originality/value: Practical guidance for integrating Big Data, social media and KM is scarce. This proposed framework is a pioneering effort in using Big Data technologies to extract valuable knowledge on social media and discover business intelligence by contextually comparing social media knowledge among competitors.

  • Research Article
  • 10.5465/ambpp.2017.15007symposium
At the Interface of Social Media Analytics, Big Data and Social Movements: Research Challenges
  • Aug 1, 2017
  • Academy of Management Proceedings
  • Pratyush Bharati + 3 more

Social media is being employed to build support for social, economic, and political justice (Selander et al. 2016; Vaast et al. 2014) in movements such as Occupy Wall Street, the violence against women movement, and the global sustainability movement. These movements have used social media in ways that go beyond simple communications. As social media allows people to produce and share user-generated content, it enables a certain set of affordances of these technologies (Bharati et al 2015; Bharati et al 2014). The social media affordances, available to both collective and individual actors, translate into capabilities afforded to social movements (Tufekci 2014). Research on connective action has examined the effects of digital action repertoires on interaction and engagement such as in the Tea Party and Occupy movements (Agarwal et al. 2014; Selander et al. 2016). Social media can also facilitate mobilization of movements and participation by new volunteers and oftentimes provides a transnational character by diffusing actions beyond the virtual (Van Laer and Van Aelst 2010). Conversation, an essential part of social movements, shapes “social life by altering individual and collective understandings, by creating and transforming social ties, by generating cultural materials that are then available for subsequent social interchange, and by establishing, obliterating, or shifting commitments on the part of participants” (Tilly 2002, p. 122). Personal interaction that involves repeated organized interactions between individuals typically leads to shared values and trust. The role of social media technologies in furthering this conversation has to be studied and its influence on social movements ascertained. A few scholars have started to investigate social media affordances and capabilities, especially focused on discourse, during contentious collective action. Still, the research has been limited to studying mechanisms of participation, development of a sense of collective identity, creation of community, and framing of political discourse (Farrell 2012; Garrett 2006). Social media data on social movements can range from impersonal “like” and “share” actions to more engaged conversations. Twitter, Instagram, Facebook, and YouTube offer a wide communicative and discourse reach in networks with text, image, audio, and video data. This social media based big data, consisting mostly of unstructured data, comes in the form of social media posts, digital pictures and videos. The symposium will discuss how formal organizational structures and practices might be integrated with social media capabilities to reinforce and enhance social movement organizations as leaders in social change movements. It will explore the characteristics of social media discourse and assess the dynamics of social movements and their subsequent impact on real-world protests. The symposium will also demonstrate how discourse analysis can be applied visually in order to understand communication patterns. The panel symposium will focus on theoretical and methodological challenges of social media analytics, big data and social movements. Panelists will engage the audience in an interactive discussion on:
1) Theoretical challenges:
   a. How and why are social movement recruitment and engagement mechanisms being impacted as a result of social media?
   b. How do we advance theory on social media and social movements when we are overwhelmed with social media based big data?
   c. What approach should we undertake if big data analysis contradicts most theories on social media and social movements?
   d. How do we address the issue of generalizability of social media and social movement research when data collection was limited to one social media platform, albeit involving big data?
2) Methodological challenges:
   a. What methodological approaches have worked in the analysis of social media, big data and social movements?
   b. How can discourse analysis be applied visually in order to understand communication patterns evident in social media-based big data?
   c. How can we employ social media analytics to investigate image and video data?
   d. What combinations of qualitative and quantitative methodologies can be employed for big data and social media analytics in the context of social movements?
   e. What are the limitations of quantitative data analysis techniques, such as structural equation modeling, because of an extremely large sample size?
   f. What are the limitations of qualitative data analysis techniques as they become extremely labor intensive and, maybe even, impractical because of big data?

  • Front Matter
  • Citations: 52
  • 10.1016/j.ophtha.2019.02.015
Navigating Social Media in #Ophthalmology
  • May 20, 2019
  • Ophthalmology
  • Edmund Tsui + 1 more

