Understanding and Reducing the Challenges Faced by Creators of Accessible Online Data Visualizations
We sought to understand and reduce the challenges creators face in making their data visualizations accessible. Specifically, we administered a formative survey of 57 creators to understand their challenges, perceived importance, knowledge, and prioritization of data visualization accessibility. Participants identified five interventions to minimize their challenges: Workshops, Emulators, Evaluators, Feedback Collectors, and Multi-Modal Automated Tools. Additionally, we report specifications and recommendations from 12 visualization creators, gathered via semi-structured interviews, for effective versions of each intervention. Utilizing our findings, such as a “mini-survey” format that is effective for collecting accessibility-related feedback from screen-reader users, we implemented and integrated these interventions into VoxLens (Sharif et al., 2022). We assessed our enhancements through a task-based user study with 10 visualization creators, finding 44%, 17%, and 12% improvements in their understanding of screen-reader users’ challenges with data visualizations, knowledge of visualization accessibility, and perceived usefulness of the enhanced VoxLens, respectively.
- Research Article
20
- 10.1109/tpds.2011.256
- Jun 1, 2012
- IEEE Transactions on Parallel and Distributed Systems
Current scientific applications produce large amounts of data, and the processing, handling and analysis of such data require large-scale computing infrastructures such as clusters and grids. Studies in this area aim to improve the performance of data-intensive applications by optimizing data accesses. To achieve this goal, distributed storage systems have considered techniques of data replication, migration, distribution, and access parallelism. The main drawback of those studies, however, is that they do not take application behavior into account when optimizing data access. This limitation motivated this paper, which applies strategies to support the online prediction of application behavior in order to optimize data access operations on distributed systems, without requiring any information on past executions. To accomplish this goal, the approach organizes application behaviors as time series and then analyzes and classifies those series according to their properties. Knowing these properties, the approach selects modeling techniques to represent the series and perform predictions, which are later used to optimize data access operations. This new approach was implemented and evaluated using the OptorSim simulator, sponsored by the LHC-CERN project and widely employed by the scientific community. Experiments confirm that the new approach reduces application execution time by about 50 percent, especially when handling large amounts of data.
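As a rough illustration of the idea (not the paper's actual implementation, which classifies series properties and selects among richer models), the following Python sketch classifies an access time series by a simple property and picks a matching one-step predictor; the classification threshold, the two models, and the prefetch rule are all assumptions for the sketch:

```python
import numpy as np

def predict_next(series: np.ndarray) -> float:
    """Classify a data-access time series by a simple property and
    pick a matching model to forecast the next value."""
    diffs = np.diff(series)
    # Crude classification: a consistent drift marks a trending series.
    trending = abs(diffs.mean()) > diffs.std()
    if trending:
        # Fit a linear trend and extrapolate one step ahead.
        t = np.arange(len(series))
        slope, intercept = np.polyfit(t, series, 1)
        return slope * len(series) + intercept
    # Otherwise treat the series as stationary: use a moving average.
    return float(series[-5:].mean())

# Toy usage: per-interval byte counts read by an application.
history = np.array([10, 12, 15, 18, 21, 24, 26, 30], dtype=float)
forecast = predict_next(history)
if forecast > history[-1]:  # demand is growing, so stage data early
    print(f"prefetch next block (forecast={forecast:.1f})")
```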
- Research Article
30
- 10.3390/s21072353
- Mar 28, 2021
- Sensors (Basel, Switzerland)
Obesity is a major public health problem worldwide, and the prevalence of childhood obesity is of particular concern. Effective interventions for preventing and treating childhood obesity aim to change behaviour and exposure at the individual, community, and societal levels. However, monitoring and evaluating such changes is very challenging. The EU Horizon 2020 project “Big Data against Childhood Obesity (BigO)” aims at gathering large-scale data from a large number of children using different sensor technologies to create comprehensive obesity prevalence models for data-driven predictions about specific policies on a community. It further provides real-time monitoring of the population responses, supported by meaningful real-time data analysis and visualisations. Since BigO involves monitoring and storing of personal data related to the behaviours of a potentially vulnerable population, the data representation, security, and access control are crucial. In this paper, we briefly present the BigO system architecture and focus on the necessary components of the system that deal with data access control, storage, anonymisation, and the corresponding interfaces with the rest of the system. We propose a three-layered data warehouse architecture: The back-end layer consists of a database management system for data collection, de-identification, and anonymisation of the original datasets. The role-based permissions and secured views are implemented in the access control layer. Lastly, the controller layer regulates the data access protocols for any data access and data analysis. We further present the data representation methods and the storage models considering the privacy and security mechanisms. The data privacy and security plans are devised based on the types of personal data collected, the types of users, data storage, data transmission, and data analysis. We discuss in detail the challenges of privacy protection in this large distributed data-driven application and implement novel privacy-aware data analysis protocols to ensure that the proposed models guarantee the privacy and security of datasets. Finally, we present the BigO system architecture and its implementation that integrates privacy-aware protocols.
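A minimal Python sketch of the three-layer idea, not the BigO codebase: the field names, roles, and hashing scheme below are illustrative assumptions, but the flow (back-end de-identification, role-based secured views, a single controller entry point) follows the architecture the abstract describes:

```python
import hashlib

# Back-end layer: de-identify records before storage (assumed scheme).
def de_identify(record: dict, salt: str = "study-salt") -> dict:
    out = dict(record)
    out["subject_id"] = hashlib.sha256(
        (salt + record["subject_id"]).encode()).hexdigest()[:12]
    return out

# Access-control layer: role-based "secured views" over the data.
VIEWS = {
    "clinician": {"subject_id", "bmi", "activity_minutes"},
    "analyst":   {"bmi", "activity_minutes"},  # no identifiers at all
}

def secured_view(record: dict, role: str) -> dict:
    allowed = VIEWS.get(role, set())
    return {k: v for k, v in record.items() if k in allowed}

# Controller layer: every access goes through one auditable entry point.
def access(record: dict, role: str) -> dict:
    if role not in VIEWS:
        raise PermissionError(f"unknown role: {role}")
    return secured_view(de_identify(record), role)

raw = {"subject_id": "child-042", "bmi": 21.4, "activity_minutes": 35}
print(access(raw, "analyst"))  # {'bmi': 21.4, 'activity_minutes': 35}
```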
- Conference Article
1
- 10.1117/12.2081926
- Mar 17, 2015
- Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE
- Research Article
- 10.3897/aca.8.e152953
- May 28, 2025
- ARPHA Conference Abstracts
The Theia/OZCAR Information System (IS) aims to facilitate the discovery and reuse of in-situ data documenting the continental surfaces, collected by French research organizations and their foreign partners and historically managed and disseminated in different databases and portals built independently by different communities. This includes data from the 22 observatories of the French Critical Zone Research Infrastructure, OZCAR-RI, which document the different environmental compartments of the critical zone. Their data were historically documented and distributed by information systems that used their own vocabulary and format for sharing data. The challenge for the Theia/OZCAR IS (Braud et al. 2020) is to federate this heterogeneous data (more than 300 variables), managed by different communities, into a common system and make it FAIR (Findable, Accessible, Interoperable, Reusable; Wilkinson et al. 2016) for both humans and machines. This effort, which started in 2017, contributes to the national and European ecosystems being set up for sharing and analysing multidisciplinary Earth system data. At the national level, the Theia/OZCAR IS is part of the French Earth System Data Terra Research Infrastructure (RI), which aims to provide access and analysis services for data from the entire Earth system and to strengthen interdisciplinary research. It also plays a role in the national research and innovation program OneWater and its OneWater Data Platform, which aims to improve access to and interoperability of water data from French research and operational services. At the European level, the Theia/OZCAR IS contributes to the eLTER-RI data catalogue. The Theia/OZCAR IS was co-constructed with the scientific community, data producers and the IT teams involved in data management. The needs expressed by scientific users were to be able to discover data by variable name and to download data from different producers in the same archive, with harmonised formats. Data producers also expressed the need for statistical reports on data use and/or a monitoring dashboard interface. The system has been designed taking into account that the FAIR principles apply to both metadata and data. A common data model, called the “pivot model”, has been defined to harmonise and standardise the description of data between the different data producers, from the granularity of the dataset down to the observation (the time series of a variable at one location), and to set up information fluxes between the observatories’ information systems and the Theia/OZCAR IS (Fig. 1). It is based on several metadata standards (ISO 19115, O&M, DataCite). The Theia/OZCAR thesaurus, a controlled vocabulary for variable names and objects of interest, has been developed to support efficient data discovery and interoperability services. Its objectives are: to enable a user-friendly data discovery service thanks to simplified variable names; to offer precise descriptions of observations thanks to detailed variable names; and to enable the implementation of standardised data exchange services that require concepts describing the act of observation (the observed variable and the characteristics involved in sampling). The Theia/OZCAR thesaurus is published on the Web according to the FAIR principles. It is formally described using a knowledge representation language: it implements the SKOS and I-ADOPT framework ontologies (Coussot et al. 2024), which allow variable names to be broken down into atomic elements (with at least one Property being measured and one Entity being observed) and make it easier to establish semantic alignments with terms from other international disciplinary thesauri (alignments with EnvThes and GEMET have been carried out). It is indexed in a searchable resource with access for both humans (Skosmos interface) and machines (SPARQL endpoint). The vocabulary and each of its terms have unique persistent web identifiers. Metadata for the vocabulary (license, provenance) and for the terms (definitions, synonyms) are provided and are sufficiently precise to enable users to understand what each term means. The Theia/OZCAR IS consists of several parts (Fig. 2): a data ingestion module, and data access and discovery services, including a terminology service. The data ingestion module enriches the observation information supplied by producers with an additional harmonised vocabulary. For data discovery and access, the Theia/OZCAR IS includes a data portal with faceted search. Data visualisation, along with APIs for data download and usage-statistics services, is currently being developed. The download service will make it possible to select several observations from different producers and to download the data at once in two open, self-describing formats (CSV and NetCDF) from an object-based S3 storage repository. To ensure the interoperability of data for machines, a standardised data catalogue service (OGC CSW) is already implemented and used by the Data Terra RI’s data sharing systems. A standardised data exchange service based on the SensorThings API is under construction. The observations and datasets are indexed in a searchable resource with human (data portal) and machine (OGC data catalogue service, SensorThings API) access, and are described by rich metadata with a plurality of precise attributes. As the data and metadata refer to terms from a vocabulary that follows the FAIR principles, this contributes to the semantic interoperability of the data. The data license and data provenance are provided with the data. Improving the FAIRness of data involves not only implementing semantic and technical features, but also supporting communities in adopting FAIR practices. Seminars on Data Management Plans and open licenses have been organised to encourage data producers to describe their data management and documentation practices and to associate data with a clear license. Supporting communities as they move towards FAIR data, and providing them with the human resources they need to make progress, remains one of the major challenges of the project.
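Since the thesaurus exposes a SPARQL endpoint for machine access, a SKOS vocabulary like this one can be queried for concept labels in a few lines of Python with the SPARQLWrapper library. A hedged sketch only: the endpoint URL below is hypothetical (the real one is published with the thesaurus), and the query is generic SKOS rather than the project's own schema:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint URL; the IS publishes the real address.
endpoint = SPARQLWrapper("https://example.org/theia-ozcar/sparql")
endpoint.setQuery("""
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?concept ?label WHERE {
        ?concept a skos:Concept ;
                 skos:prefLabel ?label .
        FILTER (langMatches(lang(?label), "en"))
    } LIMIT 10
""")
endpoint.setReturnFormat(JSON)

# Print the first few variable-name concepts and their labels.
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["concept"]["value"], "->", row["label"]["value"])
```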
- Conference Article
6
- 10.1117/12.381757
- Apr 6, 2000
- Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE
Data mining and knowledge discovery in databases are providing means to analyze and discover new knowledge from large datasets. The growth of the Internet has provided the average user with the ability to more easily access and gather data. Many of the existing data mining tools require users to have advanced knowledge. New graphical tools are needed to allow the average user to easily and quickly discover new patterns and trends from heterogeneous data. SAIC is developing an agent-based data mining tool called AgentMiner™ as part of an internal research project. AgentMiner™ will allow the user to perform advanced information retrieval and data mining to discover patterns and relationships across multiple distributed, heterogeneous data sources. The current system prototype utilizes an ontology to define common concepts and data elements that are contained in the distributed data sources. AgentMiner™ can access data from relational databases, structured text, web pages, and open text sources. It is a Java-based application that contains a suite of graphical tools such as the Mission Manager, Graphical Ontology Builder (GOB), and Qualified English Interpreter (QEI). In addition, AgentMiner™ provides the capability to support both 2-D and 3-D data visualization, including animation across a selected independent variable.
- Research Article
2
- 10.5204/mcj.1427
- Aug 15, 2018
- M/C Journal
This article reflects on part of a three-year battle over the redevelopment of an iconic Melbourne music venue, the Palace-Metro Nightclub (the Palace), involving the tactical use of Facebook Page data at trial. We were invited by the Save the Palace group, Melbourne City Council and the National Trust of Australia to provide Facebook Page data analysis as evidence of the social value of the venue at an appeals trial heard at the Victorian Civil and Administrative Tribunal (VCAT) in 2016. We take a reflexive ethnographic approach here to explore the data production, collection and analysis processes as these represent and constitute a “data public”. Although the developers won the appeal and were able to re-develop the site, the court accepted the validity of social media data as evidence of the building’s social value (Jinshan Investment Group Pty Ltd v Melbourne CC [2016] VCAT 626, 117; see also Victorian Planning Reports). Through the case, we elaborate on the concept of data publics by considering the “affordising” (Pollock) processes at play when extracting, analysing and visualising social media data. Affordising refers to the designed, deliberate and incidental effects of datafication, and highlights the need to attend to the capacities for data collection and processing as they produce particular analytical outcomes. These processes foreground the compositional character of data publics, and the unevenness of data literacies (McCosker “Data Literacies”; Gray et al.) as a factor of the interpersonal and institutional capacity to read and mobilise data for social outcomes. We begin by reconsidering the often-assumed connection between social media data and their publics. Taking onboard theoretical accounts of publics as problem-oriented (Dewey) and dynamically constituted (Kelty), we conceptualise data publics through the key elements of a) consequentiality, b) sufficient connection over time, and c) affective or emotional qualities of connection and interaction with the events. We note that while social data analytics may be a powerful tool for public protest, it equally affords use against public interests and introduces risks in relation to a lack of transparency, access or adequate data literacy. Urban Protest and Data Publics. There are many examples globally of the use of social media to engage publics in battles over urban development or similar issues (e.g. Fredericks and Foth). Some have asked how social media might be better used by neighbourhood organisations to mobilise protest and save historic buildings, cultural landmarks or urban sites (Johnson and Halegoua). And we can only note here the wealth of research literature on social movements, protest and social media. To emphasise Gerbaudo’s point, drawing on Mattoni, we “need to account for how exactly the use of these media reshapes the ‘repertoire of communication’ of contemporary movements and affects the experience of participants” (2). For us, this also means better understanding the role that social data plays both in aiding and reshaping urban protest and in arming third-sector groups with evidence useful in social institutions such as the courts. New modes of digital engagement enable forms of distributed digital citizenship, which Meikle sees as the creative political relationships that form through exercising rights and responsibilities.
Associated with these practices is the transition from sanctioned, simple discursive forms of social protest in petitions, to new indicators of social engagement in more nuanced social media data and the more interactive forms of online petition platforms like change.org or GetUp (Halpin et al.). These technical forms code publics in specific ways that have implications for contemporary protest action. That is, they provide the operational systems and instructions that shape social actions and relationships for protest purposes (McCosker and Milne). All protest and social movements are underwritten by explicit or implicit concepts of participatory publics as these are shaped, enhanced, or threatened by communication technologies. But participatory protest publics are uneven, and as Kelty asks: “What about all the people who are neither protesters nor Twitter users? In the broadest possible sense this ‘General Public’ cannot be said to exist as an actual entity, but only as a kind of virtual entity” (27). Kelty is pointing to the porous boundary between a general public and an organised public, or formal enterprise, as a reminder that we cannot take for granted representations of a public, or the public as a given, in relation to Like or follower data, for instance. If carefully gauged, the concept of data publics can be useful. To start with, the notions of publics and publicness are notoriously slippery. Baym and boyd explore the differences between these two terms, and the way social media reconfigures what “public” is. Does a Comment or a Like on a Facebook Page connect an individual sufficiently to an issues-public? As far back as the 1930s, John Dewey was seeking a pragmatic approach to similar questions regarding human association and the pluralistic space of “the public”. For Dewey, “the machine age has so enormously expanded, multiplied, intensified and complicated the scope of the indirect consequences [of human association] that the resultant public cannot identify itself” (157). To what extent, then, can we use data to constitute a public in relation to social protest in the age of data analytics? There are numerous well-formulated approaches to studying publics in relation to social media and social networks. Social network analysis (SNA) determines publics, or communities, through links, ties and clustering, by measuring and mapping those connections and, to an extent, assuming that they constitute some form of sociality. Networked publics (Ito, 6) are understood as an outcome of social media platforms and practices in the use of new digital media authoring and distribution tools or platforms, and the particular actions, relationships or modes of communication they afford, to use James Gibson’s sense of that term. “Publics can be reactors, (re)makers and (re)distributors, engaging in shared culture and knowledge through discourse and social exchange as well as through acts of media reception” (Ito 6). Hashtags, for example, facilitate connectivity and visibility and aid in the formation and “coordination of ad hoc issue publics” (Bruns and Burgess 3). Gray et al., following Ruppert, argue that “data publics are constituted by dynamic, heterogeneous arrangements of actors mobilised around data infrastructures, sometimes figuring as part of them, sometimes emerging as their effect”.
The individuals of data publics are neither subjugated by the logics and metrics of digital platforms and data structures, nor simply sovereign agents empowered by the expressive potential of aggregated data (Gray et al.). Data publics are more than just aggregates of individual data points or connections. They are inherently unstable, dynamic (despite static analyses and visualisations), or vibrant, and ephemeral. We emphasise three key elements of active data publics. First, to be more than an aggregate of individual items, a data public needs to be consequential (in Dewey’s sense of issues- or problem-oriented). Second, sufficient connection is visible over time. Third, affective or emotional activity is apparent in relation to events that lend coherence to the public and its prevailing sentiment. To these, we add critical attention to the affordising processes – or the deliberate and incidental effects of datafication and analysis, in the capacities for data collection and processing in order to produce particular analytical outcomes – and the data literacies these require. We return to the latter after elaborating on the Save the Palace case. Visualising Publics: Highlighting Engagement and Intensity. The Palace theatre was built in 1912 and served as a venue for theatre, cinema, live performance, musical acts and as a nightclub. In 2014 the Heritage Council decided not to include the Palace on Victoria’s heritage register and hence opened the door for developers, but Melbourne City Council and the National Trust of Australia opposed the redevelopment on the grounds of the building’s social significance as a music venue. Similarly, the Save the Palace group saw the proposed redevelopment as affecting the capacity of Melbourne CBD to host medium-size live performances, and therefore impacting deeply on the social fabric of the local music scene. The Save the Palace group, chaired by Rebecca Leslie and Michael Raymond, maintained a 36,000+ strong Facebook Page and mobilised local members through regular public street protests, and participated in court proceedings in 2015 and February 2016 with Melbourne City Council and National Trust Australia. Joining the protesters in the lead-up to the 2016 appeals trial, we aimed to use social media engagement data to measure, analyse and present evidence of the extent and intensity of a sustained protest public. The evidence we submitted had to satisfy VCAT’s need to establish the social value of the building and the significance of its redevelopment, and to explain: a) how social media works; b) the meaning of the number of Facebook Likes on the Save The Palace Page and the timing of those Likes, highlighting how the reach and Likes pick up at significant events; and c) whether or not a representative sample of Comments are supportive of the group and the Palace Theatre (McCosker “Statement”). As noted in the case (Jinshan, 117), where courts have traditionally relied on one simple measure for contemporary social value – the petition – our aim was to make use of the richer measures available through social media data, to better represent sustained engagement with the issues over time. Visualising a protest public in this way raises two significant problems for a workable concept of data publics.
- Research Article
6
- 10.5281/zenodo.8072
- Jan 19, 2013
- British Journal of Environment and Climate Change
Aims: Global change studies need to manipulate large volumes of observation and prediction data, most likely from multiple sources. From the researchers’ perspective, the whole research process consists of the following stages: data discovery, data access, data processing, data analysis and result dissemination. The aim of this paper is to review the state of the art of geospatial data systems to reveal the way towards better support of global change studies. Methodology: This paper reviews the capabilities of exemplar geospatial data systems. It further analyzes the needs of manipulating large volumes of diverse data when performing global change studies. By comparing the available capabilities with the real needs, this study shows the strengths and limitations of existing data systems when supporting global change studies. Results: The analysis shows that data systems are helpful for researchers to fulfill data discovery and access, while most of them do not provide further functionality to cover the other stages of the research process. This suggests that a new generation of data systems is highly needed to provide efficient and sufficient support for scientists to perform global change studies. Instead of simply moving data from sources to researchers’ local archives, it will enable more online data-manipulation functionality and the interoperability of data and systems. Conclusion: Traditional geospatial data systems are designed to operate locally without built-in interoperability and sharing capabilities. Such systems are operated under the paradigm of “everything-locally-owned-and-operated”. Conducting global change studies using such a system requires moving a large volume of data from providers’ sites to the researchers’ site. Such a system does not provide strong support for the entire research process. Since climate research requires manipulating a huge volume of complex and diverse multi-source data, a new paradigm of “everything-shared-over-the-Web” is promising for designing a new generation of standards-based, interoperable, and sharable geospatial data systems for global change studies.
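To make the contrast concrete: under the “everything-shared-over-the-Web” paradigm, a researcher requests only a rendered or subsetted product from a standards-based web service instead of mirroring the whole archive locally. A hedged sketch using a standard OGC WMS 1.3.0 GetMap request in Python; the server URL and layer name are hypothetical, while the request parameters follow the WMS specification:

```python
import requests

# Hypothetical WMS endpoint; the parameters are standard WMS 1.3.0.
params = {
    "SERVICE": "WMS",
    "VERSION": "1.3.0",
    "REQUEST": "GetMap",
    "LAYERS": "surface_temperature",  # assumed layer name
    "CRS": "EPSG:4326",
    "BBOX": "-90,-180,90,180",        # lat/lon axis order in WMS 1.3.0
    "WIDTH": "1024",
    "HEIGHT": "512",
    "FORMAT": "image/png",
}
resp = requests.get("https://example.org/geoserver/wms",
                    params=params, timeout=30)
resp.raise_for_status()
with open("temperature.png", "wb") as f:
    f.write(resp.content)  # a rendered map arrives; no bulk download
```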
- Research Article
6
- 10.1016/j.nexus.2023.100259
- Dec 4, 2023
- Energy Nexus
Guiding the data collection for integrated Water-Energy-Food-Environment systems using a pilot smallholder farm in Costa Rica
- Research Article
6
- 10.1515/popets-2016-0023
- May 6, 2016
- Proceedings on Privacy Enhancing Technologies
The ability of an Internet user to access data collected about himself as a result of his online activity is a key privacy safeguard. Online, data access has been overshadowed by other protections such as notice and choice. This paper describes attitudes about data access. 873 US and Irish Internet users participated in a survey designed to examine views on data access to information held by online companies and data brokers. We observed low levels of awareness of access mechanisms along with a high desire for access in both participant groups. We tested three proposed access systems in keeping with industry programs and regulatory proposals. User response was positive. We conclude that access remains an important privacy protection that is inadequately manifested in practice. Our study provides insight for lawmakers and policymakers, as well as computer scientists who implement these systems.
- Conference Article
2
- 10.1117/12.2016325
- May 23, 2013
- Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE
Current approaches to satellite observation data storage and distribution implement separate visualization and data access methodologies, which often leads to the need for time-consuming data ordering and coding in applications requiring visual representation as well as data handling and modeling capabilities. We describe an approach we implemented for a data-encoded web map service based on storing numerical data within server map tiles and subsequent client-side data manipulation and map color rendering. The approach relies on storing data using the lossless-compression Portable Network Graphics (PNG) image format, which is natively supported by web browsers, allowing on-the-fly browser rendering and modification of the map tiles. The method is easy to implement using existing software libraries and has the advantage of easy client-side map color modifications, as well as spatial subsetting with physical-parameter range filtering. This method is demonstrated for the ASTER-GDEM elevation model and selected MODIS data products and represents an alternative to the currently used storage and data access methods. An additional benefit is that multiple levels of averaging are provided, owing to the need to generate map tiles at varying resolutions for the various map magnification levels. We suggest that such a merged data and mapping approach may be a viable alternative to existing static storage and data access methods for a wide array of combined simulation, data access and visualization purposes.
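The abstract does not specify the exact bit layout, so the following Python sketch shows one plausible scheme in the spirit described: packing 16-bit offset-shifted elevation values into the red and green channels of a lossless PNG tile, which a client can later decode back to physical values. The offset constant and channel assignment are assumptions for illustration:

```python
import numpy as np
from PIL import Image

# Assumed encoding: elevation in metres, shifted so negative depths
# fit, then packed into the 16 bits of the red and green channels.
OFFSET = 11000  # metres; keeps the deepest ocean trench above zero

def encode_tile(elev: np.ndarray) -> Image.Image:
    v = np.round(elev + OFFSET).astype(np.uint16)
    rgb = np.zeros((*v.shape, 3), dtype=np.uint8)
    rgb[..., 0] = v >> 8        # high byte -> red channel
    rgb[..., 1] = v & 0xFF      # low byte  -> green channel
    return Image.fromarray(rgb) # PNG is lossless, so the bits survive

def decode_tile(img: Image.Image) -> np.ndarray:
    rgb = np.asarray(img.convert("RGB")).astype(np.uint16)
    return ((rgb[..., 0] << 8) | rgb[..., 1]).astype(np.int32) - OFFSET

# Round-trip check on a tiny tile of elevations in metres.
tile = np.array([[-42.0, 0.0], [1234.0, 8848.0]])
encode_tile(tile).save("tile.png")
assert np.array_equal(decode_tile(Image.open("tile.png")),
                      tile.astype(np.int32))
```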
- Preprint Article
- 10.5194/egusphere-egu25-18651
- Mar 18, 2025
The European Network for Earth System Modelling Research Infrastructure (ENES-RI) is a cornerstone of climate science, providing essential datasets for understanding and addressing climate change. However, the growing complexity and volume of climate model datasets pose challenges that demand innovative, interdisciplinary solutions. To address these challenges, ENES-RI is being integrated into the Framework of Integrated Research Infrastructure Services for Climate Change Risks (IRISCC), establishing a unified ecosystem of Research Infrastructures for data access, processing, and analysis. This integration introduces three key advancements: 1. Harmonized data access and authentication: federated systems ensure secure, standardized global access while maintaining data integrity and compliance with management policies. 2. Data-proximate processing services: on-site data analysis minimizes large-scale transfers, improving efficiency and supporting high-performance workflows. 3. An integrated services platform leveraging JupyterHub: this platform combines streamlined data access, computational tools, and visualization capabilities, enabling collaborative and interdisciplinary research across diverse domains. A central objective is to incorporate ENES-RI into the IRISCC services catalog, enabling seamless discovery and utilization of distributed climate research resources. This effort fosters collaboration, streamlines workflows, and addresses challenges in managing large-scale climate data. Practical use cases illustrate how this framework empowers researchers to conduct advanced climate risk assessments and contribute to global mitigation efforts. This integration represents a pivotal advance toward a more efficient, collaborative, and impactful research ecosystem for addressing climate change.
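As an illustration of what “data-proximate” access buys, here is a hedged Python sketch using xarray over an OPeNDAP-style remote dataset: the URL, variable, and coordinate names are hypothetical, and the point is simply that subsetting happens before any bulk data cross the network:

```python
import xarray as xr

# Hypothetical remote URL; with OPeNDAP, opening is lazy and only
# metadata is fetched at this point, not the dataset itself.
url = "https://example.org/thredds/dodsC/cmip6/tas_monthly.nc"
ds = xr.open_dataset(url)

# Subset first, compute second: only this slab is transferred.
# (Assumes ascending lat/lon coordinates named "lat" and "lon".)
europe = ds["tas"].sel(lat=slice(35, 70), lon=slice(-10, 40))
mean_tas = europe.sel(time=slice("2000", "2020")).mean("time")

print(mean_tas.values.shape)  # data actually move only here
```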
- Research Article
3
- 10.1016/j.fusengdes.2016.06.008
- Jun 15, 2016
- Fusion Engineering and Design
J-TEXT WebScope: An efficient data access and visualization system for long pulse fusion experiment
- Conference Article
4
- 10.2514/6.1997-2087
- Jun 29, 1997
Recent advances in unsteady flow visualization
- Research Article
111
- 10.5204/mcj.561
- Oct 11, 2012
- M/C Journal
Lists and Social Media. Lists have long been an ordering mechanism for computer-mediated social interaction. While far from being the first such mechanism, blogrolls offered an opportunity for bloggers to provide a list of their peers; the present generation of social media environments similarly provide lists of friends and followers. Where blogrolls and other earlier lists may have been user-generated, the social media lists of today are more likely to have been produced by the platforms themselves, and are of intrinsic value to the platform providers at least as much as to the users themselves; both Facebook and Twitter have highlighted the importance of their respective “social graphs” (their databases of user connections) as fundamental elements of their fledgling business models. This represents what Mejias describes as “nodocentrism,” which “renders all human interaction in terms of network dynamics (not just any network, but a digital network with a profit-driven infrastructure).” The communicative content of social media spaces is also frequently rendered in the form of lists. Famously, blogs are defined in the first place by their reverse-chronological listing of posts (Walker Rettberg), but the same is true for current social media platforms: Twitter, Facebook, and other social media platforms are inherently centred around an infinite, constantly updated and extended list of posts made by individual users and their connections. The concept of the list implies a certain degree of order, and the orderliness of content lists as provided through the latest generation of centralised social media platforms has also led to the development of more comprehensive and powerful, commercial as well as scholarly, research approaches to the study of social media. Using the example of Twitter, this article discusses the challenges of such “big data” research as it draws on the content lists provided by proprietary social media platforms. Twitter Archives for Research. Twitter is a particularly useful source of social media data: using the Twitter API (the Application Programming Interface, which provides structured access to communication data in standardised formats) it is possible, with a little effort and sufficient technical resources, for researchers to gather very large archives of public tweets concerned with a particular topic, theme or event. Essentially, the API delivers very long lists of hundreds, thousands, or millions of tweets, and metadata about those tweets; such data can then be sliced, diced and visualised in a wide range of ways, in order to understand the dynamics of social media communication. Such research is frequently oriented around pre-existing research questions, but is typically conducted at unprecedented scale. The projects of media and communication researchers such as Papacharissi and de Fatima Oliveira, Wood and Baughman, or Lotan, et al.—to name just a handful of recent examples—rely fundamentally on Twitter datasets which now routinely comprise millions of tweets and associated metadata, collected according to a wide range of criteria. What is common to all such cases, however, is the need to make new methodological choices in the processing and analysis of such large datasets on mediated social interaction. Our own work is broadly concerned with understanding the role of social media in the contemporary media ecology, with a focus on the formation and dynamics of interest- and issues-based publics.
We have mined and analysed large archives of Twitter data to understand contemporary crisis communication (Bruns et al.), the role of social media in elections (Burgess and Bruns), and the nature of contemporary audience engagement with television entertainment and news media (Harrington, Highfield, and Bruns). Using a custom installation of the open source Twitter archiving tool yourTwapperkeeper, we capture and archive all the available tweets (and their associated metadata) containing a specified keyword (like “Olympics” or “dubstep”), name (Gillard, Bieber, Obama) or hashtag (#ausvotes, #royalwedding, #qldfloods). In their simplest form, such Twitter archives are commonly stored as delimited (e.g. comma- or tab-separated) text files, with each of the following values in a separate column:
- text: contents of the tweet itself, in 140 characters or less
- to_user_id: numerical ID of the tweet recipient (for @replies)
- from_user: screen name of the tweet sender
- id: numerical ID of the tweet itself
- from_user_id: numerical ID of the tweet sender
- iso_language_code: code (e.g. en, de, fr, ...) of the sender’s default language
- source: client software used to tweet (e.g. Web, Tweetdeck, ...)
- profile_image_url: URL of the tweet sender’s profile picture
- geo_type: format of the sender’s geographical coordinates
- geo_coordinates_0: first element of the geographical coordinates
- geo_coordinates_1: second element of the geographical coordinates
- created_at: tweet timestamp in human-readable format
- time: tweet timestamp as a numerical Unix timestamp
In order to process the data, we typically run a number of our own scripts (written in the programming language Gawk) which manipulate or filter the records in various ways, and apply a series of temporal, qualitative and categorical metrics to the data, enabling us to discern patterns of activity over time, as well as to identify topics and themes, key actors, and the relations among them; in some circumstances we may also undertake further processes of filtering and close textual analysis of the content of the tweets. Network analysis (of the relationships among actors in a discussion, or among key themes) is undertaken using the open source application Gephi. While a detailed methodological discussion is beyond the scope of this article, further details and examples of our methods and tools for data analysis and visualisation, including copies of our Gawk scripts, are available on our comprehensive project website, Mapping Online Publics. In this article, we reflect on the technical, epistemological and political challenges of such uses of large-scale Twitter archives within media and communication studies research, positioning this work in the context of the phenomenon that Lev Manovich has called “big social data.” In doing so, we recognise that our empirical work on Twitter is concerned with a complex research site that is itself shaped by a complex range of human and non-human actors, within a dynamic, indeed volatile media ecology (Fuller), and using data collection and analysis methods that are in themselves deeply embedded in this ecology.
“Big Social Data”. As Manovich’s term implies, the Big Data paradigm has recently arrived in media, communication and cultural studies—significantly later than it did in the hard sciences, in more traditionally computational branches of social science, and perhaps even in the first wave of digital humanities research (which largely applied computational methods to pre-existing, historical “big data” corpora)—and this shift has been provoked in large part by the dramatic quantitative growth and apparently increased cultural importance of social media—hence, “big social data.” As Manovich puts it: For the first time, we can follow [the] imaginations, opinions, ideas, and feelings of hundreds of millions of people. We can see the images and the videos they create and comment on, monitor the conversations they are engaged in, read their blog posts and tweets, navigate their maps, listen to their track lists, and follow their trajectories in physical space. (Manovich 461) This moment has arrived in media, communication and cultural studies because of the increased scale of social media participation and the textual traces that this participation leaves behind—allowing researchers, equipped with digital tools and methods, to “study social and cultural processes and dynamics in new ways” (Manovich 461). However, and crucially for our purposes in this article, many of these scholarly possibilities would remain latent if it were not for the widespread availability of Open APIs for social software (including social media) platforms. APIs are technical specifications of how one software application should access another, thereby allowing the embedding or cross-publishing of social content across Websites (so that your tweets can appear in your Facebook timeline, for example), or allowing third-party developers to build additional applications on social media platforms (like the Twitter user ranking service Klout), while also allowing platform owners to impose de facto regulation on such third-party uses via the same code. While platform providers do not necessarily have scholarship in mind, the data access affordances of APIs are also available for research purposes. As Manovich notes, until very recently almost all truly “big data” approaches to social media research had been undertaken by computer scientists (464). But as part of a broader “computational turn” in the digital humanities (Berry), and because of the increased availability to non-specialists of data access and analysis tools, media, communication and cultural studies scholars are beginning to catch up. Many of the new, large-scale research projects examining the societal uses and impacts of social media—including our own—which have been initiated by various media, communication, and cultural studies research leaders around the world have begun their work by taking stock of, and often substantially extending through new development, the range of available tools and methods for data analysis. The research infrastructure developed by such projects, therefore, now reflects their own disciplinary backgrounds at least as much as it does the fundamental principles of computer science. In turn, such new and often experimental tools and methods necessarily also provoke new epistemological and methodological challenges.
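As a concrete companion to the column layout listed above, this Python sketch computes one of the simple temporal metrics the authors mention, tweets per hour, from such a tab-separated archive. It stands in for the authors' Gawk workflow rather than reproducing it, and the archive file name is hypothetical:

```python
import csv
from collections import Counter
from datetime import datetime, timezone

# Column layout as listed in the abstract (yourTwapperkeeper-style).
FIELDS = ["text", "to_user_id", "from_user", "id", "from_user_id",
          "iso_language_code", "source", "profile_image_url",
          "geo_type", "geo_coordinates_0", "geo_coordinates_1",
          "created_at", "time"]

per_hour = Counter()
with open("qldfloods.tsv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f, fieldnames=FIELDS, delimiter="\t"):
        # Use the numerical Unix timestamp column for bucketing.
        ts = datetime.fromtimestamp(int(row["time"]), tz=timezone.utc)
        per_hour[ts.strftime("%Y-%m-%d %H:00")] += 1

# Patterns of activity over time, one line per hour.
for hour, n in sorted(per_hour.items()):
    print(hour, n)
```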
- Research Article
41
- 10.1145/3557899
- Mar 29, 2023
- ACM Transactions on Accessible Computing
Data visualization has become an increasingly important means of effective data communication and has played a vital role in broadcasting the progression of COVID-19. Accessible data representations, however, have lagged behind, leaving areas of information out of reach for many blind and visually impaired (BVI) users. In this work, we sought to understand (1) the accessibility of current implementations of visualizations on the web; (2) BVI users’ preferences and current experiences when accessing data-driven media; (3) how accessible data representations on the web address these users’ access needs and help them navigate, interpret, and gain insights from the data; and (4) the practical challenges that limit BVI users’ access and use of data representations. To answer these questions, we conducted a mixed-methods study consisting of an accessibility audit of 87 data visualizations on the web to identify accessibility issues, an online survey of 127 screen reader users to understand lived experiences and preferences, and a remote contextual inquiry with 12 of the survey respondents to observe how they navigate, interpret, and gain insights from accessible data representations. Our observations during this critical period of time provide an understanding of the widespread accessibility issues encountered across online data visualizations, the impact that data accessibility inequities have on the BVI community, the ways screen reader users sought access to data-driven information and made use of online visualizations to form insights, and the pressing need to make larger strides towards improving data literacy, building confidence, and enriching methods of access. Based on our findings, we provide recommendations for researchers and practitioners to broaden data accessibility on the web.