Big data or big (privacy) problem?
Nowadays, almost all of us transfer data (especially personal data) to someone or something (usually through web pages) without knowing precisely how those data are managed. All of those data, considered together, represent what is generally called “Big Data”. Big data offer a major opportunity for growth in probably all human activities, since they make important data collected in distant places and times available to virtually everybody who has access to the Internet. On the other hand, big data may also present problems that, if not properly addressed, may have legal consequences for those involved in data collection, validation, publication and utilization. This paper has covered the most important and, up to now, the most widely considered consequence: privacy. It has also warned that the lack of validation of the posted data and their incautious use may expose liable subjects to legal consequences. To make things even more complex, it has also shown that the potentially liable subjects have not yet been fully identified. Therefore, caution is mandatory when dealing with big data.
- Research Article
- 10.52214/vib.v7i.8403
- Jun 2, 2021
- Voices in Bioethics
 Introduction
 With the rapid advancements in neurotechnological machinery and improved analytical insights from machine learning in neuroscience, the availability of big brain data has increased tremendously. Neurological health research is done using digitized brain data.[1] There must be adequate data governance to secure the privacy of subjects participating in brain research and treatments. If not properly regulated, the research methods could lead to significant breaches of the subject’s autonomy and privacy. This paper will address the necessity for neuroprotection laws, which effectively govern the use of big brain data to ensure respect for patient privacy and autonomy.
 Background
 Artificial intelligence and machine learning can be integrated with neuroscience big brain data to drive research studies. This integrative technology allows patterns of electrical activity in neurons to be studied in detail.[2] Specifically, it uses a robotic system which can reason, plan, and exhibit biologically intelligent behavior. Machine learning is a method of computer programming where the code can adapt its behavior based on big brain data.[3] Big brain data is the collection of large amounts of information for the purpose of deciphering patterns through computer analysis using machine learning.[4] The information that these technologies provide is extensive enough to allow a researcher to read a patient’s mind. AI and machine learning technologies work by finding the underlying structure of brain data, which is then described by patterns known as latent factors, eventually resulting in an understanding of the brain’s temporal dynamics.[5]
 Through these technologies, researchers are able to decipher how the human brain computes its performances and thoughts. However, due to the extensive and complex nature of the data processed through AI and machine learning, researchers may gain access to personal information a patient may not wish to reveal. From a bioethical lens, tensions arise in the realm of patient autonomy. Patients are not able to control the transmission of data from their brains that is analyzed by researchers. Governing brain data through laws may enhance the extent of patient privacy in the case where brain data is being used through AI technologies.[6] A responsible approach to governing brain data would require a sophisticated legal structure.
 Analysis
 Impact on Patient Autonomy and Privacy 
 In research pertaining to big brain data, the consent forms do not fully cover the vast amounts of information that are collected. According to research, personal data has become the most sought-after commodity for providing content to corporations and the web-based service industry. Unfortunately, data leaks that release private information frequently occur.[7] The storage of an individual’s data on technologies accessible on the internet during research studies makes it vulnerable to leaks, jeopardizing an individual’s privacy. These data leaks may cause the patient to be identified easily, as the information provided by AI technologies is personalized and may be decoded through brain fingerprinting methods.[8]
 There has been an extensive growth in the development and use of AI. It is efficient in providing information to radiologists who diagnose various diseases including brain cancer and psychiatric disease, and AI assists in the delivery of telemedicine.[9] However, the ethical pitfall of reduced patient autonomy must be addressed by analyzing current AI technologies and creating more options for patient preference in how the data may be used. For instance, facial recognition technology[10] commonly used in health care produces more information than listed in common consent forms, threatening to undermine informed consent. Facial recognition software collects extensive data and may disclose more information than a person would prefer to provide despite being a useful tool for diagnosing medical and genetic conditions.[11] In addition, people may not be aware that their images are being used to generate more clinical data for other purposes. It is difficult to guarantee the data is anonymized. Consent requirements must include informing people about the complexity of the potential uses of the data; software developers should maximize patient privacy.[12] Furthermore, there is a “human element” in the use of AI technologies as medical providers control the use and the extent to which data is captured or accessed through the AI technologies.[13] People must understand the scope of the technology and have clear communication with the physician or health care provider about how the medical information will be used. 
 Existing Laws for Brain Data Governance 
 A strict system of defined legal responsibilities of medical providers will ensure a higher degree of patient privacy and autonomy when AI technologies and data from machine learning are used. Governing specific algorithmic data is crucial in safeguarding a patient’s privacy and developing a gold standard treatment protocol following the procurement of the information.[14] Certain AI technologies provide more data than others, and legal boundaries should be established to ensure strong performance, quality control, and scope for patient privacy and autonomy. For instance, currently AI technologies are being used in the realm of intensive neurological care. However, there is a significant level of patient uncertainty about how much control patients have over the data’s uses.[15] Calibrated legal and ethical standards will allow important brain data to be securely governed and monitored.
 Once brain signals are recorded and processed from one individual, the data may be merged with other data in Brain Computer Interface Technology (BCI).[16] To ensure a right and ability to retrieve personal data or pull it from the collection, specific regulations for varying types of data are needed.[17] The importance of consent and patient privacy must be considered through giving patients a transparent view of how brain data is governed.[18] The legal system must address discriminatory issues and risks to patients whose data is used in studies. Laws like the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) can serve as effective models to protect aggregated data. These laws govern consumer information and ensure compliance when personal data is collected.[19] California voters recently approved expansion of the CCPA to health data. The Washington Privacy Act, which would have provided rights to access, change, and withdraw personal data, failed to pass. Other states should improve privacy as well,[20] although a federal bill would be preferable. Scientists at the Heidelberg Academy of Sciences argue for data security to be governed in a manner that balances patient privacy and autonomy with the commercial interests of researchers.[21] The balance could be achieved through privacy protections like those in the Washington Privacy Act. 
Although the Health Insurance Portability and Accountability Act (HIPAA) provides an overall framework to reduce risks to patient protection and privacy, more thorough laws are warranted to combat the pervasive data transfer and analysis that technology has brought to the health care industry.[22] Breaches of patient privacy under current HIPAA regulations include releasing patient information to a reporter without the patient’s consent and sending HIV data to a patient’s employer without consent.[23] HIPAA does not cover information being shared with outside contractors who do not have an agreement with technology companies to keep patient data confidential. HIPAA regulations also do not always address blatant breaches of patient data confidentiality.[24] Patients must be provided with methods to monitor the data being analyzed so that they can view the extent of private information being generated via AI technologies. In health research, the medical purposes of better diagnosis, earlier detection of diseases, or prevention are ethical justifications for the use of the data if it was collected with permission, the person understood and approved the uses of the data, and the data was deidentified.
 A standard governance framework is required to provide the fairest system of care to patients who allow their brain data to be examined. Informed consent in the neuroscience field could reaffirm the privacy and autonomy of patients by ensuring that they understand the type of information collected. Laws also could protect data after a patient’s death. Malpractice in the scope of brain data could give people a cause of action critical in safeguarding patients’ rights. Data breach lawsuits will become common but generally do not cover deidentified data that becomes part of big data collection. A more synchronized approach to the collection and consent process will encourage an understanding of how big data is used to diagnose and treat patients. Some altruistic people may even be more likely to consent if they know the large-scale data collection is helpful to treat and diagnose people. Others should have the ability to opt out of sharing neurological data, especially when there is no certainty surrounding deidentification.[25]
 Conclusion
 Artificial intelligence and machine learning technologies have the potential to aid in the diagnosis and treatment of people globally by extracting and aggregating brain data specific to individuals. However, the secure use of the data is necessary to build trust between care providers and patients, as well as in balancing the bioethical principles of beneficence and patient autonomy. We must ensure the highest quality of care to patients, while protecting their privacy, informed consent, and clinical trust. More sophis
- Research Article
- 10.13088/jiis.2016.22.4.109
- Dec 31, 2016
- Journal of Intelligence and Information Systems
Recently, big data analysis has developed into a field of interest to individuals and non-experts as well as companies and professionals. Accordingly, publicly available or directly collected data are analyzed and used for marketing and for solving social problems. In Korea, various companies and individuals are attempting big data analysis, but they face difficulties from the initial stage of analysis because of limits on big data disclosure and the difficulty of collecting data. Currently, system improvements for promoting big data use and big data disclosure services are being carried out in Korea and abroad, mainly as services for opening public data such as Korea's Government 3.0 (data.go.kr). In addition to these government efforts, services that share data held by corporations or individuals are running, but it is difficult to find useful data because little data is shared. In addition, big data traffic problems can occur because the entire data set must be downloaded and examined just to grasp the attributes of, and basic information about, the shared data. Therefore, we need a new system for big data processing and utilization. 
First, big data pre-analysis technology is needed to solve the big data sharing problem. Pre-analysis is a concept proposed in this paper: the data are analyzed in advance, and the results are provided to users. Through pre-analysis, the usability of big data can be improved by providing information that conveys the properties and characteristics of a data set when a data user searches for big data. In addition, sharing the summary data or sample data generated through pre-analysis avoids the security problems that may occur when the original data are disclosed, thereby enabling big data sharing between the data provider and the data user. Second, appropriate preprocessing results must be generated quickly, according to the disclosure level of the raw data or the network status, and provided to users through distributed big data processing using Spark. Third, to solve the big traffic problem, the system monitors network traffic in real time. When preprocessing the data requested by a user, the system must preprocess them to a size that the current network can carry before transmitting them, so that no big traffic occurs. In this paper, we present various data sizes according to the level of disclosure obtained through pre-analysis. This method is expected to generate less traffic than the conventional method, used in many systems, of sharing only raw data. We describe how to solve the problems that occur when big data are released and used, and how to facilitate sharing and analysis. The client-server model uses Spark for fast analysis and for processing user requests. It consists of a Server Agent and a Client Agent, deployed on the server side and the client side respectively. 
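The traffic-aware step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Variant` class, the `pick_variant` function, the variant names and the sizes are all hypothetical, and the measured bandwidth is simply passed in as a number rather than obtained from a real traffic monitor. The idea shown is only that the server returns the largest preprocessed result that both the disclosure level and the current network budget allow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One preprocessed form of a data set (hypothetical structure)."""
    name: str        # e.g. "summary", "sample", "raw"
    size_bytes: int  # size of the preprocessed result


def pick_variant(variants, allowed_levels, bandwidth_bytes_per_s, max_seconds):
    """Return the largest permitted variant that fits the transfer budget.

    Returns None when nothing that is allowed fits the current network.
    """
    budget = bandwidth_bytes_per_s * max_seconds
    candidates = [
        v for v in variants
        if v.name in allowed_levels and v.size_bytes <= budget
    ]
    return max(candidates, key=lambda v: v.size_bytes, default=None)


# Assumed example values, not taken from the paper.
variants = [
    Variant("summary", 20_000),
    Variant("sample", 5_000_000),
    Variant("raw", 8_000_000_000),
]

# 1 MB/s for at most 10 s gives a 10 MB budget, so "raw" is too large.
choice = pick_variant(variants, {"summary", "sample", "raw"}, 1_000_000, 10)
print(choice.name)  # sample
```

Choosing the largest variant that fits is one possible policy; a real system could equally prefer the smallest adequate variant or re-sample the data to an exact target size.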
The Server Agent is the agent required by the data provider. It performs pre-analysis of big data to generate a Data Descriptor containing information about the Sample Data, Summary Data, and Raw Data. In addition, it performs fast and efficient big data preprocessing through distributed big data processing and continuously monitors network traffic. The Client Agent is the agent placed on the data user side. It can quickly search big data through the Data Descriptor produced by the pre-analysis. The desired data can then be requested from the server and downloaded according to the level of disclosure. The model is described separately for the Server Agent, which handles the server-side process when the data provider publishes data, and the Client Agent, which handles the client-side process when the data user uses the data. In particular, we focus on the big data sharing, distributed big data processing, and big traffic problems, construct the detailed modules of the client-server model, and present the design method of each module. In a system designed on the basis of the proposed model, a user who acquires data analyzes them in the desired direction or preprocesses them into new data. By analyzing the newly processed data through the Server Agent, the data user takes on the role of data provider. The data provider can also obtain useful statistical information from the Data Descriptor of the data it discloses, and can become a data user who performs new analysis using the sample data. In this way, raw data are processed and the processed big data are used by other users, forming a natural sharing environment. The roles of data provider and data user are not fixed, which yields an ideal sharing service in which everyone can be both a provider and a user. The client-server model solves the big data sharing problem and provides a free sharing environment in which big data can be disclosed securely and found easily.
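To make the Data Descriptor idea more concrete, the following is a small Python sketch of what a pre-analysis step might produce. The abstract does not specify the descriptor's fields, so the layout here (`row_count`, `fields`, `summary`, `sample`) and the function name `build_descriptor` are assumptions for illustration only. Plain in-memory Python stands in for the Spark-based processing the paper describes.

```python
import random
import statistics


def build_descriptor(rows, numeric_fields, sample_size=3, seed=0):
    """Pre-analyze a non-empty list of dict rows into a small descriptor.

    The descriptor lets a data user inspect a data set's shape without
    downloading the raw data. Its layout is hypothetical.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    summary = {}
    for field in numeric_fields:
        values = [row[field] for row in rows]
        summary[field] = {
            "min": min(values),
            "max": max(values),
            "mean": statistics.fmean(values),
        }
    return {
        "row_count": len(rows),
        "fields": sorted(rows[0].keys()),
        "summary": summary,                                      # Summary Data
        "sample": rng.sample(rows, min(sample_size, len(rows))),  # Sample Data
    }


# Assumed toy data set.
rows = [{"age": 20 + i, "city": "Seoul"} for i in range(10)]
descriptor = build_descriptor(rows, numeric_fields=["age"])
print(descriptor["row_count"], descriptor["summary"]["age"])
```

A Client Agent could search over such descriptors and then request the summary, the sample, or the raw data, depending on the disclosure level the provider has set.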
- Research Article
- 10.1002/poi3.258
- Jun 14, 2021
- Policy & Internet
The use of big data and data analytics is slowly emerging in public policy-making, and there are calls for systematic reviews and research agendas focusing on the impacts that big data and analytics have on policy processes. This paper examines the nascent field of big data and data analytics in public policy by reviewing the literature with bibliometric and qualitative analyses. The study encompassed scientific publications gathered from SCOPUS (N = 538). Nine bibliographically coupled clusters were identified, with the three largest clusters being big data's impact on the policy cycle, data-based decision-making, and productivity. Through the qualitative coding of the literature, our study highlights the core of the discussions and proposes a research agenda for further studies. 
Big data and data analytics have been seen as augmenting knowledge, ultimately leading to better decision-making. Arguments such as that the broad-based use of big data and data analytics will lead to the end of theory speak volumes about our expectations of the transformative power of big data and data analytics technologies. While industry has been leading the way in testing big data and analytics, public actors have been slower to engage (Poel et al., 2018), despite an equal opportunity for big data and data analytics to augment the public policy process. Utilizing big data and data analytics has become a near necessity due to our increasing capability for creating and collecting data at an extraordinary rate. The terms "big data" and "data analytics" have been among the buzzwords of recent years, leading to an upsurge in research, industry, and government applications (Zhou et al., 2014). The increased interest in big data in public policy can be seen in the scientific literature (Figure 1), which shows the increase in big data and analytics related literature. We also see public organizations increasingly engaging with big data analytics to solve challenges like the sustainability crisis and pandemics. Scholarly discourse has highlighted case studies and narratives on implementing big data and data analytics in the policy process. However, the literature lacks a systematic view of the current state of big data and data analytics in public policy, and there are identifiable research gaps (Desouza & Jacob, 2017). This study therefore poses two research questions. RQ1. What are the thematic communities of big data and data analytics literature concerning public policy-making? RQ2. What are the research questions emerging under each of the thematic research communities? To answer these questions, our study adopted a mixed-method systematic literature review approach based on a robust empirical bibliometric analysis followed by a qualitative analysis of the core documents. 
Using a well-established bibliometric method, bibliographic coupling, we identified thematic differences within the literature, and here, we highlight points of departure from the extant literature. The bibliometric analysis was, in turn, used as a basis for the qualitative analysis of the core literature, which is used to propose a research agenda. We find nine contemporary research communities addressing different aspects of big data and data analytics in public policy. While these communities have significant overlap, our analysis identifies them drawing from different theoretical foundations. Moreover, we demonstrate three larger research strands taking different vantage points, namely building strategic capability, data-based decision-making and productivity increases. Finally, our work proposes a research agenda focusing on the role of strategic capability, data-based decision-making, how to address expectations for better services while simultaneously increasing productivity and how to leverage policy analytics and empiricism. Our results offer scholars in public policy a vantage point to the theoretical foundations of research in big data and data analytics in public policy-making. We also draw from the identified communities to highlight emerging research themes that can guide research forward. For policymakers, our results highlight the on-going scholarly debate that focuses on addressing critical issues in the adoption of big data in public policy-making, namely capability building and the extent of data-based decision-making. This article will proceed as follows: next, we review the central elements of big data in policy-making. This is followed by a description of the data and our mixed-method approach. Finally, the empirical results are described and followed by a discussion to make sense of the research themes emerging from the analysis. "Big data" is a general term used for the process of gathering massive amounts of data from different sources. 
Sources can include human-input data but also data from sensors or different types of monitoring systems that create process data while running. It is clear that we are accumulating data at a never-before-seen rate. Already, in 2014, the pace was staggering, with 90% of the world's data being collected during the prior 2 years and 2.5 quintillion bytes of data added each day (Kim et al., 2014). Having access to massive amounts of data has enabled significant innovation in both the public and private domains. Looking at companies like Google and Amazon, with their innovation of new services for consumers, or at the recent ability of doctors to detect cancer cells more precisely thanks to massive training data about what a cancerous cell is, we can see that we are very much on the cusp of creating a broad utility of big data and analytics. This has been seen as a shift of the Industrial Revolution's magnitude (Richards & King, 2014) and has been widely hyped in business (Margetts & Sutcliffe, 2013). That said, public policy is not at the forefront of the use of big data and data analytics in decision making (Kaski et al., 2019; Poel et al., 2018). This nonadoption is due to multiple factors limiting these technologies' utility (Malomo & Sena, 2017). The ever-increasing amount of data offers possibilities for discovering new relationships and making inferences about a multitude of problems. However, this comes with new challenges involving reproducibility, complexity, security, and risks to privacy, and with a need for new technology and human skills. This is very much the case in public policy, where we need to clearly identify where big data can add value in an ethical and trustworthy manner. In a review, Giest (2017) highlighted three underlying factors to consider. 
First, institutional capacities have a significant role in the use of big data in public policy, producing solutions that can enable users to easily interact with data while also taking into account the siloed data structures in the public domain. However, we know from previous research that siloed structures are an important limiting factor for public policy utilization of big data (Malomo & Sena, 2017). Second, hand-in-hand with big data comes the broader digitalization of public services. Digitalization allows for mediums to interact with big data but also enables the creation of new data. There is, however, evidence that digitalization changes the interactions between citizens and public officials and requires new skills from both parties. Third, big data information will have an impact on the policy cycle. Studies have found that there has been limited progress in taking advantage of big data and analytics (Poel et al., 2018) because it requires a significant change in the policy cycle (Höchtl et al., 2016). Giest (2017) highlights two issues, the substantive role and the procedural role of big data in policy instruments. Procedural activities focus on regulatory activities, such as enabling open data, while substantive actions relate to collecting data for enhancing, for example, evidence-based policy making. Capacities, digitalization, and the role of big data in the (substantive and procedural) policy cycle are core to digital-era governance and evidence-based policy making. In this, it is important to note that policy-makers are not a homogeneous group, and policy cycles vary. 
Thus, the objectives of analytics throughout the policy cycle vary significantly (Daniell et al., 2016) whether or not we approach the policy cycle as separate discrete stages (Jann & Wegrich, 2007), and it has been shown that big data analytics, when used more in some policy stages than in others, notably improved government transparency, policy evaluation, foresight, and agenda setting (Poel et al., 2018). This should be reflected against findings that data analytics have been politically significant in all policy cycle stages (Van der Voort et al., 2019). To overcome the challenges, Poel et al. (2015) highlighted multiple topics that must be addressed to enable capacity building, digitalization, and data integration into the policy cycle. These are (1) a skills gap, (2) reduced transparency due to data analytics, (3) sources and tools, (4) standardization of methods and tools, (5) linking of policy experiments with impact assessments, and (6) enabling policy-makers to be informed about the tools that are developed and piloted. The highlighted themes give context to the issue of big data in policy. While we see the significant impacts being created by the use of big data in policy making, along with the subsequent adaptation of data analytics, we need to better explain and make transparent the utility and complementarity of big data-driven analyses for the policy cycle (Vydra & Klievink, 2019). The challenge highlighted by Poel et al. (2015) and Giest et al. (2017) is also reflected in Pencheva et al. (2020) and Ingram (2019). Both note that big data in public policy has focused more on the "techno-rational factors," dismissing the importance of interaction with the policy process. We know that technology adoption is dependent on the perceived usefulness by the user (Venkatesh & Davis, 2000) and that there is scepticism towards the use of big data and data analytics in public policy-making (Guenduez et al., 2020). 
This can be the result of a mismatch between practice and expectations. Durrant et al. (2018) show that there is an aspirational motivation for the use of big data and data analytics that is not reflected in its everyday utility. While we know that data-driven approaches offer significant potential for anticipatory governance, interaction among stakeholders is the key to drawing value from big data and analytics (Maffei et al., 2020; Starke & Lünich, 2020). Increased stakeholder involvement does not, however, protect against big data and data analytics policy-making creating hard-to-detect inequalities (Giest & Samuels, 2020; van Veenstra et al., in press). Moreover, engaging with a large pool of stakeholders in the public policy-making process will increase the complexities of adopting big data and data analytics (Janssen et al., 2017). In addition to stakeholder interaction, the ability to build capacity and also evaluate deficiencies is important (Okuyucu & Yavuz, 2020). Building capacity should not be seen merely as technical capacity, although that is also important (Poel et al., 2018), but as a more holistic capability to integrate big data and analytics into the policy cycle (Höchtl et al., 2016). Policy-making organizations, while being exceptionally technically capable, can be in a situation where the benefits from big data and data analytics remain small due to "applications" not fitting "their organizations and main statutory tasks" (Klievink et al., 2017). This is to say that when we talk about big data and data analytics capabilities in public policy-making, the literature focuses on technical issues but also on the ability of big data and data analytics to produce policy-relevant applications. 
While we see an increasing and diverse set of research addressing different challenges of big data and data analytics, the current body of literature lacks holistic research agendas (Desouza & Jacob, 2017) addressing the issues highlighted from practice by Giest (2017) and Poel et al. (2015). While we can note emerging fields such as policy analytics (De Marchi et al., 2016; Tsoukias et al., 2013), there is a need to better understand the theoretical grounding and research gaps of big data and data analytics in public policy-making. This study's methodological approach was based on a mixed method of quantitative and qualitative analyses of the bibliometric data and the publications' content. The selected four-phase mixed-methods approach, described below, enables a holistic understanding of the current state of the art and allows us to propose an agenda for going forward. The first phase focused on retrieving the sample of relevant articles and their bibliometric data for analysis. The second phase involved the bibliometric analysis of the retrieved data, which was performed by analysing descriptive statistics, bibliographical coupling, network analytics, and community detection. Having gained a comprehensive view of the more extensive body of literature, we could apply a further filtering process based on eigenvector centrality to get a shortlist of papers for the next phase. The third phase, qualitative analysis, continued the process with an in-depth review and coding of the articles' full text. Finally, in the fourth phase, synthesis, we draw insights from the MAXQDA coding analysis and reporting. This four-phase process is shown in Figure 2. The data used in this study were retrieved from the Scopus database. Scopus is Elsevier's abstract and citation database that has over 1.7 billion cited references dating back to 1970. A central aspect of the quality of the results is that the query used to search for relevant articles was correctly designed. 
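The eigenvector-centrality filtering mentioned in the second phase can be sketched in a few lines of Python. This is an illustrative power-iteration version under stated assumptions, not the authors' procedure: the article reports using VOSviewer and Gephi, while the function names `eigenvector_centrality` and `top_k`, the toy weights and the shortlist size below are hypothetical.

```python
def eigenvector_centrality(weights, iterations=200, tol=1e-12):
    """Approximate eigenvector centrality on an undirected weighted graph.

    weights maps (doc_a, doc_b) pairs to a positive coupling weight.
    Each step adds the previous score to the update (iterating on A + I),
    which keeps the iteration from oscillating on bipartite graphs.
    """
    nodes = sorted({n for pair in weights for n in pair})
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new = dict(score)  # the "+ I" part of the update
        for (a, b), w in weights.items():
            new[a] += w * score[b]
            new[b] += w * score[a]
        norm = max(new.values())
        new = {n: v / norm for n, v in new.items()}  # rescale so max is 1
        converged = max(abs(new[n] - score[n]) for n in nodes) < tol
        score = new
        if converged:
            break
    return score


def top_k(score, k):
    """Return the k highest-scoring document ids."""
    return sorted(score, key=score.get, reverse=True)[:k]


# Hypothetical coupling weights between four documents.
weights = {("A", "B"): 3, ("A", "C"): 2, ("B", "C"): 1, ("C", "D"): 1}
score = eigenvector_centrality(weights)
shortlist = top_k(score, 2)
print(shortlist)
```

A document scores highly when it is strongly coupled to other highly scored documents, which is why this measure suits picking "core" papers from each cluster for close reading.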
The study focuses on public policy-making and big data and data analytics. The study's scope is relatively narrow, focusing solely on publications that address policy-making and the policy process. This choice of scope excludes articles that focus on, for example, big data or data analytics but lack the specific aspect of policy-making. This was the key inclusion criterion for articles in the data set. To focus on this specific scope, we used an iterative approach in which multiple search strings were tested, and after each search, the abstracts of the 10 most cited articles and the 10 most recent articles were reviewed to assess whether the query results reflected the objectives of the study. In practice, the process started with a seed query of "big data" or "data analytics" and "public policy." The query results were reviewed to estimate which articles focused on big data or data analytics and policy-making. These articles were reviewed to see if new terms emerged through the titles, abstracts, and keywords that needed to be included in the analysis. The process was adjusted based on a subjective evaluation of the number of false-positives in the 10 most recent and 10 most cited publications and the number of articles retrieved. This method of short-listing the important literature is known as the snowball method, and the process includes consulting the bibliographies of the key documents to find other relevant titles in the subject (Jalali & Wohlin, 2012). After multiple tests aimed at a comprehensive query that also limited the number of false-positives, we downloaded the metadata for 538 documents. These documents were retrieved using the query "public policy," "policy analysis," "policy making," or "public administration," with the terms "big data," "data analytics," or "automated decision-making" in the title, abstract, or keywords of the document. To analyze the literature, we used the well-established bibliometric method of bibliographical coupling.
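The final search described above can be sketched as a boolean query string. This is a minimal illustration, not the authors' actual query: the helper `build_query` is hypothetical, and the use of the Scopus `TITLE-ABS-KEY` field code to cover title, abstract, and keywords is an assumption about how the search was expressed.

```python
# Hypothetical helper: composes a Scopus-style advanced search string.
# The exact query string used in the study is not given in the text.
def build_query(policy_terms, data_terms):
    def any_of(terms):
        # Quote each phrase and join the alternatives with OR.
        return " OR ".join(f'"{term}"' for term in terms)

    # Both term groups must match somewhere in title, abstract, or keywords.
    return f"TITLE-ABS-KEY(({any_of(policy_terms)}) AND ({any_of(data_terms)}))"


# Term lists taken from the description of the final query.
POLICY_TERMS = ["public policy", "policy analysis", "policy making", "public administration"]
DATA_TERMS = ["big data", "data analytics", "automated decision-making"]

query = build_query(POLICY_TERMS, DATA_TERMS)
print(query)
```

Keeping the two term groups as separate lists makes the iterative refinement easy to reproduce: adding or removing a term changes one list entry rather than a hand-edited string.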
Bibliographical coupling allows for analysis of the publications' shared intellectual background (Kessler, 1963), highlighting contemporary research (Youtie et al., 2013). It is an approach to analyzing the shared theoretical background of scientific publications in which the link between two documents is calculated from the number of references they share. Kessler (1963) elaborates, "A single item of reference shared by two documents is defined as a unit of coupling between them," and if multiple references are shared, the weight of the coupling increases. Bibliographical coupling is able to highlight hot topics (Glanzel & Czerwon, 1996) and links documents with a similar research focus (Jarneving, 2007), ultimately creating a "contemporaneous representation of knowledge" (Youtie et al., 2013). This approach has been used in several research papers to form the basis for research agenda building (Suominen et al., 2019; Yuan et al., 2015). Using the retrieved publication metadata, the VOSviewer tool (van Eck & Waltman, 2009) was selected to calculate bibliographical coupling weights for all the documents in our data set. VOSviewer is a free bibliometrics tool and was selected because its exports allow for deeper network analysis in Gephi. The Scopus data export was used as the input to VOSviewer. During the analysis, we selected documents as the level of analysis, set the minimum number of citations for a document to zero, and selected the full set for the analysis. The full counting method, which assigns each researcher full credit for a publication rather than a fractional share based on the number of authors, was used for the calculation. Finally, we accepted the VOSviewer default to keep the most extensive set of related items, which limited the analysis to the largest, by created by the bibliographical coupling analysis. This limited the analysis to documents.
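Kessler's definition quoted above translates directly into a count of shared references per document pair. The sketch below illustrates that definition only; it is not how VOSviewer computes or normalizes its link strengths, and the function name and toy data are hypothetical.

```python
from itertools import combinations


def coupling_weights(references):
    """Map each document pair to the number of references the two share."""
    weights = {}
    for doc_a, doc_b in combinations(sorted(references), 2):
        shared = len(references[doc_a] & references[doc_b])
        if shared:  # pairs with no shared reference are not coupled
            weights[(doc_a, doc_b)] = shared
    return weights


# Hypothetical toy data: document id -> set of cited reference ids.
toy_references = {
    "doc_a": {"r1", "r2", "r3"},
    "doc_b": {"r2", "r3", "r4"},
    "doc_c": {"r5"},
}

weights = coupling_weights(toy_references)
print(weights)  # {('doc_a', 'doc_b'): 2}
```

In this toy set, `doc_c` shares no reference with the other documents and so has no links at all; documents of this kind drop out when only the largest set of connected items is kept.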
Bibliographical coupling analysis of a data set a by a set of and a set of we calculated the link weight between each publication we created a This data, with the VOSviewer was to because it allows for more network and community detection. network analysis, network descriptive for example, for were calculated in Gephi. were identified using et al., The is one of the most methods to find of in The methods by each to a separate community there after the in by in a This process is continued through the ultimately creating a new network with to communities et al., for a In the the can be by a that the number of communities the This was to the number of We increased the value the community has one share of the documents. We also calculated the eigenvector centrality for each document in the data set. In eigenvector centrality the a has in A value is calculated to all based on an that to other important are more important than equal This centrality value a publication's to the network created by the bibliographical coupling analysis. for we identified the most central publications from each community to be selected for systematic the most central publications was to keep the sample of documents in the coding phase and the most eigenvector central publications was a method to the most important publications from each community for a deeper analysis. of study of study or empirical study findings The offer the as a to be to the study's at For the current we adopted the of elements in the creating a approach (1) the context of the (2) and (3) the scope defined as a research We the or empirical study and to be the (4) for the theoretical or methodological We also included a separate for (5) the method used to understand the specific approach used in the study. 
For the key we also created The first focused on (6) results and the second focused on discussion and This was to highlight other study results and to the analysis in the scholarly debate the research To add to the coding process, the qualitative analysis was using MAXQDA The software allows for coding the documents with the and drawing from the created The use of the software transparency and the of the analysis et al., & 2012). After the coding was the coding process was by one the eigenvector central documents from each and the documents based on the The second researcher the role of the The interaction between the was to that all of the in the coding were The second researcher through the papers and by the first researcher to make that each if was However, as publications can the our approach was to we not the information multiple per This of an approach for example, The MAXQDA analysis software used in the coding created of the documents and created a document about the of text. These were used in the phase. The of by MAXQDA was used to the The two researcher with the MAXQDA of the communities After review of the the on their There after the to draw insights from the coding of the core of The retrieved publications are with the first publication in the data set in as seen in Figure The data were retrieved in 2020; publications for the first and one should a in publication While Figure it is important to this into Figure the of the topics of big data and data analytics in the public policy related literature concerning the public policy literature for the The publication volumes are on 10 to be able to them in one It is clear that the body of literature focused on big data and data analytics in public policy is a much increase in interest in to the public policy body of literature. of the publications are from or seen in 1, the three have over highlights all with over publications in the data set. 
the are highlighting the use of big data and data analytics for example, the topics of and issues and The descriptive analyses of the data also highlight the different that are on the shown in the publication sources for the articles included in the data set are and information we should note our The study focused solely on big data or data analytics and its use in public policy-making. In the the number of articles focusing on single for example, big data is much We should also note that the search in SCOPUS at the title, abstract and making articles big data or data analytics and policy-making in by the It is that the publication sources with at publications all or of all This that publications are over different publication and an on-going debate on the subject is hard to to a specific The of the publications by with that of scientific publication with is significantly the with The two largest are followed by the and and with Looking at the of the between different seen in of has a significant of publications in the data and with the publication there were publications per To understand more in-depth the we the and keywords of the publications in the We the and fields by and in the Figure the terms are as focusing on big data and public policy. thematic terms emerging to the most terms relate to and into the articles' the descriptive analyses highlight that the data set recent among multiple sources and with relatively volumes by The terms in the publications with the search The 538 publications' bibliometric data were for bibliographic coupling using VOSviewer During the calculation process, the software first if of the publications were and from the network emerging from the data. VOSviewer the largest in the full and offers an to used the largest set. to use the largest set allows focusing on the core the In our data the largest set of creating a network was documents. 
The documents were from the as these articles not be included in the bibliometric coupling based clusters but remain throughout the analysis. with the sample of the VOSviewer created network was to software for further analysis. were and the bibliographic coupling network a with the of the network being This to say that there is significant between in the communities as is but the full network is not as the is the communities were using the et al., The was increased from its default value of the community an share of the documents. a of the in nine with the the community has of the The largest community of the A representation of the created can be seen in Figure In this the the clusters created using the The of a the number of citations of an article in the the the network was created by two large communities
- Research Article
25
- 10.1002/1944-2866.poi326
- Jun 1, 2013
- Policy & Internet
Addressing the policy challenges and opportunities of “Big data”
- Book Chapter
14
- 10.1017/9781108147972.029
- Mar 8, 2018
Undeniably “Big Data” plays a crucial role in the ongoing evolution of health care and life science sector innovations. In recent years U.S. and European authorities have developed public platforms and infrastructures providing access to vast stores of health-care knowledge, including data from clinical trials and chosen patient information. The rising demand for access to data is to be expected given the mounting dependence of innovation on market data and market information. It is uncontentious to state that innovation is a competitive market activity reliant on information and knowledge, especially in the life science sectors where competitive innovation and research and development (R&D) resources are persistent considerations. For private actors, the likes of pharmaceutical companies, health care providers, laboratories and insurance companies, it is becoming common practice to accumulate R&D data and make it searchable through medical databases. This trend is advanced and supported by recent initiatives and legislation that are increasing the transparency of various forms of data, such as clinical trials data. As a result, researchers, companies, patients and health care providers gain access to an increased volume of personal and biological data. In light of these developments and rapid technical advances that have greatly facilitated the collection and analysis of information and knowledge from multiple sources, the protection of such data, but also access to it, has become ever more critical to the innovation narrative. For the purposes of this paper this information is regarded as “big data,” i.e. not only because of its greater volume, but also because of its increased variety, velocity, and veracity.
Researchers can mine big data to identify the most effective treatments for particular conditions, to find second and further medical uses, to detect patterns related to drug side effects or hospital readmissions, and to gain other important information that can help patients, reduce treatment costs, channel R&D expenditure and influence innovations. It is expected that these developments will push the trend towards precision medicine and help industry and practitioners address pressing problems related to discrepancies in healthcare quality and escalating pressures to reduce and control healthcare expenditure. The vast prospects of Big Data and the gradual shift to more “personalized”, “open” and “transparent” innovation models highlight the importance of effective and well-calibrated regulation, governance, and use of biological and personal data. At the same time, the translation of Big Data science into safe and efficient “real world” applications raises complex legal challenges relating to public-private research collaborations, data integrity, privacy, ethics and commercialization, amongst others. One such challenge arises from the intersection between the Big Data narrative and intellectual property rights (IPRs). The interplay with IPRs comes into play when research needs to be translated into safe and efficient “real world” applications. Data analytics or data mining will often involve the copying of, or references to, information or databases, all of which might be protected by various forms of intellectual property rights in relevant jurisdictions. Where data is not owned or licensed, the user will need to rely on an exception to IPR infringement to use the data. This has given rise to fierce controversies between data “owners”, data researchers, and entities that provide enabling technologies, large research infrastructures and standardization platforms.
These tensions at the interface of Big Data and IPRs involve complex considerations relating to innovation economics, ownership, licensing and exceptions to IPRs. Seemingly there is convincing evidence that the current Intellectual Property regime needs re-calibration in certain areas to harness the full potential of Big Data in the health and life science sectors. Unfortunately, there seems to be much confusion and little agreement about the availability of IPRs and their legal effects among the multiple stakeholders in Big Data science. Against this background, this Chapter offers insights into challenges resulting from the intersection between the Big Data narrative and IPRs. Section 2 provides a brief summary of the most relevant IPRs for data-based life science research. Due to space constraints, we will concentrate on IPRs and related “sui generis” rights protecting commercially relevant data, rather than legislation protecting personal data integrity or regulating data transfer. This serves as the basis for section 3, which briefly discusses selected areas where emerging tensions between IPRs and Big Data crystallize. Recognizing that the choice of how to address and interact with IPRs varies between different settings and technical applications, our concluding remarks in section 4 argue that these tensions need to be discussed and addressed on a case-by-case basis.
- Research Article
- 10.58489/2836-2225/014
- Jun 30, 2023
- International Journal of Reproductive Research
Without a doubt, artificial intelligence has made tremendous progress in the current era; it is all around us in various forms and in automation. It is constantly evolving, and the goal is to reach a level very close to human thought and logic, at which it can perform processes that until now were time-consuming, complex, or thought impossible. Achieving all of the above requires sophisticated algorithms and a large collection of data, known as Big Data, from which the information later used by the algorithms is extracted. Not all countries apply the same policy to protect their citizens' personal data: some apply a strict policy, while others apply no personal data protection policy at all. Big Data and artificial intelligence, unfortunately, do not distinguish between countries or the laws each country applies; Big Data is collected from different sources, in different ways, and the way it is processed varies. Given the above, it is impossible to apply the GDPR to limit and control the personal data of European citizens living in the European Union. Citizens of the European Union have no opportunity to give or withhold their consent for the personal data collected and processed by third parties. It should be noted that the GDPR is opt-in / opt-out, which means that European citizens should always give their consent to the collection, storage, and processing of personal data, and that they can also request the deletion of their personal data. This is a problem with Big Data, as there is no consent from the citizen.
European citizens who become aware that their personal data have leaked can do little to secure the leaked data. They can report the leak to the competent data protection authority, directly contact the company or organization where they noticed the exposure of their personal data, contact non-governmental organizations that deal with the protection of personal data, or turn to third-party organizations that may be available in each country. The citizen's options are indeed minimal, and the real extent to which his or her published personal data exist in data centers around the world is unknown to him or her. Detecting a specific leak does not mean that the leaked data are the only personal data that exist about that citizen; there may be other personal data that either have not yet been processed or have not yet been identified by the algorithms.
- Supplementary Content
51
- 10.1016/j.xinn.2022.100279
- Jul 6, 2022
- The Innovation
Geographic information science in the era of geospatial big data: A cyberspace perspective
- Research Article
28
- 10.1038/s41436-018-0067-8
- Jan 1, 2019
- Genetics in Medicine
Big data phenotyping in rare diseases: some ethical issues
- Research Article
111
- 10.5204/mcj.561
- Oct 11, 2012
- M/C Journal
Lists and Social Media

Lists have long been an ordering mechanism for computer-mediated social interaction. While far from being the first such mechanism, blogrolls offered an opportunity for bloggers to provide a list of their peers; the present generation of social media environments similarly provide lists of friends and followers. Where blogrolls and other earlier lists may have been user-generated, the social media lists of today are more likely to have been produced by the platforms themselves, and are of intrinsic value to the platform providers at least as much as to the users themselves; both Facebook and Twitter have highlighted the importance of their respective “social graphs” (their databases of user connections) as fundamental elements of their fledgling business models. This represents what Mejias describes as “nodocentrism,” which “renders all human interaction in terms of network dynamics (not just any network, but a digital network with a profit-driven infrastructure).”

The communicative content of social media spaces is also frequently rendered in the form of lists. Famously, blogs are defined in the first place by their reverse-chronological listing of posts (Walker Rettberg), but the same is true for current social media platforms: Twitter, Facebook, and other social media platforms are inherently centred around an infinite, constantly updated and extended list of posts made by individual users and their connections.

The concept of the list implies a certain degree of order, and the orderliness of content lists as provided through the latest generation of centralised social media platforms has also led to the development of more comprehensive and powerful, commercial as well as scholarly, research approaches to the study of social media.
Using the example of Twitter, this article discusses the challenges of such “big data” research as it draws on the content lists provided by proprietary social media platforms.

Twitter Archives for Research

Twitter is a particularly useful source of social media data: using the Twitter API (the Application Programming Interface, which provides structured access to communication data in standardised formats) it is possible, with a little effort and sufficient technical resources, for researchers to gather very large archives of public tweets concerned with a particular topic, theme or event. Essentially, the API delivers very long lists of hundreds, thousands, or millions of tweets, and metadata about those tweets; such data can then be sliced, diced and visualised in a wide range of ways, in order to understand the dynamics of social media communication. Such research is frequently oriented around pre-existing research questions, but is typically conducted at unprecedented scale. The projects of media and communication researchers such as Papacharissi and de Fatima Oliveira, Wood and Baughman, or Lotan, et al. (to name just a handful of recent examples) rely fundamentally on Twitter datasets which now routinely comprise millions of tweets and associated metadata, collected according to a wide range of criteria. What is common to all such cases, however, is the need to make new methodological choices in the processing and analysis of such large datasets on mediated social interaction.

Our own work is broadly concerned with understanding the role of social media in the contemporary media ecology, with a focus on the formation and dynamics of interest- and issues-based publics.
We have mined and analysed large archives of Twitter data to understand contemporary crisis communication (Bruns et al), the role of social media in elections (Burgess and Bruns), and the nature of contemporary audience engagement with television entertainment and news media (Harrington, Highfield, and Bruns). Using a custom installation of the open source Twitter archiving tool yourTwapperkeeper, we capture and archive all the available tweets (and their associated metadata) containing a specified keyword (like “Olympics” or “dubstep”), name (Gillard, Bieber, Obama) or hashtag (#ausvotes, #royalwedding, #qldfloods). In their simplest form, such Twitter archives are commonly stored as delimited (e.g. comma- or tab-separated) text files, with each of the following values in a separate column:

text: contents of the tweet itself, in 140 characters or less
to_user_id: numerical ID of the tweet recipient (for @replies)
from_user: screen name of the tweet sender
id: numerical ID of the tweet itself
from_user_id: numerical ID of the tweet sender
iso_language_code: code (e.g. en, de, fr, ...) of the sender’s default language
source: client software used to tweet (e.g. Web, Tweetdeck, ...)
profile_image_url: URL of the tweet sender’s profile picture
geo_type: format of the sender’s geographical coordinates
geo_coordinates_0: first element of the geographical coordinates
geo_coordinates_1: second element of the geographical coordinates
created_at: tweet timestamp in human-readable format
time: tweet timestamp as a numerical Unix timestamp

In order to process the data, we typically run a number of our own scripts (written in the programming language Gawk) which manipulate or filter the records in various ways, and apply a series of temporal, qualitative and categorical metrics to the data, enabling us to discern patterns of activity over time, as well as to identify topics and themes, key actors, and the relations among them; in some circumstances we may also undertake further processes of filtering and close textual analysis of the content of the tweets. Network analysis (of the relationships among actors in a discussion; or among key themes) is undertaken using the open source application Gephi. While a detailed methodological discussion is beyond the scope of this article, further details and examples of our methods and tools for data analysis and visualisation, including copies of our Gawk scripts, are available on our comprehensive project website, Mapping Online Publics.

In this article, we reflect on the technical, epistemological and political challenges of such uses of large-scale Twitter archives within media and communication studies research, positioning this work in the context of the phenomenon that Lev Manovich has called “big social data.” In doing so, we recognise that our empirical work on Twitter is concerned with a complex research site that is itself shaped by a complex range of human and non-human actors, within a dynamic, indeed volatile media ecology (Fuller), and using data collection and analysis methods that are in themselves deeply embedded in this ecology.
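To make the archive format described above concrete, here is a minimal sketch of one simple categorical metric (tweets per sender) computed over a tab-separated archive. It is written in Python rather than the Gawk the authors report using, the two sample rows are invented for illustration, and only a subset of the listed columns is included.

```python
import csv
import io
from collections import Counter

# Hypothetical two-row sample using a subset of the columns listed above.
SAMPLE = (
    "text\tfrom_user\tid\ttime\n"
    "Watching the count #ausvotes\talice\t1\t1282380000\n"
    "@alice me too #ausvotes\tbob\t2\t1282380060\n"
)


def tweets_per_user(handle, delimiter="\t"):
    """Count archived tweets per sender screen name (from_user column)."""
    # QUOTE_NONE treats quotation marks inside tweet text as ordinary characters.
    reader = csv.DictReader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    return Counter(row["from_user"] for row in reader)


counts = tweets_per_user(io.StringIO(SAMPLE))
print(counts.most_common())
```

The same function accepts any open file handle, so a real archive could be passed in place of the in-memory sample; the column name `from_user` follows the field list above.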
“Big Social Data”

As Manovich’s term implies, the Big Data paradigm has recently arrived in media, communication and cultural studies, significantly later than it did in the hard sciences, in more traditionally computational branches of social science, and perhaps even in the first wave of digital humanities research (which largely applied computational methods to pre-existing, historical “big data” corpora). This shift has been provoked in large part by the dramatic quantitative growth and apparently increased cultural importance of social media; hence, “big social data.” As Manovich puts it:

For the first time, we can follow [the] imaginations, opinions, ideas, and feelings of hundreds of millions of people. We can see the images and the videos they create and comment on, monitor the conversations they are engaged in, read their blog posts and tweets, navigate their maps, listen to their track lists, and follow their trajectories in physical space. (Manovich 461)

This moment has arrived in media, communication and cultural studies because of the increased scale of social media participation and the textual traces that this participation leaves behind, allowing researchers, equipped with digital tools and methods, to “study social and cultural processes and dynamics in new ways” (Manovich 461). However, and crucially for our purposes in this article, many of these scholarly possibilities would remain latent if it were not for the widespread availability of Open APIs for social software (including social media) platforms.
APIs are technical specifications of how one software application should access another, thereby allowing the embedding or cross-publishing of social content across Websites (so that your tweets can appear in your Facebook timeline, for example), or allowing third-party developers to build additional applications on social media platforms (like the Twitter user ranking service Klout), while also allowing platform owners to impose de facto regulation on such third-party uses via the same code. While platform providers do not necessarily have scholarship in mind, the data access affordances of APIs are also available for research purposes. As Manovich notes, until very recently almost all truly “big data” approaches to social media research had been undertaken by computer scientists (464). But as part of a broader “computational turn” in the digital humanities (Berry), and because of the increased availability to non-specialists of data access and analysis tools, media, communication and cultural studies scholars are beginning to catch up. Many of the new, large-scale research projects examining the societal uses and impacts of social media (including our own) which have been initiated by various media, communication, and cultural studies research leaders around the world have begun their work by taking stock of, and often substantially extending through new development, the range of available tools and methods for data analysis. The research infrastructure developed by such projects, therefore, now reflects their own disciplinary backgrounds at least as much as it does the fundamental principles of computer science. In turn, such new and often experimental tools and methods necessarily also provoke new epistemological and methodological challenges.

The Twitter API and Twitter Archives

The Open
- Research Article
50
- 10.2139/ssrn.2482780
- Aug 27, 2014
- SSRN Electronic Journal
The Role of 'Big Data' in Online Platform Competition
- Supplementary Content
- 10.5281/zenodo.831717
- Jan 1, 2017
- Zenodo (CERN European Organization for Nuclear Research)
Executive summary Big data analysis has become a source of competitive advantage for businesses and even whole economies. Big data is indeed a critical issue for the digital economy, innovation and services that are expected to drive growth and job creation in the EU. Compared to the USA, however, the EU has been slow in the adoption of big data analysis. In particular, performance may be improved by investing in and making use of big customer data to offer goods and services that better meet customer needs, i.e., “big data-driven marketing (BDM)”. However, adopting BDM has a steep learning curve due to organizations’ lack of understanding regarding the diverse factors required to succeed. Against this backdrop, the BIDAMARK project aims to assess (1) investment into big data, i.e., how the acquisition and/or internal development of strategic big data –related resources improve firm marketing strategy, and ultimately, firm performance; and (2) utilization of big data, i.e., how the organizational use of customer insights in marketing decision-making helps firms offer goods and services that better meet customer needs, and ultimately, improve firm profitability. To address these issues, a multivariate analysis of a multi-industry sample of 301 US-based firms is implemented. Three important results were found in relation to big data investment. First, the results show that despite the challenges associated with big data, firm performance increases 11-12% as a result of big data resources, as opposed to previously reported 5-6% increases with prior (pre-big data) IT investment. Second, this study identifies and assesses the relative importance of three critical IT resources: big data-related (1) technology; (2) analytics skills; and (3) organizational resources, all of which are confirmed as necessary and complementary dimensions for superior data-driven performance. 
Third, this study highlights the critical role of the firm’s adopted business strategy in driving big data success: Differentiating firms are far more likely to benefit from big data than cost leadership firms. Three important results were also found in relation to big data utilization. First, this study highlights the role of information quality (IQ) in predicting big data customer analytics use. In particular, the format (visualization) of customer information is the most critical aspect of IQ. Second, big data customer analytics use, and personalization as its most crucial dimension, is found to be a key predictor of firm performance. Third, the performance impacts of big data customer analytics use are highly contingent on its prevalence within an industry, i.e., competitive advantage can be imitated away by rivals. However, hyper-personalization afforded by big data-driven customer insights makes the firm’s customers less vulnerable to competitor moves when strong customer relationships are built. The results of BIDAMARK can be exploited in guiding big data –related policymaking in public bodies related to society and citizens as consumers, and strategic decision-making in firms investing and utilizing big data for better competitiveness.
- Research Article
2
- 10.25204/iktisad.1144242
- Oct 29, 2022
- İktisadi İdari ve Siyasal Araştırmalar Dergisi
The aim of this study is to demonstrate the importance of the concept of big data that has emerged from the digitalization of personal health data. To this end, a literature review was conducted on personal health data, electronic health records, and big data. The relationship between big data and personal health data was also examined. The literature review found that, as a result of developments in information and communication technologies, the stages of processing, storing, and transferring personal health data have changed and electronic health records have been created. It was determined that big data storage has replaced physical archives for storing electronic health records. Big data developed in this way was found to contribute to health services in many areas, such as clinical decision-making and quality service delivery. The study concludes that the relationship between personal health data and big data improves health services and reduces human error. On the other hand, the development of big data was assessed to face significant challenges, such as cyberattacks, malicious software companies, and negative attitudes and behaviors stemming from individuals' potential distrust, as users, of health information systems.
- Research Article
- 10.5281/zenodo.831718
- Jul 18, 2017
- Zenodo (CERN European Organization for Nuclear Research)
Executive summary
Big data analysis has become a source of competitive advantage for businesses and even whole economies. Big data is indeed a critical issue for the digital economy, innovation and services that are expected to drive growth and job creation in the EU. Compared to the USA, however, the EU has been slow in the adoption of big data analysis. In particular, performance may be improved by investing in and making use of big customer data to offer goods and services that better meet customer needs, i.e., “big data-driven marketing (BDM)”. However, adopting BDM has a steep learning curve due to organizations’ lack of understanding regarding the diverse factors required to succeed.
Against this backdrop, the BIDAMARK project aims to assess (1) investment into big data, i.e., how the acquisition and/or internal development of strategic big data-related resources improve firm marketing strategy, and ultimately, firm performance; and (2) utilization of big data, i.e., how the organizational use of customer insights in marketing decision-making helps firms offer goods and services that better meet customer needs, and ultimately, improve firm profitability. To address these issues, a multivariate analysis of a multi-industry sample of 301 US-based firms is implemented.
Three important results were found in relation to big data investment. First, the results show that despite the challenges associated with big data, firm performance increases 11-12% as a result of big data resources, as opposed to previously reported 5-6% increases with prior (pre-big data) IT investment. Second, this study identifies and assesses the relative importance of three critical IT resources: big data-related (1) technology; (2) analytics skills; and (3) organizational resources, all of which are confirmed as necessary and complementary dimensions for superior data-driven performance. Third, this study highlights the critical role of the firm’s adopted business strategy in driving big data success: Differentiating firms are far more likely to benefit from big data than cost leadership firms.
Three important results were also found in relation to big data utilization. First, this study highlights the role of information quality (IQ) in predicting big data customer analytics use. In particular, the format (visualization) of customer information is the most critical aspect of IQ. Second, big data customer analytics use, and personalization as its most crucial dimension, is found to be a key predictor of firm performance. Third, the performance impacts of big data customer analytics use are highly contingent on its prevalence within an industry, i.e., competitive advantage can be imitated away by rivals. However, hyper-personalization afforded by big data-driven customer insights makes the firm’s customers less vulnerable to competitor moves when strong customer relationships are built.
The results of BIDAMARK can be exploited in guiding big data-related policymaking in public bodies related to society and citizens as consumers, and strategic decision-making in firms investing and utilizing big data for better competitiveness.
- Book Chapter
- 10.1017/9781780685786.020
- Sep 1, 2018
INTRODUCTION
Information that can be associated with an identifiable individual is given special treatment in law in many countries. This ‘personal information’, as it is called in Australia, attracts the coverage of privacy and data protection law. The scope of the information that fits within the relevant definition varies between jurisdictions, and is a core concern of this chapter. Information, which cannot be associated with an identifiable individual in the way defined in a particular jurisdiction, escapes such classification as ‘personal’. It can then be dealt with under often less restrictive controls outside of privacy and data protection law such as contractual restrictions on data scraping imposed by data owners, or issues involving copyright clearance. This has significant implications for both data subjects (whose information is less protected) and data custodians (who face fewer regulatory constraints). Privacy and data protection laws typically only apply when data or information is linked to a person, or is capable of the identification of a person. If Big Data items or records do not fall within the relevant legal definition of categories such as ‘personal information’, ‘personal data’ or their equivalent, they are not subject to privacy law in a given jurisdiction, so there are fewer rules for their use and transfer. Where information is ‘personal’ there are a variety of methods that are used, in particular in data analytics, to make the information non-identified. The terms for these methods include ‘anonymous’, ‘pseudonymous’, ‘de-identified’, ‘depersonalised’, ‘data minimisation’ and ‘not personal data’.
Although the technology practices known as ‘Big Data’ raise many important privacy issues, this paper focuses on their effect on questions about whether a particular record or data set is considered ‘personal’ and therefore ‘about’ an identifiable individual. We survey definitions of personal data or information (or their equivalent) in jurisdictions including Australia, the United States and the European Union in the context of identifiability and Big Data. The divergence of the definitions used in the jurisdictions helps explain some of the differences in how they respond to Big Data developments – and to the challenges that Big Data tools may pose for legal systems.
- Research Article
- 10.52663/kcsr.2023.28.4.211
- Dec 31, 2023
- The Korea Association for Corruption Studies
The proliferation of indispensable platforms and services has elevated the utilization of users' personal data, consequently giving rise to an escalation in personal information disputes. Over the past decade, the number of cases managed by the Personal Information Dispute Resolution Committee has surged by 582%. Traditional legal proceedings face formidable challenges due to substantial time and cost constraints, hindering expeditious resolutions. Furthermore, the multifaceted nature of personal information breaches adds complexity to accurately discerning the nuances of the damages incurred.
 Personal information breaches occur on various fronts, disseminating sensitive information through online behavioral tracking technology and big data analysis. Due to the unique characteristics associated with personal information, including anonymity, data intricacies, and technological vulnerabilities, a diverse range of breach methods emerges.
 Disputes related to personal information have the potential to evolve into significant issues affecting not only individuals but society as a whole. Recognizing this importance and raising awareness of it, along with efforts to improve the legal system, is therefore imperative. This study aims to explore effective improvements in the system and appropriate policy directions for addressing disputes related to personal information.