Coded Computing: Mitigating Fundamental Bottlenecks in Large-Scale Distributed Computing and Machine Learning
Recent years have witnessed rapid growth in large-scale machine learning and big data analytics, facilitating the development of data-intensive applications such as voice/image recognition, real-time mapping services, autonomous driving, social networks, and augmented/virtual reality. These applications are supported by cloud infrastructures composed of large datacenters. Large-scale distributed machine learning and data analytics systems provide the processing power needed to handle these applications, but they suffer from three major performance bottlenecks: communication, stragglers, and security. In this ground-breaking monograph, the authors introduce the novel concept of Coded Computing. Coded Computing exploits coding theory to optimally inject and leverage data/task redundancy in distributed computing systems, creating coding opportunities to overcome these bottlenecks. After introducing the reader to the core of the problem, the authors describe in detail each of the bottlenecks that can be overcome using Coded Computing. The monograph provides an accessible introduction to how this new technique can be used in developing large-scale computing systems.
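To make the idea of injecting redundancy against stragglers concrete, here is a toy sketch, not the monograph's own construction. It assumes a matrix-vector product split across three workers, where the third worker holds a parity block (the sum of the other two), so the result can be recovered from any two workers. The function names `matvec`, `encode`, and `decode` are hypothetical, and the matrix is assumed to have an even number of rows.

```python
from itertools import combinations


def matvec(rows, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def encode(A):
    """Split A into row blocks A1 and A2, and add a parity block A1 + A2."""
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]


def decode(results):
    """Recover A x from the results of any two workers (keyed by worker index)."""
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:
        # Worker 1 straggled: A2 x = (A1 + A2) x - A1 x
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]
    else:
        # Worker 0 straggled: A1 x = (A1 + A2) x - A2 x
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]
    return y1 + y2


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [2, 1]
tasks = encode(A)          # one task per worker
expected = matvec(A, x)    # uncoded reference result

# Whichever single worker is slow, the other two are enough to decode.
for fast_workers in combinations(range(3), 2):
    partial = {w: matvec(tasks[w], x) for w in fast_workers}
    assert decode(partial) == expected
```

In this sketch each worker does half of the original work, and the system tolerates one straggler at the cost of 50% extra computation in total; the trade-off between redundancy and tolerance is the kind of design question the abstract alludes to.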
- Research Article
3
- 10.52214/vib.v7i.8403
- Jun 2, 2021
- Voices in Bioethics
Legal Governance of Brain Data Derived from Artificial Intelligence
- Research Article
31
- 10.1016/j.procs.2015.04.095
- Jan 1, 2015
- Procedia Computer Science
Cloud Based Big Data Analytics Framework for Face Recognition in Social Networks Using Machine Learning
- Research Article
90
- 10.1016/j.ymeth.2018.05.015
- Jun 8, 2018
- Methods
m-Health 2.0: New perspectives on mobile health, machine learning and big data analytics.
- Research Article
13
- 10.3390/su16072764
- Mar 27, 2024
- Sustainability
Over the past years, machine learning and big data analysis have evolved from a scientific and fictional domain, very interesting but difficult to test, into one of the most powerful tools of Industry 5.0, with a significant impact on sustainable, resilient manufacturing. This has garnered increasing attention within scholarly circles due to its applicability in various domains. The scope of the article is to perform an exhaustive bibliometric analysis of existing papers on machine learning and big data, pointing out their capability from a scientific point of view, explaining the usability of applications, and identifying the current state of a continually changing domain. In this context, the present paper aims to discuss the research landscape associated with the use of machine learning and big data analysis in Industry 5.0 in terms of themes, authors, citations, preferred journals, research networks, and collaborations. The initial part of the analysis focuses on the latest trends and how researchers help to change preconceptions about machine learning. The annual growth rate is 123.69%, which is considerable for such a short period and calls for a comprehensive analysis of the boom of articles in this domain. Further, the exploration investigates affiliated academic institutions, influential publications, journals, key contributors, and the most representative authors. To accomplish this, a dataset has been created containing researchers’ papers extracted from the ISI Web of Science database using keywords associated with machine learning and big data, starting in 2016 and ending in 2023. The paper incorporates graphs that describe the most relevant authors, academic institutions, annual publications, country collaborations, and the most used words. The paper ends with a review of the globally most cited documents, describing the importance of machine learning and big data in Industry 5.0.
- Research Article
- 10.55041/ijsrem13480
- May 22, 2022
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
All applications in current trends need to use machine learning and big data in today's data-heavy world. Machine learning is a subset of artificial intelligence that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning uses historical data as input to predict new output values. Small-scale and even large developers cannot maintain machine learning algorithms and other big data processing techniques all by themselves. To harvest, process, and analyze the data, a data processing engine is needed. The purpose of our project is to create a data processing engine for big data analytics: a dedicated machine learning and data processing engine that runs on a specified port on developers' servers and collects data sets in real time. The engine can be queried using a simple query language; it analyzes the data and produces the respective results using ML based on the query, so that even small developers without knowledge of machine learning and big data analysis can obtain the expected results and keep their applications running smoothly.
- Research Article
4
- 10.51594/csitrj.v4i3.1500
- Dec 30, 2023
- Computer Science & IT Research Journal
The rapid digitization of industries and the proliferation of connected devices have exponentially increased the surface area for cyber threats, making traditional cybersecurity methods increasingly inadequate. The paper explores the integration of advanced technologies to enhance threat detection capabilities in a dynamically evolving cyber landscape. This study emphasizes the critical role of machine learning (ML) and big data analytics in identifying, analyzing, and mitigating cyber threats in real time. By leveraging the massive volumes of data generated across networks, ML algorithms can detect anomalous behavior patterns and predict potential threats with high accuracy. Big data analytics further enhances this process by processing and analyzing data at unprecedented speeds, enabling swift identification and response to security breaches. The comprehensive approach outlined in this study addresses key challenges, including the complexity of modern cyber threats, the need for scalability in cybersecurity solutions, and the importance of minimizing false positives. Additionally, the research highlights the importance of continuous learning models that adapt to new and emerging threats, ensuring that the system remains resilient against sophisticated attacks. Case studies of successful implementations across various industries are examined to demonstrate the practical applications and benefits of this approach. The findings suggest that integrating ML and big data analytics in real-time threat detection systems significantly improves cybersecurity defenses, providing organizations with the tools to proactively counteract cyber threats. This approach is positioned as a vital strategy for organizations seeking to fortify their cybersecurity posture in an increasingly interconnected world, where the speed and accuracy of threat detection are paramount to safeguarding critical assets and maintaining trust. 
Keywords: Real-Time, Cybersecurity, Threat Detection, ML, Big Data Analytics.
- Research Article
59
- 10.1016/j.ecoser.2022.101478
- Sep 12, 2022
- Ecosystem Services
Ecosystem services are essential for human well-being, but are currently facing many natural and anthropogenic threats. Modeling and mapping ecosystem services helps us mitigate, adapt to, and manage these pressures, but overall the field faces multiple major limitations. These include: 1) data availability, 2) understanding, estimation, and reporting of uncertainties, and 3) connecting socio-ecological aspects of ecosystem services. Recent technological advancements in machine learning, coupled with the rising availability of big data, offer an opportunity to overcome these challenges. We review studies utilizing machine learning and/or big data to overcome these limitations. We collect 56 papers that exemplify the current use of machine learning and big data to address the three identified gaps in the ecosystem service field. We find that although the use of these tools in ecosystem service research is relatively new, it is growing quickly. Big data can directly address data gaps, especially as new big data resources relevant to ecosystem service mapping become available (e.g., social media data). Some properties of machine learning can also contribute to addressing data gaps in data-sparse environments. Also, many machine learning algorithms can estimate and consider uncertainty, whereas big data can significantly increase sample size, reducing uncertainties in some situations. Some big data sources, like crowdsourced data, provide direct sources of social behaviors and preferences that relate to ecosystem service demand, thus allowing researchers to connect social and biophysical aspects of ecosystem services. Machine learning algorithms provide an effective and efficient tool for handling these large nonlinear socio-ecological datasets in tandem, giving researchers the ability to more realistically model and map ecosystem services without relying on oversimplified proxies or linear algorithms. 
Despite these opportunities, implementation is still lacking and limitations still hinder use.
- Dissertation
- 10.6092/polito/porto/2668398
- Jan 1, 2017
Over the past 20 years, the Internet has seen exponential growth in traffic, users, services, and applications. Currently, it is estimated that the Internet is used every day by more than 3.6 billion users, who generate 20 TB of traffic per second. Such a huge amount of data challenges network managers and analysts to understand how the network is performing, how users are accessing resources, how to properly control and manage the infrastructure, and how to detect possible threats. Along with mathematical, statistical, and set theory methodologies, machine learning and big data approaches have emerged to build systems that aim to automatically extract information from the raw data that network monitoring infrastructures offer. In this thesis I will address different network monitoring solutions, evaluating several methodologies and scenarios. I will show how, following a common workflow, it is possible to exploit mathematical, statistical, set theory, and machine learning methodologies to extract meaningful information from the raw data. Particular attention will be given to machine learning and big data methodologies such as DBSCAN and the Apache Spark big data framework. The results show that, although mathematical, statistical, and set theory tools can be used to characterize a problem, machine learning methodologies are very useful for discovering hidden information in the raw data. Using the DBSCAN clustering algorithm, I will show how to use YouLighter, an unsupervised methodology, to group caches serving YouTube traffic into edge-nodes and later, by using the notion of Pattern Dissimilarity, how to identify changes in their usage over time. By using YouLighter over 10-month-long traces, I will pinpoint sudden changes in YouTube edge-node usage, changes that also impair the end users' Quality of Experience. 
I will also apply DBSCAN in the deployment of SeLINA, a self-tuning tool implemented in the Apache Spark big data framework to autonomously extract knowledge from network traffic measurements. By using SeLINA, I will show how to automatically detect the changes in the YouTube CDN previously highlighted by YouLighter. Along with these machine learning studies, I will show how to use mathematical and set theory methodologies to investigate the browsing habits of Internauts. By using a two-week dataset, I will show how, over this period, Internauts continue discovering new websites. Moreover, I will show that it is hard to build a reliable profiler using only DNS information. Instead, by exploiting mathematical and statistical tools, I will show how to characterize Anycast-enabled CDNs (A-CDNs). I will show that A-CDNs are widely used for both stateless and stateful services; that A-CDNs are quite popular, as more than 50% of web users contact an A-CDN every day; and that stateful services can benefit from A-CDNs, since their paths are very stable over time, as demonstrated by the presence of only a few anomalies in their Round Trip Time. Finally, I will conclude by showing how I used BGPStream, an open-source software framework for the analysis of both historical and real-time Border Gateway Protocol (BGP) measurement data. By using BGPStream in real-time mode, I will show how I detected a Multiple Origin AS (MOAS) event and how I studied black-holing community propagation, showing the effect of this community on the network. Then, by using BGPStream in historical mode and the Apache Spark big data framework over 16 years of data, I will show different results, such as the continuous growth of IPv4 prefixes and the growth of MOAS events over time. All these studies aim to show that monitoring is a fundamental task in different scenarios and, in particular, to highlight the importance of machine learning and big data methodologies.
- Conference Article
53
- 10.1109/bigdata50022.2020.9378407
- Dec 10, 2020
In the current technological era, huge amounts of big data are generated and collected from a wide variety of rich data sources. These big data can be of different levels of veracity, in the sense that some of them are precise while others are imprecise and uncertain. Embedded in these big data are useful information and valuable knowledge to be discovered. An example of these big data is healthcare and epidemiological data, such as data related to patients who suffered from epidemic diseases like coronavirus disease 2019 (COVID-19). Knowledge discovered from these epidemiological data via data science techniques such as machine learning, data mining, and online analytical processing (OLAP) helps researchers, epidemiologists, and policy makers to get a better understanding of the disease, which may inspire them to come up with ways to detect, control, and combat the disease. In this paper, we present a machine learning and big data analytic tool for processing and analyzing COVID-19 epidemiological data. Specifically, the tool makes good use of taxonomy and OLAP to generalize some specific attributes into generalized attributes for effective big data analytics. Instead of ignoring unknown or unstated values of some attributes, the tool gives users the flexibility to include or exclude these values, depending on their preferences and applications. Moreover, the tool discovers frequent patterns and their related patterns, which help reveal useful knowledge such as the absolute and relative frequency of the patterns. Furthermore, the tool learns from the patterns discovered in historical data and predicts useful information such as clinical outcomes for future data. As such, the tool helps users to get a better understanding of information about the confirmed cases of COVID-19. 
Although this tool is designed for machine learning and analytics of big epidemiological data, it would be applicable to machine learning and analytics of big data in many other real-life applications and services.
- Research Article
88
- 10.1111/bdi.12828
- Sep 18, 2019
- Bipolar Disorders
The International Society for Bipolar Disorders Big Data Task Force assembled leading researchers in the field of bipolar disorder (BD), machine learning, and big data with extensive experience to evaluate the rationale of machine learning and big data analytics strategies for BD. A task force was convened to examine and integrate findings from the scientific literature related to machine learning and big data-based studies, to clarify terminology, and to describe challenges and potential applications in the field of BD. We also systematically searched PubMed, Embase, and Web of Science for articles published up to January 2019 that used machine learning in BD. The results suggested that big data analytics has the potential to provide risk calculators to aid in treatment decisions and predict clinical prognosis, including suicidality, for individual patients. This approach can advance diagnosis by enabling discovery of more relevant data-driven phenotypes, as well as by predicting transition to the disorder in high-risk unaffected subjects. We also discuss the most frequent challenges that big data analytics applications can face, such as heterogeneity, lack of external validation and replication of some studies, cost and non-stationary distribution of the data, and lack of appropriate funding. Machine learning-based studies, including atheoretical data-driven big data approaches, provide an opportunity to more accurately detect those who are at risk and parse relevant phenotypes, as well as to inform treatment selection and prognosis. However, several methodological challenges need to be addressed in order to translate research findings to clinical settings.
- Research Article
9
- 10.1111/bcpt.13828
- Jan 3, 2023
- Basic & clinical pharmacology & toxicology
Machine learning can operationalize the rich and complex data in electronic patient records for exploratory pharmacovigilance endeavours. The objective of this review is to identify applications of machine learning and big patient data in exploratory pharmacovigilance. We searched PubMed and Embase and included original articles with an exploratory pharmacovigilance purpose, focusing on medicinal interventions and reporting the use of machine learning in electronic patient records with ≥1000 patients collected after market entry. Of 2557 studies screened, seven were included. These covered six countries and were published between 2015 and 2021. The most prominent machine learning methods were random forests, logistic regressions, and support vector machines. Two studies used artificial neural networks or naive Bayes classifiers. One study used formal concept analysis for association mining, and another used temporal difference learning. Five studies compared several methods against each other. The numbers of patients in most data sets were in the order of thousands; two studies used what can more reasonably be considered big data, with >1 000 000 patient records. Despite years of great aspirations for combining machine learning and clinical data for exploratory pharmacovigilance, only a few studies seem to deliver, even partly, on these expectations.
- Research Article
10
- 10.1016/j.ijpe.2023.109137
- Dec 26, 2023
- International Journal of Production Economics
Big data and machine learning in cost estimation: An automotive case study
- Single Book
10
- 10.1002/9781119522225
- Nov 30, 2018
Get to know the ‘why’ and ‘how’ of machine learning and big data in quantitative investment. Big Data and Machine Learning in Quantitative Investment is not just about demonstrating the maths or the coding. Instead, it's a book by practitioners for practitioners, covering the questions of why and how of applying machine learning and big data to quantitative finance. The book is split into 13 chapters, each of which is written by a different author on a specific case. The chapters are ordered according to the level of complexity, beginning with the big picture and taxonomy, moving on to practical applications of machine learning, and finally finishing with innovative approaches using deep learning.
• Gain a solid reason to use machine learning
• Frame your question using financial markets laws
• Know your data
• Understand how machine learning is becoming ever more sophisticated
Machine learning and big data are not a magical solution, but appropriately applied, they are extremely effective tools for quantitative investment, and this book shows you how.
- Research Article
5
- 10.3390/make7010013
- Feb 7, 2025
- Machine Learning and Knowledge Extraction
The integration of machine learning (ML) with big data has revolutionized industries by enabling the extraction of valuable insights from vast and complex datasets. This convergence has fueled advancements in various fields, leading to the development of sophisticated models capable of addressing complicated problems. However, the application of ML in big data environments presents significant challenges, including issues related to scalability, data quality, model interpretability, privacy, and the handling of diverse and high-velocity data. This survey provides a comprehensive overview of the current state of ML applications in big data, systematically identifying the key challenges and recent advancements in the field. By critically analyzing existing methodologies, this paper highlights the gaps in current research and proposes future directions for the development of scalable, interpretable, and privacy-preserving ML techniques. Additionally, this survey addresses the ethical and societal implications of ML in big data, emphasizing the need for responsible and equitable approaches to harnessing these technologies. The insights presented in this paper aim to guide future research and contribute to the ongoing discourse on the responsible integration of ML and big data.
- Research Article
- 10.46610/jobdabi.2024.v01i01.003
- Apr 16, 2024
- Journal of Big Data Analytics and Business Intelligence
Machine learning (ML) and big data analytics have emerged as transformative technologies in the agricultural sector, offering innovative solutions to enhance productivity, sustainability, and profitability. This abstract provides an overview of how ML and big data are revolutionizing traditional farming practices by leveraging advanced algorithms and data analytics. It discusses the potential of ML and big data in optimizing crop yield and quality, promoting sustainable practices, enabling precision agriculture, and addressing crop diseases through advanced predictive modelling. The abstract emphasizes the potential of ML and big data to reshape the future of agriculture, fostering resilience and prosperity for farmers and stakeholders worldwide. The paper also examines the application of machine learning to agriculture in relation to weather patterns, exploring ways in which machine learning techniques can be used to analyze and predict the impact of weather conditions on agricultural productivity. It discusses specific machine learning algorithms and models that have been successfully applied in this domain, providing insights into the potential benefits and challenges of integrating machine learning into agricultural practices.