Scientific Production and Real-World Applications of Data Science in Ecuador: A Bibliometric Perspective (1985–2023)

Abstract

This study presents a comprehensive bibliometric analysis of data science research in Ecuador from 1985 to 2023, aiming to trace its academic development and highlight real-world applications. Rather than evaluating the direct integration of data science into national infrastructure, the study focuses on how the Ecuadorian scientific community has contributed to this field’s evolution. The analysis is divided into two stages: moderate growth (1985–2015) and exponential expansion (2016–2023), with a strong correlation between scientific output and legislative reforms such as the Organic Law of Higher Education and the Prometheus Project. Additionally, selected case studies illustrate how data science has been applied in healthcare, education, and business through technologies such as IBM Watson, Microsoft Azure, and machine learning models. This dual approach – bibliometric and applicative – sheds light on Ecuador’s trajectory in scientific production and technological adoption, offering insight into the nation’s research landscape and future innovation potential.

Similar Papers
  • Research Article
  • Cite Count Icon 6
  • 10.2196/54990
Health Care Professionals and Data Scientists' Perspectives on a Machine Learning System to Anticipate and Manage the Risk of Decompensation From Patients With Heart Failure: Qualitative Interview Study.
  • Jan 20, 2025
  • Journal of Medical Internet Research
  • Joana Seringa + 4 more

Heart failure (HF) is a significant global health problem, affecting approximately 64.34 million people worldwide. The worsening of HF, also known as HF decompensation, is a major factor behind hospitalizations, contributing to substantial health care costs related to this condition. This study aimed to explore the perspectives of health care professionals and data scientists regarding the relevance, challenges, and potential benefits of using machine learning (ML) models to predict decompensation from patients with HF. A total of 13 individual, semistructured, qualitative interviews were conducted in Portugal between October 31, 2022, and June 23, 2023. Participants represented different health care specialties and were selected from different contexts and regions of the country to ensure a comprehensive understanding of the topic. Data saturation was determined as the point at which no new themes emerged from participants' perspectives, ensuring a sufficient sample size for analysis. The interviews were audio recorded, transcribed, and analyzed using MAXQDA (VERBI Software GmbH) through a reflexive thematic analysis. Two researchers (JS and AH) coded the interviews to ensure the consistency of the codes. Ethical approval was granted by the NOVA National School of Public Health ethics committee (CEENSP 14/2022), and informed consent was obtained from all participants. The participants recognized the potential benefits of ML models for early detection, risk stratification, and personalized care of patients with HF. The importance of selecting appropriate variables for model development, such as rapid weight gain and symptoms, was emphasized. The use of wearables for recording vital signs was considered necessary, although challenges related to adoption among older patients were identified. Risk stratification emerged as a crucial aspect, with the model needing to identify patients at high-, medium-, and low-risk levels. 
Participants emphasized the need for a response model involving health care professionals to validate ML-generated alerts and determine appropriate interventions. The study's findings highlight ML models' potential benefits and challenges for predicting HF decompensation. The relevance of ML models for improving patient outcomes, reducing health care costs, and promoting patient engagement in disease management is highlighted. Adequate variable selection, risk stratification, and response models were identified as essential components for the effective implementation of ML models in health care. In addition, the study identified technical, regulatory and ethical, and adoption and acceptance challenges that need to be overcome for the successful integration of ML models into clinical workflows. Interpretation of the findings suggests that future research should focus on more extensive and diverse samples, incorporate the patient perspective, and explore the impact of ML models on patient outcomes and personalized care in HF management. Incorporation of this study's findings into practice is expected to contribute to developing and implementing ML-based predictive models that positively impact HF management.

  • Conference Article
  • Cite Count Icon 528
  • 10.1145/3313831.3376219
Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning
  • Apr 21, 2020
  • Harmanpreet Kaur + 5 more

Machine learning (ML) models are now routinely deployed in domains ranging from criminal justice to healthcare. With this newfound ubiquity, ML has moved beyond academia and grown into an engineering discipline. To that end, interpretability tools have been designed to help data scientists and machine learning practitioners better understand how ML models work. However, there has been little evaluation of the extent to which these tools achieve this goal. We study data scientists' use of two existing interpretability tools, the InterpretML implementation of GAMs and the SHAP Python package. We conduct a contextual inquiry (N=11) and a survey (N=197) of data scientists to observe how they use interpretability tools to uncover common issues that arise when building and evaluating ML models. Our results indicate that data scientists over-trust and misuse interpretability tools. Furthermore, few of our participants were able to accurately describe the visualizations output by these tools. We highlight qualitative themes for data scientists' mental models of interpretability tools. We conclude with implications for researchers and tool designers, and contextualize our findings in the social science literature.

  • Research Article
  • Cite Count Icon 4
  • 10.2298/yjor131112005k
Scientific research publication productivity in the areas of mathematics and physics in south eastern Europe
  • Jan 1, 2014
  • YUJOR
  • Djuro Kutlaca + 4 more

The paper presents the scientific publication productivity, registered in Web of Science (WoS) databases in two fields of science, Mathematics and Physics, for authors from countries of South East Europe (SEE). Using the Revealed Publication Advantage (RPA) indicator calculated for SEE countries, policy makers can gain insight into the scientific publication productivity of SEE countries in these two scientific fields, compared with the world average. The scientific output in Mathematics and Physics from the SEE region represents the majority of the overall scientific output in every particular country in this region. The scientific output in Mathematics and Physics from the SEE region is comparable with that of other research groups in the world. When analysing Web of Science publications by field of research, Mathematics represents 2.1% of the total worldwide scientific production, while Physics accounts for 8.8%, giving a total of 10.9% for Physics and Mathematics combined - over 1,547,187 publications in the period 2005-2010. In South East Europe, Mathematics is 3.5% of the total scientific production, while Physics is 9.6% - bringing the total for Physics and Mathematics to 13.1%.
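The shares quoted in this abstract can be compared with a simple ratio. The sketch below is only an illustration: the abstract does not give the RPA formula, so `field_share_ratio` is a hypothetical helper that divides a region's field share by the world's field share, which may differ from the indicator the authors actually computed. The input percentages are the ones stated in the abstract.

```python
# Field shares (percent of total scientific production) as quoted in the abstract.
world_share = {"Mathematics": 2.1, "Physics": 8.8}
see_share = {"Mathematics": 3.5, "Physics": 9.6}


def field_share_ratio(region_pct: float, world_pct: float) -> float:
    """Hypothetical helper: regional field share divided by world field share.

    A value above 1 means the field is over-represented in the region
    relative to the world. This is NOT necessarily the paper's RPA formula.
    """
    if world_pct <= 0:
        raise ValueError("world share must be positive")
    return region_pct / world_pct


# Ratio per field, plus the combined totals the abstract reports.
ratios = {
    field: field_share_ratio(see_share[field], world_share[field])
    for field in world_share
}
combined_world = round(sum(world_share.values()), 1)
combined_see = round(sum(see_share.values()), 1)

for field, ratio in ratios.items():
    print(f"{field}: SEE share is {ratio:.2f} times the world share")
print(f"Combined: world {combined_world}%, SEE {combined_see}%")
```

Under this assumed definition, both fields come out above 1, which is consistent with the abstract's statement that the two fields carry more weight in SEE output than in world output.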

  • Research Article
  • Cite Count Icon 4
  • 10.1145/3687267
Understanding the performance of machine learning models from data- to patient-level
  • Dec 11, 2024
  • Journal of Data and Information Quality
  • Maria Gabriela Valeriano + 3 more

Machine Learning (ML) models have the potential to support decision-making in healthcare by grasping complex patterns within data. However, decisions in this domain are sensitive and require active involvement of domain specialists with deep knowledge of the data. To address this task, clinicians need to understand how predictions are generated so they can provide feedback for model refinement. There is usually a gap in the communication between data scientists and domain specialists that needs to be addressed. Specifically, many ML studies are only concerned with presenting average accuracies over an entire dataset, losing valuable insights that can be obtained at a more fine-grained patient-level analysis of classification performance. In this article, we present a case study aimed at explaining the factors that contribute to specific predictions for individual patients. Our approach takes a data-centric perspective, focusing on the structure of the data and its correlation with ML model performance. We utilize the concept of Instance Hardness, which measures the level of difficulty an instance poses in being correctly classified. By selecting the hardest and easiest to classify instances, we analyze and contrast the distributions of specific input features and extract meta-features to describe each instance. Furthermore, we individually examine certain instances, offering valuable insights into why they offer challenges for classification, enabling a better understanding of both the successes and failures of the ML models. This opens up the possibility for discussions between data scientists and domain specialists, supporting collaborative decision-making.

  • Conference Article
  • Cite Count Icon 53
  • 10.1145/3394486.3403205
Vamsa: Automated Provenance Tracking in Data Science Scripts
  • Aug 20, 2020
  • Mohammad Hossein Namaki + 7 more

There has recently been a lot of ongoing research in the areas of fairness, bias and explainability of machine learning (ML) models due to the self-evident or regulatory requirements of various ML applications. We make the following observation: All of these approaches require a robust understanding of the relationship between ML models and the data used to train them. In this work, we introduce the ML provenance tracking problem: the fundamental idea is to automatically track which columns in a dataset have been used to derive the features/labels of an ML model. We discuss the challenges in capturing such information in the context of Python, the most common language used by data scientists. We then present Vamsa, a modular system that extracts provenance from Python scripts without requiring any changes to the users' code. Using 26K real data science scripts, we verify the effectiveness of Vamsa in terms of coverage, and performance. We also evaluate Vamsa's accuracy on a smaller subset of manually labeled data. Our analysis shows that Vamsa's precision and recall range from 90.4% to 99.1% and its latency is in the order of milliseconds for average size scripts. Drawing from our experience in deploying ML models in production, we also present an example in which Vamsa helps automatically identify models that are affected by data corruption issues.

  • Research Article
  • Cite Count Icon 21
  • 10.1021/acs.estlett.2c00949
Identification of Polymers with a Small Data Set of Mid-infrared Spectra: A Comparison between Machine Learning and Deep Learning Models
  • Jan 11, 2023
  • Environmental Science & Technology Letters
  • Xin Tian + 4 more

Identifying environmental polymers and microplastics is crucial for the scientific world, environmental agencies, and water authorities to estimate their environmental impact and increase efforts to decrease emissions. On the basis of different spectroscopy techniques, e.g., laser-directed infrared imaging and Raman spectroscopy, polymers can be observed and represented as spectroscopic signals. The latter can be further analyzed and classified by data science, in particular, machine learning (ML). Past studies applied a variety of ML models to identify polymers from small or large data sets. However, a comprehensive comparison of multiple models across different data set sizes is still needed, which is presented in this study. Furthermore, we also provide a practical data augmentation technique to generate synthetic samples when only a limited number of samples are available. Our results show that the ensemble ML model, compared to neural network models, takes the least training time to achieve the best performance, i.e., a classification accuracy of 99.5%. This study provides a generic framework for selecting ML models and boosting model performance to accurately identify polymers.

  • Conference Article
  • Cite Count Icon 10
  • 10.1145/3439961.3439971
Brazilian Data Scientists: Revealing their Challenges and Practices on Machine Learning Model Development
  • Dec 1, 2020
  • João Lucas Correia + 9 more

Data scientists often develop machine learning models to solve a variety of problems in industry and academia. To build these models, these professionals usually perform activities that are also performed in the traditional software development lifecycle, such as eliciting and implementing requirements. One might argue that data scientists could rely on the engineering of traditional software development to build machine learning models. However, machine learning development presents certain characteristics, which may raise challenges that lead to the need for adopting new practices. The literature has not characterized this knowledge from the perspective of data scientists. In this paper, we characterize challenges and practices addressing the engineering of machine learning models that deserve attention from the research community. To this end, we performed a qualitative study with eight data scientists across five different companies having different levels of experience in developing machine learning models. Our findings suggest that: (i) data processing and feature engineering are the most challenging stages in the development of machine learning models; (ii) synergy between data scientists and domain experts is essential in most stages; and (iii) the development of machine learning models lacks the support of a well-engineered process.

  • Research Article
  • Cite Count Icon 52
  • 10.1007/s11192-016-2126-8
Unbalanced international collaboration affects adversely the usefulness of countries’ scientific output as well as their technological and social impact
  • Jan 1, 2016
  • Scientometrics
  • Sonia R Zanotto + 2 more

Unbalanced international scientific collaboration was analyzed as a cause of misleading information on a country’s contribution to world scientific output. The ESI database (Thomson Reuters’ InCites), covering the scientific production of 217 active countries in the period 2010–2014, was used. International collaboration results in a high percentage (33.1%) of double-counted world articles, thus affecting qualitative data such as citations, impact, and impact relative to world. The countries were divided into three groups according to their individual contribution to world publications: Group I (24 countries, at least 1% each), representing 83.9% of the total double-counted world articles; Group II (40 countries, 0.1–0.99% each); and Group III, 153 countries (70.5%), each with <0.1% and altogether 1.9% of the world. Qualitative characteristics of each group were also analyzed: the percentage of the country’s GNP applied to R&D, the proportion of scientists and engineers per million inhabitants, and the Human Development Index. Average international collaboration was 43.0% for Group I, 55.8% for Group II, and 85.2% for Group III. We concluded that very high and unbalanced international collaboration, as presented by many countries, misrepresents the importance of their scientific production and their technological and social outputs. Furthermore, it jeopardizes the qualitative outputs of the countries themselves, artificially increasing their scientific impact, affecting all fields and, therefore, the whole world. The data confirm that when dealing with the qualitative contribution of countries, it is necessary to take into consideration the level of international cooperation because, as seen here, it can and in fact does create a false impression of the real contribution of countries.

  • Research Article
  • Cite Count Icon 3
  • 10.3390/su17156811
Towards Sustainable Construction: Experimental and Machine Learning-Based Analysis of Wastewater-Integrated Concrete Pavers
  • Jul 27, 2025
  • Sustainability
  • Nosheen Blouch + 5 more

The escalating global demand for fresh water, driven by urbanization and industrial growth, underscores the need for sustainable water management, particularly in the water-intensive construction sector. Although prior studies have primarily concentrated on treated wastewater, the practical viability of utilizing untreated wastewater has not been thoroughly investigated, especially in developing nations where treatment expenses frequently impede actual implementation, even for non-structural uses. While prior research has focused on treated wastewater, the potential of untreated or partially treated wastewater from diverse industrial sources remains underexplored. This study investigates the feasibility of incorporating wastewater from textile, sugar mill, service station, sewage, and fertilizer industries into concrete paver block production. The novelty lies in a dual approach, combining experimental analysis with XGBoost-based machine learning (ML) models to predict the impact of key physicochemical parameters, such as Biochemical Oxygen Demand (BOD), Chemical Oxygen Demand (COD), and Hardness, on mechanical properties like compressive strength (CS), water absorption (WA), ultrasonic pulse velocity (UPV), and dynamic modulus of elasticity (DME). The ML models showed high predictive accuracy for CS (R² = 0.92) and UPV (R² = 0.97 direct, 0.99 indirect), aligning closely with experimental data. Notably, concrete pavers produced with textile (CP-TXW) and sugar mill wastewater (CP-SUW) attained 28-day compressive strengths of 47.95 MPa and exceeding 48 MPa, respectively, conforming to ASTM C936 standards and demonstrating the potential to substitute fresh water for non-structural applications.
These findings demonstrate the viability of using untreated wastewater in concrete production with minimal treatment, offering a cost-effective, sustainable solution that reduces fresh water dependency while supporting environmentally responsible construction practices aligned with SDG 6 (Clean Water and Sanitation) and SDG 12 (Responsible Consumption and Production). Additionally, the model serves as a practical screening tool for identifying and prioritizing viable wastewater sources in concrete production, complementing mandatory laboratory testing in industrial applications.

  • Research Article
  • Cite Count Icon 9
  • 10.1007/s11135-020-01063-w
Comparing the efficiency of countries to assimilate and apply research investment
  • Oct 24, 2020
  • Quality & Quantity
  • Barbara S Lancho-Barrantes + 2 more

One of the main purposes of countries’ economic expenditure on research is to achieve higher levels of scientific results, which could translate into better living standards for society. Research efficiency, in turn, can be understood as obtaining the largest number of scientific results with the minimum amount of financial investment. To assess the repercussion of research investment on scientific production and to measure scientific efficiency, we selected a sample of 19 countries representing each region of the world. We used 17 years of social and economic indicators from the UNESCO database, and scientific data from Scopus and SciVal. We introduce two new notions of economic efficiency for national research systems: one based on the capability for assimilating R&D investment, and another based on overall productivity and impact. Through a causal model based on multiple linear regression on panel data, we model assimilation efficiency and confirm that 98% of the scientific production of a country can be explained by the GERD expressed as a percentage of GDP and the number of Academic and Research Institutions that concentrate at least 50% of the national scientific production. To measure countries’ efficiency in productivity and impact, we introduce four indicators that quantify the relation between economic inputs and scientific outputs: dollars per paper (DPP), dollars per citation (DPC), citations received per 1,000 dollars invested (CIREDI), and papers produced per 1,000 dollars invested (PAPPDI).
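The four indicators named above are plain ratios, so they can be sketched directly from their names. The following is a minimal illustration, not the authors' implementation: the function name `efficiency_indicators`, its parameters, and the sample figures are all hypothetical, and the abstract does not specify which investment measure or time window the authors used.

```python
def efficiency_indicators(investment_usd: float, papers: int, citations: int) -> dict:
    """Compute the four ratio indicators as their names in the abstract suggest.

    All inputs are assumed to cover the same country and the same period.
    """
    if investment_usd <= 0 or papers <= 0 or citations <= 0:
        raise ValueError("investment, papers and citations must all be positive")
    thousands_invested = investment_usd / 1000
    return {
        "DPP": investment_usd / papers,             # dollars per paper
        "DPC": investment_usd / citations,          # dollars per citation
        "CIREDI": citations / thousands_invested,   # citations per 1,000 dollars
        "PAPPDI": papers / thousands_invested,      # papers per 1,000 dollars
    }


# Made-up figures for illustration only (not taken from the study).
example = efficiency_indicators(investment_usd=2_000_000, papers=400, citations=1_600)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Note that DPP and PAPPDI carry the same information in inverted form (as do DPC and CIREDI): lower dollars per paper corresponds to more papers per 1,000 dollars.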

  • Research Article
  • Cite Count Icon 90
  • 10.3390/arm92050037
Secure and Transparent Lung and Colon Cancer Classification Using Blockchain and Microsoft Azure
  • Oct 17, 2024
  • Advances in Respiratory Medicine
  • Entesar Hamed I Eliwa + 3 more

Highlights
The study presents a novel framework for remote consultation and lung and colon cancer classification, leveraging blockchain technology and Microsoft Azure cloud services to ensure data privacy and security. The proposed framework achieves an impressive accuracy of 100% for lung and colon cancer classification using advanced machine learning models, demonstrating its potential to improve diagnostic accuracy and streamline cancer care.
What are the main findings?
Effective Cancer Classification: The framework effectively classifies lung and colon cancer using state-of-the-art machine learning models, achieving high accuracy, precision, recall, and F1-score.
Enhanced Data Security: Blockchain technology and Microsoft Azure cloud services provide a secure and transparent environment for data storage, access, and sharing, ensuring patient privacy and data integrity.
What is the implication of the main finding?
Improved Diagnostic Efficiency: The proposed framework has the potential to significantly improve the efficiency of lung and colon cancer diagnosis by enabling remote consultations and providing accurate and timely results.
Enhanced Patient Outcomes: By improving diagnostic accuracy and streamlining the cancer care process, this framework can contribute to better patient outcomes and reduce the overall burden of lung and colon cancers.
Background: The global healthcare system faces challenges in diagnosing and managing lung and colon cancers, which are significant health burdens. Traditional diagnostic methods are inefficient and prone to errors, while data privacy and security concerns persist.
Objective: This study aims to develop a secure and transparent framework for remote consultation and classification of lung and colon cancer, leveraging blockchain technology and Microsoft Azure cloud services.
Dataset and Features: The framework utilizes the LC25000 dataset, containing 25,000 histopathological images, for training and evaluating advanced machine learning models. Key features include secure data upload, anonymization, encryption, and controlled access via blockchain and Azure services.
Methods: The proposed framework integrates Microsoft Azure’s cloud services with a permissioned blockchain network. Patients upload CT scans through a mobile app, which are then preprocessed, anonymized, and stored securely in Azure Blob Storage. Blockchain smart contracts manage data access, ensuring only authorized specialists can retrieve and analyze the scans. Azure Machine Learning is used to train and deploy state-of-the-art machine learning models for cancer classification.
Evaluation Metrics: The framework’s performance is evaluated using metrics such as accuracy, precision, recall, and F1-score, demonstrating the effectiveness of the integrated approach in enhancing diagnostic accuracy and data security.
Results: The proposed framework achieves an impressive accuracy of 100% for lung and colon cancer classification using DenseNet, ResNet50, and MobileNet models with different split ratios (70–30, 80–20, 90–10). The F1-score and k-fold cross-validation accuracy (5-fold and 10-fold) also demonstrate exceptional performance, with values exceeding 99.9%. Real-time notifications and secure remote consultations enhance the efficiency and transparency of the diagnostic process, contributing to better patient outcomes and streamlined cancer care management.

  • Research Article
  • Cite Count Icon 16
  • 10.1016/j.ijinfomgt.2022.102566
Beyond effective use: Integrating wise reasoning in machine learning development
  • Aug 24, 2022
  • International Journal of Information Management
  • Morteza Namvar + 3 more


  • Research Article
  • 10.1212/wnl.0000000000203275
Using Predictive Models to Reduce Heterogeneity in Alzheimer’s Disease Clinical Trials (S26.005)
  • Apr 25, 2023
  • Neurology
  • Ali Ezzati + 3 more

1. To investigate the proportion of individuals showing meaningful cognitive decline (MCD) in the placebo arm of Alzheimer’s disease (AD) trials during trials. 2. To evaluate data-driven predictive models for identifying participants who will show MCD if given placebo.

  • Research Article
  • Cite Count Icon 6
  • 10.5771/0935-9915-2006-2-184
Work Styles, Attitudes, and Productivity of Scientists in the Netherlands and the United Kingdom: A Comparison by Gender
  • Jan 1, 2006
  • management revu
  • Dara L Woerdeman + 1 more

With scientific research growing increasingly multidisciplinary in nature, team playing and communication skills have become critical in the achievement of scientific breakthroughs. This study adds valuable evidence to the oft-cited puzzle in the sciences by comparing the work styles, attitudes, and productivity of female and male scientists. The application of t-test analysis to data on scientists from the United Kingdom and the Netherlands indicates that women report relatively higher abilities in communication skills and teamwork than men. Also, both female and male scientists report difficulties in balancing work and family responsibilities, but proportionately more women than men rely on outside sources of childcare. A separate distribution analysis of academic productivity demonstrates substantial overlap between men and women in the number of scientific publications per year. These results add support to mounting pressure for policy reforms that effectively support the retention and advancement of women in the sciences.
Key words: Skills, Workplace Diversity, Technical Innovation, Scientific Output, Women in Science, Science in Europe
1. Introduction
Creativity leads to innovations, higher productivity, and ultimately, to economic growth. Human factors govern scientific innovation, with creativity across industries as an important factor in the stimulation of innovation in all its forms. Innovation, in turn, contributes to competitiveness and economic growth. A variety of ways of thinking and backgrounds are needed for an environment in which fruitful ideas can prosper. A broader participation in the scientific workforce is the surest strategy for bringing the best ideas, highest creativity, and greatest innovation to the science, technology, engineering, and mathematics enterprise and the service of the nation, (CEOSE 2004: xv).
Cultural factors can also have a direct impact on scientific output and productivity, and cultural differences between the United States and Europe have been linked to differences in scientific productivity. In comparison with Americans, Europeans are notably less inclined to risk failure. The same is true for companies in terms of their willingness to be bold and experimental, and their general attitudes toward risk. However, the increasing demand on personnel to continually adapt their skills to the requirements of the labor market has provided impetus for many on both continents to acquire new knowledge and skills. European institutions have become the source of a growth in the number of high quality publications, and a growing amount of basic research is originating from European laboratories (TFFAI 2005). Questions about how to diversify the scientific workforce have gained attention in recent years in academic circles, policy discourse, and the media. A large literature, based mostly on American statistics, reveals numerous factors that influence women in scientific and technical disciplines, and why far fewer reach high positions. European countries exhibit the same pattern, as women remain under-represented in Europe's professional scientific employment across the business sector and academia (European Commission 2005). The low female representation comes at a cost because women bring a distinct set of skills, work styles, and attitudes to the table that can potentially affect productivity at all levels. The lack of consensus on the puzzle in science leaves open the question of whether gender differences in productivity do exist, and if so, the path by which these gender differences occur. To address this question, we conduct tests of statistical differences between male and female scientists in work styles, attitudes toward work, and productivity. The work is two-fold.
In the first part of the study, we apply t-test analysis to samples of scientists from two western European countries known for their high indicators of scientific output: the United Kingdom and the Netherlands. …

  • Research Article
  • Cite Count Icon 7
  • 10.1016/j.bpsgos.2024.100397
The Transition From Homogeneous to Heterogeneous Machine Learning in Neuropsychiatric Research
  • Sep 26, 2024
  • Biological Psychiatry Global Open Science
  • Qingyu Zhao + 6 more

