Articles published on Jupyter Notebook
1422 Search results
- New
- Research Article
- 10.1016/j.jss.2025.112758
- May 1, 2026
- Journal of Systems and Software
- Md Saeed Siddik + 2 more
• This research provides the first comprehensive systematic literature review on software engineering research specifically targeting Jupyter notebooks, identifying 199 primary studies published up to September 2025 and categorizing them into 11 core software engineering topics.
• This research reveals that a large portion of the studies have been published outside traditional software engineering venues, with Human-Computer Interaction conferences such as the ACM Conference on Human Factors in Computing Systems (CHI) being the top publishing venues, highlighting the interdisciplinary nature of Jupyter Notebook research.
• This research identifies a reusability gap in existing research, showing that only 82 out of 199 studies offer usable replication packages, and most are hosted on GitHub instead of permanent repositories, which violates open science best practices.
• This research identifies that notebook-specific solutions for software engineering issues such as testing, refactoring, and documentation are relatively underexplored. Future directions derived from our study include resolving duplicated execution numbers, refactoring inter-notebook clones, and generating grouped documentation for coherent code cells.
• This research proposes the integration of modern AI-based solutions into Jupyter notebooks to support various software engineering topics, including code search and code generation. Additionally, future research should leverage advanced AI techniques (e.g., large language models) to improve conversational AI-powered assistants for automated code generation through multi-step workflow automation in data science notebooks.
• Although the paper exceeds the recommended length due to the inclusion of detailed tables, figures, and categorized analyses (covering 11 topics and 21 subtopics), we believe that this extended content is essential for clearly and completely reporting our findings. As the first systematic literature review in this domain, we have carefully structured the paper to ensure readability. We believe the length is justified by the value and breadth of this paper’s contributions.
Context: Jupyter Notebook has emerged as a versatile tool that transforms how researchers, developers, and data scientists conduct and communicate their work. As the adoption of Jupyter notebooks continues to rise, so does the interest from the software engineering research community in improving the software engineering practices for Jupyter notebooks. Objective: The purpose of this study is to analyze trends, gaps, and methodologies used in software engineering research on Jupyter notebooks. Method: We selected 199 relevant publications up to September 2025, following established systematic literature review guidelines. We explored publication trends, categorized them based on software engineering topics, and reported findings based on those topics. Results: The most popular venues for publishing software engineering research on Jupyter notebooks are related to human-computer interaction instead of traditional software engineering venues. Researchers have addressed a wide range of software engineering topics on notebooks, such as code reuse, readability, and execution environment. Although reusability is one of the research topics for Jupyter notebooks, only 82 of the 199 studies can be reused based on their provided URLs. Additionally, most replication packages are not hosted on permanent repositories for long-term availability and adherence to open science principles. Conclusion: Solutions specific to notebooks for software engineering issues, including testing, refactoring, and documentation, are underexplored. Future research opportunities exist in automatic testing frameworks, refactoring clones between notebooks, and generating group documentation for coherent code cells.
- New
- Research Article
- 10.55041/ijsrem60886
- Apr 22, 2026
- International Journal of Scientific Research in Engineering and Management
- Dr M Kalpana Devi Bai + 4 more
Abstract - Consumer review systems on e-commerce platforms suffer from critical ranking deficiencies: aggregate star ratings ignore text quality, raw helpfulness vote counts introduce temporal popularity bias, and vote sparsity in newly listed products renders rank orderings statistically unreliable. This paper presents a domain-agnostic, time-aware trustworthy review ranking framework whose three-component pipeline can be applied to any structured review dataset containing text, star ratings, helpfulness votes and timestamps. The framework integrates: (i) Wilson Lower Bound (WLB) confidence scoring to quantify community trust under sparse vote conditions; (ii) a Natural Language Processing (NLP) quality module employing VADER sentiment analysis, review length normalization and keyword detection; and (iii) a quartile-driven time-decay weighting scheme that privileges recent reviews without discarding historically informative ones. All three components are fused into a weighted hybrid score and implemented in a reproducible Google Colab / Jupyter Notebook environment. Validation is conducted on the publicly available Amazon Kindle Store review corpus (960,000 reviews). Quantitative evaluation using NDCG@10 (0.847) and Precision@10 (0.80) demonstrates that the proposed hybrid framework outperforms all single-dimensional baselines by up to 65.4%, while requiring no model training and running to completion in under two minutes on standard hardware. Keywords: Review Ranking, Helpfulness Prediction, Wilson Lower Bound, VADER Sentiment Analysis, Time-Decay Weighting, NDCG, Precision@K, Amazon Kindle, NLP, E-commerce, Google Colab
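To make the scoring idea in this abstract concrete, here is a minimal sketch of a Wilson Lower Bound on helpfulness votes blended with a text-quality score and a recency weight. It is not the authors' implementation: the function names, the weights `(0.5, 0.3, 0.2)`, and the exponential half-life decay are assumptions chosen for illustration. The paper itself uses a quartile-driven decay scheme and its own fusion weights, neither of which is specified in the abstract.

```python
import math


def wilson_lower_bound(helpful: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for helpful/total votes.

    z = 1.96 corresponds to a two-sided 95% confidence level.
    Reviews with no votes get 0.0 so they are not ranked on trust alone.
    """
    if total == 0:
        return 0.0
    p = helpful / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


def recency_weight(age_days: float, half_life_days: float = 365.0) -> float:
    """Assumed exponential decay: the weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def hybrid_score(helpful, total, text_quality, age_days,
                 weights=(0.5, 0.3, 0.2)):
    """Weighted blend of trust, text quality (expected in [0, 1]) and recency.

    The weights are hypothetical placeholders, not values from the paper.
    """
    w_trust, w_text, w_time = weights
    return (w_trust * wilson_lower_bound(helpful, total)
            + w_text * text_quality
            + w_time * recency_weight(age_days))
```

The lower bound deliberately ranks a review with 90 helpful votes out of 100 above one with a single helpful vote out of one, even though the latter has a higher raw proportion. This is the sparse-vote behaviour the abstract describes.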
- New
- Research Article
- 10.5194/gmd-19-3075-2026
- Apr 21, 2026
- Geoscientific Model Development
- Cléa Denamiel + 7 more
Abstract. Landslide-Tsurrogate v1.0 is an open-source Python and MATLAB tool that helps scientists quickly estimate the tsunami hazards generated by submarine landslides. Instead of running thousands of heavy deterministic numerical simulations, the software builds surrogate models that reproduce the main results with a fraction of the computational cost. The method relies on a mathematical approach called generalized polynomial chaos expansion, which efficiently explores how uncertain landslide parameters affect tsunami generation. Users can perform sensitivity analyses, identify the most influential parameters, and quantify the variability of possible outcomes. The tool includes a Jupyter Notebook User Manual and interactive MATLAB and Jupyter Notebook interfaces, making it easy to understand the methodology, set up the surrogate simulations and visualize the results. The Landslide-Tsurrogate v1.0 model's performance is demonstrated through a real-world test case involving five zones in Mayotte (France). For this application, the surrogate models achieve convergence with only 135 deterministic simulations per zone and produce probabilistic results in less than 2 s within the user-friendly interfaces used on a basic laptop, demonstrating the computational efficiency of the approach. Beyond this example, the framework can be applied to any coastal region prone to submarine landslides. By combining physical modeling, statistical analysis, and user-friendly design, Landslide-Tsurrogate v1.0 enables faster and more transparent probabilistic tsunami hazard assessments.
- New
- Research Article
- 10.33619/2414-2948/125/65
- Apr 15, 2026
- Bulletin of Science and Practice
- B Biymursaeva + 2 more
The article discusses the development of a methodological system for teaching algebra using the capabilities of the Python programming language and its specialized libraries. It is shown how the integration of information and communication technologies into the educational process contributes to improving the efficiency of mastering algebraic material and to the development of students' competencies and research skills. Methodological approaches are presented, structured into a system that includes theoretical training, practical tasks, project activities, and visualization of algebraic objects using the NumPy, SymPy, and Matplotlib libraries. The article discusses the modernization of algebra teaching methods in the context of the digital transformation of education. The authors substantiate the transition from traditional computational methods to the use of Python programming language libraries (SymPy, NumPy, Matplotlib) as an effective tool for developing mathematical competencies. The structure of the methodological system is described, including symbolic computations, matrix modeling, and cognitive visualization. Practical examples of implementing educational scenarios in the Jupyter Notebook environment are provided. The article presents an innovative methodological system for teaching algebra based on the integration of the Python language into the educational process. The authors substantiate the use of the SymPy, NumPy, and Matplotlib libraries as tools for developing mathematical competencies. The transition from routine calculations to research activities in the interactive Jupyter Notebook environment is considered. Special attention is paid to the implementation of the cognitive visualization principle and the algorithmization of learning. The proposed system optimizes students' cognitive load, shifting the focus to the substantive analysis of mathematical structures. The work develops the author’s ideas on interactivity and the aestheticization of education. 
The described methodology promotes a deep understanding of abstract concepts and the development of students' algorithmic thinking in the context of the digital transformation of higher education.
- Research Article
- 10.3390/mca31020060
- Apr 11, 2026
- Mathematical and Computational Applications
- Pablo García-González + 5 more
Background: The aim of the present study was to identify the discipline with the greatest predictive value for overall performance in Olympic-distance triathlon. Methods: Data were extracted from the API (Application Programming Interface) service on the World Triathlon website by signing up for the free service. A custom Python code was written to perform different data collection operations. General statistical analyses and machine learning analyses were performed by creating a Jupyter Notebook file. TensorFlow and PyTorch libraries were used for machine learning analysis. Results: Fifty percent of the employed models identified cycling as the most predictive discipline for race success for both sexes, whereas 33% selected running as the determining discipline. To achieve a podium finish, approximately 78% of the models classified running as the most predictive discipline for males, and approximately 56% of the models did so for females. For finishes between fourth and tenth place, approximately 78% of the models proposed running as the most predictive discipline for both sexes. Swimming was never identified as the most predictive discipline by the majority of models for any group or sex. Conclusion: The most predictive discipline in Olympic triathlon depends on the athlete’s sex and competitive level. Nonetheless, running remains the most consistently predictive discipline, whereas swimming rarely acts as a performance differentiator.
- Research Article
- 10.1080/17486025.2026.2653876
- Apr 5, 2026
- Geomechanics and Geoengineering
- Dhawal Kumar + 1 more
ABSTRACT This study explores the predictive performance of two ensemble learning models, Adaptive Boosting (AdaBoost) and Gradient Boosting (GBoost), optimised using Bayesian Optimization (BO), for estimating the bearing pressure (P) of spread foundations on clayey soil. A dataset of plate load tests (PLTs) comprising 576 data points drawn from 16 published sources was utilised for model development. Input parameters included settlement (S), undrained cohesion (Cu), plate width (B), footing embedment depth (Df), depth to groundwater table (Dw), and soil unit weight (γt). GBoost-BOA outperformed AdaBoost-BOA and the standalone models, and Bayesian Optimization played a crucial role in mitigating overfitting and enhancing model generalisability by systematically selecting optimal hyperparameters (number of estimators, maximum depth, learning rate and maximum features). The implementation was carried out using Python in a Jupyter Notebook environment. To further address overfitting, a diverse dataset was used to capture complex patterns and variations, ensuring reliable and transferable predictions across varied soil conditions. The consistency of performance metrics across training and testing datasets highlights the model’s robustness and adaptability. The training and testing performance of the optimal hybrid model is benchmarked against an Artificial Neural Network (ANN) model used in a previous study.
- Research Article
- 10.59231/sari7919
- Apr 1, 2026
- Shodh Sari-An International Multidisciplinary Journal
- Gulshan Kumar
Abstract The growth of technology is reflected in the rapid increase in the use of AI in every sphere of life. AI has become widely integrated into almost every domain of human activity, including education, where its presence has increased significantly. Consequently, evaluating student satisfaction with AI-assisted learning has emerged as an important area of research. This study uses the dataset titled “AI Assistant Usage in Student Life,” which contains records for 10,000 students describing AI interaction sessions with various attributes. The objective of this paper is to predict student satisfaction using machine learning regression techniques. The dataset was processed using Python in a Jupyter notebook and divided into training and testing sets in a 75:25 ratio. Four regression algorithms were applied to the dataset. The results show that regression-based machine learning models can analyze and predict student satisfaction effectively. The study supports the growing role of AI in education, because the results show that most students are satisfied after using AI assistants. The paper also provides a comparative study of all the regression models applied to the dataset. For this purpose, a confusion matrix was generated with two categories, labeled satisfied and not satisfied. This study can help inform decisions about the use of AI in education.
- Research Article
- 10.1016/j.dib.2026.112586
- Apr 1, 2026
- Data in Brief
- James Weatherhead + 2 more
Hospitals and vendors now run HIPAA-compliant Business Associate Agreement (BAA) large language models (LLMs) for clinical work. These systems do not use input data for further training, so clinicians can enter Protected Health Information (PHI) into them. LLMs are trained on a fixed corpus with a historical cutoff, therefore their answers often need to be supplemented with more recent clinical evidence from external sources such as live web search or other tools that are often not covered by a BAA. This creates a "safe handoff" point where a clinician's PHI-containing query must be transformed into a HIPAA Safe Harbor compliant version before leaving the protected environment. However, publicly shareable datasets for this setting are scarce; this article describes PHI-rich clinician-style questions paired with HIPAA Safe Harbor annotations at the point where an external tool is called. Existing de-identification benchmarks are typically built from long electronic health record narratives such as discharge summaries and clinic notes, rather than from short, compressed search-style queries such as those that might be used in chat-based clinical LLM interfaces. ASQ-PHI (Adversarial Synthetic Queries for Protected Health Information de-identification) is a fully synthetic benchmark dataset designed for this safe handoff setting; no real patient data, electronic health records, or protected health information were accessed, used, or referenced during dataset creation. It contains 1051 single-turn clinical search queries that are designed to resemble prompts that clinicians might enter into HIPAA-compliant LLMs. Each record uses machine-parsable delimiters to separate the free text query from PHI annotations, which are provided as one JSON object per element specifying the HIPAA Safe Harbor identifier category and exact string value. 
The corpus includes 832 PHI-positive queries (79.2%) and 219 hard negatives (20.8%) engineered to mimic PHI-like syntax while containing only non-identifying clinical information such as ages under 90 years, diagnoses, medications, and symptoms. Across the dataset, there are 2973 PHI elements labeled from 13 textual HIPAA Safe Harbor identifier types that can be represented as short alphanumeric strings in single-line clinical questions, supporting the measurement of both PHI removal and over-redaction on PHI-free queries. All queries were generated with an adversarial few-shot prompting pipeline using Azure OpenAI GPT-4o. The associated Mendeley Data repository provides the complete dataset file, a Jupyter notebook that implements the generation pipeline, summary statistics, baseline metrics for a commercial PHI detection service, and six figures that describe the dataset. ASQ-PHI is released under an MIT license.
- Research Article
- 10.1038/s41592-026-03029-6
- Apr 1, 2026
- Nature Methods
- Samuel Alber + 5 more
Modern biology increasingly relies on complex, high-dimensional datasets such as single-cell RNA sequencing (scRNA-seq), which present a vast space of potential hypotheses. Systematically exploring this space is often impractical, as scRNA-seq analyses are time-consuming and require substantial computational and domain expertise. To address this challenge, we introduce CellVoyager, an AI agent built on large language models that autonomously generates and implements scRNA-seq analyses within a Jupyter notebook environment. We evaluate CellVoyager on CellBench, a benchmark of 76 published scRNA-seq studies, where it outperforms GPT-4o and o3-mini by up to 23% in predicting which analyses authors ultimately conducted, given only the papers' background sections. Across three in-depth case studies, CellVoyager generated novel findings in COVID-19, cell-cell communication and aging that experts consistently rated as creative and scientifically sound. These results demonstrate CellVoyager's potential to accelerate computational biology and uncover missing insights by autonomously analyzing biological data at scale.
- Research Article
- 10.3390/earth7020053
- Mar 21, 2026
- Earth
- Yves Hategekimana + 7 more
This study presents the development of a Python-based flood-susceptibility risk-mapping tool, implemented in Jupyter Notebook, applied to Rwanda. A Flood Susceptibility Index (FSI) was developed by integrating 20 causal factors associated with flood occurrences, including topographic, hydrological, geological, and anthropogenic variables. Logistic regression and the Variance Inflation Factor were implemented in Python using libraries such as NumPy, ArcPy, traceback, SciPy, pandas, seaborn, and statsmodels to assign weights to each factor and to address multicollinearity. The model was validated against flood extent data derived from Sentinel-1 satellite imagery for the major historical flood events that occurred from 2014 to 2024, ensuring spatial consistency and predictive reliability. To project future flood susceptibility for 2030, precipitation data from the Institut Pierre Simon Laplace Coupled Model, version 5A, Medium Resolution (IPSL-CM5A-MR) climate model under the Representative Concentration Pathway 8.5 (RCP 8.5) scenario were utilized. The resulting FSI was classified into five susceptibility levels, from very low to very high, and visualized using Python’s geospatial and plotting tools within Jupyter Notebook in ArcGIS Pro 3.5. The results indicate that areas with high rainfall and proximity to wetlands and rivers have the highest flood risk. The automated and reproducible approach offered by Python enhances transparency and scalability, providing a decision-support tool for disaster risk reduction and climate adaptation planning in Rwanda.
- Research Article
- 10.3390/app16062871
- Mar 17, 2026
- Applied Sciences
- Pablo García-González + 5 more
Background: The aim of this study was to determine the optimal discipline position in the overall result of Olympic-distance triathlon. Methods: Data were obtained free of charge from the API (Application Programming Interface) service on the World Triathlon website and collected using custom Python code. Statistical and machine learning analyses were performed within a Jupyter Notebook file. Linear and polynomial regressions were calculated between the overall race position and final positions in each discipline. Descriptive statistics and machine learning analyses were computed to identify the average position and most likely average position required in each discipline, respectively. A heatmap correlation analysis was conducted between the best overall triathletes and the best discipline triathletes. Differences between the two sub-databases were assessed using Student’s t-test. Results: Across all disciplines, the average position required in each segment remains consistently better than 13th place. The heatmap shows a very small, negative correlation between the best time in each discipline and the overall best race time (p-values < 0.001). Student’s t-test establishes significant differences for all disciplines and overall race time (p-values < 0.001). Conclusions: Consistently high-level performance across all disciplines is essential for ensuring a podium finish or race victory in an Olympic triathlon. Achieving the best time in each discipline is not required to contend for victory, although running appears to be a strong predictor of overall race outcome.
- Research Article
- 10.1016/j.dib.2026.112682
- Mar 16, 2026
- Data in Brief
- Nisrean Thalji + 1 more
The used car market in Jordan is an important component of the national transportation and economic systems, reflecting consumer preferences, import policies, and affordability constraints. This article presents the Jordanian Used Cars Dataset (JUCars-2024), a collection of used car advertisements gathered throughout 2024. The dataset contains detailed information about vehicles offered for sale, including their prices, technical features, and location-related features. Prominent online car marketplaces in Jordan were used as the data source, with collection performed by automated web scraping implemented in Python in a Jupyter Notebook environment. After collection, the data were systematically preprocessed, including removal of duplicates. The result is a structured dataset that can be reused in machine learning, economic analysis, and policy-related studies. JUCars-2024 is a publicly available, reproducible resource that supports price forecasting, market classification, and modeling of the Jordanian used car market.
- Research Article
- 10.1103/v17d-68v7
- Mar 11, 2026
- Physical Review A
- Anonymous
This repository contains the experimental data and corresponding processing code for the paper "Direct energy dissipation measurements for a driven superfluid via the harmonic-potential theorem" (arXiv:2508.15626). The raw imaging data is stored in the folder "DnD_data". The Jupyter notebook 'DnD_public.ipynb' in "DnD" applies functions from the Python files 'DnD_utils.py' and 'DnD.py' to process the raw data, store it in separate data files for each stirrer strength, and recreate the experimental data plots from the publication using those data files. Pre-processed data files are already present in the folder "DnD", removing the need to download the data in "DnD_data".
- Research Article
- 10.59075/n2ry5315
- Mar 6, 2026
- The Critical Review of Social Sciences Studies
- Mir Rahib Hussain Talpur + 3 more
Artificial intelligence (AI) is transforming education through intelligent tutoring, predictive analytics, intelligent content sequencing, automated feedback generation, and a less well-known but more recent generation of AI chatbots, article summarizers, document-writing tools, and expert simulators. However, educational deployment is not just a technical issue. It is a socio-technical design dilemma that involves pedagogy, teacher workload, equity, data control, scholarly integrity, accessibility, and social trust. This paper develops a review-and-framework treatment of AI in education with a clear journal orientation. Rather than claiming to report a live field trial, the article offers a conceptual scoping synthesis of key literature and policy reports, together with a deployable reference architecture, algorithmic procedures, notebook-based prototyping artifacts, and explicitly labeled illustrative analytics. The manuscript makes four contributions. First, it synthesizes recent work in the areas of personalization, assessment, learning analytics, generative support, and governance. Second, it proposes a multi-level structure that links learner information, instructional rules, machine learning algorithms, retrieval-augmented generation, instructor control, and monitoring. Third, it presents pseudocode for adaptive recommendation and retrieval-grounded feedback generation and provides mock Jupyter notebooks, diagrams, and graphs with which institutions can examine such systems. Fourth, it translates current ethical and policy discussions into an action plan consisting of a governance checklist and a roadmap for gradual deployment. 
The central argument is that high-value educational AI will be created not by replacing educators but through the systematic engineering of human-AI cooperation in which instructional intent, transparency, and accountability remain visible throughout the system lifecycle.
- Research Article
- 10.3390/computers15030157
- Mar 3, 2026
- Computers
- Ahmed M Hasan + 3 more
A seatbelt is an essential safety measure in road traffic accidents. Although most traffic regulations require drivers and passengers to fasten their seatbelts manually, AI-based monitoring techniques have been introduced to improve safety standards. In this study, a new approach is proposed to address the seatbelt monitoring problem. A deep learning (DL) classifier based on an adaptive Siamese Neural Network (SNN) has been developed, utilizing the K-fold method for feature verification. The proposed adaptive K-Fold-based SNN approach uses a binary seatbelt dataset, with positive and negative classes, to verify the status of the seatbelt. The network involves a shared convolutional feature extractor, followed by a distance-based similarity function. To enhance model reliability, 5-fold cross-validation is applied (k = 5), splitting the dataset into 5 subsets, where the model is trained on four subsets and validated on the fifth. The model was trained using binary cross-entropy loss and Adam optimization, and evaluated with performance metrics such as accuracy, precision, recall, and F1 score. The seatbelt dataset was originally designed for object detection models; in this work, we used it in a verification model and achieved high performance. The model is implemented using a Python-based Jupyter Notebook 7.5.1. It achieved high performance in seatbelt verification with an average Accuracy = 0.9989, average Precision = 0.9988, average Recall = 0.9990, and average F1 Score = 0.9989. The proposed adaptive K-Fold SNN model can ensure reliability and reduce the risk of overfitting.
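The 5-fold protocol and the four reported metrics can be sketched in plain Python as below. This is only an illustration of the splitting and scoring logic, not the authors' pipeline: the function names are hypothetical, the shuffle seed is arbitrary, and the Siamese network itself is omitted.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation.

    Indices are shuffled once, dealt into k folds, and each fold serves as
    the validation set exactly once while the other k - 1 folds form the
    training set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Averaging the dictionary values over the five validation folds gives per-metric averages of the kind quoted in the abstract.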
- Research Article
- 10.62527/joiv.10.1.5114
- Mar 2, 2026
- JOIV : International Journal on Informatics Visualization
- Shahla Abdulqader + 3 more
Pneumonia is a respiratory disorder that involves inflammation of the air sacs of the lungs and is normally diagnosed through chest X-ray imaging. This research proposes a deep learning-based classification system to classify chest X-ray images and identify pneumonia. The methodological framework adopted is the Cross-Industry Standard Process for Data Mining (CRISP-DM), in which experiments are run in a Jupyter Notebook using five-fold cross-validation. The data consist of anterior-posterior chest X-ray images of children aged one to five years from Guangzhou Women and Children Medical Center. Several convolutional neural network models, including DenseNet, VGG16, and InceptionNet, are assessed and compared with the proposed Improved DenseNet (ImDenseNet). According to experimental findings, ImDenseNet achieves 96.15% accuracy, 92.86% precision, and 92.94% recall, which are significantly better than those of the base models. The results show that the proposed architectural changes improve feature discrimination and classification performance, and that ImDenseNet is a credible solution for detecting pneumonia using chest X-ray images. Future research could focus on expanding ImDenseNet into a multi-class classifier to differentiate between bacterial and viral pneumonia. Pruning and quantization techniques can also be used to optimize the model for lightweight deployment on the edge or in clinical devices. In addition, incorporating explainable artificial intelligence (XAI) algorithms may improve clinical interpretability and confidence.
- Research Article
- 10.1111/cyt.70036
- Mar 1, 2026
- Cytopathology : official journal of the British Society for Clinical Cytology
- Nupur Pradhan + 2 more
This study applied an ensemble learning model combining six transfer learning architectures to detect malignancy in effusion cytology. The study included a total of 110 cases of effusion cytology, consisting of 59 benign and 51 malignant cases. We took a total of 755 representative microphotographs from the Papanicolaou-stained smears. The ensemble learning model consists of DenseNet121, Xception, ResNet50, MobileNetV2, InceptionV3, and VGG16 with a soft voting technique. After initial feature extraction, fine-tuning was performed by unfreezing the final layers of each backbone. The neural network was implemented in Jupyter Notebook. The model achieved sensitivity, specificity, accuracy, precision, negative predictive value, F1 score, and AUROC of 0.92, 0.89, 0.90, 0.89, 0.92, 0.91, and 0.96, respectively. To our knowledge, this is the first study applying a six-model ensemble deep learning approach in effusion cytology. The combined transfer learning framework demonstrated excellent diagnostic performance and may serve as a future tool for carcinoma detection in effusion cytology.
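Soft voting, as named in this abstract, averages the class probabilities produced by each model and thresholds the result. The sketch below shows that step only, under the assumption that every backbone outputs one malignant-class probability per image. The function name, the equal model weights, and the 0.5 threshold are illustrative choices, not details taken from the study.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Combine per-model probabilities by unweighted averaging.

    prob_lists holds one list per model, each containing the predicted
    malignant-class probability for every image, in the same image order.
    Returns the averaged probabilities and the thresholded labels
    (1 = malignant, 0 = benign).
    """
    n_models = len(prob_lists)
    averaged = [sum(per_image) / n_models for per_image in zip(*prob_lists)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


# Toy example: three hypothetical models scoring two images.
example_probs = [
    [0.9, 0.2],
    [0.7, 0.4],
    [0.8, 0.3],
]
example_avg, example_labels = soft_vote(example_probs)
```

Unlike hard (majority) voting, this lets a confident model outweigh several uncertain ones, which is often the reason soft voting is chosen for ensembles of calibrated classifiers.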
- Research Article
- 10.24002/biota.v11i1.12898
- Feb 27, 2026
- Biota : Jurnal Ilmiah Ilmu-Ilmu Hayati
- Monika Ruwaimana + 7 more
Vegetation mapping is essential for monitoring conservation efforts in national parks and can be performed remotely using remote sensing and GIS technologies. However, the process is often complex and requires technical expertise. This study explores the use of AI, specifically ChatGPT, to simplify and support vegetation mapping workflows. We monitored monthly vegetation changes in Merapi Mountain National Park (TNGM) from 2017 to 2023 using the Normalized Difference Vegetation Index (NDVI) derived from Sentinel-2 satellite data. The workflow combined Google Earth Engine (GEE) for satellite image processing and Python in Jupyter Notebook for time series analysis, with ChatGPT assisting in code editing. Our results show NDVI patterns are significantly influenced by volcanic activity, particularly eruptions and pyroclastic clouds, and about one-third of images were affected by cloud cover, especially during the rainy season. ChatGPT performed well in non-coding queries with a 79% satisfaction rate, but only 53% of generated code prompts were correct without modification. We conclude that while AI tools like ChatGPT have strong potential to enhance accessibility and efficiency in remote vegetation mapping, human oversight and foundational knowledge in geospatial analysis remain essential for accurate results.
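NDVI is defined as (NIR − Red) / (NIR + Red); for Sentinel-2 the near-infrared and red bands are B8 and B4. The sketch below applies that formula and averages cloud-free observations by month. It is an illustrative stand-in for the time series step, not the study's Google Earth Engine workflow: the tuple layout, the function names, and the boolean cloud flag are assumptions.

```python
from collections import defaultdict


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance.

    Returns None when both bands are zero, where the index is undefined.
    """
    denom = nir + red
    if denom == 0:
        return None
    return (nir - red) / denom


def monthly_mean_ndvi(observations):
    """Average NDVI per month, skipping observations flagged as cloudy.

    observations is an iterable of (month, nir, red, cloudy) tuples, where
    month is a string such as "2017-01". This layout is hypothetical.
    """
    by_month = defaultdict(list)
    for month, nir, red, cloudy in observations:
        value = ndvi(nir, red)
        if cloudy or value is None:
            continue
        by_month[month].append(value)
    return {month: sum(vals) / len(vals) for month, vals in by_month.items()}
```

Dropping cloudy scenes before averaging matters here because, as the abstract notes, about one-third of the images were affected by cloud cover.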
- Research Article
- 10.1785/0220250055
- Feb 23, 2026
- Seismological Research Letters
- Dara E Goldberg + 5 more
Abstract Models of the spatiotemporal evolution of earthquake slip, termed finite-fault models, are a critical component of rapid earthquake and tsunami response, earthquake forecasting, seismic ground-motion estimates, and studies of earthquake kinematics. Here, we detail a newly released finite-fault modeling software, Wavelet Inversion for SliP (WISP), in use at the U.S. Geological Survey’s National Earthquake Information Center (NEIC) and available to the public. WISP version 1.1.0 allows inversion of teleseismic body and surface waves, as well as local strong-motion, static and dynamic Global Navigation Satellite System, and satellite imagery (e.g., Interferometric Synthetic Aperture Radar) observations on single or multiple planar fault segments. The software is used in NEIC rapid response of earthquakes Mw≥7, generally resulting in a published model within the first few hours after the event origin time. The rupture location and dimensions are then used as inputs to downstream products to estimate earthquake shaking, predict loss, and model the likelihood of secondary hazards, namely landslides and liquefaction. WISP is also used in research studies to evaluate the characteristics of complex ruptures including multifault ruptures and earthquake doublets, among others. The WISP version 1.1.0 software release is composed of Python-wrapped FORTRAN code to accomplish the inversion procedure. A simple command line interface facilitates ease of use even for those with only a cursory knowledge of Python scripting. WISP version 1.1.0 includes a Jupyter Notebook tutorial demonstrating use of the software for modeling the 2015 Mw 8.3 Illapel, Chile, earthquake. In parallel with the tutorial, we demonstrate the typical usage of the WISP software using the Mw 8.3 Illapel earthquake example here.
- Research Article
- 10.1080/1573062x.2026.2622959
- Feb 16, 2026
- Urban Water Journal
- Mohamed Hussain K + 1 more
ABSTRACT Accurate prediction of nodal pressure is necessary for effective scheduling and for enhancing pipeline reliability in water distribution systems. In this study, a Python Jupyter Notebook was used to build a predictive model. Nodal pressure analysis was executed at four-hour intervals across all 99 nodes in the Peroorkada urban water distribution network. Nodal pressure was predicted using sixteen machine learning algorithms, including three hybrid/stacking regressors, namely multi-layer perceptron (MLP), stochastic gradient descent (SGD) and K-nearest neighbours (KNN). The stacking regressor models consistently outperformed individual prediction models, with the MLP-hybrid regressor attaining the best results, with a high regression score of 0.995 and the lowest mean squared error of 0.36. The findings suggest that employing hybrid machine learning models, particularly the MLP-hybrid regressor, can significantly improve the accuracy of nodal pressure prediction in urban water distribution networks. Enhanced prediction capabilities can enable better scheduling and support earlier detection of leaks.