Articles published on Metadata
- Research Article
- 10.1038/s41467-026-73521-2
- May 20, 2026
- Nature communications
- Tong Sun + 9 more
Carbon emissions from groundwater depletion (GWD) can enter the global carbon cycle; however, how much they contribute is poorly understood. Here we generate two global maps: one of GWD, using measurements from >171,000 monitoring wells, and one of groundwater bicarbonate (HCO₃⁻) concentrations, using metadata from 28,902 measurements. We then evaluate global GWD-induced carbon emissions and show that 1) total emissions from GWD can be a significant source (52 ± 18 MMT CO₂ yr⁻¹) within the global carbon cycle, exceeding the net emissions of grasslands (~35 MMT CO₂ yr⁻¹), one of the world's main carbon sources; 2) total GWD has been approximately 550 ± 73 km³ yr⁻¹ over the past 20 years, which is generally consistent with previous studies; and 3) China (6.7 ± 1.6 MMT CO₂ yr⁻¹), Brazil (6.6 ± 2.5 MMT CO₂ yr⁻¹), and India (5.8 ± 1.5 MMT CO₂ yr⁻¹) are the top three contributors to GWD-induced carbon emissions.
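As a rough consistency check on the two headline figures (our own arithmetic on the reported central estimates, not the authors' method, which combines the GWD and bicarbonate maps), dividing total emissions by total depletion gives the implied average emission per litre of depleted groundwater, using 1 MMT = 10¹² g and 1 km³ = 10¹² L:

```latex
\frac{52\ \mathrm{MMT\,CO_2\,yr^{-1}}}{550\ \mathrm{km^3\,yr^{-1}}}
  = \frac{5.2\times10^{13}\ \mathrm{g\,CO_2}}{5.5\times10^{14}\ \mathrm{L}}
  \approx 0.095\ \mathrm{g\,CO_2\,L^{-1}}
  \approx 95\ \mathrm{mg\,CO_2\,L^{-1}}
```

This ratio ignores the stated uncertainties and the spatial variation in HCO₃⁻ concentration, so it is only an order-of-magnitude indicator.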
- Research Article
- 10.5334/dsj-2026-010
- Mar 5, 2026
- Data Science Journal
- Ge Peng + 8 more
The FAIR Principles—Findable, Accessible, Interoperable, and Reusable—offer a widely accepted framework for improving the sharing and reuse of digital scientific data by both human and machine users. Following these principles is critical for effective scientific data stewardship, broader scientific collaboration, and compliance with federal and agency data policies. This paper, based on the work of NASA’s Open, Free, and FAIR Working Group (O’FAIR WG) under the Earth Science Data Systems Program, presents an overview of how FAIR Principles are being applied within NASA’s Earth science data landscape. It highlights ongoing progress and challenges, identifies FAIR-enabling resources, and offers recommendations and strategic actions to enhance the FAIRness of NASA-funded open and free Earth science data products. The FAIR-enabling resources identified underscore the vital role of NASA’s existing enterprise processes, standards, tools, and infrastructures in supporting FAIR implementation. Our findings show strong performance in making NASA Earth science data findable and accessible. However, further work is needed—especially in enhancing interoperability, so that different systems and tools can better understand and exchange data. This is especially important for enabling machine-driven discovery and analysis. We emphasize the importance of a balanced strategy that combines a centralized, top-down approach—focused on building enterprise-level capabilities and processes—with a decentralized, bottom-up approach driven by discipline-specific needs and community practices. We advocate for coordinated efforts to enhance (meta)data interoperability to facilitate seamless data and information sharing and exchange of Earth science data both within NASA and across other agencies managing Earth science data.
- Research Article
- 10.1016/j.cherd.2026.02.055
- Feb 1, 2026
- Chemical Engineering Research and Design
- Franziska Lais + 3 more
Mechanical vapor recompression (MVR) is a widely used heat integration method for distillation processes. However, no systematic study has yet examined the applicability of MVR systems to binary mixtures in single distillation columns. This work presents a systematic evaluation of MVR for binary distillation tasks, with the aim of identifying the technical and economic envelopes within which heat pump-assisted distillation outperforms conventional setups. A comprehensive literature review and meta-analysis of published MVR case studies supports the identification of beneficial conditions. A theoretical thermodynamic analysis of the distillation process with and without MVR identifies the required driving temperature gradient in the column reboiler as the crucial parameter. Combining both approaches reveals the optimum trade-off between economic viability and thermodynamic efficiency, with particular focus on the maximum allowable temperature gradient. This enables a quick assessment of whether mechanical vapor recompression suits a given binary mixture.
• Mechanical vapor recompression (MVR) as a heat integration method for distillation processes.
• Discussion of the parameters influencing the Coefficient of Performance (COP), based on a literature review and metadata analysis.
• Analysis of thermodynamic separation efficiency identifies the driving temperature gradient at the column reboiler as the crucial parameter for assessing MVR scenarios.
• The economic feasibility of MVR is demonstrated by correlating the energy price ratio with the driving temperature gradient in the reboiler of the distillation column.
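Why the driving temperature gradient matters can be illustrated with the ideal (Carnot) heat-pump bound. In this simplified picture, which is our assumption and ignores compressor losses and pressure drop, the compressed overhead vapor must condense in the reboiler at the bottoms temperature plus the driving gradient, so a larger gradient raises the temperature lift and lowers the attainable COP:

```latex
\mathrm{COP}_{\mathrm{Carnot}} = \frac{T_H}{T_H - T_L}, \qquad
T_H = T_{\mathrm{reb}} + \Delta T_{\mathrm{drive}}, \qquad
T_L = T_{\mathrm{top}}

% Hypothetical values: T_top = 350 K, T_reb = 360 K, Delta T_drive = 10 K
\mathrm{COP}_{\mathrm{Carnot}} = \frac{370\ \mathrm{K}}{370\ \mathrm{K} - 350\ \mathrm{K}} = 18.5
```

The temperatures are illustrative only; real MVR units achieve a fraction of this bound.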
- Research Article
- 10.12688/f1000research.172078.1
- Jan 29, 2026
- F1000Research
- Mahyumi Rantina + 2 more
This study presents a comprehensive three-level meta-analysis examining the relationship between creative play and creativity development, focusing on the moderating effects of age, culture, and game type. Using meta-analytic methods, we systematically analyzed 78 empirical studies (N = 21,456 participants) published between 2000 and 2024. Results revealed a significant positive relationship between creative play engagement and creativity development (g = 0.62, 95% CI [0.58, 0.66]). Moderator analyses showed the relationship’s strength varied significantly by age group: early childhood (3–7 years) demonstrated the strongest effect (g = 0.74), followed by middle childhood (8–12 years; g = 0.61), and adolescence (13–18 years; g = 0.84). Cultural context significantly moderated the relationship, with collectivist cultures showing a stronger effect (g = 0.68) compared to individualist cultures (g = 0.57). Regarding play type, dramatic play exhibited the strongest relationship with creativity (g = 0.71), followed by constructive play (g = 0.65), and digital play (g = 0.52). These findings underscore the importance of developmentally and culturally appropriate play interventions for fostering creativity. This research offers valuable insights for educators, policymakers, and future researchers in designing effective creativity-enhancing programs across developmental stages and cultural contexts.
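Pooled effect sizes such as g = 0.62 are weighted averages of per-study estimates. The study uses a three-level model (effect sizes nested within studies), which adds within- and between-study variance components; the sketch below omits those and shows only the simplest fixed-effect, inverse-variance case, with hypothetical inputs rather than the study's data:

```python
import math


def pooled_effect(effects, variances, z=1.96):
    """Fixed-effect inverse-variance pooled estimate with a normal-approximation CI.

    effects: per-study effect sizes (e.g. Hedges' g)
    variances: their sampling variances (hypothetical inputs here)
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need equal-length, non-empty effect and variance lists")
    # More precise studies (smaller variance) receive proportionally more weight.
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * g for w, g in zip(weights, effects)) / total
    # Standard error of the pooled estimate under the fixed-effect model.
    se = math.sqrt(1.0 / total)
    return estimate, (estimate - z * se, estimate + z * se)
```

For example, two equally precise studies with g = 0.5 and g = 0.7 pool to 0.6; a multilevel model would typically widen the interval by accounting for heterogeneity.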
- Research Article
- 10.1038/s41597-026-06586-9
- Jan 22, 2026
- Scientific data
- Vamshi Damagatla + 8 more
We present a comprehensive dataset of absorption and reduced scattering spectra collected via time-domain diffuse optical spectroscopy in the 610–1110 nm range, across 10 subjects and 5 body locations: the upper arm, the radius-ulna region, the abdomen, the forehead, and the calcaneus. Ultrasound images acquired at the same locations are also included and, together with the demographic information, offer insight into inter-subject variability. The dataset, openly available on Zenodo, contains the raw data, the metadata, and tools to operate on them. It can be used to devise light-based diagnostic or therapeutic techniques, to study biological variability, and to test different models of photon migration.
- Research Article
- 10.1093/gigascience/giag038
- Jan 21, 2026
- GigaScience
- Georgios K Georgakilas + 16 more
As technological advancements of the early 21st century push industrial biotechnology (IB) into the realm of Big Data-driven innovation, trustworthy data management, annotation, and standardization are becoming a necessity. Minimum information models (MIMs) have long been used across disciplines as the backbone of good data management practices by providing the scaffold upon which standardized recording of metadata can adequately and succinctly describe an understudied phenomenon. Here we present a minimum set of metadata, named the minimum information for fermentation experiments (MIFE) and devices (MIFD), that has been specifically designed to accommodate the data management and annotation needs of IB-related fermentation experiments. Although the proposed schema is tailored to IB applications, MIFE and MIFD build upon well-established models and community standards to ease integration with existing infrastructure and adoption by the community, and they aim to bring the Findable, Accessible, Interoperable, and Reusable (FAIR) principles into the IB field. In addition, the integration with FAIR Data Station (FAIR DS), a tool that offers metadata validation and enables the automated uptake of (meta)data from data management repositories such as FAIRDOM-SEEK, is showcased. The proposed models are accompanied by a Python package that enables their programmatic use by creating a Linked Data Modeling Language (LinkML) schema that can fuel subsequent analyses. We believe that, by promoting and simplifying knowledge discovery, MIFE and MIFD can accelerate the application of state-of-the-art artificial intelligence (AI) methods and the adoption of explainable AI to better understand bioprocesses at scale.
- Research Article
- 10.1039/d5fd00098j
- Jan 1, 2026
- Faraday discussions
- R M R Reese + 2 more
The development of artificial intelligence and machine learning in chemistry is opening new avenues for data-driven discoveries. However, the application of such methodologies in polymer chemistry has been hampered by the complex structure-property relationships of polymers and the lack of available (meta)data. Recent efforts have been made to experimentally determine or computationally evaluate thermodynamic parameters associated with (de)polymerisation reactions, such as the enthalpy and entropy of polymerisation, as well as the ceiling temperature, to design polymers primed for chemical recycling. Here, we report TROPIC (Thermodynamics of Ring-Opening Polymerisation Informatics Collection), an open-source database that collects experimental and computational thermodynamic parameters for ring-opening polymerisation (ROP) from the academic literature. TROPIC links thermodynamic parameters with the experimental conditions or computational methodologies used to determine them, to allow further analysis. TROPIC can be accessed via an interactive website or an application programming interface (API) and is a first step towards facilitating the data-driven discovery of novel functional polymers.
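The parameters TROPIC collects are related by the textbook equilibrium expression for the ceiling temperature, T_c = ΔH_p / (ΔS_p° + R ln[M]_e). The sketch below evaluates that relation; it is not part of TROPIC or its API, and the numbers in the usage note are hypothetical:

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def ceiling_temperature(delta_h, delta_s, monomer_conc=1.0):
    """Ceiling temperature T_c = dH_p / (dS_p + R ln[M]_e), in kelvin.

    delta_h: enthalpy of polymerisation, J/mol (negative for an exothermic ROP)
    delta_s: standard entropy of polymerisation, J/(mol K), 1 mol/L standard state
    monomer_conc: equilibrium monomer concentration, mol/L
    """
    if monomer_conc <= 0:
        raise ValueError("monomer concentration must be positive")
    denominator = delta_s + R * math.log(monomer_conc)
    if delta_h >= 0 or denominator >= 0:
        # With these signs there is no finite ceiling temperature.
        raise ValueError("no finite ceiling temperature for these parameters")
    return delta_h / denominator
```

With hypothetical values ΔH_p = −20 kJ/mol and ΔS_p° = −60 J/(mol·K), T_c is about 333 K at 1 mol/L; lowering the equilibrium monomer concentration lowers T_c, which is the regime relevant to depolymerisation for chemical recycling.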
- Research Article
- 10.57238/csj.2025.1010
- Dec 31, 2025
- CyberSystem Journal
- Sarmad Jawad + 3 more
Although quantum computing offers a number of opportunities, it also represents an emerging threat to conventional cryptographic practice, particularly for digital forensics in cybercrime prosecution. While post-quantum cryptanalysis is still in its infancy, early stages of quantum supremacy are being achieved. As the capabilities of quantum computers develop, substantial emerging technologies are expected to become available within the next five years, including practical implementations of Shor’s and Grover’s algorithms. Shor’s algorithm can factor large integers quickly, which implies that public-key cryptosystems could be broken efficiently, exposing protected digital communication channels. It would likewise allow encrypted data storage to be broken, undermining claims of data protection. Metadata, for example phone numbers dialed, contact lists, or text length, is vulnerable through Grover’s algorithm, which provides a quadratic speedup of brute-force attacks.
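The "quadratic speedup" can be made concrete: searching N = 2^k candidate keys takes about N/2 classical guesses on average, but only about ⌊(π/4)·√N⌋ Grover iterations. The helper names below are our own, for illustration only:

```python
import math


def grover_iterations(key_bits):
    """Approximate optimal Grover iterations, floor(pi/4 * sqrt(N)), for N = 2**key_bits keys."""
    return math.floor(math.pi / 4 * math.sqrt(2 ** key_bits))


def classical_expected_guesses(key_bits):
    """Approximate expected classical brute-force guesses: half the key space."""
    return 2 ** (key_bits - 1)
```

For a 128-bit key this is roughly 1.4 × 10¹⁹ Grover iterations versus about 1.7 × 10³⁸ classical guesses, which is why Grover's algorithm is usually described as halving the effective bit strength of symmetric keys rather than breaking them outright.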
- Research Article
- 10.1002/adsu.202500567
- Dec 26, 2025
- Advanced Sustainable Systems
- Verónica I Dumit + 43 more
Ensuring data quality, completeness, and interoperability is crucial for progressing safety research, Safe‐and‐Sustainable‐by‐Design approaches, and regulatory approval of nanoscale and advanced materials. While the FAIR (Findable, Accessible, Interoperable, and Re‐usable) principles aim to promote data re‐use, they do not address data quality, which is essential for re‐using data to advance sustainable and safe innovation. Effective quality assurance procedures require (meta)data to conform to community‐agreed standards. Nanosafety data offer a key reference point for developing best practices in data management for advanced materials, as their large‐scale generation coincided with the emergence of dedicated data quality criteria and concepts such as FAIR data. This work highlights frameworks, methodologies, and tools that address the challenges associated with the multidisciplinary nature of nanomaterial safety data. Existing approaches to evaluating the reliability, relevance, and completeness of data are considered in light of their potential for integration into harmonized standards and adaptation to advanced material requirements. The goal here is to emphasize the importance of automated tools that reduce the manual labor of making (meta)data FAIR, enabling trusted data re‐use and fostering safer, more sustainable innovation of advanced materials. Awareness and prioritization of these challenges are critical for building robust data infrastructures.
- Research Article
- 10.1093/nar/gkaf1224
- Dec 17, 2025
- Nucleic acids research
- Shahram Saghaei + 9 more
High-throughput sequencing has generated an unprecedented volume of data. However, researcher-submitted data in repositories require extensive curation and quality control before reuse. These tasks are hindered by the multiplicity of repositories, the sheer volume of the data, and the complexity of virus (meta)data curation. To address these challenges, VirJenDB offers a user-friendly platform that facilitates versioned, community-driven curation and ontology development. Virus sequences were ingested from 16 sources, with ~200 metadata or standards fields covering taxonomy, sample, and host information. Up to 85 metadata fields have undergone at least one round of curation and are linked to 15.4 million virus sequences, 88% of which come from viruses infecting eukaryotes and the remainder from viruses infecting prokaryotes. Subsets were created, including a novel collection of 0.91 million viral operational taxonomic unit (vOTU) sequences across all viruses, while the original sequences of each vOTU were retained to facilitate downstream analyses, e.g. of sequence variation. The VirJenDB web portal (https://www.virjendb.org) provides HTTPS and Application Programming Interface (API) access to the sequence datasets and metadata, offering a search engine, filtering, download, visualizations, and documentation. VirJenDB aims to connect the phage and eukaryotic virus research communities by supporting webtool integration, meta-analyses, and metadata schema extensions.
- Research Article
- 10.1145/3769771
- Dec 4, 2025
- Proceedings of the ACM on Management of Data
- Jie Tan + 8 more
Query optimization is a complex planning and decision-making problem within the exponentially growing plan space of database management systems (DBMS). Traditional optimization techniques have been studied extensively for decades, leaving limited room for further improvement along this track. Recent Large Language Models (LLMs) have demonstrated potential in solving complex planning and decision-making problems, such as arithmetic and programmatic tasks. In this paper, we explore the potential of LLMs for query optimization and propose a tentative LLM-based query optimizer, dubbed LLM-QO, built on PostgreSQL's execution engine. In LLM-QO, we formulate query optimization autoregressively, directly generating the execution plan without explicit plan enumeration. To determine the essential input of LLM-QO, we design a customized data recipe named QInstruct that collects training data from various optimizers and serializes the database's metadata, queries, and corresponding plans into a textual format. Based on QInstruct, we implement a two-stage fine-tuning pipeline, Query Instruction Tuning (QIT) and Query Direct Preference Optimization (QDPO), to strengthen the ability of general-purpose LLMs to handle query optimization. In our experiments, LLM-QO generates valid, high-quality plans and consistently outperforms both traditional and learned optimizers on three query workloads. Our findings show that LLMs can be adapted into query optimizers, while their generalization, efficiency, and adaptivity deserve further research.
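The idea of serializing database metadata, a query, and its plan into one text sample can be sketched as follows. The section headers, the `Table` fields, and the layout are hypothetical choices for illustration; they are not QInstruct's actual format:

```python
from dataclasses import dataclass


@dataclass
class Table:
    """Minimal table metadata (hypothetical fields chosen for illustration)."""
    name: str
    rows: int
    columns: list


def serialize_example(tables, query, plan=None):
    """Render schema metadata, a SQL query, and optionally a plan as one text sample."""
    lines = ["### Schema"]
    for t in tables:
        lines.append(f"table {t.name} (rows={t.rows}): {', '.join(t.columns)}")
    lines += ["### Query", query.strip()]
    if plan is not None:
        # Training samples would include the target plan; inference prompts omit it.
        lines += ["### Plan", plan.strip()]
    return "\n".join(lines)
```

Including row counts alongside column names is one way to give a language model the cardinality hints that a cost-based optimizer would otherwise read from catalog statistics.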
- Abstract
- 10.1002/alz70855_107196
- Dec 1, 2025
- Alzheimer's & Dementia
- Jielin Xu + 7 more
Background: Studying Alzheimer's disease (AD) at the single‐cell level is essential for uncovering its complex biology, such as identifying disease‐associated cell states and decoding their molecular mechanisms, and even for designing personalized treatment strategies.
Method: In this study, we created a human brain single‐nuclei RNA‐seq (snRNA‐seq) atlas focused on neurological disorders. In addition to standard quality control, we conducted clinical metadata harmonization that combined both pathological and clinical diagnoses to define disease groups. We performed snRNA‐seq data harmonization and cell type annotation using a generative AI‐based single‐cell foundation model. The snRNA‐seq data harmonization accounted for both categorical and continuous covariates, and cell type annotation was performed hierarchically using both supervised and unsupervised learning.
Result: Our human brain snRNA‐seq atlas comprises approximately 14 million nuclei, making it, to the best of our knowledge, the largest human Alzheimer's brain‐cell atlas of its kind. After metadata harmonization, the atlas included 2,239 human postmortem samples: 1,031 control, 658 AD, 110 AD‐resilience, 221 primary age‐related tauopathy (PART), and 219 samples with other neurological diseases. The atlas covers 33 brain regions, a broad age range from 19 to 100+ years, and various clinical factors such as APOE genotype, Braak stage, and clinical and pathological diagnoses. We identified over 50 cell types at the finest level of annotation (3rd level), including astrocytes and T cells; microglia subtypes, e.g., DAM, tau, and MHC microglia; excitatory neuron subtypes, such as L2/3 intratelencephalic (IT) neurons; inhibitory neuron subtypes, e.g., Sst; and other neuron subtypes from non‐cortical regions, such as dopaminergic neurons. Loss of Sst neurons was consistently observed in AD compared with resilience samples. Our differentially expressed gene (DEG) analyses revealed proteins potentially linked to cognitive resilience in AD, e.g., HSP90AA1 and CLU in Sst neurons, PLCG2 and TSPAN14 in DAM, and ERCC1 in T cells. Additional analyses revealed decreased PLCG2 expression in DAM with aging in AD subjects, but not in cognitively healthy individuals, indicating a potential role of PLCG2 in resilience.
Conclusion: This study generated a large‐scale human brain snRNA‐seq atlas focused on neurodegenerative diseases, providing a valuable digital resource for the scientific community to explore multiple neurodegenerative conditions from diverse perspectives.
- Research Article
- 10.55524/ijirem.2025.12.6.19
- Dec 1, 2025
- International Journal of Innovative Research in Engineering and Management
- G Hariharan + 1 more
This work presents a real-time, multi-input system for multi-object tracking and speed estimation that fuses the YOLOv8 deep learning detector with Kalman filter-based predictive tracking. By coupling state-of-the-art detection accuracy with robust motion prediction, the system addresses occlusions, noisy detections, variable motion, and heterogeneous video sources, which are typical of smart environments. YOLOv8 serves as the primary detector because of its anchor-free, multi-scale design and rapid inference, while Kalman filters provide stable identity assignment, trajectory smoothing, and frame-consistent speed estimation. The system handles multiple video streams, including CCTV, webcams, recorded footage, and uploaded images, and has potential uses in intelligent surveillance, traffic monitoring, and automated analytics. Encrypted logging secures metadata storage in privacy-sensitive environments. The results show that the system can retain object identities, manage multiple object trajectories, and compute motion parameters on the fly, providing a scalable and flexible platform for smart-city and industrial applications. The system is a substantial incremental step towards real-time tracking frameworks that are resilient, modular, and privacy-respecting in dynamic smart environments.
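The Kalman filter's role in smoothing positions and producing a frame-consistent speed can be sketched with a constant-velocity model. This is a minimal one-axis sketch under our own assumptions (white-noise acceleration, position-only measurements such as a bounding-box centre from the detector, hypothetical noise values); the paper's filter configuration is not given, and converting pixels per frame to real-world speed would additionally require camera calibration:

```python
from dataclasses import dataclass, field


@dataclass
class ConstantVelocityKF:
    """1-D Kalman filter with state [position, velocity] (hypothetical parameters)."""
    dt: float = 1.0  # time step between frames
    q: float = 1.0   # process noise: variance of the unmodelled acceleration
    r: float = 4.0   # measurement noise variance of the detected position
    x: list = field(default_factory=lambda: [0.0, 0.0])
    P: list = field(default_factory=lambda: [[1000.0, 0.0], [0.0, 1000.0]])

    def predict(self):
        """Propagate state and covariance one step: x = F x, P = F P F^T + Q."""
        dt, q = self.dt, self.q
        p, v = self.x
        self.x = [p + v * dt, v]
        (a, b), (c, d) = self.P
        self.P = [
            [a + dt * (b + c) + dt * dt * d + q * dt**4 / 4, b + dt * d + q * dt**3 / 2],
            [c + dt * d + q * dt**3 / 2, d + q * dt**2],
        ]
        return self.x[0]

    def update(self, z):
        """Correct the prediction with a measured position z (H = [1, 0])."""
        p, v = self.x
        (a, b), (c, d) = self.P
        s = a + self.r                    # innovation variance
        k0, k1 = a / s, c / s             # Kalman gain for position and velocity
        innovation = z - p
        self.x = [p + k0 * innovation, v + k1 * innovation]
        # P = (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]
```

Calling `predict()` then `update(z)` once per frame yields a smoothed position in `x[0]` and a speed estimate in `x[1]`; during an occlusion, `predict()` alone keeps the track moving, which helps preserve identity when the object reappears.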
- Research Article
- 10.3897/biss.9.179182
- Nov 24, 2025
- Biodiversity Information Science and Standards
- Cristina Di Muri + 3 more
Notwithstanding the global interest and ongoing efforts to manage Invasive Alien Species (IAS) and limit their spread and/or their impact on native communities and ecosystems, the path to this goal remains lengthy and fraught with significant challenges. A variety of approaches to support efficient management of IAS have been outlined, including effective data sharing, iterative collaboration between stakeholders, and scientific communication and engagement, among others (Onley et al. 2025). Specifically for data sharing, several guidelines have emerged to ensure the accurate management and communication of scientific information, including the FAIR (Findable, Accessible, Interoperable, Reusable) principles and open science practices. Adopting FAIR is vital to enable the interoperability and reusability of digital research products (Wilkinson et al. 2016), and open science enhances their transparency and reliability (Bertram et al. 2023). In addition to such broadly applicable guidelines, specific recommendations and standards have emerged for IAS research and data management (Groom et al. 2017; Groom et al. 2019). Implementing effective data management practices alongside enhanced collaboration and communication between stakeholders can support the evidence-based management of IAS. The opportunity to test and validate this integrated approach was provided by USEit (Use of operational synergies for the study and integrated management of invasive alien species in Italy), a multidisciplinary, multi-ecosystem, and multi-taxonomic Italian project focusing on IAS. USEit combines expertise from scientists studying IAS in different ecosystems (aquatic and terrestrial) to find synergies across different data collection and data management methods, ultimately outlining operational and technical recommendations suitable for adoption within and beyond the project.
The recommendations were drafted based on an evaluation of the responses to a national survey, subsequently reviewed by national experts on biological invasions, and validated through SWOT analysis, i.e. a semi-quantitative analysis of the Strengths, Weaknesses, Opportunities, and Threats identified for each recommendation (Azzurro et al. 2025). These recommendations were adopted and further expanded within USEit. Through this process, detailed technical guidelines for IAS data management were collaboratively defined and incorporated into the project's Data Management Plan (DMP). The DMP was published as a machine-actionable, open access document describing the standards, schemas, and repositories used for a diverse array of (meta)data, including occurrences, stable isotopes, elemental composition, remote sensing, DNA metabarcoding, citizen science, and acoustic telemetry (Rosati and Di Muri 2025). Overall, USEit data, services, and related metadata are published in open-access repositories and available through the LifeWatch Italy Data Portal and Metadata Catalogue (LifeWatch Italy Data Portal 2025; Tarallo et al. 2025). All digital research products are assigned persistent identifiers, described by clear metadata, and enriched with controlled vocabularies to enable their interpretation, reuse, and interoperation. In addition, all digital products generated and published within the project are shared on a single landing page that serves as the main access point to USEit research outcomes, as well as to communication material and public engagement events (LifeWatch Italy 2025). These results could be readily translated into practical solutions for IAS management, showing that implementing community-shared FAIR practices across multidisciplinary contexts enhances our ability to manage natural resources effectively while fostering new opportunities for scientific and societal growth.
- Research Article
- 10.3897/biss.9.177966
- Nov 19, 2025
- Biodiversity Information Science and Standards
- Jutta Buschbom + 4 more
Effective decision-making on biodiversity restoration would greatly benefit from baseline data on intraspecific genetic diversity, the ability to integrate it across species for each location, and efficient systems for monitoring changes in genetic diversity in response to management interventions and environmental dynamics. This requires large sets of well-curated, information-rich FAIR (Findable, Accessible, Interoperable, Reusable) Digital Objects (FDOs; Schultes and Wittenburg 2019), implemented as, for instance, Digital Extended Specimens (DES), which represent digitized field samples, their derived genomic data, and associated information (Hardisty et al. 2022). These use-case-driven sets of structured (meta)data from many providers need to be merged, modified, and further extended on demand. Existing workflows and work environments have to be redesigned to accelerate this process and achieve seamless integration, so that they can scale to the needs of efficient and effective worldwide monitoring under the United Nations Kunming-Montreal Global Biodiversity Framework. The global effort to generate high-quality reference genomes also contributes to data and infrastructure development in support of global monitoring. The Earth BioGenome communities, including the European Reference Genome Atlas (ERGA) and Biodiversity Genomics Europe (BGE) communities, drive the development of standardization and harmonization for well-designed sampling and comprehensive FAIR and CARE (Collective Benefit, Authority to Control, Responsibility, Ethics) (meta)data (Buzan et al. 2025). Their goal is reproducible analytical pipelines that can be assembled on the fly for implementing sophisticated statistical approaches and algorithms. Such research-focused pipelines prepare the development and application of globally adopted workflow templates for the calculation of Essential Biodiversity Variables (EBVs).
These templates are under development within the Group on Earth Observations Biodiversity Observation Network (GEO BON) (Lumbierres et al. 2025). Their output can be used for globally aligned and interpretable planning, monitoring, reporting, and reviewing (see CBD/COP/16/L.33). Achieving global agreement on a (small) set of jointly used interoperable vocabularies and ontologies for (meta)data and machine-actionable operations is a communication, community-capacity development, and negotiation process that requires significant resources, engagement across sociocultural groups and geographies, patience, and time. Ongoing processes towards these goals continue to be organized and promoted by, e.g., Biodiversity Information Standards (TDWG), the Global Biodiversity Information Facility (GBIF), GEO BON, and the Ocean Biodiversity Information System (OBIS), as well as large continental networks. We propose to bridge the gap until standardization and harmonization are in place and thereby facilitate and accelerate such efforts. Taking a pragmatic approach, our objective is to contribute a lightweight "pocket" dataspace that provides interoperability and data governance in connection with a digital platform for global genetic monitoring (Fig. 1). The pocket dataspace would allow existing platforms and tools to be connected easily through community-provided mappings between workflow element-specific formats, terms, data, and operations stored in an open repository. This approach would enable users to take advantage of the core strengths of existing software products and the expertise of their associated communities, while quickly sharing data and their work between specialized solutions. At the same time, data would be FAIRified and CAREd-for, promoting attribution, transparency, and responsibility. The functions of the pocket dataspace can be prerequisites for a transition to machine-actionable operations usable by agentic AI.
As a general-purpose interlinking and translation component, the pocket dataspace aims to be the missing link between distributed, federated, non-standard-compliant, and undocumented data, governance regimes, provenance logs, and software output, on the one hand, and the need for transparent, well-governed, and versatile conservation applications, on the other. One of these conservation applications will be the proposed platform for monitoring global genetic diversity. The platform will aggregate and visualize externally linked population-genetic data and summary metrics that are the results of analysis pipelines enabled by, e.g., the pocket dataspace. Its aim is to provide visualization and support dataset and analysis management for local to global conservation efforts. It would store uploaded or linked genetic diversity metrics, perform selected automated analyses for continuously updated genetic diversity measures, and provide a starting point for user-designed analyses. The objective of our initial use case is to analyze three basic measures of population-genetic diversity based on genome-wide sequencing data as a first step towards operationalizing genetic monitoring at scale. Together, the pocket dataspace and the monitoring platform for genetic diversity data aim to support a digital ecosystem that is, above all, flexible, requires low investment from users, and can quickly integrate both inter- and transdisciplinary data as well as existing powerful platforms and well-tested analytical pipelines and functionality.
- Research Article
- 10.1016/j.dib.2025.112288
- Nov 15, 2025
- Data in Brief
- Synne Krekling Lien + 5 more
This article describes a dataset of hourly sub-metered energy use data from 48 school buildings located in Oslo, owned and managed by Oslobygg KF. The dataset consists of one comma-delimited file per building, each containing metadata about the building, time series of energy use measurements, and local weather data. The length of the record varies by building, covering between 1 and 11 years of raw data. Raw data for each building were downloaded in 2023 from Oslobygg KF’s energy management system, “Energinet.” Only buildings with sufficiently reliable sub-metered heating data were included. The preparation process included manual selection, quality control, relabelling, and cleaning to ensure consistency and accuracy. The dataset includes buildings with both electric heating (electric boilers and/or heat pumps) and district heating. All buildings have sub-metered heating data, and some also include sub-meters for domestic hot water heating and photovoltaic electricity generation. The dataset can be used for several research and engineering purposes, including benchmarking and validation of building simulations, heating disaggregation, energy use time series classification, forecasting of energy loads and flexibility, grid planning, and other modelling activities.
- Research Article
- 10.3390/math13223615
- Nov 11, 2025
- Mathematics
- Marcos Sergio Pacheco Dos Santos Lima Junior + 2 more
Few-shot image classification aims to recognize novel classes from only a handful of labeled examples, a challenge in domains where data collection is costly or impractical. Existing solutions often rely on meta-learning, fine-tuning, or data augmentation, which introduce computational overhead or a risk of overfitting, or are inefficient. This paper introduces ProbaCLIP, a simple training-free approach that leverages Kernel Density Estimation (KDE) within the embedding space of Contrastive Language-Image Pre-training (CLIP). Unlike other CLIP-based methods, the proposed approach operates solely on visual embeddings and does not require text labels. Class-conditional probability densities are estimated from few-shot support examples, and queries are classified by likelihood evaluation, with Principal Component Analysis (PCA) used for dimensionality reduction to compress the dissimilarities between classes in each episode. We further introduce an optional bandwidth optimization strategy and a consensus decision mechanism through cross-validation, and address the special case of one-shot classification with distance-based measures. Extensive experiments on multiple datasets show that our method achieves competitive or superior accuracy compared with state-of-the-art few-shot classifiers, reaching up to 98.37% accuracy in five-shot tasks and up to 99.80% in a 16-shot setting with ViT-L/14@336px. Our method achieves this performance without gradient-based training, text supervision, or auxiliary meta-training datasets, highlighting the effectiveness of combining pre-trained embeddings with statistical density estimation for data-scarce classification.
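The core decision rule, estimating a density per class from its support embeddings and assigning the query to the class with the highest likelihood, can be sketched with an isotropic Gaussian KDE. This minimal sketch assumes the embeddings are already computed plain vectors (toy stand-ins here) and omits the CLIP encoder, PCA, bandwidth optimization, and consensus step described above:

```python
import math


def gaussian_kde_logpdf(x, samples, bandwidth):
    """Log-density of x under an isotropic Gaussian KDE built from samples."""
    if bandwidth <= 0 or not samples:
        raise ValueError("need a positive bandwidth and at least one sample")
    d = len(x)
    # Log of the Gaussian normalizing constant (2*pi*h^2)^(-d/2).
    log_norm = -d * math.log(bandwidth * math.sqrt(2 * math.pi))
    log_terms = [
        log_norm - sum((xi - si) ** 2 for xi, si in zip(x, s)) / (2 * bandwidth**2)
        for s in samples
    ]
    # Log-sum-exp for numerical stability, then average over the samples.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms)) - math.log(len(samples))


def classify(query, support, bandwidth=0.5):
    """Return the label whose class-conditional KDE gives the query the highest likelihood.

    support: dict mapping each label to its list of support embedding vectors.
    """
    return max(support, key=lambda label: gaussian_kde_logpdf(query, support[label], bandwidth))
```

Working in log space matters in practice: with high-dimensional embeddings, the raw kernel values underflow to zero long before the class ranking becomes ambiguous.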
- Research Article
- 10.1186/s12911-025-03188-0
- Oct 13, 2025
- BMC Medical Informatics and Decision Making
- Nelly Barret + 3 more
Background: Clinicians want to better understand complex diseases, such as cancer or rare diseases, so they need to produce and exchange data to pool sources and join forces. To do so while ensuring privacy, a natural approach is to use a decentralized architecture and Federated Learning algorithms. This keeps data within the organization in which it was collected, but requires data to be collected in similar settings and with similar models. In practice, this is often not the case, because healthcare institutions work individually with different representations and raw data; they lack the means to normalize their data, let alone to do so across centers. For instance, clinicians have phenotypic, clinical, imaging, and genomic data at hand (each collected individually) and want to better understand some diseases by analyzing them together. This example highlights the needs and challenges of using this wealth of information cooperatively. Methods: We designed and implemented a framework, named I-ETL, for integrating highly heterogeneous hospital healthcare datasets into interoperable databases. Our proposal is twofold: (i) we devise two general and extensible conceptual models for modeling both data and metadata, and (ii) we propose an Extract-Transform-Load (ETL) pipeline that ensures and assesses interoperability from the start. Results: Through experiments on open-source datasets, we show that I-ETL succeeds in representing various health datasets in a unified way thanks to our two general conceptual models. We further demonstrate the importance of treating interoperability as a first-class citizen in integration pipelines, enabling collaboration between different centers. Conclusion: As a framework, I-ETL helps integrate data across healthcare institutions and improves interoperability between them. When used in a decentralized federated platform, it eases federated analysis of the different hospital databases and helps clinicians obtain insights and knowledge on medical conditions of interest.
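To make the idea of an ETL pipeline that “ensures and assesses interoperability” more concrete, here is a minimal sketch under explicit assumptions. The abstract does not describe I-ETL’s conceptual models or its interoperability metrics, so the source records, the field mapping, the unit conversion, and the per-source “coverage” score below are all hypothetical illustrations, not the authors’ design. The “extract” step is reduced to an in-memory dictionary.

```python
# Hypothetical source extracts: two hospitals encode the same facts differently.
SOURCES = {
    "A": [{"pid": "A1", "sex": "F", "weight_kg": 61.0}],
    "B": [{"patient": "B7", "gender": "female", "weight_lb": 150.0, "ward": "3C"}],
}

# Hypothetical per-source mapping onto one shared target model.
FIELD_MAP = {
    "A": {"pid": "patient_id", "sex": "sex", "weight_kg": "weight_kg"},
    "B": {"patient": "patient_id", "gender": "sex", "weight_lb": "weight_kg"},
}
SEX_CODES = {"F": "female", "M": "male", "female": "female", "male": "male"}
LB_TO_KG = 0.45359237

def transform(record, source):
    """Rename fields and harmonize codes and units; return (row, unmapped_fields)."""
    row, unmapped = {}, []
    for field, value in record.items():
        target = FIELD_MAP[source].get(field)
        if target is None:
            unmapped.append(field)
            continue
        if field == "weight_lb":
            value = round(value * LB_TO_KG, 1)
        if target == "sex":
            value = SEX_CODES.get(value)
        row[target] = value
    return row, unmapped

def run_etl(sources):
    """Load transformed rows into one table and report the share of fields mapped per source."""
    table, coverage = [], {}
    for source, records in sources.items():
        mapped = total = 0
        for record in records:
            row, unmapped = transform(record, source)
            row["source"] = source
            table.append(row)
            total += len(record)
            mapped += len(record) - len(unmapped)
        coverage[source] = mapped / total if total else 0.0
    return table, coverage

if __name__ == "__main__":
    table, coverage = run_etl(SOURCES)
    print(table)
    print(coverage)
```

Reporting coverage alongside the loaded rows is one simple way to make interoperability measurable during loading rather than after the fact; here source B scores 0.75 because its `ward` field has no target in the shared model.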
- Research Article
- 10.3390/jrfm18100578
- Oct 12, 2025
- Journal of Risk and Financial Management
- Saurav Chandra Talukder + 2 more
Impact investing and sustainable finance are crucial for addressing social and environmental issues while building a more resilient, equitable, and sustainable world. The purpose of this article is to analyze, synthesize, and evaluate the existing literature in the impact investing and sustainable finance research domain. Following the PRISMA protocol, data were extracted from the Web of Science and Scopus databases, yielding 498 documents. Biblioshiny and VOSviewer were used to analyze the bibliographic metadata. The findings show that the number of publications in this field has increased significantly over the last five years. In terms of journal productivity, Sustainability is the most prominent source, followed by Resources Policy and Journal of Cleaner Production. China ranked first with 189 articles, followed by India with 82 and the UK with 72. Thematic map analysis underscores the significance of impact investing in renewable energy for sustainable economic growth. In addition, keyword co-occurrence analysis revealed four research themes: “sustainable finance for sustainable economic development”; “the rise of ESG investing in the changing world”; “corporate governance and CSR in enhancing firm performance”; and “mobilizing sustainable finance to tackle climate changes”. Furthermore, the study provides a comprehensive summary of current research trends, future research directions, and policy recommendations to help academic researchers, investors, policymakers, business organizations, and financial institutions better understand impact investing and sustainable finance.
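The keyword co-occurrence analysis mentioned above starts from counting how often pairs of keywords appear in the same document. The sketch below shows only that counting step, roughly the kind of raw input from which tools such as VOSviewer build their maps; it does not reproduce VOSviewer’s normalization, thresholding, or clustering, and the keyword lists are hypothetical rather than drawn from the reviewed corpus.

```python
from collections import Counter
from itertools import combinations

def keyword_cooccurrence(documents):
    """Count how often each unordered keyword pair appears together in one document."""
    pairs = Counter()
    for keywords in documents:
        # Normalize case, drop duplicates, and sort so each pair has one canonical order.
        unique = sorted({k.strip().lower() for k in keywords})
        pairs.update(combinations(unique, 2))
    return pairs

if __name__ == "__main__":
    # Hypothetical author-keyword lists.
    docs = [
        ["ESG", "impact investing", "green bonds"],
        ["Impact investing", "ESG"],
        ["green bonds", "climate change"],
    ]
    print(keyword_cooccurrence(docs).most_common(2))
```

Themes then emerge by clustering the resulting weighted keyword graph, a step this sketch leaves out.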
- Research Article
- 10.3390/cancers17193213
- Oct 1, 2025
- Cancers
- Olga Tsave + 4 more
Background/Objectives: Cancer remains a leading global cause of death, with breast, lung, colorectal, and prostate cancers among the most prevalent. The integration of Artificial Intelligence (AI) into cancer imaging research offers opportunities for earlier diagnosis and personalized treatment. However, the effectiveness of AI models depends critically on the quality, standardization, and fairness of the input data. The EU-funded INCISIVE project aimed to create a federated, pan-European repository of imaging and clinical data for cancer cases, with the key objective of developing a robust framework for pre-validating data before its use in AI development. Methods: We propose a data validation framework to assess clinical (meta)data and imaging data across five dimensions: completeness, validity, consistency, integrity, and fairness. The framework includes procedures for deduplication, annotation verification, DICOM metadata analysis, and anonymization compliance checks. Results: The pre-validation process identified key data quality issues, such as missing clinical information, inconsistent formatting, and subgroup imbalances, while also demonstrating the added value of structured data entry and standardized protocols. Conclusions: This structured framework addresses common challenges in curating large-scale, multimodal medical data. By applying this approach, the INCISIVE project ensures data quality, interoperability, and equity, providing a transferable model for future health data repositories supporting AI research in oncology.
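A few of the checks named above can be sketched in code: per-field completeness, duplicate patient identifiers (an integrity check supporting deduplication), and subgroup shares (a simple signal of the imbalances a fairness check looks for). This is a minimal illustration, not the INCISIVE framework; the record fields, helper names, and sample values are hypothetical, and validity and consistency checks as well as DICOM-level analysis are not covered.

```python
from collections import Counter

def completeness(records, fields):
    """Fraction of records with a non-missing value for each field."""
    n = len(records)
    return {f: sum(r.get(f) not in (None, "") for r in records) / n for f in fields}

def duplicate_ids(records, key="patient_id"):
    """Identifiers that occur in more than one record."""
    counts = Counter(r[key] for r in records)
    return sorted(k for k, c in counts.items() if c > 1)

def subgroup_shares(records, attribute):
    """Share of records per value of `attribute`, to spot under-represented groups."""
    counts = Counter(r.get(attribute, "unknown") for r in records)
    n = len(records)
    return {group: c / n for group, c in counts.items()}

if __name__ == "__main__":
    # Hypothetical clinical records with one missing age, one blank sex, one duplicate.
    records = [
        {"patient_id": "P1", "age": 54, "sex": "F", "cancer_type": "breast"},
        {"patient_id": "P2", "age": None, "sex": "M", "cancer_type": "lung"},
        {"patient_id": "P2", "age": 67, "sex": "M", "cancer_type": "lung"},
        {"patient_id": "P3", "age": 48, "sex": "", "cancer_type": "breast"},
    ]
    print(completeness(records, ["age", "sex", "cancer_type"]))
    print(duplicate_ids(records))
    print(subgroup_shares(records, "cancer_type"))
```

Treating both `None` and empty strings as missing reflects the “inconsistent formatting” problem the abstract mentions: the same gap can be encoded in more than one way.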