Articles published on Raw data

44614 Search results
  • New
  • Research Article
  • 10.1016/j.softx.2026.102602
Do you speak LCA? FAULDIER: A framework for large language model assisted Life Cycle Inventories in Life Cycle Assessment
  • Jun 1, 2026
  • SoftwareX
  • Lukas Lazar

Advances in Life Cycle Assessment (LCA) toward greater automation and methodological integration have intensified challenges in standardizing heterogeneous raw Life Cycle Inventory (LCI) data, which rarely aligns with LCI database nomenclature. Rule-based mapping approaches struggle with linguistic variations, typographical errors, unit inconsistencies, and location granularity mismatches. Furthermore, they fail to adapt automatically when data or terminology change. FAULDIER (Framework for lArge langUage modeL assisteD lIfe cyclE inventoRy) is proposed as a framework to bridge heterogeneities between raw LCI data and LCI database requirements. It aims to automate data transformation by resolving naming inconsistencies, classifying flow types, and harmonizing locations and units. By using LLMs, FAULDIER supports handling multilingual inputs, correcting typographical errors, resolving location granularity mismatches, and choosing proxies for missing processes. In a test scenario using the open LCI database FORWAST and a use case characterized by non-standardized multilingual entries, unit inconsistencies, and typographical errors, FAULDIER achieved approximately 57% process and elementary flow mapping accuracy (single-expert validated), with unit conversion error rates below 1%. Current limitations include LCI database constraints, LLM token limitations, performance variability of open-weight LLMs, mapping ability, and reproducibility across runs. Within these limitations, FAULDIER indicates the feasibility of LLM-assisted LCI construction for LCA modeling, particularly for non-standardized raw LCI data. Future work could focus on developing confidence metrics for mapped LCI data, optimizing LLM query efficiency, and expanding testing across additional LCI databases, use cases, and LLMs.

  • New
  • Research Article
  • 10.1002/bmc.70470
GC/MS and LC/MS Metabolomic Analysis of Gefitinib in Liver Cancer.
  • Jun 1, 2026
  • Biomedical chromatography : BMC
  • Dan Li + 7 more

Hepatocellular carcinoma, the third leading cause of cancer-related deaths globally, presents a critical public health burden in China due to its high incidence and mortality. While targeted therapies and immunotherapies have improved survival in advanced HCC, drug resistance remains a major therapeutic challenge. Recent studies suggest that gefitinib, an EGFR inhibitor, overcomes lenvatinib resistance, yet its mechanistic underpinnings are incompletely understood. To investigate gefitinib's metabolic effects in HCC, we conducted untargeted metabolomic profiling using two separate platforms: gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-mass spectrometry (LC-MS) with both hydrophilic interaction liquid chromatography (HILIC) and reversed-phase modes. Raw data were processed by Mass Hunter, normalized with internal standards, and analyzed via SIMCA for pattern recognition. Principal component analysis (PCA) of quality control samples and experimental groups (n = 6 each) confirmed system stability and clear inter-group separation. Orthogonal projections to latent structures discriminant analysis models were validated by 200 permutation tests. Analysis identified 42 metabolites with VIP > 1, of which 25 showed significant alterations (p < 0.05) post-gefitinib treatment. KEGG/RaMP-DB enrichment revealed perturbations in four key pathways: arginine-proline metabolism, nitrogen metabolism, branched-chain amino acid biosynthesis, and taurine metabolism. These results delineate gefitinib-induced metabolic reprogramming in HCC cells, providing a foundation for targeting metabolic vulnerabilities to overcome therapy resistance.

  • New
  • Research Article
  • 10.1016/j.prp.2026.156451
Integrated pan-cancer profiling and experimental validation identify CCDC59 as a key driver and therapeutic biomarker in liver hepatocellular carcinoma.
  • Jun 1, 2026
  • Pathology, research and practice
  • Kun Li + 8 more

  • New
  • Research Article
  • 10.1016/j.dib.2026.112719
Operational building datasets from multipurpose buildings in Denmark and Switzerland: High-resolution energy use and room measurements.
  • Jun 1, 2026
  • Data in brief
  • Simon Pommerencke Melgaard + 9 more

The dataset comprises Building Management System (BMS) data from an educational building located on the main campus of Aalborg University in Denmark, as well as from Empa's NEST (Next Evolution in Sustainable Building Technologies) demonstrator building in Switzerland. The buildings contain main and sub-meters for all equipment using electricity or thermal energy. The equipment using thermal energy includes air handling units (water-based heating coils), space heating (floor heating, radiators, and ceiling heating), and domestic hot water (heat exchangers). Besides the energy data, room data, such as temperature, CO2 concentration, occupant presence, radiator valve opening, and ventilation damper opening, are included for all rooms in the buildings. The data spans 6 to 28 months, depending on the building and the measurement points. The data was collected as raw data with a time resolution between 1 and 10 min. The dataset is expected to be useful for various applications, including model calibration, machine learning, and occupant analysis.

  • New
  • Research Article
  • 10.1016/j.dib.2026.112797
A multimodal EMG and IMU dataset for assessing the quality of exercises designed for spatially constrained environments.
  • Jun 1, 2026
  • Data in brief
  • Veronika Kotolová + 13 more

  • New
  • Research Article
  • 10.1016/j.neunet.2026.108622
CBAM-ST-GCN: An enhanced DRL-based end-to-end visual navigation framework for mobile robot.
  • Jun 1, 2026
  • Neural networks : the official journal of the International Neural Network Society
  • Mingyang Xie + 4 more

  • New
  • Research Article
  • 10.1016/j.dib.2026.112728
Transcriptome dataset of different carrot genotypes during various developmental stages of callus.
  • Jun 1, 2026
  • Data in brief
  • Xinrong Wang + 6 more

Callus formation and differentiation are direct manifestations of plant cell totipotency, tightly regulated by key hormone signals and tissue-specific genes. Carrot (Daucus carota L.) is a widely cultivated vegetable with high economic and nutritional value, for which callus culture serves as the fundamental technique in rapid propagation and functional genomics. However, callus induction and differentiation efficiency vary substantially across carrot genotypes, and the underlying molecular regulatory networks remain poorly characterized. To investigate the molecular regulatory mechanisms governing callus development, a comparative transcriptomic dataset was generated from four Daucus carota lines characterized by distinct callus induction and differentiation capacities. Samples were collected at four time points during the callus culture process to obtain transcriptomic expression datasets. The RNA sequencing data were generated on the MGI platform, yielding a total of 399.48 Gb of high-quality data (average 6 Gb per library, Q30 ≥ 97%). This transcriptome dataset serves as a core resource for elucidating the molecular mechanisms of dedifferentiation, stress adaptation, and totipotency in carrot callus, providing insights into plant cell plasticity and supporting comparative genomics and applications in crop regeneration. The raw sequencing data are publicly available in the NCBI Sequence Read Archive (SRA) under the BioProject accession number PRJNA1398431.

  • New
  • Research Article
  • 10.1016/j.mri.2026.110643
Accelerated reconstruction of 5D free-running MRI with variable projection-augmented Lagrangian (VPAL).
  • Jun 1, 2026
  • Magnetic resonance imaging
  • Yitong Yang + 7 more

  • New
  • Research Article
  • 10.1016/j.dib.2026.112675
Semantic data transformation, FAIRification and provenance for data spaces
  • Jun 1, 2026
  • Data in Brief
  • Georgios M Santipantakis + 2 more

  • New
  • Research Article
  • 10.1016/j.rineng.2026.110047
BCAG-Net: A multi-branch deep learning framework with Butterworth–Chebyshev filtering and attention mechanisms for accurate cellular network traffic prediction
  • Jun 1, 2026
  • Results in Engineering
  • Ameen Majid Shadhar + 2 more

  • New
  • Research Article
  • 10.1002/pds.70391
Nexus: A Deterministic Linkage Framework for Constructing Longitudinal Real-World Data in Brazil's Public Health System.
  • Jun 1, 2026
  • Pharmacoepidemiology and drug safety
  • Julio Cesar Barbour Oliveira + 9 more

Brazil's Unified Health System (SUS) does not provide a person identifier in the publicly available inpatient system (SIH) microdata, limiting longitudinal analyses. We propose a deterministic linkage framework (Nexus) designed to construct longitudinal real-world datasets under these structural constraints. We first established a cloud-based preprocessing pipeline using a Lakehouse architecture to ingest, standardize, and harmonize raw administrative data from the Ministry of Health's public repository of national databases, 2008-2024, into reproducible inpatient and outpatient tables. Auxiliary sources were integrated to enrich metadata. SIA records containing the encrypted National Health Card (CNS) were curated using internal consistency filters for sex, date of birth (DOB), and postal code (CEP). Linkage with SIH applied a quasi-identifier (CEP, DOB, sex) with an α-shrinkage rule to exclude ambiguous high-density cells, prioritizing specificity by retaining only unique matches. Candidate cohorts varying by α and start year were compared using data quality diagnostics. From 224.7 M unique CNS in SIA (2008-2024), curation yielded 12.9 M patients. Exact-match linkage of SIH hospitalizations produced a final Nexus cohort of 9.2 M patients. Using α = 40 and a 2012 start was associated with improved temporal consistency and stable disease event fractions. Nexus demonstrates that a conservative, transparent deterministic linkage framework can be used to construct a longitudinal cohort under structural data constraints. Coverage is reduced, mainly because CEP is concentrated in high-complexity claims, resulting in a selected subpopulation enriched for specialized care. Accordingly, Nexus is suited for longitudinal analyses of treatment pathways and outcomes, but not for population-level inference or incidence estimation.
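
As a rough illustration of the deterministic linkage step described in the Nexus abstract, the Python sketch below links hospitalization records to patients on the quasi-identifier (CEP, DOB, sex) and keeps only keys that resolve to exactly one CNS. The abstract does not define the α-shrinkage rule, so the `alpha` cutoff used here (skip any CEP shared by more than `alpha` distinct patients) is only a hypothetical stand-in, and the field names and the function name `link_unique` are assumptions rather than the authors' code.

```python
from collections import defaultdict


def link_unique(sia_records, sih_records, alpha):
    """Link SIH admissions to SIA patients on (cep, dob, sex), unique matches only.

    Hypothetical sketch: `alpha` is an assumed density cutoff per CEP, not the
    paper's actual alpha-shrinkage rule, which the abstract does not define.
    """
    # Collect the distinct patient IDs (CNS) seen for each quasi-identifier
    # and for each postal code.
    cns_by_key = defaultdict(set)
    cns_by_cep = defaultdict(set)
    for rec in sia_records:
        key = (rec["cep"], rec["dob"], rec["sex"])
        cns_by_key[key].add(rec["cns"])
        cns_by_cep[rec["cep"]].add(rec["cns"])

    links = []
    for adm in sih_records:
        # Assumed density filter: skip postal codes shared by too many patients.
        if len(cns_by_cep.get(adm["cep"], ())) > alpha:
            continue
        key = (adm["cep"], adm["dob"], adm["sex"])
        candidates = cns_by_key.get(key, set())
        # Prioritize specificity: keep the link only if exactly one patient matches.
        if len(candidates) == 1:
            links.append((adm["admission_id"], next(iter(candidates))))
    return links
```

Dropping every key that maps to more than one CNS trades coverage for specificity, which is consistent with the reduced coverage the abstract reports.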

  • New
  • Research Article
  • 10.1002/mrc.70090
A Reproducible Workflow for Modelling of 1H to 13C Polarization Transfer Kinetics Using Solid-State NMR.
  • Jun 1, 2026
  • Magnetic resonance in chemistry : MRC
  • D Jacob + 4 more

Quantitative analysis of solid-state NMR data, based on magic-angle spinning with cross-polarization experiments (CP-MAS), often requires extensive signal processing, from the transformation of raw time-domain data (FIDs) to the extraction of quantitative data and the modelling of signal intensity kinetics. Many current workflows rely on semi-manual peak fitting and heterogeneous tools across laboratories for intensity curve modelling, limiting reproducibility and throughput. In this work, we propose a fully reproducible and open workflow combining two key methodological approaches: (1) an adaptive bucketing approach, extraction of relevant variables for analysis (ERVA), implemented in the NMRProcFlow application, to automatically segment 13C spectra into chemically relevant spectral regions; and (2) an online modelling platform that allows users to fit intensity curves over contact time with multiple models, guided by objective indicators including fit quality scores and parameter sensitivity metrics. This integrated approach provides a fast, user-friendly and transparent path from FIDs to kinetic model parameters, opening new perspectives for reproducible quantitative solid-state NMR.

  • New
  • Research Article
  • 10.1016/j.dib.2026.112685
Mosquito environmental DNA metabarcoding dataset in water-holding containers in dengue fever-endemic areas of DKI Jakarta, Indonesia.
  • Jun 1, 2026
  • Data in brief
  • Nurhadi Eko Firmansyah + 4 more

Elucidating the transmission dynamics of mosquitoes, facilitating identification of non-invasive species, employing non-visual approaches for larval detection, and integrating next-generation surveillance techniques are pivotal for developing robust and sustainable strategies to prevent dengue and other mosquito-borne diseases. Illumina-based high-throughput sequencing was used to identify the presence of Aedes aegypti in dry and rainy seasons. The other identified mosquito species included Ae. albopictus, Culex pipiens, Cx. nigripalpus, and Armigeres subalbatus. To our knowledge, this represents the first comprehensive application of aquatic environmental DNA (eDNA) techniques to characterize mosquito biodiversity in Indonesia. The raw sequencing data generated in this study have been deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) under BioProject accession number PRJNA1248132. These datasets provide a valuable reference framework for future eDNA-based surveillance efforts and quantitative assessments of mosquito populations in Indonesia's aquatic habitats across different times and seasons.

  • New
  • Research Article
  • 10.1016/j.digbus.2026.100167
LLM-based JSON Mapping and Blockchain Integration for Digital Product Passports
  • Jun 1, 2026
  • Digital Business
  • David Rohrschneider + 3 more

The European Commission envisages Digital Product Passports (DPPs) as a mechanism to enable traceable, transparent, and standardized product data across supply chains. This work presents a modular pipeline that transforms raw sensor data into verifiable DPP records using large language models (LLMs) for data standardization and blockchain technology for tamper-proof storage. The system maps unstructured machine-level data to a standardized JSON format and stores it immutably on the Waves blockchain via smart contracts, thereby enabling auditable, machine-readable records suitable for regulatory use. A novel evaluation dataset is introduced to simulate daily production scenarios with varying mapping complexity. The performance of the system is assessed using both proprietary and open-weight LLMs. Results show that the proprietary model achieves the highest accuracy and lowest latency, while open-weight models perform worse as input complexity increases. Multiple prompting strategies were compared, revealing that direct mapping, via few-shot or zero-shot prompts, consistently delivered higher accuracy than approaches based on generating transformation functions. Structured output formatting was also assessed: While it ensured schema validity, it often compromised mapping reliability by introducing incorrect values, likely due to disruptions in model reasoning from output constraints. The proposed architecture demonstrates reliable end-to-end operation with low latency and is suitable for batch-level deployment in real-world production environments. From a practical perspective, the results clarify trade-offs between model choice, prompting strategy, and operational reliability in automated DPP generation. For policymakers, the findings highlight how choices around schema clarity and data granularity shape system design and operational effort in future DPP implementations.

• Modular architecture integrates LLM mapping with blockchain for DPP generation.
• LLMs used to map unstructured sensor data into standardized JSON format.
• Blockchain ensures verifiable, persistent storage of product passport records.
• Novel dataset simulates real-world production with varied mapping complexity.
• Local LLMs yield significantly worse performance compared to proprietary LLMs.
• Function-based prompting underperforms direct mapping in all tested settings.
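
The distinction this abstract draws between schema validity and mapping reliability can be made concrete with a small Python sketch. The schema, field names, and helper functions below are hypothetical and not taken from the paper; the point is only that a record can pass a structural check while still carrying a wrong value.

```python
# Hypothetical target schema: field name -> expected Python type.
SCHEMA = {"product_id": str, "temperature_c": float, "batch": int}


def is_schema_valid(record, schema=SCHEMA):
    """True if the record has exactly the schema's fields with the right types."""
    return set(record) == set(schema) and all(
        isinstance(record[name], typ) for name, typ in schema.items()
    )


def field_accuracy(record, expected):
    """Fraction of expected fields whose mapped value equals the reference value."""
    correct = sum(1 for name, value in expected.items() if record.get(name) == value)
    return correct / len(expected)


# Reference values for one (made-up) sensor reading.
reference = {"product_id": "P-17", "temperature_c": 21.5, "batch": 4}
# Structurally fine, but the temperature was mapped incorrectly.
mapped = {"product_id": "P-17", "temperature_c": 71.5, "batch": 4}
```

Here `is_schema_valid(mapped)` passes even though only two of the three fields match the reference, which is why a structural check alone cannot stand in for a value-level comparison.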

  • New
  • Research Article
  • 10.1016/j.dib.2026.112794
A dataset on microbiome alterations in Drosophila melanogaster infected by entomopathogenic nematodes.
  • Jun 1, 2026
  • Data in brief
  • Sreeradha Mallick + 5 more

  • New
  • Research Article
  • 10.1016/j.media.2026.104017
Artificial intelligence in microscopic hair imaging for scalp disorders: From image acquisition to clinical decisions.
  • Jun 1, 2026
  • Medical image analysis
  • Chenquan Gong + 3 more

  • New
  • Research Article
  • 10.1016/j.synbio.2026.01.028
Highly biased DNA sequence reconstruction in DNA storage with multi-scale attention mechanism and contrast learning.
  • Jun 1, 2026
  • Synthetic and systems biotechnology
  • Xue Li + 8 more

  • New
  • Research Article
  • 10.1016/j.sysarc.2026.103781
Supporting efficient and verifiable keyword queries on dynamic blockchain data
  • Jun 1, 2026
  • Journal of Systems Architecture
  • Bo Yin + 1 more

  • New
  • Research Article
  • 10.1016/j.dib.2026.112718
Dataset of physiological signals in the use of advanced driver assistance systems (ADAS).
  • Jun 1, 2026
  • Data in brief
  • Gabriel Martins De Castro + 3 more

This article introduces a dataset that investigates the physiological responses of drivers when using advanced driver assistance systems (ADAS) in real-world traffic conditions. The study, conducted in the Federal District, Brazil, involved seven drivers in controlled driving sessions. The time of day and the days of the week were standardized to ensure comparable traffic conditions. The data collection was centered on ADAS Level 2 systems, specifically the Lane Keeping Assist System (LKAS) and the Forward Collision Warning System (FCWS). The dataset includes five physiological signals: respiration, heart rate, galvanic skin response (GSR), leg muscle activity, and brain activity. These signals were continuously acquired using a dedicated instrumentation system installed in the vehicle. Given the complexity of collecting data under real traffic conditions, the acquisition sessions generated a large volume of raw data. Considerable post-processing was conducted to identify and segment portions of the signals with sufficient integrity for subsequent analysis. The dataset is structured as time-stamped raw signal spreadsheets, each corresponding to a specific driver and direction of the pre-established route (outbound and return). Such organization enables researchers to navigate the dataset easily, explore specific segments of interest, and conduct comparative analyses across participants and varying traffic conditions. The dataset is relevant to researchers in biomedical signal processing, driver state monitoring, intelligent transportation systems, and human-machine interaction. It may be used by academic laboratories investigating physiological responses during driving tasks, as well as by engineers and developers working on advanced driver assistance systems (ADAS), including automotive manufacturers and ADAS technology suppliers. The dataset, which includes synchronized physiological and vehicle dynamics data collected under real traffic conditions, may contribute to the study of human responses during semi-automated driving, supporting research and development of driver-centered mobility technologies.

  • New
  • Research Article
  • Cite count: 3
  • 10.1016/j.amf.2025.200269
In-situ quality monitoring in LPBF via melt-pool radiation: Compressive sampling and deep feature extraction
  • Jun 1, 2026
  • Additive Manufacturing Frontiers
  • Hanxiang Zhou + 7 more

In-situ monitoring methods and deep learning models are increasingly being used for the quality assessment of parts fabricated using laser powder bed fusion to overcome the limitations of poor process repeatability. However, the massive data collection required for part-quality monitoring results in high transmission loads and storage costs. To address this problem, this study utilized the compressed sensing theory to acquire compressed photodiode signals. These signals were then used to train and test convolutional neural networks (CNN) to identify the lack-of-fusion, normal, and keyhole modes. At a compressive-sampling rate of 25%, the classification accuracy decreased from 93.1% (raw signals) to 79.3%. However, increasing the compression rate from 25% to 90% did not significantly decrease the classification accuracy. The linear mapping of the raw signal via a Gaussian measurement matrix causes coordinate information folding, thereby impairing the representation of latent features. Therefore, Gaussian process modeling was adopted for the features extracted using a pretrained CNN to mitigate the temporal information collapse and allow the compressed signals to achieve an accuracy comparable to that of the raw data. Furthermore, the sparsity and rank complexity of the melt-pool radiation signals were evaluated using sparse representation and principal component analysis.
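
To make the measurement step concrete, here is a minimal Python sketch of compressive sampling with a Gaussian measurement matrix, y = Φx. It assumes the sampling rate is the ratio m/n of measurements to raw samples, which the abstract does not state explicitly; the function name `compress`, the 1/sqrt(m) scaling, and the fixed seed are illustrative choices, not the authors' implementation.

```python
import math
import random


def compress(signal, rate, seed=0):
    """Return m = round(rate * n) Gaussian random projections of `signal`.

    Assumption: `rate` is m/n. Each measurement is a random linear mix of all
    raw samples, so individual sample positions are not preserved in the output.
    """
    n = len(signal)
    m = max(1, round(rate * n))
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    measurements = []
    for _ in range(m):
        # One row of the Gaussian measurement matrix, entries ~ N(0, 1/m).
        row = [rng.gauss(0.0, 1.0) * scale for _ in range(n)]
        measurements.append(sum(w * x for w, x in zip(row, signal)))
    return measurements
```

With a 200-sample window and `rate=0.25`, `compress` returns 50 values. Because every output mixes all input samples, temporal position is no longer explicit in the compressed signal, which is the kind of information folding the abstract describes.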
