Articles published on Data structure
75507 Search results
- New
- Research Article
- 10.1016/j.jss.2026.112787
- Jun 1, 2026
- Journal of Systems and Software
- Carlos J Fernandez-Candel + 2 more
Towards the automated extraction and refactoring of NoSQL schemas from application code
- New
- Research Article
- 10.1016/j.knosys.2026.115800
- Jun 1, 2026
- Knowledge-Based Systems
- Parwinder Singh + 3 more
• Offers a new vision on automated data harmonization in Dataspace (DS) systems.
• Introduces LLM-based methods for scalable DS ingestion of heterogeneous data sources.
• Presents a system with Harmonizer, Transformer, and Evaluator components for ingestion.
• Demonstrates an automated data ingestion prototype using LLM agents.
• Validates the system with a healthcare use case harmonizing heterogeneous data sources.
Dataspaces (DS) enable stakeholders to collaborate on innovative, data-driven services by integrating data across domains. However, the realization and adoption of DS remain challenging due to domain-specific heterogeneity at the system, service, and data levels. While system and service-level heterogeneity can often be addressed through standards, data-level heterogeneity, namely variations in data structures and semantics, remains challenging. To effectively ingest data into the DS, two communication endpoints must correctly interpret each other’s data models; DS ecosystems therefore rely on “harmonization”, the process of generating a unified target data model from heterogeneous sources and transforming incoming data accordingly. Currently, harmonization and transformation are performed manually whenever new data sources are integrated. This is time-consuming, costly, and difficult to scale, posing a critical barrier to the realization and adoption of DS in practice. This study proposes a novel methodology for automated data harmonization during ingestion into DS ecosystems. The approach integrates harmonization, transformation, and human-in-the-loop evaluation within an automated system powered by modern LLM-based AI agents. These agents address data-level heterogeneity and generate harmonized target data models, representing a substantial departure from current manually handled data harmonization. The system is validated through a healthcare use case, demonstrating its practical feasibility for harmonization during data ingestion into the DS.
Overall, this work provides a foundational step toward seamless, efficient, and scalable data integration in DS. By automating data harmonization, it delivers substantial value to industry digital solutions as well as domains where data heterogeneity persists, including IoT or Big Data platforms.
- New
- Research Article
- 10.1016/j.mib.2026.102755
- Jun 1, 2026
- Current opinion in microbiology
- Mark J Calcott + 3 more
Evolutionary insights and guidelines to achieve effective and high-yield non-ribosomal peptide and polyketide engineering.
- New
- Research Article
- 10.1093/nargab/lqag044
- Jun 1, 2026
- NAR genomics and bioinformatics
- Abdelraouf O Dapour + 9 more
RNA structure critically governs biological function in both physiological and pathological contexts, making high-resolution structural maps essential for RNA-targeted therapeutics. Yet, despite recent advances, well-validated structural targets for drug design remain limited. To help bridge this gap, we generated the first genome-scale map of the human RNA structurome by applying ScanFold to >230 000 annotated human pre-mRNA transcripts, identifying sequences likely evolved to form highly stable and functional secondary structures. We also performed a global analysis of regions with z-scores ≤ -2 and statistically characterized their two-dimensional folding patterns. In addition, we developed the RNA-Annotator Pipeline to integrate 20 diverse biological annotations, such as tissue-specific expression and protein interactions, with the structural data. Our results reveal local folding propensities and unusually stable structures with high-confidence architectures, providing insights for prioritizing RNA targets and guiding therapeutic design, including antisense oligonucleotides and small molecules. All ScanFold results are publicly available through RNAStructuromeDB. Using the RNA-Annotator Pipeline, analysis of SMN1 and SMN2 pre-mRNAs showed that a single C-to-T transition in SMN2 induces structural rearrangements that disrupt a critical splicing enhancer. This toolkit establishes an integrated workflow that enables researchers to explore RNA structure-function relationships and accelerate advances in RNA-targeted drug discovery and RNA biology.
- New
- Research Article
- 10.1093/jamiaopen/ooag070
- Jun 1, 2026
- JAMIA open
- Aman Mohapatra + 15 more
To develop a large-language-model (LLM)-centric workflow for extraction and migration of clinician-documented colonoscopy recall recommendations from unstructured reports and letters during an enterprise-wide electronic health record (EHR) transition. A multi-stage workflow [Optical Character Recognition (OCR) -> LLM -> structured fields] was built around a central GPT-4 Turbo inference step following prompt optimization. Validation was performed on a held-out set (N = 326 notes) using 2-clinician consensus and then benchmarked against traditional rule-based natural-language-processing (NLP) (spaCy v3). Layered quality control (manual review, field validation, and anomaly detection) was used to assess workflow results prior to upload (N = 118181 total patients). Prompt optimization enabled GPT-4 Turbo to achieve perfect concordance with clinician review in a small test set (macro-F1 = 1.0; N = 100 patients). Expanded validation on a held-out set demonstrated improved F1 (0.89; CI = [0.65, 0.92], N = 326) relative to a traditional rule-based NLP approach (F1 = 0.78; CI = [0.58, 0.82]). The system processed 118181 records in ≈9 hours (≈2 s/record) at a direct implementation cost of ∼$12000. An LLM-driven workflow safely migrated preventive-care data at population scale, with potential accuracy improvements over traditional rule-based NLP approaches and substantial reductions in time and cost relative to manual review. LLMs can play a valuable role in high-quality structuring of clinical data, preserving longitudinal care continuity during EHR modernization.
- New
- Research Article
- 10.1016/j.artmed.2026.103392
- Jun 1, 2026
- Artificial intelligence in medicine
- Ruben Branco + 4 more
PatientFlow: Learning to generate mixed-type longitudinal clinical data with flow matching.
- New
- Research Article
- 10.1016/j.mex.2026.103882
- Jun 1, 2026
- MethodsX
- Hideaki Shima + 1 more
RefLaTEA: a robust visualization and analysis framework leveraging background data for enhanced insight.
- New
- Research Article
- 10.1016/j.automatica.2026.112949
- Jun 1, 2026
- Automatica
- Du Ho + 2 more
This paper concerns a particular property of the basic instrumental variable (IV) estimator that is useful for multiple-input multiple-output (MIMO) modeling problems where it is not obvious how to partition the available signals between the sets of inputs and outputs. In general, a repartitioning of the input and output signals will result in a different model compared to the original input–output choice. It is important to distinguish cases where a repartitioning results in an algebraically equivalent model and cases where the resulting model transformation is more complex and depends also on particular system and signal properties. The latter situation typically occurs when models are estimated from data. We here show that the basic IV estimator is an exception since it provides algebraically equivalent estimates regardless of true system structure, noise properties, or amount of data. This equivalence result is illustrated in two simulation examples.
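As a rough illustration of what "algebraically equivalent under repartitioning" can mean, consider a scalar, static toy case. This is an assumption made here for illustration only, not the MIMO setting or the derivation of the paper: two measured signals u(t) and y(t), one instrument z(t), and N samples, with all sums below assumed nonzero.

```latex
% Partition 1 treats u as input:  y(t) = \theta\, u(t) + e(t)
% Partition 2 treats y as input:  u(t) = \eta\, y(t) + v(t)
\hat{\theta}_{\mathrm{IV}}
  = \frac{\sum_{t=1}^{N} z(t)\, y(t)}{\sum_{t=1}^{N} z(t)\, u(t)},
\qquad
\hat{\eta}_{\mathrm{IV}}
  = \frac{\sum_{t=1}^{N} z(t)\, u(t)}{\sum_{t=1}^{N} z(t)\, y(t)}
  = \frac{1}{\hat{\theta}_{\mathrm{IV}}}

% Least squares, for contrast, does not invert exactly:
\hat{\theta}_{\mathrm{LS}}\, \hat{\eta}_{\mathrm{LS}}
  = \frac{\bigl(\sum_{t=1}^{N} u(t)\, y(t)\bigr)^{2}}
         {\sum_{t=1}^{N} u(t)^{2}\, \sum_{t=1}^{N} y(t)^{2}}
  \le 1
```

In this toy case the two IV estimates describe exactly the same relation between u and y for any data set, whereas the two least-squares estimates agree only when u and y are exactly proportional (equality in the Cauchy-Schwarz inequality).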
- New
- Research Article
- 10.1016/j.bbr.2026.116218
- Jun 1, 2026
- Behavioural brain research
- Hongying Daisy Dai + 3 more
Neuroanatomical variability in brain surface area and cortical volumes associated with adolescent e-cigarette use.
- New
- Research Article
- 10.1016/j.jad.2026.121413
- Jun 1, 2026
- Journal of affective disorders
- Ting Zhang + 8 more
Transcranial direct current stimulation targeting dmPFC ameliorates somatic symptoms in depression: A randomized control trial.
- New
- Research Article
- 10.1016/j.compbiolchem.2025.108870
- Jun 1, 2026
- Computational biology and chemistry
- Simone Lucà + 2 more
Prefix-free parsing (Boucher et al., 2019) is a highly effective heuristic for computing text indexes for very large amounts of biological data. The algorithm constructs a data structure, the prefix-free parse (PFP) of the input, consisting of a dictionary and a parse, which is then used to speed up computation of the final index. In this paper, we study the size of the PFP, which we refer to as π, and show that it is a powerful tool in its own right. To show this, we present two use cases. We first study the application of π as a repetitiveness measure of the input text, and compare it to other currently used repetitiveness measures, including z (the number of Lempel-Ziv phrases), r (the number of runs of the Burrows-Wheeler Transform), and δ (the text's substring complexity). We then turn to the use of π as a measure for pangenome openness. In both applications, our results are similar to existing measures, but our tool, in almost all cases, is more efficient than those computing the other measures, both in terms of time and space, sometimes by orders of magnitude. We close the paper with a detailed systematic study of the parameter choice for PFP (window size w and modulus p). This gives rise to interesting open questions. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/simolucaa/piPFP. The accession codes for all the datasets used and the raw results are available at https://github.com/simolucaa/piPFP_experiments.
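The dictionary-and-parse construction described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the piPFP implementation: the sentinel characters, the non-rolling polynomial hash, and all function names are hypothetical choices made here, and the size measure (total dictionary length plus parse length) is an assumption about how π is defined.

```python
from typing import List, Tuple

# Hypothetical sentinel characters, assumed not to occur in the input text.
START, END = "\x02", "\x03"


def window_hash(window: str, base: int = 256, mod: int = (1 << 61) - 1) -> int:
    """Deterministic polynomial hash; a stand-in for a true rolling hash."""
    h = 0
    for ch in window:
        h = (h * base + ord(ch)) % mod
    return h


def prefix_free_parse(text: str, w: int, p: int) -> Tuple[List[str], List[int]]:
    """Split text into overlapping phrases and return (dictionary, parse).

    A phrase ends at every length-w window whose hash is 0 modulo p (a
    trigger window) and at the final window. Consecutive phrases overlap
    by exactly w characters.
    """
    if w < 1 or p < 1:
        raise ValueError("w and p must be positive")
    s = START + text + END * w
    phrases: List[str] = []
    start = 0
    for i in range(1, len(s) - w + 1):
        is_last = i + w == len(s)
        if is_last or window_hash(s[i:i + w]) % p == 0:
            phrases.append(s[start:i + w])
            start = i  # the next phrase begins with this trigger window
    dictionary = sorted(set(phrases))
    rank = {phrase: k for k, phrase in enumerate(dictionary)}
    return dictionary, [rank[phrase] for phrase in phrases]


def pfp_size(dictionary: List[str], parse: List[int]) -> int:
    """Assumed size measure: total dictionary length plus parse length."""
    return sum(len(phrase) for phrase in dictionary) + len(parse)


def reconstruct(dictionary: List[str], parse: List[int], w: int) -> str:
    """Invert the parse: glue phrases together, dropping the w-overlap."""
    phrases = [dictionary[k] for k in parse]
    s = phrases[0] + "".join(phrase[w:] for phrase in phrases[1:])
    return s[1:len(s) - w]  # strip the start sentinel and the w end sentinels


if __name__ == "__main__":
    d, parse = prefix_free_parse("GATTACAGATTACAGATTACA", w=2, p=3)
    print(len(d), "distinct phrases,", len(parse), "parse entries, size", pfp_size(d, parse))
```

Repeated regions of the input tend to produce the same phrases, which are stored once in the dictionary, so the size stays small for repetitive texts. A larger modulus p makes trigger windows rarer and phrases longer, which is the kind of trade-off a parameter study over w and p would examine.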
- New
- Research Article
- 10.1016/j.artmed.2026.103390
- Jun 1, 2026
- Artificial intelligence in medicine
- Haotian Jiang + 6 more
Precise estimation of tissue microstructure with hybrid graph transformer.
- New
- Research Article
- 10.1016/j.sysarc.2026.103781
- Jun 1, 2026
- Journal of Systems Architecture
- Bo Yin + 1 more
Supporting efficient and verifiable keyword queries on dynamic blockchain data
- New
- Research Article
- 10.1016/j.chiabu.2026.108072
- Jun 1, 2026
- Child abuse & neglect
- Cindy Blackstock + 4 more
Kids count, or do they? The importance of accurate data for First Nations children in Canada's child welfare system.
- New
- Research Article
- 10.1016/j.talanta.2026.129500
- Jun 1, 2026
- Talanta
- Jiamu Ma + 14 more
Multidimensional chromatographic fingerprint fusion with machine learning: Entropy-based feature evaluation for TCM quality marker discovery.
- New
- Research Article
- 10.1016/j.dche.2026.100300
- Jun 1, 2026
- Digital Chemical Engineering
- Yanguo Cheng + 2 more
Generalized linear mixed modeling for spatiotemporal data outlier detection of emerging contaminants: A multi-stage strategy
- New
- Research Article
- 10.1016/j.aap.2026.108508
- Jun 1, 2026
- Accident; analysis and prevention
- Junlan Chen + 6 more
Traffic crash data augmentation with multi-type variables using hybrid VAE-Diffusion generative neural networks for enhancing crash frequency modeling.
- New
- Research Article
- 10.1016/j.fusengdes.2026.115710
- Jun 1, 2026
- Fusion Engineering and Design
- S.Y Liu + 8 more
Verification of plasma equilibrium data mapping under the integrated modelling and analysis framework for the HL-3 tokamak
- New
- Research Article
- 10.1016/j.dib.2026.112721
- Jun 1, 2026
- Data in brief
- Indira R Guzman + 9 more
Introducing "ELLAS Survey Dataset" an open resource about factors that influence career interest and leadership in STEM in Bolivia, Brazil, and Peru.
- New
- Research Article
- 10.1016/j.engstruct.2026.122503
- Jun 1, 2026
- Engineering Structures
- Rolando Chacón + 3 more
Semantic digital twins for masonry bridges: Structuring geometrical, material and assessment field data