Articles published on Binary data
7464 Search results
- New
- Research Article
- 10.1016/j.dark.2026.102315
- Jun 1, 2026
- Physics of the Dark Universe
- Faizuddin Ahmed + 2 more
Constraining Kalb-Ramond gravity with cloud of strings using EHT shadow observations and X-ray binary QPO data
- New
- Research Article
- 10.1021/acs.est.6c00777
- May 16, 2026
- Environmental science & technology
- Yaqi Wang + 5 more
Rare earth elements (REEs) are emerging contaminants with escalating environmental releases. However, REE health risk assessment faces critical challenges due to inconsistent cytotoxicity benchmarks and complex multielement exposures. Here, we developed a comprehensive framework to systematically assess the cytotoxic risks of 16 rare earth ions (REIs). The framework integrates single and binary exposures of 16 REIs across eight human cell lines, machine learning models, and population exposure scenario predictions. The high-throughput screening of single exposure on three end points revealed common disruption of cellular energy metabolism, identifying ATP depletion as a robust benchmark point of departure (BPOD). On the basis of the BPOD, 120 binary combinations of 16 REIs indicated predominantly antagonistic interactions among REEs. The consensus machine learning model trained on the single and binary exposure data sets demonstrated robust predictive performance for mixture cytotoxicity. Application of the model to human exposure scenarios showed high risks for the population in mining areas and negligible risks for the general population. The significant linear correlation of cytotoxic responses between cell lines and human primary hepatocytes confirmed the human health relevance and model reliability. Our study provides a toolkit to establish a standardized, scalable, and integrative risk assessment for REEs and other emerging multielement contaminants.
- New
- Research Article
- 10.1080/02664763.2026.2672564
- May 14, 2026
- Journal of Applied Statistics
- Stephen A Collins-Elliott
This paper provides a new method to construct and harmonize partial seriations from binary matrices. There have been many methods proposed to perform a seriation on a matrix of binary data, in which rows and columns are permuted to arrive at an optimal ordering of the matrix elements. However, no one method is assured to produce the best result, as routines are highly susceptible to the particular distribution of 1s in the matrix. In certain cases, the omission of rows and columns may yield an improved seriation, but this leaves investigators with the tedious job of slicing matrices and re-running analyses. Working with several partial seriations raises the problem of how to combine such ‘strands’ (subsets of the data that have been seriated) to obtain a consensus out of potentially discrepant orderings. This paper therefore proposes an agglomerative routine involving simple linear regression, called Lakhesis. According to optimality measures, Lakhesis has the capacity to outperform conventional approaches. The R lakhesis package also provides a graphical interface. Investigators can explore binary matrices and seriate partial subsets using correspondence analysis (both simple CA and Procrustes-fit CA). The partial seriations are then ‘lakhesized’ into a single consensus seriation.
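The abstract above mentions seriating a binary matrix with simple correspondence analysis (CA). As a rough illustration of that single step only, the sketch below orders rows and columns by their scores on the first CA axis. It is not the Lakhesis routine and does not use the R lakhesis package; the function name `ca_seriate` and the toy matrix are assumptions made for this example.

```python
import numpy as np

def ca_seriate(matrix):
    """Order rows and columns of a binary matrix by their first CA axis scores.

    Hypothetical helper for illustration; assumes no all-zero row or column.
    """
    X = np.asarray(matrix, dtype=float)
    P = X / X.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    expected = np.outer(r, c)             # independence model
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, _, Vt = np.linalg.svd(S, full_matrices=False)
    row_scores = U[:, 0] / np.sqrt(r)     # standard coordinates, first axis
    col_scores = Vt[0, :] / np.sqrt(c)
    return np.argsort(row_scores), np.argsort(col_scores)

# Toy incidence matrix with a banded pattern, then shuffled.
banded = np.array([
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
])
rng = np.random.default_rng(0)
shuffled = banded[rng.permutation(4)][:, rng.permutation(5)]

row_order, col_order = ca_seriate(shuffled)
recovered = shuffled[row_order][:, col_order]
print(recovered)
```

On this small, cleanly banded example the first axis is enough to restore the diagonal pattern. Real incidence matrices are noisier, which is the situation the partial-seriation approach in the abstract is aimed at.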
- Research Article
- 10.1186/s40359-026-04692-2
- May 11, 2026
- BMC psychology
- Zilong Ma + 5 more
Children with leukemia commonly experience internalizing and externalizing problems, yet their symptom-level interactions remain unclear. Network analysis provides a framework to examine associations among co-occurring symptoms and to explore potential intervention-relevant targets within a system-level context. A total of 1,126 children with leukemia in the rehabilitation phase were included. Emotional and behavioral problems were assessed using the parent-reported Strengths and Difficulties Questionnaire (SDQ). Symptom endorsement rates were calculated, and an Ising network model based on binary data was estimated. Central and bridge symptoms were identified using expected influence metrics. Simulation analyses were conducted using the NodeIdentifyR algorithm (NIRA) to model symptom perturbations within the network. These simulation-based perturbations were applied in the overall network and internalizing-externalizing subnetworks to examine system-level responses. "Motor fidgeting," "peer victimization," and "low mood" showed the highest centrality, with "motor fidgeting" and "peer victimization" also linking internalizing and externalizing problems. In simulation analyses, simulated alleviation of "impulsivity" was associated with the largest reduction in overall symptom burden (19.71%), whereas simulated aggravation of "motor fidgeting" showed the largest increase (22.76%). Externalizing-to-internalizing simulations showed patterns broadly consistent with the overall network structure. In internalizing-to-externalizing simulations, "low peer acceptance" showed the largest simulated reduction in externalizing symptoms (9.75%), whereas "low mood" showed the largest simulated increase (12.79%). Internalizing and externalizing problems in children with leukemia form an interconnected symptom system. 
Simulation findings highlight symptom-specific patterns related to impulse control, peer relationships, and emotion regulation, which may inform early psychosocial monitoring and provide a basis for mechanism-informed future intervention strategies in pediatric oncology. These findings should be interpreted as hypothesis-generating evidence derived from cross-sectional data rather than as evidence of causal intervention effects.
- Research Article
- 10.3390/jintelligence14050077
- May 2, 2026
- Journal of Intelligence
- Juyoung Jung + 2 more
In intelligence research, the sharing of item response data from cognitive ability assessments is often restricted by privacy concerns, while traditional parametric simulation methods frequently fail to capture complex response dependencies. This study proposes a neural network copula (NNC) framework for generating synthetic dichotomous item response data that preserves essential psychometric properties without revealing sensitive examinee information. By decoupling the modeling of marginal item probabilities from the dependence structure using a deep autoencoder and kernel density estimation, the framework accommodates the discrete nature of binary item response data while minimizing distributional assumptions. Validation against large-scale empirical data demonstrated high correspondence across multiple facets. At the data consistency level, the NNC-based synthetic data reproduced total score distributions and inter-item correlations. Psychometrically, the method yielded consistent item characteristic curve parameter estimates, item fit statistics, and test information functions. Furthermore, Monte Carlo replications demonstrated algorithmic stability and inferential precision.
- Research Article
- 10.1007/s00402-026-06325-0
- May 2, 2026
- Archives of orthopaedic and trauma surgery
- Omar Moussa + 9 more
Limb salvage centers have increased in number over time, but lack standardized defining criteria. This systematic review aimed to assess organizational features of limb salvage centers and determine whether orthoplastic centers, in comparison to vascular limb salvage centers, represent a distinct care model that may benefit from standardization. We conducted a systematic review of publications related to limb salvage centers by searching MEDLINE, Embase, Web of Science, and Cochrane databases from their inception through 2024. We quantified binary data extraction as a reporting score of 26 organizational features across six structural care domains for limb salvage centers, based on a validated quality measurement framework. Organizational features differentiating distinct center types were identified to establish a quality framework for orthoplastic centers. Statistical comparisons between center types were performed using appropriate tests (p < 0.05). Of 118 included studies, orthoplastic (n = 43) and vascular (n = 48) centers represented 77% of all studies. Recent increases in orthoplastic publications show substantial variability in organizational features. Orthoplastic center literature more frequently reported plastic surgery consultation criteria (p < 0.001), surgical outcomes (p < 0.001), and centralized network integration (p ≤ 0.006), highlighting acute reconstructive approaches. Vascular center studies documented significantly more organizational team features (p < 0.001) and quality systems (p = 0.033), reflecting established care frameworks for chronic disease management. Six organizational features characterized orthoplastic centers with > 70% prevalence, providing a benchmark framework with standardization priorities. Orthoplastic limb salvage centers demonstrate distinct care paradigms that benefit from standardization. Our findings suggest structural benchmarks to support the need for standardized development of orthoplastic limb salvage centers.
- Research Article
- 10.1111/bph.70391
- May 1, 2026
- British journal of pharmacology
- János Tibor Fekete + 2 more
Network meta-analysis (NMA) enables the simultaneous comparison of multiple treatments by combining direct and indirect evidence across a network of studies. While its application is rapidly expanding in pharmacological research and clinical guideline development, performing NMA typically requires advanced statistical knowledge and access to specialized software, limiting its broader adoption. Here, we present NetMetaEasy, a user-friendly, web-based platform that allows rapid execution and visualization of network meta-analyses from standard input formats, with minimal technical expertise required. NetMetaEasy is an R/Shiny-based platform supporting binary, continuous and summary-effect data (e.g. odds ratios, hazard ratios). It offers frequentist and Bayesian network meta-analysis with P-score and SUCRA-based treatment ranking. It generates standard outputs including network diagrams, forest plots, netleague tables, inconsistency assessments and funnel plots using fixed- and random-effects models implemented via the netmeta and gemtc R packages. We demonstrate the functionality of NetMetaEasy using a real-world pharmacological dataset evaluating the cardiovascular outcomes of sodium/glucose cotransporter 2 (SGLT2) inhibitors. The platform successfully generated all standard NMA outputs, identified treatments with statistically significant benefits over placebo and showed no evidence of small-study bias. The entire workflow, from data upload to interpretation-ready plots, was completed within minutes. The registration-free NetMetaEasy analysis platform is accessible at http://www.metaanalysisonline.com/netmetaeasy. NetMetaEasy provides an accessible solution for conducting network meta-analyses by streamlining data processing, analysis and visualization into a single online interface, thereby enabling more widespread and rapid evidence synthesis in pharmacology, particularly for researchers without formal statistical training.
- Research Article
- 10.1016/j.cpc.2026.110058
- May 1, 2026
- Computer Physics Communications
- Fabricio Ruiz + 2 more
Corrections to binary enthalpies and elemental data in HEAPS software for reliable high-entropy alloy design
- Research Article
- 10.1016/j.neuro.2026.103433
- May 1, 2026
- Neurotoxicology
- Caroline C Swain + 9 more
Paraquat-induced rodent models of Parkinson's disease: A PRISMA-compliant systematic review and meta-analysis.
- Research Article
- 10.3390/cancers18091427
- Apr 30, 2026
- Cancers
- Dwayne G Tally + 6 more
Digestive tract cancers, like most other cancers, are usually categorized based on cell or tissue of origin. Molecular clustering based on the transcriptome often produces the same classification. We developed a new method, Newmanization, to reduce underlying tissue signals from transcriptomic analysis. To test our method, we downloaded data on 1635 samples of digestive tract cancers from The Cancer Genome Atlas. The available data includes transcriptomic data by RNA-Seq, as well as binary mutation allele frequency data by whole exome sequencing. We compared, using silhouette widths and visualization by dimension reduction plots, the effectiveness of Newmanized transcriptome and mutation data to separate digestive tract cancers. The Newmanized transcriptome clusters have clearer separation and larger average silhouette widths. Feature analysis of each cluster for Newmanized transcriptomic data and mutation data revealed that clusters determined with Newmanized data contained more mRNAs present at higher frequencies than clusters defined by mutation data. This suggests that the Newmanized method holds great potential for advancing personalized transcriptomic medicine.
- Research Article
- 10.2196/81500
- Apr 30, 2026
- JMIR medical informatics
- Anoeska Schipper + 9 more
Most clinically relevant information in emergency department (ED) visits is documented in free text, limiting reuse for research and clinical decision support. Despite growing interest in large language model (LLM)-based feature extraction, very few studies have examined it directly on ED reports. Existing work has mainly addressed binary tasks and rarely evaluated their impact on downstream prediction models. Furthermore, evidence for small multilingual LLMs remains limited, especially for underrepresented languages such as Dutch. Locally deployable LLMs could enable automated feature extraction for decision support systems without increasing physician workload. We aim to evaluate whether a small open-source LLM (Qwen 2.5:14B) can automatically extract 16 clinical signs and symptoms from ED reports and use these as input for an appendicitis prediction model. LLM performance under minimal and optimized 0-shot prompts was assessed against researcher annotations (reference standard) and physician annotations. This retrospective study used 336 ED reports from patients presenting with acute abdominal pain to a Dutch teaching hospital (2016-2023). One hundred reports were randomly selected to develop a minimal and an optimized 0-shot prompt strategy. The remaining 236 reports, reserved for evaluation, were annotated by 2 ED physicians and processed by the LLM to extract 16 signs and symptoms, covering binary, multiclass, and multilabel classification tasks. These features were used as input to the HIVE (History, Intake, Vitals, Examination) appendicitis prediction model. LLM extraction accuracy, sensitivity, and specificity were measured against the researcher's (reference standard) and physician annotations. The HIVE model's area under the receiver operating characteristic curve was evaluated using LLM-extracted vs physician-annotated features. 
Among 336 ED reports from patients with acute abdominal pain (median age 41, IQR 22-62 years, 205/336, 61% female), 50% (167/336) had appendicitis. The LLM achieved weighted average accuracies of 0.910 (95% CI ±0.018) with minimal prompts and 0.929 (95% CI ±0.016) with optimized prompts, vs 0.961 (95% CI ±0.012) and 0.951 (95% CI ±0.015) for physicians. Corresponding HIVE model area under the receiver operating characteristic curves were 0.871 (95% CI ±0.019) and 0.911 (95% CI ±0.014) with LLM inputs under the minimal and optimized prompts, compared to 0.917 (95% CI ±0.015) and 0.924 (95% CI ±0.018) for physician inputs. A small locally deployable multilingual LLM can approach physician-level accuracy in extracting structured binary, multiclass, and multilabel clinical data from free-text Dutch ED reports, while preserving patient privacy, interpretability, and statistical transparency for downstream diagnostic modeling.
- Research Article
- 10.1021/acs.est.5c18296
- Apr 28, 2026
- Environmental science & technology
- Yu Ma + 9 more
Reproductive toxicity is challenging to assess because of its diverse phenotypes and complex mechanisms; target-prediction models enable early identification of toxic potential. However, affinity prediction models suffer from limited training data and poor generalization to novel chemical scaffolds, which hinders their application in prospective toxicity assessment. Here, we propose a classification pretraining-regression fine-tuning framework that leverages large-scale binary activity data to learn generalizable compound-protein interaction patterns. Systematic evaluations across six dual-encoder models and three data-split strategies demonstrate consistent performance gains. Under the most stringent cluster-split scenario, our framework achieved an average R2 improvement of 0.324, demonstrating substantially enhanced generalization to novel chemotypes. Using the best-performing model (R2 = 0.797; MSE = 0.370), we predicted compound affinities across 81 reproductive targets, generating target affinity spectra that distinguished toxicity effects, identified both broad-spectrum and target-specific structural alerts, and revealed key targets for six sex-specific reproductive diseases. By predicting binding affinity, our framework not only enhances generalization but also enables interpretable, mechanism-guided reproductive toxicity assessment.
- Research Article
- 10.1080/01621459.2026.2662438
- Apr 25, 2026
- Journal of the American Statistical Association
- Davide Agnoletto + 2 more
This article is motivated by challenges in conducting Bayesian inferences on unknown discrete distributions, with a particular focus on count data. To avoid the computational disadvantages of traditional mixture models, we develop a novel Bayesian predictive approach. In particular, our Metropolis-adjusted Dirichlet (mad) sequence model characterizes the predictive measure as a mixture of a base measure and Metropolis-Hastings kernels centered on previous data points. The resulting mad sequence is asymptotically exchangeable and the posterior on the data generator takes the form of a martingale posterior. This structure leads to straightforward algorithms for inference on count distributions, with easy extensions to multivariate, regression, and binary data cases. We obtain a useful asymptotic Gaussian approximation and illustrate the methodology on a variety of applications.
- Research Article
- 10.55041/ijsrem61055
- Apr 24, 2026
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
- Dr.V Subbaramaiah + 6 more
Deepfakes are artificially generated multimedia content that can convincingly mimic real human faces and voices using advanced AI techniques such as Generative Adversarial Networks (GANs). This poses serious ethical, social, and security challenges in digital communication. To address this issue, the proposed project presents a Multimodal Deepfake Detection System that integrates image, video, and audio analysis pipelines within a unified framework. The system employs an EfficientNet-based CNN for image forgery detection, a CNN combined with Bi-LSTM for temporal video analysis, and a 1D-CNN with LSTM for detecting manipulated or cloned audio. The predictions from these modalities are combined using a Fuzzy Fusion Engine, which intelligently weights each confidence score to produce a final verdict with high accuracy and interpretability. The model is trained using public Deepfake datasets such as FaceForensics++, Celeb-DF, and DFDC, with binary cross-entropy loss, data augmentation, and early stopping to ensure stable convergence and better generalization. The trained models are deployed on Hugging Face Spaces, while the web interface is hosted on Vercel, enabling real-time Deepfake detection for users through a browser interface. This approach enhances detection accuracy by leveraging multimodal evidence (visual, temporal, and auditory), improves generalization across datasets, and provides an explainable and efficient solution to combat the growing threat of Deepfakes. Key Words: Deepfake Detection, Multimodal Learning, Bi-LSTM, Fuzzy Fusion, Convolutional Neural Network (CNN), FaceForensics++, Celeb-DF, DFDC, Machine Learning, Real-Time Detection, Hugging Face.
- Research Article
- 10.1080/01621459.2026.2625509
- Apr 22, 2026
- Journal of the American Statistical Association
- Shirong Xu + 2 more
Paired comparison data, where users evaluate items in pairs, play a central role in ranking and preference learning tasks. While ordinal comparison data intuitively offer richer information than binary comparisons, this article challenges that conventional wisdom. We propose a general parametric framework for modeling ordinal paired comparisons without ties. The model adopts a generalized additive structure, featuring a link function that quantifies the preference difference between two items and a pattern function that governs the distribution over ordinal response levels. This framework encompasses classical binary comparison models as special cases, by treating binary responses as binarized versions of ordinal data. Within this framework, we show that binarizing ordinal data can significantly improve the accuracy of ranking recovery. Specifically, we prove that under the counting algorithm, the ranking error associated with binary comparisons exhibits a faster exponential convergence rate than that of ordinal data. Furthermore, we characterize a substantial performance gap between binary and ordinal data in terms of a signal-to-noise ratio (SNR) determined by the pattern function. We identify the pattern function that minimizes the SNR and maximizes the benefit of binarization. Extensive simulations and a real application on the MovieLens dataset further corroborate our theoretical findings. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
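The abstract above analyses ranking recovery under the counting algorithm applied to binarized ordinal comparisons. A minimal sketch of that idea follows, assuming (for illustration only) that each ordinal response is a nonzero signed integer whose sign says which item was preferred. The function names and the toy data are hypothetical and not taken from the article.

```python
from collections import defaultdict

def binarize(response):
    """Collapse a tie-free ordinal response to a binary win indicator.

    Assumed encoding: positive levels mean the first item was preferred,
    negative levels mean the second item was preferred.
    """
    if response == 0:
        raise ValueError("ties are not modelled in this sketch")
    return 1 if response > 0 else 0

def counting_rank(comparisons):
    """Rank items by their number of binarized wins, most wins first."""
    wins = defaultdict(int)
    for first, second, response in comparisons:
        first_won = binarize(response)
        wins[first] += first_won
        wins[second] += 1 - first_won
    # Ties in win counts are broken alphabetically to keep output stable.
    return sorted(wins, key=lambda item: (-wins[item], item))

# (first item, second item, ordinal response) triples.
comparisons = [
    ("A", "B", 2), ("A", "C", 1), ("B", "C", 3),
    ("B", "A", -1), ("C", "A", -2), ("C", "B", -1),
]
ranking = counting_rank(comparisons)
print(ranking)
```

The strength of each preference (the magnitude of the response) is deliberately discarded here; only its direction contributes to the win count.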
- Research Article
- 10.1093/pasj/psag042
- Apr 22, 2026
- Publications of the Astronomical Society of Japan
- Shinjirou Kouzuma
The orbital inclination of an eclipsing binary is generally determined through light curve analysis. Binary parameters in the light curve analysis are typically constrained through the use of optimization and sampling techniques. We propose a new simple method, based on the derivatives of light curves, for estimating the orbital inclinations of overcontact systems. Our sample consists of 89670 synthetic light curves for overcontact binaries, covering a parameter space typical of overcontact systems. We classified the sample light curves on the basis of a recently proposed classification scheme: DP, SPp, SPb, SPf, and SPs types. For each type, we found that the orbital inclination is closely associated either with the time interval between local extrema in the derivatives of light curves or with the depth of the local minimum at phase 0.5 in the second derivative. Using regression analysis of the identified associations, we developed empirical formulae to estimate the orbital inclinations for each type of light curve. We also provide the associated uncertainties for the estimated inclinations. Application of the proposed method to real overcontact binary data demonstrated that our method can reasonably estimate both the inclinations and their uncertainties.
- Research Article
- 10.1007/s11222-026-10888-8
- Apr 22, 2026
- Statistics and Computing
- Francis K C Hui + 1 more
Multivariate binary data are widely collected in many disciplines, including in finance, psychometrics, and ecology. Often, many binary responses are driven by only a subset of predictors, and groups of binary responses may exhibit similar effects to a predictor. Motivated by a survey containing presence-absence records of 22 demersal fish species recorded across the U.S. Northeast shelf, we propose a novel method for simultaneous coefficient clustering and variable selection in multivariate binary data using a penalized Ising regression model. The Ising regression model formulates an explicit joint distribution for a binary response vector via main and pairwise interaction coefficients, where the former is modeled as a function of covariates and the latter captures conditional dependence relationships between responses. To cluster coefficients within each covariate, and encourage sparsity across both covariates and pairwise interaction coefficients, we propose to augment the Ising regression model with adaptive fused lasso and adaptive lasso penalties. Such a structured penalty to encourage simultaneous sparsity and groupings is aligned with goals of achieving sparse species-covariate relationships, and homogeneity of environmental responses across species, in the motivating demersal fish survey. Through a reparametrization, we show that the proposed estimator can be efficiently obtained by fitting a single, adaptive lasso logistic regression model. Simulation studies and an application to the demersal fish survey demonstrate competitive performance of the proposed method relative to several existing Ising regression models for multivariate binary data, and show that it leads to an interpretable and parsimonious set of response-covariate relationships.
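The link between an Ising regression and logistic regression mentioned in the abstract above rests on a standard property: under a 0/1-coded Ising model, the conditional probability of one response given the others is a logistic function of its main effect plus the pairwise interaction terms. The sketch below evaluates that conditional probability. The 0/1 coding, the function name, and all coefficient values are assumptions for illustration; it does not implement the penalized estimator from the article.

```python
import math

def ising_conditional_prob(j, y, x, beta, theta):
    """P(y_j = 1 | other responses, covariates) under a 0/1-coded Ising regression.

    beta[j]  : covariate coefficients for response j (main effect = x . beta[j])
    theta    : symmetric matrix of pairwise interaction coefficients
    """
    main_effect = sum(b * value for b, value in zip(beta[j], x))
    interaction = sum(theta[j][k] * y[k] for k in range(len(y)) if k != j)
    return 1.0 / (1.0 + math.exp(-(main_effect + interaction)))

# Hypothetical example with three species and two covariates (intercept, depth).
beta = [[0.0, 0.0], [-0.5, 0.3], [0.2, -0.1]]
theta = [
    [0.0, 1.2, 0.0],
    [1.2, 0.0, -0.4],
    [0.0, -0.4, 0.0],
]
x = [1.0, 2.0]

p_absent = ising_conditional_prob(0, [0, 0, 0], x, beta, theta)
p_present = ising_conditional_prob(0, [0, 1, 0], x, beta, theta)
print(p_absent, p_present)
```

With a positive interaction between species 0 and species 1, observing species 1 raises the conditional probability that species 0 is present, which is the kind of conditional dependence the interaction coefficients encode.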
- Research Article
- 10.3390/e28040479
- Apr 21, 2026
- Entropy (Basel, Switzerland)
- Michel Broniatowski + 1 more
We propose a sequential design method aiming at the estimation of an extreme quantile based on a sample of binary data corresponding to peaks over a given threshold. This study is motivated by an industrial challenge in material reliability and consists of estimating a failure quantile from trials whose outcomes are reduced to indicators of whether the specimen has failed at the tested stress levels. The proposed approach relies on a splitting strategy that decomposes the target extreme probability into a product of higher-order conditional probabilities, enabling a progressive exploration of the tail of the distribution through sampling under truncated laws. We consider GEV and Weibull models for the underlying distribution, and the sequential estimation of their parameters is carried out using an enhanced maximum likelihood procedure specifically adapted to binary data, addressing the substantial uncertainty inherent to such limited information.
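The splitting strategy described in the abstract above writes a very small tail probability as a product of larger conditional probabilities, each of which is easier to estimate. The sketch below checks that identity numerically for a Weibull distribution with known parameters. The threshold values, the parameters, the upper-tail orientation, and the function names are assumptions for this example; the sequential design and the estimation from binary outcomes are not reproduced here.

```python
import math

def weibull_survival(x, shape, scale):
    """P(X > x) for a Weibull distribution."""
    return math.exp(-((x / scale) ** shape))

def split_tail_probability(thresholds, shape, scale):
    """Express P(X > last threshold) as a product of conditional exceedance
    probabilities over an increasing sequence of thresholds."""
    factors = []
    previous = None
    for s in thresholds:
        if previous is None:
            # First factor: unconditional P(X > s_1)
            factors.append(weibull_survival(s, shape, scale))
        else:
            # Later factors: P(X > s_j | X > s_{j-1})
            factors.append(
                weibull_survival(s, shape, scale)
                / weibull_survival(previous, shape, scale)
            )
        previous = s
    return math.prod(factors), factors

thresholds = [1.0, 2.0, 3.0]          # hypothetical increasing stress levels
product, factors = split_tail_probability(thresholds, shape=1.5, scale=1.0)
direct = weibull_survival(thresholds[-1], shape=1.5, scale=1.0)
print(product, direct, factors)
```

Each factor is considerably larger than the target probability, which is why estimating the factors one at a time under truncated sampling is more practical than estimating the extreme probability directly.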
- Research Article
- 10.1021/acs.molpharmaceut.5c01889
- Apr 20, 2026
- Molecular pharmaceutics
- Jonas Habicht + 2 more
Predicting the solubility of active pharmaceutical ingredients (APIs) is essential throughout drug development. However, state-of-the-art modeling approaches require system-specific data sets for parameter estimation and are resource intensive. This work introduces a new method that integrates adaptive machine learning (ML) methods with PC-SAFT modeling to vastly reduce requirements of experimental data. Instead of extensive experimental campaigns of solubility measurements, only the molecular structure and melting properties of the API are needed - information often available in literature or easily measured. The ML framework applied in this work supplies PC-SAFT parameters for APIs. With solvent parameters already available from literature, this novel approach provided highly accurate solubility estimations for 21 APIs in pure solvents (R2 = 0.83 without using any binary data and R2 = 0.98 using a single binary data point), as well as for mixed solvents, closely matching literature data. Compared to prior models, this hybrid method is more generalizable, consistent, and efficient, streamlining the workflow and providing reliable predictions with minimal experimental effort. By making a thermodynamic-based solubility assessment available early in process development, it outperforms state-of-the-art models that demand significantly more experimental input. The results of this work indicate that the newly developed ML framework can be efficiently applied to provide PC-SAFT parameters for APIs with minimal need of or even without using any experimental solubility data, which can be used to achieve thermodynamics-based access to API solubility in a very early stage of process development. This approach does not only provide solubility data in pure solvents but also in solvent mixtures.
- Research Article
- 10.1021/acsami.6c00593
- Apr 15, 2026
- ACS applied materials & interfaces
- Huadong Wen + 8 more
The escalating demand for secure optical communication requires integrated encryption strategies that balance security with detection performance. Bipolar photodetectors (BPDs) offer wavelength-tunable photocurrent polarity, enabling simultaneous spectral discrimination and signal encryption. Here, we report a self-powered BPD based on an FTO/MoOx/Sb2S3/SnTe/Au back-to-back heterostructure. The device generates negative/positive photocurrents under short/long wavelengths (620/735 nm) with responsivities of -40.7 and 76.1 μA/W and fast response times (88.8/59.3 μs rise/decay at 620 nm). The MoOx interlayer critically tunes heterojunction band alignment, shifting the polarity-switching threshold from 405 to 660 nm. Nonadditive photocurrents under dual-wavelength illumination create unique signal fingerprints, enabling an optical encryption system. Using 620 nm (valid signal) and 735 nm (encryption key) LEDs, binary data streams are encrypted such that hybrid outputs are indecipherable by unipolar detectors. This work demonstrates hardware-level security for next-generation optical networks.