String Representation Research Articles

The synthesis of nanoporous two-dimensional (2D) materials has revolutionized fields such as membrane separations, DNA sequencing, and osmotic power harvesting. Nanopores in 2D materials significantly modulate their optoelectronic, magnetic, and barrier properties. However, the large number of possible nanopore isomers makes their study onerous, while the lack of machine-learnable representations stymies progress toward structure-property relationships. Here, we develop a language for nanopores in 2D materials, called STring Representation Of Nanopore Geometry (STRONG), that opens the field of 2D nanopore informatics. We show that STRONGs are naturally suited for machine learning via recurrent neural networks, which predict the formation energies and formation times of arbitrary nanopores and the transport barriers for CO2, N2, and O2 gas molecules, enabling structure-property relationships. The machine learning models enable the discovery of specific nanopore topologies that separate CO2/N2, O2/CO2, and O2/N2 gas mixtures with high selectivity ratios. STRONGs also enable the rapid enumeration of unique configurations of stable, functionalized nanopores in 2D materials, allowing systematic searching of the vast chemical space of nanopores. Using the STRONGs approach, we find that a mix of hydrogen and quinone functionalization results in the most stable functionalized nanopore configuration in graphene, a discovery made feasible by expedited chemical space exploration. We also find that the STRONGs approach is ∼1000 times faster than graph theory algorithms at distinguishing nanopore shapes. These advances in the language-based representation of 2D nanopores will accelerate the tailored design of nanoporous materials.

Molecular property prediction plays a crucial role in drug discovery and development. However, traditional experimental measurements and Quantitative Structure-Activity Relationship (QSAR) models are often expensive and time-consuming, and data acquisition is challenging. To overcome these limitations, this study proposes a fusion molecular property prediction model called MSSP, which addresses the non-uniqueness of Simplified Molecular Input Line Entry System (SMILES) string representations and the difficulty of capturing global information in molecular graphs. The method extracts multiple fingerprint features and uses graph neural network encoding to map different modalities of a molecule into molecular sharing and molecular-specific representation spaces, aligning and fusing the modalities by combining molecular invariance with representation specificity. To improve the interpretability and visualization capabilities of the model, graph attention mechanisms are introduced, enabling the identification and inference of important chemical fragments within molecules. Experimental results on publicly available cell line phenotype and kinase activity datasets demonstrate that MSSP outperforms the current state-of-the-art methods in molecular property prediction. MSSP is also strongly competitive across nine benchmark molecular property prediction datasets. Furthermore, in the task of predicting properties on SRC kinase data, this study screens promising therapeutic compounds from compound libraries by validating the predictions of the MSSP model and combining them with traditional methods such as molecular docking and molecular dynamics simulations. Multiple potential Lyn inhibitors have been discovered through this approach.
Applying the MSSP model can help discover new molecules with new drug properties or functions, accelerating and guiding drug discovery while saving time and resources.

Articles published on String Representation

AbstractTrace: The Use of Execution Traces to Cluster, Classify, Prioritize, and Optimize a Bloated Test Suite

Generative Pretrained Transformer for Heterogeneous Catalysts.

Machine Learnable Language for the Chemical Space of Nanopores Enables Structure-Property Relationships in Nanoporous 2D Materials.

What can attribution methods show us about chemical language models?

Automated design of multi-target ligands by generative deep learning

Evolutionary algorithms simulating molecular evolution: a new field proposal.

Molecular sharing and molecular-specific representations for multimodal molecular property prediction

Nonmesonic Quantum Many-Body Scars in a 1D Lattice Gauge Theory.

Enhancing deep learning predictive models with HAPPY (Hierarchically Abstracted rePeat unit of PolYmers) representation

Efficient non-isomorphic graph enumeration algorithms for several intersection graph classes

Difficulty in chirality recognition for Transformer architectures learning chemical structures from string representations

Making the InChI FAIR and sustainable while moving to inorganics.

DEEPSIDE A DEEP LEARNING FRAMEWORK FOR DRUG SIDE EFFECT PREDICTION

DEEP SIDE-A DEEP LEARNING FRAMEWORK FOR DRUG SIDE EFFECT PREDICTION

Transforming data from the image to the text domain: benign versus malignant micro-calcification classification

Improved ant colony algorithm for the mixed-model parallel two-sided assembly lines balancing problem

MOF-GRU: A MOFid-Aided Deep Learning Model for Predicting the Gas Separation Performance of Metal-Organic Frameworks.

Modelling orthographic similarity effects in recognition memory reveals support for open bigram representations of letter coding

Study of Superconductivity in Restricted Quantum Chromo-Dynamics in non-Abelian Gauge Theory

Using alternative SMILES representations to identify novel functional analogues in chemical similarity vector searches
