Different Sets Of Descriptors Research Articles

Aqueous solubility is one of the most important physicochemical properties of drug molecules and a major driving force for oral drug absorption. To date, the performance of in silico models for estimating solubility in novel chemical space is limited. To investigate possible reasons and remedies, the Johnson and Johnson in-house aqueous solubility data set, covering over 40,000 compounds, was leveraged. All data were generated through the same high-throughput assay, providing a unique opportunity to explore the relationship between data quality, data quantity, and model estimations. Six intrinsic solubility data sets with different sizes and noise levels were generated using three different approaches: (i) inclusion or exclusion of amorphous solid residue, (ii) calculated or experimental log D to identify the intrinsic solubility, and (iii) adopting or omitting a quality check process in the data processing workflow. A random forest regressor was trained on the data sets with three different sets of descriptors calculated from RDKit, ADMET predictor, or Mordred, and the performances were evaluated with nested cross-validation as well as ten refined test sets. The models confirm, as expected, that with the same data set size, high-quality data leads to better model performance. In addition, models trained with larger data sets containing analytical variability can give estimations as accurate as those of models trained with small, clean, and diverse data sets. However, noise introduced by the presence of amorphous solid after the solubility measurement in the training data set cannot be overcome by increasing data size, because it introduces a systematic positive error into the data set, confirming the importance of critical data review. Finally, two top-performing models were tested on the first test set from the second solubility challenge, achieving RMSE values of 0.74 and 0.72, with 46 and 48% of predictions within log S ± 0.5, respectively. These results demonstrated improved performance compared to those reported in the findings of the competition, highlighting that a single-source curated data set can enhance the prediction of intrinsic solubility.
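The two evaluation metrics quoted in the abstract above (RMSE and the percentage of predictions within ± 0.5 log S units) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the function names `rmse` and `pct_within` and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted log S values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def pct_within(y_true, y_pred, tol=0.5):
    """Percentage of predictions within +/- tol log units of the measurement."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return 100.0 * hits / len(y_true)


# Hypothetical log S values, for illustration only (not data from the study).
measured = [-3.0, -4.5, -2.0, -5.0]
predicted = [-3.5, -4.0, -3.0, -5.0]

print(round(rmse(measured, predicted), 3))
print(pct_within(measured, predicted))
```

Reporting both numbers together is useful because RMSE is sensitive to a few large errors, while the ± 0.5 fraction shows how many individual predictions fall inside a fixed tolerance.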

Biodiversity is a complex and multidimensional concept that characterizes variation of life on Earth. Nonetheless, most studies have examined only a few, if not just one, dimension in isolation. Herein we conduct analyses that explicitly incorporate correlations among multiple dimensions of biodiversity by characterizing morphological, phylogenetic, and functional structure of bat communities from Atlantic Forest of South America and examine degree of redundancy among these sets of descriptors. Second, we examine dimensionality (i.e. number of orthogonal dimensions) of community structure by quantitatively determining if these different sets of descriptors correspond to unique dimensions. We assess if dimensionality measured from empirical communities differs from that based on communities randomly assembled from a regional species pool. Finally, we examine whether different indices of community structure respond differently to environmental gradients spanning Atlantic Forest. We find that Atlantic Forest bat communities are highly variable in terms of morphological, phylogenetic, and functional structure. Different sets of community structure indices exhibited substantive correlations. Accordingly, dimensionality was lower than the set of six different descriptors or even the three different biological dimensions represented. Nonetheless, observed dimensionality was greater than that expected from a null model of assembly. Only abundance‐based indices of phylogenetic structure exhibited significant environmental gradients. Temperature seasonality was the strongest predictor of phylogenetic structure, with overdispersed communities characterizing more seasonal environments and underdispersed communities occurring in areas of lower variation in temperature. 
Dimensionality of community structure is low with phylogenetic structure exhibiting the strongest patterns, probably because phylogeny reflects many different ecological aspects of the phenotype that are not restricted to just one index of structure. Temperature seasonality is an important determinant of phylogenetic structure of bat communities in Atlantic Forest. This research helps us to better understand the factors underlying the distribution of biodiversity, which is increasingly important for endangered ecoregions such as Atlantic Forest.

Articles published on Different Sets Of Descriptors

Development of a robust Machine learning model for Ames test outcome prediction

Effect of Data Quality and Data Quantity on the Estimation of Intrinsic Solubility: Analysis Based on a Single-Source Data Set.

A QSAR study for predicting malformation in zebrafish embryo

How Granular Can a Dose Form Be Described? Considering EDQM Standard Terms for a Global Terminology

The role of feature space in atomistic learning

Minimal-uncertainty prediction of general drug-likeness based on Bayesian neural networks

A Quantitative Model for Alkane Nucleophilicity Based on C-H Bond Structural/Topological Descriptors.

Instanton rate constant calculations using interpolated potential energy surfaces in nonredundant, rotationally and translationally invariant coordinates.

Folding a small protein using harmonic linear discriminant analysis

Mathematical structural descriptors and mutagenicity assessment: a study with congeneric and diverse datasets

Quantitative structure–activity relationships study of potent pyridinone scaffold derivatives as HIV-1 integrase inhibitors with therapeutic applications

Themes in Judges' Sentencing Remarks for Male and Female Domestic Murderers

Theoretical modeling of HPV: QSAR and novodesign with fragment approach.

Dimensionality of community structure: phylogenetic, morphological and functional perspectives along biodiversity and environmental gradients

A Novel Strategy of Structural Similarity Based Consensus Modeling.

How survey design affects self-assessed health responses in the Survey of Health, Ageing, and Retirement in Europe (SHARE)

Mode of action prediction of ligands of steroid hormone receptors

Development of validated QSPR models for impact sensitivity of nitroaliphatic compounds

QSAR models to predict mutagenicity of acrylates, methacrylates and α,β-unsaturated carbonyl compounds
