Articles published on Profile-based Methods
- Research Article
6
- 10.1093/bib/bbaf078
- Nov 22, 2024
- Briefings in Bioinformatics
- Aoyi Wang + 8 more
The biological targets of traditional Chinese medicine (TCM) are the core effectors mediating the interaction between TCM and the human body. Identification of TCM targets is essential to elucidate the chemical basis and mechanisms of TCM for treating diseases. Given the chemical complexity of TCM, both in silico high-throughput compound-target interaction predicting models and biological profile-based methods have been commonly applied for identifying TCM targets based on the structural information of TCM chemical components and biological information, respectively. However, the existing methods lack the integration of TCM chemical and biological information, resulting in difficulty in the systematic discovery of TCM action pathways. To solve this problem, we propose a novel target identification model NP-TCMtarget to explore the TCM target path by combining the overall chemical and biological profiles. First, NP-TCMtarget infers TCM effect targets by calculating associations between herb/disease inducible gene expression profiles and specific gene signatures for 8233 targets. Then, NP-TCMtarget utilizes a constructed binary classification model to predict binding targets of herbal ingredients. Finally, we can distinguish TCM direct and indirect targets by comparing the effect targets and binding targets to establish the action pathways of herbal component-direct target-indirect target by mapping TCM targets in the biological molecular network. We apply NP-TCMtarget to the formula XiaoKeAn to demonstrate the power of revealing the action pathways of herbal formula. We expect that this novel model could provide a systematic framework for exploring the molecular mechanisms of TCM at the target level. NP-TCMtarget is available at http://www.bcxnfz.top/NP-TCMtarget.
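The final comparison step can be pictured as simple set operations. The sketch below is an illustration only, not the NP-TCMtarget implementation: it assumes that a "direct" target is an effect target that is also a predicted binding target, and that an "indirect" target is an effect target with no predicted binding. The function name and gene symbols are hypothetical placeholders.

```python
# Illustrative sketch only; not the NP-TCMtarget code.
# Assumption: direct = effect AND binding, indirect = effect but NOT binding.
def split_targets(effect_targets, binding_targets):
    """Partition effect targets by whether an ingredient is also predicted to bind them."""
    effect = set(effect_targets)
    binding = set(binding_targets)
    direct = effect & binding    # affected and predicted to be bound
    indirect = effect - binding  # affected but not predicted to be bound
    return direct, indirect


# Hypothetical placeholder identifiers, not real results.
effect = {"GENE_A", "GENE_B", "GENE_C"}
binding = {"GENE_B", "GENE_D"}
direct, indirect = split_targets(effect, binding)
print(sorted(direct))    # ['GENE_B']
print(sorted(indirect))  # ['GENE_A', 'GENE_C']
```

Mapping the two sets onto a molecular network to trace component-direct target-indirect target paths, as the abstract describes, is outside the scope of this sketch.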
- Research Article
- 10.1186/s43008-024-00163-8
- Nov 1, 2024
- IMA Fungus
- Nichapat Yurayart + 5 more
Pythiosis is a severe disease in humans and animals globally, caused by the pathogenic oomycete Pythium insidiosum. Early and accurate detection is crucial for effective treatment, but traditional diagnostic methods have limitations. This study presents an alternative approach using Matrix-Assisted Laser Desorption Ionization Time of Flight Mass Spectrometry (MALDI-TOF MS) for lipid profiling to efficiently identify P. insidiosum. The study involved extracting microbial lipid components using an optimized chloroform:methanol biphasic method and creating a lipid profile database with samples from 30 P. insidiosum isolates and 50 various fungi. The methodology was validated on 25 blinded samples for assay detection performance. Unique lipid profiles allowed species-specific identification with high efficiency: scores ≥ 2.682 indicated P. insidiosum, scores ≤ 2.512 suggested fungi, and scores in between pointed to other oomycetes. The assay demonstrated sensitivity, specificity, and accuracy of 100%, 80%, and 88%, respectively, for detecting P. insidiosum. The limited detection specificity was due to false positive samples from closely related Pythium species, which are not a significant clinical concern. The findings show that MALDI-TOF MS lipid profiling can efficiently identify P. insidiosum, offering significant advantages in sample preparation, stability, and reproducibility over protein profile-based methods. This study marks the first instance of lipid profiles being reported for P. insidiosum, paving the way for clinical use in improving accurate detection and facilitating timely treatment interventions.
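The score cut-offs and the three performance figures can be written out as a short sketch. The thresholds come from the abstract; the function names are hypothetical, and the confusion-matrix counts in the usage lines are an assumption (the abstract reports only percentages for 25 samples), chosen because they reproduce the reported values.

```python
P_INSIDIOSUM_MIN = 2.682  # scores at or above this indicated P. insidiosum
FUNGI_MAX = 2.512         # scores at or below this suggested fungi


def classify_score(score):
    """Map a MALDI-TOF MS identification score to one of the three reported groups."""
    if score >= P_INSIDIOSUM_MIN:
        return "P. insidiosum"
    if score <= FUNGI_MAX:
        return "fungus"
    return "other oomycete"


def detection_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


print(classify_score(2.7))  # P. insidiosum
print(classify_score(2.6))  # other oomycete
print(classify_score(2.4))  # fungus

# Assumed counts (not stated in the abstract): 10 TP, 3 FP, 12 TN, 0 FN = 25 samples.
sens, spec, acc = detection_metrics(tp=10, fp=3, tn=12, fn=0)
print(round(sens, 2), round(spec, 2), round(acc, 2))  # 1.0 0.8 0.88
```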
- Research Article
25
- 10.1016/j.cemconres.2023.107358
- Oct 28, 2023
- Cement and Concrete Research
- Kai Zhang + 3 more
Development of more accurate methods for determining carbonation depth in cement-based materials
- Research Article
13
- 10.1093/bib/bbad173
- May 18, 2023
- Briefings in Bioinformatics
- Yidong Song + 5 more
Determining intrinsically disordered regions of proteins is essential for elucidating protein biological functions and the mechanisms of their associated diseases. As the gap between the number of experimentally determined protein structures and the number of protein sequences continues to grow exponentially, there is a need for developing an accurate and computationally efficient disorder predictor. However, current single-sequence-based methods are of low accuracy, while evolutionary profile-based methods are computationally intensive. Here, we proposed a fast and accurate protein disorder predictor, LMDisorder, that employs embeddings generated by unsupervised pretrained language models as features. We showed that LMDisorder performs best among all single-sequence-based methods and is comparable to or better than another language-model-based technique on four independent test sets. Furthermore, LMDisorder showed equivalent or even better performance than the state-of-the-art profile-based technique SPOT-Disorder2. In addition, the high computational efficiency of LMDisorder enabled proteome-scale analysis of the human proteome, showing that proteins with high predicted disorder content were associated with specific biological functions. The datasets, the source codes, and the trained model are available at https://github.com/biomed-AI/LMDisorder.
- Research Article
39
- 10.1038/s41598-022-11684-w
- May 9, 2022
- Scientific Reports
- Jaspreet Singh + 4 more
Protein language models have emerged as an alternative to multiple sequence alignment for enriching sequence information and improving downstream prediction tasks such as biophysical, structural, and functional properties. Here we show that a method called SPOT-1D-LM combines traditional one-hot encoding with the embeddings from two different language models (ProtTrans and ESM-1b) for the input and yields a leap in accuracy over single-sequence-based techniques in predicting protein 1D secondary and tertiary structural properties, including backbone torsion angles, solvent accessibility and contact numbers for all six test sets (TEST2018, TEST2020, Neff1-2020, CASP12-FM, CASP13-FM and CASP14-FM). More significantly, it has a performance comparable to profile-based methods for those proteins with homologous sequences. For example, the accuracy for three-state secondary structure (SS3) prediction for TEST2018 and TEST2020 proteins are 86.7% and 79.8% by SPOT-1D-LM, compared to 74.3% and 73.4% by the single-sequence-based method SPOT-1D-Single and 86.2% and 80.5% by the profile-based method SPOT-1D, respectively. For proteins without homologous sequences (Neff1-2020) SS3 is 80.41% by SPOT-1D-LM which is 3.8% and 8.3% higher than SPOT-1D-Single and SPOT-1D, respectively. SPOT-1D-LM is expected to be useful for genome-wide analysis given its fast performance. Moreover, high-accuracy prediction of both secondary and tertiary structural properties such as backbone angles and solvent accessibility without sequence alignment suggests that highly accurate prediction of protein structures may be made without homologous sequences, the remaining obstacle in the post AlphaFold2 era.
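The input construction described above, one-hot encoding concatenated with per-residue language-model embeddings, can be sketched as follows. This is a minimal illustration and not the SPOT-1D-LM code: the helper names are hypothetical, and the tiny embedding vectors are placeholders standing in for real ProtTrans and ESM-1b outputs, which are much wider.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Return one 20-dimensional indicator vector per residue."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in sequence:
        row = [0.0] * len(AMINO_ACIDS)
        if residue in index:  # non-standard residues stay all-zero
            row[index[residue]] = 1.0
        rows.append(row)
    return rows


def concat_features(sequence, *embeddings):
    """Append each per-residue embedding to the one-hot vector of the same residue."""
    features = one_hot(sequence)
    for emb in embeddings:
        if len(emb) != len(sequence):
            raise ValueError("each embedding needs one vector per residue")
        features = [f + list(e) for f, e in zip(features, emb)]
    return features


seq = "MKT"
emb_a = [[0.1, 0.2]] * len(seq)       # placeholder for one language model's output
emb_b = [[0.3, 0.4, 0.5]] * len(seq)  # placeholder for a second model's output
feats = concat_features(seq, emb_a, emb_b)
print(len(feats), len(feats[0]))  # 3 25
```

No alignment step appears anywhere in this pipeline, which is the property the abstract highlights.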
- Research Article
10
- 10.1186/s12859-021-04316-z
- Aug 12, 2021
- BMC Bioinformatics
- Alejandro A Schäffer + 7 more
Background: The DNA sequences encoding ribosomal RNA genes (rRNAs) are commonly used as markers to identify species, including in metagenomics samples that may combine many organismal communities. The 16S small subunit ribosomal RNA (SSU rRNA) gene is typically used to identify bacterial and archaeal species. The nuclear 18S SSU rRNA gene, and 28S large subunit (LSU) rRNA gene have been used as DNA barcodes and for phylogenetic studies in different eukaryote taxonomic groups. Because of their popularity, the National Center for Biotechnology Information (NCBI) receives a disproportionate number of rRNA sequence submissions and BLAST queries. These sequences vary in quality, length, origin (nuclear, mitochondria, plastid), and organism source and can represent any region of the ribosomal cistron. Results: To improve the timely verification of quality, origin and loci boundaries, we developed Ribovore, a software package for sequence analysis of rRNA sequences. The ribotyper and ribosensor programs are used to validate incoming sequences of bacterial and archaeal SSU rRNA. The ribodbmaker program is used to create high-quality datasets of rRNAs from different taxonomic groups. Key algorithmic steps include comparing candidate sequences against rRNA sequence profile hidden Markov models (HMMs) and covariance models of rRNA sequence and secondary-structure conservation, as well as other tests. Nine freely available blastn rRNA databases created and maintained with Ribovore are used for checking incoming GenBank submissions and used by the blastn browser interface at NCBI. Since 2018, Ribovore has been used to analyze more than 50 million prokaryotic SSU rRNA sequences submitted to GenBank, and to select at least 10,435 fungal rRNA RefSeq records from type material of 8350 taxa. Conclusion: Ribovore combines single-sequence and profile-based methods to improve GenBank processing and analysis of rRNA sequences.
It is a standalone, portable, and extensible software package for the alignment, classification and validation of rRNA sequences. Researchers planning on submitting SSU rRNA sequences to GenBank are encouraged to download and use Ribovore to analyze their sequences prior to submission to determine which sequences are likely to be automatically accepted into GenBank.
- Research Article
30
- 10.1016/j.wear.2021.203620
- Jan 19, 2021
- Wear
- J.J Ayerdi + 5 more
This work presents a critical assessment of wear volume determination methods for ball-on-flat linear reciprocating sliding tribological tests. It revealed that the ASTM D7755-11 standard leads to the highest relative errors (up to 106%) and deviations (up to 27%) depending on the regularity of the wear track shape. The present study suggests improvements for the ASTM D7755-11 wear computation, which can reduce errors from 106% to 17% when analysing irregularly shaped wear tracks. In addition, a five-year period review of two relevant tribology journals revealed that the most used methods for wear determination, namely three-dimensional (3D) profilometry (46%) and nonstandard profile-based methods (36%), are overall reported with incomplete procedural information for wear computation. Furthermore, as many as 8% of the papers specify no data regarding the computation method and only 3% explicitly cited and followed the existing standards (ASTM D7755-11 or G133-05). The present study highlights the importance of the correct selection, implementation, and reporting of wear volume computation method and quantifies the potential errors.
- Research Article
1
- 10.1504/ijcat.2020.10034808
- Jan 1, 2020
- International Journal of Computer Applications in Technology
- Mukti Routray + 1 more
Remote homology detection at the amino acid level is a complex problem in computational biology. We have used machine learning algorithms to predict the homology of un-annotated protein sequences, which can save time and cost. This work is divided into three phases. In the first phase, features are extracted from protein sequences using Principal Component Analysis (PCA) to build a chromosome set with representative features of each protein based on physicochemical properties. The second phase uses a genetic algorithm (GA) to construct a set of chromosomes for classification based on PCA and initialises the classifier to build up an error matrix. The third phase uses NSGA-II, with crossover, mutation, and tournament selection, to produce the next set of chromosomes. The output of this experiment is a set of minimum classification error values and the minimum number of features used for classification of protein families. This approach gives superior accuracy over the profile-based methods.
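NSGA-II ranks candidates by Pareto dominance, which is why the output is a set of trade-offs rather than a single best solution. The sketch below shows only that dominance test for the two objectives the abstract names (classification error and number of features, both minimised). It is not the authors' implementation, the candidate values are invented for illustration, and the rest of NSGA-II (crowding distance, crossover, mutation, tournament selection) is omitted.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    Each candidate is an (error, n_features) tuple; lower is better for both.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical (classification error, number of features) pairs.
candidates = [(0.10, 40), (0.12, 25), (0.10, 30), (0.20, 25), (0.15, 10)]
for error, n_features in pareto_front(candidates):
    print(error, n_features)
```

With these made-up values, (0.10, 40) is dropped because (0.10, 30) matches its error with fewer features, and (0.20, 25) is dropped because (0.12, 25) has lower error with the same feature count.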
- Research Article
4
- 10.1504/ijcat.2020.112688
- Jan 1, 2020
- International Journal of Computer Applications in Technology
- Mukti Routray + 1 more
Remote homology detection at the amino acid level is a complex problem in computational biology. We have used machine learning algorithms to predict the homology of un-annotated protein sequences, which can save time and cost. This work is divided into three phases. In the first phase, features are extracted from protein sequences using Principal Component Analysis (PCA) to build a chromosome set with representative features of each protein based on physicochemical properties. The second phase uses a genetic algorithm (GA) to construct a set of chromosomes for classification based on PCA and initialises the classifier to build up an error matrix. The third phase uses NSGA-II, with crossover, mutation, and tournament selection, to produce the next set of chromosomes. The output of this experiment is a set of minimum classification error values and the minimum number of features used for classification of protein families. This approach gives superior accuracy over the profile-based methods.
- Research Article
47
- 10.1186/s40168-018-0532-2
- Aug 28, 2018
- Microbiome
- Ari Ugarte + 3 more
Background: Biochemical and regulatory pathways have until recently been thought and modelled within one cell type, one organism and one species. This vision is being dramatically changed by the advent of whole microbiome sequencing studies, revealing the role of symbiotic microbial populations in fundamental biochemical functions. The new landscape we face requires the reconstruction of biochemical and regulatory pathways at the community level in a given environment. In order to understand how environmental factors affect the genetic material and the dynamics of the expression from one environment to another, we want to evaluate the quantity of gene protein sequences or transcripts associated to a given pathway by precisely estimating the abundance of protein domains, their weak presence or absence in environmental samples. Results: MetaCLADE is a novel profile-based domain annotation pipeline based on a multi-source domain annotation strategy. It applies directly to reads and improves identification of the catalog of functions in microbiomes. MetaCLADE is applied to simulated data and to more than ten metagenomic and metatranscriptomic datasets from different environments where it outperforms InterProScan in the number of annotated domains. It is compared to the state-of-the-art non-profile-based and profile-based methods, UProC and HMM-GRASPx, showing complementary predictions to UProC. A combination of MetaCLADE and UProC improves even further the functional annotation of environmental samples. Conclusions: Learning about the functional activity of environmental microbial communities is a crucial step to understand microbial interactions and large-scale environmental impact. MetaCLADE has been explicitly designed for metagenomic and metatranscriptomic data and allows for the discovery of patterns in divergent sequences, thanks to its multi-source strategy.
MetaCLADE highly improves current domain annotation methods and reaches a fine degree of accuracy in annotation of very different environments such as soil and marine ecosystems, ancient metagenomes and human tissues.
- Research Article
47
- 10.1111/jora.12391
- Aug 18, 2018
- Journal of Research on Adolescence
- Lisa D Pearce + 3 more
The structure of adolescents' families, and thus parental forms, in the United States, have become more heterogeneous and fluid over the past several decades. These changes are due to increases in never-married, single parents, divorce, cohabitation, same-sex parenting, multi-partnered fertility, and co-residence with grandparents. We document current diversity and complexity in adolescents' families as important context for rethinking future parenting theory and research. We also discuss how understandings of adolescents' families are somewhat limited by current methods used to measure characteristics of families. We recommend social network and profile-based methods as alternatives to capturing key dimensions of family structure and processes. Understanding the diversity of households and families in which adolescents are raised can improve theory and research on parenting.
- Research Article
3
- 10.1109/taes.2017.2756741
- Feb 1, 2018
- IEEE Transactions on Aerospace and Electronic Systems
- Taihe Yi + 4 more
Boost-phase trajectory inference is one of the major objectives of the space-borne missile early-warning system. Traditional methods can be classified as either profile-based or profile-free methods. The profile-based methods are accurate but do not adapt to different missile types, whereas the profile-free methods are adaptable but inaccurate. To integrate the strengths of the profile-based and profile-free methods, a multimodel trajectory inference approach is proposed. First, a general net acceleration model (GNAM) containing only type-free prior information is constructed by the method of sieves. Then, a new kind of net acceleration profile is proposed by incorporating type-dependent prior information into the GNAM. After that, the multimodel approach is proposed following the Bayesian framework. Simulations indicate that the approach is accurate in estimation and capable of type identification.
- Research Article
21
- 10.1109/tcbb.2017.2765331
- Oct 23, 2017
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
- Bin Liu + 3 more
Protein remote homology detection and fold recognition are two critical tasks for the studies of protein structures and functions. Currently, the profile-based methods achieve the state-of-the-art performance in these fields. However, the widely used sequence profiles, like position-specific frequency matrix (PSFM) and position-specific scoring matrix (PSSM), ignore the sequence-order effects along protein sequence. In this study, we have proposed a novel profile, called sequence-order frequency matrix (SOFM), to extract the sequence-order information of neighboring residues from multiple sequence alignment (MSA). Combined with two profile feature extraction approaches, top-n-grams and the Smith-Waterman algorithm, the SOFMs are applied to protein remote homology detection and fold recognition, and two predictors called SOFM-Top and SOFM-SW are proposed. Experimental results show that SOFM contains more information content than other profiles, and these two predictors outperform other state-of-the-art methods. It is anticipated that SOFM will become a very useful profile in the studies of protein structures and functions.
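To make the limitation concrete, the sketch below builds a position-specific frequency matrix (PSFM) from a toy multiple sequence alignment. Each column is counted independently, so nothing about which residues neighbour each other is retained; that is the sequence-order information the abstract says SOFM adds. The SOFM construction itself is not specified in the abstract and is not attempted here. The function name and the toy alignment are illustrative assumptions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def psfm(msa):
    """Position-specific frequency matrix: one residue-frequency dict per column."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    matrix = []
    for col in range(length):
        # Gaps and non-standard symbols are skipped when counting a column.
        counts = Counter(row[col] for row in msa if row[col] in AMINO_ACIDS)
        total = sum(counts.values())
        matrix.append(
            {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
        )
    return matrix


# Toy alignment; "-" marks a gap.
msa = ["MKTA", "MRTA", "MK-A", "LKTG"]
profile = psfm(msa)
print(profile[0]["M"])  # 0.75
print(profile[1]["K"])  # 0.75
print(profile[2]["T"])  # 1.0
```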
- Research Article
- 10.25100/iyc.v19i2.5289
- Sep 15, 2017
- INGENIERÍA Y COMPETITIVIDAD
- Oscar F Bedoya
This article presents a new method for remote homology detection, called remote-3DD, that combines predicted contact maps with a distribution of the values in the interaction matrices. Predicted contact maps are an approximation of the 3D shape of a protein that can be obtained from its primary structure. An interaction matrix, in turn, represents a protein in terms of the physicochemical properties of its constituent amino acids. Remote-3DD is proposed as a strategy for improving the accuracy of the remote-C3D method, which uses only contact maps. The hypothesis put forward in this article is that the accuracy of remote-C3D can be improved by incorporating the interaction matrix distributions. Test results show that remote-3DD achieves higher accuracy than composition-based methods and, in some cases, accuracy comparable to profile-based methods. In addition, the tests show that remote-3DD generally achieves higher accuracy than remote-C3D when the same number of models and the same submatrix sizes are used.
- Research Article
1
- 10.11113/jt.v78.9717
- Sep 28, 2016
- Jurnal Teknologi
- Dini Handayani + 2 more
The ability to identify a subject is indispensable in affective computing research due to its wide range of applications. User profiling was created based on the strength of the subject's emotional patterns, which can be used for subject identification. Such a system is built on the emotional states of happiness and sadness, as indicated by electroencephalogram (EEG) data. In this paper, we examine several techniques used for subject profiling or identification purposes. Those techniques include feature extraction and classification techniques. In the experimental study, we compare three techniques for feature extraction, namely Power Spectral Density (PSD), Kernel Density Estimation (KDE), and Mel Frequency Cepstral Coefficients (MFCC). For classification, we compare three techniques: Multilayer Perceptron (MLP), Naive Bayesian (NB), and Support Vector Machine (SVM). The best result achieved was 59.66%, using the MFCC and MLP-based techniques with 5-fold cross validation. The experimental results indicated that these profiles could be more accurate in identifying subjects compared to NB and SVM. The comparisons demonstrated that profile-based methods for subject identification provide a viable and simple alternative to this problem.
- Research Article
21
- 10.1016/j.foreco.2016.09.006
- Sep 15, 2016
- Forest Ecology and Management
- Jan Paul Krüger + 4 more
Calculating carbon changes in peat soils drained for forestry with four different profile-based methods
- Research Article
6
- 10.1016/j.suscom.2016.04.003
- May 13, 2016
- Sustainable Computing: Informatics and Systems
- Teng Lu + 2 more
A dynamic, compiler guided DVFS mechanism to achieve energy-efficiency in multi-core processors
- Research Article
62
- 10.1007/s00438-015-1044-4
- Apr 21, 2015
- Molecular Genetics and Genomics : MGG
- Bin Liu + 2 more
Protein remote homology detection is one of the important tasks in computational proteomics, which is important for basic research and practical application. Currently, the SVM-based discriminative methods have shown superior performance. However, the existing feature vectors still cannot suitably represent the protein sequences, and often lack an interpretable model for analysis of characteristic features. Previous studies showed that sequence-order effects and physicochemical properties are important for representing protein sequences. However, how to use these kinds of information for constructing predictors is still a challenging problem. In this study, in order to incorporate the sequence-order information and physicochemical properties into the prediction, a method called disPseAAC is proposed, in which the feature vector is constructed by combining the occurrences of amino acid pairs within Chou's pseudo amino acid composition (PseAAC) approach. The predictive performance and computational cost are further improved by employing the principal component analysis strategy. Various experiments are conducted on a benchmark dataset. Experimental results show that disPseAAC achieves an ROC score of 0.922, outperforming some existing state-of-the-art methods. Furthermore, the learnt model can easily be analyzed in terms of discriminative features, and the computational cost of the proposed method is much lower than that of other profile-based methods.
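One ingredient named above, counting occurrences of amino acid pairs, can be sketched directly from a single sequence. The version below counts ordered residue pairs separated by a fixed distance and normalises by the number of pairs at that distance. It is an assumption-laden illustration rather than the disPseAAC formulation: the exact pair definition, the PseAAC physicochemical terms, and the PCA step are not given in the abstract and are not reproduced, and the function names are hypothetical.

```python
from collections import Counter


def pair_counts(sequence, distance):
    """Count ordered residue pairs (sequence[i], sequence[i + distance])."""
    return Counter(
        (sequence[i], sequence[i + distance])
        for i in range(len(sequence) - distance)
    )


def pair_features(sequence, max_distance):
    """Frequencies of residue pairs for every distance from 1 to max_distance."""
    features = {}
    for d in range(1, max_distance + 1):
        n_pairs = len(sequence) - d  # pairs available at this distance
        for (a, b), count in pair_counts(sequence, d).items():
            features[(a, b, d)] = count / n_pairs
    return features


feats = pair_features("ACAC", max_distance=2)
print(feats[("A", "A", 2)])  # 0.5
print(feats[("C", "C", 2)])  # 0.5
```

Because the features come from the sequence alone, no alignment or database search is needed, which is consistent with the lower computational cost the abstract reports relative to profile-based methods.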
- Research Article
6
- 10.1016/j.compbiomed.2015.01.020
- Jan 31, 2015
- Computers in Biology and Medicine
- Oscar Bedoya + 1 more
Reducing dimensionality in remote homology detection using predicted contact maps
- Research Article
84
- 10.1093/bioinformatics/btu843
- Dec 23, 2014
- Bioinformatics
- Peter Meinicke
Motivation: With rapidly increasing volumes of biological sequence data the functional analysis of new sequences in terms of similarities to known protein families challenges classical bioinformatics. Results: The ultrafast protein classification (UProC) toolbox implements a novel algorithm (‘Mosaic Matching’) for large-scale sequence analysis. UProC is three orders of magnitude faster than profile-based methods and in a metagenome simulation study achieved up to 80% higher sensitivity on unassembled 100 bp reads. Availability and implementation: UProC is available as open-source software at https://github.com/gobics/uproc. Precompiled databases (Pfam) are linked on the UProC homepage: http://uproc.gobics.de/. Contact: peter@gobics.de. Supplementary information: Supplementary data are available at Bioinformatics online.