Stand-alone Software Package Research Articles

A new, open source, parallel, stand-alone software package (Fortnet) has been developed, which implements Behler-Parrinello neural networks. It covers the entire workflow from feature generation to the evaluation of generated potentials, coupled with higher-level analysis such as the analytic calculation of atomic forces. The functionality of the software package is demonstrated by driving the training for the fitted correction functions of the density functional tight binding (DFTB) method, which are commonly used to compensate for the inaccuracies resulting from the DFTB approximations to the Kohn-Sham Hamiltonian. The usual two-body form of those correction functions limits the transferability of the parametrizations between very different structural environments. The recently introduced DFTB+ANN approach strives to lift these limitations by combining DFTB with a near-sighted artificial neural network (ANN). After investigating various approaches, we have found the combination of DFTB with an ANN acting on top of some baseline correction functions (delta learning) to be the most accurate one. It allowed many-body corrections to be introduced on top of two-body parametrizations, while excellent transferability to chemical environments with deviating energetics could be demonstrated.
Program summary
Program title: Fortnet
CPC Library link to program files: https://doi.org/10.17632/sjg3n9vr8p.1
Developer's repository link: https://github.com/vanderhe/fortnet
Code Ocean capsule: https://codeocean.com/capsule/3992747
Licensing provisions: LGPL
Programming language: Fortran, Python
External routines/libraries: MPI, BLAS/LAPACK, HDF5, DFTB+
Supplementary material: See supplementary material for exemplary Human-friendly Structured Data (HSD) input listings, as well as the basic usage of the Fortformat Python layer for generating datasets and extracting results.
Nature of problem: Semi-empirical quantum mechanical methods like density functional tight binding (DFTB) rely on fitting empirical energy correction terms, often represented by two-body potentials, to ab initio references. In doing so, empirical, beyond-pairwise contributions are inevitably incorporated and are therefore inadequately covered by a purely two-body description.
Solution method: The new, open source, parallel, stand-alone software package Fortnet provides a powerful, yet accessible tool to construct many-body correction terms by resorting to high-dimensional neural networks of Behler-Parrinello type. Fortnet is characterized by its modern infrastructure, complementing the landscape of available implementations by a robust combination of Fortran and Python based code.
Additional comments including restrictions and unusual features: Fortnet's core is supplemented by two additional projects that are BSD 2-clause licensed, namely fortnet-python [2], a collection of Python based tools for generating compatible datasets and extracting results, and fortnet-ase [3], an interface to the Atomic Simulation Environment (ASE) [1]. Both projects are available via the Python Package Index (PyPI). The interaction of all components is explained in cookbook-like recipes (see: https://fortnet.readthedocs.io/en/latest/), meant to guide new users through various basic and more advanced features by means of comprehensible, physically motivated examples.
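
The delta-learning idea from the abstract above (a two-body baseline plus a many-body correction built from per-atom network outputs) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from Fortnet or DFTB+: the function names, the exponential pair baseline, the Gaussian radial symmetry functions with cosine cutoff, the single shared network for all atoms (Behler-Parrinello networks normally use one network per element), and all parameter values.

```python
import math


def cutoff(r, r_c):
    """Cosine cutoff: 1 at r = 0, decaying smoothly to 0 for r >= r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)


def radial_features(positions, i, etas, r_c):
    """Radial symmetry functions of atom i, one value per Gaussian width eta."""
    features = []
    for eta in etas:
        g = 0.0
        for j, pos_j in enumerate(positions):
            if j != i:
                r = math.dist(positions[i], pos_j)
                g += math.exp(-eta * r * r) * cutoff(r, r_c)
        features.append(g)
    return features


def atomic_network(features, w_hidden, b_hidden, w_out, b_out):
    """Tiny feed-forward network: one tanh hidden layer, linear output."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def baseline_pair_energy(positions, a=1.0, b=2.0):
    """Hypothetical two-body baseline: sum of a * exp(-b * r) over all pairs."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            energy += a * math.exp(-b * math.dist(positions[i], positions[j]))
    return energy


def delta_learning_energy(positions, params, etas, r_c):
    """Baseline energy plus the sum of per-atom network corrections."""
    correction = sum(
        atomic_network(radial_features(positions, i, etas, r_c), *params)
        for i in range(len(positions))
    )
    return baseline_pair_energy(positions) + correction


# Illustrative three-atom geometry and arbitrary (untrained) network weights.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
params = ([[0.3, -0.2], [0.1, 0.4]], [0.0, 0.1], [0.5, -0.7], 0.05)
print(delta_learning_energy(positions, params, etas=[0.5, 1.0], r_c=4.0))
```

Because the correction is a sum of atomic contributions that depend only on neighbors inside the cutoff radius, the sketch is near-sighted and invariant under reordering of the atoms; with all network weights and biases set to zero it reduces to the two-body baseline.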


Background: The DNA sequences encoding ribosomal RNA genes (rRNAs) are commonly used as markers to identify species, including in metagenomics samples that may combine many organismal communities. The 16S small subunit ribosomal RNA (SSU rRNA) gene is typically used to identify bacterial and archaeal species. The nuclear 18S SSU rRNA gene, and 28S large subunit (LSU) rRNA gene have been used as DNA barcodes and for phylogenetic studies in different eukaryote taxonomic groups. Because of their popularity, the National Center for Biotechnology Information (NCBI) receives a disproportionate number of rRNA sequence submissions and BLAST queries. These sequences vary in quality, length, origin (nuclear, mitochondria, plastid), and organism source and can represent any region of the ribosomal cistron.

Results: To improve the timely verification of quality, origin and loci boundaries, we developed Ribovore, a software package for sequence analysis of rRNA sequences. The ribotyper and ribosensor programs are used to validate incoming sequences of bacterial and archaeal SSU rRNA. The ribodbmaker program is used to create high-quality datasets of rRNAs from different taxonomic groups. Key algorithmic steps include comparing candidate sequences against rRNA sequence profile hidden Markov models (HMMs) and covariance models of rRNA sequence and secondary-structure conservation, as well as other tests. Nine freely available blastn rRNA databases created and maintained with Ribovore are used for checking incoming GenBank submissions and used by the blastn browser interface at NCBI. Since 2018, Ribovore has been used to analyze more than 50 million prokaryotic SSU rRNA sequences submitted to GenBank, and to select at least 10,435 fungal rRNA RefSeq records from type material of 8350 taxa.

Conclusion: Ribovore combines single-sequence and profile-based methods to improve GenBank processing and analysis of rRNA sequences. It is a standalone, portable, and extensible software package for the alignment, classification and validation of rRNA sequences. Researchers planning on submitting SSU rRNA sequences to GenBank are encouraged to download and use Ribovore to analyze their sequences prior to submission to determine which sequences are likely to be automatically accepted into GenBank.



Articles published on Stand-alone Software Package

DataPype: A Fully Automated Unified Software Platform for Computer-Aided Drug Design.

ProsperousPlus: a one-stop and comprehensive platform for accurate protease-specific substrate cleavage prediction and machine-learning model construction.

SHEPHARD: a modular and extensible software architecture for analyzing and annotating large protein datasets.

The Analytical Flory Random Coil Is a Simple-to-Use Reference Model for Unfolded and Disordered Proteins.

An ensemble method for prediction of phage-based therapy against bacterial infections.

EPViz: A flexible and lightweight visualizer to facilitate predictive modeling for multi-channel EEG.

Prediction of celiac disease associated epitopes and motifs in a protein.

Pdif-mediated antibiotic resistance genes transfer in bacteria identified by pdifFinder.

Fortnet, a software package for training Behler-Parrinello neural networks

HemI 2.0: an online service for heatmap illustration.

BOVIDS: A deep learning-based software package for pose estimation to evaluate nightly behavior and its application to common elands (Tragelaphus oryx) in zoos.

Automated modeling of protein accumulation at DNA damage sites using qFADD.py.

Bayes factor testing of equality and order constraints on measures of association in social research

MechAnalyze: An Algorithm for Standardization and Automation of Compression Test Analysis.

Minhee Analysis Package: an integrated software package for detection and management of spontaneous synaptic events

DLpTCR: an ensemble deep learning framework for predicting immunogenic peptide recognized by T cell receptor.

Ribovore: ribosomal RNA sequence analysis for GenBank submissions and database curation

Yet Another Quick Assembly, Analysis and Trimming Tool (YAQAAT): A Server for the Automated Assembly and Analysis of Sanger Sequencing Data.


AScan: A Novel Method for the Study of Allele Specific Expression in Single Individuals
