Large-scale Prediction Research Articles

Over half a million Holsteins are genotyped annually in the United States. The computational cost of including all genotypes in single-step genomic (ssG)BLUP is high, although large-scale genomic prediction is feasible with an efficient algorithm such as APY (algorithm for proven and young). An effective way to further reduce the computing cost could be to use indirect genomic predictions (IGP) for genotyped animals that have neither progeny nor phenotypes. These young genotyped animals have no effect on the other genotyped animals, so their genomic predictions can be obtained indirectly. The main objective of this study was to calculate IGP for various groups of genotyped animals and to investigate the reduction in computing time as well as the bias and accuracy of the IGP. We compared IGP with genomic (G)EBV for 18 linear type traits in US Holsteins. The full data set consisted of 10.9 million (M) records for the 18 linear type traits up to 2018 calving, 13.6M animals in the pedigree, and 2.3M animals genotyped for 79K SNP. For IGP, ssGBLUP included all genotyped animals except those with neither progeny nor phenotypes, grouped by year from 2014 to 2018 (i.e., the target animals). The SNP marker effects were computed from the GEBV of genotyped animals that had progeny, phenotypes, or both. IGP were then calculated for the target genotyped animals in each year group. Across all genotyped animal groups from 2014 to 2018, the coefficients of determination (R²) of a linear regression of GEBV on IGP averaged 0.960 for males and 0.954 for females over the 18 traits. To reduce computing costs, the SNP marker effects were also calculated from the GEBV of 15K to 60K randomly selected genotyped animals. Randomly selecting a small number of genotyped animals reduced the computing time dramatically. As more genotyped animals were randomly selected to calculate SNP effects, R² was higher (more accurate) and the regression coefficient was lower (more inflated IGP). In a practical genomic evaluation in US Holsteins, to get sufficient contributions from GEBV, 25K to 35K is a rational number of genotyped animals to select randomly for computing SNP effects and obtaining accurate and unbiased IGP. Considering the computing time and both the unbiasedness and accuracy of IGP, genomic evaluation can be conducted separately: GEBV for genotyped animals with phenotypes or progeny, and IGP for young genotyped animals. This can be a practical solution for large-scale genomic evaluation and would enable more frequent evaluation at lower cost, especially when many genotyped animals have neither phenotypes nor progeny.
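As a rough illustration of the indirect-prediction step in the abstract above, the following sketch computes an IGP as the sum of centered genotype codes times SNP effects, then regresses GEBV on IGP to obtain the slope and R². All numbers are made-up toy values, and the function names `indirect_prediction` and `regress` are hypothetical. The sketch assumes the SNP effects are already available; deriving them from ssGBLUP GEBV, as the study does, is not reproduced here.

```python
from statistics import mean


def indirect_prediction(genotypes, allele_freqs, snp_effects):
    """IGP for one animal: sum over SNP of (genotype code - 2p) * SNP effect.

    Genotype codes are 0/1/2 copies of the counted allele; p is its frequency.
    """
    return sum(
        (g - 2.0 * p) * a
        for g, p, a in zip(genotypes, allele_freqs, snp_effects)
    )


def regress(y, x):
    """Simple linear regression of y on x; returns (intercept, slope, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Toy inputs (assumed for illustration only): 3 SNP, 4 young animals.
allele_freqs = [0.5, 0.25, 0.5]
snp_effects = [0.5, -1.0, 0.25]
young_genotypes = [
    [2, 0, 1],
    [0, 1, 2],
    [1, 2, 0],
    [1, 0, 1],
]

igp = [indirect_prediction(g, allele_freqs, snp_effects) for g in young_genotypes]

# Pretend GEBV from a full evaluation for the same animals (toy values).
gebv = [1.1, -0.7, -1.6, 0.4]

intercept, slope, r_squared = regress(gebv, igp)
print("IGP:", igp)
print("slope:", round(slope, 3), "R2:", round(r_squared, 3))
```

In this kind of comparison, a slope below 1 suggests that the IGP are inflated relative to GEBV, and R² close to 1 suggests that the two predictions rank animals almost identically.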

Enhancers are important functional elements in genome sequences. Identifying enhancers is very challenging because enhancer sequences are highly diverse and their locations on genomes are flexible. The interactions between enhancers and genes are still not fully understood. To speed up studies of the regulatory roles of enhancers, computational tools for enhancer prediction have emerged in recent years. In particular, thanks to the ENCODE project and advances in high-throughput experimental techniques, a large number of experimentally verified enhancers have been annotated on the human genome, which allows large-scale prediction of unknown enhancers using data-driven methods. However, except for human and some model organisms, validated enhancer annotations are scarce for most species, which makes computational identification of enhancers in their genomes more difficult. In this study, we propose a deep learning-based predictor for enhancers, named CrepHAN, which features a hierarchical attention neural network and word embedding-based representations for DNA sequences. We use experimentally supported data from the human genome to train the model, and perform experiments on human and other mammals, including mouse, cow and dog. The experimental results show that CrepHAN has an advantage in cross-species prediction and outperforms existing models by a large margin. In particular, for human-mouse cross-predictions, the area under the receiver operating characteristic curve (AUC) is increased by 0.033 to 0.145 on the combined tissue dataset and by 0.032 to 0.109 on tissue-specific datasets. bcmi.sjtu.edu.cn/∼yangyang/CrepHAN.html. Supplementary data are available at Bioinformatics online.
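The abstract above does not say how DNA sequences are turned into "words" for embedding. A common choice is overlapping k-mers, which the sketch below assumes. It splits a sequence into k-mer words, groups the words into fixed-length "sentences" of the kind a hierarchical (word-level, then sentence-level) model could consume, and maps each word to an integer id for an embedding lookup. The function names, `k=3`, and `sentence_len=4` are hypothetical choices for illustration, not CrepHAN's actual settings.

```python
def kmer_words(seq, k=3, stride=1):
    """Split a DNA sequence into overlapping k-mer 'words'."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def to_sentences(words, sentence_len=4):
    """Group consecutive words into fixed-length 'sentences' (last may be shorter)."""
    return [words[i:i + sentence_len] for i in range(0, len(words), sentence_len)]


def build_vocab(words):
    """Assign each distinct word an integer id in order of first appearance."""
    vocab = {}
    for w in words:
        vocab.setdefault(w, len(vocab))
    return vocab


# Toy sequence (assumed for illustration only).
sequence = "ACGTACGTAC"
words = kmer_words(sequence, k=3)
sentences = to_sentences(words, sentence_len=4)
vocab = build_vocab(words)

# Integer ids per sentence, ready for an embedding layer in a real model.
sentence_ids = [[vocab[w] for w in s] for s in sentences]
print(words)
print(sentence_ids)
```

The two-level layout (ids within sentences, sentences within a sequence) is what lets attention be applied first over words and then over sentences; the attention network itself is beyond the scope of this sketch.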

Articles published on Large-scale Prediction

Comparing recurrent convolutional neural networks for large scale bird species classification

Reducing computational cost of large-scale genomic evaluation by using indirect genomic prediction

Functional materials exploration through evolutionary searching and large-scale crystal structure prediction

Research on Actual Road Emission Prediction Model of Heavy-Duty Diesel Vehicles Based on OBD Remote Method and Artificial Neural Network

Highly accurate protein structure prediction for the human proteome

A co-fractionation mass spectrometry-based prediction of protein complex assemblies in the developing rice aleurone-subaleurone.

Protein Interaction Network-based Deep Learning Framework for Identifying Disease-Associated Human Proteins

Large-Scale Membrane Permeability Prediction of Cyclic Peptides Crossing a Lipid Bilayer Based on Enhanced Sampling Molecular Dynamics Simulations.

A Novel Feature Extraction Model for Large-Scale Workload Prediction in Cloud Environment

The efficacy of ethnic specific blood groups genotyping for routine donor investigation and rare donor identification in Taiwan.

E-Pedigrees: a large-scale automatic family pedigree prediction application.

Large-scale radio propagation path loss measurements and predictions in the VHF and UHF bands

NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information.

Structure determination of an amorphous drug through large-scale NMR predictions

ConsRM: collection and large-scale prediction of the evolutionarily conserved RNA methylation sites, with implications for the functional epitranscriptome.

CrepHAN: cross-species prediction of enhancers by using hierarchical attention networks.

Linking functional traits and demography to model species-rich communities

What do we need to predict groundwater nitrate recovery trajectories?

Large-Scale Road Network Congestion Pattern Analysis and Prediction Using Deep Convolutional Autoencoder

Super Sites for Advancing Understanding of the Oceanic and Atmospheric Boundary Layers
