MantaID: a machine learning-based tool to automate the identification of biological database IDs.

Zhengpeng Zeng, Feng Yu, Miyuan Cao, Longfei Mao, Xiting Wang, Bingbing Li, Jiamin Hu

doi:10.1093/database/baad028

Abstract

The number of biological databases is growing rapidly, but different databases use different identifiers (IDs) to refer to the same biological entity. This inconsistency impedes the integration of different types of biological data. To address the problem, we developed MantaID, a data-driven, machine learning-based approach that automates the identification of IDs on a large scale. The MantaID model achieved a prediction accuracy of 99% and correctly and efficiently predicted 100,000 ID entries within 2 min. MantaID supports the discovery and exploitation of IDs from a large number of databases (e.g. up to 542 biological databases). A freely available, open-source R package, a user-friendly web application and application programming interfaces were also developed for MantaID to improve its applicability. To our knowledge, MantaID is the first tool that enables automatic, quick, accurate and comprehensive identification of large quantities of IDs, and it can therefore serve as a starting point to facilitate the complex assimilation and aggregation of biological data across diverse databases.
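To make the idea of learning a database label from the ID string itself concrete, the following is a minimal sketch and not MantaID's actual model. It assumes each ID can be encoded position by position as letter, digit or symbol classes, and it uses a plain nearest-neighbour lookup in place of a trained classifier. The function names, the database labels and the tiny training set are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: NOT the MantaID implementation.
# Assumption: an ID is represented by the character class at each position.

MAX_LEN = 16  # hypothetical fixed feature length; longer IDs are truncated


def encode(id_string):
    """Map an ID to a fixed-length tuple of character classes.

    L = letter, D = digit, S = any other symbol, P = padding.
    """
    classes = []
    for ch in id_string[:MAX_LEN]:
        if ch.isalpha():
            classes.append("L")
        elif ch.isdigit():
            classes.append("D")
        else:
            classes.append("S")
    classes.extend("P" * (MAX_LEN - len(classes)))
    return tuple(classes)


def hamming(a, b):
    """Count the positions at which two encoded IDs differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical labelled examples (ID string, database label).
TRAINING = [
    ("ENSG00000139618", "ensembl_gene"),
    ("ENSG00000012048", "ensembl_gene"),
    ("P38398", "uniprot"),
    ("P51587", "uniprot"),
    ("GO:0008150", "go"),
    ("GO:0003674", "go"),
    ("NM_000059", "refseq_mrna"),
    ("NM_007294", "refseq_mrna"),
]


def predict(id_string, training=TRAINING):
    """Return the label of the training ID closest to the query (1-NN)."""
    query = encode(id_string)
    nearest = min(training, key=lambda pair: hamming(query, encode(pair[0])))
    return nearest[1]


if __name__ == "__main__":
    for query in ["ENSG00000141510", "P04637", "GO:0006915", "NM_000546"]:
        print(query, "->", predict(query))
```

A real system would need far more training data and a proper learning algorithm; the sketch only shows that ID strings carry enough positional structure for a classifier to exploit.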

Journal: Database	Publication Date: May 9, 2023
Citations: 1	License type: CC BY 4.0
