A Method of Improving the Efficiency of Mining Sub-structures in Molecular Structure Databases

Haibo Li, Yuanzhen Wang, Kevin Lü

doi:10.1007/978-3-540-73390-4_20

Abstract

One problem with current substructure mining algorithms is that, as molecular structure databases grow, the time and space costs rise to a level at which ordinary PCs are not powerful enough to perform substructure mining tasks. After examining a number of well-known molecular structure databases, we found that they contain a large number of common loop substructures, and that repeatedly mining these same substructures consumes significant system resources. In this paper, we introduce a new method that (1) treats these common loop substructures as a kind of atom structure, and (2) maintains the links between the new atom structures and the rest of each molecular structure, reorganizing the original molecular structures accordingly. We thereby avoid repeating many identical operations during the mining process and produce fewer redundant results. We tested the method on four real molecular structure databases: AID2DA'99/CA, AID2DA'99/CM, AID2DA'99 and NCI'99. The results indicate that (1) substructure mining is faster because of the reorganization, and (2) mining yields fewer patterns, with less redundant information.
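The reorganization step described in the abstract can be illustrated with a small sketch. The abstract does not give the authors' data structures or algorithm, so everything below is an assumption made for illustration: molecules are plain dictionaries of atom symbols plus a list of bonds, the ring to collapse is supplied by the caller rather than detected, and the function name `collapse_ring` and the label `"Ph"` are hypothetical.

```python
def collapse_ring(atoms, bonds, ring, label):
    """Replace the atoms in `ring` with one pseudo-atom named `label`.

    atoms: dict mapping atom id -> element symbol
    bonds: iterable of (id, id) pairs
    ring:  ids of the atoms forming the loop (assumed known in advance)
    Returns (new_atoms, new_bonds, attachments), where attachments maps
    each outside atom to the ring atoms it was originally bonded to.
    """
    ring_set = set(ring)
    new_id = max(atoms) + 1  # fresh id for the pseudo-atom
    new_atoms = {i: sym for i, sym in atoms.items() if i not in ring_set}
    new_atoms[new_id] = label

    new_bonds = set()
    attachments = {}
    for a, b in bonds:
        if a in ring_set and b in ring_set:
            continue  # bond inside the ring: absorbed by the pseudo-atom
        if a in ring_set:
            attachments.setdefault(b, []).append(a)
            a = new_id
        elif b in ring_set:
            attachments.setdefault(a, []).append(b)
            b = new_id
        new_bonds.add((min(a, b), max(a, b)))
    return new_atoms, sorted(new_bonds), attachments


# Toluene-like graph: a six-carbon ring (ids 0-5) with one extra carbon (id 6).
atoms = {0: "C", 1: "C", 2: "C", 3: "C", 4: "C", 5: "C", 6: "C"}
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
new_atoms, new_bonds, attachments = collapse_ring(atoms, bonds, range(6), "Ph")
```

In this toy case the seven-atom, seven-bond graph shrinks to two nodes and one edge, so a miner working on the reorganized graph no longer enumerates fragments of the ring itself. The `attachments` map is one possible way to keep the link between the pseudo-atom and the rest of the molecule; the paper may record this differently.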
