We have developed the Transcription Regulatory Regions Database (TRRD, http://www.bionet.nsc.ru/trrd/) to store data on the structural and functional organization of the transcriptional regulatory regions of eukaryotic genes and on the genes' expression patterns. TRRD contains only experimentally supported data. Knowledge extraction from the scientific literature, storage, and further application of the results proceed stepwise, following the Data Mining technology: (i) knowledge extraction from scientific publications; (ii) preprocessing (data cleaning, syntactic and semantic analysis); (iii) data transformation; (iv) application for prediction; and (v) interpretation of the obtained knowledge to address current issues in bioinformatics. TRRD compiles data on 2 344 genes, their 14 407 expression patterns, 3 490 regulatory units, and 10 135 transcription factor binding sites. TRRD is populated by manual annotation of scientific publications; the information it contains is the result of annotating 7 609 scientific papers. The Sequence Retrieval System (SRS) is the main tool for searching and navigating TRRD. The large number of indexed fields in the SRS version of TRRD allows users to build queries both within and between libraries. TRRD also provides thesauri and search systems that offer additional options for data access. TRRD is currently linked to 20 information resources worldwide, including EMBL/GenBank, Ensembl, EPD, SWISS-PROT, TRANSFAC, GDB, GeneCards, MGD, RGD, and GO. These links serve as a framework for integration in a distributed network environment.
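To make the kinds of entities TRRD compiles more concrete (genes, their expression patterns, regulatory units, and transcription factor binding sites, with links to external databases), here is a minimal Python sketch. It assumes a simple nesting of gene, then regulatory unit, then binding site. All class names, field names, the position convention, and the toy records are hypothetical illustrations, not TRRD's actual schema or data. The `genes_bound_by` function only mimics the idea of a query that crosses record types. It does not reproduce SRS or its query language.

```python
from dataclasses import dataclass, field

# Hypothetical record model (an assumption, not TRRD's real schema):
# a gene holds regulatory units, and each unit holds binding sites.

@dataclass
class BindingSite:
    site_id: str
    factor: str   # transcription factor reported to bind the site
    start: int    # position relative to the transcription start (assumed convention)
    end: int

@dataclass
class RegulatoryUnit:
    unit_id: str
    kind: str     # e.g. "promoter", "enhancer", "silencer"
    sites: list[BindingSite] = field(default_factory=list)

@dataclass
class Gene:
    gene_id: str
    symbol: str
    expression_patterns: list[str] = field(default_factory=list)
    units: list[RegulatoryUnit] = field(default_factory=list)
    xrefs: dict[str, str] = field(default_factory=dict)  # external database name -> accession

def genes_bound_by(genes: list[Gene], factor: str) -> list[str]:
    """Query crossing record types: symbols of genes with at least one site for `factor`."""
    return [g.symbol for g in genes
            if any(s.factor == factor for u in g.units for s in u.sites)]

def summarize(genes: list[Gene]) -> tuple[int, int, int, int]:
    """Count genes, expression patterns, regulatory units, and binding sites."""
    n_patterns = sum(len(g.expression_patterns) for g in genes)
    n_units = sum(len(g.units) for g in genes)
    n_sites = sum(len(u.sites) for g in genes for u in g.units)
    return len(genes), n_patterns, n_units, n_sites

# Toy records invented for illustration only; they are not TRRD entries.
toy = [
    Gene("G1", "geneA", ["liver"],
         [RegulatoryUnit("U1", "promoter", [BindingSite("S1", "factorX", -80, -70)])]),
    Gene("G2", "geneB", ["brain"],
         [RegulatoryUnit("U2", "enhancer", [BindingSite("S2", "factorY", -900, -885)])]),
]

print(genes_bound_by(toy, "factorX"))  # ['geneA']
print(summarize(toy))                  # (2, 2, 2, 2)
```

Applied to the whole collection, a `summarize`-style count corresponds to the totals reported above (genes, expression patterns, regulatory units, binding sites). The `xrefs` dictionary stands in, very loosely, for the links to external resources such as EMBL/GenBank or SWISS-PROT.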