Abstract

Background

Integration of heterogeneous data types is a challenging problem, especially in biology, where the number of databases and data types increases rapidly. Amongst the problems that one has to face are integrity, consistency, redundancy, connectivity, expressiveness and updatability.

Description

Here we present a system (Biozon) that addresses these problems and offers biologists a new knowledge resource to navigate and explore. Biozon unifies multiple biological databases consisting of a variety of data types (such as DNA sequences, proteins, interactions and cellular pathways). It is fundamentally different from previous efforts in that it uses a single, extensive and tightly connected graph schema wrapped with a hierarchical ontology of documents and relations. Beyond warehousing existing data, Biozon computes and stores novel derived data, such as similarity relationships and functional predictions. The integration of similarity data allows propagation of knowledge through inference and fuzzy searches. Sophisticated query methods that span multiple data types were implemented, and first-of-a-kind biological ranking systems were explored and integrated.

Conclusion

The Biozon system is an extensive knowledge resource of heterogeneous biological data. Currently, it holds more than 100 million biological documents and 6.5 billion relations between them. The database is accessible through an advanced web interface that supports complex queries, "fuzzy" searches, data materialization and more, online at http://biozon.org.
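
The abstract describes documents of different types joined by relations, with derived similarity relations enabling fuzzy searches. The following is a minimal sketch of that idea, assuming a toy in-memory graph. The class `DocumentGraph`, the relation names `similar_to` and `interacts_with`, and the identifiers `P1` to `P4` are all hypothetical and are not Biozon's actual schema or API. It shows how a fuzzy query can widen an exact one by first following similarity edges:

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DocumentGraph:
    """Toy typed graph: nodes are documents, edges are named relations.

    Hypothetical structure for illustration only, not Biozon's actual schema.
    """
    doc_type: dict = field(default_factory=dict)  # doc_id -> type, e.g. "protein"
    edges: dict = field(default_factory=lambda: defaultdict(set))  # (doc_id, relation) -> targets

    def add_document(self, doc_id, doc_type):
        self.doc_type[doc_id] = doc_type

    def add_relation(self, source, relation, target, symmetric=False):
        self.edges[(source, relation)].add(target)
        if symmetric:
            self.edges[(target, relation)].add(source)

    def neighbours(self, doc_id, relation):
        # .get avoids creating empty entries in the defaultdict
        return self.edges.get((doc_id, relation), set())


def interaction_partners(graph, protein, fuzzy=False):
    """Proteins linked to `protein` by the hypothetical "interacts_with" relation.

    With fuzzy=True the query is first expanded through the derived
    "similar_to" relation, so partners of similar proteins are returned too.
    """
    sources = {protein}
    if fuzzy:
        sources |= graph.neighbours(protein, "similar_to")
    partners = set()
    for source in sources:
        partners |= graph.neighbours(source, "interacts_with")
    # Keep only documents of the requested type
    return {d for d in partners if graph.doc_type.get(d) == "protein"}


g = DocumentGraph()
for pid in ["P1", "P2", "P3", "P4"]:
    g.add_document(pid, "protein")
g.add_relation("P1", "similar_to", "P2", symmetric=True)  # derived similarity edge
g.add_relation("P2", "interacts_with", "P3", symmetric=True)
g.add_relation("P1", "interacts_with", "P4", symmetric=True)

print(sorted(interaction_partners(g, "P1")))              # ['P4']
print(sorted(interaction_partners(g, "P1", fuzzy=True)))  # ['P3', 'P4']
```

The exact query returns only the direct partner P4. The fuzzy query also reaches P3 because P1 is similar to P2, which interacts with P3. A real system would presumably score or threshold the similarity edges, which this sketch omits.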

Highlights

  • Integration of heterogeneous data types is a challenging problem, especially in biology, where the number of databases and data types increases rapidly

  • The database is accessible through an advanced web interface that supports complex queries, "fuzzy" searches, data materialization and more, online at http://biozon.org

  • During the five months in which InterPro version 8.0 was active, SwissProt advanced from version 43.5 to 45.1, with similar advances in TrEMBL, so identifiers cited by one database release may no longer match the current entries of another. In response to these conflicts, Biozon employs various methods to map identifiers to concrete objects, including retrieval of archived entries or the use of CRC keys to search for possible matches, followed by comparison of the sequence entries
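
The CRC-based identifier mapping described above can be sketched as a checksum index followed by an exact sequence comparison. This is a minimal sketch under stated assumptions. UniProt flat files record a CRC64 checksum for each sequence, but Python's standard library has no CRC64, so `zlib.crc32` stands in for whatever checksum Biozon actually uses. `SequenceIndex`, `normalise` and the `BZ:` identifiers are hypothetical, and the retrieval of archived entries is not modeled:

```python
import zlib


def normalise(seq):
    """Remove all whitespace and upper-case, so formatting differences don't matter."""
    return "".join(seq.split()).upper()


def sequence_checksum(seq):
    # Assumption: CRC32 stands in for the CRC64 used in UniProt flat files.
    return zlib.crc32(normalise(seq).encode("ascii"))


class SequenceIndex:
    """Hypothetical index from sequence checksums to stored objects."""

    def __init__(self):
        self.by_crc = {}  # checksum -> list of (object_id, normalised sequence)

    def add(self, object_id, seq):
        norm = normalise(seq)
        self.by_crc.setdefault(sequence_checksum(norm), []).append((object_id, norm))

    def resolve(self, seq):
        """Return ids of stored objects whose sequence equals `seq`.

        The CRC lookup only narrows the candidates; checksums can collide,
        so every candidate is confirmed by comparing the full sequences.
        """
        norm = normalise(seq)
        candidates = self.by_crc.get(sequence_checksum(norm), [])
        return [oid for oid, stored in candidates if stored == norm]


index = SequenceIndex()
index.add("BZ:1", "MKTAYIAKQR")
index.add("BZ:2", "MKTAYIAKQL")

# A stale cross-reference is resolved by the sequence content it carries.
print(index.resolve("mktay iakqr"))  # ['BZ:1']
print(index.resolve("MKTAYIAKQ"))    # []
```

The checksum keeps the lookup cheap, while the final string comparison guards against collisions, mirroring the "search for possible matches, followed by comparison" order in the text.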


