Abstract
Integration of the scientific literature into a biomedical research infrastructure requires processing the literature, identifying the named entities (NEs) and concepts it contains, and representing the content in a standardised way. The CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus covering four semantic groups by harmonising the annotations from automatic text-mining solutions (the Silver Standard Corpus, SSC). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). The content of the SSC has been fully integrated into an RDF triple store (4,568,678 triples) and aligned with content from the GeneAtlas (182,840 triples), UniProtKB (12,552,239 triples for human) and the lexical resource LexEBI (BioLexicon). The triple store makes it possible to query the scientific literature and bioinformatics resources at the same time for evidence of genetic causes, such as drug targets and disease involvement.
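To make the idea of querying literature annotations and a bioinformatics resource together more concrete, here is a minimal Python sketch of an in-memory triple store with basic graph-pattern matching. It is not the CALBC implementation: the identifiers (`ann:`, `sent:`, `uniprot:PROT_A`, `disease:DIS_1`), the predicates (`inSentence`, `semanticGroup`, `normalizedTo`, `involvedIn`) and the sample triples are all hypothetical placeholders. The query finds protein/disease pairs mentioned in the same sentence and checks whether the aligned resource also asserts a link between them.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical triples: literature annotations plus one aligned resource statement.
STORE: Set[Triple] = {
    ("ann:1", "inSentence", "sent:A"),
    ("ann:1", "semanticGroup", "PRGE"),
    ("ann:1", "normalizedTo", "uniprot:PROT_A"),
    ("ann:2", "inSentence", "sent:A"),
    ("ann:2", "semanticGroup", "DISO"),
    ("ann:2", "normalizedTo", "disease:DIS_1"),
    ("ann:3", "inSentence", "sent:B"),
    ("ann:3", "semanticGroup", "PRGE"),
    ("ann:3", "normalizedTo", "uniprot:PROT_B"),
    ("ann:4", "inSentence", "sent:B"),
    ("ann:4", "semanticGroup", "DISO"),
    ("ann:4", "normalizedTo", "disease:DIS_2"),
    # Stands in for a UniProtKB-style disease-involvement assertion (hypothetical).
    ("uniprot:PROT_A", "involvedIn", "disease:DIS_1"),
}

# Basic graph pattern: a PRGE mention and a DISO mention in the same sentence ?s.
COOCCURRENCE: List[Triple] = [
    ("?g", "inSentence", "?s"),
    ("?g", "semanticGroup", "PRGE"),
    ("?g", "normalizedTo", "?protein"),
    ("?d", "inSentence", "?s"),
    ("?d", "semanticGroup", "DISO"),
    ("?d", "normalizedTo", "?disease"),
]


def is_var(term: str) -> bool:
    """Terms starting with '?' are query variables."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend the binding so the pattern equals the triple, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None
            extended[p] = t
        elif p != t:
            return None
    return extended


def query(store: Set[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns with a naive nested-loop join."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [
            b2
            for b in bindings
            for triple in store
            if (b2 := match(pattern, triple, b)) is not None
        ]
    return bindings


def gene_disease_evidence(store: Set[Triple]) -> Tuple[Set[Tuple[str, str]], Set[Tuple[str, str]]]:
    """Split co-mentioned pairs into resource-supported and literature-only."""
    pairs = {(r["?protein"], r["?disease"]) for r in query(store, COOCCURRENCE)}
    supported = {p for p in pairs if (p[0], "involvedIn", p[1]) in store}
    return supported, pairs - supported


if __name__ == "__main__":
    supported, literature_only = gene_disease_evidence(STORE)
    print("supported by resource:", sorted(supported))
    print("literature only:", sorted(literature_only))
```

The nested-loop join scans every triple for every pattern, which is fine for a toy example but not for millions of triples; an actual RDF store indexes triples and would typically be queried with SPARQL. (The walrus operator requires Python 3.8 or later.)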
Highlights
Normalization of CALBC named entities
Disambiguation of CALBC named entities
Term collocation at the sentence level, e.g.
Frequency count for the occurrence of the term in the British National Corpus (BNC) or in MEDLINE
Disambiguation
Checking the consistency of bioinformatics resources against the literature
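Several of these highlighted steps can be illustrated with a small Python sketch, under loudly stated assumptions. A toy lexicon (standing in for a resource such as LexEBI/BioLexicon) normalizes surface forms to concept identifiers, and concept pairs co-occurring within a sentence are counted. A surface form is treated as ambiguous, and skipped, when its frequency in a general-English reference corpus exceeds a threshold. The lexicon entries, the `GENERAL_FREQ` values (placeholders, not real BNC counts), the `AMBIGUITY_THRESHOLD` cut-off and the example text are all invented for illustration; this is not the CALBC procedure.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical lexicon: lower-cased surface form -> (concept id, semantic group)
LEXICON: Dict[str, Tuple[str, str]] = {
    "p53": ("PRGE:TP53", "PRGE"),
    "tp53": ("PRGE:TP53", "PRGE"),
    "lung cancer": ("DISO:LUNG_CANCER", "DISO"),
    "cisplatin": ("CHED:CISPLATIN", "CHED"),
    "cat": ("SPE:FELIS_CATUS", "SPE"),
}

# Hypothetical general-English frequencies per million tokens (NOT real BNC data).
GENERAL_FREQ: Dict[str, float] = {"cat": 50.0, "p53": 0.1, "cisplatin": 0.2, "lung cancer": 1.0}
AMBIGUITY_THRESHOLD = 10.0  # hypothetical cut-off


def is_ambiguous(form: str, threshold: float = AMBIGUITY_THRESHOLD) -> bool:
    """Treat forms that are frequent in general English as ambiguous."""
    return GENERAL_FREQ.get(form, 0.0) > threshold


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def normalize(sentence: str, skip_ambiguous: bool = True) -> Set[str]:
    """Map whole-word lexicon matches in a sentence to concept ids."""
    lowered = sentence.lower()
    found: Set[str] = set()
    for form, (concept, _group) in LEXICON.items():
        if skip_ambiguous and is_ambiguous(form):
            continue
        if re.search(r"\b" + re.escape(form) + r"\b", lowered):
            found.add(concept)
    return found


def sentence_collocations(text: str) -> Counter:
    """Count unordered pairs of concepts that co-occur within a sentence."""
    counts: Counter = Counter()
    for sentence in split_sentences(text):
        for pair in combinations(sorted(normalize(sentence)), 2):
            counts[pair] += 1
    return counts


# Invented example text, for illustration only.
TEXT = (
    "TP53 mutations are frequent in lung cancer. "
    "Cisplatin resistance has been linked to p53 status. "
    "The cat sat on the mat."
)

if __name__ == "__main__":
    for pair, n in sorted(sentence_collocations(TEXT).items()):
        print(pair, n)
```

The frequency filter is a deliberately crude stand-in: it discards a common word such as "cat" everywhere rather than resolving it in context, so a real disambiguation step would need more than a single threshold.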
Summary
The CALBC RDF Triple store: retrieval over large literature content
The European Bioinformatics Institute (EBI) is an outstation of the European Molecular Biology Laboratory (EMBL).
Primary data resource reporting novel scientific findings