Abstract

Unstructured data is information that either has no pre-defined data model or is not organized in a pre-defined manner. Many studies estimate that around 80-90% of all information produced is unstructured. This rich and valuable content must therefore be integrated and taken into account for processing and exploitation, in particular for extracting relevant information from heterogeneous textual data. The goal of the research described here is to present an approach for automating the detection and extraction of meaning from the unstructured Web by using its normalized part: the Web of Data and Linked Open Data (LOD) resources such as RDF WordNet and DBpedia. The approach follows a cyclical process with two phases: (a) creating and generating normalized smart data, either by experts or automatically; and (b) exploiting the data created in (a), as validated expert data, to analyze Big Data and automatically generate new data by learning from Linked Open Data (LOD). The approach is based on a range of linguistic and ontological techniques, in the context of Big Data. A software tool, EC3, is being implemented at LIP6. EC3 is currently being tested on very large electronic corpora provided by the labex OBVIL (http://obvil.paris-sorbonne.fr) and the BNF (National Library of France).

