Morphosyntactic Information Research Articles

Background: Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identification of actionable knowledge from free text repositories. We present the BeFree system aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases.

Results: By exploiting morpho-syntactic information of the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated with depression, a major cause of morbidity worldwide, which are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided insights into the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternative strategies to manual curation, in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications.

Conclusions: BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations. Our analyses show that mining only a small fraction of MEDLINE results in a large dataset of gene-disease associations, and only a small proportion of this dataset (2%) is actually recorded in curated resources, raising several issues on data prioritization and curation. We propose that joint analysis of text-mined data with data curated by experts is a suitable approach to both assess data quality and highlight novel and interesting information.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0472-9) contains supplementary material, which is available to authorized users.

Abstract: While human annotation is crucial for many natural language processing tasks, it is often very expensive and time-consuming. Inspired by previous work on crowdsourcing, we investigate the viability of using non-expert labels instead of gold standard annotations from experts for a machine learning approach to automatic readability prediction. In order to do so, we evaluate two different methodologies to assess the readability of a wide variety of text material: a more traditional setup in which expert readers make readability judgments and a crowdsourcing setup for users who are not necessarily experts. To this end, two assessment tools were implemented: a tool where expert readers can rank a batch of texts based on readability, and a lightweight crowdsourcing tool, which invites users to provide pairwise comparisons. To validate this approach, readability assessments for a corpus of written Dutch generic texts were gathered. By collecting multiple assessments per text, we explicitly wanted to level out readers' background knowledge and attitude. Our findings show that the assessments collected through both methodologies are highly consistent and that crowdsourcing is a viable alternative to expert labeling. This is good news, as crowdsourcing is more lightweight to use and gives access to a much wider audience of potential annotators. By performing a set of basic machine learning experiments using a feature set that mainly encodes basic lexical and morpho-syntactic information, we further illustrate how the collected data can be used to perform text comparisons or to assign an absolute readability score to an individual text. We do not focus on optimising the algorithms to achieve the best possible results for the learning tasks, but carry them out to illustrate the various possibilities of our data sets. The results on different data sets, however, show that our system outperforms the readability formulas and a baseline language modelling approach. We conclude that readability assessment by comparing texts is a polyvalent methodology, which can be adapted to specific domains and target audiences if required.
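
The abstract does not state how the crowd's pairwise comparisons are aggregated into per-text scores. As one illustrative possibility only, the sketch below scores each text by the share of its comparisons in which it was judged the easier one. The function name `win_proportion_scores`, the `(easier, harder)` tuple layout, and the sample judgments are all assumptions made for this example, not details taken from the article.

```python
from collections import defaultdict


def win_proportion_scores(comparisons):
    """Score each text by the share of its comparisons in which it was judged easier.

    `comparisons` is an iterable of (easier_text, harder_text) pairs.
    This is a simple illustrative aggregation, not the article's method.
    """
    wins = defaultdict(int)    # times a text was judged the easier one
    totals = defaultdict(int)  # times a text appeared in any comparison
    for easier, harder in comparisons:
        wins[easier] += 1
        totals[easier] += 1
        totals[harder] += 1
    return {text: wins[text] / totals[text] for text in totals}


# Hypothetical crowd judgments: the first item was judged easier to read.
judgments = [
    ("leaflet", "manual"),
    ("leaflet", "statute"),
    ("manual", "statute"),
    ("manual", "statute"),
    ("statute", "manual"),
]

scores = win_proportion_scores(judgments)
# Texts ordered from most to least readable under this scoring.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Collecting several judgments per pair, as in the last three entries, lets disagreement between annotators average out in the score rather than flip the ranking outright.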

Articles published on Morphosyntactic Information

Grammatical number processing and anticipatory eye movements are not tightly coordinated in English spoken language comprehension.

Do grammatical-gender distinctions learned in the second language influence native-language lexical processing?

Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research.

The differential time course for consonant and vowel processing in Arabic: implications for language learning and rehabilitation.

The use of case marking for predictive processing in second language Japanese

Semantics and morphosyntax in predictive L2 sentence processing

Assessing document and sentence readability in less resourced languages and across textual genres

MORPHOSYNTAX IN THE BILINGUAL MENTAL LEXICON

An Arabic CCG approach for determining constituent types from Arabic Treebank

Morphosyntax can modulate the N400 component: Event related potentials to gender-marked post-nominal adjectives

On modular approaches to grammar: Evidence from Polish

A New Approach to Tagging in Indian Languages

Izgradnja modelov za prepoznavanje imenskih entitet za hrvaščino in slovenščino [Building named entity recognition models for Croatian and Slovene]

Suffix -mente Adverbs in DAELE, A Spanish Learners' Dictionary

Animacy information outweighs morphological cues in Russian

Integrating meaning and structure in L1–L2 and L2–L1 translations

Individual differences in the second language processing of object–subject ambiguities

Syntactic computation in the human brain: the degree of merger as a key factor.

Using the crowd for readability prediction

The Authorship of the Disputed Federalist Papers with an Annotated Corpus
