Sub-document Level Research Articles

Clinical de-identification aims to identify Protected Health Information in clinical data, enabling data sharing and publication. The first automatic de-identification systems were based on rules or on machine learning methods, and were limited by language changes, lack of context awareness and time-consuming feature engineering. Newer deep learning techniques for sequence labeling have shown better results, reducing feature engineering effort by representing words in vector space. However, they cannot jointly represent the polysemous, context-dependent nature of words and the morpho-syntactic variations characteristic of handwriting. To address these limitations, a new de-identification approach based on deep learning techniques for Named Entity Recognition has been proposed, whose key factors are: (i) a Bidirectional Long Short-Term Memory + Conditional Random Field architecture for sequence labeling that takes advantage of the widest possible representation context; (ii) a contextualized language model, working at character level, to capture the polysemy of words and manage the morpho-syntactic variations typical of handwritten notes; (iii) multiple stacked word representations to better capture latent syntactic and semantic similarities. This approach has been tested on the official Informatics for Integrating Biology & the Bedside 2014 de-identification dataset, showing performance similar to or higher than the state of the art in both category and binary recognition, without any feature engineering or handcrafted rules. The experiments demonstrate the effectiveness of the proposed approach, in particular for category-level recognition, which is essential to correctly replace entities with surrogates for anonymization purposes.
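
The distinction between binary and category recognition mentioned in the abstract can be illustrated with a minimal token-level sketch. It is not the evaluation procedure used by the authors or by the dataset organizers, which may score entities differently. The label names, the `O` marker for non-PHI tokens and the function `token_scores` are assumptions made for illustration only.

```python
from typing import Dict, Sequence

OUTSIDE = "O"  # assumed label for tokens that are not PHI


def token_scores(gold: Sequence[str], pred: Sequence[str], binary: bool) -> Dict[str, float]:
    """Token-level precision, recall and F1 (hypothetical helper).

    binary=True  : a PHI token counts as correct if it is predicted as any PHI label.
    binary=False : a PHI token counts as correct only if the predicted category matches.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_phi, p_phi = g != OUTSIDE, p != OUTSIDE
        if g_phi and p_phi and (binary or g == p):
            tp += 1
        else:
            if p_phi:  # predicted PHI that is wrong, or has the wrong category
                fp += 1
            if g_phi:  # gold PHI that was missed or mislabeled
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example: one NAME token is mislabeled as DATE, one LOCATION token is missed.
gold = ["NAME", "NAME", "O", "DATE", "O", "LOCATION"]
pred = ["NAME", "DATE", "O", "DATE", "O", "O"]
binary_scores = token_scores(gold, pred, binary=True)
category_scores = token_scores(gold, pred, binary=False)
```

In this toy example the mislabeled NAME token still counts as detected PHI under binary scoring but is an error under category scoring. A surrogate generator would replace that name with a fake date, which is why the abstract singles out category-level recognition.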

Mobile devices enable people to look for information at the moment when their information needs are triggered. When complex information needs require multiple search sessions, users may turn to desktop computers to complete searches they started on mobile devices. In the context of mobile-to-desktop web search, this article analyzes users’ behavioral patterns and compares them to the patterns in desktop-to-desktop web search. Then, we examine several approaches to using Mobile Touch Interactions (MTIs) to infer relevant content so that such content can be used to support subsequent search queries on desktop computers. The experimental data used in this article was collected through a user study involving 24 participants and six cross-device web search tasks. Our experimental results show the following. (1) Users’ mobile-to-desktop search behaviors differ significantly from desktop-to-desktop search behaviors in terms of information exploration, sense-making and repeated behaviors. (2) MTIs can be employed to predict the relevance of click-through documents, but applying document-level relevant content based on the predicted relevance does not improve search performance. (3) MTIs can also be used to identify the relevant text chunks at a fine-grained subdocument level. Such relevant information achieves better search performance than the document-level relevant content. In addition, it can be combined with document-level relevance to further improve search performance. However, the effectiveness of these methods depends on having enough click-through documents. (4) MTIs can also be obtained from the Search Engine Results Pages (SERPs). The subdocument feedback inferred from this set of MTIs even outperforms the MTI-based subdocument feedback from the click-through documents.

Articles published on Sub-document Level

Combining contextualized word representation and sub-document level analysis through Bi-LSTM+CRF architecture for clinical de-identification

Understanding and Supporting Cross-Device Web Search for Exploratory Tasks with Mobile Touch Interactions

Exploring Co‐training strategies for opinion detection

Modeling object characteristics of dynamic Web content
