Learning Low-dimensional Representations Research Articles

Introduction: Healthcare settings are becoming increasingly technological. Interactions and events involving healthcare providers and the patients they serve are captured as digital text. Healthcare organizations are amassing increasingly large and complex collections of biomedical text data. Researchers and policy makers are beginning to explore these text data holdings for structure, patterns, and meaning.

Objectives and Approach: EMRALD is a primary care electronic medical record (EMR) database comprising over 40 family medicine clinics, nearly 400 primary care physicians and over 500,000 patients. EMRALD includes full-chart extractions, including all clinical narrative data in a variety of fields. The input data (raw text strings) are discrete, sparse and high dimensional. We assessed scalable statistical models for high dimensional discrete data by fitting, assessing and exploring models from three broad statistical areas: i) matrix factorization/decomposition models, ii) probabilistic topic models and iii) word-vector embedding models.

Results: EMRALD comprises 12 text data streams. EMRALD text data is structured into 84 million clinical notes (3.5 billion word/language tokens) and is approximately 18 GB in storage size. We employ a “text as data” pipeline: i) mapping raw strings to sequences of word/language tokens, ii) mapping token sequences to numeric arrays, and finally iii) using numeric arrays as inputs to statistical models. Fitted topic models yield useful thematic summaries of the EMRALD corpora. The topics discovered reflect core responsibilities of primary care physicians (e.g. women’s health, pain management, nutrition/diet, etc.). Fitted vector embedding models capture the structure of discourse and syntax: related words are mapped to nearby locations in the vector space, and analogical reasoning is possible in the embedding space.

Conclusion/Implications: “Text as data” requires an understanding of statistical models for discrete, sparse, high dimensional data. We fit a variety of unsupervised statistical models to biomedical text data. Preliminary results suggest that the learned low dimensional representations of the biomedical text data are effective at uncovering meaningful patterns and structure.
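The first two steps of the “text as data” pipeline described in this abstract (raw strings to token sequences, then token sequences to numeric arrays) can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenization rule, the function names and the three toy notes are all assumptions made for illustration, and a real corpus of this size would need sparse storage rather than dense lists.

```python
import re
from collections import Counter


def tokenize(text):
    """Map a raw string to a sequence of lowercase word tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(token_lists):
    """Assign each distinct token a column index, in sorted order."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    return {tok: idx for idx, tok in enumerate(vocab)}


def document_term_matrix(token_lists, vocab):
    """Return one row of token counts per document (dense, for illustration only)."""
    rows = []
    for toks in token_lists:
        counts = Counter(toks)
        # vocab preserves insertion order, so columns follow the sorted vocabulary
        rows.append([counts.get(tok, 0) for tok in vocab])
    return rows


# Hypothetical toy notes, not taken from EMRALD.
notes = [
    "Patient reports knee pain.",
    "Knee pain improved; diet reviewed.",
    "Diet and nutrition counselling.",
]

token_lists = [tokenize(note) for note in notes]      # step i: strings -> tokens
vocab = build_vocabulary(token_lists)
matrix = document_term_matrix(token_lists, vocab)     # step ii: tokens -> arrays
```

The resulting count matrix is the kind of discrete, sparse, high dimensional input that matrix factorization or topic models (step iii) would take; which model is fitted on top is left open here.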

We search for a wide variety of things on the Internet in our daily lives, and numerous search engines are available to return relevant results. With rapid technological advancement, the Internet has become a major source of information. Further, the advent of the Web 2.0 era has increased interaction between users and websites, and it has become challenging to provide information that matches each user's interests. Because of copyright restrictions, most existing research studies face a lack of access to the content of candidate articles for recommendation. The content of such articles is not always freely available, which leads to inadequate recommendation results. Moreover, various research studies base recommendations on user profiles and therefore need a significant number of registered users in the system. In recent years, research has shown that knowledge graphs yield better-quality recommendation results and alleviate sparsity and cold-start issues. Network embedding techniques try to learn high-quality feature vectors automatically from network structures, enabling vector-based measures of node relatedness. Building on the strengths of network embedding techniques, the proposed citation-based recommendation approach uses heterogeneous network embedding to generate recommendation results. The novelty of this paper lies in exploiting a network embedding approach, metapath2vec, to generate paper recommendations. Unlike existing approaches, the proposed method is capable of learning low-dimensional latent representations of nodes (i.e., research papers) in a network. We apply metapath2vec to a knowledge network built from the ACL Anthology Network (which covers NLP) and use node relatedness to generate item (research article) recommendations.
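To make the metapath2vec idea more concrete, the sketch below shows only its first stage as it is commonly described: random walks over a heterogeneous network that are constrained to follow a node-type pattern (a meta-path). The node types ("paper", "author"), the paper-author-paper meta-path and the toy graph are assumptions for illustration; the abstract does not state which node types or meta-paths the authors used. In a full pipeline the walks would then be fed to a skip-gram model to learn node vectors, which is not shown here.

```python
import random


def metapath_walk(graph, node_type, start, metapath, walk_length, rng):
    """Random walk that only steps to neighbours whose type matches the meta-path.

    metapath is a symmetric type pattern such as ["paper", "author", "paper"];
    it is repeated cyclically until walk_length nodes have been visited.
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first meta-path type")
    pattern = metapath[:-1]  # first and last types coincide, so drop the last
    walk = [start]
    while len(walk) < walk_length:
        next_type = pattern[len(walk) % len(pattern)]
        candidates = [n for n in graph[walk[-1]] if node_type[n] == next_type]
        if not candidates:  # dead end: no neighbour of the required type
            break
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical toy network: papers P1-P3 linked to their authors A1-A2.
graph = {
    "P1": ["A1"], "P2": ["A1", "A2"], "P3": ["A2"],
    "A1": ["P1", "P2"], "A2": ["P2", "P3"],
}
node_type = {"P1": "paper", "P2": "paper", "P3": "paper",
             "A1": "author", "A2": "author"}

rng = random.Random(0)  # seeded so the example is repeatable
walk = metapath_walk(graph, node_type, "P1", ["paper", "author", "paper"], 7, rng)
```

Papers that co-occur often in such walks end up with similar vectors after the embedding step, which is one way the node relatedness mentioned in the abstract could be turned into recommendations.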

Articles published on Learning Low-dimensional Representations

Learning Low-Dimensional Temporal Representations with Latent Alignments

EHR phenotyping via jointly embedding medical concepts and words into a unified vector space

Content-aware attributed entity embedding for synonymous named entity discovery

Computing low-dimensional representations of speech from socio-auditory structures for phonetic analyses

Learning Unsupervised Representations from Biomedical Text

Adversarial Network Embedding

Neural word and entity embeddings for ad hoc retrieval

Knowledge Driven Paper Recommendation Using Heterogeneous Network Embedding Method

Learning Low-Dimensional Representation of Bivariate Histogram Data

Entity mention aware document representation

Joint Learning of Representations of Medical Concepts and Words from EHR Data

Joint Feature Selection and Classification for Multilabel Learning

A-Optimal Projection for Image Representation

Kernel Pooled Local Subspaces for Classification

Principal manifolds and probabilistic subspaces for visual recognition

Learning low-dimensional representations via the usage of multiple-class labels
