Average Cosine Similarity Research Articles

Abstract: Over the last decade, the collective intelligent behavior of groups of animals, birds or insects has attracted the attention of researchers. Swarm intelligence is the branch of artificial intelligence that deals with the implementation of intelligent systems by taking inspiration from the collective behavior of social insects and other animal societies. Many meta-heuristic algorithms based on the aggregate behavior of swarms, which arises from complex interactions without supervision, have been used to solve complex optimization problems. Data clustering organizes data into groups called clusters, such that each cluster contains similar data; the resulting clusters may be disjoint. Accuracy and efficiency are the important measures in data clustering. Several recent studies describe bio-inspired systems as information processing systems capable of some cognitive ability. However, existing popular bio-inspired algorithms for data clustering neglect the balance between exploration and exploitation that is needed to produce better clustering results. In this article, we propose a bio-inspired algorithm, namely social spider optimization (SSO), for clustering that maintains a good balance between exploration and exploitation using female and male spiders, respectively. We compare the results of the proposed SSO algorithm with K-means and other nature-inspired algorithms such as particle swarm optimization (PSO), ant colony optimization (ACO) and improved bee colony optimization (IBCO). We find it to be more robust, as it produces better clustering results. Although SSO solves the problem of getting stuck in a local optimum, it needs to be modified to locate the best solution in the proximity of the generated global solution. Hence, we hybridize SSO with K-means, which produces good results in local searches. We compare the proposed hybrid algorithms SSO+K-means (SSOKC), integrated SSOKC (ISSOKC), and interleaved SSOKC (ILSSOKC) with K-means+PSO (KPSO), K-means+genetic algorithm (KGA), K-means+artificial bee colony (KABC) and interleaved K-means+IBCO (IKIBCO) and find that they produce better clustering results. We use the sum of intra-cluster distances (SICD), average cosine similarity, accuracy and inter-cluster distance to measure and validate the performance and efficiency of the proposed clustering techniques.
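Two of the validation measures named in this abstract, SICD and average cosine similarity, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give the exact formulas, so the sketch assumes Euclidean distance to the cluster mean for SICD and cosine similarity between each point and its cluster mean for the average. The function names are hypothetical and NumPy is assumed to be available.

```python
import numpy as np

def cluster_means(X, labels):
    """Return the sorted cluster ids and the mean vector of each cluster."""
    ids = np.unique(labels)
    means = np.stack([X[labels == k].mean(axis=0) for k in ids])
    return ids, means

def sicd(X, labels):
    """Sum of intra-cluster distances (assumed: Euclidean, point to cluster mean)."""
    ids, means = cluster_means(X, labels)
    idx = np.searchsorted(ids, labels)  # row of `means` for each point
    return float(np.linalg.norm(X - means[idx], axis=1).sum())

def average_cosine_similarity(X, labels):
    """Mean cosine similarity between each point and its cluster mean.

    Assumes no point and no cluster mean is the zero vector.
    """
    ids, means = cluster_means(X, labels)
    c = means[np.searchsorted(ids, labels)]
    num = (X * c).sum(axis=1)
    den = np.linalg.norm(X, axis=1) * np.linalg.norm(c, axis=1)
    return float(np.mean(num / den))

# Toy data: two clusters lying along the coordinate axes.
X = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
labels = np.array([0, 0, 1, 1])
print(sicd(X, labels), average_cosine_similarity(X, labels))
```

Under these assumptions a lower SICD indicates more compact clusters, and an average cosine similarity closer to 1 indicates that cluster members point in nearly the same direction as their cluster mean.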


Objective: Data in electronic health records (EHRs) is being increasingly leveraged for secondary uses, ranging from biomedical association studies to comparative effectiveness. To perform studies at scale and transfer knowledge from one institution to another in a meaningful way, we need to harmonize the phenotypes in such systems. Traditionally, this has been accomplished through expert specification of phenotypes via standardized terminologies, such as billing codes. However, this approach may be biased by the experience and expectations of the experts, as well as the vocabulary used to describe such patients. The goal of this work is to develop a data-driven strategy to (1) infer phenotypic topics within patient populations and (2) assess the degree to which such topics facilitate a mapping across populations in disparate healthcare systems.
Methods: We adapt a generative topic modeling strategy, based on latent Dirichlet allocation, to infer phenotypic topics. We utilize a variance analysis to assess the projection of a patient population from one healthcare system onto the topics learned from another system. The consistency of learned phenotypic topics was evaluated using (1) the similarity of topics, (2) the stability of a patient population across topics, and (3) the transferability of a topic across sites. We evaluated our approaches using four months of inpatient data from two geographically distinct healthcare systems: (1) Northwestern Memorial Hospital (NMH) and (2) Vanderbilt University Medical Center (VUMC).
Results: The method learned 25 phenotypic topics from each healthcare system. The average cosine similarity between matched topics across the two sites was 0.39, a remarkably high value given the very high dimensionality of the feature space. The average stability of VUMC and NMH patients across the topics of the two sites was 0.988 and 0.812, respectively, as measured by the Pearson correlation coefficient. In addition, the VUMC and NMH topics showed smaller variance in characterizing the patient populations of the two sites than standard clinical terminologies (e.g., ICD9), suggesting they may be more reliably transferred across hospital systems.
Conclusions: Phenotypic topics learned from EHR data can be more stable and transferable than billing codes for characterizing the general status of a patient population. This suggests that EHR-based research may be able to leverage such phenotypic topics as variables when pooling patient populations in predictive models.
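The "average cosine similarity between matched topics" reported here can be illustrated with a small Python sketch. The abstract does not say how topics from the two sites were matched, so the greedy one-to-one matching below is an assumption made for illustration only, not the authors' procedure. The function names are hypothetical, each topic is assumed to be a non-zero row vector over a shared feature vocabulary, and NumPy is assumed to be available.

```python
import numpy as np

def cosine_matrix(A, B):
    """Cosine similarity between every row of A and every row of B."""
    A_n = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_n = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A_n @ B_n.T

def greedy_match(S):
    """Greedy one-to-one matching (an assumed strategy): repeatedly take
    the most similar remaining pair, then retire its row and column."""
    S = S.astype(float)  # astype returns a copy, so the input is untouched
    pairs = []
    for _ in range(min(S.shape)):
        i, j = np.unravel_index(np.argmax(S), S.shape)
        pairs.append((int(i), int(j), float(S[i, j])))
        S[i, :] = -np.inf
        S[:, j] = -np.inf
    return pairs

def average_matched_similarity(A, B):
    """Mean cosine similarity over the greedily matched topic pairs."""
    pairs = greedy_match(cosine_matrix(A, B))
    return sum(s for _, _, s in pairs) / len(pairs)

# Toy topic-feature matrices for two hypothetical sites.
site_a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
site_b = np.array([[0.0, 2.0, 0.0], [3.0, 0.0, 0.0]])
print(average_matched_similarity(site_a, site_b))
```

Greedy matching is simple but not guaranteed to maximize the total similarity; an optimal assignment method would be a reasonable alternative if the matching itself matters.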


Articles published on Average Cosine Similarity

An Approach to Detecting Anomalies in Text Data Streams

A Study on the Synthetic ECG Generation for User Recognition

Pulmonary nodule segmentation with CT sample synthesis using adversarial networks.

Feasibility of Diagnosing Both Severity and Features of Diabetic Retinopathy in Fundus Photography

A text-Image feature mapping algorithm based on transfer learning

Repeatability of electromyography recordings and muscle synergies during gait among children with cerebral palsy

Predicting Mission Alignment and Preventing Mission Drift: Do Revenue Sources Matter?

A Novel Bio-Inspired Algorithm Based on Social Spiders for Improving Performance and Efficiency of Data Clustering

Intra-Subject Consistency during Locomotion: Similarity in Shared and Subject-Specific Muscle Synergies.

Predicting Mission Alignment and Preventing Mission Drift: How Revenue Sources Matter?

Building bridges across electronic health record systems through inferred phenotypic topics

Matching Medical Websites to Medical Guidelines through Clinical Vocabularies in the Context of Website Quality Assessment
