Manhattan Distance Measure Research Articles

The likelihood ratio paradigm for quantifying the strength of evidence has been researched in many fields of forensic science. Within this paradigm, score-based approaches for estimating likelihood ratios are becoming more prevalent in the forensic science literature. In this study, a score-based approach for estimating likelihood ratios is implemented for linguistic text evidence. Text data are represented via a bag-of-words model with the Z-score normalised relative frequencies of selected most-frequent words (the number of the most-frequent words = N), and the Euclidean, Manhattan and Cosine distance measures are trialled as the score-generating functions for comparing paired text samples. The score-to-likelihood-ratio conversion model was built using a common source method, and the best fitting model was selected from the parametric models of the Normal, Log-normal, Gamma and Weibull distributions. With the Amazon Product Data Authorship Verification Corpus, two groups of documents (each group including documents of approximately 700, 1400 and 2100 words) were synthesised for each author, allowing 720 same-author comparisons and 517,680 different-author comparisons to test the validity of the system. A series of experiments was conducted using combinations of the following conditions: the three score functions, the different values of N for the feature vector and the different document lengths. The validity of the system was assessed using the log-likelihood-ratio cost (Cllr), and the strength of the derived likelihood ratios was charted in the form of Tippett plots. It was demonstrated that 1) the Cosine measure consistently outperforms the other measures—the best performance is achieved with N = 260, regardless of the document length (e.g., Cllr values of 0.70640, 0.45314 and 0.30692, respectively, for 700, 1400 and 2100 words)—and 2) the derived likelihood ratios are very well calibrated irrespective of the distance measures and document lengths. 
A follow-up experiment showed that the described score-based approach remains relatively robust and stable when only a limited quantity of background data is available. The likelihood ratios estimated separately for the three distance measures were then fused by logistic regression, and the fusion further improved performance: for example, a Cllr of 0.23494 for 2100 words. This study demonstrates the possibility of designing likelihood ratio–based systems that discriminate between same-author and different-author documents.
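As a rough illustration of the score functions and the validity metric named in this abstract, the sketch below computes Euclidean, Manhattan and Cosine distances between two feature vectors and the log-likelihood-ratio cost (Cllr) for a set of likelihood ratios. It is not the authors' implementation: the function names, the toy vectors and the choice to Z-score each feature across the supplied documents using the population standard deviation are all assumptions made for the example.

```python
from math import log2, sqrt
from statistics import mean, pstdev


def z_normalise(vectors):
    """Z-score each feature (column) across the given documents.

    Assumption: population standard deviation; constant columns are
    divided by 1.0 so they map to zero instead of raising an error.
    """
    columns = list(zip(*vectors))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(v, means, sds)] for v in vectors]


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; both vectors are assumed to be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cllr(same_source_lrs, different_source_lrs):
    """Log-likelihood-ratio cost for same- and different-source LRs."""
    same_term = mean(log2(1.0 + 1.0 / lr) for lr in same_source_lrs)
    diff_term = mean(log2(1.0 + lr) for lr in different_source_lrs)
    return 0.5 * (same_term + diff_term)


# Hypothetical relative frequencies of three frequent words in three documents.
docs = z_normalise([[0.05, 0.03, 0.01], [0.04, 0.03, 0.02], [0.01, 0.06, 0.02]])
scores = {
    "euclidean": euclidean(docs[0], docs[1]),
    "manhattan": manhattan(docs[0], docs[1]),
    "cosine": cosine_distance(docs[0], docs[1]),
}
```

A system that always outputs a likelihood ratio of 1 carries no information and yields a Cllr of exactly 1, so values below 1, such as those quoted in the abstract, indicate that the likelihood ratios are informative.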

Background and objectives: Assessing drug toxicity and the associated biomarker genes is one of the most important tasks in the pre-clinical phase of the drug development pipeline as well as in toxicogenomic studies. A few statistical methods exist for assessing the toxicity of doses of drugs (DDs) and their associated biomarker genes. However, these methods are computationally slow because they estimate the model parameters with iterative approaches based on EM (expectation-maximization). To overcome this problem, this paper proposes an alternative approach based on hierarchical clustering (HC) for the same purpose. Methods and materials: There are several types of HC approaches, whose performance depends on the similarity/distance measure used. We therefore explored suitable combinations of distance measures and HC methods on Japanese Toxicogenomics Project (TGP) datasets to obtain better clustering/co-clustering of DDs and genes and to detect toxic DDs and their associated biomarker genes. Results: We observed that Ward's HC method with each of the Euclidean, Manhattan, and Minkowski distance measures produces better clustering/co-clustering results. For example, in the case of the glutathione metabolism pathway (GMP) dataset, LOC100359539/Rrm2, Gpx6, RGD1562107, Gstm4, Gstm3, G6pd, Gsta5, Gclc, Mgst2, Gsr, Gpx2, Gclm, Gstp1, LOC100912604/Srm, Gstm4, Odc1, Gsr, Gss are the biomarker genes and Acetaminophen_Middle, Acetaminophen_High, Methapyrilene_High, Nitrofurazone_High, Nitrofurazone_Middle, Isoniazid_Middle, Isoniazid_High are their regulatory (associated) DDs, as identified by our proposed co-clustering algorithm with the distance and HC method combination Euclidean: Ward.
Similarly, for the peroxisome proliferator-activated receptor signaling pathway (PPAR-SP) dataset, Cpt1a, Cyp8b1, Cyp4a3, Ehhadh, Plin5, Plin2, Fabp3, Me1, Fabp5, LOC100910385, Cpt2, Acaa1a, Cyp4a1, LOC100365047, Cpt1a, LOC100365047, Angptl4, Aqp7, Cpt1c, Cpt1b, Me1 are the biomarker genes and Aspirin_Low, Aspirin_Middle, Aspirin_High, Benzbromarone_Middle, Benzbromarone_High, Clofibrate_Middle, Clofibrate_High, WY14643_Low, WY14643_High, WY14643_Middle, Gemfibrozil_Middle, Gemfibrozil_High are their regulatory DDs. Conclusions: Overall, the methods proposed in this article co-cluster the genes and DDs and detect biomarker genes and their regulatory DDs simultaneously, consuming less time than the other methods mentioned. The results produced by the proposed methods have been validated against the available literature and functional annotation.
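To make the pairing of a distance measure with an HC method more concrete, here is a minimal sketch of agglomerative clustering that starts from Manhattan distances and merges clusters with the Lance-Williams form of Ward's update. It is only an illustration under stated assumptions, not the co-clustering algorithm of the article: the function names and the toy expression rows are hypothetical, and applying Ward's update to non-Euclidean distances such as Manhattan is a heuristic rather than an exact minimum-variance criterion.

```python
from itertools import combinations
from math import sqrt


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def ward_clusters(rows, k, dist=manhattan):
    """Merge rows bottom-up until k clusters remain.

    Cluster-to-cluster distances are updated with the Lance-Williams
    formula for Ward's method, applied to the supplied pairwise distances.
    """
    clusters = {i: [i] for i in range(len(rows))}
    d = {frozenset((i, j)): dist(rows[i], rows[j])
         for i, j in combinations(range(len(rows)), 2)}
    next_id = len(rows)
    while len(clusters) > k:
        pair = min(d, key=d.get)            # closest pair of clusters
        i, j = sorted(pair)
        n_i, n_j, d_ij = len(clusters[i]), len(clusters[j]), d[pair]
        updated = {}
        for m, members in clusters.items():
            if m in (i, j):
                continue
            n_m = len(members)
            d_im, d_jm = d[frozenset((i, m))], d[frozenset((j, m))]
            squared = ((n_i + n_m) * d_im ** 2 + (n_j + n_m) * d_jm ** 2
                       - n_m * d_ij ** 2) / (n_i + n_j + n_m)
            updated[m] = sqrt(max(0.0, squared))
        # Drop distances involving the merged clusters, then add the new ones.
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        for m, value in updated.items():
            d[frozenset((next_id, m))] = value
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical expression profiles: rows 0-1 and rows 2-3 form two groups.
profiles = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [11.0, 10.0]]
groups = ward_clusters(profiles, k=2)
```

Swapping `dist` for a Euclidean or Minkowski function gives the other combinations mentioned in the abstract. Clustering the rows and then the columns of the same matrix is one simple way to approximate co-clustering of genes and DDs, though the article's own procedure may differ.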

Articles published on Manhattan Distance Measure

Semantic differences and psychological behavior in multi-criteria group decision-making: Do they need consideration?

A unique approach for protein secondary structure comparison under TOPS representation

Multi-dimensional multi-round minimum cost consensus models with iterative mechanisms involving reward and punishment measures

Developing Viola Jones' algorithm for detecting and tracking a human face in video file

Detection of DoS Attacks in Smart City Networks With Feature Distance Maps: A Statistical Approach

An approach to linguistic q-rung orthopair fuzzy multi-attribute decision making with LINMAP based on Manhattan distance measure

The Promise of MOOCs Revisited? Demographics of Learners Preparing for University

Research on the success of unsupervised learning algorithms in indoor location prediction

3-Tuple Linguistic Distance-Based Model for a New Product go/no-go Evaluation

The Fingerprint-Like Pattern of Nocturnal Brain Activity Demonstrated in Young Individuals is Also Present in Senior Adulthood.

Score-based likelihood ratios for linguistic text evidence with a bag-of-words model

Dual Phase CBIR Model using Hybrid Feature Extraction and Manhattan Distance Measure

Password policy characteristics and keystroke biometric authentication

Optimized multimodal biometric system based fusion technique for human identification

Probabilistic Unsupervised Machine Learning Approach for a Similar Image Recommender System for E-Commerce

Approximate relations between Manhattan and Euclidean distance regarding Latin hypercube experimental design

Optimal Classification of Lung Cancer Related Genes using Enhanced reliefF Algorithm and Multiclass Support Vector Machine

Assessment of Drugs Toxicity and Associated Biomarker Genes Using Hierarchical Clustering.

Enhanced Manhattan-based Clustering using Fuzzy C-Means Algorithm for High Dimensional Datasets

Cuckoo Search Algorithm Based Feature Selection in Image Retrieval System
