Element weighted Kemeny distance for ranking data
Preference data are a particular type of ranking data that arise when n individuals express their preferences over a finite set of items. Within this framework, the main issue concerns the aggregation of the preferences to identify a compromise or a “consensus”, defined as the closest ranking (i.e. with the minimum distance or maximum correlation) to the whole set of preferences. Many approaches have been proposed, but they are not sensitive to the importance of items, whereas changing the rank of a highly relevant element should result in a higher penalty than changing the rank of a negligible one. The goal of this paper is to investigate the consensus between rankings taking into account the importance of items (element weights). For this purpose, we present: i) an element weighted rank correlation coefficient tau_ew as an extension of Emond and Mason’s tau, and ii) an element weighted rank distance d_ew as an extension of the Kemeny distance d. The one-to-one correspondence between the weighted distance and the rank correlation coefficient is analytically proved. Moreover, a procedure to obtain the consensus ranking among n individuals is described and its performance is studied both by simulation and by application to real datasets.
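The notion of a consensus as the ranking with minimum total distance to all preferences can be illustrated with a minimal sketch. It uses the classical unweighted Kemeny distance, not the element weighted d_ew (whose definition is not given in the abstract), and it searches only full rankings without ties by brute force, which is feasible only for a handful of items. The function names and the three example votes are hypothetical.

```python
from itertools import permutations


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def kemeny_distance(a, b):
    """Classical (unweighted) Kemeny distance between two rank vectors.

    a[i] is the rank given to item i (1 = most preferred); ties are allowed.
    Each pair of items contributes 2 if its order is reversed, 1 if it is
    tied in one ranking only, and 0 otherwise.
    """
    n = len(a)
    return sum(
        abs(sign(a[j] - a[i]) - sign(b[j] - b[i]))
        for i in range(n)
        for j in range(i + 1, n)
    )


def brute_force_consensus(rankings):
    """Return the tie-free ranking minimizing the total Kemeny distance."""
    n = len(rankings[0])

    def total(candidate):
        return sum(kemeny_distance(candidate, r) for r in rankings)

    best = min(permutations(range(1, n + 1)), key=total)
    return best, total(best)


# Hypothetical preferences of three judges over three items.
votes = [(1, 2, 3), (1, 3, 2), (2, 1, 3)]
consensus, cost = brute_force_consensus(votes)
print(consensus, cost)  # (1, 2, 3) 4
```

An exhaustive search grows factorially with the number of items, so it only serves to make the definition concrete.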
- Research Article
9
- 10.1007/s11634-021-00442-x
- May 28, 2021
- Advances in Data Analysis and Classification
Preference data are a particular type of ranking data where some subjects (voters, judges, ...) express their preferences over a set of alternatives (items). In most real-life cases, some items receive the same preference from a judge, thus giving rise to a ranking with ties. An important issue involving rankings concerns the aggregation of the preferences into a “consensus”. The purpose of this paper is to investigate the consensus between rankings with ties, taking into account the importance of swapping elements belonging to the top (or to the bottom) of the ordering (position weights). By combining the structure of tau_x proposed by Emond and Mason (J Multi-Criteria Decis Anal 11(1):17–28, 2002) with the class of weighted Kemeny-Snell distances, a position weighted rank correlation coefficient is proposed for comparing rankings with ties. The one-to-one correspondence between the weighted distance and the rank correlation coefficient is proved analytically, using both equal and decreasing weights.
- Research Article
5
- 10.1016/j.spl.2018.08.010
- Sep 7, 2018
- Statistics & Probability Letters
Estimation of minimum and maximum correlation coefficients
- Research Article
47
- 10.1364/oe.17.010025
- May 29, 2009
- Optics Express
We investigate the dynamics of two semiconductor lasers with separate optical feedback when they are driven by a common signal injected from a chaotic laser under the condition of non-identical drive and response. We experimentally and numerically show conditions under which the outputs of the two lasers can be highly correlated with each other even though the correlation with the drive signal is low. In particular, the effects of the phase of the feedback light on the correlation characteristics are described. The maximum correlation between the two response lasers is obtained when the phase of the feedback light is matched between the two response lasers, while the minimum correlation is observed when the difference in the optical phase is pi. On the other hand, the correlation between the drive and response is not sensitive to the phase of the feedback light, unlike the previously studied case of identical drive and response. We numerically examine the difference between the maximum and minimum cross correlations over a wide range of parameters, and show that it is largest when there is a balance between the injection strength and the feedback strength.
- Research Article
73
- 10.1509/jmr.09.0467
- Jun 1, 2012
- Journal of Marketing Research
The authors present a general consumer preference model for experience products that overcomes the limitations of consumer choice models, especially when it is not easy to consider some qualitative attributes of a product or when there are too many attributes relative to the available amount of preference data, by capturing the effects of unobserved product attributes with the residuals of reference consumers for the same product. They decompose the deterministic component of product utility into two parts: that accounted for by observed attributes and that due to nonobserved attributes. The authors estimate the unobserved component by relating it to the corresponding residuals of virtual experts representing homogeneous groups of people who experienced the product earlier and evaluated it. Their methodology involves identifying such virtual experts and determining the relative importance they should be given in the estimation of the target person's residuals. Using Bayesian estimation methods and Markov chain Monte Carlo simulation inference, the authors apply their approach to two types of consumer preference data: (1) online consumer ratings (stated preferences) data for Internet recommendation services and (2) offline consumer viewership (revealed preferences) data for movies. The results empirically show that this new approach outperforms several alternative collaborative filtering and attribute-based preference models with both in- and out-of-sample fits. The model is applicable to both Internet recommendation services and consumer choice studies.
- Research Article
8
- 10.2139/ssrn.1954001
- Nov 3, 2011
- SSRN Electronic Journal
A General Consumer Preference Model for Experience Products: Application to Internet Recommendation Services
- Research Article
10
- 10.1111/j.2042-3306.2011.00414.x
- Jun 2, 2011
- Equine Veterinary Journal
Clinical studies utilising ordinal data: Pitfalls in the analysis and interpretation of clinical grading systems
- Conference Article
44
- 10.1145/1390334.1390382
- Jul 20, 2008
Designing effective ranking functions is a core problem for information retrieval and Web search since the ranking functions directly impact the relevance of the search results. The problem has been the focus of much of the research at the intersection of Web search and machine learning, and learning ranking functions from preference data in particular has recently attracted much interest. The objective of this paper is to empirically examine several objective functions that can be used for learning ranking functions from preference data. Specifically, we investigate the roles of ties in the learning process. By ties, we mean preference judgments that two documents have equal degree of relevance with respect to a query. This type of data has largely been ignored or not properly modeled in the past. In this paper, we analyze the properties of ties and develop novel learning frameworks which combine ties and preference data using statistical paired comparison models to improve the performance of learned ranking functions. The resulting optimization problems explicitly incorporating ties and preference data are solved using gradient boosting methods. Experimental studies are conducted using three publicly available data sets which demonstrate the effectiveness of the proposed new methods.
- Research Article
1
- 10.17654/0972361724048
- May 16, 2024
- Advances and Applications in Statistics
Procedures are required which are robust (insensitive to changes in extraneous factors not under test) as well as powerful (sensitive to specific factors under test). Our objective is to review some weighted and unweighted measures of rank correlation and compare their power through a simulation study. A weighted rank correlation is one that emphasizes items with low rankings and de-emphasizes those with high rankings, while an unweighted rank correlation assigns equal weight to all levels. In this paper, we compare the power of 13 weighted rank correlation coefficients and 9 unweighted rank correlation coefficients for various sample sizes in the presence of outliers. The results show that, among the weighted measures, Mature-Abdelfattah 0.9, Blest, Mango, and Costa-Soares have the highest power, and that, among the unweighted measures, the average slope, median slope, and Spearman rho coefficients have the highest power. In general, weighted measures have higher power than unweighted measures in the presence of outliers, and quadrant association, Fechner, and Gideon-Hollister have the lowest power among all the coefficients tested, whereas our previous study (Abdelfattah [1]) found that unweighted measures are more robust than weighted measures. Received: April 9, 2024. Revised: April 26, 2024. Accepted: May 11, 2024.
- Research Article
2
- 10.1108/jfmm-02-2023-0032
- Aug 24, 2023
- Journal of Fashion Marketing and Management: An International Journal
Purpose: This study was carried out to analyze the importance of consumer preference data in forecasting demand in apparel retailing. Methodology: To collect preference data, 729 hypothetical stock keeping units (SKUs) were derived using a full factorial design, from a combination of six attributes with three levels each. From the hypothetical SKUs, 63 practical SKUs were selected for further analysis. Two hundred two responses were collected from a store intercept survey. Respondents' utility scores for all 63 SKUs were calculated using conjoint analysis. In estimating aggregate demand, to allow for consumer substitution and to make the SKU available when a consumer wishes to buy more than one item in the same SKU, the utility scores of each individual's three most preferred SKUs were selected, classified using a decision tree, and aggregated. A choice rule was modeled to include substitution; by applying this choice rule, aggregate demand was estimated. Findings: The respondents' utility scores were calculated. The value of Kendall's tau is 0.88, the value of Pearson's R is 0.98 and internal predictive validity using Kendall's tau is 1.00, which shows the high quality of the data obtained. The proposed model was used to estimate the demand for 63 SKUs. The demand was estimated at 6.04 per cent for the SKU cotton, regular style, half sleeve, medium priced, private label. The proposed model for estimating demand using consumer preference data gave estimates closer to actual sales than expert opinion data. The Spearman's rank correlation between actual sales and consumer preference data is 0.338 and is significant at the 5 per cent level. The Spearman's rank correlation between actual sales and expert opinion is −0.059, and there is no significant relation between expert opinion data and actual sales. Thus, the consumer preference model proves to be better at estimating demand than expert opinion data. Research implications: There has been a considerable amount of work done in choice-based models. There is a lot of scope for work in deterministic models. Practical implication: The proposed consumer preference-based demand estimation model can help apparel retailers increase their profit by reducing stock-out and overstocking situations. Though conjoint analysis is used in demand estimation in other industries, it is not used in apparel for demand estimation and can be of greater use in its simplest form. Originality/value: This research is the first to model consumer preference-based data to estimate demand in apparel. This research was practically tested in an apparel retail store. It is original.
- Research Article
19
- 10.1080/00949655.2014.895354
- Mar 18, 2014
- Journal of Statistical Computation and Simulation
In most regression problems the first task is to select the most influential predictors explaining the response and to remove the others from the model. These problems are usually referred to as variable selection problems in the statistical literature. Numerous methods have been proposed in this field, most of which address linear models. In this study we propose two variable selection criteria for regression based on two powerful dependence measures, maximal correlation and distance correlation. We focus on these two measures since they fully or partially satisfy the Rényi postulates for dependence measures, and thus they are able to detect nonlinear dependence structures. Therefore, our methods are considered to be appropriate in linear as well as nonlinear regression models. Both methods are easy to implement and they perform well. We illustrate the performance of the proposed methods via simulations, and compare them with two benchmark methods, stepwise Akaike information criterion and lasso. In several cases with linear dependence all four methods turned out to be comparable. In the presence of nonlinear or uncorrelated dependencies, we observed that our proposed methods may be favourable. An application of the proposed methods to a real financial data set is also provided.
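The claim that distance correlation can detect nonlinear dependence that linear correlation misses can be illustrated with a small sketch. It computes the standard sample distance correlation (double-centered pairwise distance matrices) for one-dimensional data in pure Python. It is not the selection criterion proposed in the paper, and the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def _centered_distances(v):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # row means (equal to column means)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D sequences."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / n ** 2

    dvar = mean_product(a, a) * mean_product(b, b)
    if dvar == 0:
        return 0.0
    # Guard against a tiny negative value caused by rounding.
    return sqrt(max(mean_product(a, b), 0.0) / sqrt(dvar))


# Toy data: y depends on x through a parabola, so the linear correlation is zero.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
print(pearson(x, y), distance_correlation(x, y))
```

On this toy sample the Pearson coefficient vanishes while the distance correlation stays clearly positive, which is the behaviour that makes such measures attractive for screening predictors with nonlinear effects.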
- Research Article
159
- 10.1002/mcda.313
- Jan 1, 2002
- Journal of Multi-Criteria Decision Analysis
The consensus ranking problem has received much attention in the statistical literature. Given m rankings of n objects the objective is to determine a consensus ranking. The input rankings may contain ties, be incomplete, and may be weighted. Two solution concepts are discussed, the first maximizing the average weighted rank correlation of the solution ranking with the input rankings and the second minimizing the average weighted Kemeny–Snell distance. A new rank correlation coefficient called τx is presented which is shown to be the unique rank correlation coefficient which is equivalent to the Kemeny‐Snell distance metric. The new rank correlation coefficient is closely related to Kendall's tau but differs from it in the way ties are handled. It will be demonstrated that Kendall's τb is flawed as a measure of agreement between weak orderings and should no longer be used as a rank correlation coefficient. The use of τx in the consensus ranking problem provides a more mathematically tractable solution than the Kemeny–Snell distance metric because all the ranking information can be summarized in a single matrix. The methods described in this paper allow analysts to accommodate the fully general consensus ranking problem with weights, ties, and partial inputs. Copyright © 2002 John Wiley & Sons, Ltd.
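The equivalence between τx and the Kemeny–Snell distance can be checked numerically with a short sketch. It assumes the score conventions usually associated with the two measures, which the abstract itself does not spell out: Kemeny–Snell scores a tied pair as 0, while τx scores a tied pair as +1, and the two quantities are then related by τx = 1 − 2d / (n(n − 1)). The function names and example rankings are hypothetical.

```python
def ks_score(r, i, j):
    """Kemeny-Snell score: 1 if i is preferred to j, -1 if j to i, 0 if tied."""
    return (r[i] < r[j]) - (r[i] > r[j])


def em_score(r, i, j):
    """tau_x score (assumed convention): ties count as +1, diagonal is 0."""
    if i == j:
        return 0
    return 1 if r[i] <= r[j] else -1


def kemeny_snell_distance(a, b):
    """Half the sum of absolute score differences over all ordered pairs."""
    n = len(a)
    return sum(
        abs(ks_score(a, i, j) - ks_score(b, i, j))
        for i in range(n) for j in range(n)
    ) / 2


def tau_x(a, b):
    """Sum of score products over ordered pairs, divided by n(n - 1)."""
    n = len(a)
    total = sum(
        em_score(a, i, j) * em_score(b, i, j)
        for i in range(n) for j in range(n)
    )
    return total / (n * (n - 1))


# Rank vectors (1 = best); the first ranking ties items 0 and 1.
a = (1, 1, 2)
b = (1, 2, 3)
n = len(a)
d = kemeny_snell_distance(a, b)
print(d, tau_x(a, b), 1 - 2 * d / (n * (n - 1)))
```

For these two rankings the distance is 1 and both expressions for the correlation give 2/3, so the tied pair is handled consistently by the two measures.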
- Conference Article
5
- 10.1109/isit.2015.7282681
- Jun 1, 2015
Given low order moment information over the random variables X = (X_1, X_2, ..., X_p) and Y, what distribution minimizes the Hirschfeld-Gebelein-Rényi (HGR) maximal correlation coefficient between X and Y while remaining faithful to the given moments? The answer to this question is important especially in order to fit models over (X, Y) with minimum dependence among the random variables X and Y. In this paper, we investigate this question first in the continuous setting by showing that the jointly Gaussian distribution achieves the minimum HGR correlation coefficient among distributions with the given first and second order moments. Then, we pose a similar question in the discrete scenario by fixing the pairwise marginals of the random variables X and Y. Subsequently, we derive a lower bound for the HGR correlation coefficient over the class of distributions with fixed pairwise marginals. Then we show that this lower bound is tight if there exists a distribution with certain additive structure satisfying the given pairwise marginals. Moreover, the distribution with the additive structure achieves the minimum HGR correlation coefficient. Finally, we conclude by showing that the event of obtaining pairwise marginals containing an additive structured distribution has a positive Lebesgue measure over the probability simplex.
- Research Article
1
- 10.1142/s0218001411008452
- Feb 1, 2011
- International Journal of Pattern Recognition and Artificial Intelligence
Computer face recognition promises to be a powerful tool and is becoming important in our security-heightened world. Over the last decade, several research works have addressed face recognition based on appearance and on features such as intensity, color, texture, or shape. In those works, classification is mostly achieved by finding the minimum distance or maximum variance between the training and testing feature sets. This leads to wrong classification when an untrained or unknown image is presented, since the classification process always locates at least one winning cluster having minimum distance or maximum variance among the existing clusters. For security-related applications, however, such new facial images should be reported so that necessary action can be taken. In this paper, we propose the following two techniques for this purpose: (i) use a threshold value calculated as the average of the minimum matching distances of the wrong classifications encountered during the training phase; (ii) use the fact that a wrong classification increases the ratio of within-class distance to between-class distance. Experiments have been conducted using the ORL facial database and a fair comparison is made with the conventional feature spaces to show the efficiency of these techniques.
- Research Article
3
- 10.3760/cma.j.issn.1673-0860.2013.05.014
- May 1, 2013
- Chinese journal of otorhinolaryngology head and neck surgery
To discuss the relationship between structural change in nasal cavity and the change of nasal ventilatory function after outfracture of the inferior turbinate. The inferior turbinate outfracture surgery was performed on 50 chronic hypertrophic rhinitis patients who suffered inferior turbinate hypertrophy according to endoscopy and CT scan. Preoperative and postoperative nasal endoscopy was carried out on all patients, by which the distance from the inferior turbinate front mucous membrane to nasal septum (DTNS) was measured. In addition, CT was used to measure the minimal distance between the inside edges of the bilateral inferior turbinate soft tissue (MDTT) and the minimal distance between the bilateral inferior turbinate bones (MDTB) at the central layer of coronal sectional infundibulum; the minimal distance between the inferior turbinate at asial nasal limen (NLDT); inferior turbinate thickness (ITT). In this way, the change in the structure of nasal cavity was evaluated. Acoustic rhinometry and rhinomanometry were utilized to evaluate the ventilatory function of the nasal cavity objectively. Visual analogue scale (VAS) was applied to evaluate the severity of preoperative and postoperative nasal obstruction subjectively. The test data were analyzed with paired t-tests; Spearman rank correlation was adopted to evaluate the relationship between patients' bilateral VAS and nasal inspiratory effective resistance (IER), nasal expiratory effective resistance (EER) and DTNS. The relationship between the total resistance of nasal inspiratory phase as well as the total resistance of nasal expiratory phase and MDTT and MDTB was analyzed. SPSS 20.0 software was used to analyze the data. The preoperative data showed that rightward DTNS was (0.12 ± 0.07) cm, leftward DTNS was (0.10 ± 0.07) cm and MDTT was (0.70 ± 0.13) cm, and postoperative data showed that rightward DTNS was (0.47 ± 0.27) cm, leftward DTNS was (0.43 ± 0.15) cm, and MDTT was (1.05 ± 0.15) cm.
Significant differences existed in rightward DTNS, leftward DTNS and MDTT between pre- and postoperative data (t values were -8.827, -8.590, -17.525, all P < 0.05). According to the preoperative and postoperative comparison, the difference in MDTB, NLDT, rightward ITT, leftward ITT, IER, EER, 0-5 cm nasal cavity volume (0-5 cm NCV), nasal minimal cross-sectional area (NMCA), rightward VAS and leftward VAS had statistical significance (t values were -23.562, -8.374, 8.693, 6.684, 12.021, 14.510, -6.074, -2.285, 14.042 and 9.925, respectively, all P < 0.05). Patients' bilateral VAS grades had a positive relationship with IER and EER (left side: r values were 0.541 and 0.660, respectively; right side: r values were 0.940 and 0.688, respectively, all P < 0.05). Additionally, patients' VAS had a negative relationship with DTNS (r value was -0.861, P < 0.05). Besides, the total resistance of nasal inspiratory phase had a negative relationship with both MDTT and MDTB (r values were -0.565 and -0.546, respectively, all P < 0.05). The total resistance of nasal expiratory phase had a negative relationship with both MDTT and MDTB (r values were -0.562 and -0.546, all P < 0.05). The inferior turbinate outfracture surgery was an ideal surgical method by which nasal cavity could be broadened and nasal ventilatory function improved.
- Research Article
3
- 10.1186/s12863-020-00899-3
- Aug 26, 2020
- BMC Genetics
Background: Genome-wide association studies (GWAS) have successfully identified genetic susceptibility variants for complex diseases. However, the underlying mechanism of such association remains largely unknown. Most disease-associated genetic variants have been shown to reside in noncoding regions, leading to the hypothesis that regulation of gene expression may be the primary biological mechanism. Current methods to characterize gene expression mediating the effect of a genetic variant on diseases often analyze one gene at a time and ignore the network structure. The impact of a genetic variant can propagate to other genes along the links in the network, then to the final disease. There could be multiple pathways from the genetic variant to the final disease, each having a chain structure, since the first node is one specific SNP (Single Nucleotide Polymorphism) variant and the end is the disease outcome. One key but inadequately addressed question is how to measure the between-node connection strength and rank the effects of such chain-type pathways, which can provide statistical evidence for prioritizing some pathways for potential drug development in a cost-effective manner. Results: We first introduce the maximal correlation coefficient (MCC) to represent the between-node connection, and then integrate MCC with the K shortest paths algorithm to rank and identify the potential pathways from genetic variant to disease. The pathway importance score (PIS) is further provided to quantify the importance of each pathway. We term this method “MCC-SP”. Various simulations are conducted to illustrate that MCC is a better measurement of the between-node connection strength than other quantities including Pearson correlation, Spearman correlation, distance correlation, mutual information, and maximal information coefficient. Finally, we applied MCC-SP to analyze one real dataset from the Religious Orders Study and the Memory and Aging Project, and successfully detected 2 typical pathways from APOE genotype to Alzheimer’s disease (AD) through gene expression enriched in the Alzheimer’s disease pathway. Conclusions: MCC-SP has powerful and robust performance in identifying the pathway(s) from the genetic variant to the disease. The source code of MCC-SP is freely available at GitHub (https://github.com/zhuyuchen95/ADnet).