Analyzing Imputed Financial Data: A New Approach to Cluster Analysis

Ramon P Degennaro, Halima Bensmail

doi:10.2139/ssrn.594383

Open Access

Journal: SSRN Electronic Journal
Publication Date: Jan 1, 2004
Citations: 10
License type: other-oa

Affiliation: University of Tennessee at Knoxville

Abstract

The authors introduce a novel statistical modeling technique for cluster analysis and apply it to financial data. Their two main goals are to handle missing data and to find homogeneous groups within the data. Their approach is flexible and handles large and complex data structures with missing observations and with quantitative and qualitative measurements. The authors achieve this result by mapping the data to a new structure that is free of distributional assumptions in choosing homogeneous groups of observations. Their new method also provides insight into the number of different categories needed for classifying the data.

The authors use this approach to partition a matched sample of stocks. One group offers dividend reinvestment plans, and the other does not. Their method partitions this sample with almost 97 percent accuracy even when using only easily available financial variables. One interpretation of their result is that the misclassified companies are the best candidates either to adopt a dividend reinvestment plan (if they have none) or to abandon one (if they currently offer one). The authors offer other suggestions for applications in the field of finance.

JEL classification: G20, G29, G35

Key words: dividend reinvestment, Bayesian analysis, Gibbs sampler, clustering

1. Introduction

We introduce and apply a novel statistical approach to cluster analysis for financial data in this paper. We have two main goals. First, we wish to handle cases in which a subset of variables is missing for some observations. Second, we wish to find homogeneous groups within the data. Put differently, we want to determine the most likely number of categories comprising the data, and to assign observations to those categories optimally. Our approach is flexible in that it handles large and complex data structures with missing observations and with both quantitative and qualitative measurements.

We achieve this by mapping the data to a new structure that is free of distributional assumptions in choosing homogeneous groups of observations. For example, when processing credit card transactions of customers, a company may want to explore the possibility of encouraging different or additional transactions by those customers. In this case, the task is to find homogeneous transactions and to forecast the willingness of a new customer to use the credit card to make a different or additional transaction, even if the data are not continuous and even if there are missing data. Our new method also provides the researcher with insight into the number of different categories needed for classifying the data.

Classification methods have a long history of productive uses in business and finance. Perhaps the most common are discrete choice models. Among these, the multinomial logit approach has been used at least as far back as Holman and Marley (in Luce and Suppes, 1965). McFadden (1978) introduced the Generalized Extreme Value model in his study of residential location, and Koppelman and Wen (1997) have recently developed newer variations. The nested logit model of Ben-Akiva (1973) is designed to handle correlations among alternatives. Yet another variation of multinomial logit has been developed or used by Bierlaire, Lotan and Toint (1997). More recently, Calhoun and Deng (2000) use multinomial logit models to study loan terminations.

Another form of discrete choice model is cluster analysis. Shaffer (1991) offers one example. He studies federal deposit insurance funding and considers its influence on taxpayers. Dahlstedt, Salmi, Luoma, and Laakkonen (1994) use cluster analysis to demonstrate that comparing financial ratios across firms is problematic. They argue that care is necessary even when the firms belong to the same official International Standard Industrial Classification category. von Altrock (1995) explains how fuzzy logic, a variation of cluster analysis, can be useful in practical business applications. …
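
The abstract reports partition accuracy against a known grouping (firms with and without dividend reinvestment plans) and treats the misclassified firms as candidates to adopt or abandon a plan. The sketch below illustrates only that evaluation step, not the authors' clustering method. The function name, the firm names, and the labels are all hypothetical, and it assumes there are no more clusters than known groups. Because cluster ids are arbitrary, it tries every assignment of clusters to groups and keeps the one with the highest accuracy.

```python
from itertools import permutations


def best_label_match(true_labels, cluster_ids):
    """Return (accuracy, cluster->label mapping, misclassified indices).

    Hypothetical helper: assumes the number of distinct clusters does not
    exceed the number of distinct true labels.
    """
    labels = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    best = (-1.0, None, None)
    # Try each way of naming the clusters with distinct true labels.
    for perm in permutations(labels, len(clusters)):
        mapping = dict(zip(clusters, perm))
        wrong = [
            i
            for i, (t, c) in enumerate(zip(true_labels, cluster_ids))
            if mapping[c] != t
        ]
        acc = 1 - len(wrong) / len(true_labels)
        if acc > best[0]:
            best = (acc, mapping, wrong)
    return best


# Made-up toy data: 1 = firm offers a dividend reinvestment plan, 0 = it does not.
firms = ["A", "B", "C", "D", "E", "F"]
has_plan = [1, 1, 1, 0, 0, 0]
cluster = [0, 0, 1, 1, 1, 1]  # ids from some clustering run; numbering is arbitrary

accuracy, mapping, wrong = best_label_match(has_plan, cluster)
for i in wrong:
    action = "abandon its plan" if has_plan[i] == 1 else "adopt a plan"
    print(f"Firm {firms[i]} is misclassified: candidate to {action}")
print(f"accuracy = {accuracy:.3f}")
```

In this toy example, firm C offers a plan but is grouped with the firms that do not, so under the abstract's interpretation it would be a candidate to abandon its plan.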
