Artificial neural networks can find the underlying model even in complex, highly nonlinear, and highly coupled problems, and have therefore found significant use as prediction engines in many domains. However, when the input space is of high dimensionality, there remains the unsolved problem of reducing dimensionality in some optimal way such that the Shannon information important to the prediction is preserved. This information may be a subset of the total information, with an unknown partition and unknown coupling, and may be linear or nonlinear in nature. Solving this problem is an important step in classes of machine learning problems and in many data mining applications. This paper describes a semi-automatic algorithm that was developed over a 5-year period while solving problems of increasing dimensionality and difficulty: (a) flow prediction for a magnetically levitated artificial heart (13 dimensions), (b) simultaneous chemical identification/concentration in gas chromatography (22 detection dimensions with wavelet-compressed time series of 180,000 points), and (c) financial analytics portfolio prediction in credit card and subprime debt problems (80 to 300 dimensions of sparse data with a portfolio value of approximately US$300 million). The algorithm develops a map of input space combinations and their importance to the prediction. This information is used directly to construct the optimal neural network topology for a given error performance. The algorithm also produces information that shows whether the space between input nodes is linear or nonlinear, an important parameter in determining the number of training points required in the reduced-dimensionality training set. Software was developed in the MATLAB environment using the Artificial Neural Network Toolbox and the Parallel and Distributed Computing toolboxes, and runs on Windows- or Linux-based supercomputers. 
Trained neural networks can be compiled and linked to server applications and run on normal servers or clusters for transaction or web-based processing. This paper shows the application of the algorithm to two separate financial analytics prediction problems with large dimensionality and sparse data sets. The algorithm is a significant development in machine learning for an important class of problems in prediction, clustering, image analysis, and data mining. In the first example application, subprime debt portfolio analysis, the neural network achieved a 98.4% prediction rate, compared with a 33% rate using traditional linear methods. In the second example application, credit card debt, the algorithm achieved 95% prediction accuracy (in terms of match rate), which is 10% better than the other methods we compared against, primarily logistic regression.