Approaches to data analysis in the context of Business Intelligence solutions are presented for cases where the data is scarce relative to the needs of the analysis. Several scenarios are presented: using an initial dataset obtained from primary data as a reference for the quality of the results, enriching the dataset through decoration with derived attributes, and enriching the dataset with external data. Each type of dataset decoration is used to improve the quality of the analysis results. After improvement with the presented methods, the dataset contains a large number of attributes about a subject. Because some attributes refer to, or imply, sensitive information about the subject, dataset storage needs to prevent unwanted analysis that could reveal such information. A method for dataset partitioning is presented, based on the predictive capacity of a set of attributes over a sensitive attribute. The proposed partitioning also includes means to hide the link between the real subject and the stored data.

Keywords: Business Intelligence, Data Mining, Security, Privacy, Dataset Partitioning, Secret Sharing

1 Introduction

Business Intelligence (BI) helps the decision-making process. It relies on various data to offer reports, estimations and support [1]. At the base of BI there are mechanisms for data processing. The results of data analysis depend heavily on the hypotheses made for the analysis, the quality of the data, and the algorithms used for processing. There are areas where the available data is not sufficient for a proper analysis, or where there is room for improvement. Such cases are discussed in the following. The presented methods improve the results of the analysis. Some derived information is sensitive.
We take into account that unauthorized access to the dataset could allow an analysis that discloses sensitive information about the subject represented by a certain instance in the dataset. Security must be assessed, and a solution is needed to cover this risk.

2 Problem Formulation of Security Issues Generated by Improved Prediction Using Enlarged Datasets

Various situations arise when BI tries to solve difficult problems that heavily impact the organization. In the telecom industry, efforts are made to predict and prevent the migration of existing customers to another operator in the market. In electronic commerce, efforts are made to predict what clients need and to send them incentives, offers and bonuses. In stock markets and currency markets, predictions are made to estimate quotes and exchange rates.

Business intelligence draws its power from several aspects:
* the available data used to derive useful information for the business;
* the instruments used to process the available data;
* the instruments used to present the results to the decision maker.

The available data is mainly taken from the records made by the organization, from internal documents such as contracts, orders, invoices etc., and from activity history. The available data depends on the degree to which electronic support is used instead of manual operations. Ideally, data is stored in files or databases and is readily available when needed.

The instruments used to process data serve to:
* transform primary data according to various needs;
* apply algorithms in order to obtain results.

The power of such instruments is given by the flexibility in accessing heterogeneous data and by the quality and variety of the algorithms used to obtain results.

The instruments used to present the results to the decision maker have to hide the complex processing done by the previous type of instruments and show useful information in a convenient way.
Reports have to be easy to understand, and conclusions must be highlighted in order to bring business value to the organization.

A common case is when a certain variable Y is studied. …