Symmetric Data Sets Research Articles

Whole Slide Image (WSI) datasets are gigapixel-resolution, unstructured histopathology datasets that consist of extremely large files (each can reach multiple GBs even in compressed format). These datasets are useful in a wide range of diagnostic and investigative pathology applications. However, they present unique challenges: the size of the files, proprietary data formats, and the lack of efficient parallel data access libraries limit the scalability of these applications. Commercial clouds provide dynamic, cost-effective, scalable infrastructure to process these datasets, but we lack tools and algorithms that transfer and transform them onto the cloud seamlessly while providing faster speeds and scalable formats. In this study, we present novel algorithms that transfer these datasets onto the cloud while simultaneously transforming them into symmetric, scalable formats. Our algorithms use intelligent file size distribution and pipeline the transfer and transformation tasks without introducing extra overhead to the underlying system. The algorithms, tested in the Amazon Web Services (AWS) cloud, outperform widely used transfer tools and algorithms as well as our previous work. Data access to the transformed datasets also performs better than in related work. The transformed symmetric datasets are fed into three different analytics applications: a distributed implementation of a content-based image retrieval (CBIR) application for prostate carcinoma datasets, a deep convolutional neural network application for classification of breast cancer datasets, and, to show that the algorithms can work with any spatial dataset, a Canny edge detection application on satellite image datasets. Although different in nature, all of the applications work easily with our new symmetric data format, and performance results show near-linear speed-ups as the number of processors increases.
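The abstract above does not spell out how its "intelligent file size distribution" works. As a rough illustration of the general idea of balancing work by file size, the sketch below uses a standard greedy longest-first heuristic: files are sorted by size and each one is handed to the worker with the smallest load so far. This is an assumed stand-in, not the paper's algorithm, and the function name `balance_by_size` and the file names are hypothetical.

```python
import heapq


def balance_by_size(file_sizes, n_workers):
    """Assign files to workers with a greedy longest-first heuristic.

    file_sizes: dict mapping a file name to its size (any consistent unit).
    Returns (assignment, loads), both keyed by worker id.
    NOTE: an illustrative assumption, not the method from the abstract.
    """
    # Min-heap of (current load, worker id): the least-loaded worker pops first.
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}

    # Place the largest files first so the small ones can even out the loads.
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(name)
        heapq.heappush(heap, (load + size, worker))

    loads = {w: sum(file_sizes[n] for n in names) for w, names in assignment.items()}
    return assignment, loads


if __name__ == "__main__":
    # Hypothetical slide files with sizes in GB.
    sizes = {"slide_a": 8, "slide_b": 7, "slide_c": 6, "slide_d": 5, "slide_e": 4}
    assignment, loads = balance_by_size(sizes, n_workers=2)
    print(assignment)
    print(loads)
```

Sorting by size first keeps a single multi-GB file from landing on an already busy worker late in the run, which is the kind of skew a size-aware distribution is meant to avoid.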


The present paper aims to improve the collective interpretation obtained by compressing multi-layered neural networks and to make that interpretation as natural and stable as possible. We collectively interpret the final representations by maximizing mutual information between inputs and neurons, expecting that mutual information maximization can disentangle complex features into simpler ones. However, for several data sets we have had difficulty increasing mutual information and obtaining interpretable features. By closely examining the process of information maximization, we found that we also need to consider the cost associated with it. Thus, we try to maximize not simply mutual information but the ratio of mutual information to its cost, a method that can be called “cost-conscious mutual information maximization.” The cost-conscious method aims to extend Linsker’s maximum information preservation principle to a variety of data sets by taking the cost associated with information maximization into account more directly. The method was applied to two data sets: an artificial symmetric data set and a credit default data set. On the symmetric data set injected with random noise, the cost-conscious information maximization method extracted the symmetric property almost perfectly despite the noise. On the credit default data set, the method yielded the most natural interpretation of the final results, showing why and how credit default could occur. The experimental results show that neural networks can be used to interpret data sets more naturally than conventional methods such as logistic regression analysis.
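The ratio described in the abstract above can be illustrated numerically. The sketch below computes mutual information between input patterns and neurons from non-negative activations, assuming equiprobable input patterns and treating each row of normalized activations as p(neuron | pattern). The abstract does not define the cost, so the sum of absolute weights used here is purely an assumption, as are the function names.

```python
import math


def mutual_information(activations):
    """Mutual information (in nats) between input patterns and neurons.

    activations: list of rows, one row of non-negative neuron outputs per
    input pattern; every row must have a positive sum.
    Assumes each input pattern is equally likely.
    """
    n_patterns = len(activations)
    n_neurons = len(activations[0])
    p_s = 1.0 / n_patterns

    # p(j | s): normalize each pattern's activations to sum to one.
    p_j_given_s = [[a / sum(row) for a in row] for row in activations]
    # p(j): marginal firing probability of neuron j over all patterns.
    p_j = [p_s * sum(row[j] for row in p_j_given_s) for j in range(n_neurons)]

    mi = 0.0
    for row in p_j_given_s:
        for j, p in enumerate(row):
            if p > 0:  # terms with p = 0 contribute nothing
                mi += p_s * p * math.log(p / p_j[j])
    return mi


def information_cost_ratio(activations, weights):
    """Ratio of mutual information to an ASSUMED cost (sum of |weights|)."""
    cost = sum(abs(w) for row in weights for w in row)
    return mutual_information(activations) / cost


if __name__ == "__main__":
    # Each pattern activates exactly one neuron: maximal information (log 2).
    selective = [[1.0, 0.0], [0.0, 1.0]]
    # Every neuron responds equally to every pattern: zero information.
    uniform = [[1.0, 1.0], [1.0, 1.0]]
    weights = [[1.0, -1.0]]  # hypothetical connection weights
    print(mutual_information(selective), mutual_information(uniform))
    print(information_cost_ratio(selective, weights))
```

Under this assumed cost, two networks with the same mutual information are ranked by how small their weights are, which is one plausible reading of maximizing information relative to its cost.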



Articles published on Symmetric Data Sets

Calibrated EWMA estimators for time-scaled surveys with diverse applications

A flexible generalized XLindley distribution with application to engineering

Analysis, Estimation, and Practical Implementations of the Discrete Power Quasi‐Xgamma Distribution

Optimizing Mean Estimators with Calibrated Minimum Covariance Determinant in Median Ranked Set Sampling

A Novel Odd Beta Prime-Logistic Distribution: Desirable Mathematical Properties and Applications to Engineering and Environmental Data

Calibration Estimation of Cumulative Distribution Function Using Robust Measures

Innovations in Urdu Sentiment Analysis Using Machine and Deep Learning Techniques for Two-Class Classification of Symmetric Datasets

A Generalization of Burr Type XII Distribution with Properties, Copula and Modeling Symmetric and Skewed Real Data Sets

Cost-forced and repeated selective information minimization and maximization for multi-layered neural networks

A Deep Learning Model for Network Intrusion Detection with Imbalanced Data

Performance-efficient distributed transfer and transformation of big spatial histopathology datasets in the cloud

A discrete analogue of odd Weibull-G family of distributions: properties, classical and Bayesian estimation with applications to count data

A Flexible Extension to an Extreme Distribution

A New Parametric Life Family of Distributions: Properties, Copula and Modeling Failure and Service Times

A Generalization of Binomial Exponential-2 Distribution: Copula, Properties and Applications

A Generalization of Reciprocal Exponential Model: Clayton Copula, Statistical Properties and Modeling Skewed and Symmetric Real Data Sets

Cost-conscious mutual information maximization for improving collective interpretation of multi-layered neural networks

A New Flexible Three-Parameter Model: Properties, Clayton Copula, and Modeling Real Data

Generalized inverted Kumaraswamy generated family of distributions: theory and applications

Axially Symmetric Data Clustering Through Dirichlet Process Mixture Models of Watson Distributions.
