Improved Techniques for Training Tabular GANs Using Cramer’s V Statistics

Given the growing global demand for machine learning training data, synthetic data generation is a reasonable way to address the varied challenges of data acquisition. The Conditional Tabular Generative Adversarial Network (CTGAN), an extension of the widely used Generative Adversarial Network (GAN), is considered one of the most promising techniques for tabular data generation. Despite CTGAN's numerous successes, it has been found to preserve categorical dependencies within the data poorly. In prior work, Cramer’s V (CV), a natural metric for the association between categorical variables, was proposed for hyperparameter tuning of CTGAN models. In this paper, we explore two novel strategies that integrate CV statistics of data batches directly into CTGAN training. The first is a generator loss term that penalizes differences between the CV statistics of the original and generated data. The second is the extraction of the CV matrix as an additional feature for the critic. Applying the proposed methods to three benchmark datasets, we improve the averaged accuracy of supervised learning models trained on synthesized data by 11% compared to the legacy CTGAN. We also outline the impact of CV statistics on preserving dependencies between categorical data columns in terms of integrity and contingency similarity, discuss existing challenges, and identify potential improvements.
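As a rough illustration of the statistic the abstract refers to, the sketch below computes Cramer's V for two categorical columns, assembles the pairwise CV matrix, and measures the mean absolute gap between the matrices of a real and a generated batch. The function names (`cramers_v`, `cv_matrix`, `cv_penalty`) and the choice of a mean absolute difference are assumptions made for this example, not the paper's implementation; a loss term used during training would additionally need a differentiable formulation, which this NumPy version does not provide.

```python
import numpy as np


def cramers_v(x, y):
    """Cramer's V between two categorical sequences (no bias correction)."""
    _, xi = np.unique(np.asarray(x), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    xi, yi = xi.ravel(), yi.ravel()
    r, c = int(xi.max()) + 1, int(yi.max()) + 1
    if min(r, c) < 2:
        return 0.0  # a constant column carries no association
    # Contingency table of joint category counts.
    table = np.zeros((r, c))
    np.add.at(table, (xi, yi), 1)
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * (min(r, c) - 1))))


def cv_matrix(columns):
    """Symmetric matrix of pairwise Cramer's V values for a list of columns."""
    k = len(columns)
    m = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            m[i, j] = m[j, i] = cramers_v(columns[i], columns[j])
    return m


def cv_penalty(real_columns, fake_columns):
    """Hypothetical penalty: mean absolute gap between the two CV matrices."""
    return float(np.abs(cv_matrix(real_columns) - cv_matrix(fake_columns)).mean())


# Real batch: the two columns are perfectly dependent.
real = [["a", "b", "a", "b"], ["x", "y", "x", "y"]]
# Generated batch: the two columns are independent.
fake = [["a", "a", "b", "b"], ["x", "y", "x", "y"]]
print(cv_matrix(real))
print(cv_penalty(real, fake))
```

A penalty of zero means the generated batch reproduces the pairwise categorical associations of the real batch; larger values indicate that dependencies were lost or invented.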

Open Access
Liver Segmentation in Ultrasound Images Using Self-Supervised Learning with Physics-inspired Augmentation and Global-Local Refinement

Shear Wave Elastography (SWE) is a non-invasive ultrasound method that evaluates changes in liver stiffness, which serves as a useful biomarker for liver fibrosis. Proper placement of a region of interest (ROI) on the liver in the B-mode image is imperative for obtaining accurate and dependable SWE results. The crucial first step toward an automated SWE-based system for liver fibrosis measurement is segmentation of the liver capsule. This paper presents a novel approach to liver segmentation in ultrasound images based on contrastive self-supervised learning. The proposed method leverages a large dataset of unannotated abdominal ultrasound images to learn feature representations, which are then fine-tuned on the downstream task of liver segmentation. The algorithm is trained in two stages: in the first stage, a SimCLR model is trained to learn feature representations from unlabeled data, and in the second stage, these representations are fine-tuned on a smaller annotated dataset of liver segmentation masks. This is followed by a refinement step using CascadePSP. The study also investigates the use of physics-inspired augmentations, such as sector angle and penetration, to improve the performance of the deep learning model on ultrasound images. The proposed SimCLR+ENet approach was compared against the state-of-the-art method U-Net. In terms of average Dice similarity, SimCLR+ENet outperformed U-Net with 90.58% compared to 89.77%. Similarly, in terms of average Hausdorff distance, SimCLR+ENet achieved superior performance with a value of 21.71 compared to U-Net’s 29.53. This highlights the effectiveness of the proposed approach, with relative improvements of 0.9% and 26.5% in the average Dice coefficient and average Hausdorff distance, respectively.
The study provides insights into the use of physics-inspired augmentations in the medical ultrasound imaging field and highlights the potential for self-supervised learning in improving segmentation results.
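For readers unfamiliar with the reported metrics, the following is a minimal sketch of the Dice coefficient on binary masks, together with the arithmetic behind the quoted relative improvements. The helper names (`dice`, `relative_change_pct`) and the toy masks are illustrative assumptions; only the four score values are taken from the abstract.

```python
import numpy as np


def dice(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement here
    return float(2.0 * np.logical_and(pred, target).sum() / total)


def relative_change_pct(baseline, proposed):
    """Relative change of `proposed` with respect to `baseline`, in percent."""
    return abs(proposed - baseline) / baseline * 100.0


# Toy 2x3 masks: 2 overlapping pixels, 3 predicted, 3 in the ground truth.
pred = [[1, 1, 0], [1, 0, 0]]
target = [[1, 1, 0], [0, 1, 0]]
print(dice(pred, target))

# Scores quoted in the abstract (U-Net as baseline, SimCLR+ENet as proposed).
print(round(relative_change_pct(89.77, 90.58), 1))  # Dice, higher is better
print(round(relative_change_pct(29.53, 21.71), 1))  # Hausdorff, lower is better
```

Both percentages are relative to the U-Net baseline, which is why a 0.81-point gain in Dice corresponds to roughly 0.9%.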

Open Access
Detecting Malicious .NET Files Using CLR Header Features and Machine Learning

The .NET Framework has made writing Windows applications easier than ever. Several programming languages can be used to write software with the .NET Framework, the most common being C#. Because its abundance of modules and pre-built functionality lets programmers manipulate the Windows operating system at a high level of abstraction, with no need for low-level coding, the .NET Framework has also become a desirable environment for malicious actors writing malware. To the best of our knowledge, researchers have so far treated .NET malware like any other malware, classifying files using features from the PE header. This does not work for .NET files because their PE headers are nearly identical. In this paper, we tackle the problem of detecting malicious .NET files by extracting features from the CLR header. As far as we know, we are the first to explore this approach. Furthermore, we create a new dataset comprising .NET malware and benign files, which we freely distribute to the research community. Finally, we assess the performance of several machine learning algorithms in detecting malicious .NET files. The random forest model was the best of the algorithms tested, exhibiting a performance of 92% on this predictive task.
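To make the idea of CLR header features concrete, here is a minimal sketch that decodes the 72-byte CLR (COR20) header into a flat dictionary of numeric values. The field layout follows the standard `IMAGE_COR20_HEADER` structure; the feature names, the selection of flag bits, and the function `parse_clr_header` are assumptions for illustration, since the abstract does not list the features the authors used. Locating the header inside a real file (via data directory entry 14 of the PE optional header) is omitted, so the example runs on a synthetic header.

```python
import struct

# cb, major/minor runtime version, then eight (uint32, uint32) pairs:
# MetaData, (Flags, EntryPointToken), Resources, StrongNameSignature,
# CodeManagerTable, VTableFixups, ExportAddressTableJumps, ManagedNativeHeader.
COR20_FORMAT = "<IHH" + "II" * 8
COR20_SIZE = struct.calcsize(COR20_FORMAT)  # 72 bytes

FIELD_NAMES = [
    "cb", "major_runtime_version", "minor_runtime_version",
    "metadata_rva", "metadata_size",
    "flags", "entry_point_token",
    "resources_rva", "resources_size",
    "strong_name_rva", "strong_name_size",
    "code_manager_rva", "code_manager_size",
    "vtable_fixups_rva", "vtable_fixups_size",
    "export_jumps_rva", "export_jumps_size",
    "managed_native_rva", "managed_native_size",
]

# A few well-known COMIMAGE_FLAGS_* bits.
FLAG_BITS = {
    "flag_il_only": 0x1,
    "flag_32bit_required": 0x2,
    "flag_strong_name_signed": 0x8,
    "flag_native_entry_point": 0x10,
}


def parse_clr_header(raw):
    """Decode raw CLR header bytes into a dict of numeric features."""
    if len(raw) < COR20_SIZE:
        raise ValueError("CLR header needs at least %d bytes" % COR20_SIZE)
    values = struct.unpack(COR20_FORMAT, raw[:COR20_SIZE])
    features = dict(zip(FIELD_NAMES, values))
    for name, bit in FLAG_BITS.items():
        features[name] = int(bool(features["flags"] & bit))
    return features


# Synthetic header for demonstration only (not taken from a real file).
sample = struct.pack(
    COR20_FORMAT,
    72, 2, 5,            # cb, runtime version 2.5
    0x2000, 0x400,       # metadata RVA and size
    0x1, 0x06000001,     # flags (IL only), entry point token
    *([0] * 12),         # remaining directories left empty
)
features = parse_clr_header(sample)
print(features["metadata_size"], features["flag_il_only"])
```

A dictionary like this could be turned into one row of a feature table and passed to a classifier such as a random forest; which fields are actually discriminative is a question the paper itself addresses.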

Open Access