Benchmark Datasets Research Articles

Evaluating deformable image registration (DIR) algorithms is vital for improving algorithm performance and gaining clinical acceptance. However, dependable DIR benchmark datasets are lacking for anatomical sites other than the lung. To address this gap, we introduce a comprehensive liver computed tomography (CT) DIR landmark dataset library, designed for efficient and quantitative evaluation of DIR methods on liver CTs. Forty CT liver image pairs were acquired from several publicly available image archives and the authors' institutions under institutional review board (IRB) approval. The images were processed with a semi-automatic procedure to generate landmark pairs: (1) for each case, liver vessels were automatically segmented on one image; (2) landmarks were automatically detected at vessel bifurcations; (3) corresponding landmarks in the second image were placed using two DIR methods to avoid algorithm-specific biases; (4) a comprehensive validation process based on quantitative evaluation and manual assessment was applied to reject outliers and ensure the landmarks' positional accuracy. This workflow resulted in an average of approximately 56 landmark pairs per image pair, comprising a total of 2220 landmarks for 40 cases. The general landmarking accuracy of this procedure was evaluated using digital phantoms and manual landmark placement. The landmark pair target registration errors (TREs) on digital phantoms were 0.37 ± 0.26 mm and 0.55 ± 0.34 mm, respectively, for the two DIR algorithms used in our workflow, with 97% of landmark pairs having TREs below 1.5 mm. The distances from the calculated landmarks to the averaged manual placement were 1.27 ± 0.79 mm. All data, including image files and landmark information, are publicly available at Zenodo (https://zenodo.org/records/13738577). Instructions for using our data can be found on our GitHub page at https://github.com/deshanyang/Liver-DIR-QA. The landmark dataset generated in this work is the first collection of large-scale liver CT DIR landmarks prepared on real patient images. It can provide researchers with a dense set of ground truth benchmarks for the quantitative evaluation of DIR algorithms within the liver.
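As a rough illustration of how a landmark-based target registration error such as the one quoted in the abstract above is typically computed, the following minimal sketch measures the Euclidean distance between each reference landmark and its registered counterpart. The function names, the voxel-index input convention, the `spacing` parameter, and the use of the population standard deviation are assumptions made for this example; they are not taken from the dataset's own tooling.

```python
import math
import statistics


def target_registration_errors(reference_points, registered_points,
                               spacing=(1.0, 1.0, 1.0)):
    """Return the per-landmark Euclidean distance in mm.

    Both inputs are sequences of (x, y, z) voxel coordinates (assumed
    convention); `spacing` is the voxel size in mm along each axis.
    """
    if len(reference_points) != len(registered_points):
        raise ValueError("landmark lists must have the same length")
    errors = []
    for ref, reg in zip(reference_points, registered_points):
        squared = sum(((a - b) * s) ** 2 for a, b, s in zip(ref, reg, spacing))
        errors.append(math.sqrt(squared))
    return errors


def summarize_tre(errors, threshold_mm=1.5):
    """Summarize errors as mean, population std, and fraction under a threshold."""
    return {
        "mean_mm": statistics.mean(errors),
        "std_mm": statistics.pstdev(errors),
        "fraction_below": sum(e < threshold_mm for e in errors) / len(errors),
    }


# Hypothetical landmark pairs, for illustration only.
reference = [(10, 10, 10), (20, 20, 20), (30, 30, 30)]
registered = [(10, 10, 11), (20, 22, 20), (33, 34, 30)]
errors = target_registration_errors(reference, registered)
print(errors)
print(summarize_tre(errors))
```

With real data, `spacing` would come from the CT image header, and the landmark coordinates from the files distributed with the dataset.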

In today’s digital era, the abundance of online services, from streaming platforms to e-commerce websites, presents users with a daunting array of choices and leads to decision fatigue. Recommendation algorithms help users navigate these options, and collaborative filtering (CF) is among the most prevalent techniques. However, CF faces several challenges, including scalability issues, privacy implications, and the well-known cold start problem. This study aims to mitigate the cold start problem by applying natural language processing (NLP) to user-generated reviews. It introduces a methodology that integrates supervised and unsupervised NLP approaches implemented with scikit-learn and evaluated on benchmark datasets across diverse domains. By combining NLP with machine learning and collaborative filtering techniques, the approach addresses data sparsity, and the study emphasizes reproducibility, accuracy, scalability, and real-world relevance while improving personalization in recommendation models. The proposed NLP-based strategy enhances the quality of user-generated content, consequently refining the accuracy of Collaborative Filtering-Based Recommender Systems (CFBRSs). The authors conducted experiments to test the performance of the proposed approach on benchmark datasets including MovieLens, Jester, Book-Crossing, Last.fm, Amazon Product Reviews, Yelp, Netflix Prize, Goodreads, IMDb (Internet Movie Database) Data, CiteULike, Epinions, and Etsy, measuring global accuracy, global loss, F1 score, and AUC (area under the curve). Assessment with random forest, Naïve Bayes, and logistic regression on heterogeneous benchmark datasets indicates that random forest is the most effective method, achieving an accuracy exceeding 90%. Further, the proposed approach achieved a global accuracy above 95%, a global loss of 1.50%, an F1 score of 0.78, and an AUC of 92%. In addition, the experiments conducted with distributed and global differential privacy (GDP) further optimize the system’s efficacy.
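The abstract above reports accuracy, F1 score, and AUC but does not say how they were computed. The sketch below shows the standard binary-classification definitions of these three metrics in plain Python; the function names and the example labels and scores are hypothetical and are not taken from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = relevant item) and classifier outputs.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
scores = [0.9, 0.2, 0.4, 0.8, 0.6]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

The pairwise AUC formulation is quadratic in the number of samples, which is fine for a small illustration but not for large benchmark datasets, where a sort-based implementation would be preferable.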

Articles published on Benchmark Datasets

A molecular video-derived foundation model for scientific drug discovery

MACHINE LEARNING, DEEP LEARNING AND ENSEMBLE LEARNING BASED APPROACHES FOR INTRUSION DETECTION ENHANCEMENT

Deep emotion recognition in textual conversations: a survey

LEVERAGING CORPUS LINGUISTICS AND DATA-DRIVEN DEEP LEARNING FOR TEXTUAL EMOTION ANALYSIS

Fusing semantic and syntactic information for aspect sentiment triplet extraction

Take good care of your fish: fish re-identification with synchronized multi-view camera system

A vessel bifurcation liver CT landmark pair dataset for evaluating deformable image registration algorithms.

ChemXTree: A Feature-Enhanced Graph Neural Network-Neural Decision Tree Framework for ADMET Prediction.

StreaMD: the toolkit for high-throughput molecular dynamics simulations.

A tied-weight autoencoder for the linear dimensionality reduction of sample data

Large-Kernel Central Block Masked Convolution and Channel Attention-Based Reconstruction Network for Anomaly Detection of High-Resolution Hyperspectral Imagery

Positive Unlabeled Learning Selected Not At Random (PULSNAR): class proportion estimation without the selected completely at random assumption

SimCDL: A Simple Framework for Contrastive Dictionary Learning

Structure-aware annotation of leucine-rich repeat domains.

Natural Language Processing and Machine Learning-Based Solution of Cold Start Problem Using Collaborative Filtering Approach

AI4ACEIP: A Computing Tool to Identify Food Peptides with High Inhibitory Activity for ACE by Merged Molecular Representation and Rich Intrinsic Sequence Information Based on an Ensemble Learning Strategy.

AdaGIN: Adaptive Graph Interaction Network for Click-Through Rate Prediction

Revisiting representation learning of color information: Color medical image segmentation incorporating quaternion

NFSA-DTI: A Novel Drug–Target Interaction Prediction Model Using Neural Fingerprint and Self-Attention Mechanism

Two-Stream Modality-Based Deep Learning Approach for Enhanced Two-Person Human Interaction Recognition in Videos

