Terms Of F1 Research Articles

Context: Previous studies have shown that steered training data or dataset selection can lead to better performance for cross project defect prediction (CPDP). On the other hand, feature selection and data quality are issues to consider in CPDP. Objective: We aim at utilizing the Nearest Neighbor (NN)-Filter, embedded in a genetic algorithm, to produce validation sets for generating evolving training datasets to tackle CPDP while accounting for potential noise in defect labels. We also investigate the impact of using different feature sets. Method: We extend our proposed approach, Genetic Instance Selection (GIS), by incorporating feature selection in its setting. We use 41 releases of 11 multi-version projects to assess the performance of GIS in comparison with benchmark CPDP approaches (NN-Filter and Naive-CPDP) and within project approaches (Cross-Validation (CV) and Previous Releases (PR)). To assess the impact of feature sets, we use two sets of features, SCM+OO+LOC (all) and CK+LOC (ckloc), as well as iterative info-gain subsetting (IG) for feature selection. Results: The GIS variant with info-gain feature selection is significantly better than NN-Filter (all, ckloc, IG) in terms of F1 (p-values ≪ 0.001, Cohen's d = {0.621, 0.845, 0.762}) and G (p-values ≪ 0.001, Cohen's d = {0.899, 1.114, 1.056}), and than Naive-CPDP (all, ckloc, IG) in terms of F1 (p-values ≪ 0.001, Cohen's d = {0.743, 0.865, 0.789}) and G (p-values ≪ 0.001, Cohen's d = {1.027, 1.119, 1.050}). Overall, the performance of GIS is comparable to that of the within project defect prediction (WPDP) benchmarks, i.e. CV and PR. In the multiple comparisons test, all variants of GIS belong to the top-ranking group of approaches. Conclusions: We conclude that datasets obtained from search-based approaches combined with feature selection techniques are a promising way to tackle CPDP. In particular, the performance comparison with the within project scenario encourages further investigation of our approach. However, the performance of GIS is based on high recall at the expense of a loss in precision. Using different optimization goals and utilizing other validation datasets and other feature selection techniques are possible future directions to investigate.
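The trade-off noted at the end of this abstract (high recall bought with lower precision) can be illustrated with a minimal sketch of F1 as the harmonic mean of precision and recall. The function name and the confusion counts below are hypothetical and are not taken from the study.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical predictor A: finds most defects but raises many false alarms.
high_recall = precision_recall_f1(tp=90, fp=110, fn=10)
# Hypothetical predictor B: balanced precision and recall.
balanced = precision_recall_f1(tp=60, fp=40, fn=40)

print("high recall:", high_recall)
print("balanced:   ", balanced)
```

With these illustrative counts both predictors reach roughly the same F1 although their precision and recall differ substantially, which is why an F1 value is best read alongside its two components.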

The identification of semantic relationships, as expressed between named entities in text, is an important step for extracting knowledge from large document collections, such as the Web. Previous works have addressed this task for the English language through supervised learning techniques for automatic classification. The current state of the art involves the use of learning methods based on string kernels. However, such approaches require manually annotated training data for each type of semantic relationship, and have scalability problems when tens or hundreds of different types of relationships have to be extracted. This article discusses an approach for distantly supervised relation extraction over texts written in the Portuguese language, which uses an efficient technique for measuring similarity between relation instances, based on minwise hashing and on locality-sensitive hashing. In the proposed method, the training examples are automatically collected from Wikipedia, corresponding to sentences that express semantic relationships between pairs of entities extracted from DBpedia. These examples are represented as sets of character quadgrams and other representative elements. The sets are indexed in a data structure that implements the idea of locality-sensitive hashing. To check which semantic relationship is expressed between a given pair of entities referenced in a sentence, the most similar training examples are retrieved, based on an approximation to the Jaccard coefficient obtained through min-hashing. The relation class is assigned based on the weighted votes of the most similar examples. Tests with a dataset from Wikipedia validate the suitability of the proposed method, showing, for instance, that the method is able to extract 10 different types of semantic relations, 8 of them corresponding to asymmetric relations, with an average score of 55.6%, measured in terms of F1.
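The pipeline described in this abstract (character quadgrams, a min-hash approximation of the Jaccard coefficient, and weighted voting among the most similar examples) can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it compares the query against every training example instead of using a locality-sensitive hashing index, it uses only quadgrams as set elements, and the function names, the number of hash functions, the value of `k`, and the toy sentences and labels are all assumptions.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def quadgrams(text: str) -> Set[str]:
    """Set of character 4-grams; short texts fall back to the whole string."""
    if len(text) < 4:
        return {text}
    return {text[i:i + 4] for i in range(len(text) - 3)}


def _hash(seed: int, item: str) -> int:
    """Seeded 64-bit hash built from BLAKE2b (one 'hash function' per seed)."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items: Set[str], num_hashes: int = 64) -> List[int]:
    """For each seeded hash function, keep the minimum hash value over the set."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching signature positions approximates the Jaccard coefficient."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def classify(sentence: str, training: List[Tuple[str, str]], k: int = 3) -> str:
    """Assign the label with the largest similarity-weighted vote among the k nearest examples."""
    query_sig = minhash_signature(quadgrams(sentence))
    scored = [
        (estimate_jaccard(query_sig, minhash_signature(quadgrams(text))), label)
        for text, label in training
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes: Dict[str, float] = defaultdict(float)
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return max(votes, key=votes.get)


# Toy, hypothetical training data (the real method collects sentences from Wikipedia).
TRAINING = [
    ("Lisbon is the capital of Portugal", "capital-of"),
    ("Madrid is the capital of Spain", "capital-of"),
    ("Pessoa was born in Lisbon", "born-in"),
    ("Saramago was born in Azinhaga", "born-in"),
]

print(classify("Paris is the capital of France", TRAINING))
```

In a setting with many training examples, the exhaustive comparison in `classify` is what a locality-sensitive hashing index would replace, by retrieving only candidates whose signatures collide in at least one band.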

Articles published on Terms Of F1

Multitask defect prediction

Mining Fraudsters and Fraudulent Strategies in Large-Scale Mobile Social Networks

Factor Graph Model Based User Profile Matching Across Social Networks

A new sequence based encoding for prediction of host–pathogen protein interactions

Predicting Effectiveness of Generate-and-Validate Patch Generation Systems Using Random Forest

A novel sentence similarity model with word embedding based on convolutional neural network

A benchmark study on the effectiveness of search-based data selection and feature selection for cross project defect prediction

Shot boundary detection using perceptual and semantic information

A machine‐learning approach to negation and speculation detection for sentiment analysis

SUPERVISED TERM WEIGHTING METHODS FOR URL CLASSIFICATION

Analyzing the Cognitive Level of Classroom Questions Using Machine Learning Techniques

Extracção de Relações Semânticas de Textos em Português Explorando a DBpédia e a Wikipédia

Transport Properties of Topological Insulator-Based Ferromagnet/f-Wave Superconductor Junction

RSVD-based Dimensionality Reduction for Recommender Systems

On Fiber Coefficients of Fiber Cones

A parametric area function model of three female vocal tracts based on orthogonal modes

Multiply warped products with nonsmooth metrics

MONITORING THROMBIN GENERATION WITH PROTHROMBIN FRAGMENT 1.2 ASSAY DURING CARDIOPULMONARY BYPASS SURGERY

The generalized damped cubic equation: integrability and general solution

Vowel perception in noise
