Abstract

Many cancer treatments destroy healthy cells along with cancerous ones and can leave patients fatigued and with a compromised immune system. This makes it especially important to determine whether a given treatment will work for a patient or will only cause further harm. Recent work has used gene expression profiles (DNA microarrays) to predict how a patient will respond to a cancer treatment. However, these profiles suffer from high dimensionality: each instance has a very large number of features (genes). This necessitates dimensionality reduction techniques such as feature (gene) selection, a data preprocessing technique from data mining that seeks an ideal subset of features. A particularly promising family of feature selection techniques is ensemble feature selection, which performs feature selection multiple times and aggregates the results into a single decision. Traditionally, this is accomplished by ranking the features in each list according to some metric and then aggregating each feature's ranks into a single final decision for that feature. Many forms of aggregation have been considered, both in how the distinct lists are generated and in how the ranks from each list are combined. However, all of these works have assumed that ranks must be created per list and then aggregated in a separate step, rather than aggregating the scores of each list directly and ranking only the final aggregated list. This work compares two feature list aggregation approaches, rank-based aggregation and score-based aggregation, both using mean aggregation, in terms of classification performance. We use fifteen patient response datasets and three feature selection techniques as the basis for the ensemble feature selection, and we employ four feature subset sizes and two classifiers.
Our results show that the rank-based aggregation approach outperforms the score-based aggregation approach in the majority of scenarios for both classifiers. However, this is not always the case, and the choice between the two approaches requires careful consideration.
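The difference between the two aggregation orders can be illustrated with a minimal sketch. It assumes that each feature selection run produces a score per feature, that a higher score means a more relevant feature, that every list scores the same set of features, and that ties are broken alphabetically by feature name. The function names (`rank_within_list`, `rank_based_mean`, `score_based_mean`) and the toy scores are hypothetical and are not taken from the paper; the sketch also does not reproduce how the paper generates its lists or which feature selection techniques it uses.

```python
from statistics import fmean


def rank_within_list(scores):
    """Return 1-based ranks (1 = best) for a single feature list.

    Assumption: higher scores mean more relevant features. Ties are broken
    alphabetically by feature name so the ranking is deterministic.
    """
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: pos for pos, feature in enumerate(ordered, start=1)}


def rank_based_mean(score_lists):
    """Rank each list separately, average each feature's ranks, then order.

    Assumes every list contains the same features.
    """
    rank_lists = [rank_within_list(s) for s in score_lists]
    features = list(rank_lists[0])
    mean_rank = {f: fmean([r[f] for r in rank_lists]) for f in features}
    # A lower mean rank is better.
    return sorted(features, key=lambda f: (mean_rank[f], f))


def score_based_mean(score_lists):
    """Average each feature's raw scores across lists, then rank once.

    Assumes every list contains the same features.
    """
    features = list(score_lists[0])
    mean_score = {f: fmean([s[f] for s in score_lists]) for f in features}
    # A higher mean score is better.
    return sorted(features, key=lambda f: (-mean_score[f], f))


# Hypothetical scores for three genes from three feature selection runs.
score_lists = [
    {"g1": 0.90, "g2": 0.20, "g3": 0.10},
    {"g1": 0.05, "g2": 0.30, "g3": 0.20},
    {"g1": 0.10, "g2": 0.40, "g3": 0.50},
]

print(score_based_mean(score_lists))  # ['g1', 'g2', 'g3']
print(rank_based_mean(score_lists))   # ['g2', 'g3', 'g1']
```

In this toy example, the single high score for `g1` in the first list dominates its mean score, so score-based aggregation ranks it first. Rank-based aggregation counts only its position in each list (ranks 1, 3, 3), so `g1` falls to last place. The two orders of operations can therefore select different feature subsets from the same underlying lists.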
