Ensemble Feature Learning of Genomic Data Using Support Vector Machine.

Ali Anaissi; Madhu Goyal; Daniel R Catchpoole; Ali Braytee; Paul J Kennedy

oai:doaj.org/article:5fd0347f9a87460d9ada146d92d07f88

Publication date: 1 January 2016
Publisher: Public Library of Science (PLoS)
Abstract

Identifying a subset of genes that captures the information needed to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively for gene selection and classification. Random forest is a testament to this: it combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines (SVMs) has only recently received attention, and mostly for classification rather than gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) method for gene selection that follows the ensemble and bagging concepts used in random forest but adopts the backward elimination strategy that underlies the RFE algorithm. The rationale is that building an ensemble of SVM models on bootstrap samples drawn randomly from the training set produces different feature rankings, which are then aggregated into a single ranking. As a result, the decision to eliminate features is based on the rankings of multiple SVM models rather than on one particular model. Moreover, this approach addresses the problem of imbalanced datasets by constructing nearly balanced bootstrap samples. Our experiments show that ESVM-RFE for gene selection substantially increased classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that ESVM-RFE achieves, on average, 9% better accuracy than SVM-RFE and 5% better accuracy than the random forest based approach. The genes selected by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD), which reveals significant clusters in the selected data.
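
The loop described in the abstract (balanced bootstrap samples, one SVM per sample, aggregated ranking, backward elimination) can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not the authors' implementation. Several details are assumptions that the abstract does not specify: the function names, the simple subgradient-descent trainer standing in for a real SVM solver, sampling each class down to the minority-class size, aggregating by summing squared weights across models (squared weight is the usual SVM-RFE ranking criterion), removing one feature per round, and the synthetic toy data.

```python
import random

def train_linear_svm(X, y, rng, epochs=50, lr=0.01, lam=0.01):
    """Stand-in linear SVM: subgradient descent on the L2-regularised hinge loss.
    Labels must be -1 or +1. Returns the weight vector (the bias is fitted but not returned)."""
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            if margin < 1:  # hinge loss is active for this sample
                w = [wj - lr * (lam * wj - y[i] * xj) for wj, xj in zip(w, X[i])]
                b += lr * y[i]
            else:           # only the regulariser contributes
                w = [wj - lr * lam * wj for wj in w]
    return w

def balanced_bootstrap(y, rng):
    """Draw row indices with replacement, the same number from each class.
    Assumption: each class is sampled to the size of the minority class."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    k = min(len(pos), len(neg))
    return [rng.choice(pos) for _ in range(k)] + [rng.choice(neg) for _ in range(k)]

def esvm_rfe(X, y, n_keep, n_models=10, seed=0):
    """Backward elimination driven by an ensemble of SVMs.
    Returns the original column indices of the surviving features."""
    rng = random.Random(seed)
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        scores = [0.0] * len(active)
        for _ in range(n_models):
            rows = balanced_bootstrap(y, rng)
            Xb = [[X[i][j] for j in active] for i in rows]
            yb = [y[i] for i in rows]
            w = train_linear_svm(Xb, yb, rng)
            for k, wk in enumerate(w):
                scores[k] += wk * wk  # assumed aggregation: sum of squared weights
        worst = min(range(len(active)), key=scores.__getitem__)
        del active[worst]  # drop the lowest-ranked feature, then retrain
    return active

def make_toy_data(seed=1, n_neg=30, n_pos=10, n_features=6):
    """Synthetic, imbalanced data: column 0 carries the class signal, the rest is noise."""
    rng = random.Random(seed)
    y = [-1] * n_neg + [1] * n_pos
    X = [[3.0 * label + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(n_features - 1)]
         for label in y]
    return X, y

X, y = make_toy_data()
selected = esvm_rfe(X, y, n_keep=2)
print(selected)
```

Because elimination depends on the summed score over all models, a feature that ranks low in a single bootstrap model is not discarded unless the ensemble as a whole ranks it lowest. On real microarray data one would normally swap the toy trainer for an established SVM library and remove features in larger batches.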

Last time updated on 09/08/2016

This paper was published in Directory of Open Access Journals.
