Sparse Linear Discriminant Analysis with more Variables than Observations

Gebru, Tsegay Gebrehiwot (2018). Sparse Linear Discriminant Analysis with more Variables than Observations. PhD thesis, The Open University.

DOI: https://doi.org/10.21954/ou.ro.0000e621

Abstract

It is known that classical linear discriminant analysis (LDA) performs classification well when the number of observations is much larger than the number of variables. However, when the number of variables is larger than the number of observations, classical LDA cannot be performed because the within-group covariance matrix is singular. Recently proposed LDA methods that can handle a singular within-group covariance matrix are reviewed. Most of these methods focus on regularizing the within-group covariance matrix. However, they give less attention to sparsity (selecting variables), interpretation and computational cost, which are important in high-dimensional problems. The fact that most of the original variables may be irrelevant or redundant suggests looking for sparse solutions that involve only a small portion of the variables. In the present work, new sparse LDA methods are proposed that are suited to high-dimensional data. The first two methods assume that groups share a common within-group covariance matrix and approximate this matrix by a diagonal matrix. One of these methods is a variant of the other that sacrifices some accuracy for greater computational speed. Both methods obtain sparsity by minimizing an l1 norm and maximizing discrimination power under a common loss function with a tuning parameter. The third method assumes that groups share common eigenvectors in the eigenvector-eigenvalue decomposition of their within-group covariance matrices, while their eigenvalues may differ. The fourth method assumes that the within-group covariance matrices are proportional to each other. The fifth method is derived from the Dantzig selector and uses optimal scoring to construct the discriminant function. The third and fourth methods achieve sparsity by imposing a cardinality constraint, with the cardinality level determined by cross-validation. All the new methods reduce their computation time by sequentially determining individual discriminant functions.
The methods are applied to six real data sets and perform well when compared with two existing methods.
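
The singularity problem described in the abstract, and the diagonal approximation used by the first two methods, can be illustrated with a short numerical sketch. This is not the thesis's algorithm: the simulated data, the group sizes, and the helper name `pooled_within_cov` are assumptions made purely for illustration, and the direction computed at the end is the plain (non-sparse) diagonal LDA direction with no l1 penalty.

```python
import numpy as np

# Simulated data (an assumption for illustration): 2 groups,
# 10 observations each, 50 variables, so variables outnumber observations.
rng = np.random.default_rng(0)
n_per_group, p = 10, 50
X1 = rng.normal(0.0, 1.0, size=(n_per_group, p))
X2 = rng.normal(0.5, 1.0, size=(n_per_group, p))


def pooled_within_cov(groups):
    """Pooled within-group covariance matrix (hypothetical helper)."""
    n = sum(len(g) for g in groups)
    scatter = sum(
        (g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups
    )
    return scatter / (n - len(groups))


W = pooled_within_cov([X1, X2])

# The rank of W is at most n - (number of groups) = 18, which is below
# p = 50, so W is singular and W^{-1} in classical LDA does not exist.
rank_W = np.linalg.matrix_rank(W)

# Diagonal approximation: keep only the variances. All of them are
# positive here, so the diagonal matrix is invertible.
variances = np.diag(W)

# Non-sparse diagonal LDA direction: D^{-1} (mean_1 - mean_2).
mean_diff = X1.mean(axis=0) - X2.mean(axis=0)
direction = mean_diff / variances

print(f"p = {p}, rank of W = {rank_W}")
```

Replacing the within-group covariance matrix by its diagonal removes the need to invert a singular matrix, at the cost of ignoring correlations between variables; the sparse methods in the thesis additionally select a subset of variables, which this sketch does not attempt.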

About

Item ORO ID: 58913
Item Type: PhD Thesis
Keywords: discriminant analysis
Academic Unit or School: Faculty of Science, Technology, Engineering and Mathematics (STEM); Mathematics and Statistics
Copyright Holders:
Depositing User: Tsegay Gebru
