Pairwise gene GO-based measures for biclustering of high-dimensional expression data

Juan A. Nepomuceno; Alicia Troncoso; Isabel A. Nepomuceno-Chamorro; Jesús S. Aguilar-Ruiz


oai:doaj.org/article:8b2538a9e9614954b434b01e49cec582

Publication date: 1 March 2018
Publisher: Springer Science and Business Media LLC

Abstract

Background: Biclustering algorithms search gene expression data for groups of genes that share the same behavior under a subset of samples. The biological knowledge now available in public repositories can be used to steer these algorithms toward biclusters composed of functionally coherent groups of genes. In addition, a distance between genes can be defined from the information stored for them in Gene Ontology (GO). Gene pairwise GO semantic similarity measures report a value for each pair of genes that establishes their functional similarity. This paper studies a scatter search-based algorithm that optimizes a merit function integrating GO information. The merit function includes a term that incorporates this information through a GO measure.

Results: The effect of two different gene pairwise GO measures on the performance of the algorithm is analyzed. First, three well-known yeast datasets with approximately one thousand genes are studied. Second, a group of human datasets related to clinical cancer data is explored by the algorithm. Most of these are high-dimensional datasets composed of a huge number of genes. The resulting biclusters reveal groups of genes linked by the same functionality when the search procedure is driven by one of the proposed GO measures. Furthermore, a qualitative biological study of a group of biclusters shows their relevance from a cancer disease perspective.

Conclusions: Integrating biological information improves the performance of the biclustering process. Both GO measures studied improve the results obtained for the yeast datasets. However, for datasets composed of a huge number of genes, only one of them really improves the algorithm's performance. This second case constitutes a clear option for exploring datasets that are interesting from a clinical point of view.
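The abstract describes a merit function that combines an expression-based term with a term derived from a gene pairwise GO measure, but it does not give the formula. The sketch below is only one plausible shape for such a function, not the authors' definition. It assumes a mean squared residue as the coherence term (a common biclustering score, swapped in here for illustration), a precomputed matrix `go_sim` of pairwise GO similarities in [0, 1], and a hypothetical `weight` parameter balancing the two terms. All function names and the toy values are invented for the example.

```python
from itertools import combinations


def mean_squared_residue(data, rows, cols):
    """Mean squared residue of the selected submatrix (lower = more coherent).

    Assumption: used here as a stand-in coherence term.
    """
    sub = [[data[i][j] for j in cols] for i in rows]
    n, m = len(rows), len(cols)
    row_means = [sum(r) / m for r in sub]
    col_means = [sum(sub[i][j] for i in range(n)) / n for j in range(m)]
    overall = sum(row_means) / n
    return sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n)
        for j in range(m)
    ) / (n * m)


def mean_pairwise_go_distance(go_sim, rows):
    """Average of (1 - similarity) over all gene pairs in the bicluster."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - go_sim[a][b] for a, b in pairs) / len(pairs)


def merit(data, go_sim, rows, cols, weight=1.0):
    """Hypothetical merit to minimize: coherence term plus weighted GO term."""
    go_term = mean_pairwise_go_distance(go_sim, rows)
    return mean_squared_residue(data, rows, cols) + weight * go_term


# Toy expression matrix (3 genes x 3 samples) and toy GO similarities.
# These values are illustrative only and are not data from the paper.
data = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 1.0, 9.0],
]
go_sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

coherent = merit(data, go_sim, rows=[0, 1], cols=[0, 1, 2])
mixed = merit(data, go_sim, rows=[0, 2], cols=[0, 1, 2])
print(coherent, mixed)
```

In this toy setup, genes 0 and 1 follow the same shifted pattern and are functionally similar, so their bicluster scores lower (better) than the one pairing gene 0 with gene 2. A search procedure such as the scatter search mentioned in the abstract would evaluate many candidate biclusters with a function of this kind.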

Last time updated on 03/06/2019

This paper was published in Directory of Open Access Journals.
