GENE REGULATORY NETWORK INFERENCE USING K-NEAREST-NEIGHBOR BASED MUTUAL INFORMATION AND THREE-NODE NETWORK CLASSIFICATION USING DIMENSIONALITY REDUCTION AND MACHINE LEARNING
Background: A cell exhibits a variety of responses to internal and external cues. These responses are possible, in part, due to the presence of an elaborate gene regulatory network (GRN) in every single cell. In the past twenty years, many groups have worked on reconstructing the topological structure of GRNs from large-scale gene expression data using a variety of inference algorithms. Insights gained about the participating players in GRNs may ultimately lead to therapeutic benefits. Mutual information (MI) is a widely used metric within this inference/reconstruction pipeline, as it can detect any correlation (linear and non-linear) between any number of variables (n dimensions). However, the use of MI with continuous data (for example, normalized fluorescence intensity measurements of gene expression levels) is sensitive to data size, correlation strength and the underlying distributions, and often requires laborious and, at times, ad hoc optimization.
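To illustrate the sensitivity described above, the following minimal sketch estimates MI for samples from a bivariate Gaussian using a fixed-binning (plug-in histogram) estimator and compares it with the analytic value, -0.5 ln(1 - rho^2). The function names, sample size, correlation and bin counts are assumptions chosen for illustration; they are not taken from the work itself.

```python
import math
import random
from collections import Counter


def binned_mi(xs, ys, bins):
    """Plug-in MI estimate (in nats) from an equal-width 2D histogram."""
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)

    def bin_index(v, lo, hi):
        # Map v in [lo, hi] to 0..bins-1; the maximum goes into the last bin.
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    pairs = [(bin_index(x, x_lo, x_hi), bin_index(y, y_lo, y_hi))
             for x, y in zip(xs, ys)]
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(i for i, _ in pairs)
    py = Counter(j for _, j in pairs)
    # Sum over occupied cells of p(i,j) * log(p(i,j) / (p(i) * p(j))).
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def gaussian_mi(rho):
    """Analytic MI (in nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * math.log(1.0 - rho ** 2)


def sample_gaussian_pair(n, rho, seed=0):
    """Draw n samples (x, y) with unit variances and correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(a)
        ys.append(rho * a + math.sqrt(1.0 - rho ** 2) * b)
    return xs, ys


# Illustrative settings (assumed, not from the source).
rho = 0.6
xs, ys = sample_gaussian_pair(1000, rho)
estimates = {b: binned_mi(xs, ys, b) for b in (5, 10, 20, 40)}
for b, est in estimates.items():
    print(f"bins={b:2d}  binned MI={est:.3f}  analytic MI={gaussian_mi(rho):.3f}")
```

Because each finer grid here subdivides the coarser one, the plug-in estimate can only grow as the bin count increases, so the result depends on a tuning choice that has no obvious correct value.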
Results: In this work, we first show that estimating the MI of bi- and tri-variate Gaussian distributions using k-nearest neighbor (kNN) MI estimation results in significant error reduction compared to commonly used methods based on fixed binning. Second, we demonstrate that implementing the MI-based kNN Kraskov-Stögbauer-Grassberger (KSG) algorithm leads to a significant improvement in GRN reconstruction for popular inference algorithms, such as Context Likelihood of Relatedness (CLR). Third, through extensive in silico benchmarking, we show that a new inference algorithm, CMIA (Conditional Mutual Information Augmentation), inspired by CLR, in combination with the KSG-MI estimator, outperforms commonly used methods. Finally, we compare our three newly developed methods to classify three-node motifs: (i) MI and Z-score profiles, (ii) dimensionality reduction by PCA and clustering using K-means, and (iii) supervised machine learning algorithms using MI input data. We show that at least 22 different three-node motifs in silico and 16 motifs on E. coli experimental data can be distinguished by using all 2D and 3D MI quantities and without any a priori knowledge of the regulator (source) genes.
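As a rough illustration of the kNN approach named above, the sketch below implements the first KSG estimator for two scalar variables, I(X;Y) = psi(k) + psi(N) - <psi(n_x + 1) + psi(n_y + 1)>, using max-norm distances. It is a simple O(N^2) version written for readability, not the implementation used in the work; the function names, k = 3 and the sample settings are assumptions, and the data are assumed to be continuous with no tied values.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_table(m_max):
    """psi(m) for integers m = 1..m_max via psi(m) = -gamma + sum_{j<m} 1/j."""
    table = [float("nan")] * (m_max + 1)  # index 0 is unused
    acc = -EULER_GAMMA
    for m in range(1, m_max + 1):
        table[m] = acc
        acc += 1.0 / m
    return table


def ksg_mi(xs, ys, k=3):
    """KSG (algorithm 1) estimate of I(X;Y) in nats for scalar samples."""
    n = len(xs)
    psi = digamma_table(n)
    total = 0.0
    for i in range(n):
        # Max-norm distances from point i to every other point in joint space.
        joint = sorted(max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
                       for j in range(n) if j != i)
        eps = joint[k - 1]  # distance to the k-th nearest neighbor
        # Count neighbors strictly closer than eps in each marginal space.
        nx = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        total += psi[nx + 1] + psi[ny + 1]
    return psi[k] + psi[n] - total / n


def sample_gaussian_pair(n, rho, seed=0):
    """Draw n samples (x, y) with unit variances and correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(a)
        ys.append(rho * a + math.sqrt(1.0 - rho ** 2) * b)
    return xs, ys


# Illustrative comparison against the analytic bivariate Gaussian value.
rho = 0.8
true_mi = -0.5 * math.log(1.0 - rho ** 2)
xs, ys = sample_gaussian_pair(400, rho, seed=1)
mi_dependent = ksg_mi(xs, ys, k=3)

xs0, ys0 = sample_gaussian_pair(400, 0.0, seed=2)
mi_independent = ksg_mi(xs0, ys0, k=3)

print(f"rho={rho}: KSG={mi_dependent:.3f}, analytic={true_mi:.3f}")
print(f"rho=0.0: KSG={mi_independent:.3f}, analytic=0.000")
```

The only tuning parameter is k, and the local neighborhood size adapts to the sample density, which is the property that fixed-width bins lack. For larger datasets a spatial index such as a k-d tree would replace the pairwise loops.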
Conclusions: Using three canonical datasets containing 15 synthetic networks, the newly developed method for GRN reconstruction, which combines CMIA and the KSG-MI estimator, achieves an improvement of 20-35% in precision-recall measures over the current gold standard in the field. Validated on E. coli gene expression data, our method for three-node motif classification achieves more than 60% overall accuracy, with 9 network motifs reaching as high as 80-100% precision. These new methods will enable researchers to discover new gene interactions or choose gene candidates for experimental validation.