Computational methods to study gene regulation in humans using DNA and RNA sequencing data

Saha, Ashis

oai:jscholarship.library.jhu.edu:1774.2/64043

Authors: Ashis Saha
Publication date: 25 June 2021
Publisher: 'The Busan Gyeongnam Mathematical Society'

Abstract

Genes work in a coordinated fashion to perform complex functions. Disruption of gene regulatory programs can result in disease, highlighting the importance of understanding them. We can leverage large-scale DNA and RNA sequencing data to decipher gene regulatory relationships in humans. In this thesis, we present three projects on regulation of gene expression by other genes and by genetic variants using two computational frameworks: co-expression networks and expression quantitative trait loci (eQTL). First, we investigate the effect of alignment errors in RNA sequencing on detecting trans-eQTLs and co-expression of genes. We demonstrate that misalignment due to sequence similarity between genes may result in over 75% false positives in a standard trans-eQTL analysis. It also produces a higher-than-background fraction of potential false positives in a conventional co-expression study. These false-positive associations are likely to replicate misleadingly between studies. We present a metric, cross-mappability, to detect and avoid such false positives. Next, we focus on joint regulation of transcription and splicing in humans. We present a framework called transcriptome-wide networks (TWNs) for combining total expression of genes and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We build TWNs for 16 human tissues and show that the hubs with multiple isoform neighbors in these networks are candidate alternative splicing regulators. Then, we study the tissue-specificity of network edges. Using these networks, we detect 20 genetic variants with distant regulatory impacts. Finally, we present a novel network inference method, SPICE, to study the regulation of transcription. Using maximum spanning trees, SPICE prioritizes potential direct regulatory relationships between genes. We also formulate a comprehensive set of metrics using biological data to establish a standard to evaluate biological networks. According to most of these metrics, SPICE performs better than current popular network inference methods when applied to RNA-sequencing data from diverse human tissues.
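
The abstract mentions maximum spanning trees over gene networks without giving details. The following is a minimal sketch of that general idea only, and is not the SPICE method: it weights every gene pair by absolute Pearson correlation and keeps the heaviest edges that connect all genes without forming a cycle (Kruskal's algorithm). The gene names, the toy expression values, and the choice of absolute Pearson correlation as the edge weight are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def maximum_spanning_tree(expression):
    """Kruskal's algorithm on |correlation| weights, heaviest edges first."""
    genes = sorted(expression)
    edges = sorted(
        ((abs(pearson(expression[g], expression[h])), g, h)
         for g, h in combinations(genes, 2)),
        reverse=True,
    )
    parent = {g: g for g in genes}  # union-find forest, one set per gene

    def find(g):
        # Follow parent pointers to the set representative, halving paths.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    tree = []
    for weight, g, h in edges:
        root_g, root_h = find(g), find(h)
        if root_g != root_h:  # edge joins two components, so no cycle
            parent[root_g] = root_h
            tree.append((g, h, weight))
    return tree


# Hypothetical toy data: four genes measured in five samples.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],
    "geneD": [1.0, 1.5, 1.2, 1.4, 1.1],
}
tree_edges = maximum_spanning_tree(toy_expression)
for g, h, w in tree_edges:
    print(g, h, round(w, 3))
```

A spanning tree over n genes keeps exactly n - 1 edges, so in this sketch the weaker of two redundant associations (here, the geneB and geneC pair, which is already connected through geneA) is dropped. Whether and how SPICE combines such trees is described in the thesis itself, not here.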

Last time updated on 19/10/2021

This paper was published in JScholarship.
