A scalable random walk with restart on heterogeneous networks with Apache Spark for ranking disease-related genes through type-II fuzzy data fusion

Abstract

One of the important missions of biology and medical science is to find disease-related genes. Recent research uses gene/protein networks to find such genes, but because these networks contain false positive interactions, the results are often inaccurate and unreliable. Integrating multiple gene/protein networks can overcome this drawback, yielding a network with fewer false positive interactions. The integration method plays a crucial role in the quality of the constructed network. In this paper, we integrate several sources to build a reliable heterogeneous network, i.e., a network that includes nodes of different types. Because the gene/protein sources differ, four gene-gene similarity networks are constructed first and then integrated by applying the type-II fuzzy voter scheme. The resulting gene-gene network is linked to a disease-disease similarity network (itself the outcome of integrating four sources) through a two-part disease-gene network. We propose a novel algorithm, random walk with restart on the heterogeneous network with fuzzy fusion (RWRHN-FF). Running RWRHN-FF over the heterogeneous network determines the disease-related genes. Experimental results using leave-one-out cross-validation indicate that RWRHN-FF outperforms existing methods. The proposed algorithm can be applied to find new genes for prostate, breast, gastric, and colon cancers. Since the RWRHN-FF algorithm converges slowly on large heterogeneous networks, we propose a parallel implementation of the algorithm on the Apache Spark platform for high-throughput and reliable network inference. Experiments on heterogeneous networks of different sizes indicate faster convergence than non-distributed implementations.
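To make the core idea concrete, the following is a minimal sketch of a generic random walk with restart over a small heterogeneous network, not the RWRHN-FF implementation itself. Everything in it is an assumption made for illustration: the node names, the edge weights, the restart probability of 0.7, and the simplification of column-normalizing one combined adjacency matrix rather than using separate within-network and between-network transition blocks. The fuzzy fusion step and the Spark parallelization are not shown.

```python
# Minimal sketch of random walk with restart (RWR) on a toy heterogeneous
# network. All names, weights, and parameters are illustrative assumptions.

def column_normalize(matrix):
    """Scale each column to sum to 1 (all-zero columns stay zero)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    sums = [sum(matrix[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [[matrix[i][j] / sums[j] if sums[j] > 0 else 0.0
             for j in range(n_cols)] for i in range(n_rows)]

def rwr(transition, seed, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change < tol."""
    n = len(seed)
    p = list(seed)
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
               + restart * seed[i] for i in range(n)]
        delta = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if delta < tol:
            break
    return p

# Hypothetical network: three genes (g*) and two diseases (d*).
nodes = ["g1", "g2", "g3", "d1", "d2"]
index = {name: i for i, name in enumerate(nodes)}
edges = [
    ("g1", "g2", 0.9),  # gene-gene similarity
    ("g2", "g3", 0.4),  # gene-gene similarity
    ("d1", "d2", 0.5),  # disease-disease similarity
    ("g1", "d1", 1.0),  # known disease-gene association
    ("g3", "d2", 1.0),  # known disease-gene association
]

# Build a symmetric weighted adjacency matrix over both node types.
adjacency = [[0.0] * len(nodes) for _ in nodes]
for a, b, w in edges:
    adjacency[index[a]][index[b]] = w
    adjacency[index[b]][index[a]] = w

# Seed the walk at the query disease d1 and rank genes by visiting probability.
seed = [1.0 if name == "d1" else 0.0 for name in nodes]
scores = rwr(column_normalize(adjacency), seed)
genes = [name for name in nodes if name.startswith("g")]
ranking = sorted(genes, key=lambda g: scores[index[g]], reverse=True)
print(ranking)
```

In this toy setting the gene directly associated with the query disease ranks first, and the remaining genes are ordered by how reachable they are through similarity edges. Each iteration is dominated by a matrix-vector product, which is the kind of step a distributed platform such as Apache Spark can split across workers.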

Last time updated on 26/03/2023

This paper was published in Western Sydney ResearchDirect.
