A Prediction Approach for the Functional Effects of Non-Coding Gene Variants

Yurtdas G., Aslan K., Ozyer S. T., Ozyer T., Kaya M., ALHAJJ R.

23rd International Arab Conference on Information Technology, ACIT 2022, Abu Dhabi, United Arab Emirates, 22 - 24 November 2022

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/acit57182.2022.9994094
  • City: Abu Dhabi
  • Country: United Arab Emirates
  • Keywords: Deep Learning, Functional interaction network, Non-coding genes, protein-protein interaction network
  • Istanbul Medipol University Affiliated: Yes


This study aims to develop an approach for predicting the functional effects of variants in non-coding genes, which are of great importance in human genetics. Non-coding genes have become a vital field of study because of their strong influence on disease. However, far less is known about them than about coding genes, even though they are almost nine times more abundant in the body. Predicting the effects of genes that are so abundant yet so poorly understood is therefore a critical problem, and it motivates the study described in this paper. To this end, we first conducted an extensive literature review and examined candidate datasets. We then developed a high-accuracy prediction model in Python. After investigating the importance of non-coding gene variants and the areas in which they can be applied, we selected a deep learning approach based on a functional interaction network as the most suitable method. We used STRING (Search Tool for the Retrieval of Interacting Genes/Proteins), a biological database and web resource of known and predicted protein-protein interactions. As a second step, we generated feature vectors: after checking the overlap of non-coding genes, we extracted three types of feature vectors. The protein interaction network, identified in Python, describes the interplay between the biomolecules encoded by genes; it makes it possible to understand the complexities of cellular function and even to predict potential therapeutics. As a last step, we implemented a deep learning model with three fully connected (FC) layers, also known as dense layers, of dimensions 40, 10, and 2, respectively. Experimental results demonstrate that the proposed method achieves high accuracy.
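
To make the layer dimensions in the abstract concrete, the following is a minimal sketch of a forward pass through three dense layers of sizes 40, 10, and 2, written in plain Python. It is not the authors' implementation. The input dimension (64), the ReLU activations, the softmax output, the weight initialisation, and all function names are assumptions chosen for illustration, since the abstract states only the layer sizes.

```python
import math
import random


def init_dense(n_in, n_out, rng):
    """Create one dense layer as (weights, biases) with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Affine map: one output per weight row, plus the matching bias."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def build_model(input_dim, seed=0):
    """Stack three dense layers with output sizes 40, 10, and 2."""
    rng = random.Random(seed)
    sizes = [input_dim, 40, 10, 2]
    return [init_dense(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(model, x):
    """ReLU after the hidden layers (assumed), softmax on the 2-unit output."""
    h = x
    for layer in model[:-1]:
        h = relu(dense(h, layer))
    return softmax(dense(h, model[-1]))


INPUT_DIM = 64  # hypothetical feature-vector length, not given in the abstract
model = build_model(INPUT_DIM)

# A random stand-in for one gene's feature vector (illustrative data only).
sample_rng = random.Random(1)
example = [sample_rng.random() for _ in range(INPUT_DIM)]
probs = forward(model, example)
print(probs)  # two class probabilities that sum to 1
```

The final layer of size 2 is read here as a two-class output, which is one plausible interpretation of the stated dimensions. Training the weights would additionally require a loss function and an optimiser, which the abstract does not describe.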