
Current Genomics


ISSN (Print): 1389-2029
ISSN (Online): 1875-5488

Mini-Review Article

Heuristic Analysis of Genomic Sequence Processing Models for High Efficiency Prediction: A Statistical Perspective

Author(s): Aditi R. Durge, Deepti D. Shrimankar* and Ankush D. Sawarkar

Volume 23, Issue 5, 2022

Published on: 07 October 2022

Pages: 299-317 (19 pages)

DOI: 10.2174/1389202923666220927105311



Genome sequences encode a wide variety of characteristics, including species and sub-species type, genotype, diseases, growth indicators and yield quality. To analyze these characteristics across species, researchers have proposed various deep learning models, such as Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs) and Multilayer Perceptrons (MLPs), which differ in evaluation performance, area of application and the species they process. Because these algorithmic implementations differ so widely, it is difficult for researchers to select the most suitable genome processing model for a given application. To facilitate this selection, this paper reviews a wide variety of such models and compares them in terms of accuracy, area of application, computational complexity, processing delay, precision and recall. The reviewed deep learning and machine learning models achieve different accuracies in different applications. On multiple genomic datasets, Repeated Incremental Pruning to Produce Error Reduction with Support Vector Machine (Ripper SVM) achieves an accuracy of 99.7%; on cancer genomic data, the CNN Bayesian method achieves 99.27%; and for COVID-19 genome analysis, Bidirectional Long Short-Term Memory with CNN (BiLSTM CNN) achieves the highest accuracy, 99.95%. The precision and recall of the different models are analyzed in the same way. Finally, the paper presents observations on these genomic processing models and recommends the applications in which each can be used efficiently.
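The review compares models chiefly by accuracy, precision and recall, alongside processing delay. The following is a minimal sketch, not the authors' evaluation code, of how these quantities can be computed from a model's predicted labels. The function names `classification_metrics` and `mean_delay_seconds`, the hypothetical `predict` callable, the one-vs-rest treatment of a single positive class, and the choice to report 0.0 for an undefined ratio are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Hashable, Sequence


def classification_metrics(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable,
) -> Dict[str, float]:
    """Accuracy over all labels; precision and recall for one positive class.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    An undefined ratio (zero denominator) is reported as 0.0, an
    assumption made for this sketch.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives and false negatives (one-vs-rest).
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Accuracy counts every exact match, so it also works for multi-class labels.
    correct = sum(t == p for t, p in pairs)

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    return {
        "accuracy": correct / len(pairs),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


def mean_delay_seconds(
    predict: Callable[[object], object], samples: Sequence[object]
) -> float:
    """Average wall-clock time per sample for a hypothetical `predict` callable."""
    if len(samples) == 0:
        raise ValueError("need at least one sample")
    start = time.perf_counter()
    for sample in samples:
        predict(sample)
    return (time.perf_counter() - start) / len(samples)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the review.
    truth = ["tumor", "tumor", "normal", "normal", "tumor"]
    preds = ["tumor", "normal", "normal", "tumor", "tumor"]
    print(classification_metrics(truth, preds, positive="tumor"))
    # accuracy 0.6, precision 2/3, recall 2/3
```

Reporting precision and recall next to accuracy matters because, on an imbalanced dataset, a model can reach high accuracy while still missing most of the positive cases.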

Keywords: Machine learning, genome processing, classification, computational complexity, deep learning, precision and recall.


© 2023 Bentham Science Publishers