Current Bioinformatics


ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

A Novel Hybrid Filter/Wrapper Feature Selection Approach Based on Improved Fruit Fly Optimization Algorithm and Chi-square Test for High Dimensional Microarray Data

Author(s): Chaokun Yan, Bin Wu, Jingjing Ma, Ge Zhang, Junwei Luo, Jianlin Wang* and Huimin Luo*

Volume 16, Issue 1, 2021

Published on: 24 March, 2020

Page: [63 - 79] Pages: 17

DOI: 10.2174/1574893615666200324125535


Background: Microarray data are widely used for disease analysis and diagnosis. However, their high dimensionality and small sample sizes make them hard to process directly and make high classification accuracy difficult to achieve. Feature selection is an important preprocessing technique commonly used to reduce the dimensionality of such datasets.

Methods: Because filter and wrapper approaches each have limitations when used alone for feature selection, this study proposes a novel hybrid filter-wrapper approach, CS-IFOA, for high-dimensional datasets. First, the Chi-square Test is used to filter out some irrelevant or redundant features. Next, an improved binary Fruit Fly Optimization Algorithm searches the remaining features for an optimal subset without degrading classification accuracy. Classification accuracy is evaluated with a KNN classifier under 10-fold cross-validation.
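
The two evaluation pieces named in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the features are assumed to be already discretized, k = 1 is an arbitrary choice for KNN, folds are assigned by sample index rather than by shuffling or stratification, and every function name is hypothetical. The improved binary Fruit Fly Optimization search itself is omitted because the abstract does not specify its update rules; `cv_accuracy` only shows how a candidate binary feature mask could be scored.

```python
from collections import Counter

def chi2_score(feature, labels):
    """Pearson chi-square statistic between one discrete feature and the class labels."""
    n = len(labels)
    f_counts = Counter(feature)
    c_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    score = 0.0
    for f, nf in f_counts.items():
        for c, nc in c_counts.items():
            expected = nf * nc / n  # count expected under independence
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score

def chi2_filter(X, y, keep):
    """Filter stage: indices of the `keep` highest-scoring columns of X."""
    n_features = len(X[0])
    scores = [chi2_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def cv_accuracy(X, y, mask, k=1, folds=10):
    """Wrapper fitness: cross-validated KNN accuracy on the columns where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    Xs = [[row[j] for j in cols] for row in X]
    correct = 0
    for f in range(folds):
        # Assumption: sample i belongs to fold i % folds (no shuffling).
        train = [i for i in range(len(Xs)) if i % folds != f]
        test = [i for i in range(len(Xs)) if i % folds == f]
        train_X = [Xs[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            if knn_predict(train_X, train_y, Xs[i], k) == y[i]:
                correct += 1
    return correct / len(Xs)

# Toy data (made up for illustration): 20 samples, 3 discrete features.
# Feature 0 copies the label, feature 1 is constant, feature 2 is unrelated parity.
y = [0] * 10 + [1] * 10
X = [[y[i], 0, i % 2] for i in range(20)]

selected = chi2_filter(X, y, keep=1)        # filter stage keeps one feature
mask = [1 if j in selected else 0 for j in range(3)]
accuracy = cv_accuracy(X, y, mask)          # fitness a wrapper search would maximize
```

In a hybrid scheme of this kind, the wrapper search would call a function like `cv_accuracy` many times on masks restricted to the features that survive the filter, which is why shrinking the candidate set first reduces the cost of the search.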

Results: Extensive experiments on six benchmark biomedical datasets show that the proposed CS-IFOA outperforms other state-of-the-art methods: it selects fewer features while achieving higher classification accuracy. Furthermore, the standard deviation of the experimental results is relatively small, which indicates that the proposed algorithm is relatively robust.

Conclusion: The results confirm the efficiency of our approach in identifying important genes in high-dimensional biomedical datasets. The approach can serve as a pre-processing tool that helps optimize the feature selection process and improves the efficiency of disease diagnosis.

Keywords: Feature selection, fruit fly optimization algorithm, Chi-square Test, Lévy flight, Gaussian mutation, algorithm.


© 2024 Bentham Science Publishers