We present an iterative PLS algorithm based on backward variable elimination through the “Variable Influence on Projection” (VIP) statistic, which selects an optimal PLS model using training and test sets. In a single iterative procedure, it reduces the number of selected genes and finds the number of PLS factors that gives the best classification performance. The approach is simple: it relies on a single mathematical method, still identifies the discriminatory genes, and builds an optimal predictive model at low computational cost. The algorithm runs as a module of the SIMFIT statistical package, where the optimal model and datasets can be re-run to interpret the system further through additional PLS options, such as scores and loadings plots, or class assignment of new samples.
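The elimination loop described above can be illustrated with a minimal, dependency-free sketch. It is not the SIMFIT implementation: the function names (`pls1_fit`, `vip_scores`, `pls1_predict`, `vip_backward_elimination`), the -1/+1 class coding, the VIP cut-off of 1.0, the limit of three factors, the tie-breaking rule and the simulated data are all assumptions made for illustration. VIP is computed with the usual formula, in which the squared unit-norm PLS weights of each gene are averaged over factors, weighted by the sum of squares of the response each factor explains, and scaled by the number of genes.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_factors):
    """NIPALS PLS1 on mean-centred data; X is a list of sample rows, y holds -1/+1 codes."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]
    f = [v - y_mean for v in y]
    factors = []  # per factor: (weights, loadings, y-loading q, explained y sum of squares)
    for _ in range(n_factors):
        w = [dot([row[j] for row in E], f) for j in range(p)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # no covariance with y left to model
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]
        tt = dot(t, t)
        loadings = [dot([row[j] for row in E], t) / tt for j in range(p)]
        q = dot(f, t) / tt
        E = [[e - ti * lj for e, lj in zip(row, loadings)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        factors.append((w, loadings, q, q * q * tt))
    return x_mean, y_mean, factors


def vip_scores(factors, p):
    """VIP_j = sqrt(p * sum_a SS_a * w_ja^2 / sum_a SS_a), with unit-norm weights w_a."""
    total = sum(fac[3] for fac in factors) or 1.0
    return [math.sqrt(p * sum(ss * w[j] ** 2 for w, _, _, ss in factors) / total)
            for j in range(p)]


def pls1_predict(model, x):
    """Predict one sample by repeating the training deflation steps on it."""
    x_mean, y_mean, factors = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, loadings, q, _ in factors:
        t = dot(e, w)
        e = [ei - t * lj for ei, lj in zip(e, loadings)]
        y_hat += q * t
    return y_hat


def accuracy(model, X, y):
    hits = sum((pls1_predict(model, x) > 0) == (label > 0) for x, label in zip(X, y))
    return hits / len(y)


def vip_backward_elimination(X_tr, y_tr, X_te, y_te, max_factors=3, vip_cut=1.0):
    """Drop genes with VIP below vip_cut until none can be dropped; keep the best test model."""
    genes = list(range(len(X_tr[0])))
    best_key, best = None, None
    while genes:
        sub_tr = [[row[g] for g in genes] for row in X_tr]
        sub_te = [[row[g] for g in genes] for row in X_te]
        for a in range(1, min(max_factors, len(genes)) + 1):
            model = pls1_fit(sub_tr, y_tr, a)
            acc = accuracy(model, sub_te, y_te)
            key = (acc, -len(genes), -a)  # ties: fewer genes, then fewer factors (assumed)
            if best_key is None or key > best_key:
                best_key, best = key, (acc, a, list(genes))
        vip = vip_scores(model[2], len(genes))  # VIP of the largest model fitted this round
        kept = [g for g, v in zip(genes, vip) if v >= vip_cut]
        if not kept or len(kept) == len(genes):
            break
        genes = kept
    return best


def simulate(rng, n_per_class, n_genes=30, n_informative=3, shift=2.0):
    """Hypothetical two-class data: the first n_informative genes are shifted by class."""
    X, y = [], []
    for label in (-1.0, 1.0):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(n_informative):
                row[g] += label * shift
            X.append(row)
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = simulate(rng, 15)
X_test, y_test = simulate(rng, 10)
acc, n_factors, selected = vip_backward_elimination(X_train, y_train, X_test, y_test)
print(f"test accuracy={acc:.2f}, factors={n_factors}, genes kept={selected}")
```

Because the squared VIP values average to one, a cut-off of 1.0 always removes at least one gene unless all genes contribute equally, so the loop terminates after a small number of rounds. Choosing the model on the same test set used to report performance gives an optimistic estimate, so a separate validation set would be needed for an unbiased figure.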
The proposed algorithm was tested on simulated data reproducing different scenarios that occur in microarray analysis. Its results were also compared with those of other classification methods such as KNN, PAM, SVM, RF and standard PLS.