Identification of Robust Clustering Methods in Gene Expression Data Analysis

Md. Bipul Hossen; Md. Siraj-Ud-Doulah

Abstract

Background: Cluster analysis of gene expression microarray data is of increasing interest in current bioinformatics. One reason is the need for molecular-based refinement of broadly defined biological classes, with implications for cancer diagnosis, prognosis and treatment. Many algorithms have been developed for this problem.

Objective: However, microarray data frequently include outliers, which raises the question of how to handle their effects in the subsequent clustering analysis.

Method: In this paper, we present a large-scale analysis of seven agglomerative hierarchical clustering methods and five proximity measures on 33 cancer gene expression datasets. As a case study, we used two experimental datasets (Affymetrix and cDNA), to which different percentages of outliers were artificially added.
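Each clustering in the Method combines a linkage rule with a proximity measure. The following is a minimal pure-Python sketch of that pairing, not the authors' code. It assumes that samples are clustered on their expression profiles and that the Spearman proximity is turned into a dissimilarity as 1 - rho; the abstract does not state either choice. The abstract also does not name the seven linkage methods, so the sketch covers four common ones (single, complete, average, Ward) through the Lance-Williams update. It applies the Ward update directly to the 1 - rho dissimilarities, in the manner of R's `hclust(method = "ward.D")`. The function names `ranks`, `spearman_distance`, `lance_williams` and `agglomerate` are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            result[order[pos]] = mean_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_distance(x, y):
    """Assumed dissimilarity 1 - rho, where rho is the Pearson correlation of the ranks."""
    return 1.0 - pearson(ranks(x), ranks(y))


def lance_williams(method, d_ki, d_kj, d_ij, n_i, n_j, n_k):
    """Distance from cluster k to the cluster formed by merging i and j."""
    if method == "single":
        return min(d_ki, d_kj)
    if method == "complete":
        return max(d_ki, d_kj)
    if method == "average":
        return (n_i * d_ki + n_j * d_kj) / (n_i + n_j)
    if method == "ward":
        # Ward update applied to the raw dissimilarities (like R's "ward.D").
        total = n_i + n_j + n_k
        return ((n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij) / total
    raise ValueError(f"unsupported linkage: {method}")


def agglomerate(samples, k, method="ward", dist=spearman_distance):
    """Merge the closest pair of clusters until k (1 <= k <= n) remain; return labels."""
    members = {i: [i] for i in range(len(samples))}
    d = {frozenset(p): dist(samples[p[0]], samples[p[1]])
         for p in combinations(members, 2)}
    next_id = len(samples)
    while len(members) > k:
        pair = min(d, key=d.get)  # closest pair of current clusters
        i, j = tuple(pair)
        d_ij = d.pop(pair)
        n_i, n_j = len(members[i]), len(members[j])
        for m in list(members):
            if m in (i, j):
                continue
            # Replace distances to i and j by one distance to the merged cluster.
            d_mi = d.pop(frozenset((m, i)))
            d_mj = d.pop(frozenset((m, j)))
            d[frozenset((m, next_id))] = lance_williams(
                method, d_mi, d_mj, d_ij, n_i, n_j, len(members[m]))
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    labels = [0] * len(samples)
    for label, idxs in enumerate(members.values()):
        for s in idxs:
            labels[s] = label
    return labels
```

For example, with `profiles = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 7], [5, 4, 3, 2, 1], [9, 7, 5, 3, 2]]`, the call `agglomerate(profiles, k=2)` returns `[0, 0, 1, 1]`. The first two profiles share the same rank order, and the last two have the reversed order. A rank-based measure ignores the scale of the values, which is one reason it can be less sensitive to extreme expression values than a measure computed on the raw values.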

Results: We found that Ward's method, combined with the Spearman proximity measure, gives the highest corrected Rand index, both for datasets with and without outliers.
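The Results compare clusterings by the corrected Rand index. This is the Rand index adjusted for chance agreement, also widely known as the adjusted Rand index. The following is a minimal sketch of how such a score could be computed from a contingency table of two label vectors. It assumes that the reference partition is the known class labels of the samples. The function name `corrected_rand_index` is hypothetical.

```python
from collections import Counter
from math import comb


def corrected_rand_index(labels_true, labels_pred):
    """Chance-corrected Rand index between two partitions of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    # Pairs placed together in both partitions (cells of the contingency table).
    contingency = Counter(zip(labels_true, labels_pred))
    sum_nij = sum(comb(c, 2) for c in contingency.values())
    # Pairs placed together in each partition separately (row and column sums).
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # both partitions trivial and identical (one cluster, or all singletons)
    return (sum_nij - expected) / (max_index - expected)
```

The index equals 1 for identical partitions regardless of how the clusters are labelled, is near 0 for chance-level agreement, and can be negative. For instance, `corrected_rand_index([0, 0, 1, 1], [0, 0, 1, 2])` gives 4/7 (about 0.571).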

Conclusion: This study shows that Ward's method is the most robust of the compared clustering methods for gene expression data analysis.

Keywords: Agglomerative hierarchical clustering, corrected Rand index, microarray gene expression data, outlier, proximity measures.

Current Bioinformatics
Publisher: Bentham Science Publishers
DOI: https://dx.doi.org/10.2174/1574893611666160610103926
Print ISSN: 1574-8936
Online ISSN: 2212-392X