Microarray datasets
The microarray datasets below were used for rule-based sample classification with the BioHEL evolutionary learning system:
Diffuse large B-cell lymphoma dataset [1.5 MB] - Shipp et al. 2002, 7129 genes, 77 samples
Prostate cancer dataset [1.5 MB] - Singh et al. 2002, 2135 genes, 102 samples
Breast cancer dataset [13 MB] - Naderi et al. 2006, 47293 genes, 128 samples
We applied three feature selection algorithms (CFS, RFS, PLSS) to these datasets under two cross-validation schemes: 10-fold and leave-one-out (LOO). The resulting final datasets are stored in the commonly used Weka ARFF format:
10-fold [587 kB], LOO [4.5 MB] - diffuse large B-cell lymphoma dataset
10-fold [633 kB], LOO [6.3 MB] - prostate cancer dataset
10-fold [300 kB], LOO [3.6 MB] - breast cancer dataset
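The two cross-validation schemes mentioned above can be illustrated with a minimal sketch. It is not the procedure used to build the files listed here: the function names `k_fold_splits` and `loo_splits` are hypothetical, the folds are contiguous and unshuffled, and no stratification by class label is attempted. It only shows how 10-fold and leave-one-out partition a set of samples into training and test indices, with leave-one-out treated as the special case where the number of folds equals the number of samples.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_splits(n_samples: int, k: int) -> Iterator[Split]:
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one sample. Illustrative only:
    real pipelines usually shuffle and often stratify by class.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def loo_splits(n_samples: int) -> Iterator[Split]:
    """Leave-one-out: every sample is the test set exactly once."""
    return k_fold_splits(n_samples, n_samples)


if __name__ == "__main__":
    n = 77  # sample count of the lymphoma dataset listed above
    ten_fold = list(k_fold_splits(n, 10))
    loo = list(loo_splits(n))
    print("10-fold test sizes:", [len(test) for _, test in ten_fold])
    print("LOO folds:", len(loo))
```

In a setup like this, feature selection would typically be run on the training indices of each fold only, so that the held-out samples do not influence which genes are selected.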
Publications
Using Rule-Based Machine Learning for Candidate Disease Gene Prioritization and Sample Classification of Cancer Gene Expression Data, in PLoS ONE, 7(7):e39932, July 2012
@ARTICLE{Glaab2012,
  title   = {Using Rule-Based Machine Learning for Candidate Disease Gene Prioritization and Sample Classification of Cancer Gene Expression Data},
  author  = {Glaab, Enrico and Bacardit, Jaume and Garibaldi, Jonathan M. and Krasnogor, Natalio},
  year    = 2012,
  doi     = {10.1371/journal.pone.0039932},
  month   = jul,
  journal = {PLoS ONE},
  volume  = {7},
  number  = {7},
  pages   = {e39932}
}