Microarray datasets

The microarray datasets used for rule-based sample classification with the BioHEL evolutionary learning system are listed below:

Diffuse large B-cell lymphoma dataset [1.5 MB] - Shipp et al. 2002, 7129 genes, 77 samples
Prostate cancer dataset [1.5 MB] - Singh et al. 2002, 2135 genes, 102 samples
Breast cancer dataset [13 MB] - Naderi et al. 2006, 47293 genes, 128 samples

We applied three feature selection algorithms (CFS, RFS, PLSS) to these datasets under two cross-validation schemes, 10-fold and leave-one-out (LOO), and obtained the following final datasets (stored in the commonly used Weka ARFF format):

10-fold [587 kB], LOO [4.5 MB] - diffuse large B-cell lymphoma dataset
10-fold [633 kB], LOO [6.3 MB] - prostate cancer dataset
10-fold [300 kB], LOO [3.6 MB] - breast cancer dataset
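
To illustrate the two cross-validation schemes named above, here is a minimal sketch of how 10-fold and leave-one-out splits can be generated for the sample counts listed on this page (77, 102, and 128). This page does not state how samples were actually assigned to folds (for example, whether folds were stratified by class), so the shuffling, the seed, and the function names `k_fold_splits` and `leave_one_out_splits` are illustrative assumptions, not the procedure used to build the archives.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, test indices)


def k_fold_splits(n_samples: int, k: int, seed: int = 0) -> Iterator[Split]:
    """Yield one (train, test) pair of sample indices per fold.

    Assumption: samples are shuffled once with a fixed seed and are not
    stratified by class label.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Spread samples as evenly as possible: the first (n_samples % k)
    # folds receive one extra test sample.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def leave_one_out_splits(n_samples: int) -> Iterator[Split]:
    """Leave-one-out is k-fold with k equal to the number of samples."""
    return k_fold_splits(n_samples, n_samples)


if __name__ == "__main__":
    # Sample counts taken from the dataset list on this page.
    for name, n in [("lymphoma", 77), ("prostate", 102), ("breast", 128)]:
        n_tenfold = sum(1 for _ in k_fold_splits(n, 10))
        n_loo = sum(1 for _ in leave_one_out_splits(n))
        print(f"{name}: {n_tenfold} splits (10-fold), {n_loo} splits (LOO)")
```

Every sample appears in exactly one test set under both schemes. The only difference is the number of splits: ten for 10-fold, and one per sample for LOO.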

Publications


E. Glaab, J. Bacardit, J.M. Garibaldi, N. Krasnogor

Using Rule-Based Machine Learning for Candidate Disease Gene Prioritization and Sample Classification of Cancer Gene Expression Data

in PLoS ONE, 7(7):e39932, July 2012

@ARTICLE{Glaab2012,
  title = {Using Rule-Based Machine Learning for Candidate Disease Gene Prioritization and Sample Classification of Cancer Gene Expression Data},
  author = {Glaab, Enrico and Bacardit, Jaume and Garibaldi, Jonathan M. and Krasnogor, Natalio},
  year = 2012,
  doi = {10.1371/journal.pone.0039932},
  month = jul,
  journal = {PLoS ONE},
  volume = {7},
  number = {7},
  pages = {e39932}
}