The hurdles of tackling a biomedical data mining competition: The DREAM ALS challenge

by Jaume Bacardit

16:00 (40 min) in BSTC G.33

Many papers published in machine learning conferences and journals follow a very similar scheme: a somewhat new method is proposed, it is tested on a few of the (almost toy) datasets from the UCI repository, it is compared with some other methods (very likely using WEKA), and that is all. Tackling a real-world data analytics problem involves many challenges that these papers are entirely oblivious to: the data is not a nicely formatted matrix, there is a mix of continuous, categorical and time-series data, there are missing values, the data is extremely dirty, and the annotation is inconsistent.

In this seminar I will talk about my experience of participating in the 2015 DREAM ALS biomedical data analytics competition, which forced me to tackle all of these aspects. Finally, I will reflect on the competition in light of a recent analysis by Google of software engineering practices for machine learning.