Bioinformatics

Bioinformatics is a broad research area that focuses on the computational processing and analysis of biological data. This often involves processing vast amounts of biological data, applying very computationally intensive algorithms or, in most cases, both. Over the years, bioinformatics has been (and continues to be) a crucial source of new challenges for many areas of research in computer science and mathematics, such as optimisation, data mining, high-performance computing and e-science. Our group has applied bioinformatics techniques to several important problems, including protein structure prediction, protein structure comparison and omics (transcriptomics, proteomics, lipidomics, epigenomics) data analysis.

Protein Structure Prediction (PSP)

PSP methods generate models of proteins (the 3D coordinates of their atoms) from their amino acid sequence. After several decades of research, PSP remains one of the main open problems in biology. Several (often complementary) techniques and representations for PSP exist, based on different sources of information and a wide variety of prediction and model refinement methods. In general, these techniques require vast amounts of computational resources. Our work on PSP includes:

Protein Structure Comparison (PSC)

The algorithmic comparison of protein structures is a crucial and very challenging task in bioinformatics. PSC methods can be used to make inferences about protein function, or to distinguish near-native models from decoys in protein structure prediction and protein design. What makes this task particularly challenging is that there is no single “silver bullet” similarity measure that is suitable for all tasks and datasets. In this area we have developed both individual algorithms for structural comparison and consensus methods that integrate multiple structural similarity measures.
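As an illustration of how individual similarity measures can be combined, the sketch below (Python, standard library only) compares two models of the same protein given as C-alpha coordinates with a one-to-one residue correspondence. It computes two superposition-free measures, a contact-map overlap and the distance RMSD (dRMSD), and averages them into a simple consensus score. The 8 Å contact threshold, the minimum sequence separation of 3, the Jaccard form of the overlap, the 1/(1 + dRMSD) mapping and the unweighted mean are all illustrative assumptions; this is not the group's published algorithm, and `toy_helix` is a hypothetical helper that only builds toy coordinates.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]
Pair = Tuple[int, int]


def _check_same_length(a: Sequence[Coord], b: Sequence[Coord]) -> None:
    # Both measures assume residue i in model A corresponds to residue i in model B.
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("models must have the same number (>= 2) of residues")


def pairwise_distances(ca: Sequence[Coord]) -> Dict[Pair, float]:
    """Euclidean distance for every residue pair (i, j) with i < j."""
    return {(i, j): math.dist(ca[i], ca[j])
            for i, j in combinations(range(len(ca)), 2)}


def contact_set(ca: Sequence[Coord], threshold: float = 8.0,
                min_seq_sep: int = 3) -> Set[Pair]:
    """Residue pairs at least `min_seq_sep` apart in sequence whose
    C-alpha atoms lie closer than `threshold` angstroms (assumed values)."""
    return {(i, j) for (i, j), d in pairwise_distances(ca).items()
            if j - i >= min_seq_sep and d < threshold}


def contact_overlap(a: Sequence[Coord], b: Sequence[Coord],
                    threshold: float = 8.0) -> float:
    """Jaccard overlap of the two contact sets (1.0 = identical contacts)."""
    _check_same_length(a, b)
    ca, cb = contact_set(a, threshold), contact_set(b, threshold)
    union = ca | cb
    return 1.0 if not union else len(ca & cb) / len(union)


def drmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Distance RMSD: compares internal distances, so no superposition is needed."""
    _check_same_length(a, b)
    da, db = pairwise_distances(a), pairwise_distances(b)
    return math.sqrt(sum((da[p] - db[p]) ** 2 for p in da) / len(da))


def consensus_similarity(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Unweighted mean of two scores, each in [0, 1] (1.0 = identical)."""
    scores = [contact_overlap(a, b), 1.0 / (1.0 + drmsd(a, b))]
    return sum(scores) / len(scores)


def toy_helix(n: int) -> List[Coord]:
    """Hypothetical toy C-alpha trace of an ideal helix:
    radius 2.3 A, rise 1.5 A and 100 degrees of turn per residue."""
    return [(2.3 * math.cos(math.radians(100 * i)),
             2.3 * math.sin(math.radians(100 * i)),
             1.5 * i) for i in range(n)]


if __name__ == "__main__":
    native = toy_helix(12)
    # A rigidly rotated (90 degrees about z) and translated copy of the native.
    moved = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in native]
    # A fully extended decoy: consecutive C-alphas 3.8 A apart on a straight line.
    extended = [(3.8 * i, 0.0, 0.0) for i in range(12)]
    print(consensus_similarity(native, moved))     # close to 1.0
    print(consensus_similarity(native, extended))  # much lower
```

Because both measures only use internal distances, a rigidly moved copy of the native scores as identical, while the extended decoy loses every native contact. A practical consensus method would typically use more measures and calibrate how each one is normalised and weighted.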

Omics Data Mining

Biological research depends on many experimental technologies (e.g. transcriptomics, proteomics, metabolomics) that generate very large amounts of quantitative information from biological samples. These technologies have improved our understanding of biological systems. However, their effectiveness is constrained by the limitations of the methods used to analyse the data they produce. Data analysis must navigate noisy, high-dimensional spaces and identify the relevant variables, their interactions and their association with the biological system or process being studied. Within this area we have performed research in multiple directions:
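A simple, widely used baseline for identifying relevant variables in such data is univariate ranking: each feature (for example a gene, protein or metabolite) is scored by how strongly it separates two groups of samples. The sketch below uses Welch's t-statistic with Python's standard library; it is a generic illustration rather than one of the group's methods, and the function names, the 0 = control / 1 = case label encoding and the toy values are assumptions.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple


def welch_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Welch's t-statistic for mean(x) - mean(y), allowing unequal variances."""
    if len(x) < 2 or len(y) < 2:
        raise ValueError("each group needs at least two samples")
    diff = mean(x) - mean(y)
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0.0:
        # No within-group spread: the statistic is either 0 or unbounded.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_features(matrix: Dict[str, Sequence[float]], labels: Sequence[int],
                  top_k: int = 10) -> List[Tuple[str, float]]:
    """Rank features by |t| between cases (label 1) and controls (label 0).

    `matrix` maps a feature name to its measurements, one per sample,
    in the same order as `labels` (hypothetical input layout).
    """
    ranked = []
    for name, values in matrix.items():
        if len(values) != len(labels):
            raise ValueError(f"{name}: expected {len(labels)} values")
        cases = [v for v, lab in zip(values, labels) if lab == 1]
        controls = [v for v, lab in zip(values, labels) if lab == 0]
        ranked.append((name, welch_t(cases, controls)))
    # Largest absolute statistic first; ties broken by name for reproducibility.
    ranked.sort(key=lambda item: (-abs(item[1]), item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    matrix = {
        "gene_A": [1.0, 1.1, 0.9, 5.0, 5.2, 4.8],  # clearly shifted in cases
        "gene_B": [2.0, 2.5, 1.5, 2.1, 2.4, 1.6],  # no real difference
    }
    for name, t in rank_features(matrix, labels):
        print(f"{name}\t{t:+.2f}")
```

Such a ranking ignores the interactions between variables mentioned above, and on real data with thousands of features any associated p-values would need correction for multiple testing.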