Rational confederation of genes and diseases

by Doron Lancet (Weizmann Institute of Science)

11:00 (60 min) in BCB

Human diseases lie at the heart of extensive research spanning genomics, bioinformatics, systems biology and systems medicine. Challenges facing disease bioinformatics include disease nomenclature, standard symbols (as exist for genes), and the integration of information from diverse sources. A key issue is generating a global view of gene-disease relationships. This is relatively straightforward for monogenic diseases, natural human knockouts that constitute a rich source of biological insight. For complex diseases, however, a concerted effort is needed to separate signal from noise. This requires comprehensive disease and gene compendia with extensive cross-relations. Two such tools will be presented. The first is the widely used GeneCards, which holds information automatically mined from >100 sources on ~120,000 gene entries, including the most comprehensive compilation of non-protein-coding RNA genes. We plan to incorporate genomic enhancers, for which there is increasing evidence of involvement in disease, into the GeneCards pipeline. The second tool is MalaCards, a highly comprehensive resource of human diseases with ~17,000 entries mined from >60 sources. The development of MalaCards posed many algorithmic challenges, such as disease name unification, integrated classification and disease-gene scrutiny. Analogous to GeneCards, MalaCards displays a web card for each human disease with 17 sections, including textual summaries, related diseases, genetic variations, genetic tests and relevant publications.
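The abstract names disease name unification as one of the MalaCards challenges but does not say how it is solved. Purely as an illustration, the sketch below assumes a simple rule: two source entries are merged when any of their names or synonyms match after normalization (lowercasing, dropping a possessive "'s", collapsing punctuation), with union-find chaining merges transitively. The functions `normalize` and `unify`, the source identifiers and the normalization rule are all hypothetical; this is not the MalaCards method.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Hypothetical normalization rule: lowercase, drop a possessive 's,
    turn runs of punctuation into single spaces, and trim."""
    name = name.lower().replace("\u2019", "'")
    name = re.sub(r"'s\b", "", name)
    name = re.sub(r"[^a-z0-9]+", " ", name)
    return " ".join(name.split())


def unify(entries: dict[str, list[str]]) -> list[set[str]]:
    """Group source entries (entry id -> names and synonyms) that share any
    normalized name; union-find makes the grouping transitive."""
    parent = {eid: eid for eid in entries}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    owner: dict[str, str] = {}  # normalized name -> first entry seen with it
    for eid, names in entries.items():
        for raw in names:
            key = normalize(raw)
            if key in owner:
                parent[find(eid)] = find(owner[key])  # merge the two groups
            else:
                owner[key] = eid

    groups: dict[str, set[str]] = defaultdict(set)
    for eid in entries:
        groups[find(eid)].add(eid)
    return sorted(groups.values(), key=sorted)


# Toy input; source ids and names are illustrative only.
entries = {
    "srcA:1": ["Alzheimer's disease"],
    "srcB:7": ["Alzheimer disease", "AD"],
    "srcC:3": ["alzheimer-disease"],
    "srcD:9": ["Cystic fibrosis"],
}
groups = unify(entries)
print(groups)
```

Exact matching on short abbreviations such as "AD" would merge any two entries that happen to share them, so a practical unifier would likely need extra guards; this sketch has none.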

Next-generation sequencing of disease-affected individuals has become a key technology for relating genes to diseases. We have developed VarElect for linking disease phenotypes to gene variants. It judiciously prioritizes short-listed variants, leveraging the rich information and inter-links within GeneCards and MalaCards. VarElect’s algorithm can infer both direct and indirect links between genes and phenotype-related keywords. For indirect links, gene-to-gene relations are formed via expanded paralogy relations, shared publications, interaction networks and shared biological pathways. A recent enhancement for the last of these is PathCards, which unifies >3,000 pathways from 12 data sources into ~1,000 SuperPaths with optimal informativeness and minimal redundancy. PathCards greatly enhances VarElect’s capacity to reveal unsuspected disease-gene relations. GeneCards, MalaCards and their affiliated tools VarElect and PathCards thus provide a facile and robust avenue for confederating genes with diseases.
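To make the distinction between direct and indirect gene-phenotype links concrete, here is a toy prioritizer. It is a minimal sketch under stated assumptions, not VarElect's algorithm: each gene carries a set of phenotype keywords it is directly annotated with (a stand-in for GeneCards/MalaCards content), a hypothetical `relations` map stands in for the paralogy, publication, interaction and pathway links, and a keyword matched only through a related gene counts with an arbitrary weight of 0.5. Gene names are placeholders, and the rule that any directly linked gene outranks purely indirect ones is our assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    gene: str
    direct: set[str] = field(default_factory=set)           # keywords matched by the gene itself
    indirect: dict[str, str] = field(default_factory=dict)  # keyword -> related gene it came through
    score: float = 0.0


def prioritize(candidates, keywords, annotations, relations, indirect_weight=0.5):
    """Toy ranking of candidate genes against phenotype keywords.

    annotations: gene -> keywords it is directly annotated with (hypothetical
                 stand-in for GeneCards/MalaCards text mining)
    relations:   gene -> related genes (hypothetical stand-in for paralogy,
                 shared publications, interactions and shared pathways)
    indirect_weight: arbitrary assumed weight for an indirect match
    """
    kw = {k.lower() for k in keywords}

    def matched(gene):
        return {a.lower() for a in annotations.get(gene, ())} & kw

    hits = []
    for gene in candidates:
        hit = Hit(gene, direct=matched(gene))
        for partner in sorted(relations.get(gene, ())):
            for k in sorted(matched(partner) - hit.direct):
                hit.indirect.setdefault(k, partner)  # keep the first partner found
        hit.score = len(hit.direct) + indirect_weight * len(hit.indirect)
        hits.append(hit)

    # Assumed ordering: any directly linked gene outranks purely indirect ones.
    hits.sort(key=lambda h: (bool(h.direct), h.score), reverse=True)
    return hits


# Placeholder genes and annotations, not real biology.
annotations = {
    "GENE1": {"seizures"},
    "GENE3": {"microcephaly", "seizures"},
    "GENE4": {"microcephaly"},
}
relations = {"GENE1": {"GENE4"}, "GENE2": {"GENE4"}}
ranked = prioritize(["GENE1", "GENE2", "GENE3"], ["Seizures", "Microcephaly"],
                    annotations, relations)
for h in ranked:
    print(h.gene, h.score, sorted(h.direct), h.indirect)
```

In this toy, GENE2 has no annotation of its own and surfaces only because a related gene carries one of the keywords, which is the kind of indirect implication described above. A richer relation map would simply give each candidate more partners to search.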