FuNeL User Guide

Overview
Installation
Configuration
Running FuNeL
Post-filtering
Optimisation

Overview

FuNeL is a machine learning approach for generating functional networks using the co-prediction paradigm. Gene-gene functional interactions are defined using a rule-based machine learning classification algorithm (BioHEL). The approach is based on the assumption that genes appearing within the same classification rules have an increased likelihood of being functionally related. The general FuNeL protocol is shown below:

[Figure: co-prediction protocol]
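The co-prediction assumption stated above (genes that appear in the same rule are candidates for a functional link) can be illustrated with a small sketch. It is not BioHEL's or FuNeL's actual code: the input format (one rule per line, gene names separated by spaces) and the function name rule_pairs are assumptions made only for this example.

```shell
# Hypothetical helper: read rules on stdin, one rule per line with
# space-separated gene names, and print every unordered gene pair that
# shares at least one rule.
rule_pairs() {
    awk '{
        for (i = 1; i < NF; i++)
            for (j = i + 1; j <= NF; j++) {
                # order each pair so that A-B and B-A are the same edge
                if ($i < $j) print $i, $j; else print $j, $i
            }
    }' | sort -u
}
```

For example, printf 'g1 g2 g3\ng2 g1\n' | rule_pairs lists the three distinct pairs among g1, g2 and g3, each once.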

The rule-based machine learning algorithm builds a model of the dataset, which is later used to infer a network. The network is statistically refined using permutation tests to filter out the non-significant nodes. The protocol has two optional settings: feature selection and a second stage of rule learning. It can be run with one of the following configurations:

C1 (reduced dataset + 1 machine learning phase)
C2 (original dataset + 1 machine learning phase)
C3 (reduced dataset + 2 machine learning phases)
C4 (original dataset + 2 machine learning phases)
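The four configurations are the combinations of the two optional settings. As an illustration only, the mapping can be written as a small shell function. describe_config is a hypothetical helper and not part of the FuNeL scripts; "reduced dataset" is read here as "feature selection applied".

```shell
# Hypothetical helper (not part of FuNeL): decode a configuration number
# into the two optional protocol settings.
describe_config() {
    case "$1" in
        1) echo "feature_selection=yes learning_phases=1" ;;
        2) echo "feature_selection=no learning_phases=1" ;;
        3) echo "feature_selection=yes learning_phases=2" ;;
        4) echo "feature_selection=no learning_phases=2" ;;
        *) echo "unknown configuration: $1" >&2; return 1 ;;
    esac
}
```

Calling describe_config 3 prints the settings for C3. Any value outside 1 to 4 returns a non-zero status.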

Installation

System requirements

To run FuNeL, a GNU/Linux or Mac OS system is required, with the following software installed:

Weka (GNU General Public Licence)
Python 2.7.x (Python Software Foundation Licence)
NumPy (BSD Licence)
GNU R (GNU General Public Licence)
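A quick way to see which requirements are already present is to look for their commands on the PATH. The sketch below is only a convenience check. The command names java (Weka is a Java program), python and Rscript are assumptions about how the tools are installed on your system, and missing_tools is a hypothetical helper.

```shell
# Hypothetical helper: print each of the given commands that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed command names; adjust them to your installation.
missing=$(missing_tools java python Rscript)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All required tools found."
fi

# NumPy is a Python module, so it is checked through the interpreter.
python -c 'import numpy' 2>/dev/null || echo "NumPy is not importable"
```

The check does not verify versions, so Python 2.7.x still has to be confirmed separately.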

Download

Before you start, download the FuNeL scripts.

Compilation

To run FuNeL, you first need to compile BioHEL. Its source code is distributed with the FuNeL scripts. Use the Makefile provided in the biohel folder:

tar axvf funel.tar.xz
cd funel/biohel
make

Configuration

FuNeL execution depends on three variables set in the coprediction.sh script.

WEKA_PATH - path to the weka.jar file, e.g. weka-3.6.10/weka.jar
(FuNeL uses Weka's SVM-RFE implementation)
NUM_BIOHEL_RUNS - number of BioHEL runs (we suggest 10000 as the default)
NUM_PERMUTATIONS - number of permuted datasets used for significance testing (we suggest 100 as the default)
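As a sketch, the three variables with the suggested values could look like the assignments below, followed by a basic sanity check. The exact layout of coprediction.sh is not shown in this guide, so treat this as an illustration. is_positive_int is a hypothetical helper, and the Weka path is only the example path from the list above.

```shell
# Illustrative values taken from the list above; adjust WEKA_PATH to your system.
WEKA_PATH=weka-3.6.10/weka.jar
NUM_BIOHEL_RUNS=10000
NUM_PERMUTATIONS=100

# Hypothetical helper: succeed only if the argument is a positive integer.
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
        *) [ "$1" -gt 0 ] ;;
    esac
}

# Warn early instead of failing in the middle of a long run.
[ -f "$WEKA_PATH" ] || echo "warning: $WEKA_PATH not found" >&2
is_positive_int "$NUM_BIOHEL_RUNS" || echo "NUM_BIOHEL_RUNS must be a positive integer" >&2
is_positive_int "$NUM_PERMUTATIONS" || echo "NUM_PERMUTATIONS must be a positive integer" >&2
```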

Running FuNeL

To generate a co-prediction functional network run the following script from the main FuNeL directory:

./coprediction.sh <project_name> <dataset> <configuration> [<attributes>]

The script takes three required parameters and one optional parameter:

project name - name of the results directory
dataset - biological data in ARFF format
configuration - protocol variant (number from 1 to 4), see the list in Overview
attributes - number of attributes to retain after feature selection (optional; only for configurations 1 and 3)

Example

The data directory contains a diffuse large B-cell lymphoma dataset in ARFF format (from Shipp2002). To generate a co-prediction network from this dataset with configuration C1 retaining 500 attributes, run:

./coprediction.sh lymphoma data/lymphoma_dataset.arff 1 500

The result file co-prediction_network.txt and the intermediate files will be created in the results/lymphoma directory.

Post-filtering

If the dataset attributes are probes (instead of genes), you can still obtain a gene functional network using the postfiltering.sh script. It substitutes probe names with the corresponding gene names based on the provided mapping file (see data/mapping.txt).

./postfiltering.sh <project_name> <mapping_file>

The mapping file is a two-column, tab-separated file: probe_id gene_id. If two probes are mapped to the same gene, the corresponding nodes are merged. If a probe name is not mapped to any gene name, the node is removed from the network.
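The mapping rules above can be illustrated with a short awk sketch. This is not the code of postfiltering.sh. It assumes, purely for illustration, that the network is stored as a tab-separated edge list with one probe pair per line, which may differ from the real format of co-prediction_network.txt. map_edges is a hypothetical helper.

```shell
# Hypothetical sketch of the mapping rules.
# usage: map_edges <mapping_file> <edge_file>
map_edges() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { gene[$1] = $2; next }   # first file: probe_id -> gene_id
        ($1 in gene) && ($2 in gene) {      # edges touching an unmapped probe are dropped
            print gene[$1], gene[$2]
        }
    ' "$1" "$2" | sort -u                   # probes merged into one gene give duplicate edges
}
```

With a mapping of p1 and p2 to G1 and p3 to G2, the edges p1-p3 and p2-p3 both become G1-G2 and are reported once, while an edge to an unmapped probe disappears.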

Optimisation

The network generation process can be sped up by parallelising the machine learning runs in coprediction.sh:

for run in `seq 1 $NUM_PERMUTATIONS`
do
    scripts/rbng.sh $projectname 0 $NUM_BIOHEL_RUNS $run
done

The script rbng.sh creates the co-prediction network from a single permuted dataset. Because the permuted datasets are independent of each other, these machine learning runs can be performed in parallel.
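One possible way to run the loop in parallel on a single machine is xargs with the -P option, which is available in both GNU and BSD xargs although it is not part of POSIX. The sketch below is a suggestion, not part of FuNeL. run_permutations is a hypothetical helper, and it assumes that concurrent rbng.sh runs do not write to the same files.

```shell
# Sketch only: run one command per permutation index, at most <max_jobs> at a time.
# usage: run_permutations <max_jobs> <num_permutations> <command> [args...]
# The permutation index is appended as the last argument of the command.
run_permutations() {
    max_jobs=$1
    count=$2
    shift 2
    seq 1 "$count" | xargs -P "$max_jobs" -I{} "$@" {}
}

# Possible replacement for the loop in coprediction.sh, here with 4 parallel jobs:
# run_permutations 4 "$NUM_PERMUTATIONS" scripts/rbng.sh "$projectname" 0 "$NUM_BIOHEL_RUNS"
```

The number of parallel jobs is best kept at or below the number of available CPU cores. On a cluster, the same independence allows each permutation to be submitted as a separate job.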

Contact

If you have any questions or comments about FuNeL or this tutorial, please contact us at jaume.bacardit.