BioHEL's Rule Post-processing Engine tutorial
Download
Before you start, you need to download the Rule Post-processing Engine source code.
Installation
To decompress the code, use the following command:
$ tar zxvf postprocessing.tar.gz
To compile the code run the following commands:
$ touch .depend
$ make dep
$ make
Configuration
Inside the compressed archive there is an example configuration file named test-pp.conf. Modify this file according to the operator, or combination of operators, you want to use. The engine uses up to 6 operators.
To specify an operator in the configuration file use the following syntax:
operator {number} policy {name}
Five different operator policies are available:
- cl - non-conservative cleaning
- cl2 - conservative cleaning
- pr - pruning
- sw - rule swapping
- none - skip
For example, the following is a valid configuration:
operator 1 policy cl
operator 2 policy pr
operator 3 policy cl2
operator 4 policy pr
operator 5 policy sw
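A configuration can be checked before it is passed to the engine. The following Python sketch is not part of the distribution: check_operator_lines is a hypothetical helper that only looks at lines starting with "operator" and compares them with the syntax and policy names listed above. It assumes that operators are numbered from 1 to 6 and that a number should not be used twice; the tutorial does not confirm either rule.

```python
import re

# Policy names listed in this tutorial.
VALID_POLICIES = {"cl", "cl2", "pr", "sw", "none"}
# The tutorial states that the engine uses up to 6 operators.
# Numbering from 1 is an assumption based on the example configuration.
MAX_OPERATORS = 6

OPERATOR_LINE = re.compile(r"^operator\s+(\d+)\s+policy\s+(\S+)$")


def check_operator_lines(config_text):
    """Return a list of (line_number, message) problems in the operator lines.

    Hypothetical helper: lines that do not start with 'operator' are ignored.
    """
    problems = []
    seen = set()
    for line_number, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if not line.startswith("operator"):
            continue
        match = OPERATOR_LINE.match(line)
        if match is None:
            problems.append((line_number, "malformed operator line"))
            continue
        number, policy = int(match.group(1)), match.group(2)
        if not 1 <= number <= MAX_OPERATORS:
            problems.append((line_number, f"operator number {number} is outside 1..{MAX_OPERATORS}"))
        if number in seen:
            # Treating repeated numbers as an error is an assumption.
            problems.append((line_number, f"operator number {number} is repeated"))
        seen.add(number)
        if policy not in VALID_POLICIES:
            problems.append((line_number, f"unknown policy {policy!r}"))
    return problems
```

Calling check_operator_lines on the text of the example configuration above returns an empty list, since every line uses a listed policy and a distinct number between 1 and 5.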
The configuration file also controls which statistics are dumped to the standard output, through the following lines:
trainset stats enabled {option}
testset stats enabled {option}
The options are:
- START - generates statistics before applying the operators,
- END - generates statistics after applying the operators,
- ALL - generates statistics before and after.
Commenting out either line stops the system from generating the corresponding statistics.
Execution
To run the code over a final rule set, execute the following command:
./postprocessing test-pp.conf <ruleset> <train set> [test set]
The test set is optional. When included, additional statistics will be calculated.
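When the engine has to be run over several rule sets, the command can be assembled from a script. The sketch below is a minimal Python wrapper, not something shipped with the engine: build_command and run_postprocessing are hypothetical names, and the code assumes that the compiled binary is ./postprocessing in the current directory and that it reports failure through a non-zero exit status.

```python
import subprocess


def build_command(ruleset, train_set, test_set=None,
                  config="test-pp.conf", binary="./postprocessing"):
    """Assemble the argument list in the order used by the command above."""
    command = [binary, config, ruleset, train_set]
    if test_set is not None:
        # The test set is optional and always comes last.
        command.append(test_set)
    return command


def run_postprocessing(ruleset, train_set, test_set=None, **options):
    """Run the engine and return everything it wrote to standard output."""
    command = build_command(ruleset, train_set, test_set, **options)
    # check=True raises CalledProcessError on a non-zero exit status
    # (assumed here to be how the binary signals an error).
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout
```

For example, run_postprocessing("rules.txt", "train.arff", "test.arff") would execute the engine with the three files in the expected order; these file names are placeholders.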
In the example folder, we provide an example of a training set, test set and a rule set for the Adult problem from the UCI repository.
Both training and test sets should be given in WEKA format. The rule set should have the attributes in the rules separated with "|" and expressed as follows:
- continuous: Att <att-name> is [<lower bound>,<upper bound>]
- discrete: Att <att-name> is <value>,<value>,<value>
The following is an example of the ruleset format:
Att a16 is A,C,G|Att a18 is A,C,T|Att changes is [<1.000000]|Pass
Att a06 is A,C,G|Att a11 is C|Att a62 is T|Att changes is [<7.000000]|Pass
Default rule -> Fail
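To make the format concrete, the following Python sketch reads one line of a rule set into a dictionary. It is an illustration based only on the example lines above, not the parser used by the engine: parse_rule is a hypothetical name, attribute names are assumed to contain no spaces, and the text inside the square brackets of a continuous attribute is kept as a raw string rather than interpreted.

```python
import re

# One condition: "Att", the attribute name, "is", then the values or interval.
CONDITION = re.compile(r"^Att\s+(\S+)\s+is\s*(.+)$")
DEFAULT_PREFIX = "Default rule ->"


def parse_rule(line):
    """Parse one rule-set line into its conditions and class label."""
    line = line.strip()
    if line.startswith(DEFAULT_PREFIX):
        label = line[len(DEFAULT_PREFIX):].strip()
        return {"conditions": [], "label": label, "default": True}

    # Fields are separated with "|"; the last field is the class label.
    *fields, label = line.split("|")
    conditions = []
    for field in fields:
        match = CONDITION.match(field.strip())
        if match is None:
            raise ValueError(f"unrecognised condition: {field!r}")
        name, body = match.group(1), match.group(2).strip()
        if body.startswith("[") and body.endswith("]"):
            # Continuous attribute: keep the interval text uninterpreted.
            conditions.append({"attribute": name, "kind": "continuous",
                               "interval": body[1:-1]})
        else:
            # Discrete attribute: comma-separated list of values.
            conditions.append({"attribute": name, "kind": "discrete",
                               "values": body.split(",")})
    return {"conditions": conditions, "label": label.strip(), "default": False}
```

Applied to the first example line, parse_rule yields three conditions (two discrete, one continuous) and the label Pass; applied to the last line, it yields no conditions, the label Fail and the default flag set.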
To extract the final rule set from BioHEL's output, use the Perl script scripts/extract_rules.pl as follows:
cat <output> | extract_rules.pl
Contact
For any questions or comments please contact jaume.bacardit.