Interface with C4.5 Example

Example for: Invoking C4.5 and grabbing the Decision Tree it learns.

This is a simple example that introduces everything you'll need to invoke C4.5 from your program and to retrieve a DecisionTreePtr containing the tree that C4.5 induces.  The example includes a made-up data set, a sample makefile, and a program which interfaces with C4.5.  The example's files are in the <VFML-root>/examples/c45interface/ directory.  This document presents the code with a detailed commentary and some suggestions for modifications.

You might like to go to the <VFML-root>/examples/c45interface/ directory and get your favorite code/text editor ready.

The Dataset

The dataset used for the c45interface example is made up.  See the scan-dataset example for more information about it.

C4.5

C4.5 is Ross Quinlan's excellent decision tree induction system.  You can download it from Professor Quinlan's homepage at http://www.cse.unsw.edu.au/~quinlan/.  You will need to download and install C4.5R8 in your path to make this example, and VFML's C4.5 interface, work.
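
Since the example depends on C4.5 being reachable through your path, it can be handy to check for it before running anything.  The following is a minimal sketch of such a check, not part of VFML: find_in_path is a hypothetical helper, and the executable name c4.5 passed to it in the usage note is an assumption based on the standard C4.5 Release 8 build.  It assumes a POSIX system.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of VFML): returns 1 if a regular,
 * executable file called `name` exists in one of the directories
 * listed in the PATH environment variable, and 0 otherwise.
 * Empty PATH entries are skipped. */
int find_in_path(const char *name) {
    const char *path = getenv("PATH");
    char *copy;
    char *dir;
    int found = 0;

    if (path == NULL || name == NULL || *name == '\0')
        return 0;

    /* strtok modifies its argument, so work on a private copy. */
    copy = malloc(strlen(path) + 1);
    if (copy == NULL)
        return 0;
    strcpy(copy, path);

    for (dir = strtok(copy, ":"); dir != NULL; dir = strtok(NULL, ":")) {
        char candidate[4096];
        struct stat st;
        int n = snprintf(candidate, sizeof candidate, "%s/%s", dir, name);

        if (n < 0 || (size_t)n >= sizeof candidate)
            continue;               /* path too long, skip this entry */
        if (stat(candidate, &st) == 0 && S_ISREG(st.st_mode) &&
            access(candidate, X_OK) == 0) {
            found = 1;
            break;
        }
    }

    free(copy);
    return found;
}
```

A program could call find_in_path("c4.5") at startup and print a hint about installing C4.5 when it returns 0, rather than failing later inside the interface.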

The Makefile

Glance at the makefile; the top couple of lines contain information you would need to update to use the file with another project.

The makefile is set up to work as is for the c45interface example.  Make sure you've properly installed the VFML library (see the Getting Started section if you haven't done this yet), and changed to the <VFML-root>/examples/c45interface/ directory.  Type 'make' to build the example program, run it by typing c45interface, and look at the output.  You should see a printout of a decision tree which was induced by C4.5.

The Code

Now let's take a look at the code.  Load c45interface.c into your editor.

Setup

#include "uwml.h"
#include <stdio.h>

These two include files will appear in just about every project built with VFML.  The first includes all the VFML interfaces; the second is needed to work with files, something you will do in most of your VFML projects.

int main(void) {
   ExampleSpecPtr es = ExampleSpecRead("test.names");
   ExamplePtr e;
   FILE *exampleIn = fopen("test.data", "r");
   DecisionTreePtr dt;
   VoidListPtr examples = VLNew();

These lines load the example spec, declare an example pointer and a decision tree pointer, open the example data file, and create an empty list to hold the examples.  The example spec is very important: it contains a complete description of the dataset including attributes, their types and values, and the classes.  Your program will query the example spec to determine how to go about working with a particular dataset, what values to expect, and how to iterate over them.  You will also need to pass the spec to various VFML interfaces; it might be a good thing to make global in your projects.

exampleIn is a file handle on the data file, opened for reading.  The program reads examples from this file, one at a time, and appends each to the 'examples' list until there are no more left to read.  You can refer to the scan-dataset example for more information about how this works.
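
The listing above does not check whether fopen succeeded; if test.data is missing, exampleIn will be NULL and reading from it will fail.  A minimal sketch of a guard follows.  open_for_reading is a hypothetical helper, not part of VFML, and uses only the standard C library.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of VFML): opens `path` for reading.
 * On failure it prints the reason to stderr and returns NULL, so the
 * caller can stop before trying to read any examples. */
FILE *open_for_reading(const char *path) {
    FILE *f = fopen(path, "r");

    if (f == NULL) {
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
    }
    return f;
}
```

In the example program, the declaration could become FILE *exampleIn = open_for_reading("test.data"); followed by an early return from main when the result is NULL.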

Invoking C4.5

   dt = C45Learn(es, examples);

The C45Learn function does the hard work in this example.  As called above, its parameters are the example spec and the list of examples; the function interfaces with C4.5 (if C4.5 is correctly installed in your path), asks it to learn a decision tree on the examples contained in the list, and returns the induced tree.  You are responsible for the memory used by the returned tree and should call DecisionTreeFree(dt) when done with the tree.

Finally, the program prints the induced tree to standard output.