Create Your Own Learner

When you implement a learner, you need to worry about more than the learning: you also need to interface with your environment, handle command line arguments, locate and load files, and interface with other tools.  All of these things are important, but they aren't particularly interesting.  This page describes the main issues to consider when working in the VFML environment.  The implement-learner example presents a framework that implements solutions for these issues.

Locating datafiles
VFML currently supports the C4.5 format for datafiles; your learner will need to accept the filestem name as an argument, then locate and open the .names, .data, and perhaps .test files.  We suggest you accept the command line argument -f <filestem>, as this will allow you to easily interface with VFML's xvalidate and batchtest tools.
Testing & reporting error rate
You will probably also want to accept a command line flag (we suggest -u) to tell your learner to test its accuracy on the examples in <filestem>.test and report the results.  To completely integrate with VFML's xvalidate and batchtest, you should also accept a flag that causes your learner to limit its output to: <error-rate on test set> <size of learned model>.  VFML generally overloads the -u flag to perform both functions.
Other command line arguments
You will probably want to accept other command line arguments to configure the execution of your learner.  The implement-learner example accepts a few dummy arguments to help get you started.
Building and linking with VFML
Any learner built with VFML should include uwml.h and link with the VFML library.  The implement-learner example program, described in the next section, provides a makefile that shows you how to do this under UNIX, as well as a VC++ 6.0 project that helps you get started under Windows.
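
As an illustration of the -f and -u suggestions above, here is a minimal sketch in plain C.  It does not use the VFML library; the LearnerArgs struct and the parse_args and build_path functions are hypothetical names chosen for this sketch, and treating every other argument as an error is an assumption rather than something VFML requires.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical settings struct; these names are not part of VFML. */
typedef struct {
    const char *fileStem;  /* value given after -f */
    int doTest;            /* 1 if -u was given, 0 otherwise */
} LearnerArgs;

/* Parse -f <filestem> and -u.
   Returns 0 on success, -1 on a bad or incomplete command line. */
int parse_args(int argc, char *argv[], LearnerArgs *out) {
    int i;
    assert(out != NULL);
    out->fileStem = NULL;
    out->doTest = 0;
    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-f") == 0) {
            if (i + 1 >= argc) return -1;  /* -f needs a value */
            out->fileStem = argv[++i];     /* consume the value too */
        } else if (strcmp(argv[i], "-u") == 0) {
            out->doTest = 1;
        } else {
            return -1;                     /* unknown argument */
        }
    }
    return out->fileStem != NULL ? 0 : -1; /* this sketch requires -f */
}

/* Build "<stem><ext>" (for example "iris" + ".names") into buf.
   Returns 0 on success, -1 if buf is too small. */
int build_path(char *buf, size_t size, const char *stem, const char *ext) {
    int n = snprintf(buf, size, "%s%s", stem, ext);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

A learner would call build_path once each for ".names", ".data", and (when -u is set) ".test", and pass the resulting names to fopen or to the library's loading routines.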

The Implement Learner Example Program

Example For: The framework that every learner will need

This is a simple example that presents the simplest possible learning algorithm - one that always predicts the most-common-class in the training set.  This code is very similar to the implementation of VFML's mostcommonclass learner.  It includes a makefile and a source file which are located in the <VFML-root>/examples/implement-learner/ directory.   This document presents an overview of the code which should be sufficient to get you started modifying it for your own needs.
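
To make the idea concrete, the following is a minimal sketch of a most-common-class learner in plain C.  It is not the code from the example directory: it assumes class labels have already been read into an int array with values in [0, numClasses), and the function names most_common_class and error_rate are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* "Train": return the most frequent label in labels[0..n-1].
   Labels must lie in [0, numClasses).  Ties go to the lowest label.
   Returns -1 if there is no data or memory cannot be allocated. */
int most_common_class(const int *labels, size_t n, int numClasses) {
    long *counts;
    int best, c;
    size_t i;

    if (n == 0 || numClasses <= 0) return -1;
    counts = calloc((size_t)numClasses, sizeof *counts);
    if (counts == NULL) return -1;

    for (i = 0; i < n; i++) {
        assert(labels[i] >= 0 && labels[i] < numClasses);
        counts[labels[i]]++;
    }

    best = 0;
    for (c = 1; c < numClasses; c++) {
        if (counts[c] > counts[best]) best = c;
    }
    free(counts);
    return best;
}

/* "Test": fraction of labels that differ from the constant prediction.
   Returns 0.0 for an empty test set. */
double error_rate(const int *labels, size_t n, int predicted) {
    size_t i, wrong = 0;
    if (n == 0) return 0.0;
    for (i = 0; i < n; i++) {
        if (labels[i] != predicted) wrong++;
    }
    return (double)wrong / (double)n;
}
```

The learned "model" here is a single integer, which is why this learner is a convenient vehicle for showing the surrounding framework rather than the learning itself.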

You might like to go to the <VFML-root>/examples/implement-learner/ directory and get your favorite code/text editor ready.  You might also like to copy the directory somewhere and begin modifying the example for your own needs.

The Makefile

This makefile will be a good starting point for your VFML projects.  Glance at the makefile; the top couple of lines contain information you would need to update if you want to use the file with another project.

Make sure you've properly installed the VFML library (see the Getting Started section if you haven't done this yet), then type 'make' to build the example program.  Run it by typing implement-learner -h, and look at the output.

The VC++ 6.0 Project

We've provided a starter project for Windows using VC++ 6.0.  It is configured to work if you've installed the VFML library into c:/proj/uwml/; if you installed it elsewhere, see the Getting Started section for more information on how to update the configuration.

The Windows version also uses a different source file, implement-learner-windows.c.  The only difference between this file and implement-learner.c is that it doesn't do any timing.

The Code

This will be a high-level overview of the code from the example; it should be enough to get you started.   For a more detailed description of a VFML project see the loading data documentation.

Command Line Arguments
_printUsage and _processArgs work together to get a valid command line and set a collection of globals from it.  The example shows you how to accept flags, strings, ints, and doubles from the command line.  Be careful not to define any arguments that are sub-strings of other arguments: if the comparisons are not ordered correctly, the strcmp might accept the longer argument as an instance of the shorter one.
Datasets
The example will take -source <directory> and -f <file-stem> arguments and use them to find a dataset.  It then reads the names file into a global, gEs, and iterates over the examples from the .data file.
Testing
If you pass the -u argument to the program, it will test its 'model' on the examples in the .test file and output the results in a format appropriate for interfacing with xvalidate and batchtest.
Debugging Output
By default, the program is pretty quiet.  If you pass the -v argument, the program will output more information about its progress.  In your learner, you might want to implement higher message levels (in response to multiple -v flags on the command line) to print out more detailed information about your learner's progress. See VFML's Debugging API for some code that may help with this.
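
One way to support multiple -v flags is sketched below in plain C.  This is an assumption about how you might structure it, not VFML's Debugging API: gMessageLevel, count_verbose_flags, and debug_message are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical global: 0 is quiet, higher values print more detail. */
int gMessageLevel = 0;

/* Count how many times -v appears, so "-v -v" requests level 2. */
int count_verbose_flags(int argc, char *argv[]) {
    int i, count = 0;
    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-v") == 0) count++;
    }
    return count;
}

/* Print a printf-style message to stderr only when the current message
   level is at least `level`.  Returns the number of characters written,
   or 0 if the message was suppressed. */
int debug_message(int level, const char *fmt, ...) {
    va_list ap;
    int n;
    assert(fmt != NULL);
    if (level > gMessageLevel) return 0;
    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n;
}
```

After argument processing, a learner could set gMessageLevel = count_verbose_flags(argc, argv) and then call, for example, debug_message(2, "processed %d examples\n", count) at points where only very verbose runs should report progress.  Writing to stderr keeps this output separate from the result line that xvalidate and batchtest read.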