
cvfdt File Reference


Detailed Description

Learns a DecisionTree from a high-speed time-changing data stream (or very large data set).

Learns a decision tree from a high-speed time-changing data stream or a very large data set as described in this paper. cvfdt does not work with continuous attributes.

CVFDT learns a tree as follows. It starts with a single leaf and begins collecting training examples from a data stream (with the -stdin argument) or from the file <stem>.data. Once it has seen enough data to determine, with high confidence, which attribute is the best one to partition the data on, it turns the leaf into an internal node that splits on that attribute and recursively starts learning at the new leaves.

CVFDT maintains a window of training examples and keeps its learned tree up to date with this window by monitoring the quality of its earlier decisions as data moves into and out of the window. In particular, whenever a new example is read, it is added to the statistics at every node in the tree that it passes through, the oldest example in the window is forgotten at every node where it previously had an effect, and the validity of all statistical tests is rechecked. If cvfdt detects a change, it starts growing an alternate tree in parallel, rooted at the newly invalidated node. When the alternate becomes more accurate on new data than the original, the original is replaced by the alternate and freed.
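
The window bookkeeping and the "enough data" test described above can be sketched roughly as follows. This is an illustrative Python sketch, not code from VFML: the names `WindowedCounts`, `best_split`, and `hoeffding_bound` are hypothetical, only a single node's statistics are modelled (no tree and no alternate subtrees), and the split test assumes the Hoeffding bound on information gain used in the VFDT/CVFDT literature, whereas this page only says "with high confidence".

```python
import math
from collections import Counter, deque


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability at least 1 - delta, the true mean
    lies within epsilon of the mean observed over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def entropy(counts):
    """Shannon entropy (in bits) of a collection of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


class WindowedCounts:
    """Hypothetical per-node statistics over a sliding window of examples."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()
        self.counts = Counter()  # (attribute index, value, label) -> count

    def add(self, attributes, label):
        # Add the new example to the statistics.
        self.window.append((attributes, label))
        for index, value in enumerate(attributes):
            self.counts[(index, value, label)] += 1
        # Forget the oldest example once the window overflows.
        if len(self.window) > self.window_size:
            old_attributes, old_label = self.window.popleft()
            for index, value in enumerate(old_attributes):
                self.counts[(index, value, old_label)] -= 1

    def info_gain(self, index):
        # Information gain of splitting the windowed examples on one attribute.
        n = len(self.window)
        if n == 0:
            return 0.0
        class_totals = Counter(label for _, label in self.window)
        by_value = {}
        for (i, value, label), count in self.counts.items():
            if i == index and count > 0:
                by_value.setdefault(value, Counter())[label] = count
        remainder = sum(
            sum(c.values()) / n * entropy(c.values()) for c in by_value.values()
        )
        return entropy(class_totals.values()) - remainder


def best_split(stats, num_attributes, num_classes, delta):
    """Return the index of the winning attribute, or None if the gap between
    the two best attributes is still within the Hoeffding bound.
    Assumes at least two attributes and at least one example."""
    n = len(stats.window)
    gains = sorted(
        ((stats.info_gain(i), i) for i in range(num_attributes)), reverse=True
    )
    epsilon = hoeffding_bound(math.log2(num_classes), delta, n)
    (best_gain, best_index), (second_gain, _) = gains[0], gains[1]
    return best_index if best_gain - second_gain > epsilon else None
```

Calling `best_split` again after each `add` mirrors, in a simplified way, the rechecking of earlier decisions: if the winning attribute changes as old examples leave the window, that is the kind of change that would trigger growing an alternate tree.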

cvfdt reads its input and writes its output in C4.5 format. It expects to find the files <stem>.names and <stem>.data.
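
For readers unfamiliar with the C4.5 layout, the following Python sketch shows what a small pair of such files might look like and how they could be read. The sample contents and the function names `parse_names` and `parse_data` are hypothetical and are not part of VFML; the parser is deliberately simplified and handles discrete attributes only, which matches the limitation noted above.

```python
# Hypothetical contents of <stem>.names: classes first, then one attribute per line.
NAMES = """\
| text after a vertical bar is a comment
yes, no.

outlook: sunny, overcast, rain.
windy: true, false.
"""

# Hypothetical contents of <stem>.data: attribute values, then the class label.
DATA = """\
sunny, true, no
overcast, false, yes
"""


def strip_line(line):
    """Drop a trailing comment, surrounding whitespace, and the final period."""
    return line.split("|", 1)[0].strip().rstrip(".")


def parse_names(text):
    """Return (class list, {attribute name: list of allowed values})."""
    entries = [e for e in (strip_line(l) for l in text.splitlines()) if e]
    classes = [c.strip() for c in entries[0].split(",")]
    attributes = {}
    for entry in entries[1:]:
        name, values = entry.split(":", 1)
        attributes[name.strip()] = [v.strip() for v in values.split(",")]
    return classes, attributes


def parse_data(text, attributes):
    """Return a list of ({attribute name: value}, class label) pairs."""
    examples = []
    for line in text.splitlines():
        line = strip_line(line)
        if not line:
            continue
        fields = [f.strip() for f in line.split(",")]
        examples.append((dict(zip(attributes, fields[:-1])), fields[-1]))
    return examples


classes, attributes = parse_names(NAMES)
examples = parse_data(DATA, attributes)
```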

Thanks:
to Laurie Spencer for doing the core development work for cvfdt.

Wish List:
Modify this learner to work with continuous attributes.

Provide an API to this learner, like the one for learning BeliefNet structure in beliefnet-engine.h.

Arguments


Generated for VFML by doxygen