J48Consolidated: Class for generating a pruned or unpruned C4.5 consolidated tree

URL: http://www.aldapa.eus/res/weka-ctc
Author: Jesús M. Pérez <txus.perez{[at]}ehu.eus>, Igor Ibarguren <igor.ibarguren{[at]}ehu.eus>
Maintainer: Jesús M. Pérez <txus.perez{[at]}ehu.eus>

Class for generating a pruned or unpruned C4.5 consolidated tree.
It uses the Consolidated Tree Construction (CTC) algorithm, which builds a single tree from a set of subsamples. New options are added to the J48 class to set the Resampling Method (RM) used to generate the subsamples for the consolidation process. For more information, see:

Jesús M. Pérez and Javier Muguerza and Olatz Arbelaitz and Ibai Gurrutxaga and José I. Martín. "Combining multiple class distribution modified subsamples in a single tree". Pattern Recognition Letters (2007), 28(4), pp 414-422. doi:10.1016/j.patrec.2006.08.013
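
The consolidation step at a single node can be sketched as follows. This is a simplified illustration, not the J48Consolidated source code. It assumes that every subsample proposes its best split attribute (or proposes no split), that the attribute proposed by the most subsamples is used, that the node becomes a leaf when at least half of the subsamples propose not to split, and that the cut points proposed for a numeric attribute are combined by taking their median. The class and method names (`CtcVoteSketch`, `consolidate`, `medianCut`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical sketch of the CTC majority-vote step at one node (not the WEKA code). */
final class CtcVoteSketch {

    /**
     * Each entry is the split attribute proposed by one subsample, or null if that
     * subsample proposes to make the node a leaf. Returns the consolidated attribute,
     * or empty if the node should become a leaf.
     */
    static Optional<String> consolidate(List<String> proposals) {
        int leafVotes = 0;
        // TreeMap keeps attribute names sorted, so ties are broken alphabetically (an assumption).
        Map<String, Integer> votes = new TreeMap<>();
        for (String attr : proposals) {
            if (attr == null) {
                leafVotes++;
            } else {
                votes.merge(attr, 1, Integer::sum);
            }
        }
        int splitVotes = proposals.size() - leafVotes;
        // Assumed rule: split only if strictly more subsamples want to split than not.
        if (splitVotes <= leafVotes) {
            return Optional.empty();
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return Optional.of(best);
    }

    /** Combines the cut points proposed for a numeric attribute by their median (non-empty list). */
    static double medianCut(List<Double> cuts) {
        List<Double> sorted = new ArrayList<>(cuts);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1
                ? sorted.get(n / 2)
                : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }
}
```

For example, the proposals `age, age, income, (no split), age` from five subsamples consolidate to `age`, while `(no split), (no split), age` turns the node into a leaf.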

A new method has been added for determining the number of samples used in the consolidation process: it guarantees that the set of generated subsamples contains at least a minimum percentage (the coverage value) of the examples in the original sample. For more information, see:

Igor Ibarguren and Jesús M. Pérez and Javier Muguerza and Ibai Gurrutxaga and Olatz Arbelaitz. "Coverage-based resampling: Building robust consolidated decision trees". Knowledge-Based Systems (2015), Vol. 79, pp 51-67. doi:10.1016/j.knosys.2014.12.023
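
One way to relate the coverage value to the number of subsamples is sketched below. It assumes that each subsample includes a given original example with probability p, independently across subsamples, so that after N subsamples the expected fraction of covered examples is 1 - (1 - p)^N; the smallest N reaching the target coverage is then ceil(ln(1 - coverage) / ln(1 - p)). For class-distribution-modified subsamples, each class has its own inclusion probability, and the sketch conservatively uses the smallest one. This is an illustrative derivation under those assumptions, not necessarily the exact formula used by Ibarguren et al.; the names `CoverageSketch` and `subsamplesForCoverage` are hypothetical.

```java
/** Hypothetical sketch: number of subsamples needed to reach a target coverage. */
final class CoverageSketch {

    /**
     * Smallest N such that 1 - (1 - p)^N >= coverage, where p is the assumed probability
     * that a given original example appears in one subsample (independent across subsamples).
     */
    static int subsamplesForCoverage(double coverage, double inclusionProb) {
        if (coverage <= 0.0 || coverage >= 1.0) {
            throw new IllegalArgumentException("coverage must be in (0, 1)");
        }
        if (inclusionProb <= 0.0 || inclusionProb > 1.0) {
            throw new IllegalArgumentException("inclusion probability must be in (0, 1]");
        }
        if (inclusionProb == 1.0) {
            return 1; // every example is already in every subsample
        }
        double n = Math.log(1.0 - coverage) / Math.log(1.0 - inclusionProb);
        return (int) Math.ceil(n);
    }

    /**
     * Class-aware variant: class k contributes drawnPerClass[k] examples to each subsample out of
     * availablePerClass[k] in the original sample. Using the smallest per-class inclusion
     * probability makes every class reach the target coverage (a conservative assumption).
     */
    static int subsamplesForCoverage(double coverage, int[] drawnPerClass, int[] availablePerClass) {
        double minProb = 1.0;
        for (int k = 0; k < drawnPerClass.length; k++) {
            double p = (double) drawnPerClass[k] / availablePerClass[k];
            minProb = Math.min(minProb, p);
        }
        return subsamplesForCoverage(coverage, minProb);
    }
}
```

For instance, with p = 0.5 and a coverage of 0.99 the sketch returns 7 subsamples, since 1 - 0.5^7 ≈ 0.992 while 1 - 0.5^6 ≈ 0.984.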

This update adds structural metrics that quantify the explanatory capacity of consolidated trees, developed for new work on the Partially Consolidated Tree-Bagging (PCTBagging) algorithm (see "J48PartiallyConsolidated: An implementation of the PCTBagging algorithm for WEKA").

Tree Structure Measures (Explainability Quantification): Beyond the standard J48 measures (TreeSize, NumLeaves), three new measures quantify explainability. NumInnerNodes counts the decision (inner) nodes, which are the components that directly make up an explanation; ExplanationLength is the average length of the root-to-leaf paths; and WeightedExplanationLength weights each root-to-leaf path by the number of instances in its leaf. Together, these measures help evaluate the trade-off between model complexity and human interpretability in the consolidated tree. For more information, see:

Jesús M. Pérez and Olatz Arbelaitz. "Multi-Criteria Node Selection in Direct PCTBagging: Balancing Interpretability and Accuracy with Bootstrap Sampling and Unrestricted Pruning". Information Sciences (2025), submitted. doi:10.1016/j.ins.2025.XX.XXX
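
The three structural measures can be computed with a simple recursive traversal, sketched below on a hypothetical `SketchNode` class rather than on WEKA's own tree classes. It assumes that the length of a root-to-leaf path is the number of decision nodes on it, and that the weighted variant averages these lengths using the number of training instances in each leaf as weights. The names `SketchNode` and `TreeMeasuresSketch` are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tree node: a leaf has no children and records how many instances reach it. */
final class SketchNode {
    final List<SketchNode> children = new ArrayList<>();
    final double instances; // only meaningful for leaves

    SketchNode(double instances) { this.instances = instances; }

    /** Builds a decision node with the given children. */
    static SketchNode inner(SketchNode... kids) {
        SketchNode n = new SketchNode(0);
        for (SketchNode k : kids) n.children.add(k);
        return n;
    }

    boolean isLeaf() { return children.isEmpty(); }
}

final class TreeMeasuresSketch {

    /** Number of decision (non-leaf) nodes. */
    static int numInnerNodes(SketchNode n) {
        if (n.isLeaf()) return 0;
        int count = 1;
        for (SketchNode c : n.children) count += numInnerNodes(c);
        return count;
    }

    /** Unweighted average number of decision nodes on a root-to-leaf path. */
    static double explanationLength(SketchNode root) {
        double[] acc = new double[2]; // acc[0] = sum of path lengths, acc[1] = number of leaves
        walk(root, 0, acc, false);
        return acc[0] / acc[1];
    }

    /** Same average, with each leaf's path weighted by the instances in that leaf. */
    static double weightedExplanationLength(SketchNode root) {
        double[] acc = new double[2]; // acc[0] = sum of length * weight, acc[1] = total weight
        walk(root, 0, acc, true);
        return acc[0] / acc[1];
    }

    private static void walk(SketchNode n, int depth, double[] acc, boolean weighted) {
        if (n.isLeaf()) {
            double w = weighted ? n.instances : 1.0;
            acc[0] += depth * w;
            acc[1] += w;
            return;
        }
        // Descending one level passes through one more decision node.
        for (SketchNode c : n.children) walk(c, depth + 1, acc, weighted);
    }
}
```

For a root whose first branch is a leaf with 60 instances and whose second branch is a decision node with leaves of 30 and 10 instances, NumInnerNodes is 2, ExplanationLength is 5/3 ≈ 1.67 and WeightedExplanationLength is (60·1 + 30·2 + 10·2)/100 = 1.4, lower because most instances follow the short path.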

All available versions:
Latest
3.3
3.2
3.1
3.0