Author: | Jesús M. Pérez <txus.perez{[at]}ehu.eus> | |
Category: | Preprocessing, Experimenter | |
Date: | 2025-06-02 | |
Depends: | weka (>=3.8.6) | |
Description: | Class for extracting the main descriptive characteristics of a dataset, based on ZeroR, WEKA's simplest classifier. When used as a classification algorithm in WEKA's Experimenter (where it appears in the "rules" group), it returns the descriptive features of each dataset (number of classes, number of attributes, ...) as if they were metrics (the Comparison field) for evaluating the classifier, alongside measures such as Percent_correct, Area_under_ROC or Elapsed_Time_training. To configure the experiment properly (Setup tab of the Experimenter), it is recommended to set 'Experiment Type' to "Train/Test Percentage Split (order preserved)" with a 'Train Percentage' of 100%. This ensures that measures such as Number_of_training_instances or NumMissingValuesDataset describe the whole dataset and are not affected by the train/test splits of the default 'Cross-validation' option. To obtain research-ready results, select 'CSV file' as the 'Results Destination' and provide a filename. After the experiment has run, the generated CSV can be opened in spreadsheet software, with datasets in rows and their complete features (plus the ZeroR metrics) in columns, similar to the dataset description tables commonly found in machine learning publications. List of extracted characteristics (all starting with “measure” due to WEKA's naming convention):
Jesús M. Pérez and Olatz Arbelaitz. "Multi-Criteria Node Selection in Direct PCTBagging: Balancing Interpretability and Accuracy with Bootstrap Sampling and Unrestricted Pruning". Information Sciences (2025), submitted. doi:10.1016/j.ins.2025.XX.XXX | |
Enhances: | ||
License: | GPL 3.0 | |
Maintainer: | Jesús M. Pérez <txus.perez{[at]}ehu.eus> | |
PackageURL: | http://www.aldapa.eus/res/weka-dataextractor/DatasetCharacteristicsExtractor-v1.0.zip | |
Related: | ZeroR | |
URL: | http://www.aldapa.eus/res/weka-dataextractor | |
Version: | 1.0 |
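
The CSV produced by the Experimenter can also be reduced to a dataset description table with a short script instead of a spreadsheet. The following is a minimal Python sketch, not part of the package. It assumes that the CSV identifies each dataset in a `Key_Dataset` column and that the extracted characteristics are the columns whose names start with `measure`. The column name `measureNumMissingValuesDataset`, the dataset names and all values in `sample` are illustrative assumptions, so check the header of your own results file before relying on them.

```python
import csv
import io


def dataset_table(csv_text, key="Key_Dataset", prefix="measure"):
    """Return {dataset: {column: value}} with one entry per dataset.

    Only columns whose name starts with `prefix` are kept. Values are
    left as the strings found in the CSV.
    """
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row[key]
        if name in table:
            # Keep the first row per dataset; later rows (e.g. other runs)
            # are skipped.
            continue
        table[name] = {c: v for c, v in row.items() if c.startswith(prefix)}
    return table


# Made-up sample standing in for an Experimenter CSV (assumed layout).
sample = (
    "Key_Dataset,Key_Run,Number_of_training_instances,"
    "measureNumMissingValuesDataset\n"
    "dataset_a,1,150.0,0.0\n"
    "dataset_a,2,150.0,0.0\n"
    "dataset_b,1,57.0,12.0\n"
)

for name, features in dataset_table(sample).items():
    print(name, features)
```

To process a real results file, read it with `open(path, newline="").read()` and pass the text to `dataset_table`. Passing `prefix=""` keeps every column, including the ZeroR metrics.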