public class ComplementNaiveBayes extends Classifier implements OptionHandler, WeightedInstancesHandler, TechnicalInformationHandler
@inproceedings{Rennie2003,
   author = {Jason D. Rennie and Lawrence Shih and Jaime Teevan and David R. Karger},
   booktitle = {ICML},
   pages = {616-623},
   publisher = {AAAI Press},
   title = {Tackling the Poor Assumptions of Naive Bayes Text Classifiers},
   year = {2003}
}

Valid options are:
-N Normalize the word weights for each class
-S Smoothing value to avoid zero WordGivenClass probabilities (default=1.0).
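To make the two options concrete, the following is a minimal sketch of how complement word weights can be estimated, following the general scheme in the Rennie et al. paper rather than this class's actual source. The class name `ComplementWeightsSketch`, the method `complementWeights`, and the `counts` layout are hypothetical, and the exact smoothing of the denominator is an assumption; the real implementation may differ in detail.

```java
// Illustrative sketch only; not the Weka implementation.
final class ComplementWeightsSketch {

    /**
     * counts[c][i] holds the total frequency of word i over all training
     * documents of class c (an assumed input layout).
     * smoothing plays the role of the -S option, normalize that of -N.
     */
    static double[][] complementWeights(double[][] counts,
                                        double smoothing,
                                        boolean normalize) {
        int numClasses = counts.length;
        int numWords = counts[0].length;
        double[][] weights = new double[numClasses][numWords];

        for (int c = 0; c < numClasses; c++) {
            // Accumulate word counts over every class except c.
            double[] complementCount = new double[numWords];
            double complementTotal = 0.0;
            for (int other = 0; other < numClasses; other++) {
                if (other == c) {
                    continue;
                }
                for (int i = 0; i < numWords; i++) {
                    complementCount[i] += counts[other][i];
                    complementTotal += counts[other][i];
                }
            }

            // Smoothed log estimate; smoothing keeps the ratio above zero.
            double absSum = 0.0;
            for (int i = 0; i < numWords; i++) {
                weights[c][i] = Math.log((complementCount[i] + smoothing)
                        / (complementTotal + smoothing * numWords));
                absSum += Math.abs(weights[c][i]);
            }

            // Optional per-class normalization by the sum of absolute weights.
            if (normalize && absSum > 0.0) {
                for (int i = 0; i < numWords; i++) {
                    weights[c][i] /= absSum;
                }
            }
        }
        return weights;
    }
}
```

With a smoothing value of 0, a word that never occurs outside class c would produce `log(0)`, which is why a positive value is the default.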
Constructor and Description |
---|
ComplementNaiveBayes() |
Modifier and Type | Method and Description |
---|---|
void | buildClassifier(Instances instances) Generates the classifier. |
double | classifyInstance(Instance instance) Classifies a given instance. |
Capabilities | getCapabilities() Returns default capabilities of the classifier. |
boolean | getNormalizeWordWeights() Returns true if the word weights for each class are to be normalized. |
java.lang.String[] | getOptions() Gets the current settings of the classifier. |
java.lang.String | getRevision() Returns the revision string. |
double | getSmoothingParameter() Gets the smoothing value to be used to avoid zero WordGivenClass probabilities. |
TechnicalInformation | getTechnicalInformation() Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on. |
java.lang.String | globalInfo() Returns a string describing this classifier. |
java.util.Enumeration | listOptions() Returns an enumeration describing the available options. |
static void | main(java.lang.String[] argv) Main method for testing this class. |
java.lang.String | normalizeWordWeightsTipText() Returns the tip text for this property. |
void | setNormalizeWordWeights(boolean doNormalize) Sets whether the word weights for each class should be normalized. |
void | setOptions(java.lang.String[] options) Parses a given list of options. |
void | setSmoothingParameter(double val) Sets the smoothing value used to avoid zero WordGivenClass probabilities. |
java.lang.String | smoothingParameterTipText() Returns the tip text for this property. |
java.lang.String | toString() Prints out the internal model built by the classifier. |
debugTipText, distributionForInstance, forName, getDebug, makeCopies, makeCopy, setDebug
public java.util.Enumeration listOptions()
listOptions
in interface OptionHandler
listOptions
in class Classifier
public java.lang.String[] getOptions()
getOptions
in interface OptionHandler
getOptions
in class Classifier
public void setOptions(java.lang.String[] options) throws java.lang.Exception
-N Normalize the word weights for each class
-S Smoothing value to avoid zero WordGivenClass probabilities (default=1.0).
setOptions
in interface OptionHandler
setOptions
in class Classifier
options
- the list of options as an array of strings
java.lang.Exception
- if an option is not supported
public boolean getNormalizeWordWeights()
public void setNormalizeWordWeights(boolean doNormalize)
doNormalize
- whether the word weights are to be normalized
public java.lang.String normalizeWordWeightsTipText()
public double getSmoothingParameter()
public void setSmoothingParameter(double val)
val
- the new smoothing value
public java.lang.String smoothingParameterTipText()
public java.lang.String globalInfo()
public TechnicalInformation getTechnicalInformation()
getTechnicalInformation
in interface TechnicalInformationHandler
public Capabilities getCapabilities()
getCapabilities
in interface CapabilitiesHandler
getCapabilities
in class Classifier
Capabilities
public void buildClassifier(Instances instances) throws java.lang.Exception
buildClassifier
in class Classifier
instances
- set of instances serving as training data
java.lang.Exception
- if the classifier has not been built successfully
public double classifyInstance(Instance instance) throws java.lang.Exception
The classification rule is:
MinC(forAllWords(ti*Wci))
that is, the predicted class is the class c that minimizes the sum of ti*Wci over all words i,
where
ti is the frequency of word i in the given instance
Wci is the weight of word i in class c.
For more information, see section 4.4 of the paper cited in the classifier's description.
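The rule can be sketched in a few lines of plain Java. This is an illustration under assumptions, not the code of this class: `ComplementRuleSketch`, `classify`, and the array layout (`weights[c][i]` for class c and word i) are hypothetical names chosen for the example.

```java
// Illustrative sketch only; not the Weka implementation.
final class ComplementRuleSketch {

    /**
     * Returns the index of the class c that minimizes
     * sum over i of wordFrequencies[i] * weights[c][i].
     */
    static int classify(double[] wordFrequencies, double[][] weights) {
        int bestClass = -1;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int c = 0; c < weights.length; c++) {
            double score = 0.0;
            for (int i = 0; i < wordFrequencies.length; i++) {
                score += wordFrequencies[i] * weights[c][i];
            }
            // The minimum wins because the weights describe the complement of c.
            if (score < bestScore) {
                bestScore = score;
                bestClass = c;
            }
        }
        return bestClass;
    }
}
```

The minimum is taken, rather than the maximum as in standard naive Bayes, because each weight measures how well a word fits the classes other than c; a low total therefore means the instance fits the complement of c poorly.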
classifyInstance
in class Classifier
instance
- the instance to classify
java.lang.Exception
- if the classifier has not been built yet.
public java.lang.String toString()
toString
in class java.lang.Object
public java.lang.String getRevision()
getRevision
in interface RevisionHandler
getRevision
in class Classifier
public static void main(java.lang.String[] argv)
argv
- the options