public class NaiveBayesMultinomialText extends AbstractClassifier implements UpdateableClassifier, UpdateableBatchProcessor, WeightedInstancesHandler, Aggregateable<NaiveBayesMultinomialText>
-W Use word frequencies instead of binary bag of words.
-P <# instances> How often to prune the dictionary of low frequency words (default = 0, i.e. don't prune)
-M <double> Minimum word frequency. Words with a frequency below this value are ignored. If periodic pruning is turned on, this value is also used to determine which words to remove from the dictionary (default = 3).
-normalize Normalize document length (use in conjunction with -norm and -lnorm)
-norm <num> Specify the norm that each instance must have (default 1.0)
-lnorm <num> Specify L-norm to use (default 2.0)
-lowercase Convert all tokens to lowercase before adding to the dictionary.
-stopwords-handler The stopwords handler to use (default Null).
-tokenizer <spec> The tokenizing algorithm (classname plus parameters) to use. (default: weka.core.tokenizers.WordTokenizer)
-stemmer <spec> The stemming algorithm (classname plus parameters) to use.
-output-debug-info If set, classifier is run in debug mode and may output additional info to the console
-do-not-check-capabilities If set, classifier capabilities are not checked before classifier is built (use with caution).
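
The -normalize, -norm and -lnorm options work together, which a small example may make easier to picture. The following is a minimal sketch, assuming that normalization rescales a document's word counts so that their L-norm (of the order given by -lnorm) equals the value given by -norm. The class `DocLengthNormSketch` and its `normalize` method are hypothetical names used only for illustration; they are not part of Weka and do not claim to reproduce its internal code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Weka. */
final class DocLengthNormSketch {

  /**
   * Rescales the word counts of one document so that their L-norm of order
   * lnorm equals norm (assumed meaning of -lnorm and -norm).
   */
  static Map<String, Double> normalize(Map<String, Double> counts, double norm, double lnorm) {
    double sum = 0.0;
    for (double c : counts.values()) {
      sum += Math.pow(Math.abs(c), lnorm);
    }
    double length = Math.pow(sum, 1.0 / lnorm);

    Map<String, Double> scaled = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : counts.entrySet()) {
      // An empty or all-zero document is left at zero to avoid dividing by zero.
      scaled.put(e.getKey(), length == 0.0 ? 0.0 : e.getValue() / length * norm);
    }
    return scaled;
  }

  public static void main(String[] args) {
    Map<String, Double> counts = new LinkedHashMap<>();
    counts.put("spam", 3.0);
    counts.put("offer", 4.0);
    // With norm = 1.0 and lnorm = 2.0 the counts are divided by their Euclidean length.
    System.out.println(normalize(counts, 1.0, 2.0));
  }
}
```

With the defaults (norm 1.0, L-norm 2.0) every document ends up with unit Euclidean length, so long documents do not dominate short ones purely because they contain more tokens.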
Fields inherited from class AbstractClassifier: BATCH_SIZE_DEFAULT, NUM_DECIMAL_PLACES_DEFAULT

| Constructor and Description |
|---|
| NaiveBayesMultinomialText() |

| Modifier and Type | Method and Description |
|---|---|
| NaiveBayesMultinomialText | aggregate(NaiveBayesMultinomialText toAggregate)<br>Aggregate an object with this one. |
| void | batchFinished()<br>Signal that the training data is finished (for now). |
| void | buildClassifier(Instances data)<br>Generates the classifier. |
| double[] | distributionForInstance(Instance instance)<br>Calculates the class membership probabilities for the given test instance. |
| void | finalizeAggregation()<br>Call to complete the aggregation process. |
| Capabilities | getCapabilities()<br>Returns default capabilities of the classifier. |
| double | getLNorm()<br>Get the L-norm used. |
| boolean | getLowercaseTokens()<br>Get whether to convert all tokens to lowercase. |
| double | getMinWordFrequency()<br>Get the minimum word frequency. |
| double | getNorm()<br>Get the instance's norm. |
| boolean | getNormalizeDocLength()<br>Get whether to normalize the length of each document. |
| java.lang.String[] | getOptions()<br>Gets the current settings of the classifier. |
| int | getPeriodicPruning()<br>Get how often to prune the dictionary. |
| java.lang.String | getRevision()<br>Returns the revision string. |
| Stemmer | getStemmer()<br>Returns the current stemming algorithm, null if none is used. |
| StopwordsHandler | getStopwordsHandler()<br>Gets the stopwords handler. |
| Tokenizer | getTokenizer()<br>Returns the current tokenizer algorithm. |
| boolean | getUseWordFrequencies()<br>Get whether to use word frequencies rather than binary bag of words representation. |
| java.lang.String | globalInfo()<br>Returns a string describing the classifier. |
| java.util.Enumeration<Option> | listOptions()<br>Returns an enumeration describing the available options. |
| java.lang.String | LNormTipText()<br>Returns the tip text for this property. |
| java.lang.String | lowercaseTokensTipText()<br>Returns the tip text for this property. |
| static void | main(java.lang.String[] args)<br>Main method for testing this class. |
| java.lang.String | minWordFrequencyTipText()<br>Returns the tip text for this property. |
| java.lang.String | normalizeDocLengthTipText()<br>Returns the tip text for this property. |
| java.lang.String | normTipText()<br>Returns the tip text for this property. |
| java.lang.String | periodicPruningTipText()<br>Returns the tip text for this property. |
| void | reset()<br>Reset the classifier. |
| void | setLNorm(double newLNorm)<br>Set the L-norm to use. |
| void | setLowercaseTokens(boolean l)<br>Set whether to convert all tokens to lowercase. |
| void | setMinWordFrequency(double minFreq)<br>Set the minimum word frequency. |
| void | setNorm(double newNorm)<br>Set the norm of the instances. |
| void | setNormalizeDocLength(boolean norm)<br>Set whether to normalize the length of each document. |
| void | setOptions(java.lang.String[] options)<br>Parses a given list of options. |
| void | setPeriodicPruning(int p)<br>Set how often to prune the dictionary. |
| void | setStemmer(Stemmer value)<br>Sets the stemming algorithm to use; null means no stemming at all (i.e., the NullStemmer is used). |
| void | setStopwordsHandler(StopwordsHandler value)<br>Sets the stopwords handler to use. |
| void | setTokenizer(Tokenizer value)<br>Sets the tokenizer algorithm to use. |
| void | setUseWordFrequencies(boolean u)<br>Set whether to use word frequencies rather than binary bag of words representation. |
| java.lang.String | stemmerTipText()<br>Returns the tip text for this property. |
| java.lang.String | stopwordsHandlerTipText()<br>Returns the tip text for this property. |
| java.lang.String | tokenizerTipText()<br>Returns the tip text for this property. |
| java.lang.String | toString()<br>Returns a textual description of this classifier. |
| void | updateClassifier(Instance instance)<br>Updates the classifier with the given instance. |
| java.lang.String | useWordFrequenciesTipText()<br>Returns the tip text for this property. |
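
To make the relationship between updateClassifier(Instance) and distributionForInstance(Instance) more concrete, here is a minimal, self-contained sketch of an incrementally trained multinomial naive Bayes model over string tokens. It is an illustration under stated assumptions, not Weka's implementation: the class `IncrementalMultinomialNBSketch` and its methods `update` and `distribution` are hypothetical, documents are plain token arrays rather than Instance objects, add-one (Laplace) smoothing is assumed, and words never seen in training are simply skipped.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical teaching sketch; not Weka's implementation. */
final class IncrementalMultinomialNBSketch {
  private final int numClasses;
  private final double[] docsPerClass;   // number of training documents per class
  private final double[] wordsPerClass;  // total token count per class
  private final Map<String, double[]> wordCounts = new HashMap<>(); // word -> count per class

  IncrementalMultinomialNBSketch(int numClasses) {
    this.numClasses = numClasses;
    this.docsPerClass = new double[numClasses];
    this.wordsPerClass = new double[numClasses];
  }

  /** Plays the role of updateClassifier: folds one labelled document into the counts. */
  void update(String[] tokens, int classIndex) {
    docsPerClass[classIndex] += 1.0;
    for (String token : tokens) {
      wordCounts.computeIfAbsent(token, k -> new double[numClasses])[classIndex] += 1.0;
      wordsPerClass[classIndex] += 1.0;
    }
  }

  /** Plays the role of distributionForInstance: returns class probabilities summing to 1. */
  double[] distribution(String[] tokens) {
    double totalDocs = 0.0;
    for (double d : docsPerClass) {
      totalDocs += d;
    }
    int vocabulary = wordCounts.size();
    double[] logProb = new double[numClasses];
    double max = Double.NEGATIVE_INFINITY;
    for (int c = 0; c < numClasses; c++) {
      // Smoothed class prior.
      logProb[c] = Math.log((docsPerClass[c] + 1.0) / (totalDocs + numClasses));
      for (String token : tokens) {
        double[] counts = wordCounts.get(token);
        if (counts == null) {
          continue; // word not in the dictionary: ignored in this sketch
        }
        // Smoothed per-class word likelihood.
        logProb[c] += Math.log((counts[c] + 1.0) / (wordsPerClass[c] + vocabulary));
      }
      max = Math.max(max, logProb[c]);
    }
    // Convert log scores to probabilities, subtracting the max for numerical stability.
    double[] probs = new double[numClasses];
    double sum = 0.0;
    for (int c = 0; c < numClasses; c++) {
      probs[c] = Math.exp(logProb[c] - max);
      sum += probs[c];
    }
    for (int c = 0; c < numClasses; c++) {
      probs[c] /= sum;
    }
    return probs;
  }

  public static void main(String[] args) {
    IncrementalMultinomialNBSketch model = new IncrementalMultinomialNBSketch(2);
    model.update(new String[] {"cheap", "offer"}, 0);
    model.update(new String[] {"meeting", "agenda"}, 1);
    System.out.println(Arrays.toString(model.distribution(new String[] {"cheap"})));
  }
}
```

Because the model is just a set of running counts, a new document can be added at any time without revisiting earlier data, which is the property an updateable classifier relies on.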
Methods inherited from class AbstractClassifier: batchSizeTipText, classifyInstance, debugTipText, distributionsForInstances, doNotCheckCapabilitiesTipText, forName, getBatchSize, getDebug, getDoNotCheckCapabilities, getNumDecimalPlaces, implementsMoreEfficientBatchPrediction, makeCopies, makeCopy, numDecimalPlacesTipText, postExecution, preExecution, run, runClassifier, setBatchSize, setDebug, setDoNotCheckCapabilities, setNumDecimalPlaces

Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from implemented interfaces: makeCopy

public java.lang.String globalInfo()

public Capabilities getCapabilities()
Specified by: getCapabilities in interface Classifier, getCapabilities in interface CapabilitiesHandler
Overrides: getCapabilities in class AbstractClassifier
See Also: Capabilities

public void buildClassifier(Instances data) throws java.lang.Exception
Specified by: buildClassifier in interface Classifier
Parameters: data - set of instances serving as training data
Throws: java.lang.Exception - if the classifier has not been generated successfully

public void updateClassifier(Instance instance) throws java.lang.Exception
Specified by: updateClassifier in interface UpdateableClassifier
Parameters: instance - the new training instance to include in the model
Throws: java.lang.Exception - if the instance could not be incorporated in the model.

public double[] distributionForInstance(Instance instance) throws java.lang.Exception
Specified by: distributionForInstance in interface Classifier
Overrides: distributionForInstance in class AbstractClassifier
Parameters: instance - the instance to be classified
Throws: java.lang.Exception - if there is a problem generating the prediction

public void reset()

public void setStemmer(Stemmer value)
Parameters: value - the configured stemming algorithm, or null
See Also: NullStemmer

public Stemmer getStemmer()

public java.lang.String stemmerTipText()

public void setTokenizer(Tokenizer value)
Parameters: value - the configured tokenizing algorithm

public Tokenizer getTokenizer()

public java.lang.String tokenizerTipText()

public java.lang.String useWordFrequenciesTipText()

public void setUseWordFrequencies(boolean u)
Parameters: u - true if word frequencies are to be used.

public boolean getUseWordFrequencies()

public java.lang.String lowercaseTokensTipText()

public void setLowercaseTokens(boolean l)
Parameters: l - true if all tokens are to be converted to lowercase

public boolean getLowercaseTokens()

public java.lang.String periodicPruningTipText()

public void setPeriodicPruning(int p)
Parameters: p - how often to prune

public int getPeriodicPruning()

public java.lang.String minWordFrequencyTipText()

public void setMinWordFrequency(double minFreq)
Parameters: minFreq - the minimum word frequency to use

public double getMinWordFrequency()
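
How setPeriodicPruning and setMinWordFrequency interact can be pictured with a small dictionary that drops rare words at fixed intervals. This is only a sketch under assumptions: it keeps a single word-count map (the real classifier may keep counts per class), it counts "how often" in documents seen, and the class `PruningDictionarySketch` with its methods is hypothetical rather than part of Weka.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of periodic dictionary pruning; not part of Weka. */
final class PruningDictionarySketch {
  private final Map<String, Double> counts = new HashMap<>();
  private final int pruneEvery;  // plays the role of -P; 0 means never prune
  private final double minFreq;  // plays the role of -M
  private int documentsSeen = 0;

  PruningDictionarySketch(int pruneEvery, double minFreq) {
    this.pruneEvery = pruneEvery;
    this.minFreq = minFreq;
  }

  /** Adds the tokens of one document and prunes when the interval is reached. */
  void addDocument(String[] tokens) {
    for (String token : tokens) {
      counts.merge(token, 1.0, Double::sum);
    }
    documentsSeen++;
    if (pruneEvery > 0 && documentsSeen % pruneEvery == 0) {
      // Remove every word whose accumulated count is below the minimum frequency.
      counts.values().removeIf(c -> c < minFreq);
    }
  }

  boolean contains(String word) {
    return counts.containsKey(word);
  }

  int size() {
    return counts.size();
  }

  public static void main(String[] args) {
    PruningDictionarySketch dictionary = new PruningDictionarySketch(2, 2.0);
    dictionary.addDocument(new String[] {"a", "b"});
    dictionary.addDocument(new String[] {"a", "c"});
    System.out.println(dictionary.size());
  }
}
```

Pruning keeps memory bounded on long streams, at the cost of forgetting counts for words that were rare when the pruning step ran.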

public java.lang.String normalizeDocLengthTipText()

public void setNormalizeDocLength(boolean norm)
Parameters: norm - true if document lengths are to be normalized

public boolean getNormalizeDocLength()

public java.lang.String normTipText()

public double getNorm()

public void setNorm(double newNorm)
Parameters: newNorm - the norm to which the instances must be set

public java.lang.String LNormTipText()

public double getLNorm()

public void setLNorm(double newLNorm)
Parameters: newLNorm - the L-norm

public void setStopwordsHandler(StopwordsHandler value)
Parameters: value - the stopwords handler; if null, Null is used

public StopwordsHandler getStopwordsHandler()

public java.lang.String stopwordsHandlerTipText()

public java.util.Enumeration<Option> listOptions()
Specified by: listOptions in interface OptionHandler
Overrides: listOptions in class AbstractClassifier

public void setOptions(java.lang.String[] options) throws java.lang.Exception
Valid options are:
-W Use word frequencies instead of binary bag of words.
-P <# instances> How often to prune the dictionary of low frequency words (default = 0, i.e. don't prune)
-M <double> Minimum word frequency. Words with a frequency below this value are ignored. If periodic pruning is turned on, this value is also used to determine which words to remove from the dictionary (default = 3).
-normalize Normalize document length (use in conjunction with -norm and -lnorm)
-norm <num> Specify the norm that each instance must have (default 1.0)
-lnorm <num> Specify L-norm to use (default 2.0)
-lowercase Convert all tokens to lowercase before adding to the dictionary.
-stopwords-handler The stopwords handler to use (default Null).
-tokenizer <spec> The tokenizing algorithm (classname plus parameters) to use. (default: weka.core.tokenizers.WordTokenizer)
-stemmer <spec> The stemming algorithm (classname plus parameters) to use.
-output-debug-info If set, classifier is run in debug mode and may output additional info to the console
-do-not-check-capabilities If set, classifier capabilities are not checked before classifier is built (use with caution).
Specified by: setOptions in interface OptionHandler
Overrides: setOptions in class AbstractClassifier
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception - if an option is not supported

public java.lang.String[] getOptions()
Specified by: getOptions in interface OptionHandler
Overrides: getOptions in class AbstractClassifier

public java.lang.String toString()
Overrides: toString in class java.lang.Object

public java.lang.String getRevision()
Specified by: getRevision in interface RevisionHandler
Overrides: getRevision in class AbstractClassifier

public NaiveBayesMultinomialText aggregate(NaiveBayesMultinomialText toAggregate) throws java.lang.Exception
Description copied from interface: Aggregateable
Specified by: aggregate in interface Aggregateable<NaiveBayesMultinomialText>
Parameters: toAggregate - the object to aggregate
Throws: java.lang.Exception - if the supplied object can't be aggregated for some reason

public void finalizeAggregation() throws java.lang.Exception
Description copied from interface: Aggregateable
Specified by: finalizeAggregation in interface Aggregateable<NaiveBayesMultinomialText>
Throws: java.lang.Exception - if the aggregation can't be finalized for some reason

public void batchFinished() throws java.lang.Exception
Description copied from interface: UpdateableBatchProcessor
Specified by: batchFinished in interface UpdateableBatchProcessor
Throws: java.lang.Exception - if a problem occurs

public static void main(java.lang.String[] args)
Parameters: args - the options
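
The aggregate(NaiveBayesMultinomialText) and finalizeAggregation() methods documented above suggest that partial models trained on separate data can be folded into one. A minimal sketch of that idea follows, assuming that aggregation amounts to summing word counts; whether the real classifier does exactly this is not stated here. The class `AggregatableCountsSketch` and its methods are hypothetical and only mirror the shape of the aggregate call, which returns the receiving object.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of count aggregation; not part of Weka. */
final class AggregatableCountsSketch {
  private final Map<String, Double> counts = new HashMap<>();

  /** Adds a (possibly weighted) occurrence of a word to this partial model. */
  void add(String word, double weight) {
    counts.merge(word, weight, Double::sum);
  }

  /** Folds another partial model into this one by summing counts, then returns this. */
  AggregatableCountsSketch aggregate(AggregatableCountsSketch other) {
    other.counts.forEach((word, count) -> counts.merge(word, count, Double::sum));
    return this;
  }

  double count(String word) {
    return counts.getOrDefault(word, 0.0);
  }

  public static void main(String[] args) {
    AggregatableCountsSketch first = new AggregatableCountsSketch();
    first.add("x", 1.0);
    first.add("y", 2.0);
    AggregatableCountsSketch second = new AggregatableCountsSketch();
    second.add("y", 3.0);
    second.add("z", 4.0);
    System.out.println(first.aggregate(second).count("y"));
  }
}
```

Since addition is order-independent, partial models built on different data partitions can be combined in any sequence, which is what makes this style of model convenient for distributed training.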