public class CharacterNGramTokenizer extends Tokenizer
-max <int> The max size of the Ngram (default = 3).
-min <int> The min size of the Ngram (default = 1).
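To make the two size options concrete, the following is a minimal sketch of character n-gram extraction in plain Java. It does not use Weka; `CharNGramSketch` and `ngrams` are hypothetical names, and the emission order (by start position, shortest n-gram first) is an assumption rather than documented behavior of this class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Weka: lists the character n-grams of a string. */
final class CharNGramSketch {

    /**
     * Returns every substring of s whose length lies in [min, max].
     * Assumed ordering: by start position, shortest n-gram first.
     */
    static List<String> ngrams(String s, int min, int max) {
        if (min < 1 || max < min) {
            throw new IllegalArgumentException("need 1 <= min <= max");
        }
        List<String> out = new ArrayList<>();
        for (int start = 0; start < s.length(); start++) {
            // Grow the n-gram from min up to max, stopping at the end of the string.
            for (int n = min; n <= max && start + n <= s.length(); n++) {
                out.add(s.substring(start, start + n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Documented defaults: min = 1, max = 3.
        System.out.println(ngrams("weka", 1, 3));
        // prints [w, we, wek, e, ek, eka, k, ka, a]
    }
}
```

With the default sizes, a four-character string yields nine tokens; raising `-min` to 2 would drop the single characters.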
| Constructor and Description |
|---|
| CharacterNGramTokenizer() |
| Modifier and Type | Method and Description |
|---|---|
| int | getNGramMaxSize() - Gets the max N of the NGram. |
| int | getNGramMinSize() - Gets the min N of the NGram. |
| java.lang.String[] | getOptions() - Gets the current option settings for the OptionHandler. |
| java.lang.String | getRevision() - Returns the revision string. |
| java.lang.String | globalInfo() - Returns a string describing the tokenizer. |
| boolean | hasMoreElements() - Returns true if there are more elements available. |
| java.util.Enumeration<Option> | listOptions() - Returns an enumeration of all the available options. |
| static void | main(java.lang.String[] args) - Runs the tokenizer with the given options and strings to tokenize. |
| java.lang.String | nextElement() - Returns N-grams and also (N-1)-grams and .... |
| java.lang.String | NGramMaxSizeTipText() - Returns the tip text for this property. |
| java.lang.String | NGramMinSizeTipText() - Returns the tip text for this property. |
| void | setNGramMaxSize(int value) - Sets the max size of the Ngram. |
| void | setNGramMinSize(int value) - Sets the min size of the Ngram. |
| void | setOptions(java.lang.String[] options) - Parses a given list of options. |
| void | tokenize(java.lang.String s) - Sets the string to tokenize. |
Inherited methods: runTokenizer, tokenize, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait, makeCopy

public java.lang.String globalInfo()
globalInfo in class Tokenizer

public java.util.Enumeration<Option> listOptions()
listOptions in interface OptionHandler
listOptions in class Tokenizer

public java.lang.String[] getOptions()
getOptions in interface OptionHandler
getOptions in class Tokenizer

public void setOptions(java.lang.String[] options) throws java.lang.Exception
-max <int> The max size of the Ngram (default = 3).
-min <int> The min size of the Ngram (default = 1).
setOptions in interface OptionHandler
setOptions in class Tokenizer
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception - if an option is not supported

public int getNGramMaxSize()
public void setNGramMaxSize(int value)
Parameters: value - the size of the NGram.

public java.lang.String NGramMaxSizeTipText()

public void setNGramMinSize(int value)
Parameters: value - the size of the NGram.

public int getNGramMinSize()

public java.lang.String NGramMinSizeTipText()
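The `-max` and `-min` options listed under setOptions presumably map onto the two size properties above. As an illustration only, here is a plain-Java sketch of that kind of option handling. `NGramOptionsSketch`, `parse`, and `toOptions` are hypothetical names; this is not Weka's parser, it throws an unchecked `IllegalArgumentException` instead of `java.lang.Exception`, and the order of the returned array is an assumption.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the -max / -min handling; not Weka's parser. */
final class NGramOptionsSketch {
    int maxSize = 3; // documented default for -max
    int minSize = 1; // documented default for -min

    /** Reads "-max <int>" and "-min <int>" pairs and rejects any other flag. */
    static NGramOptionsSketch parse(String[] options) {
        NGramOptionsSketch parsed = new NGramOptionsSketch();
        for (int i = 0; i < options.length; i++) {
            String flag = options[i];
            if (!flag.equals("-max") && !flag.equals("-min")) {
                throw new IllegalArgumentException("unsupported option: " + flag);
            }
            if (i + 1 >= options.length) {
                throw new IllegalArgumentException("missing value for " + flag);
            }
            int value = Integer.parseInt(options[++i]); // consume the flag's argument
            if (flag.equals("-max")) {
                parsed.maxSize = value;
            } else {
                parsed.minSize = value;
            }
        }
        return parsed;
    }

    /** Mirrors the shape of getOptions(): current settings as a string array. */
    String[] toOptions() {
        return new String[] {"-max", String.valueOf(maxSize), "-min", String.valueOf(minSize)};
    }

    public static void main(String[] args) {
        NGramOptionsSketch o = parse(new String[] {"-max", "4"});
        System.out.println(Arrays.toString(o.toOptions())); // prints [-max, 4, -min, 1]
    }
}
```

Options that are omitted keep their documented defaults, which is why only `-max` changes in the usage line.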
public boolean hasMoreElements()
hasMoreElements in interface java.util.Enumeration<java.lang.String>
hasMoreElements in class Tokenizer

public java.lang.String nextElement()
nextElement in interface java.util.Enumeration<java.lang.String>
nextElement in class Tokenizer

public void tokenize(java.lang.String s)
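The three methods above form an enumeration cycle: tokenize sets the input, then hasMoreElements and nextElement are called in a loop. The sketch below models that cycle with a self-contained class. `NGramEnumerationSketch` is a hypothetical name, the cursor logic is an assumption about how such a tokenizer could work, and it should not be read as the actual Weka implementation.

```java
import java.util.Enumeration;
import java.util.NoSuchElementException;

/** Hypothetical model of the tokenize / hasMoreElements / nextElement cycle. */
final class NGramEnumerationSketch implements Enumeration<String> {
    private final int min;
    private final int max;
    private String text = "";
    private int pos; // start index of the next n-gram
    private int n;   // length of the next n-gram

    NGramEnumerationSketch(int min, int max) {
        if (min < 1 || max < min) {
            throw new IllegalArgumentException("need 1 <= min <= max");
        }
        this.min = min;
        this.max = max;
        this.n = min;
    }

    /** Sets the string to tokenize and rewinds the cursor. */
    void tokenize(String s) {
        text = s;
        pos = 0;
        n = min;
    }

    @Override
    public boolean hasMoreElements() {
        return pos + n <= text.length();
    }

    @Override
    public String nextElement() {
        if (!hasMoreElements()) {
            throw new NoSuchElementException();
        }
        String gram = text.substring(pos, pos + n);
        n++;
        // Once this start position is exhausted, move on to the next one.
        if (n > max || pos + n > text.length()) {
            n = min;
            pos++;
        }
        return gram;
    }

    public static void main(String[] args) {
        NGramEnumerationSketch t = new NGramEnumerationSketch(1, 2);
        t.tokenize("abc");
        while (t.hasMoreElements()) {
            System.out.println(t.nextElement()); // a, ab, b, bc, c (one per line)
        }
    }
}
```

Calling `tokenize` again resets the cursor, so one instance can be reused across several input strings.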
public java.lang.String getRevision()

public static void main(java.lang.String[] args)
Parameters: args - the command-line options and strings to tokenize