public class NGramTokenizer extends CharacterDelimitedTokenizer
-delimiters <value> The delimiters to use (default ' \r\n\t.,;:'"()?!').
-max <int> The max size of the Ngram (default = 3).
-min <int> The min size of the Ngram (default = 1).
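The effect of -min and -max can be illustrated with a small stand-alone sketch. It does not use or reproduce the NGramTokenizer source: the class name NGramSketch and the method ngrams are hypothetical, and the sketch assumes that words are split on the delimiter characters, that adjacent words are joined with a single space, and that larger n-grams are emitted before smaller ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical illustration only; not part of Weka.
class NGramSketch {
    // Same characters as the documented default for -delimiters.
    static final String DEFAULT_DELIMITERS = " \r\n\t.,;:'\"()?!";

    // Returns every word n-gram with min <= n <= max, largest n first.
    static List<String> ngrams(String text, String delimiters, int min, int max) {
        // Split the text into words on any of the delimiter characters.
        List<String> words = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delimiters);
        while (st.hasMoreTokens()) {
            words.add(st.nextToken());
        }

        List<String> result = new ArrayList<>();
        // Sizes below 1 make no sense for an n-gram, so clamp the lower bound.
        for (int n = max; n >= Math.max(1, min); n--) {
            for (int start = 0; start + n <= words.size(); start++) {
                result.add(String.join(" ", words.subList(start, start + n)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Uses the documented defaults: min = 1, max = 3.
        System.out.println(ngrams("the quick brown fox", DEFAULT_DELIMITERS, 1, 3));
    }
}
```

With min 1 and max 3, the input "the quick brown fox" yields two 3-grams, three 2-grams and four 1-grams in this sketch.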
Constructor and Description |
---|
NGramTokenizer() |
Modifier and Type | Method and Description |
---|---|
int | getNGramMaxSize() - Gets the max N of the NGram. |
int | getNGramMinSize() - Gets the min N of the NGram. |
java.lang.String[] | getOptions() - Gets the current option settings for the OptionHandler. |
java.lang.String | getRevision() - Returns the revision string. |
java.lang.String | globalInfo() - Returns a string describing the tokenizer. |
boolean | hasMoreElements() - Returns true if there are more elements available. |
java.util.Enumeration<Option> | listOptions() - Returns an enumeration of all the available options. |
static void | main(java.lang.String[] args) - Runs the tokenizer with the given options and strings to tokenize. |
java.lang.String | nextElement() - Returns N-grams and also (N-1)-grams and .... |
java.lang.String | NGramMaxSizeTipText() - Returns the tip text for this property. |
java.lang.String | NGramMinSizeTipText() - Returns the tip text for this property. |
void | setNGramMaxSize(int value) - Sets the max size of the Ngram. |
void | setNGramMinSize(int value) - Sets the min size of the Ngram. |
void | setOptions(java.lang.String[] options) - Parses a given list of options. |
void | tokenize(java.lang.String s) - Sets the string to tokenize. |
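Methods inherited from class CharacterDelimitedTokenizer: delimitersTipText, getDelimiters, setDelimiters
Methods inherited from class Tokenizer: runTokenizer, tokenize
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface OptionHandler: makeCopy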
public java.lang.String globalInfo()
Specified by: globalInfo in class Tokenizer
public java.util.Enumeration<Option> listOptions()
Specified by: listOptions in interface OptionHandler
Overrides: listOptions in class CharacterDelimitedTokenizer
public java.lang.String[] getOptions()
Specified by: getOptions in interface OptionHandler
Overrides: getOptions in class CharacterDelimitedTokenizer
public void setOptions(java.lang.String[] options) throws java.lang.Exception
-delimiters <value> The delimiters to use (default ' \r\n\t.,;:'"()?!').
-max <int> The max size of the Ngram (default = 3).
-min <int> The min size of the Ngram (default = 1).
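The options are passed as a flat array in which each flag is followed by its value. The following stand-alone sketch shows one way such an array could be interpreted. It is not Weka's parsing code: the class OptionParsingSketch and its fields are hypothetical, and resetting to the documented defaults at the start of each call is an assumption of the sketch.

```java
// Hypothetical illustration only; not part of Weka.
class OptionParsingSketch {
    static final String DEFAULT_DELIMITERS = " \r\n\t.,;:'\"()?!";

    String delimiters = DEFAULT_DELIMITERS; // documented default for -delimiters
    int max = 3;                            // documented default for -max
    int min = 1;                            // documented default for -min

    void setOptions(String[] options) throws Exception {
        // Assumption of this sketch: every call starts from the defaults.
        delimiters = DEFAULT_DELIMITERS;
        max = 3;
        min = 1;

        // Options come in pairs: a flag followed by its value.
        for (int i = 0; i < options.length; i += 2) {
            String flag = options[i];
            boolean known = flag.equals("-delimiters")
                    || flag.equals("-max")
                    || flag.equals("-min");
            if (!known) {
                throw new Exception("Option not supported: " + flag);
            }
            if (i + 1 >= options.length) {
                throw new Exception("Missing value for option: " + flag);
            }
            String value = options[i + 1];
            switch (flag) {
                case "-delimiters":
                    delimiters = value;
                    break;
                case "-max":
                    max = Integer.parseInt(value);
                    break;
                default: // "-min"
                    min = Integer.parseInt(value);
                    break;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        OptionParsingSketch sketch = new OptionParsingSketch();
        // Request only 2-grams, split on spaces.
        sketch.setOptions(new String[] {"-max", "2", "-min", "2", "-delimiters", " "});
        System.out.println(sketch.min + ".." + sketch.max);
    }
}
```

An array of the same shape, for example {"-max", "2", "-min", "2"}, is what setOptions(java.lang.String[] options) expects on the real class.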
Specified by: setOptions in interface OptionHandler
Overrides: setOptions in class CharacterDelimitedTokenizer
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception - if an option is not supported
public int getNGramMaxSize()
public void setNGramMaxSize(int value)
Parameters: value - the size of the NGram.
public java.lang.String NGramMaxSizeTipText()
public void setNGramMinSize(int value)
Parameters: value - the size of the NGram.
public int getNGramMinSize()
public java.lang.String NGramMinSizeTipText()
public boolean hasMoreElements()
Specified by: hasMoreElements in interface java.util.Enumeration<java.lang.String>
Specified by: hasMoreElements in class Tokenizer
public java.lang.String nextElement()
Specified by: nextElement in interface java.util.Enumeration<java.lang.String>
Specified by: nextElement in class Tokenizer
public void tokenize(java.lang.String s)
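Taken together, tokenize(java.lang.String s), hasMoreElements() and nextElement() describe a java.util.Enumeration-style protocol: set the string, then pull tokens until none remain. The following is a minimal stand-alone sketch of that protocol, not the Weka implementation. The class NGramEnumerationSketch is hypothetical, and the emission order (largest n-grams first) and the single-space join are assumptions.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

// Hypothetical illustration only; not part of Weka.
class NGramEnumerationSketch implements Enumeration<String> {
    private final String delimiters;
    private final int min;
    private final int max;
    private List<String> words = new ArrayList<>();
    private int n;   // size of the n-grams currently being emitted
    private int pos; // index of the first word of the next n-gram

    NGramEnumerationSketch(String delimiters, int min, int max) {
        this.delimiters = delimiters;
        this.min = Math.max(1, min);
        this.max = max;
    }

    // Sets the string to tokenize and resets the iteration state.
    void tokenize(String s) {
        words = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, delimiters);
        while (st.hasMoreTokens()) {
            words.add(st.nextToken());
        }
        n = Math.min(max, words.size());
        pos = 0;
    }

    @Override
    public boolean hasMoreElements() {
        return n >= min && pos + n <= words.size();
    }

    @Override
    public String nextElement() {
        if (!hasMoreElements()) {
            throw new NoSuchElementException();
        }
        String gram = String.join(" ", words.subList(pos, pos + n));
        pos++;
        // Once all n-grams of this size are emitted, move on to size n - 1.
        if (pos + n > words.size()) {
            n--;
            pos = 0;
        }
        return gram;
    }

    public static void main(String[] args) {
        NGramEnumerationSketch t = new NGramEnumerationSketch(" .,", 1, 2);
        t.tokenize("to be, or not");
        while (t.hasMoreElements()) {
            System.out.println(t.nextElement());
        }
    }
}
```

Calling tokenize again on the same instance restarts the enumeration with the new string, which is the behavior the "Sets the string to tokenize" description suggests.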
public java.lang.String getRevision()
public static void main(java.lang.String[] args)
Parameters: args - the command-line options and strings to tokenize