java.lang.Object
weka.core.tokenizers.Tokenizer
weka.core.tokenizers.CharacterDelimitedTokenizer
weka.core.tokenizers.NGramTokenizer
public class NGramTokenizer extends CharacterDelimitedTokenizer
Splits a string into n-grams, from the configured minimum size up to the configured maximum size.
Valid options are:
-delimiters <value> The delimiters to use (default ' \r\n\t.,;:'"()?!').
-max <int> The max size of the Ngram (default = 3).
-min <int> The min size of the Ngram (default = 1).
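The effect of the -min, -max and -delimiters options can be illustrated with a small standalone sketch. It does not use or reproduce the Weka implementation: the class name `NGramSketch` and the method `ngrams` are hypothetical, and it assumes that n-grams are built from delimiter-separated words joined by a single space, with the largest size emitted first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical illustration only; this is not the Weka class.
class NGramSketch {

    /** Splits text on the delimiter characters, then joins adjacent words into n-grams. */
    static List<String> ngrams(String text, String delimiters, int min, int max) {
        // Every character in `delimiters` separates words.
        List<String> words = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delimiters);
        while (st.hasMoreTokens()) {
            words.add(st.nextToken());
        }

        List<String> result = new ArrayList<>();
        // Assumed order: largest size first, shrinking down to min (never below 1).
        for (int n = max; n >= Math.max(1, min); n--) {
            for (int start = 0; start + n <= words.size(); start++) {
                result.add(String.join(" ", words.subList(start, start + n)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same characters as the documented default for -delimiters.
        String delimiters = " \r\n\t.,;:'\"()?!";
        for (String gram : ngrams("the quick brown fox", delimiters, 1, 3)) {
            System.out.println(gram);
        }
    }
}
```

With min = 1 and max = 3, a four-word input yields two 3-grams, three 2-grams and four 1-grams under these assumptions.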
| Constructor Summary | |
|---|---|
NGramTokenizer() |
| Method Summary | |
|---|---|
int |
getNGramMaxSize()
Gets the max N of the NGram. |
int |
getNGramMinSize()
Gets the min N of the NGram. |
java.lang.String[] |
getOptions()
Gets the current option settings for the OptionHandler. |
java.lang.String |
getRevision()
Returns the revision string. |
java.lang.String |
globalInfo()
Returns a string describing the tokenizer. |
boolean |
hasMoreElements()
Returns true if there are more elements available. |
java.util.Enumeration |
listOptions()
Returns an enumeration of all the available options. |
static void |
main(java.lang.String[] args)
Runs the tokenizer with the given options and strings to tokenize. |
java.lang.Object |
nextElement()
Returns N-grams and also (N-1)-grams and .... |
java.lang.String |
NGramMaxSizeTipText()
Returns the tip text for this property. |
java.lang.String |
NGramMinSizeTipText()
Returns the tip text for this property. |
void |
setNGramMaxSize(int value)
Sets the max size of the Ngram. |
void |
setNGramMinSize(int value)
Sets the min size of the Ngram. |
void |
setOptions(java.lang.String[] options)
Parses a given list of options. |
void |
tokenize(java.lang.String s)
Sets the string to tokenize. |
| Methods inherited from class weka.core.tokenizers.CharacterDelimitedTokenizer |
|---|
delimitersTipText, getDelimiters, setDelimiters |
| Methods inherited from class weka.core.tokenizers.Tokenizer |
|---|
runTokenizer, tokenize |
| Methods inherited from class java.lang.Object |
|---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public NGramTokenizer()
| Method Detail |
|---|
public java.lang.String globalInfo()
globalInfo in class Tokenizer
public java.util.Enumeration listOptions()
listOptions in interface OptionHandler
listOptions in class CharacterDelimitedTokenizer
public java.lang.String[] getOptions()
getOptions in interface OptionHandler
getOptions in class CharacterDelimitedTokenizer
public void setOptions(java.lang.String[] options)
throws java.lang.Exception
-delimiters <value> The delimiters to use (default ' \r\n\t.,;:'"()?!').
-max <int> The max size of the Ngram (default = 3).
-min <int> The min size of the Ngram (default = 1).
setOptions in interface OptionHandler
setOptions in class CharacterDelimitedTokenizer
options - the list of options as an array of strings
java.lang.Exception - if an option is not supported
public int getNGramMaxSize()
public void setNGramMaxSize(int value)
value - the size of the NGram.
public java.lang.String NGramMaxSizeTipText()
public void setNGramMinSize(int value)
value - the size of the NGram.
public int getNGramMinSize()
public java.lang.String NGramMinSizeTipText()
public boolean hasMoreElements()
hasMoreElements in interface java.util.Enumeration
hasMoreElements in class Tokenizer
public java.lang.Object nextElement()
nextElement in interface java.util.Enumeration
nextElement in class Tokenizer
public void tokenize(java.lang.String s)
tokenize in class Tokenizer
s - the string to tokenize
public java.lang.String getRevision()
public static void main(java.lang.String[] args)
args - the commandline options and strings to tokenize
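The tokenize, hasMoreElements and nextElement methods listed above follow the java.util.Enumeration call pattern: set the string, then pull tokens until none remain. The standalone sketch below mimics that pattern without depending on Weka. The class `EnumeratingNGramSketch` and its constructor are hypothetical, and the emission order (all n-grams of the largest size, then the next smaller size, down to the minimum) is an assumption based on the nextElement summary, not a statement about the library's internals.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

// Hypothetical stand-in, not the Weka class: shows the tokenize / hasMoreElements / nextElement pattern.
class EnumeratingNGramSketch implements Enumeration<String> {
    private final int min;
    private final int max;
    private final String delimiters;
    private List<String> words = new ArrayList<>();
    private int size; // size of the n-gram that will be returned next
    private int pos;  // index of the first word of the next n-gram

    EnumeratingNGramSketch(int min, int max, String delimiters) {
        this.min = Math.max(1, min);
        this.max = max;
        this.delimiters = delimiters;
    }

    /** Sets the string to tokenize and resets the iteration state. */
    void tokenize(String s) {
        words = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, delimiters);
        while (st.hasMoreTokens()) {
            words.add(st.nextToken());
        }
        size = max;
        pos = 0;
        skipExhaustedSizes();
    }

    /** Moves on to the next smaller size once the current size has no n-grams left. */
    private void skipExhaustedSizes() {
        while (size >= min && pos + size > words.size()) {
            size--;
            pos = 0;
        }
    }

    @Override
    public boolean hasMoreElements() {
        return size >= min;
    }

    @Override
    public String nextElement() {
        if (!hasMoreElements()) {
            throw new NoSuchElementException();
        }
        String gram = String.join(" ", words.subList(pos, pos + size));
        pos++;
        skipExhaustedSizes();
        return gram;
    }

    public static void main(String[] args) {
        EnumeratingNGramSketch t = new EnumeratingNGramSketch(1, 2, " ");
        t.tokenize("a b c");
        while (t.hasMoreElements()) {
            System.out.println(t.nextElement());
        }
    }
}
```

Calling tokenize again replaces the pending input in this sketch, so one instance can be reused across strings.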