Public Member Functions | |
| tx_indexedsearch_lexer () | |
| split2Words ($wordString) | |
| addWords (&$words, &$wordString, $start, $len) | |
| get_word (&$str, $pos=0) | |
| utf8_is_letter (&$str, &$len, $pos=0) | |
| charType ($cp) | |
| utf8_ord (&$str, &$len, $pos=0, $hex=false) | |
Public Attributes | |
| $debug = FALSE | |
| $debugString = '' | |
| $csObj | |
| $lexerConf | |
| tx_indexedsearch_lexer::tx_indexedsearch_lexer | ( | ) |
| tx_indexedsearch_lexer::split2Words | ( | $ | wordString | ) |
Splitting string into words. Used for indexing, can also be used to find words in query.
| string | String with UTF-8 content to process. |
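As a rough illustration of what splitting a UTF-8 string into words involves, here is a minimal standalone sketch. It is not the class's implementation: it swaps in a single Unicode-aware regular expression, and the function name `sketch_split_words` and the array return type are assumptions made for this example only.

```php
<?php
/**
 * Hypothetical stand-in for split2Words(), NOT the TYPO3 implementation.
 * Collects every run of Unicode letters and digits as one word.
 */
function sketch_split_words(string $wordString): array
{
    // \p{L} = any letter, \p{N} = any number; /u makes PCRE treat input as UTF-8.
    preg_match_all('/[\p{L}\p{N}]+/u', $wordString, $matches);
    return $matches[0];
}

// "\u{00F6}" is "ö"; the escape keeps this file independent of its own encoding.
print_r(sketch_split_words("Hello, w\u{00F6}rld 42!"));
```

This sketch ignores the `printjoins` and CJK handling documented for the class, so it only approximates the behaviour for plain alphabetic text.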
| tx_indexedsearch_lexer::addWords | ( | &$ | words, | |
| &$ | wordString, | |||
| $ | start, | |||
| $ | len | |||
| ) |
Adds a word to the word array. This function should be used to make sure CJK sequences are split up correctly.
| array | Array of accumulated words | |
| string | Complete Input string from where to extract word | |
| integer | Start position of word in input string | |
| integer | The Length of the word string from start position |
| tx_indexedsearch_lexer::get_word | ( | &$ | str, | |
| $ | pos = 0 | |||
| ) |
Get the first word in a given utf-8 string (initial non-letters will be skipped)
| string | Input string (reference) | |
| integer | Starting position in input string |
| tx_indexedsearch_lexer::utf8_is_letter | ( | &$ | str, | |
| &$ | len, | |||
| $ | pos = 0 | |||
| ) |
See if a character is a letter (or a string of letters or non-letters).
| string | Input string (reference) | |
| integer | Byte-length of character sequence (reference, return value) | |
| integer | Starting position in input string |
| tx_indexedsearch_lexer::charType | ( | $ | cp | ) |
Determine the type of character
| integer | Unicode number to evaluate |
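Classifying a code point usually comes down to range checks against Unicode blocks. The sketch below shows the idea under stated assumptions: the function name, the category labels (`num`, `alpha`, `cjk`, `other`), and the small set of ranges are illustrative and are not taken from the class itself.

```php
<?php
/**
 * Hypothetical stand-in for charType(), NOT the TYPO3 implementation.
 * Only a few well-known ranges are covered.
 */
function sketch_char_type(int $cp): string
{
    // ASCII digits 0-9
    if ($cp >= 0x30 && $cp <= 0x39) {
        return 'num';
    }
    // Basic Latin letters A-Z and a-z
    if (($cp >= 0x41 && $cp <= 0x5A) || ($cp >= 0x61 && $cp <= 0x7A)) {
        return 'alpha';
    }
    // Hiragana and Katakana (U+3040..U+30FF), CJK Unified Ideographs (U+4E00..U+9FFF)
    if (($cp >= 0x3040 && $cp <= 0x30FF) || ($cp >= 0x4E00 && $cp <= 0x9FFF)) {
        return 'cjk';
    }
    return 'other';
}

echo sketch_char_type(0x4E2D), "\n"; // code point of a CJK ideograph
```

A production classifier would need many more ranges (accented Latin, Greek, Cyrillic, Arabic, and so on); the point here is only the shape of the lookup.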
| tx_indexedsearch_lexer::utf8_ord | ( | &$ | str, | |
| &$ | len, | |||
| $ | pos = 0, | |||
| $ | hex = false | |||
| ) |
Converts a UTF-8 multibyte character to a Unicode code point.
| string | UTF-8 multibyte character string (reference) | |
| integer | The length of the character (reference, return value) | |
| integer | Starting position in input string | |
| boolean | If set, then a hex. number is returned |
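The conversion from a UTF-8 byte sequence to a code point can be sketched as follows. This is a standalone illustration, not the class's code: the name `sketch_utf8_ord` is hypothetical, the input is assumed to be valid UTF-8 with `$pos` pointing at a lead byte, and the exact format of the hexadecimal return value (here plain `dechex()` output) is an assumption.

```php
<?php
/**
 * Hypothetical stand-in for utf8_ord(), NOT the TYPO3 implementation.
 * Decodes the character starting at byte offset $pos and reports its
 * byte length through $len.
 */
function sketch_utf8_ord(string $str, &$len, int $pos = 0, bool $hex = false)
{
    $byte = ord($str[$pos]);
    if ($byte < 0x80) {                    // 0xxxxxxx: single-byte ASCII
        $len = 1;
        $cp = $byte;
    } elseif (($byte & 0xE0) === 0xC0) {   // 110xxxxx: 2-byte sequence
        $len = 2;
        $cp = $byte & 0x1F;
    } elseif (($byte & 0xF0) === 0xE0) {   // 1110xxxx: 3-byte sequence
        $len = 3;
        $cp = $byte & 0x0F;
    } else {                               // assumed 11110xxx: 4-byte sequence
        $len = 4;
        $cp = $byte & 0x07;
    }
    // Each continuation byte (10xxxxxx) contributes six more bits.
    for ($i = 1; $i < $len; $i++) {
        $cp = ($cp << 6) | (ord($str[$pos + $i]) & 0x3F);
    }
    return $hex ? dechex($cp) : $cp;
}

$len = 0;
$cp = sketch_utf8_ord("a\u{00E9}", $len, 1); // decode the character after "a"
printf("U+%04X, %d bytes\n", $cp, $len);
```

Returning the length by reference lets a caller advance through the string one character at a time by adding `$len` to `$pos` after each call.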
| tx_indexedsearch_lexer::$debug = FALSE |
| tx_indexedsearch_lexer::$debugString = '' |
| tx_indexedsearch_lexer::$csObj |
| tx_indexedsearch_lexer::$lexerConf |
Initial value:
array(
    'printjoins' => array( // Unicode code points of characters that are allowed INSIDE a sequence of letter characters (alphanum + CJK)
        0x2e, // "."
        0x2d, // "-"
        0x5f, // "_"
        0x3a, // ":"
        0x2f, // "/"
        0x27, // "'"
        // 0x615, // ARABIC SMALL HIGH TAH
    ),
    'casesensitive' => FALSE, // Set to TRUE if case-sensitive indexing is wanted.
    'removeChars' => array( // Unicode code points of characters that are removed before words are returned (e.g. "-")
        0x2d // "-"
    )
)
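To make the effect of `removeChars` and `casesensitive` concrete, here is a minimal sketch of how such a configuration might be applied to a single word. It assumes the `mbstring` extension and PHP 7.2 or later (for `mb_chr()`), the helper name `sketch_normalize_word` is hypothetical, and lowercasing when `casesensitive` is off is an assumption about the intended behaviour rather than something this page states.

```php
<?php
// Same shape as the documented initial value, without the comments.
$lexerConf = array(
    'printjoins' => array(0x2e, 0x2d, 0x5f, 0x3a, 0x2f, 0x27),
    'casesensitive' => false,
    'removeChars' => array(0x2d),
);

/**
 * Hypothetical helper, NOT part of tx_indexedsearch_lexer.
 * Strips every character listed in 'removeChars' and lowercases the
 * word unless 'casesensitive' is set.
 */
function sketch_normalize_word(string $word, array $conf): string
{
    foreach ($conf['removeChars'] as $cp) {
        // mb_chr() turns a code point into its UTF-8 encoding.
        $word = str_replace(mb_chr($cp, 'UTF-8'), '', $word);
    }
    if (!$conf['casesensitive']) {
        $word = mb_strtolower($word, 'UTF-8');
    }
    return $word;
}

echo sketch_normalize_word('Multi-Byte', $lexerConf), "\n";
```

Note that "-" appears in both lists: under this reading it may join letters into one word during splitting and is then stripped before the word is returned.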