
tx_indexedsearch_lexer Class Reference


Public Member Functions

 tx_indexedsearch_lexer ()
 split2Words ($wordString)
 addWords (&$words, &$wordString, $start, $len)
 get_word (&$str, $pos=0)
 utf8_is_letter (&$str, &$len, $pos=0)
 charType ($cp)
 utf8_ord (&$str, &$len, $pos=0, $hex=false)

Public Attributes

 $debug = FALSE
 $debugString = ''
 $csObj
 $lexerConf


Member Function Documentation

tx_indexedsearch_lexer::tx_indexedsearch_lexer (  ) 

Constructor. Initializes the charset class, t3lib_cs.

Returns:
void

tx_indexedsearch_lexer::split2Words ( $wordString )

Splits a string into words. Used for indexing; can also be used to find the words in a search query.

Parameters:
string $wordString: String with UTF-8 content to process.
Returns:
array Array of words in UTF-8.
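
The idea of splitting UTF-8 text into words can be sketched in plain PHP. This is a minimal illustration, not the class's implementation: the helper name sketch_split_words() is hypothetical, and it treats every run of Unicode letters and digits as a word, ignoring the join and removal rules configured in $lexerConf.

```php
<?php
// Hypothetical helper (not part of tx_indexedsearch_lexer): returns every
// run of Unicode letters and digits found in a UTF-8 string.
function sketch_split_words($wordString)
{
    // \p{L} matches any letter, \p{N} any number; the "u" modifier
    // makes PCRE treat pattern and subject as UTF-8.
    preg_match_all('/[\p{L}\p{N}]+/u', $wordString, $matches);
    return $matches[0];
}

$words = sketch_split_words('Hello, wörld! TYPO3 rocks');
// $words is array('Hello', 'wörld', 'TYPO3', 'rocks')
```

Because the same splitting rule is applied to indexed content and to the search query, both sides end up with comparable word lists.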

tx_indexedsearch_lexer::addWords ( &$words, &$wordString, $start, $len )

Adds a word to the word array. This function should be used to make sure CJK sequences are split up correctly.

Parameters:
array $words: Array of accumulated words (reference)
string $wordString: Complete input string from which to extract the word (reference)
integer $start: Start position of the word in the input string
integer $len: Length of the word string from the start position
Returns:
void
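
This page does not spell out how CJK sequences are split. A common approach for text written without spaces between words is to index overlapping character pairs, so that ABCDE becomes AB, BC, CD, DE. The sketch below assumes that scheme; sketch_cjk_pairs() is a hypothetical helper and not a method of this class.

```php
<?php
// Hypothetical helper: splits a UTF-8 string into overlapping pairs of
// characters. Assumption: pair-wise splitting is the intended CJK scheme.
function sketch_cjk_pairs($text)
{
    // Split into single UTF-8 characters ("u" makes PCRE UTF-8 aware).
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $count = count($chars);
    if ($count <= 1) {
        return $chars; // an empty string or a single character is kept as-is
    }
    $pairs = array();
    for ($i = 0; $i < $count - 1; $i++) {
        $pairs[] = $chars[$i] . $chars[$i + 1];
    }
    return $pairs;
}

$words = array();
foreach (sketch_cjk_pairs('日本語処理') as $pair) {
    $words[] = $pair; // accumulate into the word array
}
// $words is array('日本', '本語', '語処', '処理')
```

Pair-wise splitting needs no dictionary, at the cost of some extra matches when searching.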

tx_indexedsearch_lexer::get_word ( &$str, $pos = 0 )

Gets the first word in a given UTF-8 string (initial non-letters are skipped).

Parameters:
string $str: Input string (reference)
integer $pos: Starting position in the input string
Returns:
array Index 0: start position, index 1: length; false if no word was found
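
The start/length contract described above can be illustrated with a short sketch. It assumes that both values are byte offsets into the input string and that a word is a run of Unicode letters and digits; sketch_get_word() is a hypothetical stand-in, not the method itself.

```php
<?php
// Hypothetical stand-in for get_word(): returns array(start, length) in
// bytes for the first word at or after $pos, or false if there is none.
// $pos must point at the start of a UTF-8 character.
function sketch_get_word($str, $pos = 0)
{
    // With PREG_OFFSET_CAPTURE each match is array(text, byte offset).
    if (preg_match('/[\p{L}\p{N}]+/u', $str, $m, PREG_OFFSET_CAPTURE, $pos) === 1) {
        return array($m[0][1], strlen($m[0][0]));
    }
    return false;
}

// Walk through the string word by word, resuming after each hit.
$input = '  -- foo bar';
$pos = 0;
$found = array();
while (($hit = sketch_get_word($input, $pos)) !== false) {
    list($start, $len) = $hit;
    $found[] = substr($input, $start, $len);
    $pos = $start + $len;
}
// $found is array('foo', 'bar')
```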

tx_indexedsearch_lexer::utf8_is_letter ( &$str, &$len, $pos = 0 )

Checks whether a character is a letter (or a string of letters or non-letters).

Parameters:
string $str: Input string (reference)
integer $len: Byte length of the character sequence (reference, return value)
integer $pos: Starting position in the input string
Returns:
boolean TRUE if a letter (or word) was found

tx_indexedsearch_lexer::charType ( $cp )

Determines the type of a character.

Parameters:
integer $cp: Unicode number to evaluate
Returns:
array Type of character; index 0 holds the main type: num, alpha or CJK (Chinese / Japanese / Korean)
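
A rough sketch of such a classification is shown below. The code point ranges are a small, illustrative subset chosen for this example (ASCII digits and letters, Hiragana/Katakana, CJK Unified Ideographs, Hangul syllables). The function name, the lowercase type strings and the null result for unclassified code points are assumptions, not the documented behaviour.

```php
<?php
// Hypothetical classifier: maps a Unicode code point to array(main type).
// Only a few well-known ranges are covered; anything else yields null.
function sketch_char_type($cp)
{
    if ($cp >= 0x30 && $cp <= 0x39) {
        return array('num');            // ASCII digits 0-9
    }
    if (($cp >= 0x41 && $cp <= 0x5A) || ($cp >= 0x61 && $cp <= 0x7A)) {
        return array('alpha');          // ASCII letters A-Z, a-z
    }
    if (($cp >= 0x3040 && $cp <= 0x30FF)        // Hiragana and Katakana
        || ($cp >= 0x4E00 && $cp <= 0x9FFF)     // CJK Unified Ideographs
        || ($cp >= 0xAC00 && $cp <= 0xD7A3)) {  // Hangul syllables
        return array('cjk');
    }
    return null;                        // not classified by this sketch
}

list($type) = sketch_char_type(0x65E5); // U+65E5 is the ideograph 日
// $type is 'cjk'
```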

tx_indexedsearch_lexer::utf8_ord ( &$str, &$len, $pos = 0, $hex = false )

Converts a UTF-8 multibyte character to a Unicode code point.

Parameters:
string $str: UTF-8 multibyte character string (reference)
integer $len: The length of the character in bytes (reference, return value)
integer $pos: Starting position in the input string
boolean $hex: If set, a hexadecimal number is returned
Returns:
integer Unicode code point
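
The conversion follows from the UTF-8 encoding rules: the lead byte determines the sequence length, and each continuation byte contributes six payload bits. The sketch below is an independent illustration with a hypothetical function name. It assumes well-formed input (continuation bytes are not validated), and the hexadecimal return format it uses, a plain dechex() string, is an assumption.

```php
<?php
// Hypothetical decoder: returns the code point of the UTF-8 character that
// starts at byte $pos and stores its byte length in $len.
function sketch_utf8_ord($str, &$len, $pos = 0, $hex = false)
{
    $lead = ord($str[$pos]);
    if ($lead < 0x80) {                      // 0xxxxxxx: 1 byte (ASCII)
        $len = 1;
        $cp = $lead;
    } elseif (($lead & 0xE0) === 0xC0) {     // 110xxxxx: 2 bytes
        $len = 2;
        $cp = $lead & 0x1F;
    } elseif (($lead & 0xF0) === 0xE0) {     // 1110xxxx: 3 bytes
        $len = 3;
        $cp = $lead & 0x0F;
    } elseif (($lead & 0xF8) === 0xF0) {     // 11110xxx: 4 bytes
        $len = 4;
        $cp = $lead & 0x07;
    } else {                                 // invalid lead byte
        $len = 1;
        return 0;                            // this sketch reports 0
    }
    // Each continuation byte (10xxxxxx) adds its low six bits.
    for ($i = 1; $i < $len; $i++) {
        $cp = ($cp << 6) | (ord($str[$pos + $i]) & 0x3F);
    }
    return $hex ? dechex($cp) : $cp;
}

$len = 0;
$cp = sketch_utf8_ord("\xC3\xA9", $len); // "é" encoded as bytes C3 A9
// $cp is 0xE9 (233) and $len is 2
```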


Member Data Documentation

tx_indexedsearch_lexer::$debug = FALSE

tx_indexedsearch_lexer::$debugString = ''

tx_indexedsearch_lexer::$csObj

tx_indexedsearch_lexer::$lexerConf

Initial value:

 array(
        'printjoins' => array(          // Unicode numbers of chars that are allowed INSIDE a sequence of letter chars (alphanum + CJK)
                0x2e,   // "."
                0x2d,   // "-"
                0x5f,   // "_"
                0x3a,   // ":"
                0x2f,   // "/"
                0x27,   // "'"
                // 0x615,       // ARABIC SMALL HIGH TAH
        ),
        'casesensitive' => FALSE,       // Set if case-sensitive indexing is wanted.
        'removeChars' => array(         // Unicode numbers of chars that will be removed before words are returned (e.g. "-")
                0x2d    // "-"
        )
 )
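
How the 'printjoins' and 'removeChars' settings might interact can be sketched as follows. This is an illustration under stated assumptions, not the class's code: sketch_tokenize() is a hypothetical function, join characters are only accepted between letters or digits, and chr() is used because every configured value here is an ASCII code point.

```php
<?php
// Same values as the documented initial configuration (subset of keys).
$conf = array(
    'printjoins'  => array(0x2e, 0x2d, 0x5f, 0x3a, 0x2f, 0x27),
    'removeChars' => array(0x2d),
);

// Hypothetical tokenizer: keeps join characters inside a word, then strips
// the characters listed in 'removeChars' from each word.
function sketch_tokenize($text, array $conf)
{
    // Build an escaped character class from the join characters.
    $joins = '';
    foreach ($conf['printjoins'] as $cp) {
        $joins .= preg_quote(chr($cp), '/');
    }
    // A word is letters/digits, optionally continued by joins + letters/digits.
    $pattern = '/[\p{L}\p{N}]+(?:[' . $joins . ']+[\p{L}\p{N}]+)*/u';
    preg_match_all($pattern, $text, $matches);

    $words = array();
    foreach ($matches[0] as $word) {
        foreach ($conf['removeChars'] as $cp) {
            $word = str_replace(chr($cp), '', $word);
        }
        $words[] = $word;
    }
    return $words;
}

$words = sketch_tokenize('Send e-mail to www.example.org.', $conf);
// $words is array('Send', 'email', 'to', 'www.example.org')
```

In this sketch "e-mail" stays one word because "-" is a join character, and the hyphen is then dropped because it is also listed in 'removeChars'. The trailing full stop is not part of a word since no letter follows it.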


The documentation for this class was generated from the following file:
Generated on Fri Aug 31 11:43:51 2007 for OBLADY - Typo3 API v4.1.2 by doxygen 1.5.3