Solutions Open Source

tx_indexedsearch_crawler Class Reference

Public Member Functions

 crawler_init (&$pObj)
 crawler_execute ($params, &$pObj)
 crawler_execute_type1 ($cfgRec, &$session_data, $params, &$pObj)
 crawler_execute_type2 ($cfgRec, &$session_data, $params, &$pObj)
 crawler_execute_type3 ($cfgRec, &$session_data, $params, &$pObj)
 crawler_execute_type4 ($cfgRec, &$session_data, $params, &$pObj)
 cleanUpOldRunningConfigurations ()
 checkUrl ($url, $urlLog, $baseUrl)
 indexExtUrl ($url, $pageId, $rl, $cfgUid, $setId)
 indexSingleRecord ($r, $cfgRec, $rl=NULL)
 loadIndexerClass ()
 getUidRootLineForClosestTemplate ($id)
 generateNextIndexingTime ($cfgRec)
 checkDeniedSuburls ($url, $url_deny)
 addQueueEntryForHook ($cfgRec, $title)
 deleteFromIndex ($id)
 processCmdmap_preProcess ($command, $table, $id, $value, &$pObj)
 processDatamap_afterDatabaseOperations ($status, $table, $id, $fieldArray, &$pObj)

Public Attributes

 $secondsPerExternalUrl = 3
 $instanceCounter = 0
 $callBack = 'EXT:indexed_search/class.crawler.php:&tx_indexedsearch_crawler'


Member Function Documentation

tx_indexedsearch_crawler::crawler_init (&$pObj)

Initialization of the crawler hook. This function is called for each instance of the crawler. It must check whether something is scheduled to happen and, if so, put one or more entries in the crawler's log to start processing. In practice, it selects the indexing configurations and evaluates whether any of them needs to run.

Parameters:
object $pObj Parent object (tx_crawler lib)
Returns:
void
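The scheduling check described above can be pictured with the following minimal PHP sketch. It is not the extension's code: configuration rows are plain arrays, and the field names `uid`, `timer_next_indexing` and `set_id`, as well as the function name `sketchQueueDueConfigurations()`, are assumptions made for illustration. Each due configuration yields one queue entry carrying the `indexConfigUid` that crawler_execute() requires.

```php
<?php
// Hypothetical sketch of the crawler_init() idea: pick the indexing configurations that
// are due and not already running, and queue one log entry per configuration.
// Field names (uid, timer_next_indexing, set_id) are assumptions for illustration.

function sketchQueueDueConfigurations(array $configurations, int $now): array
{
    $queue = [];
    foreach ($configurations as $cfg) {
        $isDue = (int)$cfg['timer_next_indexing'] <= $now; // scheduled time has passed
        $isRunning = !empty($cfg['set_id']);              // an indexing set is still active
        if ($isDue && !$isRunning) {
            // The queued parameters carry the uid that the execute callback needs.
            $queue[] = ['indexConfigUid' => (int)$cfg['uid']];
        }
    }
    return $queue;
}

$now = 1000;
$configurations = [
    ['uid' => 1, 'timer_next_indexing' => 900, 'set_id' => 0],    // due
    ['uid' => 2, 'timer_next_indexing' => 5000, 'set_id' => 0],   // not yet due
    ['uid' => 3, 'timer_next_indexing' => 800, 'set_id' => 4711], // already running
];
print_r(sketchQueueDueConfigurations($configurations, $now));
```

Keeping the current time as a parameter makes the selection easy to test without touching a clock or a database.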

tx_indexedsearch_crawler::crawler_execute ($params, &$pObj)

Callback function for executing a log element.

Parameters:
array $params Parameters from the log element. Must contain $params['indexConfigUid'].
object $pObj Parent object (tx_crawler lib)
Returns:
array Result array
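The four crawler_execute_type*() methods documented below suggest that each log element is routed according to the type of its indexing configuration. The sketch below illustrates such routing under stated assumptions: configurations are an in-memory array keyed by uid, and the `type` field with values 1 to 4 is assumed rather than taken from this page. `sketchExecuteLogElement()` is hypothetical and only returns the name of the method it would call.

```php
<?php
// Hypothetical sketch: route a crawler log element to a per-type handler name.
// The in-memory $configurations array and its 'type' field are assumptions.

function sketchExecuteLogElement(array $params, array $configurations): string
{
    // The log element must identify its indexing configuration.
    if (!isset($params['indexConfigUid'])) {
        throw new InvalidArgumentException('Missing indexConfigUid');
    }
    $uid = (int)$params['indexConfigUid'];
    if (!isset($configurations[$uid])) {
        return 'unknown configuration';
    }

    // Assumed mapping from configuration type to the documented handler methods.
    $handlers = [
        1 => 'crawler_execute_type1', // records from a table
        2 => 'crawler_execute_type2', // files from fileadmin
        3 => 'crawler_execute_type3', // external URLs
        4 => 'crawler_execute_type4', // page tree
    ];
    return $handlers[(int)$configurations[$uid]['type']] ?? 'unsupported type';
}

$configurations = [7 => ['uid' => 7, 'type' => 3]];
echo sketchExecuteLogElement(['indexConfigUid' => 7], $configurations), "\n"; // crawler_execute_type3
```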

tx_indexedsearch_crawler::crawler_execute_type1 ($cfgRec, &$session_data, $params, &$pObj)

Indexes records from a table.

Parameters:
array $cfgRec Indexing configuration record
array $session_data Session data for an indexing session spread over multiple instances of the script. Passed by reference, so changes made to it are saved for the next call.
array $params Parameters from the log queue
object $pObj Parent object (from the "crawler" extension)
Returns:
void
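Because `$session_data` is passed by reference, a handler can spread one indexing run over several calls and pick up where the previous call stopped. The following sketch shows that pattern in isolation; the `offset` and `done` keys, the batch size and the function name `sketchIndexNextBatch()` are hypothetical and do not describe what crawler_execute_type1() actually stores.

```php
<?php
// Hypothetical sketch: process records in batches, keeping progress in $session_data
// (passed by reference) so that the next call resumes where this one stopped.
// The 'offset'/'done' keys and the batch size are assumptions for illustration.

function sketchIndexNextBatch(array $records, array &$session_data, int $batchSize = 2): array
{
    $offset = $session_data['offset'] ?? 0;
    $batch = array_slice($records, $offset, $batchSize);

    // Store progress; the caller's array is modified because it is a reference.
    $session_data['offset'] = $offset + count($batch);
    $session_data['done'] = $session_data['offset'] >= count($records);

    return $batch; // the records a real handler would index during this call
}

$records = ['r1', 'r2', 'r3', 'r4', 'r5'];
$session = [];
while (empty($session['done'])) {
    echo implode(',', sketchIndexNextBatch($records, $session)), "\n"; // r1,r2 then r3,r4 then r5
}
```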

tx_indexedsearch_crawler::crawler_execute_type2 ($cfgRec, &$session_data, $params, &$pObj)

Indexes files from fileadmin.

Parameters:
array $cfgRec Indexing configuration record
array $session_data Session data for an indexing session spread over multiple instances of the script. Passed by reference, so changes made to it are saved for the next call.
array $params Parameters from the log queue
object $pObj Parent object (from the "crawler" extension)
Returns:
void

tx_indexedsearch_crawler::crawler_execute_type3 ($cfgRec, &$session_data, $params, &$pObj)

Indexes external URLs.

Parameters:
array $cfgRec Indexing configuration record
array $session_data Session data for an indexing session spread over multiple instances of the script. Passed by reference, so changes made to it are saved for the next call.
array $params Parameters from the log queue
object $pObj Parent object (from the "crawler" extension)
Returns:
void

tx_indexedsearch_crawler::crawler_execute_type4 ($cfgRec, &$session_data, $params, &$pObj)

Indexing type for the page tree.

Parameters:
array $cfgRec Indexing configuration record
array $session_data Session data for an indexing session spread over multiple instances of the script. Passed by reference, so changes made to it are saved for the next call.
array $params Parameters from the log queue
object $pObj Parent object (from the "crawler" extension)
Returns:
void

tx_indexedsearch_crawler::cleanUpOldRunningConfigurations ()

Looks up all old indexing configurations that have finished and need to be reset.

Returns:
void

tx_indexedsearch_crawler::checkUrl ($url, $urlLog, $baseUrl)

Checks whether an input URL is allowed to be indexed. This depends on whether it is already present in the URL log and whether it lies inside the base URL.

Parameters:
string $url URL string to check
array $urlLog Array of already indexed URLs (the input URL is looked up here and must not already exist)
string $baseUrl Base URL of the indexing process (the input URL must be "inside" the base URL)
Returns:
string Returns the URL if OK, otherwise false
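A minimal sketch of the two documented conditions: the URL must lie inside the base URL and must not already be in the URL log. Removing a `#fragment` and rejecting `../` segments are extra assumptions added for illustration, and `sketchCheckUrl()` is a hypothetical name, not the class method.

```php
<?php
// Hypothetical sketch of the checkUrl() rules: return the URL if it starts with the
// base URL and is not yet logged, otherwise false. Fragment stripping and the "../"
// rejection are assumptions added for illustration.

function sketchCheckUrl(string $url, array $urlLog, string $baseUrl)
{
    // Ignore any "#fragment" so the same page is not queued twice.
    $url = explode('#', $url, 2)[0];

    if (strpos($url, '../') !== false) {
        return false; // relative segments could climb out of the base URL
    }
    if (strpos($url, $baseUrl) !== 0) {
        return false; // outside the base URL of the indexing process
    }
    if (in_array($url, $urlLog, true)) {
        return false; // already indexed
    }
    return $url;
}

$log = ['http://example.com/a.html'];
$result = sketchCheckUrl('http://example.com/b.html#top', $log, 'http://example.com/');
echo $result === false ? 'rejected' : $result, "\n"; // http://example.com/b.html
```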

tx_indexedsearch_crawler::indexExtUrl ($url, $pageId, $rl, $cfgUid, $setId)

Indexes an external URL.

Parameters:
string $url URL (http://...)
integer $pageId Page ID to relate the indexing to
array $rl Rootline array to relate the indexing to
integer $cfgUid Configuration UID
integer $setId Set ID value
Returns:
array URLs found on this page

tx_indexedsearch_crawler::indexSingleRecord ($r, $cfgRec, $rl=NULL)

Indexes a single record.

Parameters:
array $r Record to index
array $cfgRec Configuration record
array $rl Rootline array to relate the indexing to
Returns:
void

tx_indexedsearch_crawler::loadIndexerClass ()

Includes the indexer class.

Returns:
void

tx_indexedsearch_crawler::getUidRootLineForClosestTemplate ($id)

Gets the rootline for the closest TypoScript template root. The algorithm is the same as the one used in Web > Template, Object browser.

Parameters:
integer $id The page ID to traverse the rootline back from
Returns:
array Array containing the UID values of the rootline
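The traversal can be illustrated with a plain parent map instead of the page table. In this sketch, `$parents` (page uid to parent uid, with 0 as the top of the tree) and `$templateRoots` (pages assumed to carry a root template) are hypothetical inputs, and returning the uids ordered from the template root down to the requested page is an assumption about the shape of the result, not a statement about the real method.

```php
<?php
// Hypothetical sketch: climb from a page towards the tree root and stop at the closest
// page flagged as a template root, then return the uids from that root down to the page.
// $parents, $templateRoots and the root-first ordering are assumptions.

function sketchRootLineForClosestTemplate(int $id, array $parents, array $templateRoots): array
{
    $upward = [];
    $depth = 0;
    // pid 0 is the top of the page tree; the depth limit protects against cyclic maps.
    while ($id !== 0 && isset($parents[$id]) && $depth++ < 100) {
        $upward[] = $id;
        if (in_array($id, $templateRoots, true)) {
            break; // closest template root found
        }
        $id = $parents[$id];
    }
    return array_reverse($upward); // template root first, requested page last
}

// Page tree: 1 (site root) > 5 (template root) > 12 > 30
$parents = [1 => 0, 5 => 1, 12 => 5, 30 => 12];
print_r(sketchRootLineForClosestTemplate(30, $parents, [1, 5]));
```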

tx_indexedsearch_crawler::generateNextIndexingTime ($cfgRec)

Generates the Unix timestamp for the next visit.

Parameters:
array $cfgRec Indexing configuration record
Returns:
integer The next timestamp
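One common way to compute such a timestamp is to take a daily anchor (midnight plus an offset) and advance it in whole frequency steps until it is no longer in the past. The sketch below shows that technique only; the field names `timer_frequency` and `timer_offset`, the caller-supplied midnight and the function name are assumptions, and the extension's exact algorithm may differ.

```php
<?php
// Hypothetical sketch: next run = first "anchor + n * frequency" that is not in the past,
// where anchor = midnight + offset. Field names and the explicit $midnight are assumptions.

function sketchNextIndexingTime(array $cfgRec, int $now, int $midnight): int
{
    $frequency = max(1, (int)$cfgRec['timer_frequency']);      // seconds between runs
    $offset = min(86400, max(0, (int)$cfgRec['timer_offset'])); // seconds after midnight
    $anchor = $midnight + $offset;

    // Whole frequency blocks needed to reach or pass "now" (may be 0 or negative).
    $blocks = (int)ceil(($now - $anchor) / $frequency);
    return $anchor + $blocks * $frequency;
}

$midnight = 1188518400;                                      // 2007-08-31 00:00:00 UTC
$cfg = ['timer_frequency' => 3600, 'timer_offset' => 1800];  // hourly, at half past
$now = $midnight + 11 * 3600 + 43 * 60;                      // 11:43 UTC
echo gmdate('H:i', sketchNextIndexingTime($cfg, $now, $midnight)), "\n"; // 12:30
```

Passing the current time and the midnight anchor in explicitly keeps the result independent of the server's clock and time zone.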

tx_indexedsearch_crawler::checkDeniedSuburls ($url, $url_deny)

Checks whether $url starts with any of the URLs in the $url_deny list and, if so, returns TRUE.

Parameters:
string $url URL to test
string $url_deny String in which URLs are separated by line breaks; if any of these strings is the first part of $url, the function returns TRUE (to indicate that descending is denied)
Returns:
boolean TRUE if there is a matching URL (hence, do not index!)
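The prefix rule described above fits in a few lines. This is a minimal sketch with the hypothetical name `sketchCheckDeniedSuburls()`; trimming each line and skipping empty lines are assumptions about how the list is cleaned.

```php
<?php
// Hypothetical sketch of the checkDeniedSuburls() rule: $url_deny holds one URL per
// line, and $url is denied if it starts with any of them.

function sketchCheckDeniedSuburls(string $url, string $url_deny): bool
{
    // \R matches any line-break sequence (\n, \r\n, \r).
    foreach (preg_split('/\R/', $url_deny) as $line) {
        $prefix = trim($line);
        if ($prefix !== '' && strpos($url, $prefix) === 0) {
            return true; // $url lies below a denied URL: do not index
        }
    }
    return false;
}

$deny = "http://example.com/private/\nhttp://example.com/tmp/";
var_dump(sketchCheckDeniedSuburls('http://example.com/private/a.html', $deny)); // bool(true)
```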

tx_indexedsearch_crawler::addQueueEntryForHook ($cfgRec, $title)

Adds an entry to the queue for the hook.

Parameters:
array $cfgRec Configuration record
string $title Title/URL
Returns:
void

tx_indexedsearch_crawler::deleteFromIndex ($id)

Deletes all data stored by Indexed Search for a given page.

Parameters:
integer $id UID of the page for which all pHash entries are deleted
Returns:
void

tx_indexedsearch_crawler::processCmdmap_preProcess ($command, $table, $id, $value, &$pObj)

TCEmain hook function for on-the-fly indexing of database records.

Parameters:
string $command TCEmain command
string $table Table name
string $id Record ID. For a new record, it is a string pointing to an index inside t3lib_tcemain::substNEWwithIDs
mixed $value Target value (ignored)
object $pObj Reference to the calling TCEmain object
Returns:
void

tx_indexedsearch_crawler::processDatamap_afterDatabaseOperations ($status, $table, $id, $fieldArray, &$pObj)

TCEmain hook function for on-the-fly indexing of database records.

Parameters:
string $status Status: "new" or "update"
string $table Table name
string $id Record ID. For a new record, it is a string pointing to an index inside t3lib_tcemain::substNEWwithIDs
array $fieldArray Field array of the fields updated in the operation
object $pObj Reference to the calling TCEmain object
Returns:
void
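Both hook functions receive a record ID that, for a new record, is a placeholder key into the calling object's substNEWwithIDs map. The sketch below shows how such an ID could be resolved to a real uid. `SketchTceMainStub` is a hypothetical stand-in for t3lib_tcemain, and the placeholder value `NEW4a1b` is made up for illustration.

```php
<?php
// Hypothetical sketch: resolve the $id passed to a TCEmain hook. Existing records arrive
// as numeric uids; new records arrive as placeholder strings that are keys in the
// calling object's substNEWwithIDs map. The stub class stands in for t3lib_tcemain.

class SketchTceMainStub
{
    /** @var array<string,int> placeholder => real uid */
    public $substNEWwithIDs = [];
}

function sketchResolveRecordId($id, $pObj): int
{
    if (!is_numeric($id) && isset($pObj->substNEWwithIDs[$id])) {
        return (int)$pObj->substNEWwithIDs[$id]; // new record: placeholder -> uid
    }
    return (int)$id; // existing record: already a uid
}

$tce = new SketchTceMainStub();
$tce->substNEWwithIDs['NEW4a1b'] = 42;
echo sketchResolveRecordId('NEW4a1b', $tce), ' ', sketchResolveRecordId('17', $tce), "\n"; // 42 17
```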


Member Data Documentation

tx_indexedsearch_crawler::$secondsPerExternalUrl = 3

tx_indexedsearch_crawler::$instanceCounter = 0

tx_indexedsearch_crawler::$callBack = 'EXT:indexed_search/class.crawler.php:&tx_indexedsearch_crawler'


The documentation for this class was generated from the following file:
Generated on Fri Aug 31 11:43:49 2007 for OBLADY - Typo3 API v4.1.2 by doxygen 1.5.3