Public Member Functions
| crawler_init (&$pObj) | |
| crawler_execute ($params, &$pObj) | |
| crawler_execute_type1 ($cfgRec, &$session_data, $params, &$pObj) | |
| crawler_execute_type2 ($cfgRec, &$session_data, $params, &$pObj) | |
| crawler_execute_type3 ($cfgRec, &$session_data, $params, &$pObj) | |
| crawler_execute_type4 ($cfgRec, &$session_data, $params, &$pObj) | |
| cleanUpOldRunningConfigurations () | |
| checkUrl ($url, $urlLog, $baseUrl) | |
| indexExtUrl ($url, $pageId, $rl, $cfgUid, $setId) | |
| indexSingleRecord ($r, $cfgRec, $rl=NULL) | |
| loadIndexerClass () | |
| getUidRootLineForClosestTemplate ($id) | |
| generateNextIndexingTime ($cfgRec) | |
| checkDeniedSuburls ($url, $url_deny) | |
| addQueueEntryForHook ($cfgRec, $title) | |
| deleteFromIndex ($id) | |
| processCmdmap_preProcess ($command, $table, $id, $value, &$pObj) | |
| processDatamap_afterDatabaseOperations ($status, $table, $id, $fieldArray, &$pObj) | |
Public Attributes
| $secondsPerExternalUrl = 3 | |
| $instanceCounter = 0 | |
| $callBack = 'EXT:indexed_search/class.crawler.php:&tx_indexedsearch_crawler' | |
tx_indexedsearch_crawler::crawler_init (&$pObj)
Initialization of the crawler hook. This function is called for each instance of the crawler. It checks whether anything is scheduled to happen and, if so, puts one or more entries in the crawler's log to start processing. In practice, it selects indexing configurations and evaluates whether any of them need to run.
| object | Parent object (tx_crawler lib) |
tx_indexedsearch_crawler::crawler_execute ($params, &$pObj)
Call back function for execution of a log element
| array | Params from log element. Must contain $params['indexConfigUid'] | |
| object | Parent object (tx_crawler lib) |
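The callback needs `$params['indexConfigUid']` to find the configuration to run. A minimal sketch of how such a callback might route a log element to one of the four `crawler_execute_type*` methods follows. The `type` key, the in-memory `$configs` lookup (standing in for a database query) and the function name `dispatchByType` are assumptions made for illustration, not the extension's actual code.

```php
<?php
// Hypothetical sketch: pick a type-specific handler for a crawler log element.
// The 'type' key and the $configs array are assumptions for illustration.
function dispatchByType(array $params, array $configs): ?string
{
    // The log element must carry the configuration uid.
    if (!isset($params['indexConfigUid'])) {
        return null;
    }
    $cfgRec = $configs[$params['indexConfigUid']] ?? null;
    if ($cfgRec === null) {
        return null;
    }
    // Handler names as documented in this class reference.
    $handlers = [
        1 => 'crawler_execute_type1', // records from a table
        2 => 'crawler_execute_type2', // files from fileadmin
        3 => 'crawler_execute_type3', // external URLs
        4 => 'crawler_execute_type4', // page tree
    ];
    return $handlers[(int)$cfgRec['type']] ?? null;
}
```

Returning the method name rather than calling it keeps the sketch free of any dependency on the crawler parent object.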
tx_indexedsearch_crawler::crawler_execute_type1 ($cfgRec, &$session_data, $params, &$pObj)
Indexing records from a table
| array | Indexing Configuration Record | |
| array | Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes made to it are saved for the next call. | |
| array | Parameters from the log queue. | |
| object | Parent object (from "crawler" extension!) |
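Because the session data is passed by reference, a handler can record its progress and pick up where it left off on the next call. The sketch below illustrates that pattern only. The function name `processBatch`, the `offset` key and the batch size are hypothetical and do not come from the extension.

```php
<?php
// Hypothetical sketch: process one batch per call and remember the position
// in $session_data, which the caller keeps between script instances.
function processBatch(array $rows, array &$session_data, int $batchSize = 2): array
{
    // 'offset' is an assumed key; it starts at 0 on the first call.
    $offset = $session_data['offset'] ?? 0;
    $batch = array_slice($rows, $offset, $batchSize);
    // $session_data is a reference, so the caller sees this update.
    $session_data['offset'] = $offset + count($batch);
    return $batch;
}

$session = [];
$rows = ['a', 'b', 'c'];
processBatch($rows, $session); // first call handles 'a' and 'b'
processBatch($rows, $session); // second call resumes at 'c'
```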
tx_indexedsearch_crawler::crawler_execute_type2 ($cfgRec, &$session_data, $params, &$pObj)
Indexing files from fileadmin
| array | Indexing Configuration Record | |
| array | Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes made to it are saved for the next call. | |
| array | Parameters from the log queue. | |
| object | Parent object (from "crawler" extension!) |
tx_indexedsearch_crawler::crawler_execute_type3 ($cfgRec, &$session_data, $params, &$pObj)
Indexing External URLs
| array | Indexing Configuration Record | |
| array | Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes made to it are saved for the next call. | |
| array | Parameters from the log queue. | |
| object | Parent object (from "crawler" extension!) |
tx_indexedsearch_crawler::crawler_execute_type4 ($cfgRec, &$session_data, $params, &$pObj)
Page tree indexing type
| array | Indexing Configuration Record | |
| array | Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes made to it are saved for the next call. | |
| array | Parameters from the log queue. | |
| object | Parent object (from "crawler" extension!) |
tx_indexedsearch_crawler::cleanUpOldRunningConfigurations ()
Looks up all old index configurations that are finished and need to be reset and marked as done.
tx_indexedsearch_crawler::checkUrl ($url, $urlLog, $baseUrl)
Checks whether an input URL is allowed to be indexed. This depends on whether it is already present in the URL log.
| string | URL string to check | |
| array | Array of already indexed URLs (input url is looked up here and must not exist already) | |
| string | Base URL of the indexing process (input URL must be "inside" the base URL!) |
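The two conditions in the parameter descriptions (the URL must be "inside" the base URL and must not already be in the log) can be sketched as follows. This is a simplified illustration under stated assumptions: "inside" is read as a plain prefix match, and the function name `isUrlAllowed` is hypothetical. The real method may normalize URLs or apply further checks.

```php
<?php
// Hypothetical sketch of the two documented conditions.
function isUrlAllowed(string $url, array $urlLog, string $baseUrl): bool
{
    // "Inside" the base URL is interpreted here as a prefix match (assumption).
    $insideBase = strncmp($url, $baseUrl, strlen($baseUrl)) === 0;
    // The URL must not have been logged already.
    $alreadySeen = in_array($url, $urlLog, true);
    return $insideBase && !$alreadySeen;
}
```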
tx_indexedsearch_crawler::indexExtUrl ($url, $pageId, $rl, $cfgUid, $setId)
Indexing External URL
| string | URL, http://.... | |
| integer | Page id to relate indexing to. | |
| array | Rootline array to relate indexing to | |
| integer | Configuration UID | |
| integer | Set ID value |
tx_indexedsearch_crawler::indexSingleRecord ($r, $cfgRec, $rl = NULL)
Indexing Single Record
| array | Record to index | |
| array | Configuration Record | |
| array | Rootline array to relate indexing to |
tx_indexedsearch_crawler::loadIndexerClass ()
Include indexer class.
tx_indexedsearch_crawler::getUidRootLineForClosestTemplate ($id)
Gets the rootline for the closest TypoScript template root. The algorithm is the same as the one used in Web > Template, Object browser.
| integer | The page id to traverse rootline back from |
tx_indexedsearch_crawler::generateNextIndexingTime ($cfgRec)
Generate the unix time stamp for next visit.
| array | Index configuration record |
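The source states only that a Unix timestamp for the next visit is produced. One plausible way to compute such a timestamp is to step forward from a daily offset in fixed frequency blocks, as sketched below. The inputs (`$midnight`, `$offset`, `$frequency`) and the function name `nextIndexingTime` are assumptions for illustration and are not taken from the configuration record documented here.

```php
<?php
// Hypothetical sketch: the next run is the first slot of the form
// (midnight + offset + k * frequency) that is not before $now.
function nextIndexingTime(int $now, int $midnight, int $offset, int $frequency): int
{
    $frequency = max(1, $frequency);   // guard against division by zero
    $firstSlot = $midnight + $offset;  // earliest slot of the day
    if ($now <= $firstSlot) {
        return $firstSlot;
    }
    // Number of whole frequency blocks needed to reach or pass $now.
    $blocks = (int)ceil(($now - $firstSlot) / $frequency);
    return $firstSlot + $blocks * $frequency;
}
```

Passing the current time and midnight as arguments keeps the function deterministic, which makes it easy to check by hand.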
tx_indexedsearch_crawler::checkDeniedSuburls ($url, $url_deny)
Checks whether $url has any of the URLs in the $url_deny "list" in it and, if so, returns TRUE.
| string | URL to test | |
| string | String in which URLs are separated by line breaks. If any of these strings is the first part of $url, the function returns TRUE (to indicate that descent is denied) |
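The prefix test described above can be sketched in a few lines. This is a minimal illustration, not the extension's code. The function name `isDeniedSuburl` is hypothetical, and trimming whitespace and skipping blank lines are assumptions about how the list would be cleaned.

```php
<?php
// Hypothetical sketch of the documented prefix test.
function isDeniedSuburl(string $url, string $urlDeny): bool
{
    // Split on any line-break style and drop empty pieces.
    $prefixes = preg_split('/\R/', $urlDeny, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($prefixes as $prefix) {
        $prefix = trim($prefix);
        // Deny when $url starts with one of the listed prefixes.
        if ($prefix !== '' && strncmp($url, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}
```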
tx_indexedsearch_crawler::addQueueEntryForHook ($cfgRec, $title)
Adding entry in queue for Hook
| array | Configuration record | |
| string | Title/URL |
tx_indexedsearch_crawler::deleteFromIndex ($id)
Deletes all data stored by indexed search for a given page
| integer | Uid of the page for which to delete all pHash entries |
tx_indexedsearch_crawler::processCmdmap_preProcess ($command, $table, $id, $value, &$pObj)
TCEmain hook function for on-the-fly indexing of database records
| string | TCEmain command | |
| string | Table name | |
| string | Record ID. For a new record, it is a string pointing to an index inside t3lib_tcemain::substNEWwithIDs | |
| mixed | Target value (ignored) | |
| object | Reference to tcemain calling object |
tx_indexedsearch_crawler::processDatamap_afterDatabaseOperations ($status, $table, $id, $fieldArray, &$pObj)
TCEmain hook function for on-the-fly indexing of database records
| string | Status "new" or "update" | |
| string | Table name | |
| string | Record ID. For a new record, it is a string pointing to an index inside t3lib_tcemain::substNEWwithIDs | |
| array | Field array of updated fields in the operation | |
| object | Reference to tcemain calling object |
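Since the record ID of a new record is a placeholder key rather than a uid, a hook typically has to resolve it before indexing. The sketch below shows that lookup in isolation. The function name `resolveRecordId` is hypothetical, and the plain array argument stands in for the `substNEWwithIDs` map on the calling object.

```php
<?php
// Hypothetical sketch: turn a "NEW..." placeholder into the real uid.
// $substNEWwithIDs mirrors the map named in the parameter description.
function resolveRecordId(string $status, $id, array $substNEWwithIDs): ?int
{
    if ($status === 'new') {
        // For new records, $id is a key into the substitution map.
        return isset($substNEWwithIDs[$id]) ? (int)$substNEWwithIDs[$id] : null;
    }
    // For updates, $id is already the record uid.
    return (int)$id;
}
```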
| tx_indexedsearch_crawler::$secondsPerExternalUrl = 3 |
| tx_indexedsearch_crawler::$instanceCounter = 0 |
| tx_indexedsearch_crawler::$callBack = 'EXT:indexed_search/class.crawler.php:&tx_indexedsearch_crawler' |