Search2 Administrator Guide
From KnowledgeTree Document Management Made Simple
Configuration
[ search ]
; The number of results per page
; defaults to 25
resultsPerPage = default
; The date format used when making queries using widgets
; defaults to Y-m-d .... NOTE Future development
dateFormat = default
[ indexer ]
; The core indexing class
coreClass=PHPLuceneIndexer
; The number of documents to be indexed in a cron session
; defaults to 20
batchDocuments = default
; The location of the lucene indexes
luceneDirectory=${varDirectory}/indexes
; The url for the Java Lucene Server. This should match up the the Lucene Server configuration.
; Defaults to http://localhost:8875
javaLuceneURL = default
Setting up the Lucene Directory
If using the Java Lucene Server, simply start the server. Ensure that it is configured correctly.
Some more information is available in ktroot/bin/luceneserver/README.TXT
Edit the config.ini and ensure that the 'javaLuceneURL' field is correct.
If using the PHP Lucene Server, you need to run the search2/indexing/bin/recreateIndex.php.
Registering new extractors
If a new extractor has been added to the search2/indexing/extractors folder, the search2/indexing/bin/registerTypes.php script must be run to associate them with the correct mime types. Note that old associations will not be overwritten.
Migration
Migrating to the new server requires that the content of the full text tables are extracted and inserted into the Lucene indexes.
This is done using the search2/indexing/bin/migrate.php script.
(this feature can be heavy - care should be taken when implementing)
Search Results Ranking
Review the 'search_ranking' table to find the weightings associated with matching subexpressions. These may be modified to improve the relevance of search results according to your needs.
Status
TODO: The lucene indexers should provide some statistics on the lucene index. It should provide some general information on the index, but a diagnostics function should be available to ensure that the correct version of the documents are indexed and possibly reschedule indexing if there is a mismatch for some reason.
(this feature could be heavy on the system - care should be taken when implementing)
Background Tasks
search2/indexing/bin/cronIndexer.php - task to batch index files.
search2/indexing/bin/optimise.php - task to optimise the lucene index.
The indexing script should be run frequently - say every 5 minutes. The config.ini allows for the number of documents to be indexed to be configured. This defaults to 20. If the frequency is shortened, you may want to decrease the number of documents that will be indexed so that there is no serious load that can impact on the performance of the system.
The lucene index requires optimisation to ensure that performance is optimal. This could be run once a day around midnight, or weekly depending on frequency of updates to the index.
HOWTO - how to run a php script from the command line
php -Cq script.php
del.icio.us
reddit

