3.9. External parsers

DataparkSearch indexer can use external parsers to index various file types (MIME types). A parser is an executable program that converts one of the MIME types to text/plain or text/html. For example, if you have PostScript files, you can use the ps2ascii parser (filter), which reads a PostScript file from stdin and writes ASCII text to stdout.

3.9.1. Supported parser types

Indexer supports four types of parsers that can:
3.9.2. Setting up parsers
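
As a rough illustration of the stdin-to-stdout parser model described in 3.9, the sketch below shows what a minimal custom filter might look like. It is not part of DataparkSearch: the function name strip_comments, the script path and the text/x-example MIME type are all hypothetical, and the filter itself (dropping lines that begin with '#') is only a stand-in for a real conversion.

```shell
#!/bin/sh
# Hypothetical stdin-to-stdout parser sketch (not shipped with DataparkSearch).
# It reads a document on standard input and writes text/plain to standard
# output, dropping every line that starts with '#'.
strip_comments() {
    sed '/^#/d'
}

# When saved as a standalone parser script, the last line would simply be:
#   strip_comments
```

Assuming the script is installed as /usr/local/bin/strip-comments.sh (an assumed path), it could be registered in indexer.conf with the same AddType and Mime commands used elsewhere in this section:

    AddType text/x-example *.example
    Mime text/x-example text/plain "/usr/local/bin/strip-comments.sh"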
3.9.3. Avoid indexer hang on parser execution

To avoid an indexer hang on parser execution, you may specify the maximum time, in seconds, allowed for parser execution with the ParserTimeOut command in your indexer.conf. For example:

    ParserTimeOut 600

The default value is 300 seconds, i.e. 5 minutes.

3.9.4. Pipes in parser's command line

You can use pipes in a parser's command line. For example, these lines are useful for indexing gzipped man pages from local disk:

    AddType application/x-gzipped-man *.1.gz *.2.gz *.3.gz *.4.gz
    Mime application/x-gzipped-man text/plain "zcat | deroff"

3.9.5. Charsets and parsers

Some parsers produce output in a charset other than the one given in the LocalCharset command. Specify the parser's output charset so that indexer converts the output to the proper one. For example, if your catdoc is configured to produce output in the windows-1251 charset but LocalCharset is koi8-r, use this command for parsing MS Word documents:

    Mime application/msword "text/plain; charset=windows-1251" "catdoc -a $1"

3.9.6. DPS_URL environment variable

When executing a parser, indexer creates the DPS_URL environment variable, whose value is the URL being processed. You can use this variable in parser scripts.

3.9.7. Some third-party parsers