5.2. Cache mode storage

5.2.1. Introduction

Cache mode word storage can index and quickly search several million documents.

5.2.2. Cache mode word indexes structure

The main idea of cache storage mode is that the word index and URL sorting information are stored on disk rather than in the SQL database. Full URL information, however, is kept in the SQL database (tables url and urlinfo). The word index is divided into the number of files specified by the WrdFiles command (default value is 0x300). URL sorting information is divided into the number of files specified by the URLDataFiles command (default value is 0x300).
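
The default values above are written in hexadecimal. As a quick sanity check, the following minimal sketch converts the documented default to decimal; the variable name default_files is only illustrative and is not part of DataparkSearch.

```shell
#!/bin/sh
# 0x300 is the documented default for both WrdFiles and URLDataFiles.
# default_files is a hypothetical variable name used for illustration.
default_files=0x300

# printf accepts C-style hexadecimal constants for %d.
printf '%d\n' "$default_files"   # prints 768
```

So with the defaults, the word index and the URL sorting information are each spread over 768 files.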

The word index is located in files under the /var/tree directory of the DataparkSearch installation. URL sorting information is located in files under the /var/url directory of the DataparkSearch installation.

5.2.3. Cache mode tools

Two additional programs, cached and splitter, are used in cache mode indexing.

cached is a TCP daemon which collects word information from indexers and stores it on your hard disk. It can operate in two modes: like the old cachelogd daemon, logging data only, or in the new mode, in which cachelogd and splitter functionality are combined.

splitter is a program that creates fast word indexes from the data collected by cached. Those indexes are used later in the search process.

5.2.4. Starting cache mode

To start "cache mode" follow these steps:

5.2.5. Optional usage of several splitters

splitter has two command line arguments, -f [first file] and -t [second file], which limit the range of files used. If no parameters are specified, splitter processes all prepared files. The values for -f and -t are given in HEX notation. For example, splitter -f 000 -t A00 will create word indexes using files in the range from 000 to A00. These keys allow running several splitters at the same time, which usually builds the indexes faster. For example, this shell script starts four splitters in the background:

    #!/bin/sh
    splitter -f 000 -t 3f0 &
    splitter -f 400 -t 7f0 &
    splitter -f 800 -t bf0 &
    splitter -f c00 -t ff0 &

5.2.6. Using run-splitter script

There is a run-splitter script in the /sbin directory of the DataparkSearch installation. It helps to execute all three index building steps in sequence. run-splitter has these two command line parameters:

    run-splitter --hup --split

or a short version:

    run-splitter -k -s

Each parameter activates the corresponding index building step. run-splitter executes all three steps of index building in the proper order:
In most cases, just run the run-splitter script with all of the -k -s arguments. Using those three flags separately, each corresponding to one of the three index building steps, is rarely required. run-splitter has optional parameters -p=n and -v=m, which specify the pause in seconds after each log buffer update and the verbosity level, respectively. n is the number of seconds (default value: 0), and m is the verbosity level (default value: 4).

5.2.7. Doing search

To start using search.cgi in "cache mode", edit your search.htm template as usual and set the dbmode parameter of the DBAddr command to "cache".

5.2.8. Using search limits

To use search limits in cache mode, add the appropriate Limit command(s) to your indexer.conf (or cached.conf, if cached is used) and to search.htm or searchd.conf (if searchd is used).

For example, to limit searches by tag, by category and by site, add the following lines to indexer.conf (or cached.conf, if cached is used) and to search.htm (or searchd.conf, if searchd is used):

    Limit t:tag
    Limit c:category
    Limit site:siteid

where t is the name of the CGI parameter (&t=) for this constraint and tag is the type of constraint. Instead of tag/category/siteid in the example above, you can use any of the values from the table below:

Table 5-1. Cache limit types
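
Because the Limit commands have to be present on both the indexing side and the search side, it can help to check that the two files agree. The following is a minimal sketch of such a check, not a tool shipped with DataparkSearch: the function names limits_of and same_limits are hypothetical, and it assumes that each Limit command sits on its own line and that matching is case-sensitive.

```shell
#!/bin/sh
# Hypothetical helper functions; not part of DataparkSearch.

# Print the Limit lines of one file, with whitespace normalized, sorted.
limits_of() {
    grep '^[[:space:]]*Limit[[:space:]]' "$1" | awk '{$1=$1; print}' | sort
}

# Succeed (exit status 0) only if both files carry the same set of Limit lines.
same_limits() {
    a=$(limits_of "$1")
    b=$(limits_of "$2")
    [ "$a" = "$b" ]
}
```

A possible use, assuming both files are in the current directory, is: same_limits indexer.conf search.htm || echo "Limit commands differ". Line order and extra spacing are ignored by the comparison; any other difference is reported.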