Open your preferred front-end in a Web browser.
To find something, type the words you want to find and press the SUBMIT button. For example: mysql odbc. DataparkSearch will find all documents that contain the word mysql and/or the word odbc. The best documents, those with the highest weights, are displayed first.
To find a phrase, simply enclose it in quotes. For example: "uncontrollable sphere".
DataparkSearch front-ends support the following parameters, passed in the CGI query string. You may use them in the HTML form on the search page.
Table 8-1. Available search parameters
|q||text parameter with search query|
|vq||text parameter with a search query in the Verity Query Language (prefix variant), see Section 8.1.8. To use this parameter, leave the q parameter empty.|
|s||character sequence specifying the sort order of results. Lowercase letters specify ascending order, uppercase letters descending order. The following characters can be used: R or r - sort by relevance, P or p - sort by PopularityRank, I or i - sort by Importance (the product of relevance and PopularityRank), A or a - sort by the sum of relevance and PopularityRank, D or d - sort by last modified date. Default value: RP.|
|ps||page size, the number of search results displayed on one page, 20 by default. The maximum page size is 100. This limit prevents very big page sizes from overloading the server and can be changed with the MAX_PS definition in search.c.|
|np||page number, starting from 0, 0 by default (first page)|
|p||page number, starting from 1. Suitable for use with OpenSearch|
|m||search mode. Currently the "all", "any", "near" and "bool" values are supported.|
|wm||word match. You may use this parameter to choose the word match type. The values "wrd", "beg", "end" and "sub" mean whole word, word beginning, word ending and word substring match, respectively.|
|t||Tag limit. Limits the search to documents with the given tag. This parameter has the same effect as the -t indexer option.|
|c||Category limit. See Section 6.2 for details.|
|ul||URL limit: a URL substring that limits the search to a subsection of the database. It supports the SQL LIKE wildcards % and _. This parameter has the same effect as the -u indexer option. If a relative URL is specified, search.cgi inserts % signs before and after the "ul" value when compiled with SQL support. This allows writing a URL substring in an HTML form to limit the search, for example <OPTION VALUE="/manual/"> instead of VALUE="%/manual/%". When a full URL with schema is specified, search.cgi adds a % sign only after the value. For example, for <OPTION VALUE="http://localhost/"> search.cgi will pass http://localhost/% in the SQL LIKE comparison. Not supported for cache storage mode.|
|wf||Weight factors. Allows changing the weights of different document sections at search time. Should be passed as a hex number. See the explanation below.|
|g||Language limit. Language abbreviation to limit search results by the url.lang field.|
|tmplt||Template filename (without path). Specifies a template file other than the standard search.htm.|
|type||Content-Type limit. Content type to limit search results by the url.content_type field. For cache mode storage this must be an exact match. For SQL modes it may be an SQL-like pattern.|
|sp||Word forms limit. =1 to search for all forms of the entered words (including spelling suggestions, if aspell support is enabled). =0 to search only for the entered words. Default value is 1. You may switch it to 0 for faster search.|
|sy||Synonyms limit. =1 to add synonyms for the entered words. =0 to not use synonyms. Default value is 1. You may switch it to 0 for faster search.|
|empty||Use limits to show results if no query words are entered (only for cache mode). =yes to show results from limits if no query words are entered (default). =no to not show results from limits if no query words are entered.|
|dt||Limit by time. Three types are supported. If dt is set to back, the results are limited to recent pages, and the recentness is specified in the dp variable. If dt is set to er, the search is limited to pages newer or older than the given date. If dt is set to range, the search is limited to the given range of dates; the db and de variables are used in this case. In cache mode, all times are measured with one-hour precision.|
|dp||Limit by recentness, if dt value is back. It should be specified in xxxA[yyyB[zzzC]] format (spaces are allowed between xxx and A, between A and yyy, and so on). xxx, yyy, zzz are numbers (can be negative!). A, B, C can be one of the following (the letters are the same as in the strptime/strftime functions): s - second, M - minute, h - hour, d - day, m - month, y - year. Examples: 4h30M - 4 hours and 30 minutes; 1y6m-15d - 1 year and six months minus 15 days; 1h-60M+1s - 1 hour minus 60 minutes plus 1 second.|
|dx||is newer/older flag (1 means newer or after, -1 means older or before), if dt value is er.|
|dm||Month, if dt value is er. 0 - January, 1 - February, ... 11 - December.|
|dy||Year, if dt value is er. Four digits. For example, 1999 or 2001.|
|dd||Day, if dt value is er. 1...31.|
|db||Beginning date, if dt value is range. Each date is a string in the form dd/mm/yyyy, where dd is the day, mm is the month and yyyy is the four-digit year.|
|de||End date, if dt value is range. Each date is a string in the form dd/mm/yyyy, where dd is the day, mm is the month and yyyy is the four-digit year.|
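The parameters in the table above can be combined into one CGI query string. The following is a minimal Python sketch, not part of DataparkSearch: the helper name search_url and the front-end location http://localhost/cgi-bin/search.cgi are assumptions chosen for illustration. It builds a phrase query limited to a date range.

```python
from urllib.parse import urlencode


def search_url(base: str, **params: object) -> str:
    """Append the given search parameters to base as a CGI query string.

    Hypothetical helper: parameters set to None or "" are left out.
    """
    pairs = [(name, str(value)) for name, value in params.items()
             if value not in (None, "")]
    return base + "?" + urlencode(pairs)


url = search_url(
    "http://localhost/cgi-bin/search.cgi",  # assumed front-end location
    q='"uncontrollable sphere"',            # phrase query, in quotes
    m="all",                                # search mode
    ps=10,                                  # page size
    np=0,                                   # first page
    dt="range",                             # limit by a range of dates
    db="01/01/2001",                        # beginning date, dd/mm/yyyy
    de="31/12/2001",                        # end date, dd/mm/yyyy
)
print(url)
```

urlencode takes care of escaping the quotes, spaces and slashes, so the values can be written exactly as they would be typed into the search form.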
It is possible to pass the wf HTML form variable to search.cgi. The wf variable represents weight factors for specific document parts. Currently body, title, keywords, description, url parts, crosswords as well as user defined META and HTTP headers are supported. Take a look at the "Section" part of indexer.conf-dist.
To be able to use this feature, it is recommended to set different section IDs for different document parts in the "Section" indexer.conf command. Currently up to 256 different sections are supported.
Imagine that we have these default sections in indexer.conf:
Section body 1 256
Section title 2 128
Section keywords 3 128
Section description 4 128
The wf value is a string of hex digits ABCD. Each digit is a factor for the corresponding section weight. The rightmost digit corresponds to section 1. For the sections given above:
D is a factor for section 1 (body)
C is a factor for section 2 (title)
B is a factor for section 3 (keywords)
A is a factor for section 4 (description)
wf=0001 will search through body only.
wf=1110 will search through title, keywords and description, but not through the body.
wf=F421 will search through:
Description with factor 15 (F hex)
Keywords with factor 4
Title with factor 2
Body with factor 1
By default, if the wf variable is omitted from the query, all section factors are 1, which means all sections have the same weight. If the number of sections in wf is less than the number of sections defined, the remaining sections are initialized with the value of the highest section weight defined in wf. For example, wf=01 will also search through body only.
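The mapping from wf digits to section factors can be sketched in Python. This is an illustration of the rules described above, not the search.cgi implementation; the function name decode_wf is hypothetical, and the fill rule assumes that "highest section weight defined in wf" means the leftmost digit, which is the reading consistent with the wf=01 example.

```python
from typing import Dict


def decode_wf(wf: str, num_sections: int) -> Dict[int, int]:
    """Return {section number: factor} for a wf string of hex digits.

    The rightmost digit belongs to section 1. Sections not covered by wf
    take the factor of the leftmost digit (assumed reading of the text).
    An empty wf gives every section the default factor 1.
    """
    if not wf:
        return {section: 1 for section in range(1, num_sections + 1)}
    digits = [int(ch, 16) for ch in reversed(wf)]  # digits[0] is section 1
    fill = digits[-1]  # factor of the highest section present in wf
    return {
        section: digits[section - 1] if section <= len(digits) else fill
        for section in range(1, num_sections + 1)
    }


# Sections: 1 = body, 2 = title, 3 = keywords, 4 = description
print(decode_wf("F421", 4))
print(decode_wf("01", 4))
```

With the four sections above, F421 gives description the factor 15 and body the factor 1, while 01 leaves only the body with a non-zero factor.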
If DataparkSearch has been built with fast relevance calculation (selected by an option for configure), only zero and non-zero values of the wf variable take effect (this only allows including or excluding the specified sections in search results).
To get full support for dynamic section weights, you need to specify the corresponding option for configure when configuring DataparkSearch.
When using a dynamic shtml page containing SSI that calls search.cgi, i.e. search.cgi is not called directly as a CGI program, it is necessary to override Apache's SCRIPT_NAME environment attribute so that all the links on search pages lead to the dynamic page and not to search.cgi.
For example, when a shtml page contains the line <!--#include virtual="search.cgi" -->, the SCRIPT_NAME variable will still point to search.cgi, not to the shtml page.
To override the SCRIPT_NAME variable we implemented a DPSEARCH_SELF variable that you may add to Apache's httpd.conf file. search.cgi checks the DPSEARCH_SELF variable first and then SCRIPT_NAME. Here is an example of using the DPSEARCH_SELF environment variable with the SetEnv/PassEnv commands in Apache's httpd.conf:
SetEnv DPSEARCH_SELF /path/to/search.cgi
PassEnv DPSEARCH_SELF
It is often required to use several templates with the same search.cgi. There are several ways to do it. They are given here in the order in which search.cgi detects the template name.
search.cgi checks environment variable DPSEARCH_TEMPLATE. So you can put a path to desired search template into this variable.
search.cgi checks path info part of URL available in the PATH_INFO environment variable. E.g. http://localhost/cgi-bin/search.cgi/search1.html uses search1.htm as its template, and http://localhost/cgi-bin/search.cgi/search2.html uses search2.htm, and so on.
search.cgi also supports Apache internal redirect. It checks the REDIRECT_STATUS and REDIRECT_URL environment variables. To activate this way of template usage you may add these lines to Apache's srm.conf:
AddType text/html .zhtml
AddHandler zhtml .zhtml
Action zhtml /cgi-bin/search.cgi
Put search.cgi into your /cgi-bin/ directory. Then put an HTML template into your site directory structure under any name with the .zhtml extension, for example template.zhtml. Now you may open the search page: http://www.site.com/path/to/template.zhtml. You may of course use any unused extension instead of .zhtml.
If the above ways fail, search.cgi opens a template with the same name as the script being executed, using the SCRIPT_NAME environment variable. search.cgi will open the template ETC/search.htm, search1.cgi will open ETC/search1.htm and so on, where ETC is the DataparkSearch /etc directory (usually /usr/local/dpsearch/etc). So you can use the same search.cgi with different templates without having to recompile it. Just create one or several hard or symbolic links to search.cgi, or copy it, and put the corresponding search templates into the /etc directory of the DataparkSearch installation.
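The detection order described above can be summarized in a short Python sketch. It only models the order of the checks and is not the search.cgi source; the function name pick_template is hypothetical, and replacing the file extension with .htm follows the examples in the text (search1.html and search1.cgi both leading to search1.htm) rather than any confirmed rule.

```python
import posixpath
from typing import Dict, Tuple


def pick_template(env: Dict[str, str]) -> Tuple[str, str]:
    """Return (variable used, template hint) in the order described above."""
    # 1. Explicit template path from the environment.
    if env.get("DPSEARCH_TEMPLATE"):
        return ("DPSEARCH_TEMPLATE", env["DPSEARCH_TEMPLATE"])
    # 2. Path info part of the URL, e.g. /search1.html -> search1.htm.
    path_info = env.get("PATH_INFO", "").strip("/")
    if path_info:
        stem = posixpath.splitext(posixpath.basename(path_info))[0]
        return ("PATH_INFO", stem + ".htm")
    # 3. Apache internal redirect: the requested URL is the template itself.
    if env.get("REDIRECT_STATUS") and env.get("REDIRECT_URL"):
        return ("REDIRECT_URL", env["REDIRECT_URL"])
    # 4. Fall back to the script name, e.g. search1.cgi -> search1.htm.
    script = posixpath.basename(env.get("SCRIPT_NAME", "search.cgi"))
    return ("SCRIPT_NAME", posixpath.splitext(script)[0] + ".htm")


print(pick_template({"PATH_INFO": "/search2.html",
                     "SCRIPT_NAME": "/cgi-bin/search.cgi"}))
print(pick_template({"SCRIPT_NAME": "/cgi-bin/search1.cgi"}))
```

In the fallback case the returned name is relative to the DataparkSearch /etc directory, as explained above.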
See also the Making multi-language search pages section.
The operator allin<section>:, where <section> is the name of a section defined in the sections.conf file (or in any DataparkSearch configuration file by the Section command) with a non-zero section number (see Section 3.10.43), limits the search domain for a query word to the specified section.
This operator differs from limiting the search domain with the &wf= CGI variable in that the limit is imposed only on the query words specified after the operator.
For example, if you have the following commands in the sections.conf file
Section body 1 256
Section title 2 128
Section url 3 0 strict
then you can use the following operators in a search query: allinbody:, allintitle: and allinurl:.
For the query computer allintitle: science, DataparkSearch will find the documents that contain the word "science" in the title and the word "computer" in any document section.
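How the operator scopes the words that follow it can be illustrated with a small Python sketch. It is not the DataparkSearch query parser; the function name split_by_section is hypothetical, and the sketch assumes the operator is written as a separate token, as in the example query above.

```python
import re
from typing import List, Optional, Tuple

# Matches an operator token such as "allintitle:" and captures "title".
ALLIN = re.compile(r"^allin(\w+):$")


def split_by_section(query: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each query word with the section it is limited to.

    None means the word may match in any document section.
    """
    section: Optional[str] = None
    words: List[Tuple[str, Optional[str]]] = []
    for token in query.split():
        match = ALLIN.match(token)
        if match:
            section = match.group(1)  # applies to the words that follow
        else:
            words.append((token, section))
    return words


print(split_by_section("computer allintitle: science"))
```

Here "computer" stays unrestricted because it precedes the operator, while "science" is limited to the title section.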
If you want more advanced results you may use the query language. You should select the "bool" search mode in the search form.
DataparkSearch understands the following boolean operators:
AND or & - logical AND. For example, "mysql & odbc" or "mysql AND odbc" - DataparkSearch will find any URLs that contain both "mysql" and "odbc".
NEAR - NEAR operator, identical to the AND operator, but true only if both words are within 16 words of each other. For example, "mysql NEAR odbc" - DataparkSearch will find any URLs that contain both "mysql" and "odbc" within 16 words of each other.
ANYWORD or * - ANYWORD operator, identical to the AND operator, but true only if the two words have exactly one word between them and the left operand comes before the right operand. For example, "mysql * odbc" - DataparkSearch will find any URLs that contain both "mysql" and "odbc" with any one word between them, for example, any document with the phrase "mysql via odbc".
OR or | - logical OR. For example, "mysql | odbc" or "mysql OR odbc" - DataparkSearch will find any URLs that contain word "mysql" or word "odbc".
NOT or ~ - logical NOT. For example, "mysql & ~ odbc" or "mysql AND NOT odbc" - DataparkSearch will find URLs that contain word "mysql" and do not contain word "odbc" at the same time. Note that ~ just excludes given word from results. Query "~ odbc" will find nothing!
() - group command to compose more complex queries. For example "(mysql | msql) & ~ postgres". Query language is simple and powerful at the same time. Just consider query as usual boolean expression.
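The positional conditions behind NEAR and ANYWORD can be made concrete with a short Python sketch over a list of document words. This is one reading of the descriptions above, not the engine's implementation: the function names are hypothetical, and treating "within 16 words" as a position difference of at most 16 is an assumption.

```python
from typing import List


def positions(words: List[str], term: str) -> List[int]:
    """Return every position at which term occurs in the document."""
    return [i for i, word in enumerate(words) if word == term]


def near(words: List[str], left: str, right: str, window: int = 16) -> bool:
    """True if both terms occur at most window positions apart (assumed)."""
    return any(abs(a - b) <= window
               for a in positions(words, left)
               for b in positions(words, right))


def anyword(words: List[str], left: str, right: str) -> bool:
    """True if right follows left with exactly one word in between."""
    return any(b - a == 2
               for a in positions(words, left)
               for b in positions(words, right))


doc = "connect to mysql via odbc".split()
print(near(doc, "mysql", "odbc"))     # both words are close together
print(anyword(doc, "mysql", "odbc"))  # "via" is the single word between
print(anyword(doc, "odbc", "mysql"))  # order matters for ANYWORD
```

Unlike NEAR, ANYWORD is not symmetric: swapping the operands changes the result.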
Only the prefix variant of the Verity Query Language is supported by DataparkSearch.
Also, only the following subset of VQL operators is supported by DataparkSearch:
Table 8-2. VQL operators supported by DataparkSearch
|<ACCRUE>||equal to OR operator in boolean mode.|
|<AND>||equal to AND operator in boolean mode.|
|<ANY>||equal to OR operator in boolean mode.|
|<NEAR>||equal to NEAR operator in boolean mode.|
|<NOT>||equal to NOT operator in boolean mode.|
|<OR>||equal to OR operator in boolean mode.|
|<PHRASE>||equal to a phrase in boolean mode.|
|<WORD>||is considered as an empty operator.|
Expired documents are still searchable with their old content.