DataparkSearch versions of 2003-2004


Latest versions.
17 Dec 2004: 4.27, 3,475,218 bytes, 23.03.2007, 03:09 MSK
A compilation problem with the latest ChaSen version has been fixed.
Values for the cache mode limit on Content-Language are now computed from the first 2 bytes of the language code.
Values for StoredFiles, URLDataFiles and WrdFiles can now be changed. Simply specify values for OldStoredFiles, OldURLDataFiles and OldWrdFiles and run "indexer -T" on the PC where the cached or stored database is located. You should remove the OldStoredFiles, OldURLDataFiles and OldWrdFiles commands after conversion.
Support for MP3 ID3v2.0 and ID3v2.4 tags has been added. Support for the MP3 ID3v2.3 tags has been improved.
The units for the pause between indexing documents (-p switch for indexer) were changed from seconds to milliseconds.
<!--noindex--> and <!--/noindex--> tags can now be used to exclude the text between them from indexing, for compatibility with ASPSeek.
Support for the UTF-16LE and UTF-16BE encodings has been added. The language map format has been changed. You need to replace the maps you use with those from the distribution, or recreate your own maps with the new version of dpguesser.
Excerpt construction time is now included in the displayed search time.
An occasional hang on queries with no results has been fixed.
A possible memory corruption has been fixed in excerpt construction.
Indexer now sends the Accept request header according to the MIME parsers configured.
A possible memory corruption in mod_dpsearch has been fixed.
Apache version detection has been improved.
Quasi-ispell support for the Japanese language has been added. You need to download the quasi-ispell data dpsearch-spell-ja.tgz from our site or from one of our mirrors.
Some speed improvements have been made.
Several bugs were fixed.
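The noindex exclusion listed for 4.27 can be illustrated with a short Python sketch. This is not DataparkSearch's parser; it assumes the markers are written as <!--noindex--> and <!--/noindex-->, and the tolerance for case and inner whitespace is a choice made for the example. The function name strip_noindex is hypothetical.

```python
import re

# Assumed marker syntax: <!--noindex--> ... <!--/noindex-->.
# Case-insensitive matching and optional inner whitespace are assumptions
# made for this sketch, not documented DataparkSearch behaviour.
_NOINDEX = re.compile(
    r"<!--\s*noindex\s*-->.*?<!--\s*/noindex\s*-->",
    re.IGNORECASE | re.DOTALL,
)


def strip_noindex(html: str) -> str:
    """Remove every noindex-delimited region, markers included."""
    return _NOINDEX.sub("", html)


page = "Indexed intro. <!--noindex-->Navigation menu<!--/noindex--> Indexed body."
indexable_text = strip_noindex(page)
```

Only the text outside the markers remains in indexable_text, so an indexer working on it would never see the navigation menu.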
05 Nov 2004: 4.26, 3,467,280 bytes, 23.03.2007, 03:04 MSK
Canonical charset names were adjusted according to the IANA preferred names.
The HrefSection command has been added. Use it to extract links from any document section.
Recoding for SGML entities in URL has been fixed.
Arabic, Hebrew, Icelandic, Japanese, Latvian, Romanian and Thai stopword lists were added.
The MaxDocsPerServer command has been added. No more than the given number of pages will be indexed from one Server during a single run of indexer.
The TagIf and CategoryIf commands have been added. Use them to assign a tag or category according to a pattern match on a document section.
The IndexIf and NoIndexIf commands have been added. Use these commands to allow or disallow indexing by a pattern match on a document section.
The value for a section can now be extracted from document content using a regex-like pattern.
The Bind command has been added. Use it to specify local IP address.
Several bugs were fixed.
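The per-run cap described for MaxDocsPerServer in 4.26 amounts to a counter per server. The Python sketch below shows the idea only; it assumes a server is identified by the host part of the URL, which may differ from how indexer matches Server entries, and the class name PerServerLimiter is hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


class PerServerLimiter:
    """Accept at most max_docs_per_server URLs per host during one run."""

    def __init__(self, max_docs_per_server: int) -> None:
        self.max_docs = max_docs_per_server
        self.indexed = Counter()  # host -> number of URLs accepted so far

    def allow(self, url: str) -> bool:
        # Assumption: the host part of the URL stands in for a "Server".
        host = urlsplit(url).netloc.lower()
        if self.indexed[host] >= self.max_docs:
            return False
        self.indexed[host] += 1
        return True


limiter = PerServerLimiter(max_docs_per_server=2)
urls = [
    "http://a.example/1",
    "http://a.example/2",
    "http://a.example/3",  # third page from the same host: rejected
    "http://b.example/1",
]
accepted = [u for u in urls if limiter.allow(u)]
```

Because the counter lives only in the limiter object, a new run starts from zero, which matches the "during this run" wording of the entry.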
13 Oct 2004: 4.25, 3,446,203 bytes, 23.03.2007, 02:59 MSK
Recoding from Unicode to EUC-JP, Big5, EUC-KR, GB2312, GBK, Gujarati and SJIS has been fixed.
Due to a conflict with other programs, the mconv and mguesser utilities have been renamed to dpconv and dpguesser respectively.
Support has been added for the cp866u and koi-7 codepages.
Ability to sort search results by sum of relevancy and Popularity Rank has been added. Use 'A' or 'a' character in search pattern to sort in decreasing and increasing order respectively.
The processing of SGML character entities has been fixed.
-l switch for run-splitter has been added. Use it to flush cached buffers only.
The HoldCache command has been added. Use it to specify time period to hold search cache files.
Several bugs were fixed.
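The 4.25 ordering by the sum of relevancy and Popularity Rank can be sketched in Python as follows. The Hit record and the sort_by_sum function are hypothetical names for this illustration; only the 'A' (decreasing) and 'a' (increasing) characters come from the entry above.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    url: str
    relevancy: float
    pop_rank: float


def sort_by_sum(hits: List[Hit], order_char: str) -> List[Hit]:
    """Order hits by relevancy + pop_rank: 'A' decreasing, 'a' increasing."""
    if order_char not in ("A", "a"):
        raise ValueError("order_char must be 'A' or 'a'")
    return sorted(
        hits,
        key=lambda h: h.relevancy + h.pop_rank,
        reverse=(order_char == "A"),
    )


hits = [
    Hit("u1", 0.5, 0.25),   # sum 0.75
    Hit("u2", 0.25, 0.25),  # sum 0.5
    Hit("u3", 0.75, 0.5),   # sum 1.25
]
best_first = [h.url for h in sort_by_sum(hits, "A")]
worst_first = [h.url for h in sort_by_sum(hits, "a")]
```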
14 Sep 2004: 4.24, 3,412,580 bytes, 23.03.2007, 02:55 MSK
The PreloadLimit command was added. Use it to preload cache mode limits for the most frequently used limit values.
For PostgreSQL connections, a Unix socket can now be specified as a parameter in the DBAddr command.
The dpstoredoc handler was added for mod_dpsearch with the functionality of storedoc.cgi.
The Spanish stopword list was enhanced.
Support was added for IBM cp037, cp1026, cp500, cp875, cp1133 and Iranian ISIRI3342 codepages.
Cache mode bases are now compressed if zlib support is enabled. To upgrade from a previous version, do the following:
  1. Stop all DataparkSearch daemons.
  2. Back up your data. If the conversion process fails or is aborted, you will need to restore the data so that the conversion can later be completed all at once.
  3. Compile and install the new version.
  4. On the PC where the cache mode data is located, remove the cached and stored parameters from DBAddr in indexer.conf.
  5. On the PC where the cache mode data is located, run "indexer -O" (do not run stored and cached).
  6. Restore your original DBAddr command in indexer.conf.
zlib support is now enabled by default.
Fast relevancy calculation was revisited.
The English synonyms list was enhanced.
Several bugs were fixed.
14 Aug 2004: 4.23, 3,395,623 bytes, 23.03.2007, 02:51 MSK
The TrackHops command was added. Use it to enable hops tracking in reindexing.
There are some improvements to speed-up searches.
The Italian synonyms list was added.
Fast relevancy calculation has been added and is enabled by default. Use the --enable-fullrel option for configure to enable full relevancy calculation.
The LINKS table structure was changed with the addition of the valid field.
The SkipUnreferred command was added. Use it to skip reindexing for unreferred documents.
A -b switch for splitter and run-splitter was added. Use it to force base checking and optimizing before a cache update.
Several bugs were fixed.
20 Jul 2004: 4.22, 3,223,922 bytes, 23.03.2007, 02:46 MSK
The PeriodByHops command was added. Use it to specify the reindexing period on a per-hops basis.
Postponed query tracking for searchd was added. This feature requires System V message queue support.
SSLv2_client_method() was changed to SSLv23_client_method() for better compatibility.
The splitter can now accept an alternative configfile name as a command line argument.
The processing of -w switch for stored was fixed.
Support for the Windows cp950 and Big5-hkscs codepages was added.
The IndexDocSizeLimit command was added. Use it to limit the amount of data stored in index per document.
The PopRankNeoIterations command was added. It allows specifying the number of iterations for the Neo PopRank calculation.
Several bugs (#148, #149) were fixed.
15 Jun 2004: 4.21, 3,145,820 bytes, 23.03.2007, 02:42 MSK
The doc directory layout was slightly changed to follow the FreeBSD tree.
The set of SGML character entities was extended.
CacheLogWords and CacheLogDels commands were added to adjust size of shared memory buffers for cache mode.
Excerpt construction was fixed.
A new switch -H was added for indexer to send command to flush all cached buffers.
Several memory leaks were fixed.
Several bugs (#102, #106, #107, #108, #109, #110, #147) were fixed.
19 May 2004: 4.20, 3,128,338 bytes, 23.03.2007, 02:37 MSK
Support for Internationalized Domain Names was added. Use --enable-idn option for configure to enable. You need the GNU libidn to be installed on your system. The URL table structure was changed with the addition of the charset_id field.
A Korean language phrases segmenter was added. Use LoadKoreanList command to enable.
Korean language maps for EUC-KR charset were added.
Base hashing was changed, so you need to run cached and stored databases checkup with OptimizeRatio equal to 0 after upgrading.
Cached and stored checkup was split into stages. For the cached database, use the -Z option for indexer to optimize, -ZZ to optimize and check up, and -ZZZ to optimize, check up and verify URLs. For the stored database, use -Y to optimize and -YY to optimize and check up.
Polish language maps for cp1250 and cp852 were added.
Support for the Apache2 web server was added for mod_dpsearch.
The checkup for cached databases was made faster.
A possible memory corruption was fixed for SQL-servers without subselect.
Compilation errors on Solaris 9 were fixed.
16 Apr 2004: 4.19, 3,072,520 bytes, 23.03.2007, 02:33 MSK
mod_dpsearch was added for the Apache web server. Use --enable-apache-module switch for configure to enable.
A bug in Unicode canonical decomposition was fixed.
A URLDumpCacheSize command was added. Use it to specify the number of urls selected at once to write cache mode indexes, or to preload url data, or to calculate the Popularity Rank. Default value is 100000.
The Neo PopRank is now calculated during indexing/reindexing.
Synonyms and stopwords are now reduced to Unicode normal form C when loaded.
An error in Neo PopRank calculation was fixed.
A ResultContentType command was added. Use it to specify Content-Type header for search results page.
By default, every indexer thread makes a separate connection to the database. Use the -U option for indexer to make one shared database connection for all threads.
A possible indexer hang was fixed for a large number of indexing threads when cached or stored is not used.
Several bugs (#10, #15, #16, #19, #20, #22, #23, #24, #25, #27) were fixed.
15 Mar 2004: 4.18, 3,047,515 bytes, 23.03.2007, 02:20 MSK
Redundant display of documents was fixed in results for queries with two or more stopwords inside quotes.
Quote detection for several charsets used as LocalCharset was fixed.
A new method for the Popularity Rank calculation was added. Use the PopRankMethod command to select the desired method.
Top100 and Top1000 stopwords lists were added for English, French, German and Dutch languages.
A large synonyms list was added for Russian. A synonyms list was added for French.
The Russian stopwords list was updated.
The clones display was fixed.
An apostrophe can now be part of a word, i.e. words like "men's" are treated as a single word.
Search term highlighting for LocalCharset UTF-8 was fixed.
The cached database checking loop was fixed.
Compilation errors were fixed on systems with variable number of arguments for the gethostbyname_r function.
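The apostrophe handling introduced in 4.18 can be pictured with a small Python tokenizer. It is a sketch under assumptions, not the DataparkSearch word breaker: the pattern treats an apostrophe as part of a word only when it sits between letters or digits, and the function name tokenize is hypothetical.

```python
import re
from typing import List

# One or more letters/digits, optionally followed by groups of
# "apostrophe + letters/digits". A leading or trailing apostrophe is
# therefore treated as punctuation (an assumption of this sketch).
_WORD = re.compile(r"[^\W_]+(?:'[^\W_]+)*")


def tokenize(text: str) -> List[str]:
    """Split text into lowercase words, keeping inner apostrophes."""
    return [m.group(0).lower() for m in _WORD.finditer(text)]


words = tokenize("The men's room")
quoted = tokenize("'quoted' text")
```

With this pattern "men's" stays one token instead of splitting into "men" and "s", while the quotation marks around "quoted" are dropped.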
21 Feb 2004: 4.17, 2,919,902 bytes, 23.03.2007, 02:16 MSK
A possible indexer hang with many connections to cached on a fast PC was fixed.
Possible memory corruptions while indexing using ftp:// scheme were fixed.
Unicode support was extended. The Unicode Letter, Mark, Number and Symbol classes are now considered word characters. All indexed words are now reduced to Unicode normal form C before being stored in the database or searched. Accent-insensitive search was added. Use the "AccentExtensions yes" command to enable it.
Unicode data was updated to 4.0.1 version.
The url.since field was added to track DeleteOlder for pages when no Last-Modified header is present in the server response. This field holds the time when the page was added to the database.
A common large file support option for configure was added.
URL data can now be preloaded by searchd to speed up searches. Use the "PreloadURLData yes" command in your searchd.conf to enable this. It costs about 20 bytes of memory per URL.
Default value for URLSelectCacheSize parameter was increased to 1024.
Empty results for query words entered twice were fixed.
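The reduction to Unicode normal form C and the accent-insensitive search mentioned for 4.17 can be illustrated with Python's standard unicodedata module. How AccentExtensions works internally is not stated in these notes; stripping combining marks after decomposition is one common approach, shown here as an assumption. The names to_nfc and strip_accents are hypothetical.

```python
import unicodedata


def to_nfc(word: str) -> str:
    """Reduce a word to Unicode normal form C (canonical composition)."""
    return unicodedata.normalize("NFC", word)


def strip_accents(word: str) -> str:
    """Assumed accent folding: decompose, drop combining marks, recompose."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", base)


# "cafe" followed by U+0301 COMBINING ACUTE ACCENT (5 code points).
decomposed_word = "cafe\u0301"
stored_form = to_nfc(decomposed_word)      # 4 code points, ends with U+00E9
folded_form = strip_accents(stored_form)   # plain "cafe"
```

Normalizing both the indexed words and the query words the same way means that the two spellings of the same word compare equal, and folding both sides lets an unaccented query match accented text.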
16 Jan 2004: 4.16, 2,875,804 bytes, 23.03.2007, 02:12 MSK
Compilation flags were added to build using the LFS API on 32-bit Linux systems (to support files larger than 2 GB).
By default, indexer in cache mode no longer sends cached the command to write URL data and limits at exit. Use the indexer -W switch to send this command if you need it, or send a HUP signal to cached to do the same.
New URLs are now checked against robots.txt before being stored in the database.
Search can now order results by importance (i.e. by the product of relevancy and popularity).
Document size was added to the database statistics. Use the -SS switch for indexer to display it.
The MinDocSize command was added. Use it to checkonly documents with a size less than the specified value.
An internal parser for the image/gif MIME type was added. Only the comment and plain text extensions are taken for indexing.
Excerpt construction is now more accurate.
Lost records in cache mode due to using "indexer -C" by category or by URL were fixed.
The log level of cached, stored and searchd can now be increased and decreased using the SIGUSR1 and SIGUSR2 signals.
A -p switch was added for splitter to set the pause in seconds after each log buffer update.
A -v switch was added for splitter to set the log level.
The CollectLinks command was added. Use "CollectLinks yes" to enable collection of link information. By default, link collection is disabled (note: it was enabled by default in previous versions).
Language varying was switched off for documents with erroneous status (400 or above).
Cache mode bugs from mnoGoSearch 3.2.16 CVS were fixed.
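The 4.16 change that checks new URLs against robots.txt before they are stored can be sketched with Python's standard urllib.robotparser module. This illustrates the idea rather than the indexer's own code; the function name filter_new_urls, the sample rules and the example.com URLs are all made up for the example.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def filter_new_urls(
    robots_lines: Iterable[str],
    candidate_urls: Iterable[str],
    user_agent: str = "*",
) -> List[str]:
    """Keep only the URLs that the given robots.txt rules allow."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # rules are passed in, nothing is fetched
    return [u for u in candidate_urls if parser.can_fetch(user_agent, u)]


robots_txt = ["User-agent: *", "Disallow: /private/"]
new_urls = [
    "http://example.com/index.html",
    "http://example.com/private/report.html",
]
storable = filter_new_urls(robots_txt, new_urls)
```

Filtering at discovery time keeps disallowed URLs out of the database entirely, instead of storing them and rejecting them later when they come up for indexing.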
27 Nov 2003: Datapark Search Engine 4.16 started from current mnoGoSearch CVS version.
mnoGoSearch 3.2.16 CVS ChangeLog till splitting
A Traditional Chinese frequency dictionary was added.
The syntax of the LoadChineseList and LoadThaiList commands was modified.
libparanoia-like checking added. Use --with-paranoia switch for configure to enable.
Date range calculation fixed for cache mode time limits.
Cache mode was modified. When upgrading, use "indexer -O" to convert to the new base format.
<!IFLIKE, <!ELIKE, <!ELSELIKE conditional operators for search template were added.
The stored database may now be used without the stored daemon. Use the "DoStore yes" command to enable this.
The ability to specify the srvinfo table name as a parameter in the ServerTable command was added.
The stored database was modified. When upgrading, you need to delete all data and reindex everything.
robots.txt processing was fixed.
MimerSQL support via UnixODBC was added.
Several bugs (#442, #445, #448, #449, #453, #454, #458, #461, #479, #480, #481) were fixed.

