DataparkSearch Engine

DataparkSearch Engine is a full-featured open source web-based search engine released under the GNU General Public License and designed to organize search within a website, a group of websites, an intranet or a local system.

Key features

  • Support for http, https, ftp, nntp and news URL schemes.
  • htdb virtual URL scheme for indexing SQL databases.
  • Built-in support for the text/html, text/xml, text/plain, audio/mpeg (MP3) and image/gif MIME types.
  • External parser support for other document types.
  • Ability to index multilingual sites using content negotiation.
  • Searching for all word forms using ispell affixes and dictionaries.
  • Fuzzy searching based on acronyms and abbreviations.
  • Stop-word, synonym and acronym lists.
  • Boolean query language support.
  • Popularity Rank based on a neural network model.
  • Sorting of results by relevance, popularity rank, last modified time or importance (the product of relevance and popularity rank).
  • Support for various character sets.
  • Accent-insensitive search.
  • Phrase segmenting for Chinese, Japanese, Korean and Thai languages.
  • mod_dpsearch - a search module for the Apache web server.
  • Support for Internationalized Domain Names.
  • The Summary Extraction Algorithm.
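
The feature list describes "importance" as the product of relevance and popularity rank. The following Python sketch only illustrates that ordering; it is not DataparkSearch's own implementation, and the Hit class, the URLs and the score values are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One search result (hypothetical structure, for illustration only)."""
    url: str
    relevance: float   # assumed: higher means a better match to the query
    popularity: float  # assumed: higher means a more popular document


def importance(hit: Hit) -> float:
    # Importance as described in the feature list: relevance multiplied
    # by popularity rank.
    return hit.relevance * hit.popularity


def sort_by_importance(hits):
    # Most important results first.
    return sorted(hits, key=importance, reverse=True)


hits = [
    Hit("a.html", relevance=0.9, popularity=0.1),
    Hit("b.html", relevance=0.5, popularity=0.5),
    Hit("c.html", relevance=0.2, popularity=0.8),
]
ranked = sort_by_importance(hits)
for hit in ranked:
    print(hit.url, round(importance(hit), 2))
```

In this made-up data, a.html is the most relevant document and c.html the most popular, yet b.html comes first because a product rewards documents that score reasonably well on both measures.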

Documentation

DataparkSearch documentation is included in the doc subdirectory of every release or snapshot distribution. It is also available online in English (PDF, 1,417,401 bytes) and in Russian.

You can use our forum to ask questions about DataparkSearch, or subscribe to the DataparkSearch group at Yahoo! Groups: tech.groups.yahoo.com/group/dataparksearch/. You can also share your DataparkSearch experience with the whole world on our wiki.

Previous DataparkSearch versions and ChangeLog (also as an RSS feed).

DataparkSearch Engine PAD file.

DataparkSearch's TODO.

Download

Latest released DataparkSearch version: dpsearch-4.49.tar.gz, 2,493,884 bytes, 13.02.2008, 13:21 MSK

You may try the latest snapshot: dpsearch-4.50-20042008.tar.gz, 2,517,043 bytes, 21.04.2008, 01:58 MSK

dpsearch-spell-ja.tgz, 68,705 bytes, 09.11.2004, 01:06 MSK - Quasi-ispell data for Japanese. THIS IS NOT VALID ISPELL DATA. It can be used only with DataparkSearch 4.27 or later. All data is in the EUC-JP charset.

Additional Data

Frequency dictionaries

  • Traditional Chinese, 730,641 bytes, 05.03.2005, 03:07 MSK
  • Mandarin, 394,634 bytes, 05.03.2005, 03:07 MSK
  • Korean, 30,624 bytes, 05.03.2005, 03:07 MSK
  • Korean, EUC-KR charset, 246,038 bytes
  • Thai, 126,572 bytes, 05.03.2005, 03:07 MSK

Synonym lists

  • English, 774,663 bytes, 05.03.2005, 03:07 MSK
  • German, 131,880 bytes, 05.03.2005, 03:07 MSK
  • Italian, 166,684 bytes, 05.03.2005, 03:07 MSK
  • Polish, 92,158 bytes, 05.03.2005, 03:07 MSK
  • Russian, 73,968 bytes, 06.03.2005, 21:09 MSK

Acronym and abbreviation lists

  • English biomedical acronyms and abbreviations, 7,084 bytes, 30.07.2005, 01:42 MSK

Other code

Mirrors

Bugs

You can view all open or new bug reports, or post your own, in our bug system.

Sample sites

Donate

If you use DataparkSearch and find it useful, or want to encourage further development, feel free to make a donation (at Kagi) to support this project. Any amount is gratefully appreciated.

DataparkSearch's Awards

To leave a donation via MasterCard, VISA, American Express, JCB, check, money order, or wire transfer please click the button below:
Donate with Kagi