DataparkSearch Engine is a full-featured, open source, web-based search
engine. It is released under the GNU General Public License and is designed to organize search
within a website, a group of websites, an intranet or a local system.
Key features
- Support for http, https, ftp, nntp and news URL schemes.
- htdb virtual URL scheme for indexing SQL databases.
- Built-in support for the text/html, text/xml, text/plain,
audio/mpeg (MP3) and image/gif MIME types.
- Support for external parsers for other document types.
- Ability to index multilingual sites using content negotiation.
- Search for all word forms using ispell affixes and dictionaries.
- Fuzzy searching based on acronyms and abbreviations.
- Stop-word, synonym and acronym lists.
- Boolean query language support.
- Popularity rank based on a neural network model.
- Sorting of results by relevance, popularity rank, last
modification time and importance (the product of
relevance and popularity rank).
- Support for various character sets.
- Accent-insensitive search.
- Phrase segmentation for Chinese, Japanese, Korean and Thai.
- mod_dpsearch - search module for Apache web server.
- Internationalized Domain Names support.
- The Summary Extraction Algorithm.
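The feature list describes importance as relevance multiplied by popularity rank. The snippet below is a minimal sketch of that sort order, not DataparkSearch's own implementation: the `Result` class, the `sort_results` helper, its key names and the scores are all hypothetical, and how the engine actually computes relevance and popularity rank is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    relevance: float  # hypothetical relevance score for a query
    pop_rank: float   # hypothetical popularity rank of the page

    @property
    def importance(self) -> float:
        # Importance as described in the feature list:
        # relevance multiplied by popularity rank.
        return self.relevance * self.pop_rank


def sort_results(results, key="importance"):
    """Return results ordered best-first by the chosen (hypothetical) key."""
    keys = {
        "relevance": lambda r: r.relevance,
        "pop_rank": lambda r: r.pop_rank,
        "importance": lambda r: r.importance,
    }
    return sorted(results, key=keys[key], reverse=True)


results = [
    Result("http://a.example/", 0.9, 0.2),  # importance 0.18
    Result("http://b.example/", 0.5, 0.8),  # importance 0.40
    Result("http://c.example/", 0.7, 0.5),  # importance 0.35
]

print([r.url for r in sort_results(results)])
print([r.url for r in sort_results(results, "relevance")])
```

Multiplying the two scores means a page that is highly relevant but rarely linked (a) can drop below a moderately relevant, popular page (b), which is the trade-off the importance ordering is meant to capture.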
Documentation
DataparkSearch documentation is included in the doc subdirectory of the release
and snapshot distributions. It is also available online in
English (PDF, 1,417,401 bytes)
and in Russian.
You can use our
forum to ask questions about DataparkSearch, or subscribe to the DataparkSearch group at Yahoo! Groups:
tech.groups.yahoo.com/group/dataparksearch/.
You can also share your DataparkSearch experience with the world on our wiki.
Previous DataparkSearch versions and ChangeLog
(also available as an RSS feed).
DataparkSearch Engine PAD file.
DataparkSearch's TODO.
Download
Latest released DataparkSearch version: dpsearch-4.49.tar.gz,
2,493,884 bytes, 13.02.2008, 13:21 MSK
You may also try the latest snapshot: dpsearch-4.50-20042008.tar.gz,
2,517,043 bytes, 21.04.2008, 01:58 MSK
dpsearch-spell-ja.tgz,
68,705 bytes, 09.11.2004, 01:06 MSK -
quasi-ispell data for Japanese. THIS IS NOT VALID ISPELL DATA.
It can be used only with DataparkSearch 4.27 or later. All data are in the EUC-JP charset.
Additional Data
- Frequency dictionaries
- Traditional Chinese,
730,641 bytes, 05.03.2005, 03:07 MSK
- Mandarin,
394,634 bytes, 05.03.2005, 03:07 MSK
- Korean,
30,624 bytes, 05.03.2005, 03:07 MSK
- Korean, EUC-KR charset, 246,038 bytes
- Thai,
126,572 bytes, 05.03.2005, 03:07 MSK
- Synonym lists
- English,
774,663 bytes, 05.03.2005, 03:07 MSK
- German,
131,880 bytes, 05.03.2005, 03:07 MSK
- Italian,
166,684 bytes, 05.03.2005, 03:07 MSK
- Polish,
92,158 bytes, 05.03.2005, 03:07 MSK
- Russian,
73,968 bytes, 06.03.2005, 21:09 MSK
- Acronym and abbreviation lists
- English biomedical acronyms and abbreviations,
7,084 bytes, 30.07.2005, 01:42 MSK
- Other code
Mirrors
Bugs
You can view all open and new bug reports, or post your own bug
reports, in our bug tracking system.
Sample sites
- 43°N 39°E (PgSQL, cache mode, searchd used;
SQL server: PIII 670MHz, 512M RAM, IDE SATA 120Gb; search PC: Celeron 2.25GHz, 1G RAM, IDE UDMA100;
1,157,668 pages, 1,273,860 sites, 27.2 GB indexed).
Test search in Chinese,
Test search in Japanese,
Test search in Korean,
Test search in Thai.
- All Sochi's Internet (as subsection of 43°N 39°E)
- News Lookup Service (MySQL, cache mode, searchd not used)
- DataparkSearch Engine usage location map
Donate
If you use DataparkSearch and find it useful,
or want to encourage further development, feel free to make a donation
(via Kagi) to support this project. Any amount is gratefully appreciated.
DataparkSearch's Awards
To donate via MasterCard, VISA, American Express, JCB, check, money order or wire transfer, please click the button below:
Donate with Kagi