Introduction

DataparkSearch is a full-featured web search engine consisting of two parts. The first is an indexing mechanism (the indexer), which follows hypertext references and stores the words and new references it finds in the database. The second is a CGI front-end that provides the search service using the data collected by the indexer.
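
The indexer loop described above can be illustrated with a minimal sketch. This is not DataparkSearch code: the in-memory `PAGES` dictionary stands in for real HTTP fetches, a plain Python dict stands in for the SQL database, and the names `LinkAndTextParser`, `index` and `search` are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
import re

# Hypothetical in-memory "web": URL -> HTML body (stands in for HTTP fetches).
PAGES = {
    "http://example.test/": '<a href="http://example.test/a">A</a> welcome page',
    "http://example.test/a": '<a href="http://example.test/">home</a> search engine page',
}


class LinkAndTextParser(HTMLParser):
    """Collects href targets and text content from one document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def index(start_url, pages):
    """Walk references breadth-first and return a word -> set-of-URLs map."""
    inverted = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        body = pages.get(url)
        if body is None:  # unknown document: nothing to store
            continue
        parser = LinkAndTextParser()
        parser.feed(body)
        parser.close()
        # Store the words found in this document.
        for word in re.findall(r"\w+", " ".join(parser.text).lower()):
            inverted.setdefault(word, set()).add(url)
        # Queue newly discovered references for later indexing.
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return inverted


def search(inverted, word):
    """Front-end side: look a single word up in the collected data."""
    return sorted(inverted.get(word.lower(), ()))
```

Calling `index("http://example.test/", PAGES)` and then `search(idx, "engine")` on the result returns only the second page, since that word appears there alone. The real indexer additionally handles the protocols, parsers and limits listed under the features.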

DataparkSearch was cloned from the 3.2.16 CVS version of mnoGoSearch on 27 November 2003 as DataparkSearch 4.16. The first release of mnoGoSearch took place in November 1998. The search engine was named UDMSearch until October 2000, when the project was acquired by Lavtech.Com Corp. and renamed mnoGoSearch.

The latest change log of DataparkSearch can be found on our website.

DataparkSearch Features

The main features of DataparkSearch are as follows:

  • MySQL (libz library required), PostgreSQL, iODBC, unixODBC, EasySoft ODBC-ODBC bridge, InterBase, Oracle (see [dpsearch-oracle.en.html Section 5.5]) and MS SQL back-end support.
  • HTTP support.
  • HTTP proxy support.
  • HTTPS support.
  • FTP support.
  • NNTP support (both news:// and nntp:// URL schemes).
  • [dpsearch-extended-indexing.en.html#htdb HTDB virtual URL scheme] support. One may build an index of, and search through, large text fields/blobs of an SQL database.
  • [dpsearch-extended-indexing.en.html#mirror Mirroring features].
  • text/html, text/xml, text/plain, audio/mpeg (MP3) and image/gif built-in support.
  • [dpsearch-pars.en.html External parsers] support for other document types.
  • Ability to index multilingual sites using content negotiation.
  • Searching across all word forms using ispell affixes and dictionaries.
  • Basic authorization support. One may index password-protected intranet HTTP servers.
  • Proxy authorization support.
  • Reentry capability. One may use several indexing and searching processes at the same time even on the same database. Multi-threaded indexing support.
  • Stop-list support.
  • <META NAME="robots" content="..."> and robots.txt support.
  • C language CGI web front-end.
  • Boolean query language support.
  • Sorting of results by relevancy, popularity rank, last-modified date, or importance (the product of relevancy and popularity rank).
  • Fuzzy search: different word forms, spelling corrections, [dpsearch-fuzzy.en.html#synonyms synonyms], [dpsearch-fuzzy.en.html#acronym acronyms and abbreviations].
  • [dpsearch-international.en.html#charset Various character sets support].
  • HTML templates to easily customize search results.
  • Advanced search options such as time limits and category and tag limits.
  • [dpsearch-cjk.en.html Phrase segmentation for Chinese, Japanese, Korean and Thai languages].
  • [dpsearch-fuzzy.en.html#accent Accent insensitive search].
  • [dpsearch-mod_dpsearch.en.html mod_dpsearch] - search module for Apache web server.
  • Internationalized Domain Names support.
  • [dpsearch-rel.en.html#sea The Summary Extraction Algorithm] (SEA).
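
As an illustration of the result-sorting feature in the list above, the following sketch orders hits by relevancy, by popularity rank, or by importance, taking importance as relevancy multiplied by popularity rank as stated there. The `Hit` class, the `sort_hits` function and the score values are hypothetical and do not reflect DataparkSearch's internal data structures or score scales.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One search result with hypothetical, made-up scores."""
    url: str
    relevancy: float
    popularity: float

    @property
    def importance(self):
        # Importance as described in the feature list: relevancy * popularity rank.
        return self.relevancy * self.popularity


def sort_hits(hits, by="importance"):
    """Return hits ordered from highest to lowest value of the chosen score."""
    return sorted(hits, key=lambda h: getattr(h, by), reverse=True)
```

With hits scored (0.9, 0.1), (0.5, 0.5) and (0.2, 0.9), sorting by importance puts the balanced second hit first, whereas sorting by relevancy or by popularity alone would each favour a different hit.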

Where to get DataparkSearch

Check for the latest version of DataparkSearch at www.dataparksearch.org, as well as at Google Code.

DataparkSearch is also available in the FreeBSD ports collection (see www.freshports.org/www/dpsearch) and in the T2 Linux SDE.

DataparkSearch's source is available via SVN at Google Code:

svn checkout http://dataparksearch.googlecode.com/svn/trunk/ dataparksearch-read-only

Disclaimer

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. See COPYING file for details.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.

Authors

Maxim Zakharov <maxime@maxime.net.ru>, homepage

Contributors

Michael Kynast <kynast@newslookup.com>: First DataparkSearch user. Testing on Linux Red Hat.

Jean-Gerard Pailloncy: Testing on OpenBSD.

Amit Joshi: Testing on CentOS, packaging for Debian, and ideas for improving scalability across several PCs and when using several DBAddr.

mnoGoSearch developers and contributors <devel@mnogosearch.org>: Development and contributions for mnoGoSearch versions up to 3.2.15.
