Some parsers produce output in a charset other than the one given
in the LocalCharset command. Specify the charset so that indexer
converts the parser's output to the proper one. For example, if your catdoc is
configured to produce output in the windows-1251 charset but LocalCharset
is koi8-r, use this command for parsing MS Word documents:
Mime application/msword "text/plain; charset=windows-1251" "catdoc -a $1"
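The conversion that the charset parameter requests can be pictured with a short Python sketch. This is only an illustration of re-encoding parser output from one charset to another, not DataparkSearch's actual implementation; the function name convert_parser_output and the sample text are hypothetical.

```python
def convert_parser_output(raw: bytes, parser_charset: str, local_charset: str) -> bytes:
    """Re-encode parser output from the parser's charset to the local one.

    Hypothetical helper for illustration; indexer does this internally.
    """
    # Decode using the charset declared in the Mime command ...
    text = raw.decode(parser_charset)
    # ... and encode into LocalCharset, replacing unrepresentable characters.
    return text.encode(local_charset, errors="replace")


# Sample bytes as catdoc might emit them in windows-1251 (assumed input).
sample = "Привет".encode("windows-1251")
converted = convert_parser_output(sample, "windows-1251", "koi8-r")
```

Without the charset declaration, the windows-1251 bytes would be interpreted as koi8-r and the indexed words would be garbled.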
DataparkSearch can be built with the libextractor library.
Using this library, DataparkSearch can index keywords from files of the following formats: PDF, PS, OLE2 (DOC, XLS, PPT), OpenOffice (sxw), StarOffice (sdw), DVI, MAN, FLAC, MP3 (ID3v1 and ID3v2), NSF(E) (NES music), SID (C64 music), OGG, WAV, EXIV2, JPEG, GIF, PNG, TIFF, DEB, RPM, TAR(.GZ), ZIP, ELF, S3M (Scream Tracker 3), XM (eXtended Module), IT (Impulse Tracker), FLV, REAL, RIFF (AVI), MPEG, QT and ASF.
To build DataparkSearch with the libextractor library, install the library first, and then configure and compile DataparkSearch.
The table below shows how libextractor's keyword types map to DataparkSearch section names:
Table 3-1. Relationship between libextractor's keyword types and DataparkSearch section names
| Keyword Type | Section name |
|---|---|
| EXTRACTOR_FILENAME | Filename |
| EXTRACTOR_MIMETYPE | Mimetype |
| EXTRACTOR_TITLE | Title |
| EXTRACTOR_AUTHOR | Author |
| EXTRACTOR_ARTIST | Artist |
| EXTRACTOR_DESCRIPTION | Description |
| EXTRACTOR_COMMENT | Comment |
| EXTRACTOR_DATE | Date |
| EXTRACTOR_PUBLISHER | Publisher |
| EXTRACTOR_LANGUAGE | Content-Language |
| EXTRACTOR_ALBUM | Album |
| EXTRACTOR_GENRE | Genre |
| EXTRACTOR_LOCATION | Location |
| EXTRACTOR_VERSIONNUMBER | VersionNumber |
| EXTRACTOR_ORGANIZATION | Organization |
| EXTRACTOR_COPYRIGHT | Copyright |
| EXTRACTOR_SUBJECT | Subject |
| EXTRACTOR_KEYWORDS | Meta.Keywords |
| EXTRACTOR_CONTRIBUTOR | Contributor |
| EXTRACTOR_RESOURCE_TYPE | Resource-Type |
| EXTRACTOR_FORMAT | Format |
| EXTRACTOR_RESOURCE_IDENTIFIER | Resource-Idendifier |
| EXTRACTOR_SOURCE | Source |
| EXTRACTOR_RELATION | Relation |
| EXTRACTOR_COVERAGE | Coverage |
| EXTRACTOR_SOFTWARE | Software |
| EXTRACTOR_DISCLAIMER | Disclaimer |
| EXTRACTOR_WARNING | Warning |
| EXTRACTOR_TRANSLATED | Translated |
| EXTRACTOR_CREATION_DATE | Creation-Date |
| EXTRACTOR_MODIFICATION_DATE | Modification-Date |
| EXTRACTOR_CREATOR | Creator |
| EXTRACTOR_PRODUCER | Producer |
| EXTRACTOR_PAGE_COUNT | Page-Count |
| EXTRACTOR_PAGE_ORIENTATION | Page-Orientation |
| EXTRACTOR_PAPER_SIZE | Paper-Size |
| EXTRACTOR_USED_FONTS | Used-Fonts |
| EXTRACTOR_PAGE_ORDER | Page-Order |
| EXTRACTOR_CREATED_FOR | Created-For |
| EXTRACTOR_MAGNIFICATION | Magnification |
| EXTRACTOR_RELEASE | Release |
| EXTRACTOR_GROUP | Group |
| EXTRACTOR_SIZE | Size |
| EXTRACTOR_SUMMARY | Summary |
| EXTRACTOR_PACKAGER | Packager |
| EXTRACTOR_VENDOR | Vendor |
| EXTRACTOR_LICENSE | License |
| EXTRACTOR_DISTRIBUTION | Distribution |
| EXTRACTOR_BUILDHOST | BuildHost |
| EXTRACTOR_OS | OS |
| EXTRACTOR_DEPENDENCY | Dependency |
| EXTRACTOR_HASH_MD4 | Hash-MD4 |
| EXTRACTOR_HASH_MD5 | Hash-MD5 |
| EXTRACTOR_HASH_SHA0 | Hash-SHA0 |
| EXTRACTOR_HASH_SHA1 | Hash-SHA1 |
| EXTRACTOR_HASH_RMD160 | Hash-RMD160 |
| EXTRACTOR_RESOLUTION | Resolution |
| EXTRACTOR_CATEGORY | Ext.Category |
| EXTRACTOR_BOOKTITLE | BookTitle |
| EXTRACTOR_PRIORITY | Priority |
| EXTRACTOR_CONFLICTS | Conflicts |
| EXTRACTOR_REPLACES | Replaces |
| EXTRACTOR_PROVIDES | Provides |
| EXTRACTOR_CONDUCTOR | Conductor |
| EXTRACTOR_INTERPRET | Interpret |
| EXTRACTOR_OWNER | Owner |
| EXTRACTOR_LYRICS | Lyrics |
| EXTRACTOR_MEDIA_TYPE | Media-Type |
| EXTRACTOR_CONTACT | Contact |
| EXTRACTOR_THUMBNAIL_DATA | Thumbnail-Data |
| EXTRACTOR_PUBLICATION_DATE | Publication-Date |
| EXTRACTOR_CAMERA_MAKE | Camera-Make |
| EXTRACTOR_CAMERA_MODEL | Camera-Model |
| EXTRACTOR_EXPOSURE | Exposure |
| EXTRACTOR_APERTURE | Aperture |
| EXTRACTOR_EXPOSURE_BIAS | Exposure-Bias |
| EXTRACTOR_FLASH | Flash |
| EXTRACTOR_FLASH_BIAS | Flash-Bias |
| EXTRACTOR_FOCAL_LENGTH | Focal-Length |
| EXTRACTOR_FOCAL_LENGTH_35MM | Focal-Length-35MM |
| EXTRACTOR_ISO_SPEED | ISO-Speed |
| EXTRACTOR_EXPOSURE_MODE | Exposure-Mode |
| EXTRACTOR_METERING_MODE | Metering-Mode |
| EXTRACTOR_MACRO_MODE | Macro-Mode |
| EXTRACTOR_IMAGE_QUALITY | Image-Quality |
| EXTRACTOR_WHITE_BALANCE | White-Balance |
| EXTRACTOR_ORIENTATION | Orientation |
| EXTRACTOR_TEMPLATE | Template |
| EXTRACTOR_SPLIT | Split |
| EXTRACTOR_PRODUCTVERSION | ProductVersion |
| EXTRACTOR_LAST_SAVED_BY | Last-Saved-By |
| EXTRACTOR_LAST_PRINTED | Last-Printed |
| EXTRACTOR_WORD_COUNT | Word-Count |
| EXTRACTOR_CHARACTER_COUNT | Character-Count |
| EXTRACTOR_TOTAL_EDITING_TIME | Total-Editing-Time |
| EXTRACTOR_THUMBNAILS | Thumbnails |
| EXTRACTOR_SECURITY | Security |
| EXTRACTOR_CREATED_BY_SOFTWARE | Created-By-Software |
| EXTRACTOR_MODIFIED_BY_SOFTWARE | Modified-By-Software |
| EXTRACTOR_REVISION_HISTORY | Revision-History |
| EXTRACTOR_LOWERCASE | Lowercase |
| EXTRACTOR_COMPANY | Company |
| EXTRACTOR_GENERATOR | Generator |
| EXTRACTOR_CHARACTER_SET | Meta-Charset |
| EXTRACTOR_LINE_COUNT | Line-Count |
| EXTRACTOR_PARAGRAPH_COUNT | Paragraph-Count |
| EXTRACTOR_EDITING_CYCLES | Editing-Cycles |
| EXTRACTOR_SCALE | Scale |
| EXTRACTOR_MANAGER | Manager |
| EXTRACTOR_MOVIE_DIRECTOR | Movie-Director |
| EXTRACTOR_DURATION | Duration |
| EXTRACTOR_INFORMATION | Information |
| EXTRACTOR_FULL_NAME | Full-Name |
| EXTRACTOR_CHAPTER | Chapter |
| EXTRACTOR_YEAR | Year |
| EXTRACTOR_LINK | Link |
| EXTRACTOR_MUSIC_CD_IDENTIFIER | Music-CD-Identifier |
| EXTRACTOR_PLAY_COUNTER | Play-Counter |
| EXTRACTOR_POPULARITY_METER | Popularity-Meter |
| EXTRACTOR_CONTENT_TYPE | Ext.Content-Type |
| EXTRACTOR_ENCODED_BY | Encoded-By |
| EXTRACTOR_TIME | Time |
| EXTRACTOR_MUSICIAN_CREDITS_LIST | Musician-Credits-List |
| EXTRACTOR_MOOD | Mood |
| EXTRACTOR_FORMAT_VERSION | Format-Version |
| EXTRACTOR_TELEVISION_SYSTEM | Television-System |
| EXTRACTOR_SONG_COUNT | Song-Count |
| EXTRACTOR_STARTING_SONG | Strting-Song |
| EXTRACTOR_HARDWARE_DEPENDENCY | Hardware-Dependency |
| EXTRACTOR_RIPPER | Ripper |
| EXTRACTOR_FILE_SIZE | File-Size |
| EXTRACTOR_TRACK_NUMBER | Track-Number |
| EXTRACTOR_ISRC | ISRC |
| EXTRACTOR_DISC_NUMBER | Disc-Number |
If a section name from the table above is not specified in sections.conf, the value of the corresponding keyword is written to the body section.
Keywords of unknown type are written to the body section as well.
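The fallback rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not DataparkSearch code: the mapping holds only a few rows of the table, target_section is a hypothetical name, and the exact-match comparison of section names is a simplification.

```python
# A few rows from the table above; the real mapping is much longer.
SECTION_BY_KEYWORD = {
    "EXTRACTOR_TITLE": "Title",
    "EXTRACTOR_AUTHOR": "Author",
    "EXTRACTOR_KEYWORDS": "Meta.Keywords",
    "EXTRACTOR_LANGUAGE": "Content-Language",
}


def target_section(keyword_type: str, configured_sections: set) -> str:
    """Return the section a keyword value is written to (hypothetical helper)."""
    section = SECTION_BY_KEYWORD.get(keyword_type)
    # Use the mapped section only if it is declared in sections.conf.
    if section is not None and section in configured_sections:
        return section
    # Unknown keyword types and undeclared sections fall back to body.
    return "body"


# Assume sections.conf declares only Title.
configured = {"Title"}
print(target_section("EXTRACTOR_TITLE", configured))
print(target_section("EXTRACTOR_AUTHOR", configured))
print(target_section("EXTRACTOR_UNKNOWN", configured))
```

With only Title declared, the title goes to its own section, while the author and the keyword of unknown type both end up in body.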