Scripal modules for various programming languages

Java, C#, JavaScript, Python

Notice: All functions throw exceptions in case of fatal errors.
You can read the last error message at any time with getErrMsg(); an empty string indicates that no error occurred.
The constants used by the Scripal library are listed at the end of this document.
Scripal source is passed to the library as a string, so remember to use the proper escape sequences and nested string quoting of the language you are using.
Example: obj.match("match 't' ")
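The error-handling convention above (fatal errors raise, getErrMsg() returns "" when there is no error) can be sketched in Python. StubScripal is a hypothetical stand-in invented for this illustration, not the real TFScripal class: its toy quote check and the way it resets the error on each call are assumptions. Only the names match() and getErrMsg() come from this document.

```python
class StubScripal:
    """Hypothetical stand-in for TFScripal, used only to show the calling pattern.

    The real class, its constructors and the semantics of match() may differ.
    """

    def __init__(self):
        self._last_error = ""

    def getErrMsg(self):
        # Documented contract: an empty string means no error occurred.
        return self._last_error

    def match(self, source):
        self._last_error = ""
        # Toy validation only: reject source with unbalanced single quotes.
        if source.count("'") % 2 != 0:
            self._last_error = "unbalanced quote in Scripal source"
            raise RuntimeError(self._last_error)


def try_match(obj, source):
    """Call obj.match() and return the last error message ("" on success)."""
    try:
        obj.match(source)
    except Exception:
        pass  # the exception signals a fatal error; the text stays in getErrMsg()
    return obj.getErrMsg()


# Nested strings: Scripal's single quotes sit inside a Python double-quoted string.
SOURCE = "match 't' "
# The same source inside single outer quotes needs escape sequences.
SOURCE_ESCAPED = 'match \'t\' '
```

In Java and C# only double-quoted string literals exist, so the first form (single quotes nested inside double quotes) carries over directly.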

module management and global data

Scripal object TFScripal

A Scripal object is an instance of the class TFScripal; its constructors are:

TFScripal methods

results and class TFResult

Every TFScripal object holds a TFResult object. All indices are zero-based (0, 1, ...).

TFResult {
   fileNames[] -> file name of the result, when using the searchFiles() method of TFScripal
   text[] -> result strings
   positions[] -> positions, depending on config.posType
   ratings[] -> ratings when using nearest or block match, in the range 0 (no match) to 1 (perfect match)
   tags[] -> tags set in the Scripal source
   size() -> number of results
}

Access individual results with obj.results.text[i], obj.results.positions[i], obj.results.ratings[i], etc.
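Because every field of TFResult is an array indexed by the same result number, a loop over range(size()) visits each result. The sketch below mirrors that layout with a hypothetical ResultView dataclass so it runs without the library; the field names come from the TFResult listing above, while the element types (str, int, float) and the helper best_match() are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResultView:
    """Hypothetical mirror of TFResult's parallel arrays (not the real class)."""
    fileNames: List[str] = field(default_factory=list)
    text: List[str] = field(default_factory=list)
    positions: List[int] = field(default_factory=list)
    ratings: List[float] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

    def size(self) -> int:
        # Number of results; assumes text[] has one entry per result.
        return len(self.text)


def best_match(results) -> int:
    """Index of the highest-rated result, or -1 if there are no results."""
    if results.size() == 0:
        return -1
    return max(range(results.size()), key=lambda i: results.ratings[i])


def describe(results) -> List[str]:
    """One line per result: text, position and rating share the same index i."""
    return [
        f"{results.text[i]} @ {results.positions[i]} (rating {results.ratings[i]})"
        for i in range(results.size())
    ]
```

With a real Scripal object the same helpers would be called as best_match(obj.results), assuming the module exposes results with these field names in Python.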

Results may be exported in the following formats:
* result in JSON format: string obj.getResultJSON()

configuration

The configuration object config is a per-thread/per-process singleton: each thread or process works with its own instance.
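The meaning of a per-thread singleton can be illustrated with Python's threading.local. This is a minimal sketch of the concept, not Scripal's implementation: get_config() and the default posType value are hypothetical, and only the name posType and the POS_* values come from this document.

```python
import threading

POS_UTF8, POS_OFFSET, POS_COUNT = 1, 2, 3  # values from the constants table

_local = threading.local()


def get_config() -> dict:
    """Return this thread's config, creating it on first use (per-thread singleton)."""
    cfg = getattr(_local, "config", None)
    if cfg is None:
        cfg = {"posType": POS_UTF8}  # hypothetical default, for illustration only
        _local.config = cfg
    return cfg


def pos_type_seen_by_new_thread() -> int:
    """Read posType from a fresh thread, which gets its own config instance."""
    seen = []
    worker = threading.Thread(target=lambda: seen.append(get_config()["posType"]))
    worker.start()
    worker.join()
    return seen[0]
```

Changing posType in one thread therefore does not affect another thread; separate processes never share memory in the first place.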

templates

The templates object is a global, thread-safe singleton that can be accessed by all processes and threads.
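By contrast with config, a global thread-safe singleton returns one shared instance and serializes access with a lock. The sketch below shows that pattern for threads within a single process only; TemplateRegistry and its set()/get() methods are hypothetical names, not the Scripal templates API, and sharing across processes as described above would need an inter-process mechanism that this sketch does not attempt.

```python
import threading


class TemplateRegistry:
    """Process-wide singleton guarded by locks (illustrative, not Scripal's code)."""

    _instance = None
    _instance_lock = threading.Lock()

    def __new__(cls):
        # Create the single instance exactly once, even under concurrent calls.
        with cls._instance_lock:
            if cls._instance is None:
                inst = super().__new__(cls)
                inst._lock = threading.Lock()
                inst._templates = {}
                cls._instance = inst
        return cls._instance

    def set(self, name: str, source: str) -> None:
        with self._lock:
            self._templates[name] = source

    def get(self, name: str):
        with self._lock:
            return self._templates.get(name)
```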

constants

encodings

ENC_DEFAULT = 1    use the locale's encoding
ENC_UTF8 = 2    UTF-8
ENC_UTF16L = 3    UTF-16 little-endian
ENC_UTF16B = 4    UTF-16 big-endian
ENC_UTF32L = 5    UTF-32 little-endian
ENC_UTF32B = 6    UTF-32 big-endian
ENC_ASCII = 10    ASCII, extended ASCII (up to 255)
ENC_CP932 = 11    CP932 DBCS, Japanese characters
ENC_CP936 = 12    CP936 DBCS, Simplified Chinese characters
ENC_CP949 = 13    CP949 DBCS, Korean characters
ENC_CP950 = 14    CP950 DBCS, Traditional Chinese (Big5) characters
ENC_LATIN1 = 30    Western Europe Latin-1, ISO 8859-1
ENC_LATIN2 = 31    Central Europe Latin-2, ISO 8859-2
ENC_LATIN9 = 32    Western Europe Latin-9, ISO 8859-15
ENC_WIN874 = 50    Windows Codepage 874, Thai characters
ENC_WIN1250 = 51    Windows Codepage 1250, Central Europe
ENC_WIN1251 = 52    Windows Codepage 1251, Cyrillic
ENC_WIN1252 = 53    Windows Codepage 1252, Western Europe
ENC_WIN1253 = 54    Windows Codepage 1253, Greek
ENC_WIN1254 = 55    Windows Codepage 1254, Turkish
ENC_WIN1255 = 56    Windows Codepage 1255, Hebrew
ENC_WIN1256 = 57    Windows Codepage 1256, Arabic
ENC_WIN1257 = 58    Windows Codepage 1257, Baltic
ENC_WIN1258 = 59    Windows Codepage 1258, Vietnamese
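When preparing or checking text outside the library, it can help to map these constants to the equivalent Python codecs. The table below is a hypothetical helper, not part of Scripal; the codec names are Python's standard ones. ENC_ASCII is left out on purpose because "extended ASCII (up to 255)" does not name a specific code page.

```python
import locale

# Hypothetical mapping from the ENC_* values above to Python codec names.
ENC_TO_CODEC = {
    2: "utf-8",        # ENC_UTF8
    3: "utf-16-le",    # ENC_UTF16L
    4: "utf-16-be",    # ENC_UTF16B
    5: "utf-32-le",    # ENC_UTF32L
    6: "utf-32-be",    # ENC_UTF32B
    11: "cp932",       # ENC_CP932, Japanese
    12: "cp936",       # ENC_CP936, Simplified Chinese (alias of gbk)
    13: "cp949",       # ENC_CP949, Korean
    14: "cp950",       # ENC_CP950, Traditional Chinese (Big5)
    30: "latin-1",     # ENC_LATIN1, ISO 8859-1
    31: "iso8859-2",   # ENC_LATIN2
    32: "iso8859-15",  # ENC_LATIN9
    50: "cp874",       # ENC_WIN874, Thai
}
# ENC_WIN1250 .. ENC_WIN1258 (51 .. 59) map to cp1250 .. cp1258.
ENC_TO_CODEC.update({51 + i: f"cp{1250 + i}" for i in range(9)})


def codec_for(enc: int) -> str:
    """Python codec name for an ENC_* value (KeyError if unmapped)."""
    if enc == 1:  # ENC_DEFAULT: use the locale's encoding
        return locale.getpreferredencoding(False)
    return ENC_TO_CODEC[enc]


def decode(data: bytes, enc: int) -> str:
    return data.decode(codec_for(enc))
```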

matching algorithms (block and nearest)

PATTERN_LEVEN_WORD = 1    Levenshtein distance match, use for matching words
PATTERN_LEVENPLUS_WORD = 2    optimized Levenshtein distance match, use for matching words
PATTERN_LEVEN = 3    Levenshtein distance match, use for matching phrases with any characters
PATTERN_JARO = 100    Jaro distance, use for matching phrases and text
PATTERN_JAROWINKLER = 101    Jaro-Winkler distance, use for matching phrases and text
PATTERN_JAROWINKLER_WORD = 102    Jaro-Winkler distance, use for matching words
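For reference, the textbook versions of these measures are sketched below. They are the standard Levenshtein, Jaro and Jaro-Winkler definitions, not Scripal's implementation: how Scripal turns a distance into a rating, and what its "optimized" and "word" variants change, is not documented here. levenshtein_rating() is an assumed normalization that yields the 0 (no match) to 1 (perfect match) range used by TFResult.ratings.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def levenshtein_rating(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        # A character matches if it occurs in s2 within the search window.
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters in different order; each transposition counts twice.
    matched1 = [c for c, used in zip(s1, used1) if used]
    matched2 = [c for c, used in zip(s2, used2) if used]
    mismatched = sum(x != y for x, y in zip(matched1, matched2))
    t = mismatched / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by a common prefix of up to max_prefix characters."""
    j = jaro(s1, s2)
    prefix = 0
    for x, y in zip(s1[:max_prefix], s2[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

The classic pairs give levenshtein("kitten", "sitting") == 3 and jaro("MARTHA", "MARHTA") == 17/18; the Jaro-Winkler prefix bonus favours strings that agree at the start, which suits short words and names.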

result position types (set in config, element posType)

POS_UTF8 = 1    positions refer to the UTF-8 encoded representation of the text
POS_OFFSET = 2    positions are byte offsets in the text's own character encoding
POS_COUNT = 3    positions are counted in characters (Unicode code points)
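The three position types give different numbers for the same match as soon as the text contains non-ASCII characters. The sketch below computes all three for the first occurrence of a string; match_position() is a hypothetical helper, and it assumes that POS_UTF8 means a byte offset into the UTF-8 encoding of the text. Codec names are Python's; use "utf-16-le" rather than "utf-16" so that no byte-order mark is counted.

```python
POS_UTF8, POS_OFFSET, POS_COUNT = 1, 2, 3  # values from the table above


def match_position(text: str, needle: str, encoding: str, pos_type: int) -> int:
    """Position of the first occurrence of needle, expressed according to pos_type.

    encoding is the Python codec name of the text's own encoding (used by POS_OFFSET).
    """
    count = text.index(needle)  # code points before the match
    prefix = text[:count]
    if pos_type == POS_COUNT:
        return count
    if pos_type == POS_UTF8:
        return len(prefix.encode("utf-8"))
    if pos_type == POS_OFFSET:
        return len(prefix.encode(encoding))
    raise ValueError(f"unknown position type: {pos_type}")
```

For the text "Grüße, Welt" and the match "Welt", this yields 7 for POS_COUNT, 9 for POS_UTF8 (ü and ß take two bytes each in UTF-8) and 14 for POS_OFFSET when the text is UTF-16 little-endian.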