Scripal language

terminology

All operations are of the form

  |condition| operator |attributes| |operand| |(...)|  

where |element| denotes that the element is optional, and element1|element2 means that one of the alternatives can be chosen

operators

will match 'dog' at the current position in text

will match 'dog' at the current position in text and finalize result

will store the current result as label 'code' (much like a variable), which can be used for further matches

will store the current result under tag 'test tag' to denote the result text and position

or

match 'pre'; ifMatch {match find('test'); loop;}

must match at least 2 times

moveon number : move the position pointer in the text by number code points; if number is omitted, move ahead one code point
does not influence match logic

operator conditions

operands

value or number ranges are written as [min, max], where @ stands for infinity, example: [0, @]

many operands push the position pointer further

text operand

'text', "text", `text`: match text

example: match ('test')

match the word 'test'

Any of the three delimiters '', "", `` may be used. If the text itself contains double quotes, enclose it in one of the other delimiters.
example: ' it is no "surprise" '

special attributes: ~

name operand

#name# : match text held in given name

example: match (#ph#)

match the word held in placeholder called 'ph'

special attributes: ~

number or number range

attribute number or

attribute [number min,number max]

example: match [3,5]

match all numbers >= 3 and <= 5
The number should stand isolated like a word, as in '100 times' or 'I was born in 1989.' To find numbers embedded in other text, as in '0xff6a' or 'hgf-300-GHT', use the pure attribute. The decimal point in Scripal source is always '.', example: 0.355. The decimal point expected in the text to scan is set in the config and may be locale specific.
special attributes: hex, oct, bin, pure
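
The difference between an isolated number and an embedded one can be sketched outside Scripal. The following Python fragment is only an illustration under assumptions: the function name numbers_in_range, the pure flag and both regular expressions are hypothetical and do not reproduce Scripal's actual number recognition (hex, oct and bin are not handled at all).

```python
import math
import re

# Isolated number: not glued to letters, digits, '.' or '-' on the left,
# and not followed by a word character or by further decimals on the right.
ISOLATED = re.compile(r"(?<![\w.-])\d+(?:\.\d+)?(?!\w|\.\d)")
# Embedded number (similar in spirit to 'pure'): any run of digits.
EMBEDDED = re.compile(r"\d+")


def numbers_in_range(text, lo, hi=math.inf, pure=False):
    """Return the numbers in text whose value lies in [lo, hi].

    hi defaults to infinity, mirroring a range like [0, @].
    """
    pattern = EMBEDDED if pure else ISOLATED
    hits = []
    for m in pattern.finditer(text):
        if lo <= float(m.group()) <= hi:
            hits.append(m.group())
    return hits


print(numbers_in_range("take 4 of 7", 3, 5))
print(numbers_in_range("hgf-300-GHT", 0))
print(numbers_in_range("hgf-300-GHT", 0, pure=True))
```

With these assumptions, '300' inside 'hgf-300-GHT' is skipped by the isolated search and only reported when pure is set.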

logical operand

logical operands don't set results themselves but combine or test other operands

will match only if all terms in this order are given 'my friend Greg'

will match if any one of the terms occur

will match if 'my', 'my cat', also 'my cat is' etc. is found. 'each' matches as long as possible in the order given

will match 'dog', also 'cat' and 'dog', also 'dog' and 'cat', the order is not relevant, the 'every' operand matches as long as possible in any order

will move the position pointer until 'dog' is found

will move the position pointer until 'dog' is found, but only if the term is found within 1 to 5 characters of the current position

will look for the term 'dog', but if 'cat' is found, the find operation will be aborted

will succeed if at least one and up to three spaces have been found

will succeed if the entire match is an integer number
see attribute description further down

will succeed if the sole words 'grape' (eow), 'grapes' or 'grapefruit' have been found
without eow the word 'grape' would not be found

will succeed if the sole word 'GRAPE' is all in upper case

will succeed if the sole word 'grape' is all in lower case

block operand

block operands set matches and results

match any single character

example: match (char[65])

match character 'A'

example: match (char[65,67])

match character 'A', 'B' or 'C'
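
The numbers in the two char examples read as code points. As a quick cross-check outside Scripal, this Python sketch lists the characters covered by such a range; the helper name chars_for is hypothetical and not part of Scripal.

```python
def chars_for(lo, hi=None):
    """Return the characters whose code points lie in [lo, hi] (hi defaults to lo)."""
    hi = lo if hi is None else hi
    return [chr(cp) for cp in range(lo, hi + 1)]


print(chars_for(65))      # single code point
print(chars_for(65, 67))  # inclusive range
```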

operand attributes

attributes further specify the behaviour of operands

will match 'dog' case insensitive, 'Dog' will match as well

will match any text but 'dog'

will match the number 145, not 145.13 or -145

will match 'a145b' or '.145.' at the end of a sentence

will find the word 'test' and set match condition to true, but no result
the position pointer is at the first 't' of 'test'

will find the word 'test' and set match condition to true, but no result
the position pointer is after the last 't' of 'test'

find the word 'contained' only if anywhere in the text the word 'is' was found, only match condition is set to true

will match 'dog', but will also match if 'dog' is not found
try includes the text in the result only if it is present; in any case the condition is fulfilled

will match all contiguous letters, the last letter must be 's' or 'S'

will match all contiguous letters, the last letter cannot be 's' or 'S'

will match the word 'bee' if not preceded by 'honey '

controls

controls have various functions: grouping operator blocks, ending lines etc.

results

Scripal is UTF-8 based, all internals use this encoding. Results are byte positions, not character positions, since character positions are of little use for further processing in many programming languages. Result regions may be open, that is, only one side is a byte position, denoting every character from here on [byte pos, nPos] or up to here [nPos, byte pos]. nPos represents the NULL position. Result positions relate to the UTF-8 representation of the text.
If the character encoding of the text is different, positions cost time to calculate, so they are set to NaN (not a number). Set config.positionType to POS_OFFSET for byte positions in the native encoding or to POS_COUNT for a character count (0...). These two settings add processing time, so use them only if positions are needed. nPos is denoted as -1 in JSON results, NaN as -2. In other result types, the words are written out.

Example: position [2,7] means the result extends over bytes 2 up to 7, so 7 would be the last byte of the last code point of the result region.
Virtual positions are open ranges. For example, in case of operand 'bow' (begin of word), only the byte position is returned, with an open end; no region is involved: [8,nPos] would indicate that the word starts at byte position 8.
Results are held in arrays or may be obtained in JSON or CSV format.
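
To make the difference between byte positions and character positions concrete, here is a small Python sketch. It does not call Scripal; byte_span, char_span and show are hypothetical helpers, and the only facts taken from the text above are the inclusive [first byte, last byte] convention and the JSON values -1 (nPos) and -2 (NaN).

```python
NPOS = -1  # JSON value for nPos, as stated above
NAN = -2   # JSON value for NaN, as stated above


def byte_span(text, needle):
    """Inclusive [first_byte, last_byte] of needle in the UTF-8 form of text, or None."""
    data = text.encode("utf-8")
    target = needle.encode("utf-8")
    start = data.find(target)
    if start < 0:
        return None
    return [start, start + len(target) - 1]


def char_span(text, needle):
    """Inclusive [first_char, last_char] counted in code points, or None."""
    start = text.find(needle)
    if start < 0:
        return None
    return [start, start + len(needle) - 1]


def show(region):
    """Write a region with nPos and NaN spelled out instead of -1 and -2."""
    names = {NPOS: "nPos", NAN: "NaN"}
    return "[" + ",".join(names.get(p, str(p)) for p in region) + "]"


text = "Zo\u00eb naps"          # U+00EB takes two bytes in UTF-8
print(byte_span(text, "naps"))  # byte positions
print(char_span(text, "naps"))  # character positions differ after the two-byte character
print(show([8, NPOS]))          # an open region
```

The byte span can be used directly to slice the UTF-8 encoded buffer, which is the kind of further processing the byte-based results are meant for.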

templates

<name = template> : denote template

Templates are used for repetitive expressions, like macros. By specifying the name of the template you use a certain code fragment several times. A change in the template causes a change in all instances.

< roadMarker = { any( ~'avenue' ~'ave.' ~'road' ~'street' ~'boulevard'  ~'drive'  ~'lane' ) } >  
match find( int[1,10000] blank repeat[1,3]( !( < roadMarker > ) word ) blank < roadMarker > ) 
ifMatch { 
  matchEnd ( ',' blank int[1,@] repeat[1,3]( blank word ) at any(',' eol eot ))
} 

defines roadMarker as a template to match any road type
the entire expression will match an address like 1007 Mountain Drive, 63527 Gotham City

define a template in the source by using

< name = { code } >

templates may have arguments

 <1> <2> ..
example: < person  = { match find ( <1> space <2>) } > 

which will be substituted by caller arguments in:

< person {'mike'}{'myers'} >

<1> and <2> in template person will be substituted with 'mike' and 'myers', so template person will match first name and surname
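
The argument substitution behaves like a textual macro expansion. The Python sketch below mimics that idea only; expand is a hypothetical helper, and how Scripal parses and expands templates internally is not described here.

```python
import re


def expand(body, args):
    """Replace <1>, <2>, ... in a template body with the caller arguments."""
    def substitute(match):
        index = int(match.group(1))  # <1> refers to the first argument
        return args[index - 1]
    return re.sub(r"<\s*(\d+)\s*>", substitute, body)


body = "match find ( <1> space <2>)"
print(expand(body, ["'mike'", "'myers'"]))
```

Changing the body once changes every expansion, which is the property the template mechanism relies on.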

configuration

The configuration data can be set by special methods. The configuration is thread bound: all Scripal objects in a thread share the same configuration. Configuration data may also be specified at the beginning of the source in %xxxxx%; this configuration is only used for the given object.

example: % = { "showCode" : true, "posSign" : "+"  } 

If a config file exists at the default location, it is used implicitly. Create one by calling

scripal -c reset

default path is:
Linux : ~/.config/scripal/scripal.cnf
MS Windows : .\scripal.cnf

config values:

encodings

Scripal uses UTF-8 internally, so matching a UTF-8 string or file is the fastest operation. It also supports various other encodings: UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE, Latin and Windows codepages.

The internal encodings are: