Getting started with dictionaries
Dictionaries provide a simple, highly productive method for creating text references in documents, and more generally creating links on text graphs.
Entity Extraction Scripts (EESs) run much faster when Dictionaries are used to create the initial text graph literals. This avoids generic matching pattern elements such as Token, as in Token<string=XXX>, which match very often and so make EES rules slow.
Dictionaries contain word lists
At their most basic, Dictionaries comprise Word Lists, which are lists of words and phrases that Sintelix can then recognise in documents. Each word or phrase in a word list is called an entry.
For example:
#wordlist tag:Weapon
artillery
bombs
missiles
aircraft carrier
Sample text:
During the attack, artillery, bombs and missiles were used to disable the aircraft carrier.
Result:
During the attack, artillery, bombs and missiles were used to disable the aircraft carrier.
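To make the effect of a word list concrete, the following Python sketch picks out the spans that the Weapon word list above would be expected to label in the sample text. It is a rough illustration only, not Sintelix's matching engine: the function name `find_entries` is hypothetical, and the choices of case-insensitive, whole-word, longest-first matching are assumptions made for the sketch.

```python
import re

def find_entries(text, entries):
    """Return (start, end, matched_text) for each word list entry found in text.

    Assumptions (not taken from Sintelix): matching is case-insensitive,
    respects word boundaries, and prefers longer entries over shorter ones.
    """
    # Try longer entries first so multi-word phrases win over shorter overlaps.
    ordered = sorted(entries, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(e) for e in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]

weapon_entries = ["artillery", "bombs", "missiles", "aircraft carrier"]
sample = ("During the attack, artillery, bombs and missiles were used "
          "to disable the aircraft carrier.")
matches = find_entries(sample, weapon_entries)
for start, end, matched in matches:
    print(start, end, matched)
```

Each tuple gives the character offsets of one labelled phrase, which is roughly the information a text reference carries.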
White space
All white space is considered equal except for line feeds, which separate command phrases and word list entries. Blank lines can be added for readability without changing the meaning of a Dictionary.
Comments
Use C++/Java comment style:
// this is a comment until end of line
/* this is a comment
which spans multiple lines
*/
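The white space and comment rules above can be sketched in Python as a small pre-processing step that reduces a dictionary source to its meaningful lines. This is an illustrative simplification, not how Sintelix reads dictionaries: the function name `logical_lines` is hypothetical, and the sketch deliberately ignores comment markers that appear inside double-quoted (escaped) text.

```python
import re

def logical_lines(source):
    """Return the meaningful lines of a dictionary source as a list.

    Simplification: comment markers inside double-quoted text are not
    treated specially here.
    """
    # Remove /* ... */ block comments, which may span several lines.
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    result = []
    for line in source.split("\n"):
        # Drop a // comment that runs to the end of the line.
        line = line.split("//", 1)[0]
        # All white space within a line is equivalent: collapse it.
        line = " ".join(line.split())
        if line:  # blank lines do not change the meaning
            result.append(line)
    return result

example = (
    "#wordlist tag:Weapon\n"
    "\n"
    "// a comment\n"
    "artillery\n"
    "aircraft   \t carrier // trailing comment\n"
    "/* multi\n"
    "line */\n"
    "bombs\n"
)
print(logical_lines(example))
```

Only the line feed survives as a separator; everything else is either collapsed or discarded.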

Dictionary commands
The core commands available in Dictionaries include:
| Command | Syntax |
|---|---|
| Defining word lists | #wordlist wordlist_name |
| Defining columns | #cols column1, column2, ... // where "column1" etc. are column definitions |
| Clearing state | #clear [attribute\|cols\|all] |
| Asserting attributes | #attribute:type value // where "attribute" can be "feature" or "cond" or "generalise" (Feature type and value are variables.) |
| Additional conditions that need to be met before the snippet is labelled | #cond:type // where "type" can be "case" or "context" |
| Generalising singular words and phrases to detect plural versions | #generalise:type [true\|false] // where "type" takes the value "plural" |

Simple examples
Here are some simple examples of using a dictionary to mark up words and phrases (see the Sintelix Dictionary demonstration "Demo 1. Basics"):
/*
#EXAMPLE
The Chief Executive Officer John Smith asks his secretary to send a letter.
#EXAMPLE END
*/
// start a Word List with #wordlist followed by its name, here "Demo1-Basic"
#wordlist Demo1-Basic
// add entries by writing each as an ordinary line
chief executive officer
project manager
// you can add blank lines or comments for readability
manager
team leader
secretary
The Text Graph analyser is included as part of the Dictionary development page in Sintelix. In this case it provides the following diagnostic information:

Creating text references in documents
To create a visible text reference, create the link in the "tag" namespace by prefixing the output tag name with "tag:":
#wordlist tag:tag_name
For example:
/*
#EXAMPLE
The victim was rushed to Hospital.
Police secured the area, searched for weapons and other evidence. The
suspected assailant was apprehended carrying a
recently discharged pistol.
#EXAMPLE END
*/
#wordlist tag:Mywordlist
police
police officers
weapons
pistol
firearm
assailant
victim
This generates the following output in the document view:

Escaping special characters
Certain characters and character sequences have special meaning within the text of Word Lists:
# , * // /* */
They need to be "escaped" when they are used in the text of a word list entry. In Sintelix Dictionaries, the text to be escaped is surrounded with double quotes (").
Within escaped text, the back slash (\) is used to escape the back slash and double quote characters.
For example:
/*
#EXAMPLE
#BP, one, zero, x * y, a // b, double " quote, back \ slash
number of = #
He said, "The wildcard is *"
#EXAMPLE END
// note that [double ["] quote] contains nested matches
*/
#wordlist Demo8-Escape
hello world // phrase is ok
"#BP" // escaping hash (#)
"one, zero" // escaping comma (,)
"x * y" // escaping asterisk (*)
it's // apos (') is ok
x-mas // dash (-) is ok
N\A // back slash (\) is ok
single slash / is ok
"a // b" // escaping (//)
"a /* b" // escaping (/*)
"a */ b" // escaping (*/)
number of = "#"
// Escaping " and \ within double quotes
"double \" quote" // escaping (") becomes (double " quote)
"\"" // escaping (") by itself becomes (")
"back \\ slash" // escaping (\) becomes (back \ slash)
He said", \"The wildcard is \*\""
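When word lists are generated from other data, the escaping rules above can be applied automatically. The Python sketch below is one possible approach under stated assumptions, not a Sintelix utility: the names `escape_entry` and `build_wordlist` are hypothetical, the whole entry is quoted rather than only the special part, and an entry containing a double quote is assumed to need quoting as well.

```python
# Sequences listed above as having special meaning in word list text.
SPECIAL = ("#", ",", "*", "//", "/*", "*/")

def escape_entry(text):
    """Return text as a word list entry, quoted only when necessary."""
    # Assumption: a literal double quote also forces quoting.
    needs_quotes = '"' in text or any(s in text for s in SPECIAL)
    if not needs_quotes:
        return text
    # Inside quotes, escape the backslash first, then the double quote.
    inner = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + inner + '"'

def build_wordlist(name, entries):
    """Return dictionary text: a #wordlist line followed by one entry per line."""
    lines = ["#wordlist " + name]
    lines.extend(escape_entry(entry) for entry in entries)
    return "\n".join(lines)

print(build_wordlist("Demo8-Escape", ["hello world", "#BP", "one, zero", 'double " quote']))
```

Quoting the whole entry keeps the sketch simple; the examples above show that quoting only the special part, as in number of = "#", is also accepted.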

Testing entity extraction scripts
The most productive way to test EESs is to use the Text Graph analyser (see Text Graph analyser and Testing scripts with the Text Graph analyser).

Including entity extraction scripts in the document processing workflow
Dictionaries can be inserted into the Sintelix workflow so that they can be used to process documents in bulk.
Dictionaries are inserted into the Sintelix workflow via the Document Processing Configuration, which can be accessed via the Configuration menu or directly from the Text Graph analyser. The image below shows the document processing configuration needed to run three demonstration Dictionaries (Demos 1, 2 and 4).
Dictionaries are run over documents during processing. They are listed in the table in the Document Processing configuration, illustrated below.