EES Basics

White space

All white space is considered equal. For optimum readability, use new lines, tabs and spaces to indent and align rules.

Comments

Use C++/Java comment style.

// this is a comment until end of line
/* this is a comment which can span multiple lines */
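
As an illustration of the two comment forms, the following Python sketch removes them from a script before any further processing. It is not part of Sintelix: the helper name strip_comments is hypothetical, and the handling of double-quoted strings (including backslash escapes) is an assumption made so that a "//" inside a quoted value is not mistaken for a comment.

```python
import re

# Hypothetical helper, not a Sintelix API.
# The first alternative matches a double-quoted string so that "//" or "/*"
# inside a quoted value is left alone (backslash escapes are an assumption).
# The second and third alternatives match the two comment forms.
_STRING_OR_COMMENT = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*|/\*.*?\*/', re.DOTALL)


def strip_comments(script):
    """Return the script text with // and /* */ comments replaced by a space."""
    def replace(match):
        text = match.group(0)
        # Keep quoted strings unchanged; turn comments into a single space
        # so the elements on either side of a comment do not run together.
        return text if text.startswith('"') else " "
    return _STRING_OR_COMMENT.sub(replace, script)


script = '''Token<text()="hello"> // first element
/* a comment
   spanning lines */
Term.symbolic'''

print(strip_comments(script).split())
```

Replacing each comment with a space, rather than with nothing, keeps the sketch consistent with the rule that all white space is equal.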

Simple EES rule

Each rule has two parts:

The matching pattern.
One or more output phrases (statements beginning with ">").

Matching patterns

The basic syntax for a matching pattern matches a sequence of pattern elements on the graph. Each pattern element usually matches a link. For example, a matching pattern might have three elements:

pattern_element1
pattern_element2
pattern_element3

Pattern elements in a matching pattern can also be listed across the page without changing the meaning:

pattern_element1 pattern_element2 pattern_element3
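
The equivalence of the two layouts can be checked with a short Python sketch. It is only a model, not the Sintelix parser: the function name pattern_elements is hypothetical, and splitting on white space works here only because these placeholder elements contain no spaces of their own.

```python
def pattern_elements(pattern):
    """Split a pattern into elements on any run of spaces, tabs or new lines."""
    # str.split() with no arguments treats every run of white space
    # (spaces, tabs, new lines) as a single separator.
    return pattern.split()


stacked = """
pattern_element1
pattern_element2
pattern_element3
"""
inline = "pattern_element1 pattern_element2\tpattern_element3"

# Both layouts yield the same sequence of three elements.
print(pattern_elements(stacked) == pattern_elements(inline))
```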

Example:

The token string "Paris is fun" is matched by the following sequence:

Token<string="Paris">
Token<string="is">
Token<string="fun">
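
To make the idea of matching a sequence concrete, here is a minimal Python sketch. It assumes a much simpler model than the real text graph: the text is a flat list of token strings produced by splitting on spaces, and the hypothetical function find_matches reports where the wanted strings occur one after another.

```python
def find_matches(tokens, wanted):
    """Return the start indexes at which `wanted` occurs as consecutive tokens."""
    n = len(wanted)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == wanted]


# Splitting on spaces stands in for the tokens of the text graph.
tokens = "I think Paris is fun".split()

# Three elements, each requiring one token string, in order.
print(find_matches(tokens, ["Paris", "is", "fun"]))  # [2]
```

The pattern only matches where all three elements match adjacent tokens in the given order, so "fun is Paris" would produce no match.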

Pattern element conditions

Pattern elements may contain pattern element conditions, which serve to make the pattern element more selective in matching graph elements:

pattern_element1<conditions1> pattern_element2<conditions2> pattern_element3<conditions3>

Conditions are expressed in relation to the features of a graph element. The conditions are contained between angled brackets ("<" and ">") and there may be several:

pattern_element<value1_left=value1_right, value2_left=value2_right, ...>

Each condition takes the form of an equality or an inequality:

value_left=value_right

or

value_left!=value_right

...where the left and right values can be constants or functions.
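
The following Python sketch models how such conditions might be evaluated against one graph element. It rests on several assumptions that are not stated above: an element is represented as a dictionary of features, a function is any callable that takes the element, and several conditions on one element must all hold. The names resolve, condition_holds, element_matches and text are hypothetical.

```python
def resolve(value, element):
    """A value is either a constant or a function of the element."""
    return value(element) if callable(value) else value


def condition_holds(element, left, op, right):
    """Evaluate one condition: '=' for equality, '!=' for inequality."""
    l, r = resolve(left, element), resolve(right, element)
    if op == "=":
        return l == r
    if op == "!=":
        return l != r
    raise ValueError("unknown operator: " + op)


def element_matches(element, conditions):
    """Assumption: every listed condition must hold for the element to match."""
    return all(condition_holds(element, *c) for c in conditions)


def text(element):
    """Stand-in for a function such as text(): returns the element's text."""
    return element["text"]


token = {"text": "Paris", "kind": "Token"}
print(element_matches(token, [(text, "=", "Paris")]))                            # True
print(element_matches(token, [(text, "!=", "London"), ("Token", "=", "Token")]))  # True
print(element_matches(token, [(text, "=", "Paris"), (text, "!=", "Paris")]))      # False
```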

Output phrases - creating a link

The most common output phrase when matching a sequence creates a new link.

Syntax:

The syntax for link creation is:

value_left=> new_link_name<feature1=value1, feature2=value2...>

The features within the angle brackets ("<>") are then added to the newly created link.

Example:

Let's look at a full EES rule for this:

Token<text()="hello">
Term.symbolic
Token<text()="world">
Term.symbolic
> Exclamation<welcome=>

The Sintelix Text Graph view (above) shows that the link "exercise:Exclamation" has been created over the text "hello, world!" with the feature "welcome" set to "true".
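
The effect of this rule can be modelled in Python as a rough sketch. It is not how Sintelix works internally. The assumptions are: Term.symbolic matches a single punctuation token (inferred from the example text), tokens are found with a regular expression, and a link is a plain dictionary holding a name, a character span and features. All function and field names are hypothetical.

```python
import re
from collections import namedtuple

Tok = namedtuple("Tok", "kind text start end")


def tokenize(text):
    """Split text into word tokens and single-character symbol tokens."""
    pattern = r"(?P<word>\w+)|(?P<symbol>[^\w\s])"
    return [Tok(m.lastgroup, m.group(0), m.start(), m.end())
            for m in re.finditer(pattern, text)]


# One predicate per pattern element, in order.
PATTERN = [
    lambda t: t.kind == "word" and t.text == "hello",   # Token<text()="hello">
    lambda t: t.kind == "symbol",                        # Term.symbolic (assumed)
    lambda t: t.kind == "word" and t.text == "world",   # Token<text()="world">
    lambda t: t.kind == "symbol",                        # Term.symbolic (assumed)
]


def create_links(text, name, features):
    """Create one link over each run of tokens that satisfies PATTERN."""
    toks = tokenize(text)
    n = len(PATTERN)
    links = []
    for i in range(len(toks) - n + 1):
        window = toks[i:i + n]
        if all(test(tok) for test, tok in zip(PATTERN, window)):
            links.append({"name": name, "start": window[0].start,
                          "end": window[-1].end, "features": dict(features)})
    return links


text = "Say hello, world! again"
links = create_links(text, "Exclamation", {"welcome": "true"})
for link in links:
    print(link["name"], repr(text[link["start"]:link["end"]]), link["features"])
```

The new link spans from the first matched element to the last, which is why it covers the whole of "hello, world!" including the punctuation.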

To see the output of your rules in a document, use the "tag:" namespace, as described under "Creating Text References in documents".

Creating Text References in documents

To create a visible text reference, use the namespace "tag" with the output phrase.

> tag:tag_name<...> // where "..." are feature settings

Example:

We have some text below:

The blue car, which was leaking oil, drove west.

We apply the following matching pattern:

Token<text()="blue"> 
Token<text()="car">
Term.symbolic
>tag:Blue_car

This rule will fire on the text segment "blue car" and generate a text reference with tag name "Blue_car", as shown here.

Set a Speaker

Within the output phrase, use the @setspeaker command to rename the speaker channel containing the current pattern. Set the name label to a "string" value (usually the name of the entity).

> @setspeaker<name=$speaker.text()>

Example:

(lookup:Person-title?
lookup:Person.name+) = $speaker
> @setspeaker<name=$speaker.text()>
> tag:Speaker

The Text Editor

The text editor provides code highlighting.

To list the available programming primitives, place the cursor in the text area and press Ctrl+Space.

Testing and debugging Entity Extraction Scripts

The most productive method for testing EES involves using the Text Graph analyser (see Text Graph analyser and Testing scripts with the Text Graph analyser).

There are some useful capabilities described in Debugging Entity Extraction Scripts.

Including Entity Extraction Scripts in the Document Processing workflow

EES can be added into the Sintelix workflow so that they can be used to process documents in bulk.

You can run your EES at one of two stages in the Sintelix workflow: Early or Late. Built-in learned entities (with link types such as tag:Person, tag:Organisation and tag:Location) are only available for use in matching patterns when the script is inserted for Late Stage processing.

EES are inserted into the Sintelix workflow via the Document Processing Configuration, which can be accessed via the Configuration menu or directly from the Text Graph analyser. The image below shows the document processing configuration needed to run the example EES.

EES are run over documents in the order in which they appear (from top to bottom) in the tables shown here.

Acronym detection is enabled only for text references created by an EES added at the Early stage. Acronyms are not identified for text references created by a Late script, so a Late script must handle acronym detection itself.
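
The difference between the two stages can be pictured with a small Python sketch. It is a simplified model built on one assumption drawn from the description above: built-in learned entities are produced after the Early scripts and before the Late scripts. The function names, and the use of plain strings to stand for links, are hypothetical.

```python
def run_workflow(text, early_scripts, builtin_entities, late_scripts):
    """Run scripts in list order; each script sees the links created so far."""
    links = []
    for script in early_scripts:          # top to bottom
        links.extend(script(text, links))
    links.extend(builtin_entities(text, links))
    for script in late_scripts:           # top to bottom
        links.extend(script(text, links))
    return links


def builtin_entities(text, links):
    """Stand-in for built-in entity detection (hypothetical)."""
    return ["tag:Person"] if "Alice" in text else []


def greeting_script(text, links):
    """Stand-in for an EES whose matching pattern relies on tag:Person."""
    return ["tag:Greeting"] if "tag:Person" in links else []


# Inserted Early, the script cannot see tag:Person, so it creates nothing.
print(run_workflow("Hello Alice", [greeting_script], builtin_entities, []))
# Inserted Late, the script sees tag:Person and fires.
print(run_workflow("Hello Alice", [], builtin_entities, [greeting_script]))
```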
