Address Other Textblocks from EES

Defining how to find a different textblock

To address a different textblock, first define how to find it relative to the textblock being analysed. This is done with a #scope directive.

Each scope has a name, which enables the script to refer to it later, and a sequence of rules that describe how to navigate within the document structure.

A simple example

#scope PreviousTextblock
prev-textblock
#

This directive defines a new scope called ‘PreviousTextblock’. The definition has a single instruction ‘prev-textblock’ which instructs the system to move from the current textblock to the one before it.

Using #scope directives - the #with syntax

After a #scope has been defined, it can be used in an entity extraction script rule using the #with directive as follows:

#with name-of-scope
[matching pattern in the other textblock]
#
[matching pattern in the current textblock]
> [output]

Without a #with directive, the matching pattern always applies to the "current" textblock. If there are one or more #with directives, the pattern inside each of them applies to the other textblock selected by its scope. The #with directives can appear before or after the main pattern.

A simple example

#scope PreviousTextblock
prev-textblock
#
#with PreviousTextblock
Token<text()="CODE">
#
Token.number
> tag:Code

The above example has a main rule that matches any number and tags it as Code. However, the rule is executed with a #with condition that uses the PreviousTextblock scope (defined above) and requires that the previous textblock contains a token with the text "CODE".

This ensures that the only numbers marked up as tag:Code are the ones that have the correct header.

The output (here tag:Code) is only created on top of the main matching pattern. However, an output can be created on top of the #with pattern by using labels:

#with PreviousTextblock
Token<text()="CODE"> = $heading
#
Token.number
> tag:Code
> $heading = tag:CodeHeading

However, no annotation can span more than one textblock, so a single output cannot cover both the main pattern and the #with pattern at the same time.

Debugging the #scope directives

To see how individual instructions of the #scope directive are executed:

1. Create and open an Entity Extraction Script configuration.
2. Create a #scope directive, such as the example above.
3. Click Save & Test.
4. Switch the Document Input Method to Document ID and enter the ID of an existing document to use as a test. The ‘Enter Text’ method is not helpful here because a practical test requires a full document structure.
5. Click Submit.
6. In the document view, find a textblock for which you would like to write an EES rule.
7. Right-click the magnifying glass icon.
8. Select ‘Test #scope from this textblock’.

A panel with the #scope testing tool is displayed.

The panel consists of the following parts:

The dropdown is a selector for the #scope rule to debug. In this example there is only one rule called ‘PreviousTextblock’.
‘starting from’ shows which textblock was chosen as the starting point. You can click on the link to highlight it in document view.

The ‘location’ shows where the starting textblock is located in its document. In this example, the textblock belongs to a cell, which belongs to a row, table, and the document root. Those structures can be highlighted in document view by clicking on them.

The table that follows shows each instruction and the result of executing this instruction. In this example, there is only one instruction ‘prev-textblock’ and the result of executing it is a single textblock. You can click on this textblock to highlight it.

Finally, there is a list of textblocks that were matched by this #scope rule. If more than one textblock is matched, they are tried in order, from the textblock closest to the starting point to the one furthest away. The list displayed here reflects this order.
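
The closest-first ordering can be pictured with a small Python sketch. It is only an illustration: the function name is hypothetical, and treating distance as the difference between document-order positions is an assumption rather than documented EES behaviour.

```python
from typing import List


def order_by_distance(matched: List[int], start: int) -> List[int]:
    """Order matched textblock positions from closest to furthest from start.

    Positions are hypothetical document-order indices. Using their absolute
    difference as the distance is an assumption made for this sketch.
    """
    return sorted(matched, key=lambda position: abs(position - start))


# Textblocks at positions 3, 7 and 12 were matched, starting from position 8.
print(order_by_distance([3, 7, 12], start=8))  # [7, 12, 3]
```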

For a more advanced example of a #scope directive, see ‘Full examples - simple analysis of a table’ below.

For a full list of instructions available in the #scope directive, see the reference table below.

The #scope syntax reference

The following tables list the instructions available within a #scope directive.

Textblock manipulation instructions

Textblock manipulation instructions match consecutive textblocks and ignore the structure.

Instruction	Meaning
prev-textblocks(min-max)	Textblocks before current selection, between [min] and [max] distance
next-textblocks(min-max)	Textblocks after current selection, between [min] and [max] distance
first-textblocks(min-max)	First textblocks within current selection, between [min] and [max] from the beginning
last-textblocks(min-max)	Last textblocks within current selection, between [min] and [max] from the end

All the textblock matching instructions have shortcuts that follow this pattern:

Instruction	Meaning
prev-textblocks(n)	same as prev-textblocks(1-n)
prev-textblock(n)	same as prev-textblocks(n-n)
prev-textblock	same as prev-textblocks(1-1)
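
To make the shortcut rules concrete, the following Python sketch rewrites each shortcut into its full (min-max) form. The helper name and the regular expression are hypothetical and exist only for this illustration. Forms that the tables above do not list are rejected.

```python
import re

# Hypothetical helper, not part of EES: it only mirrors the shortcut table.
_TEXTBLOCK = re.compile(
    r"^(prev|next|first|last)-textblock(s?)(?:\((\d+)(?:-(\d+))?\))?$"
)


def expand_textblock_shortcut(instruction: str) -> str:
    """Rewrite a textblock instruction into its '<dir>-textblocks(min-max)' form."""
    match = _TEXTBLOCK.match(instruction.strip())
    if match is None:
        raise ValueError(f"not a textblock instruction: {instruction!r}")
    direction, plural, first, second = match.groups()
    if first is None:
        if plural:
            raise ValueError("plural form without a number is not documented")
        low, high = 1, 1                      # prev-textblock
    elif second is not None:
        if not plural:
            raise ValueError("singular form with a range is not documented")
        low, high = int(first), int(second)   # already the full form
    elif plural:
        low, high = 1, int(first)             # prev-textblocks(n)
    else:
        low, high = int(first), int(first)    # prev-textblock(n)
    return f"{direction}-textblocks({low}-{high})"


print(expand_textblock_shortcut("first-textblocks(2)"))  # first-textblocks(1-2)
print(expand_textblock_shortcut("first-textblock(2)"))   # first-textblocks(2-2)
```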

Examples:

prev-textblock //the previous textblock
first-textblocks(2) //the first two textblocks
first-textblock(2) //the second textblock

Expand instruction

Instruction	Meaning
expand-structure(name)	Finds a structural element of a given name that contains the current selection

Example:

expand-structure(cell) //find the table cell that contains the current selection
expand-structure(table) //find the table that contains the current selection
expand-structure(list) //find the list that contains the current selection

The expand-structure instruction has the following shortcut:

Instruction	Meaning
expand-xxx	same as expand-structure(xxx)

Structure manipulation instructions

Structure manipulation instructions match consecutive structure elements.

Instruction	Meaning
prev-structures(name)(min-max)	Structures of given name before current selection, between [min] and [max] distance
next-structures(name)(min-max)	Structures of given name after current selection, between [min] and [max] distance
first-structures(name)(min-max)	First structures of given name within current selection, between [min] and [max] from the beginning
last-structures(name)(min-max)	Last structures of given name within current selection, between [min] and [max] from the end

All the structure matching instructions have shortcuts that follow this pattern:

Instruction	Meaning
prev-structures(xxx)(n)	same as prev-structures(xxx)(1-n)
prev-structure(xxx)(n)	same as prev-structures(xxx)(n-n)
prev-structure(xxx)	same as prev-structures(xxx)(1-1)
prev-xxxs(range)	same as prev-structures(xxx)(range)
prev-xxx(range)	same as prev-structure(xxx)(range)
prev-xxx	same as prev-structures(xxx)(1-1)

Examples:

first-cell //first table cell
first-cells(3) //first three table cells
first-cell(3) //third table cell

Text matching instructions

Instruction	Meaning
contains-text("substring")	find the first textblock of the current selection with the given sub-string text
equals-text("text")	find the first textblock of the current selection with the given text

These instructions match at most one textblock. The pattern is either a sub-string of the textblock (contains-text) or the entire textblock (equals-text). Matching is case-sensitive. If the pattern contains a quotation mark, use the sequence \" instead. If it contains a backslash, use \\ instead.
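
The escaping and matching rules can be sketched in Python as follows. The function names are hypothetical and the textblocks are modelled as plain strings. This is a model of the documented behaviour, not the EES implementation.

```python
from typing import List, Optional


def ees_text_literal(text: str) -> str:
    """Quote text for contains-text / equals-text.

    Backslashes are escaped first so that the backslashes added for
    quotation marks are not escaped a second time.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def find_first(textblocks: List[str], pattern: str, whole: bool = False) -> Optional[str]:
    """Return the first textblock that matches, or None.

    whole=False models contains-text (sub-string match) and whole=True
    models equals-text (entire textblock). Both are case-sensitive.
    """
    for textblock in textblocks:
        matched = textblock == pattern if whole else pattern in textblock
        if matched:
            return textblock
    return None


print("contains-text(" + ees_text_literal('Say "hi"') + ")")
print(find_first(["code", "CODE 12", "CODE"], "CODE"))              # CODE 12
print(find_first(["code", "CODE 12", "CODE"], "CODE", whole=True))  # CODE
```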

Table manipulation instructions

Table manipulation instructions are dedicated to finding data within a table. The table is analysed with column spans taken into account, so the resulting match can consist of multiple cells of one column that are otherwise not direct neighbours.

A table column is not a structure within a document. As a result, selecting a column is implemented as a selection of the individual table cells of that column.

Note that it is not possible to select multiple columns. As a result, instructions like first-column don't take a range of numbers the way first-structure does.

To avoid ambiguity when selecting columns, it is recommended to use these instructions when the current selection is a single cell or a single textblock. If the current selection consists of multiple cells, the first selected cell is used to derive the current column number.

Instruction	Meaning
expand-column	select all the cells belonging to current table column
first-column(n)	select the nth column of the current table (use expand-table if necessary to make sure that current selection is a table)
first-column	same as first-column(1)
last-column(n)	select the nth column from the right of the current table (use expand-table if necessary to make sure that current selection is a table)
last-column	same as last-column(1)
upper-cells(min-max)	select the table cells above the current selection, in the same column. See next-textblocks for how min/max are used and how shortcuts operate
lower-cells(min-max)	select the table cells below the current selection, in the same column. See next-textblocks for how min/max are used and how shortcuts operate
top-cells(min-max)	select the table cells at the top of the currently selected column. See next-textblocks for how min/max are used and how shortcuts operate
bottom-cells(min-max)	select the table cells at the bottom of the currently selected column. See next-textblocks for how min/max are used and how shortcuts operate
same-column	limit the current selection to only match the cells in the same column as the starting point
same-row	limit the current selection to only match the cells in the same row as the starting point
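
The way column spans affect column selection can be illustrated with a short Python model. Everything here is an assumption made for the sketch: the table is a list of rows, each cell is a (text, column span) pair, and a cell belongs to a column when its span covers that column.

```python
from typing import List, Tuple

Cell = Tuple[str, int]  # (cell text, column span); hypothetical representation
Row = List[Cell]


def column_ranges(row: Row) -> List[Tuple[int, int]]:
    """Return the (start, end) column range of every cell, end exclusive."""
    ranges = []
    start = 0
    for _, span in row:
        ranges.append((start, start + span))
        start += span
    return ranges


def expand_column(table: List[Row], row_index: int, cell_index: int) -> List[str]:
    """Model of expand-column: collect the cells in the starting cell's column."""
    column, _ = column_ranges(table[row_index])[cell_index]
    selected = []
    for row in table:
        for (text, _), (low, high) in zip(row, column_ranges(row)):
            if low <= column < high:
                selected.append(text)
    return selected


table = [
    [("ID", 1), ("Project", 1), ("Priority", 1)],
    [("1", 1), ("Apollo", 1), ("High", 1)],
    [("Notes", 2), ("Low", 1)],  # "Low" is the second cell but the third column
]

column = expand_column(table, row_index=1, cell_index=2)
print(column)     # ['Priority', 'High', 'Low']
print(column[0])  # the top cell of the column: Priority
```

In this model the cell "Low" is selected even though it sits at a different position in its row, which is the effect of taking column spans into account.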

allow-self instruction

Instruction	Meaning
allow-self	allow the output to contain the starting textblock

The result of textblock matching can contain the initial textblock, which is the one used to match the main pattern. By default, that textblock is filtered out.

If you include the instruction "allow-self" anywhere in the #scope directive, the initial textblock is allowed.

Examples:

#scope MyCellButNotMyTextblock
expand-cell
#
#scope MyCell
expand-cell
allow-self
#
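
The difference between the two scopes above can be modelled in a few lines of Python. The function name and the textblock identifiers are hypothetical, and the sketch only mirrors the filtering rule described in this section.

```python
from typing import List


def filter_scope_result(matched: List[str], start: str, allow_self: bool = False) -> List[str]:
    """Drop the starting textblock from the result unless allow_self is set."""
    if allow_self:
        return list(matched)
    return [textblock for textblock in matched if textblock != start]


# Hypothetical identifiers of the textblocks inside one table cell.
cell_textblocks = ["tb1", "tb2", "tb3"]

# Like MyCellButNotMyTextblock: the starting textblock is filtered out.
print(filter_scope_result(cell_textblocks, start="tb2"))  # ['tb1', 'tb3']

# Like MyCell: allow-self keeps the starting textblock.
print(filter_scope_result(cell_textblocks, start="tb2", allow_self=True))  # ['tb1', 'tb2', 'tb3']
```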

Full examples - simple analysis of a table

The following #scope shows how to mark up text in table cells based on their column heading:

#scope MyHeading
expand-column // go from the current textblock to the entire column
top-cell // consider only the very first cell of this column
#
#with MyHeading
Token<text()="Project">
#
Token+
> tag:ProjectName

The following #scope shows how to access the header of a table:

#scope MyTableHeader
expand-table // expand to the full table
prev-textblock // get the textblock before current table
#
#with MyTableHeader
Token+ = $tableIdentifier
#
// match the single cell "Project"
Node<isBreak=true>
Token<text()="Project">
Node<isBreak=true>
> @create<ns="tag", name=$tableIdentifier.text()>

The following #scope shows how this capability can be used to create links:

#scope MyRow
expand-row // find it anywhere in the same table row
#
#with MyRow
tag:ProjectName = $pro
#
tag:Priority = $prio
> @connect<project=$pro, priority=$prio> tag:PriorityLink