Full Page Document pane

Background
Once you have created a Rule Set and added documents to a Gold Standard Collection, you can create and modify the rules.
Refer to Create a Rule Set: Sintelix Extension, Harvest to a Gold Standard Collection and Evaluate and Modify the Rule Set for information.
When you select Configurations > Harvester Rule Sets, the Rule Set panes are displayed. The Full Page document pane is the middle pane.
What you can do
The Full Page pane displays the Full Page document containing all the original HTML.
Using the Full Page document, you can:
- Visualise the effect of selected rules on the Full Page document
- Inspect an element by selecting the element so you can:
  - create a new rule for the selected element
  - add the selected element to the Gold Standard document
  - remove the selected element from the Gold Standard document
- Change the zoom on the pane between small, medium and large
- Switch Views between visual mode and plain document mode, which can be useful when the visual mode is not displaying correctly.
Visualise the effect of selected rules
You can see the effect of the rules on the original Full Page document. The elements affected by rules are highlighted with colour coding.
Links are highlighted with a slightly darker tone and images have a shaded overlay.
If no rules are selected in the Rules tab, the effect of all rules is shown.
If one or more rules are selected in the Rules tab, only the effect of the selected rules is shown.
You can also select the Gold Standard tab in the right pane to compare the marked up original document with the harvested Gold Standard document.
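The rule-selection behaviour described above amounts to a simple fallback. The following is a minimal sketch only; the function name `rules_to_visualise` and the use of plain strings for rules are assumptions made for illustration and are not part of the product.

```python
def rules_to_visualise(all_rules, selected_rules):
    """Return the rules whose effect would be highlighted.

    Hypothetical helper: if nothing is selected, every rule is shown;
    otherwise only the selected rules are shown.
    """
    if selected_rules:
        return list(selected_rules)
    return list(all_rules)


all_rules = ["keep article body", "remove sidebar", "follow next-page links"]
print(rules_to_visualise(all_rules, []))                  # all three rules
print(rules_to_visualise(all_rules, ["remove sidebar"]))  # only the selected rule
```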
Colour Coding
Elements are colour coded to indicate their status:
Colour | Gold Standard | Rule | Corrective Action |
---|---|---|---|
Correct (Green) elements | Included | Covered | None Required |
Spurious (Orange) elements | Missing | Covered | Add to Gold Standard or modify/remove rule to no longer capture spurious element |
Missing (Pink) elements | Included | Missing | Remove from Gold Standard or create/modify rule to capture missing element |
Unselected (no colour) elements | Not Included | None Included | None Required |
A quick trick to remember the colours: the golden colour (orange) is missing from the Gold Standard document.
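The colour coding can be read as a function of two facts about an element: whether it is in the Gold Standard document, and whether a rule covers it. The sketch below illustrates that reading; the `Status` enum and the `classify` function are hypothetical names, not part of the product.

```python
from enum import Enum


class Status(Enum):
    CORRECT = "green"     # in the Gold Standard and covered by a rule
    SPURIOUS = "orange"   # covered by a rule but not in the Gold Standard
    MISSING = "pink"      # in the Gold Standard but not covered by a rule
    UNSELECTED = "none"   # neither in the Gold Standard nor covered


def classify(in_gold_standard: bool, covered_by_rule: bool) -> Status:
    """Map the two facts about an element to its colour status."""
    if in_gold_standard and covered_by_rule:
        return Status.CORRECT
    if covered_by_rule:
        return Status.SPURIOUS
    if in_gold_standard:
        return Status.MISSING
    return Status.UNSELECTED


print(classify(in_gold_standard=False, covered_by_rule=True).value)
```

Under these assumptions, an element that a rule captures but that is absent from the Gold Standard is classified as spurious, which matches the Orange row of the table.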
Inspect an element
When you click on an element, the Path in Document dialog is shown.
The Path in Document dialog identifies:
- the element path - with the elements covered by a rule coloured with the same colour coding
- a description of the element, e.g. text, number of characters, and a short excerpt of the text
- the first rule that affected this element
- whether the element is in the Gold Standard document or not
The options displayed on the dialog change depending on the status (colour) of the element selected.
Path in Document dialog: Actions
The options available on the Path in Document dialog depend on the colour of the element selected.
A quick way to view and fix spurious and missing elements is to go to the Errors tab. The Errors tab lists all spurious and missing elements, so you can quickly fix them all at once. See Errors tab: Quickly Fix Errors.
You can choose the action required on the selected element.
- Apply selection to similar elements determines the effect of the change. For example, you would have this:
  - checked if you wanted to add or remove all similar elements from the Gold Standard document
  - unchecked if you wanted to create a separate rule to differentiate this element from similar elements
- The option to remove the element from the Gold Standard is shown if the element is Correct (Green) or Missing (Pink).
  - Removing a Pink element (or a black element with Pink text) from the Gold Standard will remove all colour coding (this element will not be harvested).
  - Removing a Green element from the Gold Standard will change it to Orange (spurious).
- The option to add the element to the Gold Standard is shown if the element has no colour or is Spurious (Orange).
  - Adding an element with no colour to the Gold Standard will change the element to Pink, indicating it is missing a rule.
  - Adding an Orange element to the Gold Standard will turn it Green: the Rules and the Gold Standard are aligned.
- The option to add the element to the Gold Standard and create a rule is shown if the element has no colour or is Spurious (Orange). This will add the element to the Gold Standard and open the Rule dialog so you can create a new rule to act on this element.
- The option to create a rule only will make no change to the Gold Standard document. It will open the Rule dialog so you can create a new rule to act on this element.
- The final option closes the dialog with no action taken.
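The colour changes described for adding and removing elements follow from treating Gold Standard membership and rule coverage as two independent flags. The sketch below models this; the `Element` class and the two helper functions are hypothetical and only illustrate the transitions listed above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Element:
    in_gold_standard: bool
    covered_by_rule: bool

    def colour(self) -> str:
        """Colour implied by the two flags."""
        if self.in_gold_standard and self.covered_by_rule:
            return "green"
        if self.covered_by_rule:
            return "orange"
        if self.in_gold_standard:
            return "pink"
        return "none"


def add_to_gold_standard(element: Element) -> Element:
    # Only Gold Standard membership changes; rule coverage is untouched.
    return replace(element, in_gold_standard=True)


def remove_from_gold_standard(element: Element) -> Element:
    return replace(element, in_gold_standard=False)


green = Element(in_gold_standard=True, covered_by_rule=True)
print(remove_from_gold_standard(green).colour())  # orange
```

The same model gives the other three transitions: removing a Pink element leaves no colour, adding an uncoloured element gives Pink, and adding an Orange element gives Green.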
You can have multiple rules acting on the same element. They are implemented in the order they are listed in the Rule Set. For example, you may want to follow links in an element with one rule, and then remove the element from the final document with another rule.
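As a rough illustration of rules acting on the same element in list order, the sketch below applies a sequence of rule functions one after another. Everything here is assumed for illustration: the `Element` class, the `follow_links` and `remove_element` rules, and the `apply_rules` helper are hypothetical and do not reflect how the product is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    links: list = field(default_factory=list)
    kept: bool = True  # whether the element stays in the final document


def follow_links(element, queue):
    """Hypothetical rule: queue the element's links for harvesting."""
    queue.extend(element.links)


def remove_element(element, queue):
    """Hypothetical rule: drop the element from the final document."""
    element.kept = False


def apply_rules(element, rules):
    """Apply each rule in list order; return the link queue and a trace."""
    queue, trace = [], []
    for rule in rules:
        rule(element, queue)
        trace.append(rule.__name__)
    return queue, trace


nav = Element("navigation", links=["/page/2", "/page/3"])
queue, trace = apply_rules(nav, [follow_links, remove_element])
print(trace)     # ['follow_links', 'remove_element']
print(queue)     # ['/page/2', '/page/3']
print(nav.kept)  # False
```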
Change the zoom
You can change the zoom on the Full Page document by selecting small, medium or large at the bottom right of the pane.
This can be useful when used in combination with resizing the panes.
Switch Views
You can switch the Full Page document view by checking or un-checking the Plain Doc checkbox in the top right of the pane.
The Plain Doc view displays the Full Page document stripped of all styles and CSS classes.
In some circumstances, this can improve visibility of the elements, making it easier to select the elements you want.
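To give a feel for what a document stripped of styles and CSS classes looks like, the sketch below removes `<style>` blocks and `style`/`class` attributes from an HTML string using Python's standard `html.parser` module. This is an assumption-laden illustration, not the product's implementation: `PlainDocParser` and `to_plain_doc` are hypothetical names, and comments and doctype declarations are simply dropped.

```python
from html import escape
from html.parser import HTMLParser


class PlainDocParser(HTMLParser):
    """Re-emit HTML without <style> blocks or style/class attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True
            return
        kept = [(k, v) for k, v in attrs if k not in ("style", "class")]
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in kept
        )
        self.parts.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False
            return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Skip the CSS text inside <style> blocks; keep everything else.
        if not self._in_style:
            self.parts.append(escape(data, quote=False))


def to_plain_doc(html: str) -> str:
    parser = PlainDocParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


page = '<style>p{color:red}</style><p class="lead" style="margin:0">Hi <a href="/a" class="btn">link</a></p>'
print(to_plain_doc(page))
```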