Collection Evaluation tab

Background
Once you have created a Rule Set and added documents to a Gold Standard Collection, you can create and modify the rules.
Refer to Create a Rule Set: Sintelix Extension, Harvest to a Gold Standard Collection and Evaluate and Modify the Rule Set for information.
When you select Configurations > Harvester Rule Sets, the Rule Set panes are displayed.
The Collection Evaluation tab is displayed in the left pane.
Collection Evaluation tab
When you open a Rule Set, the last Collection linked to the Rule Set will be displayed in the Collection Evaluation tab.
On this pane, you can perform three key tasks:
- Selecting the Gold Standard Collection
- Updating the Evaluation Table
- Selecting a Document to Evaluate and refine the rules.
Selecting the Gold Standard Collection
On the Collection Evaluation tab, confirm that the Gold Standard Collection associated with the Rule Set you are evaluating and modifying is selected.
If the Rule Set was originally copied or imported, you may need to:
- create a new Gold Standard Collection, and then select the Collection
- select and/or change the Collection linked to the Rule Set.
Updating the Evaluation Table
You need to update the Evaluation Table when you:
- change the collection, or
- add/remove documents from the collection.
When you change a rule, you can update the Evaluation Table to evaluate the updated rules against the Gold Standard Collection of documents.
Selecting a Document to Evaluate
To select a document to evaluate, select the document name from the Evaluation Table.
Result: The Full Page document will be displayed in the centre pane and the Gold Standard document will be displayed in the right pane, under the Gold Standard tab.
Once you have selected the document you want to use to evaluate and refine the rules, you can select the collapse icon to collapse the Collection Evaluation pane. This gives you more screen space as you create and modify the rules.
Understanding the Evaluation Table
The Evaluation Table:
- lists the documents in the selected Gold Standard Collection
- evaluates the rules against each document to count the correct, spurious or missed elements
- calculates a summary score for each column and an overall score.
Element Status: Colour Coding
Elements are colour coded to indicate their status:
- correct means that the elements are in the Gold Standard and have been selected by the rule set.
- spurious means that the elements are not in the Gold Standard but have been selected by the rule set.
  Corrective Action: To correct this error, you can either:
  - add the element to the Gold Standard, or
  - change/remove the rule to avoid capturing the spurious element.
- missed means that the elements are in the Gold Standard but have not been selected by the rule set.
  Corrective Action: To correct this error, you can either:
  - remove the element from the Gold Standard document, or
  - change/add a rule to capture the missed element.
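The three statuses above can be pictured as set operations between the elements in the Gold Standard and the elements selected by the rule set. The following is a minimal Python sketch of that idea only, not Sintelix code: the `Element` type, the `classify_elements` function and the exact-match comparison on span and label are all assumptions made for illustration.

```python
from typing import NamedTuple


class Element(NamedTuple):
    """Hypothetical element: a character span plus a label (assumed shape)."""
    start: int
    end: int
    label: str


def classify_elements(gold: set[Element], selected: set[Element]) -> dict[str, set[Element]]:
    """Split elements into the three statuses shown in the Evaluation Table.

    Assumes an element only counts as matched when span and label are identical.
    """
    return {
        "correct": gold & selected,    # in the Gold Standard and selected
        "spurious": selected - gold,   # selected but not in the Gold Standard
        "missed": gold - selected,     # in the Gold Standard but not selected
    }


# Illustrative data only.
gold = {Element(0, 5, "Name"), Element(10, 20, "Date")}
selected = {Element(0, 5, "Name"), Element(30, 34, "Name")}
statuses = classify_elements(gold, selected)
```

In this example one element falls into each status, which mirrors the two corrective routes described above: change the Gold Standard so the sets agree, or change the rules so the selection agrees.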
The Errors tab lists each error summarised in the Evaluation Table and provides a quick and easy way to clear errors. See Errors tab: Quickly Fix Errors.
Rule Set Scoring
The F1 score is a single measure that combines precision (the proportion of elements selected by the rule set that are text you want) and recall (the proportion of elements you want that the rule set selects, that is, whether it is missing a few or many elements). An F1 score of 1 indicates perfect precision and recall.
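As a worked illustration, the standard F1 definition can be computed directly from the correct, spurious and missed counts. The Python sketch below uses that standard formula; the function name `f1_score`, the sample counts, and the choice to obtain an overall score by summing counts across documents are assumptions, since how Sintelix aggregates its overall score is not stated here.

```python
def f1_score(correct: int, spurious: int, missed: int) -> float:
    """Standard F1: the harmonic mean of precision and recall."""
    selected = correct + spurious   # everything the rule set selected
    in_gold = correct + missed      # everything in the Gold Standard
    precision = correct / selected if selected else 0.0
    recall = correct / in_gold if in_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-document counts: (correct, spurious, missed).
rows = {"doc_a": (8, 2, 2), "doc_b": (5, 0, 0)}

per_document = {name: f1_score(*counts) for name, counts in rows.items()}

# Assumed aggregation: sum each column, then score the totals.
totals = tuple(sum(column) for column in zip(*rows.values()))
overall = f1_score(*totals)
```

With these sample counts, a document with no spurious or missed elements scores 1, while any spurious or missed element pulls the score below 1.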
Click on a document in the table to display the Full Page document with correct, spurious and missed elements highlighted.