Harvest via Sintelix Extension
What is it
You can accurately select specific content to harvest from web pages using the Sintelix Extension.
You can manually select elements or choose the advanced option to apply a Rule Set to select elements.
For more information, see About the Sintelix Extension for Harvesting and Install Sintelix Extension.
Access the Manual Harvest Feature
To access the Manual Harvest feature on the web page you want to harvest, either:
- select the Sintelix Extension icon from the address bar, and then select the Manual Harvest option, OR
- press the keyboard shortcut that opens the Manual Harvest screen (see Keyboard Shortcuts).
Result: The Harvest This Page - Screen 1 is displayed.
Summary Process
This is a summary of the high-level process:
- Choose how you want to select elements from the page:
  - one at a time,
  - with Similar elements on the page included as you select, or
  - automatically, by applying a selected Rule Set.
- Select the elements you want to harvest, then select Next.
- Choose the Project and Collection where you want to save the harvested page.
- Select Harvest.
Detailed instructions
Follow the steps below to harvest manually using the Sintelix Extension:
- Choose how to select elements from the page:
  - one at a time, or
  - have Sintelix include Similar elements from the page.
- (Advanced Option) Select the Advanced checkbox to choose a Rule Set that automatically selects page elements to harvest.
  Apply Rule Set to automatically select elements:
  Select the Advanced checkbox, and under Automatically Select Page Elements:
  - Select/change the project.
  - Select the rule set you want to apply.
  - Click Apply.
  Result: Sintelix applies the Rule Set and automatically selects elements on the page.
  Because web pages vary and rule sets differ in effectiveness, you may need to manually select and unselect elements to ensure that you harvest exactly what you want.
- Select and unselect elements on the page using the mouse or the Keyboard Shortcuts.
  Result: As you select and unselect elements, the Sintelix Extension displays a message showing the number of elements selected.
  Click on the Shortcuts link to view a list of available Keyboard Shortcuts.
- Select the Next button.
  Result: The Harvest This Page - Screen 2 is displayed.
- Select/change the Project and Collection, if required.
  Select the Refresh icon to update the list of projects.
  (Optional) Add a New Collection
  - Select the Add new link to add a new Collection to the project.
    Result: The Collection dropdown changes to a data entry field.
  - Enter a collection name.
  - Select the tick icon beside the field to create the collection.
    Result: The collection is created, and the Collection dropdown is displayed showing the new collection name.
- (Advanced Option) Leave Full Page checked to harvest the Full Page document in addition to the harvested document (two documents are collected).
  The Full Page document is used for evaluating Gold Standard documents and refining Harvester Rule Sets.
  Uncheck Full Page to harvest the content only.
- Select the Harvest button to harvest the page.
  Result: The dialog displays a message confirming the page has been harvested.
  Click on the name of the document to view the harvested document in Sintelix.
Keyboard Shortcuts
Select the Shortcuts link to view the keyboard shortcuts.
Select and Unselect Elements
There are different techniques available to select and unselect elements for harvesting:
Selection mode
Using keyboard shortcuts, you can easily switch between selecting elements:
- one at a time, or
- similar elements at the same time.
Select elements using the mouse
Move your mouse to elements of interest and click on them to select or unselect them:
- Hover over an element. The element, or group of elements, is highlighted in yellow. Move the mouse until only the elements you want to select are highlighted.
- Click on the element to select it (and similar elements, if in similar elements mode). The element is highlighted in green with a green border.
Elements highlighted green will be included in the harvested document.
Unselect elements using the mouse
- Hover over an already selected element. The element is highlighted in red.
- Click on the element to unselect it.
The element then has no highlighting, indicating it is not selected and will not be included in the harvested document.
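The highlighting behaviour described in the two mouse sections above can be summarised as a small state model. The Python sketch below is illustrative only: it is not Sintelix code, the names highlight_for and click are hypothetical, and it assumes that hovering takes precedence over the green "selected" highlight.

```python
from typing import Optional, Set


def highlight_for(selected: bool, hovered: bool) -> Optional[str]:
    """Return the highlight colour for an element, or None for no highlighting.

    Assumption: while the mouse hovers over an element, the hover colour
    (yellow or red) is shown instead of the green "selected" colour.
    """
    if hovered:
        # Hovering a selected element warns that a click will unselect it.
        return "red" if selected else "yellow"
    return "green" if selected else None


def click(selection: Set[str], element: str) -> None:
    """Toggle an element: a click selects it, a second click unselects it."""
    if element in selection:
        selection.discard(element)
    else:
        selection.add(element)
```

In this model, only elements that remain in the selection set (shown in green when not hovered) would be included in the harvested document.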
Select or unselect active elements without triggering an action
- Press the keyboard shortcut for this action (listed under the Shortcuts link).
- Click on the element to select/unselect it.
Clear ALL selections
To clear all current selections, select the Clear button.
Alternatively, press the keyboard shortcut for clearing selections (listed under the Shortcuts link).