Harvest via Sintelix Extension

What is it

Using the Sintelix Extension, you can select exactly which content to harvest from a web page.

You can select elements manually, or use the Advanced option to apply a Rule Set that selects elements automatically.

For more information, see About the Sintelix Extension for Harvesting and Install Sintelix Extension.

Access the Manual Harvest Feature

To access the Manual Harvest feature from the web page you want to harvest, you can either:

  • Select the Sintelix Extension icon in the address bar, and then select the Manual Harvest option.

OR

Result: The Harvest This Page - Screen 1 is displayed.

Process Summary

The high-level process is:

  1. Choose how you want to select elements from the page:

    • one at a time,

    • including similar elements on the page as you select, or

    • automatically, by applying a selected Rule Set.

  2. Select the elements you want to harvest, then select Next.

  3. Choose the Project and Collection where you want to save the harvested page.

  4. Select Harvest.

Detailed instructions

Follow these steps to harvest manually using the Sintelix Extension:

  1. Access the Manual Harvest Feature

  2. Choose how to select elements from the page:

    • one at a time, or

    • with Sintelix also selecting similar elements on the page.

  3. (Advanced option) Select the Advanced checkbox, then choose a Rule Set to select the page elements to harvest automatically.

  4. Select the elements to harvest, using the mouse or Keyboard Shortcuts on the page.

    Result: As you select and unselect elements, the Sintelix Extension displays a message showing the number of elements selected.

    Select the Shortcuts link to view a list of the available keyboard shortcuts.

  5. Select the Next button.

    Result: The Harvest This Page - Screen 2 is displayed.

  6. If required, select or change the Project and Collection.

    Select the Refresh icon to update the list of projects.

  7. (Advanced option) Leave the Full Page checkbox selected to harvest the Full Page document in addition to the harvested content, so that two documents are collected.

    The Full Page document is used for evaluating Gold Standard documents and refining Harvester Rule Sets.

    Uncheck Full Page to harvest the content only.

    For more information, see Concept: Harvester Gold Standard.

  8. Select the Harvest button to harvest the page.

    Result: The dialog displays a message confirming the page has been harvested.

    Select the document name to view the harvested document in Sintelix.

Keyboard Shortcuts

Select the Shortcuts link to view the keyboard shortcuts.

Select and Unselect Elements

You can select and unselect elements for harvesting in several ways:

Selection mode

Use keyboard shortcuts to switch between selecting elements:

  • one at a time (ALT+A), or

  • similar elements at the same time (ALT+S).

Select elements using the mouse

Move the mouse over the elements you are interested in, and click them to select or unselect them:

  • Hover over an element: the element (or elements) is highlighted in yellow. Move the mouse until only the elements you want to select are highlighted.

  • Click the element to select it (and similar elements, if you are in similar-elements mode): the element is highlighted in green with a green border.

    Elements highlighted green will be included in the harvested document.

Unselect elements using the mouse

  • Hover over an already selected element: the element is highlighted in red.

  • Click an already selected element to unselect it.

    The highlighting is removed, indicating that the element is not selected and will not be included in the harvested document.

Select or unselect active elements without triggering an action

  1. Press ALT+W.

  2. Click the element to select or unselect it.

Clear ALL selections

To clear all current selections, select the Clear button.

Alternatively, press ESC.
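The selection and highlighting rules described in this section can be summarised as a small state model. The following is a hypothetical TypeScript sketch for illustration only, not the Sintelix Extension's own code: the SelectionModel class, the element IDs and the findSimilar callback (which stands in for however the extension decides which elements are similar) are all assumptions. It also assumes that clicking a selected element in similar-elements mode unselects only that element, because the steps above do not say otherwise.

```typescript
// Hypothetical model of the selection rules described above.
// Not the Sintelix Extension's implementation; names are illustrative only.

type Mode = "one" | "similar";
type Highlight = "none" | "yellow" | "green" | "red";

class SelectionModel {
  private readonly selected = new Set<string>();
  private mode: Mode = "one";
  // Assumption: a caller-supplied callback returns the IDs treated as similar to `id`.
  private readonly findSimilar: (id: string) => string[];

  constructor(findSimilar: (id: string) => string[]) {
    this.findSimilar = findSimilar;
  }

  // ALT+A switches to one-at-a-time mode; ALT+S switches to similar-elements mode.
  onShortcut(key: "ALT+A" | "ALT+S"): void {
    this.mode = key === "ALT+A" ? "one" : "similar";
  }

  // Highlight shown while the mouse hovers over an element.
  hoverColour(id: string): Highlight {
    return this.selected.has(id) ? "red" : "yellow";
  }

  // Highlight shown when the mouse is elsewhere.
  restingColour(id: string): Highlight {
    return this.selected.has(id) ? "green" : "none";
  }

  // Clicking an unselected element selects it (plus similar elements in
  // similar-elements mode); clicking a selected element unselects only it.
  click(id: string): void {
    if (this.selected.has(id)) {
      this.selected.delete(id);
      return;
    }
    const targets = this.mode === "similar" ? [id, ...this.findSimilar(id)] : [id];
    for (const target of targets) {
      this.selected.add(target);
    }
  }

  // The Clear button or ESC removes every selection.
  clear(): void {
    this.selected.clear();
  }

  // Number of selected elements, as reported in the selection message.
  get count(): number {
    return this.selected.size;
  }
}
```

For example, if findSimilar returns two other paragraphs for paragraph "p1", pressing ALT+S and then clicking "p1" leaves the model with three selected elements; clicking one of them again brings the count back to two.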