Harvest via Sintelix Extension
The Sintelix Extension is a Chrome Extension that enables you to harvest the page that you are on. It is particularly useful when you only need to harvest a handful of webpages and want to select the specific content to harvest from those pages. The Extension applies the most relevant rule set to the page.
- Sintelix Extension is available in Sintelix 7.0 and above.
- The extension requires Google Chrome or Chromium.
- Sintelix Extension is not installed by default and must be installed from the Chrome Web Store. See Install Sintelix Extension Online.
- You can create a gold standard for harvesting similar pages. See Harvest Web Pages for Gold Standard > Using Sintelix Extension for more information.
- You can harvest in incognito mode. See the steps under Enable incognito mode in this topic for more information.

Harvest Manually via Sintelix Extension
- Using Google Chrome or Chromium, go to the web page you want to harvest.
- Click the Sintelix Extension icon at the top right of the Google Chrome screen, and select Manual Harvest.
The Wikipedia page on Great Expectations is used as an example in this topic.
- On the Sintelix Harvester dialog, select one of the following:
- One at a time - Use this to manually select specific content on a web page. See step 5 of this topic for more details.
- Similar elements - Use this to select content of a similar element type. This is preferable when you want to select content of the same style. For example, if you want the heading content of a web page to be picked up, select this option and click a heading in the web page. This automatically selects all the headings in the page. See step 5 of this topic for more details.
- Select the Advanced check box, and in the Automatically Select Page Elements section do the following:
- Select your project.
- Select the rule set you want to apply to the page.
- Click Apply.
- The results displayed are similar to the following example. Note that only the body content is harvested, while the other parts of the page are ignored.
- The elements selected by the rule set are highlighted in green.
- Text shaded in black (such as superscript references to footnotes and hyperlinks to edit text), or surrounded by a black border (such as sidebars and tables of contents), will be excluded from the harvest.
- The black border indicates that even though text within the border has been selected for harvesting (green), the rule set states that the block containing the text is to be excluded from the harvest.
Due to the varying nature of web pages and the effectiveness of individual rule sets, you may need to manually select and cancel elements to ensure that you harvest exactly what you want.
- If you are not satisfied with the selection made by the rule set you can do the following:
- retain some of the selections made by the rule set and manually select/clear other elements
- clear all selections and manually select the elements you want
- apply a different rule set
- Do one or more of the following:
Select individual element
- Click One at a time under the Select the Page Elements you Want section.
- Hover over the element (it is highlighted in yellow) then click on it.
The element is highlighted in green to indicate that it is selected, as displayed in the following screenshot:
Clear the selection
- Select One at a time (if it is not already selected).
- Hover over the element (it is highlighted in pale red) then click on it.
The highlight is removed from the element, as displayed in the following screenshot:
Select the text on a button or an element that will trigger an action without triggering the action
- Hover over the element (so that it is highlighted in yellow).
- Press Alt+W on your keyboard.
The element is highlighted in green.
Select all similar elements (for example, all paragraphs or headings):
- Select Similar elements
- Hover over one of the elements.
All similar elements are highlighted in yellow.
- Select the element. All similar elements are highlighted in green to indicate that they have been selected.
Clear current selection or all similar elements
To clear all current selections, click Clear.
To clear similar elements:
- Select Similar elements (if it is not already selected).
- Hover over one of the elements (all similar elements are highlighted in pale red).
- Click the pale red highlight.
The green highlight is removed from the similar elements.
Apply a different rule set (to see which elements another set would select or exclude)
- Select the set from the rule set dropdown.
- Select Reapply.
The number of elements selected by the rule set is shown under the rule set dropdown list.
- When you are satisfied with the text selected for harvesting, do one of the following:
- Add the harvested text to an existing Sintelix Collection - Select the target project and its collection from the corresponding dropdown lists.
The number of documents in the collection is shown below the Selection modes.
- Create a new Sintelix collection for the harvested text:
- Select Add beside the Sintelix Collection dropdown list.
- Select the project you want to add the collection to.
- Enter a name for the collection, then select the tick icon beside the field.
- Do one of the following:
- To send only the selected elements to the Sintelix collection, click Harvest.
- To send the full page (that is, both the content and the boilerplate elements) to the collection, select Full page, then select Harvest. In Sintelix Harvester, content is the text you want to harvest from a web page, such as headings, authors, dates, captions and paragraphs (as opposed to the text you want to ignore from menus, sidebars and other boilerplate elements). Boilerplate elements are elements on websites other than the content, such as navigation bars, side bars, footers, menus and advertisements.
A message confirms that the document has been sent to the Sintelix collection.
- To close the dialog click Close.
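
The selection behaviour described in the steps above can be pictured with a small model. The following Python sketch is illustrative only and is not how the Sintelix Extension is implemented: the `Element` class, the `select_similar` and `harvest` functions, and the idea that "similar" means "same tag" are all assumptions made for this example. It shows that a selected (green) element inside an excluded (black) block is left out of a normal harvest, while a full-page harvest sends everything.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for one page element in the Harvester dialog."""
    tag: str                 # for example "h2", "p" or "nav"
    text: str
    selected: bool = False   # green highlight in the dialog
    excluded: bool = False   # shaded black or inside a black-bordered block


def select_similar(elements, clicked):
    """Select every element with the same tag as the clicked one (assumed rule)."""
    for el in elements:
        if el.tag == clicked.tag:
            el.selected = True


def harvest(elements, full_page=False):
    """Return the text that would be sent to the collection."""
    if full_page:
        # Full page: content and boilerplate are both sent.
        return [el.text for el in elements]
    # Otherwise only selected elements that are not excluded are sent.
    return [el.text for el in elements if el.selected and not el.excluded]


page = [
    Element("nav", "Site menu"),
    Element("h2", "Heading A"),
    Element("p", "Paragraph one"),
    Element("h2", "Contents", excluded=True),  # for example, a table of contents
    Element("p", "Paragraph two"),
]

select_similar(page, page[1])   # "Similar elements": click one heading
page[2].selected = True         # "One at a time": click one paragraph

selected_only = harvest(page)
everything = harvest(page, full_page=True)
print(selected_only)
print(len(everything))
```

In this model the "Contents" heading is selected along with the other heading, but it is dropped from `selected_only` because its block is marked as excluded, which mirrors the green text inside a black border described earlier.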

Add URL to Store
After you have harvested using a URL (see Harvest Using URL for more information), you may want to add the URL to the store for future reference.
To add the URL to store:
- Open a web page using Google Chrome or Chromium.
- Select the Sintelix Extension icon at the top right of the Google Chrome screen, then select Add URL to store.
The URL is stored for future reference. Whenever you perform a URL harvest, the stored page is displayed. Click the stored URL to harvest the page.
- If you want to clear the store, open the Sintelix Extension and select delete against the URL.

Enable Incognito mode
If you plan to use the Harvester Extension in incognito mode, you must enable the incognito setting.
Go to the Google Chrome/Chromium Extensions tab and, under Sintelix Harvester, select Allow in incognito, as shown in the following screenshot: