Harvest via Search Engines
The Harvester includes two preconfigured search engines, Google and DuckDuckGo, which let you run an instant search from within Sintelix. You can add further search engines by creating new rule sets. See Creating a rule set for more information. Note that the Sintelix Extension must be installed before you can create rule sets. See Harvest via Sintelix Extension for more information.
To harvest using the search engines:
- Select the project into which you want to harvest the documents.
- Navigate to Harvester, and in the Query tab, ensure Harvest Search Query is selected.
- In the Text Queries box, enter your search terms as you normally would in a search engine.
The search engines are enabled by default. To disable a search engine, clear its check box.
- Select a collection from the Select Collection dropdown.
If no collection is available, you can create one at this stage by clicking the create icon. See Create a Collection for more details.
- Click Harvest.

Optional Tasks
- Advanced - Expand the Advanced section to set up the Search Engine parameters, which narrow your search by selected criteria: Region, Language, Country, Indexed Date, and Results per Page.
For information on how to configure the Search Engine parameters, see Configuring Harvester rule sets.
- Online Persona - If a persona has been created, select it from the Online Persona dropdown list.
A persona is needed only if you are harvesting content from sites that require you to log in. See Create a login Persona for more information.
- Rule set Options - Expand the rule set Options to view the default rule sets.
- You can define the Depth of the rule set by entering a number in the field. This will override the default depth.
- To disable a rule set, clear its Enabled check box.
- Harvester rule sets are configurations and can only be modified or created by users with the required permissions. See Configuring Harvester rule sets for more information.
- Save - To save this search query for future reference, click Save.
To view the saved search, open the project, click Harvester > Saved > Open Query.
- Copy into Batch - To add this search to a batch job, click Copy into Batch, do one of the following, and select Save.
- Select a job from the Existing Batch Job dropdown.
- Type the new batch job name in the Create a new Batch Job text box.
See Batch Harvest for more information.
- Preview & Refine - Select Preview & Refine.
The results displayed in the Preview & Refine window will expire after 8 hours.
- Selecting Preview & Refine at this stage only displays the URLs. They are not part of your collection yet.
- To exclude URLs from your collection, clear their check boxes. The cleared URLs are automatically added to the Ignore URLs list.
- Note that ignoring URLs applies to the current search only. To apply the exclusion at the project level, click Add Unselected to Project Blacklist. See Blacklist for more information.
- Harvest Parameters - To set a parameter, select one or more of the check boxes in the Harvest Parameters section. The parameters are described below:
- Harvest Full Page - Harvests the complete web page, including both its content and its boilerplate elements. Each full page is saved as a separate document in the same collection. Select this option if you are harvesting a web page to create a gold standard.
- Capture Screenshots - Creates screenshots of the websites that are harvested.
- Disable Adblocker - Disables the installed adblock capability.
- Random Wait - To add a delay between page requests to the same domain, click Random Wait. In the Random Wait Time (per Domain) dialog, do the following:
- Enter the domain name in the Domain Group section. Use a separate line for each domain entry.
- Enter the minimum and/or maximum delay in seconds, and click Save.
Adding a delay at this stage overrides the delay settings configured by your administrator.
- Set as Default - To make your current selections for the search engines and their parameters, Online Persona, rule set Options, and Harvest Parameters the defaults, click Set as Default. The default settings apply to your current project only.
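
As an illustration of the Random Wait option described above, the following minimal Python sketch shows one way a per-domain random delay could be computed. It is not Sintelix code and does not reflect how the Harvester is implemented. The RANDOM_WAIT table, the wait_for function, the example domains, and the choice to treat subdomains as part of the listed domain are all assumptions made for this example.

```python
import random
from urllib.parse import urlparse

# Hypothetical settings: domain -> (minimum delay, maximum delay) in seconds.
# These mirror the Domain Group and minimum/maximum fields conceptually only.
RANDOM_WAIT = {
    "example.com": (2.0, 5.0),
    "example.org": (1.0, 1.0),
}


def wait_for(url, settings=RANDOM_WAIT, rng=random):
    """Return the number of seconds to pause before requesting url.

    Assumption: a listed domain also covers its subdomains.
    URLs on unlisted domains get no delay.
    """
    host = urlparse(url).hostname or ""
    for domain, (minimum, maximum) in settings.items():
        if host == domain or host.endswith("." + domain):
            # Pick a random delay between the minimum and the maximum.
            return rng.uniform(minimum, maximum)
    return 0.0
```

A crawler built along these lines would call time.sleep(wait_for(url)) before each request, so that consecutive requests to the same domain are spaced irregularly rather than at a fixed interval.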