Configure Harvester Settings

 

From Admin > Harvester Configuration you can update the Harvester settings and the Harvester rule sets.

On installation, default Harvester Settings and Global Harvester Rule Sets are applied.

You can modify the Harvester settings to suit the requirements of your system.

Changes apply only to new Harvester jobs, not to jobs that are already running.

Update Harvester Settings

Accessing the Harvester Settings

Depending on how the Sintelix Agent was installed, the Harvester Settings may be accessed locally within Sintelix, or remotely in the Sintelix Agent.

The Harvester Settings are exactly the same, no matter how they are accessed.

Local Harvester Settings

To update the local Harvester Settings:

  1. Select Admin > Harvester Configuration
  2. Change the System Harvester Settings as required (see System Harvester Settings).
  3. Select Save Settings.

Remote Harvester Settings

If the Sintelix Agent was installed remotely, the Harvester Settings are accessed from within the Sintelix Agent.

  1. Log in to the Sintelix Agent (see Login to the Sintelix Agent)
  2. Select Harvester Settings in the top right of the screen
  3. Change the System Harvester Settings as required (see System Harvester Settings).
  4. Select Save Settings.

System Harvester Settings

Maximum Concurrent Harvest Jobs: Enter an integer between 0 and 65.

Set the maximum number of concurrent browsers to half the number of cores on the server running Sintelix. For example, on a server with 8 cores, set the maximum to 4. Higher values will degrade performance.
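
The half-the-cores guideline can be worked out on the server itself. The following is a minimal sketch, not part of Sintelix: the function name `recommended_max_concurrent` is hypothetical, and the lower bound of 1 is an assumption made so that a single-core machine still gets a usable value.

```python
import os


def recommended_max_concurrent(cores: int) -> int:
    """Return half the core count, kept within the 0-65 range the setting accepts.

    The floor of 1 is an assumption for illustration; the setting itself allows 0.
    """
    return max(1, min(65, cores // 2))


if __name__ == "__main__":
    # os.cpu_count() reports logical cores and may return None on some platforms.
    cores = os.cpu_count() or 1
    print(f"{cores} cores -> suggested maximum: {recommended_max_concurrent(cores)}")
```

Run this on the server where Sintelix is installed and enter the suggested value manually under Maximum Concurrent Harvest Jobs.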

Maximum Workers: Set the maximum number of workers available to the system. Each running harvest job is backed by workers, and more workers increase the speed of the harvest. You can set a limit for each harvest job to reserve capacity in a multi-user environment.

Same-domain Wait Time: Enter the minimum and/or maximum delay, in seconds. This adds a delay between page requests to the same domain.

Adding a delay helps prevent your IP address from being blocked due to high traffic.
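
To illustrate what a same-domain wait time does, the sketch below spaces out requests to the same domain by a random delay between a minimum and a maximum. It is an assumption about the general technique, not a description of how the Harvester implements it; the class `SameDomainDelay` and its parameters are hypothetical.

```python
import random
import time
from urllib.parse import urlparse


class SameDomainDelay:
    """Spaces out requests to the same domain by a random delay (hypothetical helper)."""

    def __init__(self, min_delay, max_delay, clock=time.monotonic, sleep=time.sleep):
        if min_delay < 0 or max_delay < min_delay:
            raise ValueError("require 0 <= min_delay <= max_delay")
        self.min_delay = min_delay
        self.max_delay = max_delay
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = {}  # domain -> earliest time the next request may start

    def wait(self, url):
        """Block until the URL's domain may be requested again; return seconds waited."""
        domain = urlparse(url).netloc
        now = self._clock()
        ready_at = self._next_allowed.get(domain, now)
        waited = max(0.0, ready_at - now)
        if waited > 0:
            self._sleep(waited)
        # Pick the gap that must pass before the following request to this domain.
        delay = random.uniform(self.min_delay, self.max_delay)
        self._next_allowed[domain] = max(now, ready_at) + delay
        return waited
```

Calling `wait(url)` before each page request delays only requests to a domain that was contacted recently; requests to other domains proceed immediately.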

Worker Screen Size: Enter the width and height, in pixels, of the browser screen.

The screen size can be set larger than the monitor.

Proxy: Set up a proxy as follows:
  1. Under Proxy, select the Use a proxy check box.
  2. Under Socks Proxy Configuration, enter your Host, Port and Version details.
  3. Under Http Proxy Configuration, enter your Host and Port details.

    If you are running Tor locally and want to perform a dark web harvest, set the SOCKS proxy host to 127.0.0.1 and the port to 9150. For more details, refer to Harvest content from the Dark Web.
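
Before saving the proxy settings, it can help to confirm that something is listening on the proxy host and port. The sketch below is a hypothetical helper, not a Sintelix feature. It only tests whether a TCP connection can be opened and does not perform a SOCKS handshake, so a successful result does not prove that the listener is a working proxy.

```python
import socket


def proxy_port_open(host="127.0.0.1", port=9150, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if proxy_port_open() else "not reachable"
    print(f"Proxy at 127.0.0.1:9150 is {state}")
```

The defaults match the local Tor values given above; pass a different host and port to check another proxy.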

Extra Browser Parameters: Enter any additional browser parameters, such as --disable-web-security for Chrome.

Update Harvester Rule Sets

Harvester rule sets are updated periodically to keep up with front-end changes to the websites they are designed to harvest content from. This option lets you check for updates to your global default Harvester rule sets and apply them automatically.

To update your Global Harvester rule sets:

  1. In Sintelix, navigate to Admin > Harvester Configuration
  2. Scroll down to the Global Harvester rule sets section
  3. Under System Wide Updates, if there are any updates available, select Update Global rule sets.

    If successful, a notification will appear with the message ‘No need to update your harvester rule sets’