Harvester Rule Sets

What is it?

Requirements

To create or modify Rule Sets, you must install the Sintelix Extension.

See About the Sintelix Extension for Harvesting and Install Sintelix Extension.

Default Rule Sets

Sintelix provides a number of default Rule Sets. A Rule Set is a group of rules designed to select the elements on a web page that are most likely to contain useful content (such as headings, authors, dates, captions and paragraphs) and to exclude the boilerplate elements.

Default Rule Sets have a DEF symbol next to the Rule Set name.

If you modify a default Rule Set, a MOD symbol is shown next to the Rule Set.

You can restore the default Rule Set using the Revert option.

An Admin user can update the global default Harvester Rule Sets (see Configure Harvester Settings).

Apply Rule Sets

When you create a harvesting task, you can select which Rule Sets to apply. By default, all Rule Sets are selected.

From the Rule Sets you select, Sintelix automatically applies the most relevant Rule Set to the web page you want to harvest.

The selected Rule Sets are applied in order of Rule Set priority, which is set in the Rule Set Configuration.
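
The priority-ordered selection described above can be illustrated with a minimal sketch. This is not Sintelix code: the `RuleSet` class, the `url_pattern` relevance test and the "lower number means higher priority" convention are all assumptions made for illustration, since this page does not say how relevance is determined or how priority values are ordered.

```python
import re
from dataclasses import dataclass


@dataclass
class RuleSet:
    name: str
    priority: int      # assumption: a lower number means a higher priority
    url_pattern: str   # hypothetical relevance test; not a documented Sintelix field


def pick_rule_set(selected, url):
    """Return the first selected rule set, in priority order, whose pattern matches the URL."""
    for rule_set in sorted(selected, key=lambda rs: rs.priority):
        if re.search(rule_set.url_pattern, url):
            return rule_set
    return None  # no selected rule set is relevant to this page


# Hypothetical rule sets selected for a harvesting task
selected = [
    RuleSet("Generic article", priority=50, url_pattern=r".*"),
    RuleSet("Example news site", priority=10, url_pattern=r"news\.example\.com"),
]

chosen = pick_rule_set(selected, "https://news.example.com/story/1")
print(chosen.name)
```

In this sketch a specific rule set with a higher priority wins over a catch-all rule set, and the catch-all is used only when nothing more specific matches.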

Manage Rule Sets

You can manage Rule Sets as follows:

  • All Rule Sets: copy, export, import and modify.

  • Default Rule Sets: revert a modified default Rule Set to the system default.

  • Created Rule Sets: create, rename and delete Rule Sets created for this Project.

See Manage Rule Sets.

Effective Rule Sets

How Upgrades Affect Rule Sets

When you upgrade Sintelix:

  • Default Rule Sets that you have not modified are upgraded.
  • Default Rule Sets that you have modified are retained; they are not upgraded or overwritten.
  • Rule Sets that you have created are retained.
  • New default Rule Sets may be added to the Harvest Default rule sets configuration.
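
The upgrade behaviour listed above can be summarised as a small merge routine. This is only a sketch of the stated rules, not the actual upgrade logic: the `RuleSet` class, the `version` field, the `USER` status label and the `upgrade` function are hypothetical, while `DEF` and `MOD` mirror the symbols shown next to Rule Set names.

```python
from dataclasses import dataclass


@dataclass
class RuleSet:
    name: str
    version: str   # hypothetical field, used here only to show what changed
    status: str    # "DEF" = unmodified default, "MOD" = modified default,
                   # "USER" = created in the project (hypothetical label)


def upgrade(installed, new_defaults):
    """Merge installed rule sets with the defaults shipped in a new release."""
    result = {}
    for name, rule_set in installed.items():
        if rule_set.status == "DEF" and name in new_defaults:
            result[name] = new_defaults[name]   # unmodified default: upgraded
        else:
            result[name] = rule_set             # modified or created: retained
    for name, rule_set in new_defaults.items():
        result.setdefault(name, rule_set)       # new defaults may be added
    return result


installed = {
    "News": RuleSet("News", "1.0", "DEF"),
    "Blogs": RuleSet("Blogs", "1.0", "MOD"),
    "My forum rules": RuleSet("My forum rules", "1.0", "USER"),
}
new_defaults = {
    "News": RuleSet("News", "2.0", "DEF"),
    "Blogs": RuleSet("Blogs", "2.0", "DEF"),
    "Wikis": RuleSet("Wikis", "2.0", "DEF"),
}

after = upgrade(installed, new_defaults)
for name, rule_set in after.items():
    print(name, rule_set.version, rule_set.status)
```

Here the modified "Blogs" Rule Set keeps its old content and MOD status. Under these assumptions, it would pick up the new default only after you use the Revert option.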

An Admin user can update the global default Harvester Rule Sets (see Configure Harvester Settings).