Rules Tab: Modify Rules

What you can do

When you open a Rule Set, the Rules tab is displayed in the right pane.

The Rules tab lists the rules in the Rule Set. In the Rules tab you can:

Copy a rule

To copy a rule, select the copy icon next to the rule.

Result: The copied rule will be added just below the current rule, and will have the same name.

This can be useful when you want to:

  • test alternative settings on a rule while keeping a backup of the original rule

  • create rules that are similar with slight variations in tags or classes

  • create a negative rule to work with the copied rule.

Edit a rule

To edit a rule, simply click the rule to open the Rules dialog. See Rules: Fields and Options.

Change the Order

The position of the rules in a Rule Set Configuration is important: rules at the bottom will overwrite those at the top if the Rule Paths overlap.

You can change the order of the rules by clicking and dragging a rule up or down the list.
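
The effect of rule order on overlapping Rule Paths can be pictured with a minimal Python sketch. This is an illustration only, not Sintelix code: the Rule class, the path strings, the "harvest"/"ignore" actions and the prefix-based matching are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    path: str    # hypothetical tag/class path, e.g. "div.article"
    action: str  # hypothetical label, e.g. "harvest" or "ignore"


def resolve(rules, element_path):
    """Return the action of the last matching rule, reading the list top to bottom."""
    result = None
    for rule in rules:
        # Assumption: a rule matches when its path is a prefix of the element path.
        if element_path.startswith(rule.path):
            result = rule.action  # a rule lower in the list overwrites an earlier match
    return result


rules = [
    Rule("Body text", "div.article", "harvest"),
    Rule("Adverts", "div.article div.ad", "ignore"),
]

print(resolve(rules, "div.article div.ad span"))  # ignore
print(resolve(rules, "div.article p"))            # harvest
```

In this sketch, dragging "Adverts" above "Body text" would change the first result to "harvest", because the broader rule would then be applied last.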

Order Rules are Applied

The order in which rules are applied to a URL is:

  1. If the setting Wait random time before harvesting is selected, the rule set waits a random time.
  2. The page is loaded.
  3. If the Pre-click before other rules option is selected in any rules, the buttons to which these rules apply (for example 'Show more') are clicked and Harvester waits for the duration of the 'rule set wait' period to allow this content to be loaded.
  4. There are two 'wait' settings related to rule sets:

    • 'rule set wait', where you can configure individual rule sets to wait a specified amount of time (up to 60 seconds) before harvesting to enable pages to load completely
    • 'random wait', where Sintelix Harvester goes to the websites in the Harvest Queue then waits a random amount of time (up to 60 seconds) before it begins harvesting text to mimic patterns of human interaction with websites
  5. Positive rules are applied.
  6. Negative rules are applied (overriding positive rules where there is contention).
  7. Rules that require a previous element to be selected (h1, h*, p, etc.) are applied.
  8. Rules that require (any) previous element to be selected are applied.
  9. If the current depth <= harvest depth, the links to be pushed to the search queue are selected.
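
Steps 5, 6 and 9 above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Harvester implementation: select_elements, links_to_queue, the element strings and the lambda-based rules are hypothetical names invented for the example.

```python
def select_elements(elements, positive, negative):
    """Apply positive rules first, then let negative rules win any contention."""
    selected = {e for e in elements if any(rule(e) for rule in positive)}
    selected -= {e for e in selected if any(rule(e) for rule in negative)}
    return selected


def links_to_queue(links, current_depth, harvest_depth):
    """Select links for the queue only while current depth <= harvest depth."""
    return list(links) if current_depth <= harvest_depth else []


# Hypothetical page elements and rules, expressed as simple predicates.
elements = ["p.story", "p.story.sponsored", "div.footer"]
positive = [lambda e: e.startswith("p.story")]
negative = [lambda e: "sponsored" in e]

chosen = select_elements(elements, positive, negative)
print(sorted(chosen))  # ['p.story']
```

Here the sponsored paragraph is matched by both a positive and a negative rule, and the negative rule overrides the positive one.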

Test Selected Rules

You can test the effect of selected rules by selecting the checkbox next to the rule(s) you want to test.

This can be useful when it is not clear which elements are affected by a rule.

Result: This will:

  • make all unselected rules inactive, and

  • update the Full Page document to colour code only those elements impacted by the selected rule(s).
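
The behaviour described above can be pictured with a short Python sketch. It is only an illustration: impacted_elements, the (name, path) rule tuples and the prefix matching are assumptions, and the real colour coding is done by the Full Page view rather than by code like this.

```python
def impacted_elements(rules, checked_names, elements):
    """Map each element to the checked rules that affect it.

    Rules whose checkbox is not selected are treated as inactive.
    """
    active = [(name, path) for name, path in rules if name in checked_names]
    impacted = {}
    for element in elements:
        # Assumption: a rule affects an element when its path is a prefix of it.
        hits = [name for name, path in active if element.startswith(path)]
        if hits:
            impacted[element] = hits
    return impacted


rules = [("Headline", "h1"), ("Body", "div.story p"), ("Footer", "div.footer")]
elements = ["h1.title", "div.story p.lead", "div.footer p"]

print(impacted_elements(rules, {"Body"}, elements))
# {'div.story p.lead': ['Body']}
```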

Delete Selected Rules

You can delete one or more rules:

  • select the checkbox next to the rules you want to delete, and

  • select the Delete Selected Rules button (which is only displayed when at least one rule is selected).

Result: The selected rule(s) are removed from the list.

Auto Simplify All Rules

To automatically remove extraneous tags and classes from the path of every rule, select Auto Simplify All Rules.

Simplified rules are more generic and run faster than rules with more complex paths.

However, Auto Simplify may make the rule too generic.

If the rule selects too many elements when simplified, you can delete the rule and recreate it. To recreate the rule, click on the element you want to harvest and select Add Rule or Add Both. Then manually delete excess tags or ignore unnecessary classes to determine the most effective combination.
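
The trade-off between a simplified path and an over-generic rule can be shown with a small Python sketch. This is an illustration under stated assumptions, not the Auto Simplify algorithm: the path format, the suffix matching and the simplify function (which keeps only the last step and drops its classes) are hypothetical.

```python
def step_matches(rule_step, element_step):
    """A step such as "p.lead" matches an element with that tag and all those classes."""
    rule_tag, *rule_classes = rule_step.split(".")
    tag, *classes = element_step.split(".")
    return rule_tag == tag and set(rule_classes) <= set(classes)


def rule_matches(rule_path, element_path):
    """Suffix match: the rule's steps must match the element's last steps."""
    if len(rule_path) > len(element_path):
        return False
    tail = element_path[len(element_path) - len(rule_path):]
    return all(step_matches(r, e) for r, e in zip(rule_path, tail))


def simplify(rule_path, keep_steps=1):
    """Hypothetical simplification: keep the last step(s) and drop their classes."""
    return [step.split(".")[0] for step in rule_path[-keep_steps:]]


# Hypothetical page: each element is the list of steps from the root down to it.
page = [
    ["body", "div.content", "p.lead"],
    ["body", "div.content", "p"],
    ["body", "div.footer", "p.legal"],
]

original = ["div.content", "p.lead"]
simplified = simplify(original)  # ['p']

print(sum(rule_matches(original, e) for e in page))    # 1
print(sum(rule_matches(simplified, e) for e in page))  # 3
```

In this example the simplified rule has less to check, but it also selects the footer paragraph, which is the kind of over-selection that manual tuning of tags and classes is meant to correct.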

Update all Entity Tags

To update all Entity Tags, select the Mark-up Entity Tags button. This updates the full collection with all Entity Tags that have been associated with a rule.