Process and Reprocess Documents

Process Documents

Documents are processed automatically whenever a new collection is created or documents are added to an existing collection.

See Add (Ingest) Documents for more information about adding documents for processing.

Reprocess Documents

You may want to reprocess documents after they have been added, for example, to pick up changes to the Ingestion Configuration or to apply a different Ingestion Configuration.

Reprocessing documents removes all existing markup and regenerates it using the selected Ingestion Configuration and its Document Processing Configurations.

As a result, all manual markups made to a document are lost unless you choose to preserve them. See Preserve Markups and Document Tags for more information.

Reasons

A collection is a container for storing and organising ingested files and documents. Only the textual content is stored in collections, not the original files and documents.

The following are some reasons you may want to reprocess all documents in a collection:

  • The Ingestion Configuration (a Sintelix configuration responsible for managing ingestion settings) has changed since the documents were processed, and you want to pick up the changes.
  • The Document Processing Configurations referred to by the Ingestion Configuration have changed since the documents were processed, and you want to pick up the changes.
  • You want to use a different Ingestion Configuration.
  • You want to remove all of the manual markup from the collection.

Ingestion Configuration

When reprocessing documents, you can choose the Ingestion Configuration to apply.

Use the dropdown to choose the required Ingestion Configuration.

Result:

If the Ingestion Configuration includes:

  • deduplication, deduplication is shown with a green tick; otherwise, it is greyed out.

  • network update, network update is shown with a green tick; otherwise, it is greyed out.

If you want to view the configuration, select the open icon.

If you have the required permissions, you can edit the Ingestion Configuration. See Ingestion for more information.

Reprocess the Collection

In the Collections tab:

  1. You can either:

    • reprocess all the documents in the collection, by clicking the Reprocess button; or

    • reprocess a specific document, by clicking the Reprocess icon in the Action column for that document.

    Result: The Reprocess Documents dialog is displayed.

  2. Select the required Ingestion Configuration from the Select an Ingestion Configuration dropdown.

  3. If the documents contain manual markups, they are displayed in the Reprocess Documents dialog.

    1. To remove the markups, select them and click Reprocess.

    2. To preserve the markups, clear the selection and click Reprocess.

Preserve Markups and Document Tags

Some types of markup are automatically preserved during reprocessing because they belong to fact categories (that is, namespaces) with protected status. Document processing (the Sintelix configuration responsible for managing the extraction of information from your documents) is not allowed to add, change, or delete facts that belong to these categories.

However, if you have manually added a document tag, the tag is displayed in the Reprocess Documents dialog. Document tags can otherwise be added automatically to Sintelix documents by a configuration based on a pre-trained model.

To preserve the tags while reprocessing, clear the selection and click Reprocess.
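
The preservation rules described in this section can be summarised as a small model. The following Python sketch is illustrative only: it is not a Sintelix API, and the `Markup` class, the `reprocess` function, and the category names are all hypothetical. It assumes that markups in protected categories are always kept, that manual markups are kept unless they are selected for removal in the dialog, and that all other markup is replaced by newly generated markup.

```python
from dataclasses import dataclass


# Hypothetical model for illustration only; none of these names are Sintelix APIs.
@dataclass(frozen=True)
class Markup:
    text: str
    category: str
    manual: bool = False  # True if a user added the markup by hand


def reprocess(existing, regenerated, protected_categories, selected_for_removal):
    """Return the markups a document would carry after reprocessing.

    existing: markups on the document before reprocessing
    regenerated: markups produced by the chosen configuration
    protected_categories: categories that processing may not add to, change, or delete from
    selected_for_removal: manual markups selected in the dialog (these are dropped)
    """
    kept = []
    for m in existing:
        if m.category in protected_categories:
            kept.append(m)  # protected category: always preserved
        elif m.manual and m not in selected_for_removal:
            kept.append(m)  # manual markup left unselected: preserved
        # anything else is discarded and replaced by regenerated markup
    # Processing may not add facts to protected categories.
    fresh = [m for m in regenerated if m.category not in protected_categories]
    return kept + fresh


# Example with made-up categories: "ReviewStatus" stands in for a protected category.
before = [
    Markup("Acme Ltd", "Organisation"),
    Markup("J. Smith", "Person", manual=True),
    Markup("reviewed", "ReviewStatus", manual=True),
    Markup("key witness", "Notes", manual=True),
]
after = reprocess(
    existing=before,
    regenerated=[Markup("Acme Limited", "Organisation")],
    protected_categories={"ReviewStatus"},
    selected_for_removal={before[1]},  # the user selected "J. Smith" for removal
)
```

In this example, the protected "reviewed" markup and the unselected "key witness" markup survive, the selected "J. Smith" markup and the old automatic "Acme Ltd" markup are dropped, and the regenerated "Acme Limited" markup is added.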