Process and Reprocess Documents
Process Documents
Documents are processed automatically whenever a new collection is created or documents are added into an existing collection.
See Add (Ingest) Documents for more information about adding documents for processing.
Reprocess Documents
You may want to reprocess documents after they have been added, for example to apply changes made to the Ingestion Configuration or to apply a different Ingestion Configuration.
Reprocessing documents removes all existing markups and regenerates them using the selected Ingestion Configuration and Document Processing Configuration.
This also means that all the manual markups made to a document will be lost, unless you choose to preserve them. See Preserve Markups and Document Tags for more information.
Reasons
The following are some reasons you may want to reprocess all documents in a collection (a container for storing and organising ingested files and documents; only the textual content is stored in collections, not the original files and documents):
- The Ingestion Configuration (a Sintelix configuration responsible for managing ingestion settings) has changed since the documents were processed, and you want to pick up the changes.
- The Document Processing Configurations referred to by the Ingestion Configuration have changed since the documents were processed, and you want to pick up the changes.
- You want to use a different Ingestion Configuration.
- You want to remove all of the manual markup from the collection.
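As a rough illustration of the first two reasons, the check below compares when a document was last processed with when the configurations were last changed. This is a conceptual sketch only: the `ConfigSnapshot` type, its timestamp fields and the `needs_reprocessing` function are hypothetical names and do not correspond to any Sintelix API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ConfigSnapshot:
    """Hypothetical record of when each configuration was last modified."""
    ingestion_modified: datetime
    document_processing_modified: datetime


def needs_reprocessing(processed_at: datetime, config: ConfigSnapshot) -> bool:
    """Return True if either configuration changed after the document was processed."""
    latest_change = max(config.ingestion_modified, config.document_processing_modified)
    return latest_change > processed_at
```

The other two reasons (switching to a different Ingestion Configuration, or clearing manual markup) are deliberate choices rather than something a timestamp comparison can detect.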
Ingestion Configuration
When reprocessing documents, you can choose the Ingestion Configuration to apply.
Select the dropdown to choose the required Ingestion Configuration.
Result:
If the Ingestion Configuration includes:
- deduplication, deduplication will have a green tick; otherwise it is greyed out.
- network update, network update will have a green tick; otherwise it is greyed out.
If you want to view the configuration, select the open icon.
If you have the required permissions, you can edit the Ingestion Configuration. See Ingestion for more information.
Reprocess the Collection
In the Collections tab:
- You can either:
  - reprocess all the documents in the collection, by clicking the button, or
  - reprocess a specific document, by clicking the Reprocess icon available for the document in the Action column.
  Result: The Reprocess Documents dialog is displayed.
- Select the required Ingestion Configuration from the Select an Ingestion Configuration dropdown.
- If manual markups have been created, they are displayed in the Reprocess Documents dialog.
  - To remove the markups, select them and click the button.
  - To preserve the markups, clear them and click the button.
Preserve Markups and Document Tags
The following types of markups are automatically preserved during reprocessing, as they are fact categories (that is, namespaces) with protected status. Document processing (the Sintelix configuration responsible for managing the extraction of information from your documents) is not allowed to add, change, or delete facts that belong to these categories.
- Native - metadata from the original document.
- Metadata - document properties added by Sintelix as part of document ingestion.
- External - source of the document, for example the URL of the uploaded document.
- System - document entities (an entity is highlighted text in a document that is represented as a node in a network), for example the language and the time when the document was processed.
However, if you have manually added a document tag (a configuration used for automatically adding document tags to Sintelix documents based on a pre-trained model), the tag is displayed in the Reprocess Documents dialog.
To preserve the tags while reprocessing, clear the selection and select the button.
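The preservation rules in this section can be summarised as a small filter. The sketch below is a conceptual model, not Sintelix code: the `Markup` type, its fields and the `markups_to_keep` function are hypothetical. The only details taken from this page are the four protected categories and the rule that manual markups and tags survive only when you choose to preserve them.

```python
from dataclasses import dataclass

# Protected fact categories listed on this page.
PROTECTED_CATEGORIES = {"Native", "Metadata", "External", "System"}


@dataclass(frozen=True)
class Markup:
    """Hypothetical representation of a markup or document tag."""
    label: str
    category: str
    manual: bool = False


def markups_to_keep(markups, preserve_manual):
    """Return the markups that survive reprocessing under the rules above.

    preserve_manual: set of labels of manual markups the user chose to preserve.
    """
    kept = []
    for m in markups:
        if m.category in PROTECTED_CATEGORIES:
            kept.append(m)  # always preserved
        elif m.manual and m.label in preserve_manual:
            kept.append(m)  # preserved by choice
        # Everything else is removed and regenerated by document processing.
    return kept
```

In this model, passing an empty set for `preserve_manual` corresponds to removing every manual markup and tag, leaving only the protected categories.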