Ingestion Report Example
Introduction
This example shows how to combine Pre-processing rules and Document Processing rules in the ingestion configuration to control ingestion more precisely and to report which documents were ingested successfully, which were excluded, and which failed.
Pre-processing configuration example
The following is an example of a Pre-processing configuration.
- The first rule checks if the ingestion failed. If it failed, it creates a document tag Failed under the Category Ingestion_Report.
- The second rule checks if the file hash code matches any hash codes listed in the excluded_files word list in the Excluded Files Dictionary. If it does, it creates a document tag Excluded File Hash under the Category Ingestion_Report.
- The third rule checks if the file extension matches any extensions listed in the excluded_extensions word list in the Excluded Files Dictionary. If it does, it creates a document tag Excluded File Extension under the Category Ingestion_Report.
- If none of the rules above apply, it creates a document tag Successful under the Category Ingestion_Report.
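The rule order above can be sketched in Python to make the tagging logic easier to follow. This is only an illustration, not the product's rule syntax: the function name ingestion_report_tag and its parameters are hypothetical, and the sketch assumes that rules are checked in the order listed, that the first matching rule decides the tag, and that extensions are compared exactly.

```python
def ingestion_report_tag(ingestion_failed, file_hash, file_extension,
                         excluded_files, excluded_extensions):
    """Return the Ingestion_Report tag for one document.

    Hypothetical helper: assumes rules are checked in the listed order
    and that the first matching rule wins.
    """
    if ingestion_failed:
        return "Failed"
    if file_hash in excluded_files:
        return "Excluded File Hash"
    # Assumption: exact, case-sensitive comparison of the extension.
    if file_extension in excluded_extensions:
        return "Excluded File Extension"
    return "Successful"


# Word lists taken from the Excluded Files Dictionary in this example.
excluded_files = {"c90af0175239f9317460818cfddba22d"}
excluded_extensions = {".bat", ".bin", ".dat", ".dll", ".exe", ".jar", ".msi"}

tag = ingestion_report_tag(False, "0" * 32, ".exe",
                           excluded_files, excluded_extensions)
```

Here a document with an unknown hash and an .exe extension receives the tag Excluded File Extension.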
This pre-processing configuration refers to the following Excluded Files Dictionary.
#wordlist excluded_files
c90af0175239f9317460818cfddba22d
#wordlist excluded_extensions
.bat
.bin
.dat
.dll
.exe
.jar
.msi
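To show how the dictionary above is organized, the following minimal Python sketch reads it into named word lists. It assumes only what the listing shows: a line starting with #wordlist opens a new list, and each following line is one entry. The function name parse_wordlists is hypothetical, and the real dictionary format may support features not handled here.

```python
def parse_wordlists(text):
    """Parse '#wordlist <name>' sections into a dict of name -> set of entries.

    Hypothetical helper based only on the layout shown in this example.
    """
    wordlists = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#wordlist"):
            current = line[len("#wordlist"):].strip()
            wordlists[current] = set()
        elif current is not None:
            wordlists[current].add(line)
    return wordlists


DICTIONARY = """\
#wordlist excluded_files
c90af0175239f9317460818cfddba22d
#wordlist excluded_extensions
.bat
.bin
.dat
.dll
.exe
.jar
.msi
"""

lists = parse_wordlists(DICTIONARY)
```

The result maps excluded_files to a set with one hash and excluded_extensions to a set with seven extensions.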
Document Processing configuration example
The following is an example of a Document Processing configuration.
- If ingestion failed, document processing extracts metadata and text without extracting entities. This lets you see where the ingestion failed and potentially identify the cause.
- If the document was excluded, only the metadata is extracted.
- All other documents are fully ingested using the Default document processing configuration.
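The branching described above can be summarized in a short Python sketch that maps each Ingestion_Report tag to the processing it triggers. The function name processing_steps and the step labels are hypothetical, and the Default document processing configuration is represented as a single opaque step because its contents are not described in this example.

```python
def processing_steps(tag):
    """Return the processing applied to a document with the given tag.

    Hypothetical helper: step names are illustrative labels only.
    """
    if tag == "Failed":
        # Metadata and text are extracted, but no entities.
        return ("metadata", "text")
    if tag in ("Excluded File Hash", "Excluded File Extension"):
        # Excluded documents keep their metadata only.
        return ("metadata",)
    # Everything else uses the Default configuration (contents assumed opaque).
    return ("default",)


steps = processing_steps("Excluded File Hash")
```

Keeping the tag names identical to those created during pre-processing is what ties the two configurations together.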
Viewing the Ingestion Report
Because a document tag under the Category Ingestion_Report is attached to every ingested document, you can view the Ingestion Report on the Collection Summary tab.
You can then click a Count value to view the matching documents and explore the ingestion results.