Add (Ingest) Documents

Introduction

You can add many different types of content to Sintelix, including text files, images, audio/video, email from mail servers, and data from databases.

Files can be added to Sintelix through Collections or Harvester.

Once files have been added to Sintelix they are automatically ingested and stored as Documents in Collections.

On Demand or Scheduled

Documents can be added to a collection on demand at any time. Alternatively, you can use the Ingestion Scheduler to check automatically for new or updated files in a specific library, or for new emails on a mail server, and add them to a collection. For more information, see Scheduling Ingestion.

Document Sources

You can add documents to a collection from the following sources:

  • Server library - fastest and therefore recommended for large volumes of data
  • Local Files (via upload) - slowest; each file must not exceed 100 MB
  • From URLs - where you can save a list of URLs
  • From a Data Source, for example, a database
  • Using an External API to connect to a data source, for example, Facebook or Twitter. See Ingest tweets from Twitter for an example
  • AWS S3 Bucket, if configured (see Configure and Secure Libraries). Behaves in a similar manner to a Server Library.
  • Mail Server, if configured (see Configure and Secure Libraries)

The Administrator can add additional libraries, including AWS S3 Bucket and Mail Server libraries. You can have more than one library of each of the following types: Server, From URLs, AWS S3 Bucket and Mail Server.
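
Because each file uploaded through Local Files must not exceed 100 MB, it can help to check a folder before uploading. The following is a minimal sketch and not part of Sintelix: the function names are hypothetical, and it assumes that "100 MB" means 100 × 1024 × 1024 bytes, which this page does not specify.

```python
from pathlib import Path

# Assumption: "100 MB" is read as 100 * 1024 * 1024 bytes. The documentation
# does not say whether the limit is binary or decimal.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def split_by_upload_limit(sizes, limit=MAX_UPLOAD_BYTES):
    """Split a {name: size_in_bytes} mapping into (uploadable, too_large) name lists."""
    uploadable = sorted(name for name, size in sizes.items() if size <= limit)
    too_large = sorted(name for name, size in sizes.items() if size > limit)
    return uploadable, too_large


def scan_folder(folder):
    """Return {path: size_in_bytes} for every regular file under folder, recursively."""
    return {str(p): p.stat().st_size for p in Path(folder).rglob("*") if p.is_file()}
```

For example, `split_by_upload_limit(scan_folder("to_ingest"))` returns the files that fit under the limit and the files that exceed it, where `to_ingest` is a placeholder folder name. Files in the second list are candidates for a Server library instead, which the list above recommends for large volumes of data.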

Document Formats

Sintelix can ingest data in many different formats. Some formats need to be installed and/or configured, including:

Ingestion Configuration

When you add documents, you can choose which Ingestion Configuration to apply. The Ingestion Configuration defines what information is extracted from the documents. The currently selected Ingestion Configuration is displayed to the right of the Add Documents pane.