Advanced Search

Sintelix also supports advanced search capabilities, including Boolean, wildcard, fuzzy and proximity searches, as well as term boosting.

Sintelix offers two search modes, exact and key word, which have different capabilities (key word search is more powerful). Key word search fields have a white background, while exact search fields have a blue background. In the image below, the Query Text for a Tag search has a blue background, indicating an exact search.

The Query Text for Properties is usually also an exact search. The image shows the default Category and Property Name, "Metadata" and "title", which happen to allow a key word search.

 

Simple Searches

A simple list of key words entered into the search field returns all the items containing at least one of the key words.

Wildcard Searches

·         Sintelix supports single and multiple character wildcard searches.

·         To perform a single character wildcard search use the '?' symbol.

·         To perform a multiple character wildcard search use the '*' symbol.

A single character wildcard search matches terms in which the '?' is replaced by any one character. For example, to search for text or test, use the query:

te?t

A multiple character wildcard search matches zero or more characters in place of the '*'. For example, to search for test, tests or tester, use the query:

test*

You can also use wildcards in the middle of a term, such as:

te*t

You cannot use a '*' or '?' symbol as the first character of a search.
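The wildcard rules above can be illustrated with a small sketch. It is not how Sintelix implements wildcards; it simply translates a wildcard term into a regular expression so the matching behaviour can be checked. The function names and the case-insensitive matching are assumptions made for the illustration.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a '?' / '*' wildcard term into a regex (illustrative only)."""
    if pattern[:1] in ("*", "?"):
        # Mirrors the rule that a wildcard cannot start a search.
        raise ValueError("a wildcard cannot be the first character of a search")
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one character
        elif ch == "*":
            parts.append(".*")   # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def matches(pattern: str, term: str) -> bool:
    """True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None

print(matches("te?t", "text"), matches("te?t", "test"))       # single character
print(matches("test*", "test"), matches("test*", "tester"))   # zero or more characters
print(matches("te*t", "tenant"))                              # wildcard mid-term
```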

Fuzzy Searches

Sintelix supports fuzzy searches based on the Levenshtein distance (edit distance) algorithm. To do a fuzzy search, add the tilde symbol, '~', to the end of a single word term. For example, to search for terms similar in spelling to roam, use the fuzzy search:

roam~

This search will find terms like foam and roams.

An optional parameter specifies the required similarity. The value is between 0 and 1; the closer the value is to 1, the more similar a term must be to match. For example:

roam~0.8

The default that is used if the parameter is not given is 0.5.
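To see why terms such as foam and roams count as similar to roam, it can help to compute the edit distance directly. The sketch below is a standard Levenshtein distance implementation, not Sintelix code. How Sintelix converts an edit distance into the 0 to 1 similarity value is not described here, so the sketch stops at the raw distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions or substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]

print(levenshtein("roam", "foam"))    # one substitution
print(levenshtein("roam", "roams"))   # one insertion
```

Both example terms are one edit away from roam, which is why a fuzzy search for roam can find them.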

Proximity Searches

Sintelix supports finding words that are within a specified distance of each other. To do a proximity search, add the tilde symbol, '~', and the distance to the end of a phrase. For example, to search for microsoft and windows within 10 words of each other in a document, use the search:

"microsoft windows"~10
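The idea behind a proximity search can be sketched as a check on token positions. This is a simplified illustration only: the function name, the tokenisation by word characters, and counting distance as the difference between word positions are all assumptions, and Sintelix may measure distance differently.

```python
import re

def within_distance(text: str, first: str, second: str, max_distance: int) -> bool:
    """True if the two words occur within max_distance word positions of each other."""
    tokens = re.findall(r"\w+", text.lower())
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(abs(i - j) <= max_distance
               for i in first_positions
               for j in second_positions)

sentence = "Microsoft released a new version of Windows"
print(within_distance(sentence, "microsoft", "windows", 10))
print(within_distance(sentence, "microsoft", "windows", 3))
```

In the sample sentence the two words are six positions apart, so the first check succeeds and the second does not.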

Escaping special characters

Sintelix supports the escaping of special characters that are part of the query syntax. The current special characters are:

+ - && || ! ( ) { } [ ] ^ " ~ * ?

To escape one of these characters, add a \ before it.
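If query text is assembled by a script before being pasted into a search field, the escaping rule can be applied automatically. The helper below is a hypothetical sketch, not part of Sintelix. It assumes that each & and | is escaped individually and that the backslash itself also needs escaping, neither of which is stated in the list above.

```python
# Characters from the list above, plus the backslash (an assumption).
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?\\')

def escape_query(text: str) -> str:
    """Put a backslash before every special query character in text."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

print(escape_query("(1+1)"))    # prints \(1\+1\)
print(escape_query("what?"))    # prints what\?
```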

"Exact Search" Fields - Tag and Property

Untokenised fields (for example, tags and most properties) can only be searched by looking for exact matches, or a set of possible exact matches.

Sintelix Exact Search is case sensitive.

For example, searching the metadata "file_owner" field with the query:

james

would only yield results if there was a document whose "file_owner" value is exactly "james". It would miss "james long", "henry james", "a critical assessment of henry james", etc.

If you want to find a set of possible exact values you can write them as follows:

james OR willi* OR ben

or simply,

james willi* ben

Where exact match query items have spaces, use double quotes in the query, for example,

"henry james" "william shakespeare" "ben jonson"

Forgetting to use quotes is an easy way to miss results.

"Key Word Search" Fields - Text and Text Reference

Tokenised fields (for example, document text and text reference) provide a rich set of search options described below. Tokenisation splits the text field up into tokens (words), so that the search query can match just some of the tokens and doesn't have to match the field text exactly.

Sintelix Key Word Search is case insensitive.
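The difference between the two modes can be sketched in a few lines. The functions below are illustrative only and are not Sintelix APIs: the first treats the field value as a single case sensitive string, and the second splits it into lower-cased tokens first. Wildcards such as willi* are deliberately left out of the sketch.

```python
import re

def exact_match(field_value: str, candidates: set) -> bool:
    """Untokenised field: the whole value must equal one candidate (case sensitive)."""
    return field_value in candidates

def keyword_match(field_value: str, keywords: list) -> bool:
    """Tokenised field: any key word may match any token (case insensitive)."""
    tokens = {t.lower() for t in re.findall(r"\w+", field_value)}
    return any(k.lower() in tokens for k in keywords)

print(exact_match("james", {"james"}))            # whole value matches
print(exact_match("henry james", {"james"}))      # missed: value is not exactly "james"
print(exact_match("henry james", {"henry james", "ben jonson"}))
print(keyword_match("Henry James", ["james"]))    # one token matches, case ignored
```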

Boolean Searches

Boolean operators allow terms to be combined using logical operators. Sintelix supports AND, OR and NOT as Boolean operators. Boolean operators must be written in ALL CAPS.

Operator: OR

The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used. The OR operator links two terms and finds a matching document if either of the terms exists in a document. This is equivalent to a union using sets. The word OR can be omitted or replaced by the symbol '||'.

To search for documents that contain either the word windows or the phrase "mac os", use the query:

windows OR "mac os"

Operator: +

The '+' operator requires that the term after the '+' symbol is found somewhere in the field of a single document.

To search for documents that must contain microsoft and may contain windows, use the query:

+microsoft windows

If a query contains only one term, that term is required even without the '+'. For example, the query microsoft returns only documents that contain microsoft.

Operator: AND

The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The symbol '&&' can be used in place of the word AND.

To search for documents that contain both windows and server, use the query:

windows AND server

Operator: NOT and -

The NOT operator excludes documents that contain the term after NOT. The symbol '-' can be used in place of the word NOT.

To search for documents that contain "microsoft windows" but not the word license, use the query:

"microsoft windows" NOT license

If terms prefixed with '+' are mixed with plain, unprefixed terms, at least one of the unprefixed terms must still be present in the returned item. In the following example, at least one of "one", "two" and "three" must be present along with "four":

one two three +four
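The set analogy used above for OR (union) and AND (intersection) can be made concrete with a toy example. The documents and helper functions below are invented for the illustration and are not Sintelix APIs. NOT is modelled as a set difference, and the '+' operator is left out of the sketch.

```python
# A toy collection: document id -> text (invented for this illustration).
docs = {
    1: "microsoft windows server license",
    2: "mac os release notes",
    3: "windows desktop themes",
    4: "linux server setup",
    5: "microsoft windows update",
}

def containing(term: str) -> set:
    """Ids of documents that contain the word."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}

def containing_phrase(phrase: str) -> set:
    """Ids of documents that contain the words of the phrase next to each other."""
    return {doc_id for doc_id, text in docs.items() if f" {phrase} " in f" {text} "}

# windows OR "mac os" : union
print(containing("windows") | containing_phrase("mac os"))
# windows AND server : intersection
print(containing("windows") & containing("server"))
# "microsoft windows" NOT license : difference
print(containing_phrase("microsoft windows") - containing("license"))
```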

Grouping

Sintelix supports using parentheses to group clauses to form sub queries. This can be very useful if you want to control the Boolean logic for a query.

To search for documents that contain "website" and either "adelaide" or "semantic", use the query:

(adelaide OR semantic) AND website

This removes any ambiguity: website must be present, along with at least one of adelaide or semantic.
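The effect of the parentheses can be checked with ordinary Boolean expressions. The sketch below compares the grouped query with one other possible reading of the same terms. It is an illustration only, and how Sintelix interprets the query when the parentheses are left out is not stated here.

```python
def grouped(words: set) -> bool:
    """(adelaide OR semantic) AND website"""
    return ("adelaide" in words or "semantic" in words) and "website" in words

def other_reading(words: set) -> bool:
    """adelaide OR (semantic AND website), shown only for contrast"""
    return "adelaide" in words or ("semantic" in words and "website" in words)

doc = {"adelaide", "festival"}   # contains adelaide but not website
print(grouped(doc))         # False: website is required
print(other_reading(doc))   # True: adelaide alone is enough
```

The two readings disagree on the same document, which is why explicit grouping is worth the extra typing.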

Boosting a Term

Sintelix provides the relevance level of matching documents based on the terms found. To boost a term use the caret, '^', symbol with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be.

Boosting allows you to control the relevance of a document by boosting one of its terms. For example, if you are searching for:

microsoft windows

and you want the term microsoft to be more relevant, boost it by adding the '^' symbol and a boost factor directly after the term. You could type:

microsoft^4 windows

This will make documents with the term microsoft appear more relevant. You can also boost phrase terms by adding the boost after the closing quote, for example "microsoft windows"^4.

By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (for example, 0.2).
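When queries are built by a script, the boosting rules above can be captured in a small helper. The function below is hypothetical and not part of Sintelix. It appends '^' and the factor to a term, wraps multi-word terms in double quotes, and rejects non-positive factors. The factor values used are arbitrary examples.

```python
def boost(term: str, factor: float = 1.0) -> str:
    """Return the term with a boost suffix, e.g. microsoft^4 (illustrative helper)."""
    if factor <= 0:
        raise ValueError("boost factor must be positive")
    if " " in term and not term.startswith('"'):
        term = f'"{term}"'          # phrases are boosted as a quoted unit
    return f"{term}^{factor:g}"

print(boost("microsoft", 4) + " windows")   # prints microsoft^4 windows
print(boost("microsoft windows", 4))        # prints "microsoft windows"^4
print(boost("windows", 0.2))                # prints windows^0.2
```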