How to make complex search queries

When making searches, you can specify multiple filters simultaneously. When you specify two or more filters, you retrieve only content that matches ALL of the filters. For example, if you specify a keyword and a category, then the resulting articles will be in the specified category AND mention the keyword.

Sometimes, however, you want an OR between different filters. The most common use case is a search over several concepts and keywords where you want results that mention any of them, for example:

keyword 1 OR keyword 2 OR concept 1

The only way to do this is to build the query using the advanced query language. The query that you build in this case takes the form of a JSON object, and how you use it depends on the language you are working in.

If you're using Python, then you will use it by calling something like:

from eventregistry import QueryArticlesIter

# The complex query is passed as a JSON string
q = QueryArticlesIter.initWithComplexQuery("""
{
    "$query": { ... }
}
""")

If you're using the REST API instead, you can specify the complex query as:

{
    "query": { "$query": { ... } },
    "articlesPage": 1,
    "apiKey": "YOUR_API_KEY"
}
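
As a rough sketch, the request body above can be assembled in Python from a dictionary and serialized with the standard json module. The helper name build_rest_body is hypothetical, and the condition inside "$query" is only a placeholder. The sketch builds the body and prints it; sending it to the API is left out.

```python
import json

def build_rest_body(complex_query, api_key, page=1):
    """Wrap a complex query in the REST parameters shown above (hypothetical helper)."""
    return {
        "query": complex_query,
        "articlesPage": page,
        "apiKey": api_key,
    }

# Placeholder condition, used only for illustration
complex_query = {"$query": {"conceptUri": "http://en.wikipedia.org/wiki/Bitcoin"}}
body = build_rest_body(complex_query, "YOUR_API_KEY")
print(json.dumps(body, indent=4))
```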

How to build a complex query

If you look at the grammar of the complex query, you can see that each complex query has a "$query" part and potentially a "$filter" part.

The $query block

The $query part can have a nested list of conditions with filters that are also otherwise available - keywords, concepts, sources, etc. The new thing is that you can have these conditions grouped into Boolean AND and OR lists of blocks.

As an example, let's assume that we want to find news that mentions the concept Bitcoin or keywords Ethereum or Litecoin. The query in this case should look like this:

{
    "$query": {
        "$or": [
            {
                "conceptUri": "http://en.wikipedia.org/wiki/Bitcoin"
            },
            {
                "keyword": {
                    "$or": [
                        "Ethereum",
                        "Litecoin"
                    ]
                }
            }
        ]
    }
}
An example query where the results will contain articles about Bitcoin or Ethereum or Litecoin

As we can see, on the outer side we have an OR block that has two base queries in it - one with a concept and one with two keywords, where we have also specified an OR between them.

What if we now wish to extend the query and limit the results to business-related news about Bitcoin, Ethereum or Litecoin? In that case, we have to wrap the existing OR block in an AND block together with a category condition, like this:

{
    "$query": {
        "$and": [
            {
                "$or": [
                    {
                        "conceptUri": "http://en.wikipedia.org/wiki/Bitcoin"
                    },
                    {
                        "keyword": {
                            "$or": [
                                "Ethereum",
                                "Litecoin"
                            ]
                        }
                    }
                ]
            },
            {
                "categoryUri": "news/Business"
            }
        ]
    }
}

As demonstrated, you can nest queries in this way and create arbitrarily complex queries with several AND and OR groups.
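
When queries like this are generated by code, it can be convenient to build the nesting from small helpers rather than writing the JSON by hand. The sketch below rebuilds the business-news query from above. The helper names or_block and and_block are hypothetical and not part of any SDK.

```python
import json

def or_block(*conditions):
    """Group conditions so that matching any one of them is enough."""
    return {"$or": list(conditions)}

def and_block(*conditions):
    """Group conditions so that all of them must match."""
    return {"$and": list(conditions)}

# Bitcoin (concept) OR Ethereum OR Litecoin (keywords)
crypto = or_block(
    {"conceptUri": "http://en.wikipedia.org/wiki/Bitcoin"},
    {"keyword": {"$or": ["Ethereum", "Litecoin"]}},
)

# ... AND in the Business category
query = {"$query": and_block(crypto, {"categoryUri": "news/Business"})}
query_json = json.dumps(query, indent=4)
print(query_json)
```

The resulting JSON string has the same structure as the query shown above, so it can be passed on in the same way as a hand-written one.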

In the above example, we only had simple base queries - each contained just a single conceptUri, keyword or categoryUri condition. This doesn't have to be the case - you can have several types of filters in each base query. In that case, an AND is applied between the individual types. For example, the query:

{
    "$query": {
        "conceptUri": "http://en.wikipedia.org/wiki/Bitcoin",
        "sourceUri": "coindesk.com"
    }
}
A query, where there is a Boolean AND between the two conditions

would return articles about Bitcoin that were published by CoinDesk.

One more thing we haven't mentioned yet is that you can also specify what you don't want to get in the results. The base query, as well as the $and and $or blocks, can have an associated $not value in which you specify conditions; content matching these conditions is excluded from the final results.
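
A minimal sketch of how such an exclusion might look, built as a Python dictionary: business news about Bitcoin, excluding articles that mention Ethereum or Litecoin. The exact placement of "$not" (next to "$and" within the same block) is an assumption drawn from the description above and should be checked against the query grammar.

```python
import json

query = {
    "$query": {
        "$and": [
            {"conceptUri": "http://en.wikipedia.org/wiki/Bitcoin"},
            {"categoryUri": "news/Business"},
        ],
        # Assumed placement: "$not" sits alongside "$and" in the same block
        "$not": {"keyword": {"$or": ["Ethereum", "Litecoin"]}},
    }
}
print(json.dumps(query, indent=4))
```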

The $filter block

In addition to stating what content you want using the $query block, you can optionally specify a $filter block. There you can set which data types to search, how to handle duplicates, which sentiment range to limit the results to, etc.

If, for example, you'd like to skip duplicate articles (copies of other articles that were published earlier) and only retrieve articles with positive sentiment, you can use the following $filter:

{
    "$query": { ... },
    "$filter": {
        "minSentiment": 0.35,
        "isDuplicate": "skipDuplicates"
    }
}
An example of a filter where we limit the results to articles with positive sentiment and skip duplicate articles
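
If the query is built in code, the "$filter" block can be attached as a second top-level key next to "$query". The sketch below uses a hypothetical helper, with_filter, and a placeholder "$query" condition. The filter names and values are the ones from the example above.

```python
import json

def with_filter(query, **filters):
    """Return a copy of the complex query with a "$filter" block attached."""
    result = dict(query)
    if filters:
        result["$filter"] = dict(filters)
    return result

# Placeholder condition, used only for illustration
base = {"$query": {"conceptUri": "http://en.wikipedia.org/wiki/Bitcoin"}}
filtered = with_filter(base, minSentiment=0.35, isDuplicate="skipDuplicates")
print(json.dumps(filtered, indent=4))
```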

A note about using the query parameter in REST calls

Note that if you specify the "query" parameter in a REST API call, you must not also specify any search parameters or filters outside of it. A REST call with parameters like:

{
    "keyword": "Bitcoin",
    "isDuplicateFilter": "skipDuplicates",
    "query": { "$query": { ... } },
    "apiKey": "YOUR_API_KEY"
}

will be rejected as an invalid query because it provides search parameters both directly and inside "query". This makes it impossible for us to tell which parameters are intended, which is why such a query cannot be processed.
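
To catch this mistake before a request is sent, a small client-side check can help. The sketch below is only illustrative: the function name conflicting_params is hypothetical, and the SEARCH_PARAMS set is an assumed, non-exhaustive list of parameter names that should be adapted to the parameters you actually use.

```python
# Assumed, non-exhaustive list of search parameters and filters
SEARCH_PARAMS = {"keyword", "conceptUri", "categoryUri", "sourceUri", "isDuplicateFilter"}

def conflicting_params(params):
    """Return search parameters that appear alongside "query", sorted by name."""
    if "query" not in params:
        return []
    return sorted(SEARCH_PARAMS.intersection(params))

# Placeholder condition, used only for illustration
complex_query = {"$query": {"conceptUri": "http://en.wikipedia.org/wiki/Bitcoin"}}

bad = {
    "keyword": "Bitcoin",
    "isDuplicateFilter": "skipDuplicates",
    "query": complex_query,
    "apiKey": "YOUR_API_KEY",
}
good = {"query": complex_query, "articlesPage": 1, "apiKey": "YOUR_API_KEY"}

print(conflicting_params(bad))   # ['isDuplicateFilter', 'keyword']
print(conflicting_params(good))  # []
```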