Simplifying the data access with iterators
This blog post describes a recent update to the Python library used to access data through our News API. The update significantly simplifies how you iterate through the articles or events returned by a search query.
We will illustrate the change with an example and describe both the old and the new way of iterating over the results. We will assume that you'd like to find news articles about "George Clooney".
Old approach
Assuming that there are hundreds of news articles about Clooney, the code that would download all articles about him would look something like this:
from eventregistry import *
er = EventRegistry()
q = QueryArticles(conceptUri = er.getConceptUri("George Clooney"))
page = 1
while True:
    # request the current page of matching articles
    q.setRequestedResult(RequestArticlesInfo(page = page))
    res = er.execQuery(q)
    for article in res["articles"]["results"]:
        print(article) # do something with the article here
    # stop once the last available page has been processed
    if page >= res["articles"]["pages"]:
        break
    page += 1
With QueryArticles() we say that we want to search over news articles (and not events), and with the conceptUri parameter we specify what to search for. q.setRequestedResult() specifies the form in which we want to obtain the results of the query. In our case, this is RequestArticlesInfo(), which is simply the list of matching articles (other options are described in the documentation page). The code becomes tedious because we have to request the results one page at a time, much like paging through Google search results. For that reason, we create a loop and exit it once we've gone through all the available pages of results.
New approach
Now let's have a look at how we can do this more nicely, without manually iterating through the pages of results. We've added some new classes that can be used as iterators, namely QueryArticlesIter, QueryEventsIter and QueryEventArticlesIter.
The example above can now simply be written as follows:
from eventregistry import *
er = EventRegistry()
q = QueryArticlesIter(conceptUri = er.getConceptUri("George Clooney"))
for art in q.execQuery(er):
    print(art) # do something with the article here
We have replaced the QueryArticles class with the QueryArticlesIter class. Calling execQuery() on it runs the search for articles and returns an iterator. Since the iterator has to query the different pages of results automatically, it expects the EventRegistry class instance as an argument.
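To make the idea concrete, here is a minimal sketch of how an iterator can hide the page loop from the caller. It is not the library's actual implementation: iter_paged_results and make_fake_backend are hypothetical names, and the fake backend merely stands in for er.execQuery(), returning responses in the same shape as the old example (a "results" list and a "pages" count under "articles").

```python
from typing import Callable, Dict, Iterator, List


def iter_paged_results(fetch_page: Callable[[int], Dict]) -> Iterator[Dict]:
    """Yield every article across all pages, requesting each page only when needed."""
    page = 1
    while True:
        res = fetch_page(page)
        for article in res["articles"]["results"]:
            yield article
        # same exit condition as the manual loop in the old approach
        if page >= res["articles"]["pages"]:
            break
        page += 1


def make_fake_backend(titles: List[str], page_size: int) -> Callable[[int], Dict]:
    """Hypothetical stand-in for er.execQuery(): serves canned articles page by page."""
    total_pages = max(1, -(-len(titles) // page_size))  # ceiling division

    def fetch_page(page: int) -> Dict:
        start = (page - 1) * page_size
        chunk = titles[start:start + page_size]
        return {"articles": {"results": [{"title": t} for t in chunk],
                             "pages": total_pages}}

    return fetch_page


# The caller sees one flat stream of articles; the paging happens inside the generator.
fetch = make_fake_backend(["a", "b", "c", "d", "e"], page_size=2)
for art in iter_paged_results(fetch):
    print(art["title"])
```

Because the generator requests a page only when the caller asks for the next article, breaking out of the for loop early also avoids fetching the remaining pages.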
When calling the execQuery() method you can, of course, also specify the order in which you would like to obtain the articles as well as the details of the articles to be returned. All details about the class and the parameters are available in the documentation.
Other iterators
In the example above, we have only described QueryArticlesIter, which can be used to iterate over the results of a search for articles. As mentioned, we have added two more iterators.
QueryEventsIter is an iterator class that can be used to iterate over the events returned by an event search. It can be used as a replacement for the QueryEvents class; its details, along with an example, are described in the documentation.
Lastly, the QueryEventArticlesIter class lets you iterate over the list of articles associated with a particular event. Again, the details and an example are described in the documentation.