News sentiment analysis with News API and GCP Natural Language API

In this article, we show how to perform sentiment analysis using the Google Cloud Platform Natural Language API.

Vasyl Teliman

Introduction

In this post, we will perform a sentiment analysis of news articles from different publishers and then compare them to see which have more positive coverage and which have more negative. We will use the Datanews News API to fetch the news and Google Cloud to analyze it.

News API overview

Datanews provides an API for retrieving and monitoring news from various sources. For our task, we will use the /headlines endpoint to fetch the 100 most recent articles from each of the following sources:

  • cnn.com
  • techcrunch.com
  • nytimes.com
  • theguardian.com

Google Cloud Natural Language API

Google Cloud provides computing resources to solve various engineering problems. Its Natural Language API allows one to extract entities from the text, analyze its syntax, perform classification and, of course, run sentiment analysis. The latter computes two values for a document:

  1. score - a number in the range [-1.0; 1.0] that represents an average emotional leaning of the text.
  2. magnitude - a number in the range [0.0; inf) that represents the overall strength of the emotion.

The following example is one way to interpret those values:

  • Clearly positive: "score": 0.8, "magnitude": 3.0
  • Clearly negative: "score": -0.6, "magnitude": 4.0
  • Neutral: "score": 0.1, "magnitude": 0.0
  • Mixed: "score": 0.0, "magnitude": 4.0

This particular example was taken from the official documentation. Be sure to check it out to learn more about the API.
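
To make the interpretation above concrete, here is a minimal sketch that maps a score and magnitude pair to a coarse label. The function name `interpret_sentiment` and both thresholds are assumptions chosen for illustration. The API itself does not define any cut-off values, so you would tune them for your own data.

```python
def interpret_sentiment(score, magnitude, score_threshold=0.25, magnitude_threshold=1.0):
    """Map a (score, magnitude) pair to a coarse label.

    Both thresholds are illustrative assumptions, not values defined by the API.
    """
    if score >= score_threshold:
        return 'positive'
    if score <= -score_threshold:
        return 'negative'
    # Score near zero: a low magnitude means little emotion overall,
    # a high magnitude means positive and negative passages cancel out.
    return 'mixed' if magnitude >= magnitude_threshold else 'neutral'


# The four (score, magnitude) pairs from the list above
examples = [(0.8, 3.0), (-0.6, 4.0), (0.1, 0.0), (0.0, 4.0)]
labels = [interpret_sentiment(score, magnitude) for score, magnitude in examples]
print(labels)  # ['positive', 'negative', 'neutral', 'mixed']
```

With these thresholds, the four example pairs receive the same labels as in the list above.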

Implementation

First of all, let's install all the necessary dependencies:

pip install --upgrade google-cloud-language datanews

We also need to set up a Google Cloud account and download a credentials file. You can find more information about this in the Google Cloud documentation.

Finally, we need an API key to use Datanews, which you can get at Datanews.io.

With this out of the way, let's get down to coding.

First, let's import some of the Python libraries we will use later.

import os  # We need this to retrieve credentials from environment variables
from google.cloud import language_v1 as lang  # This is the Natural Language API
import datanews  # This is the official Datanews library for Python

Let's check that we've provided all the necessary credentials before we proceed any further.

assert 'GOOGLE_APPLICATION_CREDENTIALS' in os.environ, 'Google Cloud credentials are not specified'
assert 'DATANEWS_API_KEY' in os.environ, 'Datanews API key is not specified'

datanews.api_key = os.environ['DATANEWS_API_KEY']

We use the environment variables GOOGLE_APPLICATION_CREDENTIALS and DATANEWS_API_KEY to specify the Google Cloud credentials and the Datanews API key respectively.

Let's list all the publishers whose articles we will analyze.
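
One caveat is that `assert` statements are skipped when Python runs with the `-O` flag, so an explicit check can be more robust. The sketch below is one possible alternative. The helper name `missing_variables` is hypothetical and is not part of either library.

```python
import os

REQUIRED_VARIABLES = ('GOOGLE_APPLICATION_CREDENTIALS', 'DATANEWS_API_KEY')


def missing_variables(required=REQUIRED_VARIABLES, environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


# Example with an explicit mapping instead of the real environment
print(missing_variables(environ={'DATANEWS_API_KEY': 'example-key'}))
# ['GOOGLE_APPLICATION_CREDENTIALS']
```

In a real script you could call `missing_variables()` without arguments and exit with a readable error message if the returned list is not empty.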

sources = [
    'cnn.com',
    'techcrunch.com',
    'nytimes.com',
    'theguardian.com'
]

Now we can fetch all the necessary articles using the News API:

articles = {}

for source in sources:
    most_recent = datanews.headlines(source=source, page=0, size=100, sortBy='date')
    articles[source] = [article['content'] for article in most_recent['hits']]

As you can see, we are retrieving the 100 most recent publications for every source, which should be a large enough sample for a rough comparison. The last step is to perform the sentiment analysis.
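
The list comprehension above assumes that every hit carries a non-empty `content` field. If that does not hold for some articles (an assumption on our side, not something the Datanews documentation is quoted on here), a slightly more defensive extraction could look like the sketch below. The helper name `extract_contents` is hypothetical, and the response shape is taken from the code above.

```python
def extract_contents(response):
    """Collect non-empty article bodies from a headlines response.

    Assumes the response is a dict with a 'hits' list of article dicts,
    as in the fetching code above.
    """
    contents = []
    for article in response.get('hits', []):
        content = article.get('content')
        # Skip articles with a missing, None or whitespace-only body
        if content and content.strip():
            contents.append(content)
    return contents


# Hand-written sample data, not a real API response
sample_response = {
    'hits': [
        {'content': 'Markets rallied today.'},
        {'content': ''},
        {'title': 'No body'},
        {'content': None},
    ]
}
print(extract_contents(sample_response))  # ['Markets rallied today.']
```

Filtering at this stage keeps empty documents from being sent to the Natural Language API later on.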

client = lang.LanguageServiceClient()

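for source, news in articles.items():
    if not news:
        # Skip sources that returned no articles to avoid dividing by zero
        print(f'{source}: no articles to analyze')
        continue

    magnitude, score = 0, 0

    for article in news:
        # Wrap the raw text in a Document and request document-level sentiment
        document = lang.Document(content=article, type_=lang.Document.Type.PLAIN_TEXT)
        sentiment = client.analyze_sentiment(request={'document': document})
        magnitude += sentiment.document_sentiment.magnitude
        score += sentiment.document_sentiment.score

    # Average over all articles of the current source
    avg_magnitude = magnitude / len(news)
    avg_score = score / len(news)
    print(source)
    print(f'\taverage magnitude: {avg_magnitude}')
    print(f'\taverage score: {avg_score}')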

This piece of code queries the Natural Language API for each article and computes the average magnitude and score for each news source. It produces output similar to the following (note that your computed scores may be different):

cnn.com
    average magnitude: 10.219999933242798
    average score: -0.23600000351667405
techcrunch.com
    average magnitude: 17.51199999809265
    average score: -0.0040000006556510925
nytimes.com
    average magnitude: 12.667999992370605
    average score: -0.15600000381469725
theguardian.com
    average magnitude: 13.84400001525879
    average score: -0.06800000250339508

We can see that techcrunch.com and theguardian.com are close to neutral on average, while cnn.com and nytimes.com lean more negative. This makes sense given their audience and content: TechCrunch focuses on tech, while The New York Times and CNN cover general news. Keep in mind that magnitude is not normalized by text length, so longer articles tend to have higher magnitudes.
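
To make the comparison explicit, we can sort the publishers by their average score. The sketch below uses the averages from the sample output above, rounded to three decimals. In your own run you would collect the computed averages into a dictionary instead (`average_scores` is a name introduced here for illustration).

```python
# Average scores copied from the sample output above, rounded to three decimals
average_scores = {
    'cnn.com': -0.236,
    'techcrunch.com': -0.004,
    'nytimes.com': -0.156,
    'theguardian.com': -0.068,
}

# Sort from the most negative to the least negative average score
ranking = sorted(average_scores, key=average_scores.get)

for source in ranking:
    print(f'{source}: {average_scores[source]:+.3f}')
```

For these sample values, the order from most to least negative is cnn.com, nytimes.com, theguardian.com, techcrunch.com.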

Conclusion

In this article, we have briefly discussed Google Cloud's Natural Language API and the Datanews News API, and performed sentiment analysis on some of the most recent news articles. We then made a small comparison of publishers to get a rough picture of their general sentiment.

Sentiment analysis is only one of several analyses that the Natural Language API provides. You can easily extend the code to include entity extraction, syntax analysis and text classification.
