In this article, we will show how to perform sentiment analysis using the Google Cloud Platform Natural Language API.
We will analyze the sentiment of news articles from different publishers and then compare the results to see which publishers have more positive coverage and which have more negative coverage. We will use the Datanews News API to fetch the news and Google Cloud to analyze it.
Datanews provides an API for retrieving and monitoring news from various sources. For our task, we will use the /headlines endpoint to fetch the 100 most recent articles from each of the following sources:
cnn.com
techcrunch.com
nytimes.com
theguardian.com
Google Cloud provides computing resources for solving a wide range of engineering problems. Its Natural Language API can extract entities from text, analyze syntax, classify content and, of course, run sentiment analysis. Sentiment analysis computes two values for a document:
score: a number in the range [-1.0, 1.0] that represents the overall emotional leaning of the text, from negative to positive.
magnitude: a number in the range [0.0, +inf) that represents the overall strength of emotion in the text, both positive and negative. Magnitude is not normalized, so longer texts tend to have larger magnitudes.
The following examples show one way to interpret these values:
Clearly positive: "score": 0.8, "magnitude": 3.0
Clearly negative: "score": -0.6, "magnitude": 4.0
Neutral: "score": 0.1, "magnitude": 0.0
Mixed: "score": 0.0, "magnitude": 4.0
These examples are taken from the official documentation; be sure to check it out to learn more about the API.
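One way to turn the two values into labels like the ones above is a small threshold rule. The sketch below is only an illustration: the function name `interpret_sentiment` and the cut-offs 0.25 and 1.0 are assumptions chosen to reproduce the four examples, not thresholds defined by Google. Because magnitude grows with the length of the text, a fixed magnitude cut-off only makes sense for documents of similar size.

```python
def interpret_sentiment(score: float, magnitude: float,
                        score_cutoff: float = 0.25,
                        magnitude_cutoff: float = 1.0) -> str:
    """Map a (score, magnitude) pair to a coarse label.

    The cut-offs are illustrative assumptions, not values defined by the API.
    """
    if score >= score_cutoff:
        return "clearly positive"
    if score <= -score_cutoff:
        return "clearly negative"
    # A score near zero is ambiguous: either little emotion overall (neutral),
    # or strong positive and negative passages cancelling out (mixed).
    if magnitude >= magnitude_cutoff:
        return "mixed"
    return "neutral"


# The four examples from the documentation
examples = [(0.8, 3.0), (-0.6, 4.0), (0.1, 0.0), (0.0, 4.0)]
for score, magnitude in examples:
    print(score, magnitude, interpret_sentiment(score, magnitude))
```

The magnitude check matters only when the score is close to zero: it is what separates a genuinely neutral text from one that mixes strong positive and negative passages.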
First of all, let's install all the necessary dependencies:
pip install --upgrade google-cloud-language datanews
We also need to set up a Google Cloud account, enable the Natural Language API, and download a service account credentials file. The Google Cloud documentation on authentication explains how to do this.
We also need a Datanews API key, which you can get at Datanews.io.
With this out of the way, let's get down to coding.
First, let's import some of the Python libraries we will use later.
import os # We need this to retrieve credentials from environment variables
from google.cloud import language_v1 as lang # This is the Natural Language API
import datanews # This is the official Datanews library for Python
Let's check that we've provided all the necessary credentials before we proceed any further.
assert 'GOOGLE_APPLICATION_CREDENTIALS' in os.environ, 'Google Cloud credentials are not specified'
assert 'DATANEWS_API_KEY' in os.environ, 'Datanews API key is not specified'
datanews.api_key = os.environ['DATANEWS_API_KEY']
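We use the environment variables GOOGLE_APPLICATION_CREDENTIALS and DATANEWS_API_KEY to specify the Google Cloud credentials and the Datanews API key, respectively. GOOGLE_APPLICATION_CREDENTIALS holds the path to the credentials file, which the Google Cloud client library picks up automatically.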
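Python skips `assert` statements when it runs with the `-O` flag, so the checks above silently disappear in optimized mode. A minimal alternative, sketched here with a hypothetical helper named `require_env`, raises an explicit error that lists every missing (or empty) variable at once.

```python
import os


def require_env(*names: str) -> dict:
    """Return the requested environment variables, raising if any is missing or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Usage with this article's setup (raises RuntimeError if either variable is unset):
# config = require_env('GOOGLE_APPLICATION_CREDENTIALS', 'DATANEWS_API_KEY')
# datanews.api_key = config['DATANEWS_API_KEY']
```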
Let's list all the publishers whose articles we will analyze.
sources = [
'cnn.com',
'techcrunch.com',
'nytimes.com',
'theguardian.com'
]
Now we can fetch the articles using the Datanews API:
articles = {}
for source in sources:
    most_recent = datanews.headlines(source=source, page=0, size=100, sortBy='date')
    articles[source] = [article['content'] for article in most_recent['hits']]
As you can see, we retrieve the 100 most recent publications for every source, which should give a reasonable approximation of each publisher's overall tone. The last step is to run the sentiment analysis.
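Some hits may come back with an empty or missing `content` field, and an empty document gives the Natural Language API nothing to analyze. A small filter such as the hypothetical `extract_contents` below keeps only non-blank texts. It assumes the response shape used in this article (a dict with a `hits` list whose items carry a `content` key) and is exercised on a hand-made response rather than a real API call.

```python
def extract_contents(response: dict) -> list:
    """Return the non-blank article texts from a headlines-style response.

    Assumes the shape used in this article: {'hits': [{'content': ...}, ...]}.
    """
    contents = []
    for article in response.get('hits', []):
        text = article.get('content') or ''
        if text.strip():  # skip missing, empty or whitespace-only content
            contents.append(text)
    return contents


# A hand-made response standing in for datanews.headlines(...)
sample_response = {'hits': [{'content': 'First story.'}, {'content': ''}, {'title': 'No body'}]}
print(extract_contents(sample_response))  # ['First story.']
```

In the fetching loop, `articles[source] = extract_contents(most_recent)` would replace the list comprehension.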
client = lang.LanguageServiceClient()
for source, news in articles.items():
    if not news:  # avoid dividing by zero when a source returned no articles
        print(f'{source}: no articles to analyze')
        continue
    magnitude, score = 0, 0
    for article in news:
        document = lang.Document(content=article, type_=lang.Document.Type.PLAIN_TEXT)
        sentiment = client.analyze_sentiment(request={'document': document})
        magnitude += sentiment.document_sentiment.magnitude
        score += sentiment.document_sentiment.score
    avg_magnitude = magnitude / len(news)
    avg_score = score / len(news)
    print(source)
    print(f'\taverage magnitude: {avg_magnitude}')
    print(f'\taverage score: {avg_score}')
This code queries the Natural Language API for each article and computes the average magnitude and score for each news source. It produces output similar to the following (your numbers will differ, since the set of most recent articles changes over time):
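The averaging step does not depend on the API, so it can be pulled into a pure function and checked offline. The sketch below uses a hypothetical `summarize` helper built on `statistics.fmean` (Python 3.8+). It also reports the population standard deviation of the scores, which is an addition not present in the original code: it shows how strongly individual articles disagree with the average.

```python
from statistics import fmean, pstdev


def summarize(sentiments):
    """Average a list of (score, magnitude) pairs for one source.

    Returns None for an empty list instead of dividing by zero.
    """
    if not sentiments:
        return None
    scores = [score for score, _ in sentiments]
    magnitudes = [magnitude for _, magnitude in sentiments]
    return {
        'count': len(sentiments),
        'average_score': fmean(scores),
        'average_magnitude': fmean(magnitudes),
        # Spread of per-article scores: a near-zero average can hide strong disagreement
        'score_stdev': pstdev(scores),
    }


# Hand-made values; in the article they would come from sentiment.document_sentiment
print(summarize([(0.5, 2.0), (-0.5, 4.0)]))
```

Here one positive and one negative article average to a score of 0.0, and only the standard deviation of 0.5 reveals that the source is mixed rather than neutral.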
cnn.com
    average magnitude: 10.219999933242798
    average score: -0.23600000351667405
techcrunch.com
    average magnitude: 17.51199999809265
    average score: -0.0040000006556510925
nytimes.com
    average magnitude: 12.667999992370605
    average score: -0.15600000381469725
theguardian.com
    average magnitude: 13.84400001525879
    average score: -0.06800000250339508
We can see that techcrunch.com and theguardian.com are close to neutral, while cnn.com and nytimes.com lean noticeably negative. This is plausible given their audiences and content: TechCrunch focuses on technology, while The New York Times and CNN cover general news.
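The comparison can be made explicit by sorting the sample averages and labelling each source. The numbers below are the average scores from the sample output, rounded to three decimals; the ±0.1 band treated as "close to neutral" is an assumption chosen for illustration, not a threshold from the API documentation.

```python
# Average scores from the sample output, rounded to three decimals
average_scores = {
    'cnn.com': -0.236,
    'techcrunch.com': -0.004,
    'nytimes.com': -0.156,
    'theguardian.com': -0.068,
}

NEUTRAL_BAND = 0.1  # assumption: |score| below this counts as close to neutral


def label(score: float) -> str:
    """Label an average score using the assumed neutral band."""
    if abs(score) < NEUTRAL_BAND:
        return 'close to neutral'
    return 'negative' if score < 0 else 'positive'


# Rank sources from most positive to most negative average score
ranking = sorted(average_scores.items(), key=lambda item: item[1], reverse=True)
for source, score in ranking:
    print(f'{source:16} {score:+.3f}  {label(score)}')
```

With this band, techcrunch.com and theguardian.com fall inside the neutral range while nytimes.com and cnn.com fall outside it on the negative side.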
In this article, we briefly discussed Google Cloud's Natural Language API and the Datanews News API, performed sentiment analysis on some of the most recent news articles, and made a small comparison of publishers to approximate their overall sentiment.
Sentiment analysis is only one of the analyses that the Natural Language API provides; you can easily extend the code to include entity analysis (useful for keyword extraction), syntax analysis and text classification.
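For example, entity analysis uses the same client and document objects through `client.analyze_entities`. The hypothetical helper below only post-processes the response, picking the most salient entities, so it can be tried without credentials. The fake response built with `SimpleNamespace` stands in for a real one and mimics only the `entities`, `name` and `salience` fields the helper reads.

```python
from types import SimpleNamespace


def top_entities(response, n=5):
    """Return the names of the n most salient entities in an analyze_entities response."""
    ranked = sorted(response.entities, key=lambda entity: entity.salience, reverse=True)
    return [entity.name for entity in ranked[:n]]


# With a real client this would be:
#   response = client.analyze_entities(request={'document': document})
# Here a fake response mimics only the fields the helper reads.
fake_entities_response = SimpleNamespace(entities=[
    SimpleNamespace(name='Google', salience=0.2),
    SimpleNamespace(name='Datanews', salience=0.7),
    SimpleNamespace(name='Python', salience=0.1),
])
print(top_entities(fake_entities_response, n=2))  # ['Datanews', 'Google']
```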