Oct 6, 2020 · 2 min read

Introducing StreamEA - an Entity Analyzer on Steroids! 🚀

Updated: Oct 10, 2020

Today I’m excited to launch StreamEA, a Python app with NLP superpowers! 🐍🔥

I’ve been working on this one for a while, so I’m thrilled to finally share it with the world!

> Click here to try the app

The app combines the power of the Google Cloud Natural Language API with pandas to extract entities from web pages, along with their salience scores!
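StreamEA's own code isn't shown here, but the core step, turning the API's entity list into rows with salience scores, can be sketched. This is a minimal sketch, assuming the JSON response of the v1 REST method `documents:analyzeEntities`; the function name `parse_entities` and the row layout are illustrative, not the app's actual implementation.

```python
import json
from typing import Dict, List


def parse_entities(response_json: str) -> List[Dict]:
    """Flatten an analyzeEntities JSON response into one row per entity.

    Assumes the v1 REST response shape:
    {"entities": [{"name": ..., "type": ..., "salience": ..., "mentions": [...]}]}
    """
    payload = json.loads(response_json)
    rows = []
    for entity in payload.get("entities", []):
        rows.append({
            "name": entity["name"],
            "type": entity.get("type", "UNKNOWN"),
            # JSON output may omit zero-valued fields, so default to 0.0
            "salience": float(entity.get("salience", 0.0)),
            # How many times the entity is mentioned in the document
            "count": len(entity.get("mentions", [])),
        })
    # Highest salience first
    rows.sort(key=lambda row: row["salience"], reverse=True)
    return rows


# Hand-written sample response, for illustration only
sample = json.dumps({
    "entities": [
        {"name": "Python", "type": "OTHER", "salience": 0.2,
         "mentions": [{"text": {"content": "Python"}}, {"text": {"content": "Python"}}]},
        {"name": "Google", "type": "ORGANIZATION", "salience": 0.5,
         "mentions": [{"text": {"content": "Google"}}]},
    ]
})
for row in parse_entities(sample):
    print(row["name"], row["salience"], row["count"])
```

In the app, rows like these would presumably be loaded into a pandas DataFrame; plain dicts keep the sketch dependency-free.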

All you need is to upload your Google Cloud credentials (a JSON key file), and you're off!

The app is still in beta, so your feedback (bug reports and suggestions) is very welcome! My Twitter DMs are open. :)

Here’s a quick tour of what it does and how to use it.

Step 1 - Upload your GCP credentials

First, you need to upload your JSON key (a Google Cloud service account key). If you don’t have one yet, you can follow the instructions here.

Once you’ve downloaded your key, upload it (or drag and drop it) into the file uploader.
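Before using an uploaded key, it's worth checking that it really is a service account key. The sketch below is a hypothetical `validate_key` helper (not StreamEA's code) that parses the uploaded bytes and checks a few fields that Google Cloud service account key files contain.

```python
import json
from typing import Any, Dict

# Fields found in a Google Cloud service account key file
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def validate_key(raw: bytes) -> Dict[str, Any]:
    """Parse an uploaded key file and check it looks like a service account key."""
    try:
        info = json.loads(raw)
    except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
        raise ValueError(f"Not a valid JSON file: {exc}") from exc
    if not isinstance(info, dict):
        raise ValueError("Key file must contain a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in info]
    if missing:
        raise ValueError(f"Missing fields: {', '.join(missing)}")
    if info["type"] != "service_account":
        raise ValueError("Expected a service account key")
    return info
```

With the `google-auth` library installed, the returned dict could then be passed to `google.oauth2.service_account.Credentials.from_service_account_info`; how StreamEA builds its credentials isn't stated in this post.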

Step 2 - Compare 2 URLs

Currently, StreamEA allows you to compare two web pages (bulk upload is coming ;))

Simply paste one URL into each field.
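Each URL has to be turned into text before it can be analyzed. The post doesn't say how StreamEA does this (the API also accepts raw HTML when the document type is set to `HTML`), so the following is only one possible approach: a standard-library sketch with hypothetical `fetch_html` and `extract_text` helpers that download a page and keep its visible text, skipping `<script>`, `<style>` and `<noscript>` content.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside script/style/noscript."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the visible text of an HTML page, joined with single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it using the charset from its headers."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would be `extract_text(fetch_html(url))` for each of the two URLs. A dedicated boilerplate remover would also strip navigation and footers, which this sketch keeps.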

Some interesting use cases:

Find entities that appear on competitors’ pages that outrank you, yet are missing from your own pages
Differentiate pages on your website
Research topics: discover synonyms and alternative lexical fields
Check how well you've covered a specific topic

Step 3 - Estimate API call costs 💰

Before going ahead, you can check how much the API calls will cost. A few useful tidbits about pricing:

Language API usage is billed in ‘units’
1 unit per 1,000 characters of each document, rounded up (so every document counts as at least 1 unit)
Below's a cost overview - in US dollars:

You can also find more information on how pricing is calculated here.
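The unit arithmetic above is easy to reproduce. A minimal sketch, assuming one unit per started block of 1,000 characters per document; `price_per_1000_units` is a placeholder you'd fill in from Google's current pricing page, and the free monthly tier is ignored.

```python
import math
from typing import Iterable


def units_for(text: str) -> int:
    """Billable units for one document: 1 per 1,000 characters, rounded up, min 1."""
    return max(1, math.ceil(len(text) / 1000))


def estimate_cost(texts: Iterable[str], price_per_1000_units: float) -> float:
    """Estimated USD cost of analyzing each text once (free tier not deducted)."""
    total_units = sum(units_for(text) for text in texts)
    return total_units * price_per_1000_units / 1000
```

For example, a 2,500-character page counts as 3 units and a 500-character page as 1, so comparing the two costs 4 units.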

Step 4 - Send the request to the Google Language API

If you’re happy with the cost, click “Proceed” to send the request to the API.

Note that the app doesn’t yet handle very long articles (like this one). Hopefully, I’ll get that sorted soon.
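For the curious, here is roughly what such a request looks like at the HTTP level. This is a sketch against the public v1 REST endpoint `documents:analyzeEntities`, not StreamEA's code; it assumes you already have an OAuth access token for your service account (obtaining one, e.g. with `google-auth`, is out of scope here). The official `google-cloud-language` client library wraps all of this and is probably the more convenient route.

```python
import json
from urllib.request import Request

ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeEntities"


def build_request(text: str, access_token: str) -> Request:
    """Build (but don't send) an analyzeEntities POST request for plain text."""
    body = {
        "document": {"type": "PLAIN_TEXT", "content": text},
        # Mention offsets are reported relative to UTF-8 encoded text
        "encodingType": "UTF8",
    }
    return Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(build_request(text, token))`, which returns the JSON response containing the `entities` list.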

Now here comes the fun part: getting the results! 🙌

Step 5a - Spot the Top 15 missing entities in your content

This section is great for finding entities that appear on a competitor’s page that outranks you, yet are missing from yours.

You'll get two tables:

The left table shows the top 15 entities found in URL 01 but not in URL 02
Similarly, the right table shows the top 15 entities found in URL 02 but not in URL 01
Entities are sorted by salience score, so only the 15 most salient are shown

Don't worry, you can also download *full* lists as CSVs - more on that below.
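The logic behind these two tables can be sketched in a few lines. A minimal sketch with a hypothetical `top_missing` helper, assuming each page's results are a mapping of entity name to salience score and that entities are matched on their exact names (case-insensitive matching would be a reasonable alternative).

```python
from typing import Dict, List, Tuple


def top_missing(
    entities_a: Dict[str, float], entities_b: Dict[str, float], n: int = 15
) -> List[Tuple[str, float]]:
    """Entities on page A that are absent from page B, most salient first."""
    missing = [(name, score) for name, score in entities_a.items() if name not in entities_b]
    missing.sort(key=lambda item: item[1], reverse=True)
    return missing[:n]
```

Calling it once as `top_missing(url1, url2)` and once as `top_missing(url2, url1)` gives the left and right tables; dropping the `[:n]` cut gives the full lists.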

Step 5b - Check the Master table

The master table gathers *ALL* results from the API call:

Column #1: Entity name
Column #2: Salience score in URL#1
Column #3: Salience score in URL#2
Column #4: Entity count in URL#1
Column #5: Entity count in URL#2

A column showing the salience score difference between URL#1 and URL#2 will be added soon.
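The master table is essentially an outer join of both pages' results. A minimal sketch, assuming each page is a mapping of entity name to `(salience, mention_count)`; the `salience_diff` column is a hypothetical take on the upcoming difference column, left empty when an entity appears on only one page.

```python
from typing import Dict, List, Optional, Tuple

PageEntities = Dict[str, Tuple[float, int]]


def master_table(page1: PageEntities, page2: PageEntities) -> List[dict]:
    """Outer-join two pages' entities into rows matching the five master-table columns."""
    rows = []
    for name in sorted(set(page1) | set(page2)):
        s1, c1 = page1.get(name, (None, None))
        s2, c2 = page2.get(name, (None, None))
        # Hypothetical extra column: salience in URL#1 minus salience in URL#2
        diff: Optional[float] = s1 - s2 if s1 is not None and s2 is not None else None
        rows.append({
            "entity": name,
            "salience_url1": s1,
            "salience_url2": s2,
            "count_url1": c1,
            "count_url2": c2,
            "salience_diff": diff,
        })
    return rows
```

With pandas, the same join could be written with `DataFrame.merge(..., on="entity", how="outer")`.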

Step 6 - Export the output data to CSV

Last but not least, you can export each of these 3 tables to CSV.
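Exporting a table boils down to serializing its rows. A minimal standard-library sketch with a hypothetical `to_csv` helper that turns a list of dicts into CSV text, ready for a download link or button; the app itself presumably relies on pandas for this.

```python
import csv
import io
from typing import Dict, List


def to_csv(rows: List[Dict], fieldnames: List[str]) -> str:
    """Serialize rows (dicts) to CSV text with a header line; None becomes an empty cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```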

Shout outs & support

Kudos to BritneyMuller’s recent MozCon talk for inspiring me to create this app! Kudos also to Sascha and the Streamlit community; these folks are always there to help!

Lastly, this app is free and should remain that way. Buy me a coffee if it’s useful to you! 🙏

Drop me a line if you have questions, bug reports or suggestions!