I was browsing a few of my usual subreddits the other night and came across a question from user u/Cocopoppyhead on how to programmatically find word counts for ranked articles at scale.
As anyone who has come across my blog before will know, when I’m not tinkering with electronics and bits of code, I work as an SEO—and hands-down my favorite SEO-related topic is robotic process automation, so I figured I’d have a go at answering the question.
I’ve embedded the post below for anyone who might be interested in the original question.
One thing worth mentioning is that word count isn't a ranking factor, nor an indication of thin or quality content, as Google has confirmed. It's still something that many of us in SEO use to benchmark our own content against, and something SEO tools like SurferSEO, Frase, and other similar optimization tools include in the range of competitor metrics they consider when rating and scoring content.
I won't go into too much of a debate on the merits of word count as a metric. That said, it's really about content completeness these days, not length: not every question or topic needs a 2,000+ word post, and from a user's perspective, it's best to get to the point and avoid the typical fluff where you can.
With the intro out of the way, let's get into answering the question and, most importantly, explaining how my script works to provide word counts for ranked competitors, both programmatically and at scale.
How it works
I’ll try to break down and explain as simply as possible how this script works and what it does in the list of bullets that follows:
- The script takes a keyword from an array of keywords and uses Axios to call SerpStack's API, returning a list of URLs for the content ranked in the top 10 positions for that keyword.
- Each of these URLs is then scraped using Puppeteer, with all heading (H1 through H6) and paragraph tags pushed into an array.
- This array is then processed, getting a word count for each respective element and adding these together to produce a total.
- The total word counts, along with their article URLs, are written to a CSV file, one per keyword (the keyword and location form part of the file name), which can then be easily imported into something like Microsoft Excel or Google Sheets.
Once it’s finished running, you’ll end up with an output that looks something like this:
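(The URLs and word counts here are just placeholders to show the format; each CSV row is an article URL followed by its total word count.)
https://www.example.com/some-article, 1843
https://www.example.org/another-post, 1210
https://www.example.net/a-third-result, 965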

What you’ll need
To use the script as-is, you'll need a SerpStack account, which you can create for free here, and Node.js set up on your machine if you don't have it installed already. To install Node, head over to the official site to grab the installer and follow the onscreen instructions to set it up.
Given that we won't be scraping Google or the same competitors hundreds of times, there's no real need for a rotating proxy service: running this script, you're not going to hit the same sorts of issues and blocks you would when performing these kinds of tasks at higher volumes.
That's it really, not much to it. It's worth noting that with the SerpStack free account, you get 100 queries each month, so you can run this script for 10 keywords per month without any cost. If you want to increase that limit, you'll need to upgrade your account. As well as higher limits, SerpStack also offers a lot more functionality than we make use of here, so it's worth having a read of their documentation to see what else you can do with it.
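For example, alongside query and gl, SerpStack's documentation lists parameters like num (how many results to return) and hl (interface language). If you wanted to use them, you could extend the params object built inside run() along these lines (do check their docs to confirm, as I'm going from memory here):

let params = {
  access_key: access_key,
  query: keyword,
  gl: location,
  num: 10, // number of results to return, per SerpStack's docs
  hl: `en` // interface language
}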
There are plenty of other SERP API tools to choose from—many of which have made it to my round-up of the best no credit card SEO free trials, which you can view here.
The code
Like many of my other SEO-related posts, this isn't written as a tutorial; instead, it's more about explaining what the script does and how you can easily get up and running with it. That said, I'll briefly go over each part of the script and what it does, although in all honesty, it's pretty self-explanatory (as you'll see!).
The script is made up of five functions; these, and what they do, are summarised below:
- getUrls(): Makes an API request to SerpStack using Axios, returning the URLs of the first 10 organic results for a given keyword.
- getWordCounts(): Uses Puppeteer to scrape each of these URLs, pulling the text out of content elements as strings, splitting these into arrays of words, and getting the length of each. These are then added up, giving a total word count.
- getDateString(): Creates a date stamp used to name the output CSV files.
- writeCSVFile(): Writes the CSV files, which can be found in the folder from which the script is run.
- run(): This just ties everything together and starts the process.
The script is given below, so you can either copy and paste it or alternatively download the index.js file from my Google Drive here.
index.js
const axios = require('axios')
const puppeteer = require('puppeteer')
const fs = require('fs') // built into Node, nothing to install

const access_key = `XXXXXXXXXXXXXXXXXXXX` // SerpStack API key
const keywords = [
  `keyword 1`,
  `keyword 2`,
  `keyword 3`,
]
const location = `US`

// Query SerpStack and return the URLs of the top 10 organic results
async function getUrls(params) {
  try {
    let response = await axios.get('http://api.serpstack.com/search', { params })
    let urls = []
    for (const result of response.data.organic_results) {
      urls.push(result.url)
    }
    return urls
  }
  catch (e) {
    console.log(e)
  }
}

// Scrape each URL with Puppeteer, totalling the words in its headings and paragraphs
async function getWordCounts(urls) {
  try {
    let wordCountArray = []
    for (const url of urls) {
      const browser = await puppeteer.launch()
      const page = await browser.newPage()
      await page.goto(url)
      let totalWordCount = await page.evaluate(() => {
        const elems = Array.from(document.querySelectorAll('h1, h2, h3, h4, h5, h6, p'))
        let wordCount = 0
        for (const el of elems) {
          let elWords = el.innerText.split(" ")
          wordCount += elWords.length
        }
        return wordCount
      })
      wordCountArray.push([url, totalWordCount])
      await browser.close()
    }
    return wordCountArray
  }
  catch (e) {
    console.log(e)
  }
}

// Build a YYYYMMDD date stamp for the output file names
function getDateString() {
  const date = new Date()
  const year = date.getFullYear()
  const month = `${date.getMonth() + 1}`.padStart(2, '0')
  const day = `${date.getDate()}`.padStart(2, '0')
  return `${year}${month}${day}`
}

// Write one CSV file per keyword, with a URL and its word count on each row
async function writeCSVFile(keyword, location, fileDate, data) {
  try {
    fs.writeFile(
      `${keyword.replaceAll(' ', '_')}_${location}_${fileDate}.csv`, // replaceAll needs Node 15+
      data.map((v) => v.join(', ')).join('\n'),
      (err) => {
        if (err) throw err
        console.log(`"${keyword}" complete!`)
      }
    )
  }
  catch (e) {
    console.log(e)
  }
}

// Loop over the keywords, running the fetch, scrape, and write pipeline for each
async function run(access_key, keywords, location) {
  try {
    for (const keyword of keywords) {
      let params = {
        access_key: access_key,
        query: keyword,
        gl: location
      }
      let urls = await getUrls(params)
      let data = await getWordCounts(urls)
      let fileDate = getDateString()
      await writeCSVFile(keyword, location, fileDate, data)
    }
  }
  catch (e) {
    console.log(e)
  }
}

run(access_key, keywords, location)
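One small caveat on the counting logic: splitting on a single space means stray newlines, tabs, and double spaces can skew the totals slightly. If you want tighter counts, a minimal tweak to the loop inside page.evaluate() would be something like this:

for (const el of elems) {
  // split on any run of whitespace and drop empty strings
  wordCount += el.innerText.split(/\s+/).filter(Boolean).length
}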
How to use it
The first thing you'll need to do is create a SerpStack account. They have a forever free tier which offers 100 API queries per month, which is the main reason I opted to use them. That said, there are plenty of alternative SERP API products to choose from, most of which work similarly, and the script can easily be changed to plug in whichever vendor you decide to use.
Once you've set up your account, you'll need to navigate to your dashboard and copy the API key, referred to as "access_key", ready to paste into the script. Along with that, you'll also want to set the location for your search, using the country's two-letter ISO alpha-2 code (e.g. US, UK, CA, etc.), and of course, add your list of keywords. This is all done in the following part of the code, with the relevant snippet included below for reference:
const access_key = `XXXXXXXXXXXXXXXXXXXX` // SerpStack API key
const keywords = [
  `keyword 1`,
  `keyword 2`,
  `keyword 3`,
]
const location = `US`
You'll also need to install the required dependencies, these being Axios and Puppeteer (fs ships with Node, so there's nothing to install there). You can install them manually from the command line, or download the package.json for the script here and use the command "npm install" to pull them in automatically.
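If you'd rather not download it, a minimal package.json along these lines should do the job (the name and version numbers here are just placeholders, not the exact ones from my file):

{
  "name": "serp-word-counts",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "axios": "^0.27.0",
    "puppeteer": "^13.0.0"
  }
}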
Once that's all done, open up a terminal, navigate to the folder where you put the project, then use the command "npm start" to run the script. The output CSV files will be saved to this same folder, so once the script has finished, simply navigate there to open or import the files into your spreadsheet software of choice.
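In other words, assuming index.js and package.json sit in the same folder (the path here is just a placeholder):

cd path/to/your/project
npm install
npm start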
Summing it up
That's it really. The point of this post was to answer a very specific question on how to get word counts of ranked competitors programmatically, and this is exactly what the script provided above does.
While the idea wasn't mine, since I didn't pose the original question, I may well add more functionality to the script, in particular a couple of extra functions to help identify thin content. That's a topic that could be useful to many people, and a good follow-up to my "automate the alphabet soup method" post from a month or so ago.
Well that’s all for now. If you have any questions or comments on the post, whether improvements, suggestions, or just help getting it up and running, feel free to drop these in the comments below and I’ll do my best to respond. As always, thanks for reading!