Presented at Brighton SEO on 2nd October 2020. There are many reasons why you might be short on tools and/or data - perhaps you're pitching to a prospective client you have little to no information on, or you're a freelancer or start-up without a big budget behind you - but whatever the reason, we've got a case study full of examples.

This post is going to cover how to audit a competitor site from a technical point of view, how to put monetary figures on the issues found, and how to use this information to your advantage - all without a single enterprise solution, using only free tools and freely available online data.

Today, we are going to be looking at the next.co.uk website. Just a caveat: they aren't a client (yet… but let's talk?), so we don't have access to any internal data. But using easily accessible data available online, we estimated that the Next.co.uk website is missing out on potentially £314,000 worth of revenue every month.

How did we do this? The first step is to take a good look around the site - click around, find some items, and generally navigate as if you were a customer.

On the homepage, I noticed that the Next site has these collection banners leading to specific trends of the moment - this example is from November 2019, in the run-up to Christmas - a great idea, covering off some of the more specific searches people might use when shopping for a party outfit. When you click through, you land where you'd expect: a collection page. But what is this strange URL string here?

Here you can see that the URL path ends in /0-homepage. What happens when you take this part of the URL off? You get a very familiar page: it looks almost exactly the same as the main category page - nearly the same set of products, with only a slight difference in the on-page title. So, if these are two almost identical pages, how big an issue is this for the site overall? Let's dig a little deeper.

Indexing

If we want to find out whether duplicate pages are causing a problem for a website, one of the first things to look at is indexing - starting with how many pages are in the sitemap versus how many pages are being indexed overall.

To find the number of pages in the sitemap, we can head to the robots.txt file and grab the sitemap URL. For the next.co.uk website this is a sitemap index, and you can see they've split their site into a bunch of separate sitemaps for different site areas.

It’s kind of hard to figure out at a glance how many pages there are in total, but there is a way to find out.

Free Tool - Screaming Frog

You can use the free version of Screaming Frog to find out how many pages are in the sitemap. There are some limitations - the free version will only crawl 500 URLs - but we don't actually need to crawl the site, just count the pages in the sitemap. Switch to list mode via Mode > List, then choose either Download XML Sitemap or Download XML Sitemap Index - as we have a sitemap index file URL, we'll select that option.

A popup box will appear, so go ahead and plug in your URL.

Now, we just need to see how many pages are in the sitemap, so there's no need to press OK - we don't actually want to crawl them. The popup will tell you how many individual page URLs are within the sitemap index - here, that's 194,237.
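If you'd prefer to script this rather than use a GUI crawler, the same count can be pulled straight from the sitemap index with a few lines of Python. This is a minimal sketch assuming the standard sitemaps.org XML format and uncompressed child sitemaps - the index URL below is a placeholder, so use whatever the robots.txt file actually points at:

```python
# Minimal sitemap URL counter - assumes the standard sitemaps.org XML
# schema and uncompressed child sitemaps (gzipped .xml.gz files would
# need decompressing first).
import requests
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def fetch_xml(url):
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return ET.fromstring(resp.content)

def count_sitemap_urls(index_url):
    index = fetch_xml(index_url)
    # <sitemap><loc> entries in the index point at the child sitemaps
    children = [loc.text for loc in index.findall("sm:sitemap/sm:loc", NS)]
    total = sum(
        len(fetch_xml(child).findall("sm:url/sm:loc", NS)) for child in children
    )
    return len(children), total

# Placeholder URL - grab the real one from the site's robots.txt
sitemaps, urls = count_sitemap_urls("https://www.next.co.uk/sitemap-index.xml")
print(f"{sitemaps} child sitemaps containing {urls:,} page URLs in total")
```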

The first question is: does the sitemap roughly reflect the number of URLs indexed by Google? To find out, we can compare this against the indexed pages using Google and its search operators.

Free Tool - Google & Search Operators

The next tool we can use is Google itself: the site: search operator, which limits the search results to those from a specific website. If we put the domain we want to look at after the site: command (in this case site:next.co.uk), it will show us roughly how many pages are indexed.

A caveat - this data isn't super accurate; only Google Search Console will give you a truly reliable number. But it gives a rough estimate, and since we currently have zero information, any information we do get is a bonus.

Here you can see Google has pulled up roughly 569,000 results. Does the number of pages in the sitemap reflect the number of URLs being indexed? Our Screaming Frog data showed roughly 194,000 URLs in the sitemap, but there are 569,000 in the index - nearly three times as many, which is a very big difference. Now we need to ask: why is this happening?

Going back to the first potential issue we discovered - those homepage category links on the Next website pointing to duplicate pages with /0-homepage in the URL - we can use search operators again to see how much this contributes to that index bloat.

This time we're going to use the inurl: search operator, which looks for a word or phrase in the URL - a query along the lines of site:next.co.uk inurl:0-homepage.

Doing this, we can see there are potentially over 4,000 pages indexed with this issue. Considering how few pages would be linked from the homepage at any one time - far fewer than 4,000 - we can conclude that these pages have built up over time, that the issue is widespread across the site, or both.

Let's dig a little deeper - if around 4,000 duplicate pages come from the featured categories alone, where else might we find potential duplicates? Let's have a click around and take a look. Something that caught my eye was a brand section within the menu. After visiting the page, you can see that this is a single page acting as a master list of all the different brands they stock, and they all have the word brand in the URL.

Now, this is a big list of pages, and it only covers brands beginning with A - so how many potential duplicate pages are there in total?

Free Tool - Small SEO Tools Link Count Checker

The Small SEO Tools link count checker will tell you how many links are on a page - and here it tells us there are just over 1,000 different pages linked from this one.
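If you'd rather not paste URLs into a web tool one at a time, a quick sketch with requests and BeautifulSoup gives a comparable count. Note this simply counts anchor tags with an href - an approximation of what link count checkers report - and the brands page URL is a placeholder:

```python
# Rough on-page link counter - counts <a> elements with an href,
# which is approximately what link count checkers report.
import requests
from bs4 import BeautifulSoup

def count_links(url):
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("a", href=True))

# Placeholder URL for the brands master list page
print(count_links("https://www.next.co.uk/brands"))
```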

We can also see that these URLs follow a pattern, with the word brand at the beginning. We can go back to our search operators, using the inurl: modifier to find out roughly how many pages this actually affects.

Here it shows about 165,000 results for Next URLs with the word ‘brand’ in them - which suggests brand duplication could be a big issue: compared with the results for the whole site, that's roughly 30% of all indexed URLs.

The next step is to check our assumptions - let's look at a specific brand, for example Mela. There are nearly 500 results for this brand alone, and only the first two actually have custom metadata; the rest look auto-generated.

If a site has a lot of index bloat, what are the likely causes of all these near-duplicate category pages? One issue that pops up often on e-commerce sites is how filters are handled.

Going back to our partywear page, we've got some category selections where you can filter by product type. Let’s take this jumpsuits filter for example - we can see that the URL changes, and whilst that doesn't mean anything on its own, we can use another free tool to look into this.

Free Tool - SeeRobots Chrome Extension

So we need help from SeeRobots, a Chrome extension that shows you at a glance the indexable state of a page, using coloured squares to indicate whether the page is indexable and whether its links are followed - in other words, the index/noindex and follow/nofollow state of the meta robots tag.

Not only does the URL change, the green square from SeeRobots tells us this filtered URL is accessible and indexable for Google. How big a problem is this? If every filter combination produces an indexable URL like this, there could be thousands of additional URLs being indexed.
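Under the hood, SeeRobots is reading the page's meta robots tag (and the X-Robots-Tag response header). If you wanted to spot-check a batch of filtered URLs without clicking through each one, a sketch like this would do the same job - the example URL is illustrative, not the site's real filter structure:

```python
# Report whether a URL looks indexable, by checking the X-Robots-Tag
# response header and the meta robots tag - the same signals SeeRobots
# visualises.
import requests
from bs4 import BeautifulSoup

def robots_state(url):
    resp = requests.get(url, timeout=30)
    header = resp.headers.get("X-Robots-Tag", "")
    soup = BeautifulSoup(resp.text, "html.parser")
    meta = soup.find("meta", attrs={"name": "robots"})
    content = meta.get("content", "") if meta else ""
    # No noindex in either place means the page is indexable by default
    if "noindex" in (header + " " + content).lower():
        return "noindex"
    return "indexable"

# Illustrative filtered-category URL, not the site's real structure
print(robots_state("https://www.next.co.uk/shop/partywear?isort=price"))
```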

Let's check our assumptions - we'll go back to our inurl: search operator, this time using the word isort, which appears in the URL when search filters are applied.

Here we can see that in reality, this issue isn’t as big as it could potentially be.

Google also states it has no information for these pages where the meta description would be, which indicates that these URL patterns are blocked in the robots.txt file - meaning the large majority of these pages can't be crawled. However, if they're linked internally, they do have the potential to become an issue later down the line, as Google can sometimes decide to index robots-blocked URLs anyway. Thankfully, it's not a big issue for Next right now.

Going back to our partywear category page, we can see the word promotion in the URL - and clicking through to the other linked category pages, the URLs all share the same structure. Let's check our assumptions - back to the inurl: search operator to find how many pages Google shows with the word promotion in the URL: around 25,000.

Also, there is a quirky title pattern, with titles ending “from the next UK online store”. Again, we can look at how big a problem this is. We need a different search operator this time - intitle:, which searches page titles for a word or phrase.

This returns a whopping 123,000 results. Compared with the total from the Google site search, that's just over 20% of all indexed pages with this particular issue. This could mean the Next site is losing out on traffic in two ways: the titles aren't optimised from a keyword perspective, which will affect impressions, and they'll also have a negative effect on click-through rates when the Next site does appear for a search.
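To keep the arithmetic honest, here's the same back-of-envelope maths in a few lines of Python. The counts are the rough numbers Google reported for each operator above (the exact query strings are our reconstruction), so treat the percentages as ballpark figures:

```python
# Back-of-envelope shares of the indexed site, using the rough counts
# Google reported for each operator query.
indexed_total = 569_000  # site:next.co.uk

rough_counts = {
    "site:next.co.uk inurl:0-homepage": 4_000,
    "site:next.co.uk inurl:brand": 165_000,
    "site:next.co.uk inurl:promotion": 25_000,
    'site:next.co.uk intitle:"from the next UK online store"': 123_000,
}

for query, count in rough_counts.items():
    print(f"{query}: ~{count / indexed_total:.0%} of indexed URLs")
# inurl:brand comes out at ~29% and the intitle pattern at ~22% - the
# "30%" and "20%" figures in the text are these, rounded.
```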

Now we've got all this information about potential issues on the site, what next? We need to show how this hits them where it hurts - right in the money. That means attributing a monetary figure to each issue, and to do that we need some statistics and benchmark values - this is where case studies come in.

Where can I find case studies?

Think With Google has a nice selection of case studies, so have a look there for anything relevant to the issues you've found. You can also do a regular Google search - and because you'll find a lot of pages that aren't particularly relevant, it can be useful to narrow things down with exact-match quotes, or search operators such as inurl: or intitle:.

For this example, we're going to search "index bloat case study", as index bloat seems to be the biggest issue facing the website.

Here I found an article from Inflow, which reported that the site in question achieved a 22% increase in traffic and a 7% increase in organic revenue from fixing an index bloat issue. But what does this mean for the Next site?

Free Tool - SpyFu

SpyFu is a competitor analysis tool, which gives you some great information for free - we'll be using it to find out estimated monthly SEO click values. Click on the monthly estimated clicks section, and it will pull you through to the SEO overview, where you can see the trends of how many keywords a site is ranking for.

Here you can see a drop-off in the number of ranking keywords between February and May - we can infer that something may have changed on the site to cause this drop.

We can also see that the estimated value of this organic traffic is £1.78 million.

Going back to the case study: if the Next site saw a similar 7% increase in organic revenue from fixing its index bloat, that could mean a further £124,000 or so a month (7% of the £1.78 million estimated monthly traffic value).

Another issue is the 404 page. Looking around the site, I stumbled upon the 404 page, which wasn't great - there was no menu on the page to navigate to a different category page, and frustratingly the logo at the top didn't actually take you to the homepage.

There are just three static category links on the page. There are so many things you could do with a 404 page to make it a much better experience - could this be an issue? Let's find out!

We're going to need to find some statistics - this study by ImpactBND showed that only 22% of visitors who encounter a 404 page will make a second attempt to find the page, which means 78% of that traffic is lost. But how can we find out how many people might reach a 404 page when we've got no internal data?

Free Tool - SimilarWeb

SimilarWeb will give you information on roughly where a site sits in comparison with other sites in the same sector, along with website traffic overview data.

Here we can see that the Next site has an estimated 26.3 million total visits. If even just 1% of that traffic ends up on the 404 page from things like broken links, that's potentially 260,000 visitors. Going back to our statistic from earlier, if we're losing 78% of those people, that's just over 200,000 visitors lost every month.

Looking at some additional statistics, the industry-average conversion rate for fashion eCommerce sites is around 2.35%, so we can estimate roughly 4,700 sales lost each month. Looking at the general price of the products, we can take an estimated average order value of about £30, which comes to a gigantic £141,000 per month that they could potentially be losing to the 404 page.
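Laid out end to end, the 404 estimate is just a chain of multiplications. Every input is an assumption or a third-party benchmark - the 1% 404 rate especially - so this is a ballpark sketch, not a forecast:

```python
# 404 revenue estimate - every input is an assumption or a third-party
# benchmark, so the output is a ballpark figure only.
monthly_visits = 26_300_000  # SimilarWeb estimate for next.co.uk
share_hitting_404 = 0.01     # assumed: 1% of visits land on a 404
share_lost = 0.78            # ImpactBND: only 22% make a second attempt
conversion_rate = 0.0235     # industry-average fashion ecommerce rate
avg_order_value = 30         # assumed AOV in GBP, from typical prices

lost_visitors = monthly_visits * share_hitting_404 * share_lost
lost_sales = lost_visitors * conversion_rate
lost_revenue = lost_sales * avg_order_value
print(f"~{lost_visitors:,.0f} visitors, ~{lost_sales:,.0f} sales, ~£{lost_revenue:,.0f}/month")
# ~205,140 visitors, ~4,821 sales, ~£144,624/month - the post rounds
# down at each step (260,000 -> 200,000 -> 4,700), landing on ~£141,000.
```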

Free Tool - Google Tag Assistant

Another common issue amongst larger sites is third-party script bloat or tracking issues - this is where the Google Tag Assistant Chrome extension can help. It will tell you how many tracking tags are on the page and how they have been implemented.

According to this, the Next site had non-standard implementations across the board, with site tags each firing four requests per page when there should only be one. Looking beyond the homepage, we can see that the category page has 13 different tracking tags on it, including multiple analytics and Google Tag Manager codes, many doing the same thing. This can cause inaccurate analytics data - bounce rate and page view figures tend to suffer most - and can cause wide-ranging attribution problems too.

Free Tool - SEMrush

With SEMrush, you need to create an account to get 10 free checks per month - and here you can see that 70% of traffic has no referral data at all, falling into the direct bucket.

This means they potentially have 11 million sessions without attribution. Again, using statistics from case studies, we can estimate how much this is affecting Next. We found this study, which found that every script added to a page increases load time by 34.1 milliseconds. With 13 tracking scripts on the category pages, that's around 443 milliseconds in total - and even if we only reduced the number of tags by half, we'd still be looking at around a 220 millisecond improvement in load time. But what does this mean in terms of revenue? Let's find a case study.
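Before moving on, here's that load-time arithmetic as a quick sketch, assuming the study's 34.1ms-per-script figure holds for these tags:

```python
# Estimated load-time cost of the tracking scripts, per the cited study.
ms_per_script = 34.1  # added load time per script, from the study
num_scripts = 13      # tracking tags counted on the category page

total_ms = num_scripts * ms_per_script
print(f"~{total_ms:.0f}ms of script overhead")          # ~443ms
print(f"halving the tags saves ~{total_ms / 2:.0f}ms")  # ~222ms
```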

There are a few caveats to keep in mind with these figures, as we are making A LOT of assumptions.

These figures aren't going to be that accurate compared with the real data; case studies tend to showcase only the successful results of an intervention, so you can't be sure you'd see the same level of improvement from a similar fix. You also can't trust every large-scale study, as they can't fully account for bias and their methodologies tend to have at least some flaws.

Plus, it's not technically a loss of revenue - it's potential revenue that they're currently missing out on. That being said, it's still a really good way to tell a story and illustrate what money could be left on the table - which is what stakeholders are really going to care about.

But what can I actually do with this information?

It's all well and good finding these issues - but what can you do with this information? Firstly, you can bolster your client pitches. Nothing gets attention more than telling someone how much money they are potentially losing out on.

It shows you're able to find problems - and if you can figure out what issues a client might be having from external data alone, imagine what you'd be able to do with actual access to internal data. You can also use this data to make assumptions about their internal processes or timings. From our earlier SpyFu data, we can say something happened around mid-February, because that's when the drop in ranking keywords began.

From this, we can also draw some conclusions about their internal processes. Because lots of duplicate homepage categories are being created, and automated metadata covers a large proportion of the site, we could infer that the marketing team perhaps isn't speaking to the SEO team. If that's something you could help with, you can turn these assumptions into additional recommendations or extra services in a pitch. Picking up on these things can make a client feel like you really understand how they work and what struggles they're facing, which can swing a pitch in your favour.

You can also use this information to convince your boss or client to give you extra resources, whether that's budget or development time, simply because you can say "this particular issue is costing £100,000 per month".

Top Takeaways

  • You don't need an arsenal of paid tools to be able to audit larger or enterprise level sites.
  • Don't be intimidated by the size of a site - take it section by section. Check the common areas where issues crop up in your sector or vertical.
  • You are able to find a lot of information on a website without actually having access to their data.
  • You can find a lot of issues just from clicking around a website.
  • Once you've found the one main thread, it's much easier to find related issues from the same area.
  • When you're looking at large websites, don't worry about getting it all 'right' - rough and ready with considered insights can get you far.
  • Hit 'em where it hurts - their pocket. Putting a monetary value on issues can get you buy-in from management during pitches, and also when it comes to getting technical fixes implemented if you win a client. But don't pluck figures out of thin air, show your working for clarity.
  • To go even further, spell out the benefits of fixing these issues and the drawbacks of leaving them, to really convince a client.
  • If you can draw some conclusions from the information you find, you can suggest additional services - and it shows you understand what a client's business needs are.

Further Reading

Ahrefs' Guide to Google Search Operators

Looking for more useful (and free) SEO tools that don't require installing yet another Chrome extension? We have a handy guide on SEO Bookmarklets that does exactly that.

Want help understanding how your site could be missing out on money? Get in touch via email at sophie.gibson@riseatseven.com or on Twitter @riseatseven or @sophiegibson.