Improving Your Site’s Crawlability and Indexability
In the next few videos, I’m going to walk you through the following topic areas to improve your crawlability and indexability. First, I’ll cover Crawl Budget, then Internal Linking and URL Management. From there, we’ll move on to HTML Tags, Structured Data and Schema Markup, Information Architecture and UX, and finally Mobile Friendliness, Loading Speed and Data Security.
So let’s start with the first item: Google Crawl Budget. To understand this concept, we first need to understand a more basic one: how does Googlebot actually conduct a crawl? And that’s pretty simple. It takes a pre-existing list of page URLs (built from previous crawls and added to from various other sources) and proceeds to visit each one of them looking for links. As Googlebot visits each page, if it detects a link, it adds it to a list to be crawled later. Granted, this is an overly simplified explanation. A lot more factors go into which pages get crawled, how often, how many pages from a given site get crawled and how Google interprets the contents of those pages. But it should do for our purposes.
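To make that loop concrete, here is a minimal sketch in Python. It is not how Googlebot is actually implemented; it just mirrors the simplified description above. The `PAGES` dictionary is a made-up stand-in for the web (each URL maps to the links found on that page), and `crawl` and `max_pages` are hypothetical names chosen for illustration.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
PAGES = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
    "https://example.com/b": [],
    "https://example.com/c": ["https://example.com/"],
}


def crawl(seed_urls, pages, max_pages=None):
    """Visit URLs from a list, adding newly discovered links to be crawled later."""
    frontier = deque(seed_urls)   # the list of URLs waiting to be visited
    seen = set(seed_urls)         # URLs already queued, so nothing is queued twice
    visited = []                  # URLs in the order they were actually visited
    while frontier:
        if max_pages is not None and len(visited) >= max_pages:
            break                 # stop once the (assumed) page cap is reached
        url = frontier.popleft()
        visited.append(url)
        for link in pages.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


print(crawl(["https://example.com/"], PAGES))
print(crawl(["https://example.com/"], PAGES, max_pages=2))
```

With the optional `max_pages` cap set to 2, the crawl stops before it ever reaches the deeper pages, which is a rough preview of why a limited crawl matters for pages far from the homepage.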
Ok, so back to what Crawl Budget is. The web is massive and Google, while also huge, is comparatively small. It might be hard to conceive given how big they are, but they do have limited resources compared to the millions of new webpages being published every day. So to tackle this challenge of indexing the entire web, Google sets limits on Googlebot’s interactions with websites. These are called “Crawl Budgets”. Google establishes Crawl Budgets on a site-by-site basis.
Crawl Budgets are meant to make the best use of Google’s resources, as well as those of the site owner. For the owner, the Crawl Budget is meant to avoid overtaxing the site’s web server. Google does this by setting a rate for how many requests per second Googlebot will make to that site, which limits the processing power the server has to spend on the crawl. Crawl Budget is one of the most important SEO concepts you need to know. Understanding your Crawl Budget, and how to work within its confines, is critical if you want to excel.
So let’s look at a high-level definition of this concept and its component parts. Crawl Budget is defined as: the maximum number of pages that Google will crawl on a given website. In essence, it dictates how deep the Google crawler will go beyond the homepage on a given site. And, as I already mentioned, that number varies from site to site.
Crawl Budget is broken down into two basic things: Crawl Rate Limit and Crawl Demand.
The first, Crawl Rate Limit, is that aforementioned rate: how many requests per second Googlebot will make to your site when crawling it. It governs how many simultaneous connections Google will use to crawl your site and the amount of time it waits between requests. Page loading speed, a topic we’ll cover later, can greatly influence this. The speed of your pages can determine how much of your site gets crawled and indexed. If your pages load quickly, Googlebot can get through more of them in the same amount of time, so more of your content gets crawled and indexed. So in practice, Crawl Rate is determined by how quickly Google thinks it can reasonably crawl any page on your site.
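To get a feel for why page speed matters here, the back-of-the-envelope calculation below may help. It is only an illustrative model, not Google’s actual formula: it assumes a fixed number of simultaneous connections, a fixed wait between requests and a fixed server response time. The function name and all the numbers are made up for the example.

```python
SECONDS_PER_DAY = 86_400


def estimated_pages_per_day(connections, delay_seconds, response_seconds):
    """Rough ceiling on pages fetched per day under the assumed simple model."""
    # Each connection spends the response time plus the wait on every page.
    seconds_per_fetch = delay_seconds + response_seconds
    return int(connections * SECONDS_PER_DAY / seconds_per_fetch)


# Same assumed crawl settings, two different server response times.
fast_site = estimated_pages_per_day(connections=2, delay_seconds=1.0, response_seconds=0.5)
slow_site = estimated_pages_per_day(connections=2, delay_seconds=1.0, response_seconds=2.0)
print(fast_site)  # 115200
print(slow_site)  # 57600
```

Under these assumptions, the site that responds in half a second gets twice as many pages fetched per day as the one that takes two seconds, with nothing else changed.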
The second component, Crawl Demand, is basically how much Google wants to crawl your pages. If your pages are popular, Google will want to crawl them more. They will feel greater urgency to keep their index of those pages fresh so searchers keep coming back. Bottom line, Google must prioritize based on what delivers the most value to the most users. This is essentially crawl prioritization. And that means Google will visit popular pages more often. But if there’s little or no demand for a given webpage, because it averages low traffic or the content is pretty static, Google will de-prioritize it and focus, instead, on more popular ones. And subpages, usually even lower in priority, will be neglected the most. If they aren’t crawled then, of course, they can’t be indexed.
So factoring in the two things that go into it, Crawl Rate and Crawl Demand, Crawl Budget will profoundly affect how your site shows and ranks in Google. While the concept of Crawl Budget seems simple, in truth, it’s not. In fact, there’s a whole section of Tech SEO devoted to solving challenges related to it. However, it is something you can influence given the right strategy. The bottom line is, we want to steer Googlebot to make the very best use of the existing Crawl Budget: spending it on our highest quality pages, those paramount to SEO, and getting more of our content indexed. This is crawl optimization. That is, over a period of time, getting more and more of your important pages properly crawled and indexed and, through that, improving the rankings for those pages. And pages with better rankings are crawled more frequently, which, in turn, brings further benefits. This is why we talk about all these SEO factors being complementary, an ecosystem. Improving one area can lead to improvements in others, creating a virtuous cycle of overall improvement.
Lastly, to improve your popularity score, you will, of course, want to solicit backlinks to your pages. This topic is a whole tutorial in and of itself and won’t be covered in depth in this series. But if you’re interested in diving into it, check out my series on link building.
So back to the topic at hand: crawl optimization and ways we might tackle it.
First let me say that a choice to implement one solution over another is never done in a vacuum. It is always in the larger context of your business, your goals, the technologies and designs employed and so on.
That said, let’s look again at what we ultimately want to accomplish.
#1, steer Googlebot to make the very best use of the existing Crawl Budget. And #2, accomplish #1 by getting Googlebot through your site as efficiently as possible and incentivizing it to come back more frequently. So if the success of #1 depends on doing #2 well, then what measures can we take to help with #2?
The first is organizing our sites strategically so Googlebot hits the pages we want it to hit and avoids the pages that would waste its time. Keep in mind, your internal link structure should support, first and foremost, your most important pages and make it easy for Google to get to any page on your site in a minimum of clicks.
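One way to check the “minimum of clicks” idea is to measure click depth: how many links you have to follow from the homepage to reach each page. The sketch below does that with a breadth-first search over a small, invented internal-link map. `SITE_LINKS`, `click_depths` and `unreachable_pages` are hypothetical names for illustration; in practice the link map would come from a crawl of your own site.

```python
from collections import deque

# Hypothetical internal-link map: each page lists the pages it links to.
SITE_LINKS = {
    "/": ["/products", "/blog"],
    "/products": ["/products/widget"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/products/widget", "/archive/old"],
    "/products/widget": [],
    "/archive/old": [],
    "/orphan": [],  # no page links here
}


def click_depths(home, links):
    """Return the fewest clicks needed to reach each page from the homepage."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first time reached = shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def unreachable_pages(home, links):
    """Pages that exist in the map but cannot be reached by following links."""
    reachable = click_depths(home, links)
    return sorted(page for page in links if page not in reachable)


depths = click_depths("/", SITE_LINKS)
for page, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(depth, page)
print("unreachable:", unreachable_pages("/", SITE_LINKS))
```

Important pages that turn up several clicks deep, or that can’t be reached through links at all, are candidates for better internal linking.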
Next on our crawl optimization list: for those pages that aren’t important to SEO, it’s critical to block them in your robots.txt file, which keeps Googlebot from crawling them. Or alternatively, set meta robots directives, like “noindex” and “nofollow”, which keep a page out of the index and tell Google not to follow its links. Use one or the other for a given page: Googlebot has to crawl a page to see its meta robots tag, so a “noindex” on a page that’s blocked in robots.txt will never be read. Either way, this focuses Google on your important pages. The unimportant pages usually include: log-in pages, contact forms, error pages, you get it, those kinds of pages.
So after we block the unimportant pages, we want to offer an XML sitemap that includes the most important pages, excluding the ones you’ve already blocked. This last part is key because people make this mistake all the time: they block a page in robots.txt but then include it in their XML sitemap. And that’s a big mistake, because it gives Google inconsistent signals. What you’re doing with these two actions, blocking the unimportant pages with robots.txt and highlighting the important ones with the XML sitemap, is focusing Google’s interaction with your site. It signals to Google where the important SEO content resides and makes good use of crawl budget.
One note here: when defining what constitutes your “unimportant” pages, be sure to look at the site holistically. There might be pages that you think are essential and wouldn’t otherwise think to omit from the crawl until you view them through the lens of the total volume of pages on your site and your business goals.
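The robots.txt versus sitemap mismatch is easy to check for automatically. The sketch below uses Python’s standard `urllib.robotparser` and `xml.etree.ElementTree` modules to list any sitemap URLs that robots.txt blocks. The robots.txt rules, the sitemap contents and the `blocked_sitemap_urls` helper are all invented for the example; on a real site you would load your own files instead. Note that Python’s parser follows the basic robots.txt rules and may not match Google’s handling in every edge case.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /contact
"""

# Hypothetical XML sitemap that mistakenly lists a blocked page.
SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/products</loc></url>
  <url><loc>https://example.com/login</loc></url>
</urlset>
"""

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_sitemap_urls(robots_txt, sitemap_xml, user_agent="Googlebot"):
    """Return sitemap URLs that the robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


print(blocked_sitemap_urls(ROBOTS_TXT, SITEMAP_XML))
# ['https://example.com/login']
```

Any URL this prints is sending Google the inconsistent signal described above, so it should either be removed from the sitemap or unblocked in robots.txt.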
Ok, so next on our crawl optimization list is optimizing your page speed, a topic we get into later on. And after that, content freshness. Affecting your site popularity directly in SEO can be elusive, but content freshness? Well, that’s something we can affect directly. When content is fresher, it prompts Google to re-prioritize a page from the perspective of crawl demand, which can lead to it being re-crawled and re-indexed, which, in turn, can lead to increased organic search visibility, which can lead to new users discovering the page, which can lead to increased popularity, which can prompt other sites to link to it. See how that works? So content freshness is vitally important to ranking. I will cover each of the topics I just went through in greater detail later in this series.