You may or may not have heard of the term “Crawl Budget,” but it is affecting your site’s SEO. So what is it? To understand that concept, we first need to understand a more basic one: how Googlebot actually conducts a crawl. The process is pretty straightforward. Googlebot takes a pre-existing list of page URLs (built from previous crawls and supplemented by various other sources, such as sitemaps) and visits each one, looking for links. As Googlebot visits each page, it adds any link it detects to a list to be crawled later. Granted, this is an overly simplified explanation, but it should do. Now back to Crawl Budget…
The web is massive and Google, while also huge, is comparatively small. Hard as it may be to conceive given how big Google is, it has limited resources compared to the millions of new webpages being published every day. So to tackle the challenge of indexing the entire web, Google sets limits on Googlebot’s interactions with websites. These limits are called “Crawl Budgets.”
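To make that crawl loop concrete, here is a minimal sketch in Python. It is our simplified illustration, not Googlebot’s actual implementation: the `PAGES` dictionary is a hypothetical stand-in for the web, and `max_pages` is an assumed cap that behaves a little like a budget.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
PAGES = {
    "/": ["/products", "/blog"],
    "/products": ["/products/widget", "/"],
    "/blog": ["/blog/post-1", "/products"],
    "/products/widget": [],
    "/blog/post-1": ["/"],
}


def crawl(seed_urls, fetch_links, max_pages=None):
    """Visit URLs from a queue, adding newly discovered links to the queue."""
    frontier = deque(seed_urls)  # URLs waiting to be crawled
    seen = set(seed_urls)        # URLs already queued, so nothing is queued twice
    visited = []                 # the order in which pages were crawled
    while frontier and (max_pages is None or len(visited) < max_pages):
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


def fetch_links(url):
    """Stand-in for downloading a page and extracting its links."""
    return PAGES.get(url, [])


print(crawl(["/"], fetch_links))               # every reachable page
print(crawl(["/"], fetch_links, max_pages=3))  # stops once the cap is reached
```

With the cap set to 3, the deeper pages never get visited, which is the basic problem a limited budget creates.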
Google establishes Crawl Budgets on a site-by-site basis. Crawl Budgets, as stated, make the best use of Google’s resources, as well as those of the site owner. For the owner, Crawl Budget is meant to not overtax their site’s web server. Google does this by setting a rate for how many requests per second Googlebot will make to that site, limiting the required processing power.
So clearly Crawl Budget is one of the most important SEO concepts you need to know. Understanding your Crawl Budget, and how to work within its confines, is critical if you want to excel.
Let’s dig a little deeper into the Crawl Budget concept and its component parts.
Crawl Budget is defined as the maximum number of pages that Google will crawl on a given website within a given timeframe.
In essence, it dictates how deep the Google crawler will go beyond the homepage on a given site and, as already mentioned, that number varies from site to site. Crawl Budget breaks down into two components:
1. Crawl Rate Limit
The first, Crawl Rate Limit, is that aforementioned rate: how many requests per second Googlebot will make to your site when crawling it. It governs how many simultaneous connections Google will use to crawl your site and how long Googlebot waits between requests. Page loading speed, a topic we’ll cover later, can greatly influence this, and so it can determine how much of your site gets crawled and indexed. If your pages load quickly and your server responds reliably, Googlebot can fetch more of them in the same amount of time; if the server slows down or returns errors, Googlebot backs off. So in practice, Crawl Rate is determined by how quickly Google thinks it can reasonably crawl any page on your site.
2. Crawl Demand
The second component, Crawl Demand, is basically how much Google wants to crawl your pages. If your pages are popular, Google will want to crawl them more, because it feels a greater urgency to keep its index fresh for those kinds of pages. Bottom line, Google must prioritize based on what delivers the most value to the most users. This is essentially crawl prioritization, which means Google will visit popular pages more often. In other words, if there’s little or no demand for a given webpage (it averages low traffic or its content is pretty static), Google will de-prioritize it and focus instead on more popular ones. And subpages, usually even lower in priority, will be neglected the most. If they aren’t crawled then, of course, they can’t be indexed.
So, taking its two components together (Crawl Rate and Crawl Demand), Crawl Budget strongly influences how your site appears and ranks in Google.
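To get a feel for why page speed matters to Crawl Rate, here is a back-of-the-envelope sketch. Google doesn’t publish the actual numbers it uses, so the connection count, response times and delay below are purely hypothetical values chosen for illustration.

```python
SECONDS_PER_DAY = 86_400


def pages_per_day(connections, avg_response_seconds, delay_seconds):
    """Rough upper bound on how many pages could be fetched from one site per day.

    Assumes each connection fetches one page, waits `delay_seconds`,
    then fetches the next. All inputs are illustrative assumptions.
    """
    seconds_per_fetch = avg_response_seconds + delay_seconds
    return int(connections * SECONDS_PER_DAY / seconds_per_fetch)


# Hypothetical site crawled over 2 connections with a 1.5 second pause between requests.
fast_site = pages_per_day(connections=2, avg_response_seconds=0.5, delay_seconds=1.5)
slow_site = pages_per_day(connections=2, avg_response_seconds=2.5, delay_seconds=1.5)

print(fast_site)  # 86400
print(slow_site)  # 43200
```

Under these assumed numbers, the slower site gets half as many pages fetched in a day with exactly the same crawl settings.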
While the concept of Crawl Budget seems simple, in truth, it’s not. In fact, there’s a whole section of Tech SEO devoted to solving challenges related to it. However, it is something you can influence given the right strategy.
The bottom line: we want to steer Googlebot to make the very best use of the existing Crawl Budget, spending it on our highest-quality pages (those paramount to SEO) and getting more of our content indexed. This strategy is Crawl Optimization.
The goal is, over time, to get more and more of your important pages properly crawled and indexed and, through that, to improve the rankings for those pages. Pages with better rankings are crawled more frequently, which in turn brings further benefits. This is why we talk about all these SEO factors being complementary, an ecosystem. Improving one area can lead to improvements in others, creating a virtuous cycle of overall improvement.
So back to the topic at hand: crawl optimization and ways we might tackle it. First let me say that a choice to implement one solution over another is never done in a vacuum. It is always in the larger context of your business, your goals, the technologies and designs employed and so on. That said, let’s look again at what we ultimately want to accomplish:
1. Use Crawl Budget Most Efficiently
Steer Googlebot to make the very best use of the existing Crawl Budget.
2. Steer Googlebot and Incentivize It to Come Back
Accomplish number one above by getting Googlebot through your site most efficiently and incentivizing it to come back more frequently.
So if succeeding at number one depends on doing number two well, what measures can we take to help with number two?
Here are four:
1. Smart Site Organization
The first is organizing your site strategically so Googlebot hits the pages you want it to hit and avoids the pages that would waste its time. Keep in mind, your internal link structure should support, first and foremost, your most important pages and make it easy for Google to get to any page on your site in a minimum of clicks.
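One way to check the “minimum of clicks” idea is to measure how many clicks each page sits from the homepage. The sketch below does that with a breadth-first search over an internal link map. The `SITE_LINKS` data and the three-click threshold are hypothetical; in practice you would build the map from your own crawl export.

```python
from collections import deque


def click_depths(links, start="/"):
    """Return the minimum number of clicks from `start` to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first time reached = shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link map: page -> pages it links to.
SITE_LINKS = {
    "/": ["/category", "/about"],
    "/category": ["/category/page-2", "/product-a"],
    "/category/page-2": ["/product-b"],
    "/about": [],
    "/orphan": ["/"],  # links out, but nothing links to it
}

depths = click_depths(SITE_LINKS)
deep_pages = sorted(page for page, depth in depths.items() if depth >= 3)
unreachable = sorted(set(SITE_LINKS) - set(depths))

print(deep_pages)   # ['/product-b']
print(unreachable)  # ['/orphan']
```

Pages that show up as deep or unreachable are candidates for better internal linking, assuming they are pages you actually want crawled.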
2. Draw Attention to Your Top Pages & Weed Out the Unimportant Pages
Next on our crawl optimization list: for pages that aren’t important to SEO, it’s critical to keep Googlebot from spending time on them. The main tool for this is your robots.txt file, which tells Googlebot not to crawl the pages you block. Alternatively, you can set meta robots directives like “noindex” and “nofollow”. Keep in mind that the two work differently: robots.txt stops a page from being crawled, while “noindex” keeps it out of the index but still requires Googlebot to crawl the page in order to see the directive. For that reason, don’t combine them on the same page; if robots.txt blocks the page, Google never sees the “noindex”. Either way, the aim is to focus Google on your important pages. The unimportant pages usually include log-in pages, contact forms, error pages and so on.
After blocking the unimportant pages, offer an XML sitemap that includes your most important pages and excludes the ones you’ve already blocked. This last part is key. People make this mistake all the time: they block a page in robots.txt but then include it in their XML sitemap, which gives Google inconsistent signals. Together, these two actions (blocking the unimportant pages with robots.txt and highlighting the important ones with the XML sitemap) focus Google’s interactions with your site. They signal to Google where the important SEO content resides and make good use of crawl budget.
One note here: when defining what constitutes your “unimportant” pages, be sure to look at the site holistically. There might be pages that you think are essential and wouldn’t otherwise think to omit from the crawl until you view them through the lens of crawl budget, the total volume of pages on your site and your business goals.
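The robots.txt and sitemap mismatch is easy to check for automatically. Here is a minimal sketch using only Python’s standard library. The robots.txt rules, the sitemap and the `example.com` URLs are made up for illustration, and the standard-library parser uses simple prefix matching, so it may not mirror every rule Googlebot supports (wildcards, for instance).

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt and sitemap contents for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /cart
"""

SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/products</loc></url>
  <url><loc>https://www.example.com/login</loc></url>
</urlset>
"""

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_sitemap_urls(robots_txt, sitemap_xml, user_agent="Googlebot"):
    """Return sitemap URLs that robots.txt tells `user_agent` not to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


conflicts = blocked_sitemap_urls(ROBOTS_TXT, SITEMAP_XML)
print(conflicts)  # ['https://www.example.com/login']
```

Any URL this reports is sending the inconsistent signal described above: either remove it from the sitemap or unblock it in robots.txt.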
3. Page Loading Speed Optimization
Next on our crawl optimization list is optimizing your page speed. Here are a few ways to tackle that challenge:
- Prioritize Your Visible Content: Optimize the high-priority page components to load first (e.g. the above-the-fold components, the items the user sees first). This tactic is often called progressive loading, which simply means loading the above-the-fold content first and the below-the-fold content afterward. The page feels faster to the user, which reassures them and tends to reduce bounce rates.
- Optimize Your Images: Another important area to look at when optimizing page speed is your images. First and foremost, make sure your images are the right size. Believe it or not, we see pages all the time using huge images when a much smaller size would work just as well. This should be the lowest of low-hanging fruit. So compress your images and resize their dimensions to fit the need. This can save a ton of page weight.
- Minify Your Code: What does it mean to minify code? Think of it this way: elements that aren’t critical to running the code, like comments, unnecessary line breaks and spaces, still add to the size of the files the browser has to download (your HTML, CSS and JavaScript), so they take longer to load. Minifying your code is the process of removing all those extras and shrinking the files down to reduce loading time.
- Leverage Browser Caching: Another way to improve page speed is to leverage browser caching, which basically means instructing the browser to collect some of your page files from the browser cache instead of requesting them from the server every time they’re needed. We won’t go into depth on the technical aspects, but the quick answer is this: you configure your server to send instructions (typically HTTP caching headers such as Cache-Control) that tell the browser to reuse certain previously downloaded files from the local drive instead of fetching them over the network. Because those items aren’t being collected from the server, they arrive much faster, which boosts the overall speed of your page. That’s a very simple explanation, but it is a good technique for improving page speed, so we recommend looking into it further or discussing it with your web developer.
- Reduce Your Overall Page Delivery Time: There are outside services you can use to speed up your page loading, such as Content Delivery Networks (CDNs) or Google AMP (an acronym for Accelerated Mobile Pages).
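To make the browser caching tactic from the list above a little more concrete, here is a small sketch of how a server might decide which `Cache-Control` header to send for a given file. The policy itself is an assumption for illustration: long-lived caching only makes sense if your static files have versioned names (for example `app.3f9a1c.css`), and the extension list is hypothetical. Your web developer or hosting platform will have their own way of configuring this.

```python
from pathlib import PurePosixPath

# Hypothetical policy: cache static assets for a year, revalidate everything else.
LONG_LIVED = "public, max-age=31536000, immutable"
REVALIDATE = "no-cache"  # browser may store the file but must check with the server first
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".webp", ".svg", ".woff2"}


def cache_control_for(path):
    """Pick a Cache-Control header value based on the requested file's extension."""
    suffix = PurePosixPath(path).suffix.lower()
    return LONG_LIVED if suffix in STATIC_EXTENSIONS else REVALIDATE


for path in ["/assets/app.3f9a1c.css", "/images/hero.JPG", "/blog/crawl-budget"]:
    print(path, "->", cache_control_for(path))
```

The stylesheet and image get the long-lived policy, so repeat visitors load them from their local cache, while the page itself is always revalidated so fresh content shows up right away.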
4. Content Freshness
Affecting your site’s popularity directly can be elusive in SEO, but content freshness? Well, that’s something we can affect directly. Fresher content prompts Google to re-prioritize a page from the perspective of crawl demand, which can lead to it being re-crawled and re-indexed. That, in turn, can lead to increased organic search visibility and new users discovering the page, which can increase its popularity and prompt other sites to link to it. See how that all works? An ecosystem. Remember, content freshness is vitally important to ranking.