The Complete Guide to WordPress Robots.txt Optimization

Creating a brand new WordPress site requires paying attention to tons of things, from site design to content strategies. Still, all your effort put into the website will fail to bear fruit if you don’t optimize it for SEO correctly.

One of the most critical SEO factors is robots.txt. It guides search engines as they crawl your WordPress site, so they can quickly discover the content you want to rank.

Although WordPress already auto-creates a default robots.txt file for your site, you need to optimize it to level up your website’s SEO. That said, optimizing the WordPress robots.txt file is not as easy as you might think. Careless changes to the file may harm your site’s SEO.

Our article today will show you 2 ways to create your WordPress robots.txt file and how to optimize it effectively. But before digging into the step-by-step guide, let’s discuss what a robots.txt file really is, why you need one for your site, and some WordPress robots.txt examples.

What Is a WordPress Robots.txt File?

Robots.txt refers to a text file created to let search engine bots know which of your WordPress site content they can/can’t crawl and index. You can use it to block search engine access to certain WordPress files and folders.

The robots.txt file is located in your WordPress root directory. You can view the robots.txt file of any website by typing ‘domain.com/robots.txt’ into your browser’s address bar.

[Image: robots.txt example]
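Since the file always sits at the root of the host, its address can be worked out from any page URL on the site. The snippet below is a minimal sketch of that idea; the helper name robots_txt_url is made up for illustration, and it assumes HTTPS when no scheme is given.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(site_url):
    """Return the robots.txt address for the host that serves site_url."""
    # Accept bare domains such as "domain.com" by assuming HTTPS (an assumption).
    if "://" not in site_url:
        site_url = "https://" + site_url
    parts = urlsplit(site_url)
    # Keep only the scheme and host; the file lives at the root, not under the page path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("domain.com/blog/my-post/"))  # → https://domain.com/robots.txt
```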

Here is the format of a robots.txt file:

User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
Allow: [URL string to be crawled]

User-agent names the search engine bot that the rules below it apply to. If it’s set to an asterisk (*), the rules apply to all search engine bots. It’s also possible to target certain search engines such as Google, Bing, or Ask.com.

You can even write rules for specific bots of a search engine. For instance, Google has different bots for news, images, and videos: Googlebot-News, Googlebot-Image, and Googlebot-Video.

Allow and Disallow tell the bots which content they can and cannot crawl. If you want the search engine to crawl all content on your site, simply add a slash (/) after ‘Allow:’ so that the rule reads ‘Allow: /’.
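To see how a bot reads these lines, here is a small sketch using Python’s built-in urllib.robotparser module. The rules and the example.com URLs are invented for illustration; they are not taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules written in the format shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch(user_agent, url) answers: may this bot crawl this URL?
print(parser.can_fetch("Googlebot", "https://example.com/blog/"))      # → True
print(parser.can_fetch("Googlebot", "https://example.com/private/x"))  # → False
```

Because the group uses the asterisk, any bot name passed to can_fetch gets the same answer.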

Importance of Robots.txt File in WordPress Site

Without a robots.txt file, search engines still crawl and index your WordPress website. However, they will also crawl pages that shouldn’t show up in search results, such as the Thank You page, the Admin login page, or the Author archive pages, just to name a few.

A well-structured robots.txt file gives you a helping hand in optimizing search engines’ crawl resources. You can direct them to focus on essential pages that bring more value to your business rather than wasting time on pages that shouldn’t be indexed.

You can also save your crawl quota thanks to an optimized robots.txt. A crawl quota, aka a crawl budget, defines the number of pages a search bot can crawl in a session. If you don’t list which pages and files bots mustn’t crawl, they’ll keep spending that budget on them instead of on your important pages.

Should You Use Robots.txt to Hide Content?

The answer is NO. Robots.txt is not an ideal way to keep pages out of the index. You may think that if you disallow a page or post, that content will no longer appear on result pages. However, a Disallow rule doesn’t tell search engines not to index your content; it simply blocks bots from crawling it.

Consequently, the content can still get indexed when another site links to it. Check out other myths about WordPress robots.txt file here.

We recommend using other methods to discourage search engines from indexing your content. You can either add a noindex meta tag to HTML pages or password-protect content with the PPWP Pro plugin. You should also make use of the PDA Gold plugin to prevent search engines from indexing your private media files.
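If you go the noindex route, it helps to confirm the tag actually made it into the page. The following is a rough sketch, using only Python’s standard html.parser, that looks for a robots meta tag containing noindex. The class and function names are made up for this example, and it only inspects the HTML you pass in; it does not check HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # "noindex, follow" becomes {"noindex", "follow"}
            self.directives.update(part.strip().lower() for part in content.split(","))


def has_noindex(html):
    """Return True if the HTML carries a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))  # → True
```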

What Content to Disallow or Noindex

As a webmaster, you can use robots.txt to hide the readme.html file and deter malicious attacks. Your WordPress version is available there, making it easier for hackers to find vulnerabilities and attack your site.

You may consider using robots.txt to hide low-quality content like category and tag pages on your blog. It’s not a smart decision, though.

Plus, don’t try to disallow your WordPress login page, admin directory, or registration page. WordPress already adds a noindex tag to them automatically.

How to Create and Submit a Robots.txt File in WordPress

You can take two different paths to generate a robots.txt file for your WordPress site. You can add one manually using FTP or make use of a WordPress plugin. Let’s start with the simpler method first.

#1 Edit Robots.txt File Using Yoast SEO Plugin

Yoast, without a doubt, is the most popular SEO plugin for WordPress sites at the moment. With over 5 million active installations, it wins the first position in the SEO niche.

Aside from main SEO improvement tasks such as canonical URLs, XML sitemaps, and SEO analysis, it helps create a robots.txt file for your site as well.

Upon installation and activation, go to SEO > Tools in your admin menu. Now scroll down and find the ‘File editor’ option. Click it and you’ll see a ‘Create robots.txt file’ button that lets you create and customize your robots.txt file.

[Image: robots.txt file editor in Yoast SEO]

You can modify your robots.txt file by adding or removing rules. Once done, remember to save all your changes.

#2 Edit Robots.txt file Manually Using FTP

You can easily create the WordPress robots.txt file via FTP. All you need to do is connect to your WordPress hosting account using an FTP client. After that, find the robots.txt file in the website’s root folder. In case you don’t have it, create one by right-clicking and choosing ‘Create new file.’

[Image: creating a new robots.txt file via FTP]

Next, download the file to your computer and edit it with a plain text editor, for example, Notepad or TextEdit. When you’re done, upload it back to your website’s root folder.
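If you’d rather generate the file than type it by hand, a few lines of Python can do it. This is only a sketch: build_robots_txt is a hypothetical helper, the rules and sitemap URL are placeholders, and the file is written to a temporary folder rather than uploaded anywhere.

```python
import tempfile
from pathlib import Path


def build_robots_txt(disallow=(), sitemaps=()):
    """Return robots.txt text with one group for all bots plus optional sitemaps."""
    lines = ["User-agent: *"]
    # An empty Disallow value means nothing is blocked.
    lines += [f"Disallow: {path}" for path in disallow] or ["Disallow:"]
    if sitemaps:
        lines.append("")
        lines += [f"Sitemap: {url}" for url in sitemaps]
    return "\n".join(lines) + "\n"


content = build_robots_txt(
    disallow=["/wp-admin/"],
    sitemaps=["http://www.example.com/post-sitemap.xml"],
)

# Written to a temporary folder here; in practice you would save the file
# wherever your FTP client can pick it up.
with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "robots.txt"
    target.write_text(content, encoding="utf-8")
    print(target.read_text(encoding="utf-8"))
```

The upload itself still happens through your FTP client, exactly as described above.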

How to Optimize Your WordPress Robots.txt File

You’re able to customize your robots.txt file in various ways. You should create a sitemap and add it to your robots.txt so search engine bots can easily crawl your content. While some site owners want to disallow a specific folder or file, a few others prefer blocking access to the entire site.

Include Sitemaps in Your Robots.txt File

Your sitemap contains all the web pages you want crawlers to find and crawl. It proves effective in prioritizing the content that should appear on search engines. A website sitemap can usually be found at ‘domain.com/sitemap.xml.’ Learn how to optimize your WordPress sitemap to boost SEO here.

When the sitemap is ready, simply add it to your robots.txt file.

User-agent: *
Sitemap: http://www.example.com/post-sitemap.xml
Sitemap: http://www.example.com/page-sitemap.xml
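To double-check that the Sitemap lines are picked up, you can parse the file with Python’s urllib.robotparser and list them. A minimal sketch, assuming Python 3.8 or later (where site_maps() is available) and reusing the example.com URLs from above:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Sitemap: http://www.example.com/post-sitemap.xml
Sitemap: http://www.example.com/page-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the Sitemap URLs in file order, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
# → ['http://www.example.com/post-sitemap.xml', 'http://www.example.com/page-sitemap.xml']
```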

Restrict Access to the Entire Site

Blocking search engine bots from crawling your whole website sounds weird in most cases. But for development sites, it comes in handy. Bots won’t crawl your pages, so the site is far less likely to appear on search results.

User-agent: *
Disallow: /

‘User-agent: *’ applies this rule to all search engines, while ‘Disallow: /’ blocks their access to all pages under that domain.
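A quick way to convince yourself that these two lines shut out everyone is to run a few bot names and paths through Python’s urllib.robotparser. This is an illustrative sketch; the paths and the bot name SomeOtherBot are invented.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /"])

agents = ["Googlebot", "Bingbot", "SomeOtherBot"]  # the last name is made up
paths = ["/", "/blog/hello-world/", "/wp-content/uploads/logo.png"]

# Collect every (bot, path) pair that is still crawlable; none should be.
crawlable = [(agent, path) for agent in agents for path in paths
             if parser.can_fetch(agent, path)]
print(crawlable)  # → []
```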

Prevent a Specific Bot from Crawling Your Site

Suppose you want to block Bingbot or some other search engine bot that isn’t important to you from crawling your site. To achieve that, simply replace the asterisk (*) with ‘Bingbot’ and you’re done.

User-agent: Bingbot
Disallow: /

Block or Allow Access to a Certain Folder or File

Suppose you have a private upload folder containing clients’ files and credential information, and you don’t want crawlers to find them. Another common example is blocking access to the entire wp-admin folder or wp-login.php. In that case, your robots.txt file will look something like this:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /wp-content/uploads/_pda/*

What if you plan to restrict a folder but still want bots to crawl a specific file inside it? You can use this sample robots.txt file.

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
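Why does the Allow line win here? Google documents that the most specific (longest) matching rule takes precedence, with Allow winning a tie. The sketch below imitates that precedence for plain path prefixes only; is_allowed is a hypothetical helper, and wildcards such as * and $ are deliberately left out. Note that Python’s own urllib.robotparser applies the first matching rule instead, so it would give a different answer for this file.

```python
def is_allowed(path, rules):
    """Apply longest-match precedence to (directive, prefix) rules for one group.

    rules is a list like [("disallow", "/wp-admin/"), ("allow", "/wp-admin/x.php")].
    Only plain prefixes are handled; wildcards are out of scope for this sketch.
    """
    best_length = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive == "allow"
        # The longer prefix wins; on a tie, allow wins.
        if len(prefix) > best_length or (len(prefix) == best_length and is_allow):
            best_length = len(prefix)
            allowed = is_allow
    return allowed


RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]

print(is_allowed("/wp-admin/admin-ajax.php", RULES))  # → True
print(is_allowed("/wp-admin/options.php", RULES))     # → False
print(is_allowed("/blog/", RULES))                    # → True
```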

Create Different Rules for Single Bots

“Can I block access to a folder on Bing but still allow Google to crawl it?” It’s a simple task! You just need to add a different set of rules for each User-agent. In the example below, every bot is kept out of the wp-admin folder, while Bingbot is blocked from the whole site.

User-agent: *
Disallow: /wp-admin/

User-agent: Bingbot
Disallow: /
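Here is a short sketch that feeds those two groups to Python’s urllib.robotparser and asks what each bot may crawl. The /blog/ path is a placeholder for any public page.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

User-agent: Bingbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has no group of its own, so it falls back to the '*' group.
print(parser.can_fetch("Googlebot", "/blog/"))      # → True
print(parser.can_fetch("Googlebot", "/wp-admin/"))  # → False
# Bingbot matches its own group, which blocks the whole site.
print(parser.can_fetch("Bingbot", "/blog/"))        # → False
```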

WordPress Robots.txt Examples

Below are robots.txt file examples you can review to see what big websites are doing with their robots.txt files.

WPBeginner stops all search engines from reading its readme.html file, refer folder, and wp-admin folder. At the same time, it enables bots to find its upload folder, pages, posts, and deals.

The website creates separate sitemaps for pages, posts, and the deal post type. There is a hosting sitemap included in the robots.txt file too.

[Image: WPBeginner robots.txt file]

TechCrunch’s robots.txt is a bit different. It blocks bots from crawling search result pages by adding the ‘Disallow: /search/’ rule.

[Image: TechCrunch robots.txt file]

How to Test Your WordPress Robots.txt File?

It’s necessary to check your robots.txt file to make sure it works correctly. You can use the Google Search Console tool to test your file.

After connecting your website to Google Search Console, open ‘robots.txt tester’ in the Search Console menu, under the Crawl section. Now select your website from the ‘Please select a property’ dropdown menu.

[Image: testing robots.txt in Google Search Console]

It will then show you a robots.txt file with the highlighted errors and warnings. Before hitting the ‘Submit’ button, you should fix all errors and warnings highlighted by the tool.
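You can also run a rough local check before submitting. The sketch below compares a robots.txt file against a list of expectations using Python’s urllib.robotparser; find_failures and the sample paths are made up for illustration. Keep in mind that this parser handles plain path prefixes and may not treat wildcards or rule precedence the way Google does, so it complements the Search Console test rather than replacing it.

```python
from urllib.robotparser import RobotFileParser


def find_failures(robots_txt, expectations):
    """Return the (agent, path, expected) cases that the rules do not satisfy."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (agent, path, expected)
        for agent, path, expected in expectations
        if parser.can_fetch(agent, path) != expected
    ]


ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
"""

# Each entry: (bot name, path, whether crawling should be allowed).
EXPECTATIONS = [
    ("Googlebot", "/", True),
    ("Googlebot", "/wp-admin/", False),
    ("Googlebot", "/wp-login.php", False),
    ("Bingbot", "/sample-page/", True),
]

failures = find_failures(ROBOTS_TXT, EXPECTATIONS)
print(failures or "all expectations hold")  # → all expectations hold
```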

Optimize WordPress Robots.txt Properly

We all understand the importance of the robots.txt file in telling search engine bots what they can or cannot crawl on your site.

Above is the complete guide on how to create a robots.txt file for WordPress sites using the Yoast SEO plugin or an FTP client. We’ve also shown you options to optimize the robots.txt file and some robots.txt examples of big websites.

After submitting your robots.txt, you need to test it to ensure there are no crawling errors. Check out our blog for more useful WordPress resources.