It’s very important to know that the “Disallow” directive in your WordPress robots.txt file doesn’t work the same way as the noindex meta tag in a page’s header. Robots.txt blocks crawling, but not necessarily indexing, with the exception of website files such as images and documents. Search engines can still index your “disallowed” pages if they’re linked from elsewhere.
That’s why Prevent Direct Access Gold no longer uses robots.txt Disallow rules to block your website pages from search indexing. Instead, we use the noindex meta tag, which also helps Google and other search engines correctly distribute your content’s inbound link value across your website.
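For reference, noindex is simply a robots meta tag placed in a page’s head section (the same rule can also be sent as an X-Robots-Tag HTTP header). A typical tag looks like this:

```html
<!-- Tells search engines not to index this page,
     while still following (and passing value through) its links -->
<meta name="robots" content="noindex, follow">
```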
What to include in your WordPress robots.txt?
Yoast suggests keeping your robots.txt clean and not blocking anything, including any of the following:
WordPress also agrees, saying the ideal robots.txt shouldn’t disallow anything at all.
In short, disallowing your WordPress resources, uploads and plugins directories, which many claim enhances your website’s security against attackers targeting vulnerable plugins, probably does more harm than good, especially in terms of SEO. Besides, you shouldn’t install vulnerable plugins in the first place.
That’s why we’ve removed these rules from your robots.txt by default. However, you can still add them back with our WordPress Robots.txt Integration extension.
While Yoast highly recommends that you submit your XML sitemap directly to Google Search Console and Bing Webmaster Tools, you may still want to include a Sitemap directive in your robots.txt as a quick alternative that tells other search engines where your sitemap is.
Sitemap: http://preventdirectaccess.com/post-sitemap.xml
Sitemap: http://preventdirectaccess.com/page-sitemap.xml
Sitemap: http://preventdirectaccess.com/author-sitemap.xml
Sitemap: http://preventdirectaccess.com/offers-sitemap.xml
Block access to readme.html, license.txt and wp-config-sample.php files
Security-wise, it’s recommended that you block access to your WordPress readme.html, license.txt and wp-config-sample.php files so that unauthorized people can’t check which version of WordPress you’re using.
User-agent: *
Disallow: /readme.html
Disallow: /license.txt
Disallow: /wp-config-sample.php
You may also use robots.txt to block specific bots from crawling your website content or specify different rules for different types of bots.
# block Googlebot from crawling the entire website
User-agent: Googlebot
Disallow: /

# block Bingbot from crawling the refer directory
User-agent: Bingbot
Disallow: /refer/
Here’s how you can stop bots from crawling your WordPress search results:
User-agent: *
Disallow: /?s=
Disallow: /search/
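If you want to sanity-check rules like these before deploying them, Python’s standard urllib.robotparser module can evaluate a robots.txt snippet locally. A minimal sketch (example.com is just a placeholder domain):

```python
from urllib.robotparser import RobotFileParser

# The same rules as above, supplied as a list of lines
rules = """\
User-agent: *
Disallow: /?s=
Disallow: /search/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Search-result URLs are blocked for all bots...
print(rp.can_fetch("Googlebot", "https://example.com/?s=test"))   # False
print(rp.can_fetch("Googlebot", "https://example.com/search/x"))  # False
# ...while normal pages remain crawlable
print(rp.can_fetch("Googlebot", "https://example.com/hello/"))    # True
```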
Host and crawl-delay are two other robots.txt directives you may consider using, albeit less popular. The first lets you specify the preferred domain of your website (www or non-www):
User-agent: *
# we prefer the non-www domain
Host: preventdirectaccess.com
The latter tells crawl-hungry bots of various search engines to wait a number of seconds between crawl requests.
User-agent: *
# please wait for 8 seconds before the next crawl
Crawl-delay: 8
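You can likewise confirm how a well-behaved crawler reads this directive back, again using Python’s standard urllib.robotparser as a quick local check:

```python
from urllib.robotparser import RobotFileParser

# The crawl-delay rule from above, supplied as a list of lines
rules = """\
User-agent: *
Crawl-delay: 8
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Compliant crawlers can read the requested delay (in seconds)
print(rp.crawl_delay("*"))  # 8
```

Note that crawl-delay is advisory: some major crawlers, including Googlebot, ignore it, so crawl rate for Google is managed through Search Console instead.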