Blocking search indexing of protected files is one of the key features that Prevent Direct Access Gold offers to our customers. Here’s exactly how we do it:
- Block protected files from appearing in search results with the X-Robots-Tag HTTP header
- Stop indexing of protected files' attachment pages with the NOINDEX meta tag
- Prevent search engine bots from crawling your protected files with robots.txt
Let’s quickly dive into the details of each method.
#1: Use robots.txt Disallow Rules
This is probably the most common yet also the most misused method of all. Many users, including web developers, mistakenly believe that robots.txt can prevent indexing of their personal files and documents. In fact, it can’t: robots.txt disallow rules only block search engine bots from crawling the files.
So, in case someone else links to your files from their own websites, your files would still be indexed and listed on search results.
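For reference, a minimal robots.txt with a disallow rule looks like the following (the folder path is just an illustration). It asks crawlers not to fetch anything under that folder, but it says nothing about indexing:

```
User-agent: *
Disallow: /wp-content/uploads/
```

Any URL in that folder that is linked from elsewhere can still end up in search results, just without a crawled description.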
There was also a “Noindex:” directive in robots.txt that worked with Google, i.e. it did block your files from appearing in Google search results. However, Google always described that feature as “experimental” and advised against relying on it, and it has since announced that it no longer supports the directive at all.
The free version of Prevent Direct Access only uses this method to protect your files, which, as explained above, is not enough on its own.
#2: Use Robots Meta Tag
By default, WordPress creates an attachment page for every file uploaded to the Media Library – be it an image, video, or document. These attachment pages, with their captions and descriptions, are supposed to improve the overall SEO of your files and website.
In the case of protected files, however, these attachment pages do more harm than good. Since these pages display or link to your private files, if they are indexed, your files may be exposed as well. That’s why Prevent Direct Access Gold has to instruct Google and other search engines not to index these pages or show them in search results.
We place this robots “noindex” meta tag in the <head> of every attachment page:
<meta name="robots" content="noindex">
Please note that the Yoast SEO plugin redirects these attachment pages to the media file URLs by default, as it deems these mostly “empty” pages unhelpful to visitors.
#3: Use X-Robots-Tag HTTP Header
In the case of a direct file URL, you simply can’t place the robots meta tag in the <head> section, as there is no HTML page to put it in. That’s when the X-Robots-Tag HTTP header comes in handy.
The “none” value is equivalent to noindex, nofollow.
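To illustrate, a response serving a protected PDF with this header might look like the following (headers abridged; the exact response depends on your server):

```
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: none
```

Because the directive travels in the HTTP response rather than in HTML, it works for any file type – PDFs, images, videos, and so on.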
This is the most advanced and most effective method of preventing your personal and private files from appearing on Google and other search engines.
You can also add these rules to your .htaccess file yourself. Note, however, that a rule like this applies to all PDF files, whereas our Gold version applies it only to protected (PDF) files:

<FilesMatch "\.pdf$">
Header set X-Robots-Tag "none"
</FilesMatch>
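If your site runs on Nginx rather than Apache, a roughly equivalent rule – a sketch you would adapt to your own server block and file pattern – looks like this:

```
location ~* \.pdf$ {
    add_header X-Robots-Tag "none";
}
```

As with the .htaccess version, this applies to every PDF the server delivers, not just protected ones.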
It’s even better if you also add rel="nofollow" to links pointing to those private files:
<a href="protected-file-url.pdf" rel="nofollow">Download My Protected File</a>
Once a file is protected, our Gold version automatically blocks search crawling and indexing of that file – you don’t have to do anything else. These rules are also removed automatically once the file is unprotected.