Don’t want to be in Google?

Do you have a web page or part of a web site that you’d like to keep hidden from Google (and other prying web crawlers and robots)?

The main way that Google finds stuff on the web is with a “web crawler” called Googlebot. Googlebot is a program that trundles around the web, reading pages and following links, mapping the web as it goes. Other search engines have similar tools.

So if you want to stay hidden on the web, one solution is to put the brakes on Googlebot and its mates.

robots.txt

Your first option is to put a file called robots.txt in the root of your web site. That is, the URL will be http://www.yoursite.com/robots.txt.
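Crawlers look for robots.txt only at the root of a host, whichever page they happen to arrive at. As a rough sketch, assuming Python’s standard library and a made-up page URL, this is how a crawler might work out where to look:

```python
from urllib.parse import urljoin

def robots_txt_url(page_url: str) -> str:
    # Hypothetical helper: an absolute path ("/robots.txt") replaces the whole
    # path (and any query string) of the page URL, keeping scheme and host.
    return urljoin(page_url, "/robots.txt")

print(robots_txt_url("http://www.yoursite.com/articles/2004/hidden.html"))
# http://www.yoursite.com/robots.txt
```

A robots.txt placed in a subfolder, such as /articles/robots.txt, is not where crawlers look, so it will not be found this way.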

The simplest entry to put into this file is:

User-Agent: *
Disallow: /

This tells Googlebot and other well-behaved crawlers to stay out of every part of your site. See the Robots.txt Tutorial for more information.
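To see how a well-behaved crawler reads those two lines, here is a minimal sketch using Python’s built-in urllib.robotparser module. The helper name crawler_may_fetch and the page URL are made up for illustration, and real crawlers such as Googlebot use their own parsers, so treat this as an approximation of their behaviour rather than a copy of it.

```python
from urllib.robotparser import RobotFileParser

# The "block everything" rules from the robots.txt example.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /
"""

def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    # Hypothetical helper: parse robots.txt text we already have in hand
    # (no network access) and ask whether this agent may fetch the URL.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

for agent in ("Googlebot", "Bingbot"):
    print(agent, crawler_may_fetch(ROBOTS_TXT, agent, "http://www.yoursite.com/index.html"))
# Googlebot False
# Bingbot False
```

Because the rules are listed under User-Agent: *, they apply to any crawler name. Nothing stops a badly behaved program from ignoring the file, though; robots.txt is a request, not a lock.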

The Robots Meta Tag

You can also lock out Googlebot on a page-by-page basis using the robots meta tag. Place the following tag in the <head> section of your page:

<meta name="ROBOTS" content="NOINDEX, NOFOLLOW" />

Well-behaved bots will not index this page or follow any links on it.
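The meta tag only works because the crawler reads it after downloading the page. A minimal sketch of that step using Python’s html.parser module; the RobotsMetaParser class and robots_directives function are hypothetical names, and the sample page is invented:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values,
        # so "ROBOTS" and "NOINDEX" are lowercased here.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for token in attributes.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)

def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives

page = """<html><head>
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW" />
<title>Private page</title>
</head><body><a href="/secret.html">secret</a></body></html>"""

directives = robots_directives(page)
print(sorted(directives))  # ['nofollow', 'noindex']
print("noindex" not in directives, "nofollow" not in directives)  # False False
```

Note that the crawler has to fetch the page to see the tag at all; the tag controls what it does next, not whether it visits.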

Cache-flow problem?

Google also keeps a backup copy, or cache, of web pages. This is often useful when the page you’re after is out of action for a while, for example if the host site is temporarily offline. But if you want your pages kept out of the Google cache, it’s meta tags again:

<meta name="ROBOTS" content="NOARCHIVE" />
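Directives in the content attribute are comma-separated, so one tag can carry NOINDEX, NOFOLLOW and NOARCHIVE together. The hypothetical Python helper below sketches how a crawler might turn that value into yes/no decisions; it is an illustration only, not Google’s actual logic:

```python
def robots_permissions(content: str) -> dict:
    # Hypothetical helper: split the comma-separated content value,
    # ignore case and spaces, and treat anything not forbidden as allowed.
    tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
    return {
        "index": "noindex" not in tokens,
        "follow": "nofollow" not in tokens,
        "archive": "noarchive" not in tokens,
    }

print(robots_permissions("NOARCHIVE"))
# {'index': True, 'follow': True, 'archive': False}
print(robots_permissions("NOINDEX, NOFOLLOW, NOARCHIVE"))
# {'index': False, 'follow': False, 'archive': False}
```

With NOARCHIVE alone, the page can still be indexed and its links followed; only the cached copy is ruled out.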

Is this what you really want?

Think carefully before you cut yourself off from the search engines, but if you’re sure you want to go it alone, these methods should see your site left in peace.


First published: PC Update August 2004 (online version updated)