It is easy to become overwhelmed by the sheer size of the Internet, especially when we do a web search and are told that two million pages meet our requirements. Of course, few of those will be useful.

How big is the web?

A typical search engine now claims to index more than a billion documents. Of those, many will be duplicates or obsolete. It's still a huge number, but the web is actually much bigger than that. Enthusiasts for what is called the "Hidden" or "Invisible" web variously insist that it is 10, 100 or even 500 times the size of the "normal" web.

The "invisible" parts of the web are those which cannot be "crawled" or automatically indexed by the search engines. There are various reasons why that might be so. They might not be accessible by links from outside. They might use types of file that can't be read by the search engines. Until recently, that included Acrobat files, known as PDFs. Many extremely useful and important documents are published in that format, and it is good news that Google and AllTheWeb searches will now include them.

Still more documents are not available because they are held in databases and can only be called up or generated if you actually ask for them from the site itself. The search engine can't do that, but you can, if you know where to look. There is a useful overview of the topic by Chris Sherman at Search Engine Watch.
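For readers who like to see the mechanics, here is a rough sketch in Python of why link-following leaves some pages unindexed. Everything in it is made up for illustration: the page names, the tiny link map and the database_lookup function are assumptions, and a real search engine crawler is far more elaborate.

```python
from collections import deque

# Hypothetical toy site: each page maps to the pages it links to.
LINKS = {
    "/index.html": ["/about.html", "/search.html"],
    "/about.html": ["/index.html"],
    "/search.html": [],   # a search form; its results exist only after a query
    "/orphan.html": [],   # no other page links here
}


def crawl(start, links):
    """Follow links breadth-first and return the set of pages reached."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def database_lookup(term):
    """Stand-in for a page that a site generates only when asked (hypothetical data)."""
    records = {"tides": "Tide tables for 2002"}
    return records.get(term)


indexed = crawl("/index.html", LINKS)
invisible = set(LINKS) - indexed

print(sorted(indexed))
print(sorted(invisible))
print(database_lookup("tides"))
```

In this sketch the crawler reaches the search form but never the orphan page, and it never sees the database record, because nothing links to it: the record only appears when somebody types a query into the site itself.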

The numerous competing "Invisible" or "Hidden" web tools are really offering access to those databases. Most now have a searchable front-end, but they don't offer a search of the database contents, only of their names and other descriptive text.

The outright champion here is Gary Price's Direct Search. This is an unsophisticated-looking site but a rich source of all manner of information. It has now spawned a string of additional pages and has acquired a search box that has enhanced its usefulness. (Of course, you can always press command-F (Mac) or ctrl-F (Windows) to search a large page for a keyword.) Be aware that it is sometimes difficult to connect, probably because of the site's popularity.

More commercial versions of the unindexed web are available through various rival operations, of which the best known is perhaps Complete Planet. Complete Planet is the originator of the "500 billion page" version of the Invisible web, called the "Deep Web".

There is a great deal of valuable material buried in the unindexed parts of the web, but no-one should form the idea that the Invisible web can replace the part the search engines know about. It can't: learning to use search engines efficiently should come first.
