The proper way to use the robots.txt file

When optimizing a web site, most webmasters don’t consider using the robots.txt file, yet it is a very important file for your site. It lets spiders and crawlers know what they can and cannot index. This is helpful for keeping them out of folders that you do not want indexed, such as the admin or stats folder, and away from content that they cannot index.

Here is a list of the fields that you can include in a robots.txt file and their meaning:

1) User-agent: In this field you name a specific robot to describe an access policy for, or use “*” for all robots (explained further in the examples).
2) Disallow: In this field you specify the files and folders not to include in the crawl.
3) #: The number sign starts a comment.
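
To make the three elements above concrete, here is a minimal Python sketch that reads robots.txt text and groups the Disallow paths under each User-agent, ignoring anything after a number sign. The function name parse_robots and the sample text are made up for illustration, and this is a simplified reader, not a complete robots.txt implementation.

```python
def parse_robots(text):
    """Group Disallow paths by User-agent; '#' starts a comment."""
    groups = {}            # user-agent name -> list of disallowed paths
    current_agents = []    # agents the following Disallow lines apply to
    reading_rules = False  # True once a Disallow line has been seen
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if reading_rules:  # a User-agent after rules starts a new group
                current_agents = []
                reading_rules = False
            current_agents.append(value)
            groups.setdefault(value, [])
        elif field == "disallow":
            reading_rules = True
            for agent in current_agents:
                groups[agent].append(value)
    return groups


# Hypothetical sample file used only for this sketch.
SAMPLE = """# keep crawlers out of private areas
User-agent: *
Disallow: /admin/   # admin pages
Disallow: /stats/
"""

print(parse_robots(SAMPLE))  # {'*': ['/admin/', '/stats/']}
```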

Here are some examples of a robots.txt file for redball.com:

User-agent: *
Disallow:

The above would let all spiders index all content, because an empty Disallow line blocks nothing.

Here is another example:

User-agent: *
Disallow: /cgi-bin/

The above would block all spiders from indexing the cgi-bin directory.
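
One way to check rules like the two examples above before uploading them is Python's standard urllib.robotparser module. The sketch below feeds both files to the parser and asks whether a crawler may fetch a given URL. The crawler name ExampleBot, the helper build_parser, and the page paths are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Return a parser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# First example: an empty Disallow blocks nothing.
ALLOW_ALL = """User-agent: *
Disallow:
"""

# Second example: keep every spider out of cgi-bin.
BLOCK_CGI = """User-agent: *
Disallow: /cgi-bin/
"""

allow_all = build_parser(ALLOW_ALL)
block_cgi = build_parser(BLOCK_CGI)

# ExampleBot and the paths below are hypothetical.
print(allow_all.can_fetch("ExampleBot", "http://redball.com/cgi-bin/form.cgi"))  # True
print(block_cgi.can_fetch("ExampleBot", "http://redball.com/cgi-bin/form.cgi"))  # False
print(block_cgi.can_fetch("ExampleBot", "http://redball.com/index.html"))        # True
```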

User-agent: googlebot
Disallow:

User-agent: *
Disallow: /admin.php
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/

In the above example, googlebot can index everything, while all other spiders cannot index admin.php or the cgi-bin, admin, and stats directories. Notice that you can block single files like admin.php.
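
The per-robot behavior described above can be checked the same way. This sketch loads the last example into Python's standard urllib.robotparser and compares googlebot with another crawler. The name OtherBot and the index.html path are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# The last example: googlebot is unrestricted, everyone else is limited.
ROBOTS_TXT = """User-agent: googlebot
Disallow:

User-agent: *
Disallow: /admin.php
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# index.html is a hypothetical page; the other paths come from the example.
paths = ["/admin.php", "/admin/", "/stats/", "/index.html"]

results = {}
for agent in ("googlebot", "OtherBot"):  # OtherBot is a made-up crawler name
    for path in paths:
        allowed = parser.can_fetch(agent, "http://redball.com" + path)
        results[(agent, path)] = allowed
        print(agent, path, "allowed" if allowed else "blocked")
```

In this sketch the parser uses the googlebot group for googlebot and falls back to the “*” group for every other name, so only OtherBot is kept out of admin.php and the listed directories.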

About the Author

Jimmy Whisenhunt is the owner of VIP Enterprises